A methodological framework for examining sustainability risk: the four data domains, the analytical traps, the KPIs that aren't on a P&L, and the public datasets that let you reconstruct an entity's environmental performance from outside.
A sustainability audit is the disciplined examination of an entity's environmental performance against three things: its regulatory obligations, its stated commitments, and the trajectory implied by its current operations. It is not the same as a materiality assessment, and the two are routinely confused.
A materiality assessment determines which sustainability topics affect an entity's financial prospects. It is forward-looking, framed in terms of risk to enterprise value, and the output is a list of disclosure topics. The dominant frameworks are SASB Standards, IFRS S1 and S2, and the European Sustainability Reporting Standards (ESRS), which add an impact perspective to the financial one under the principle of double materiality.
A sustainability audit, by contrast, is operational. It examines what the entity actually did over a reporting period, against what it should have done, and translates the gap into financial terms. The output is a defensible statement about whether the entity met its obligations, what its forward exposure is, and which of its disclosed numbers can be trusted.
The two pair naturally. Materiality tells the auditor where to look. The audit confirms what is actually there. An auditor who runs the second without first running the first will spend disproportionate time on topics that don't move enterprise value. An analyst who relies only on the first, and never tests the underlying data, is building a thesis on disclosed numbers that may be unverifiable.
This page lays out the working discipline: the four data domains an audit examines, the interpretation problems that distinguish non-financial data from financial data, the KPIs that matter beyond what shows up on a P&L, and a worked example that ties everything together using NYC's published datasets.
Most sustainability disclosures fall into one of four categories. Each has its own units, its own measurement standard, and its own characteristic blind spots.
Each domain produces data of different quality. Energy use is the most reliable, because utility meters produce continuous, billing-grade records. Scope 1 GHG is mostly reliable, since most of it is calculated from metered fuel use with established emission factors; fugitive sources such as refrigerant leaks are the weaker component. Scope 2 is sensitive to methodology: location-based and market-based reporting can produce numbers that differ by an order of magnitude for the same operations. Scope 3 is largely modeled, often by applying industry-average emission factors to spend data, and should be read with substantial uncertainty bounds.
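The Scope 2 divergence is easiest to see side by side. Below is a minimal Python sketch, assuming one entity's annual purchased electricity, a location-based grid factor (in the US typically an eGRID subregion rate), and a residual-mix factor for any megawatt-hours not covered by contractual instruments. The class, field names, and all numeric inputs are hypothetical, and contracted MWh are assumed to carry a zero-emission factor.

```python
from dataclasses import dataclass


@dataclass
class ElectricityUse:
    """One entity's purchased electricity for a reporting year (hypothetical inputs)."""
    total_mwh: float        # metered consumption
    contracted_mwh: float   # MWh backed by zero-emission contractual instruments (RECs, PPAs)
    grid_factor: float      # location-based average grid rate, tCO2e/MWh
    residual_factor: float  # residual-mix rate for uncovered MWh, tCO2e/MWh


def scope2_location_based(e: ElectricityUse) -> float:
    # Every MWh is charged at the average rate of the grid where it was consumed.
    return e.total_mwh * e.grid_factor


def scope2_market_based(e: ElectricityUse) -> float:
    # Contracted MWh carry their instrument's factor (assumed zero here);
    # any uncovered MWh fall back to the residual-mix rate.
    uncovered = max(e.total_mwh - e.contracted_mwh, 0.0)
    return uncovered * e.residual_factor


if __name__ == "__main__":
    site = ElectricityUse(total_mwh=10_000, contracted_mwh=9_500,
                          grid_factor=0.3, residual_factor=0.4)
    print(f"location-based: {scope2_location_based(site):,.0f} tCO2e")
    print(f"market-based:   {scope2_market_based(site):,.0f} tCO2e")
```

With these illustrative inputs the location-based total is 3,000 tCO2e and the market-based total 200 tCO2e, a fifteenfold gap for the same meters.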
Financial statements have a single boundary, a single vintage, and a single accounting standard. Sustainability data has none of those things, and treating it as if it does is the most common analytical mistake.
The GHG Protocol allows entities to consolidate emissions under an equity share approach (everything proportional to ownership) or a control approach, either operational control (everything they operate) or financial control. The same entity can produce emissions totals that differ by 30 to 50 percent depending on which approach it elects. Audits must verify which boundary an entity uses and confirm it is applied consistently across reporting years.
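A minimal sketch of why the boundary election matters, assuming a small hypothetical portfolio in which each asset's emissions are known at 100 percent. It computes the equity share and operational control totals for the same assets and flags years in which the elected boundary changed. The `Asset` type, the election labels, and the numbers are illustrative, not drawn from any real filing.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    name: str
    emissions_t: float         # asset's emissions at 100 percent, tCO2e
    equity_share: float        # reporting entity's ownership fraction, 0 to 1
    operational_control: bool  # does the reporting entity operate the asset?


def equity_share_total(assets: list[Asset]) -> float:
    # Equity share approach: each asset counts in proportion to ownership.
    return sum(a.emissions_t * a.equity_share for a in assets)


def operational_control_total(assets: list[Asset]) -> float:
    # Operational control approach: 100 percent of operated assets, none of the rest.
    return sum(a.emissions_t for a in assets if a.operational_control)


def boundary_changes(elections: dict[int, str]) -> list[int]:
    """Years in which the elected boundary differs from the prior reported year."""
    years = sorted(elections)
    return [y for prev, y in zip(years, years[1:]) if elections[y] != elections[prev]]


if __name__ == "__main__":
    portfolio = [
        Asset("Tower A", 1_000, 0.50, True),
        Asset("Plant B", 800, 0.40, False),
        Asset("Campus C", 600, 1.00, True),
    ]
    print(equity_share_total(portfolio), operational_control_total(portfolio))
    print(boundary_changes({2022: "operational", 2023: "operational", 2024: "equity"}))
```

Here equity share yields 1,420 tCO2e and operational control 1,600 tCO2e for identical operations, roughly 13 percent apart; a switch between them from one year to the next would show up as an apparent trend.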
An entity may report Scope 1 and 2 for fiscal year 2025 alongside Scope 3 for fiscal year 2023, because Scope 3 typically requires twelve to eighteen months for value-chain data to settle. This is not concealment, but it does mean a single disclosure is rarely a single snapshot. Auditors should pull the as-of date for each metric separately.
Where measured data is unavailable, entities use proxies: industry-average emission factors applied to spend, or modeled emissions per dollar of revenue. These are defensible, but they obscure operational performance. A company that reduces actual emissions by 20 percent while revenue holds steady will show flat reported emissions if it is using a spend-based proxy. The audit signal is the gap between modeled and measured numbers, where both are available.5
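To make the masking effect concrete, here is a minimal sketch with a hypothetical two-year history in which spend is flat and measured emissions fall 20 percent. The emission factor (`FACTOR`) is an assumed placeholder, not a published industry-average value.

```python
def spend_based_estimate(spend_usd: float, factor_t_per_usd: float) -> float:
    # Spend-based proxy: emissions scale with dollars spent, not with physical activity.
    return spend_usd * factor_t_per_usd


# Hypothetical two-year history: spend is flat, measured emissions fall 20 percent.
FACTOR = 0.0002  # assumed industry-average factor, tCO2e per dollar
history = {
    2024: {"spend_usd": 10_000_000, "measured_t": 2_000},
    2025: {"spend_usd": 10_000_000, "measured_t": 1_600},
}

for year, row in history.items():
    proxy = spend_based_estimate(row["spend_usd"], FACTOR)
    gap = (proxy - row["measured_t"]) / row["measured_t"]
    print(f"{year}: proxy {proxy:,.0f} t, measured {row['measured_t']:,} t, gap {gap:+.0%}")
```

The proxy stays at 2,000 t in both years while measured emissions drop to 1,600 t, so the real reduction surfaces only as a 25 percent modeled-versus-measured gap.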
A building's EUI swings with the weather without any change in underlying performance: it rises in cold winters and, usually to a lesser degree, in hot summers. In a heating-dominated climate like New York's, winter severity tends to drive most of the year-to-year variation, because combustion heating delivers less than one unit of heat per unit of fuel while chillers deliver several units of cooling per unit of electricity. Weather-normalized EUI, computed against a typical meteorological year, is the comparable metric. Occupancy-adjusted metrics matter for office buildings post-2020: a building running at 60 percent occupancy with 90 percent of its pre-pandemic energy use is performing worse than the raw EUI suggests.
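One common way to weather-normalize is a degree-day regression. The sketch below is a minimal version, assuming twelve months of metered energy with matching heating and cooling degree days, plus typical-meteorological-year (TMY) degree days for the same months. It fits a simple linear model by least squares with NumPy and re-evaluates it at TMY conditions; change-point models are a more rigorous alternative that this sketch does not implement. The occupancy helper reproduces the arithmetic from the paragraph above.

```python
import numpy as np


def weather_normalized_eui(energy_kbtu, hdd, cdd, tmy_hdd, tmy_cdd, floor_area_sf):
    """Fit monthly energy = base + b*HDD + c*CDD by least squares, then
    re-evaluate the fitted model at typical-meteorological-year degree days.
    Returns kBtu per square foot per year."""
    X = np.column_stack([np.ones(len(hdd)), hdd, cdd])
    coef, *_ = np.linalg.lstsq(X, np.asarray(energy_kbtu, dtype=float), rcond=None)
    X_tmy = np.column_stack([np.ones(len(tmy_hdd)), tmy_hdd, tmy_cdd])
    return float((X_tmy @ coef).sum()) / floor_area_sf


def energy_per_occupancy_index(energy_vs_baseline, occupancy_vs_baseline):
    # Energy relative to a pre-pandemic baseline, per unit of occupancy relative to baseline.
    # 0.90 energy at 0.60 occupancy gives 1.5: 50 percent more energy per occupant.
    return energy_vs_baseline / occupancy_vs_baseline
```

Year-over-year EUI comparisons should then be made between normalized values, so that a mild winter does not read as an efficiency gain.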
A voluntary CDP response, a CSRD filing, an SEC climate disclosure, and an LL84 benchmarking submission can each contain different values for the same underlying operations. The differences come from boundary choices, methodology elections, and the definitional scope of each framework. The first job of the audit is to determine which number is the operational truth and which numbers are framework artifacts.
Revenue, EBITDA, and operating margin tell you nothing about transition risk. The KPIs that do are the ones that translate environmental exposure into financial language without losing the underlying physics.
| KPI | What it measures | What it reveals |
|---|---|---|
| Carbon intensity per revenue (tCO2e / $M revenue) | Total emissions divided by revenue. | Whether emissions reductions are real or just a function of business shrinkage. Decoupling from revenue growth is the signal. |
| Climate capex share (% of total capex) | Capital spending on retrofits, electrification, and physical adaptation as a share of total capex. | Whether stated transition commitments are being funded. A company with net-zero targets and 2 percent climate capex is unserious. |
| Penalty exposure (% of operating margin) | Annualized regulatory penalty risk under existing laws (LL97, EPR, carbon pricing). | Whether sustainability is a margin problem or a tail risk. Above 5 percent is material. |
| Disclosed-vs-modeled gap (% deviation) | Difference between an entity's disclosed emissions and an independent model based on industry averages. | Quality of the reporting infrastructure. Large positive or negative gaps both warrant scrutiny. |
| EUI trajectory (% YoY change, weather-adjusted) | Year-over-year change in weather-normalized energy use intensity. | Whether building performance is actually improving or whether reported gains are weather-driven. |
| Stranding distance (years to cap breach) | Years of operation at current carbon intensity before the building or asset exceeds its applicable regulatory cap. | Forward-looking exposure under LL97 or analogous regimes. Useful for valuation discounts on commercial real estate. |
None of these are exotic. Each can be calculated from publicly available data plus the entity's own financial filings. The discipline is in computing them consistently and using them as a cross-check against narrative disclosures, not as a substitute.
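As a sketch of how little machinery the ratio KPIs need, the functions below compute four of them from inputs an analyst would pull from filings. They assume "operating margin" is measured in dollars as operating income; the function names are illustrative, and the 5 percent threshold is the one stated in the table. The disclosed-vs-modeled gap and stranding distance need more context and are left out here.

```python
def carbon_intensity(total_tco2e: float, revenue_musd: float) -> float:
    """tCO2e per $M of revenue."""
    return total_tco2e / revenue_musd


def is_absolutely_decoupled(prev_t, curr_t, prev_revenue, curr_revenue) -> bool:
    # Revenue grew while total emissions fell. A falling total alongside falling
    # revenue may just reflect a shrinking business.
    return curr_revenue > prev_revenue and curr_t < prev_t


def climate_capex_share(climate_capex: float, total_capex: float) -> float:
    """Retrofit, electrification, and adaptation capex as a percent of total capex."""
    return 100 * climate_capex / total_capex


def penalty_exposure(annual_penalty_usd: float, operating_income_usd: float) -> float:
    """Annualized regulatory penalty as a percent of operating income."""
    return 100 * annual_penalty_usd / operating_income_usd


def pct_change(prev: float, curr: float) -> float:
    """Year-over-year change in percent; apply to weather-normalized EUI."""
    return 100 * (curr - prev) / prev


MATERIAL_PENALTY_PCT = 5.0  # materiality threshold from the KPI table
```

Each function is a one-line ratio; the audit value comes from computing them the same way every year and setting them beside the narrative claims.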
A 250,000-square-foot pre-war office building in Midtown Manhattan. The owner has not retrofitted recently. The audit question: what is this building's forward LL97 exposure, and what does the gap look like in dollar terms?
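The gap-to-dollars step can be sketched in a few lines. This is a minimal sketch, not the example's actual figures. It assumes LL97's $268 per tCO2e penalty rate and office (occupancy group B) limits of 0.00846 tCO2e/sf for 2024 to 2029 and 0.00453 tCO2e/sf for 2030 to 2034; treat these as inputs to confirm against current DOB rules. Later compliance periods are omitted, which is why `stranding_year` stops at 2034 by default. The building's intensity is hypothetical.

```python
# Assumed LL97 parameters for an office (occupancy group B) building; these are
# inputs to verify against current DOB rules, not authoritative values.
PENALTY_PER_TCO2E = 268.0
OFFICE_LIMITS = (  # (first year in force, tCO2e per square foot per year)
    (2024, 0.00846),
    (2030, 0.00453),
)


def limit_for_year(year: int, schedule=OFFICE_LIMITS) -> float:
    """Emissions-intensity limit in force in a given year."""
    in_force = [lim for start, lim in sorted(schedule) if start <= year]
    if not in_force:
        raise ValueError(f"no limit in force in {year}")
    return in_force[-1]


def annual_penalty(emissions_t: float, floor_area_sf: float, year: int,
                   schedule=OFFICE_LIMITS, rate=PENALTY_PER_TCO2E) -> float:
    """Dollar penalty for emissions above the building's cap in one year."""
    cap_t = limit_for_year(year, schedule) * floor_area_sf
    return max(emissions_t - cap_t, 0.0) * rate


def stranding_year(intensity: float, start_year: int, schedule=OFFICE_LIMITS,
                   annual_change: float = 0.0, horizon: int = 2034):
    """First year intensity exceeds the limit, or None through the horizon.
    intensity is tCO2e/sf/yr in start_year; annual_change is a fractional
    yearly change (for example -0.02 for a 2 percent annual reduction)."""
    first_year = min(start for start, _ in schedule)
    for year in range(max(start_year, first_year), horizon + 1):
        current = intensity * (1 + annual_change) ** (year - start_year)
        if current > limit_for_year(year, schedule):
            return year
    return None


if __name__ == "__main__":
    area = 250_000      # square feet, from the example
    intensity = 0.0095  # hypothetical current tCO2e/sf/yr
    emissions = intensity * area
    for yr in (2026, 2030):
        print(yr, f"${annual_penalty(emissions, area, yr):,.0f}")
    print("stranding year:", stranding_year(0.0080, 2025))
```

At the illustrative 0.0095 tCO2e/sf, the sketch gives about $69,680 a year under the first limit and about $332,990 a year from 2030. A building at 0.0080 tCO2e/sf would comply until 2030, a stranding distance of five years from 2025.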
This example uses one building. The same methodology applies to a portfolio, an industry, or a counterparty. The data sources change, the unit economics change, but the discipline of pulling primary records, comparing to peer norms, computing a gap, and translating to dollars holds across every sustainability audit.
An audit is only as good as the data behind it. These are the public datasets that allow an outside party to reconstruct most of an NYC entity's environmental performance, and the limitations that go with each.
| Dataset | What it covers | What it can't tell you |
|---|---|---|
| NYC Open Data: LL84 Benchmarking (2023 to Present) | Site EUI, Source EUI, ENERGY STAR score, GHG intensity for all NYC buildings 25,000 sf and larger. One row per BIN (Building Identification Number) per reporting year. | Self-reported through Portfolio Manager. Roughly 10 to 15 percent of submissions contain entry errors that DOB later flags. Pre-cleaning required before analytic use. |
| DOB Sustainability Laws Covered Buildings List | The official list of buildings subject to LL97, with assigned occupancy class and emissions limits. Updated each March. | Does not include actual emissions; that data lives in the BEAM portal and is not fully public at the building level. |
| EPA ENERGY STAR Portfolio Manager | National benchmarking infrastructure. Reference medians for over 80 building types, climate-normalized. | Building-specific data is private to the owner; only aggregate medians are public. |
| EPA FLIGHT (GHGRP) | Facility-level GHG emissions for US facilities above the reporting threshold, generally 25,000+ tCO2e/yr. Broken out by source category and gas. | Threshold-gated: small and mid-size facilities are not in the dataset. Covers direct emissions only; no Scope 2 or Scope 3. |
| EIA Form 861 | Utility-level retail sales, customer counts, and revenue by state and customer class. Annual. | Not building-level. Useful for sector-level analysis and for estimating utility-specific emission factors. |
| EPA eGRID | Power plant emission rates, aggregated by eGRID subregion, NERC region, and state. The standard reference for grid emission factors used in location-based Scope 2 calculations. | Updated on a 12 to 18 month lag. Recent grid changes (renewables additions, plant retirements) are not reflected immediately. |
| NYSERDA program data | Participation in commercial efficiency programs, Beneficial Electrification incentives, and retrofit financing. Aggregate and project-level reports. | Lags actual project completion by 6 to 12 months. |
| NYS DEC Statewide GHG Emissions Report | Statewide GHG inventory by sector and source. Used as the baseline for the Climate Leadership and Community Protection Act targets. | State-level, not entity-level. |
| CDP Climate Change Responses | Voluntary corporate climate disclosures. Scope 1, 2, 3, governance, targets, and risk assessments. Searchable by company. | Self-reported and unaudited. Coverage is best for large public companies; small private companies are largely absent. |
| SBTi Target Database | Companies with science-based emissions targets, validated by the Science Based Targets initiative. Includes target year and scope. | Target setting, not progress. Many companies on the list are off-track. |
| DSNY Operational Reports | Tonnage of refuse, recycling, and organics collected citywide and by district. Annual. | Residential focus. Commercial waste data is more limited and largely lives with the carters. |
| DEP Water Quality and Use Reports | Citywide water consumption, sewer load, and quality monitoring. | Building-level water use data is in LL84 but is incomplete; not all buildings are required to report water. |
Pulling these datasets is the easy part. The hard part is normalization: aligning vintages, harmonizing units, mapping building IDs to entities, and reconciling emission factors across sources. A clean cross-source dataset is itself an analytical asset.
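Two of those normalization steps, unit harmonization and grouping by building ID, can be sketched as below. The record layout (keys `bbl`, `year`, `source`, `energy`, `energy_unit`, `ghg`, `ghg_unit`) is hypothetical, and the sketch assumes every record's `year` has already been aligned to a calendar year. Mapping BBLs to owning entities is a separate join not shown here.

```python
KBTU_PER_KWH = 3.412141633         # 1 kWh = 3,412.14 Btu
METRIC_T_PER_SHORT_T = 0.90718474  # exact by definition

_TO_KWH = {"kwh": 1.0, "mwh": 1_000.0, "kbtu": 1 / KBTU_PER_KWH}
_TO_METRIC_T = {"t": 1.0, "metric_ton": 1.0, "kg": 0.001, "short_ton": METRIC_T_PER_SHORT_T}


def to_kwh(value: float, unit: str) -> float:
    return value * _TO_KWH[unit.lower()]


def to_metric_tons(value: float, unit: str) -> float:
    return value * _TO_METRIC_T[unit.lower()]


def harmonize(records):
    """Group source records by (building ID, calendar year) in common units.
    Each record is a dict with hypothetical keys: bbl, year, source,
    energy, energy_unit, ghg, ghg_unit."""
    out: dict[tuple[str, int], list[dict]] = {}
    for r in records:
        key = (str(r["bbl"]).strip(), int(r["year"]))
        out.setdefault(key, []).append({
            "source": r["source"],
            "energy_kwh": to_kwh(r["energy"], r["energy_unit"]),
            "ghg_t": to_metric_tons(r["ghg"], r["ghg_unit"]),
        })
    return out
```

An unknown unit raises a `KeyError` instead of passing through unconverted. That is deliberate: silently mixing kBtu and kWh is the kind of error this step exists to catch.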
What separates a useful sustainability audit from a checkbox audit is not the dataset, the framework, or the software. It is the disposition the auditor brings to the work.
Treat disclosed numbers as hypotheses, not facts. Every metric in a sustainability report represents a chain of decisions about boundaries, vintages, and methodology. The audit's job is to identify which decisions were made and whether they hold up under stress. Numbers that appear precise to four significant figures should be tested at the second: check whether the underlying data supports even that much precision.
Model alternative scenarios as a matter of course. A single point estimate of forward emissions or penalty exposure is rarely defensible. The discipline is to model a base case, an upside, and a downside, with the assumptions that drive the spread documented explicitly. Where data is modeled rather than measured, this is the only intellectually honest output.
Be specific about uncertainty. Scope 3 emissions estimates often carry uncertainty of plus or minus 50 percent. Reporting them as point estimates implies a precision that does not exist. Reporting them as ranges, with the methodological basis named, allows downstream readers to weight the data appropriately.
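Scenario modeling and range reporting fit together naturally, as the sketch below shows. It assumes a fixed annual cap in tCO2e per year, a constant yearly rate of change in emissions per scenario, and a flat dollar price per excess ton, and it leaves the result undiscounted. Each scenario carries its driving assumption as a string so the spread stays documented. Every name and number here is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    name: str
    annual_change: float  # fractional yearly change in emissions, e.g. -0.03
    assumption: str       # documented driver of this case


def cumulative_excess_t(start_t: float, caps_t: dict[int, float], s: Scenario) -> float:
    """Total tCO2e above the cap across the capped years (undiscounted)."""
    first = min(caps_t)
    total = 0.0
    for year in sorted(caps_t):
        emissions = start_t * (1 + s.annual_change) ** (year - first)
        total += max(emissions - caps_t[year], 0.0)
    return total


def exposure_range(start_t, caps_t, price_per_t, scenarios):
    """Dollar exposure per scenario, plus the (low, high) range across them."""
    by_case = {s.name: cumulative_excess_t(start_t, caps_t, s) * price_per_t
               for s in scenarios}
    return by_case, (min(by_case.values()), max(by_case.values()))


if __name__ == "__main__":
    cases = [
        Scenario("upside", -0.10, "hypothetical: retrofit completes in year one"),
        Scenario("base", 0.0, "hypothetical: no change to operations"),
        Scenario("downside", 0.10, "hypothetical: occupancy recovery raises load"),
    ]
    caps = {2025: 1_000.0, 2026: 1_000.0}  # illustrative caps, tCO2e
    by_case, (low, high) = exposure_range(1_100.0, caps, 268.0, cases)
    for s in cases:
        print(f"{s.name:>8}: ${by_case[s.name]:,.0f}  ({s.assumption})")
    print(f"range: ${low:,.0f} to ${high:,.0f}")
```

The headline output is the range ($26,800 to $83,080 with these inputs) with the named assumptions beside it, not the base case alone. The same min/max reporting works for any modeled figure, including a Scope 3 estimate built under alternative emission factors.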
Cross-check disclosed against modeled. The most diagnostic single test in a sustainability audit is the gap between an entity's disclosed emissions and an independent model. Large gaps in either direction warrant investigation. A company that reports much lower emissions than the model predicts may have genuinely better operations, or may be using boundary choices to exclude inconvenient sources. A company that reports much higher emissions may be conservative, or may be including activities that competitors exclude.
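A minimal sketch of that cross-check, assuming the independent model is simply revenue multiplied by a peer-average carbon intensity. The 25 percent review threshold is a hypothetical choice for illustration, not an established standard. The flag deliberately poses a question in each direction rather than a verdict.

```python
def modeled_emissions(revenue_musd: float, peer_intensity: float) -> float:
    """Independent estimate: revenue times a peer-average intensity (tCO2e per $M)."""
    return revenue_musd * peer_intensity


def disclosed_vs_modeled_gap(disclosed_t: float, modeled_t: float) -> float:
    """Percent by which the disclosure deviates from the independent model."""
    return 100 * (disclosed_t - modeled_t) / modeled_t


def review_note(gap_pct: float, threshold_pct: float = 25.0) -> str:
    # Both directions are flagged: a large gap is a question, not a verdict.
    if gap_pct <= -threshold_pct:
        return "below model: better operations, or boundary exclusions?"
    if gap_pct >= threshold_pct:
        return "above model: conservative method, or wider boundary than peers?"
    return "within tolerance"
```

For example, a company with $200M of revenue and a hypothetical peer intensity of 20 tCO2e per $M models at 4,000 t. A disclosure of 2,800 t is 30 percent below that, which lands in the first branch.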
Document the gap between the spirit and the letter. An entity can be technically compliant with every disclosure framework and still be evading the substance of what those frameworks are trying to capture. The audit serves the underlying question, not the framework. When the framework and the question diverge, the auditor's job is to flag the divergence, not paper over it.
Sustainability auditing is a practice that has not yet stabilized in the way that financial auditing has. The standards are still evolving, the datasets are still incomplete, and the methodology is still being argued out in the literature and in practice. The auditors who will matter most over the next decade are the ones who can hold rigor and humility at the same time: rigorous in method, humble about the limits of what the data can support.
All datasets, methodologies, and figures referenced are publicly available. Specific values used in the worked example are illustrative.