Auditing the things money doesn't measure.

A methodological framework for examining sustainability risk: the four data domains, the analytical traps, the KPIs that aren't on a P&L, and the public datasets that let you reconstruct an entity's environmental performance from outside.

Financial accounting tells you what was earned. Sustainability accounting tells you what was burned, emitted, used, and obligated.

132 kBtu/sf: Median Site EUI for NYC office buildings under LL84 (Kontokosta / NYC LL84 dataset)
2.8x: Source-to-site energy multiplier for grid electricity, typical (EPA ENERGY STAR Portfolio Manager)
~85%: Share of a typical retailer's emissions that sit upstream in Scope 3 (CDP / WRI GHG Protocol)
$268/ton: Penalty for each metric ton of CO2e emitted above a building's LL97 cap (NYC DOB)
On this page
  1. The discipline: what a sustainability audit actually is
  2. The four data domains
  3. Why non-financial data is harder to read
  4. KPIs beyond the P&L
  5. A worked example: auditing one NYC building
  6. The dataset map
  7. The analytical disposition

The discipline.

A sustainability audit is the disciplined examination of an entity's environmental performance against three things: its regulatory obligations, its stated commitments, and the trajectory implied by its current operations. It is not the same as a materiality assessment, and the two are routinely confused.

A materiality assessment determines which sustainability topics affect an entity's financial prospects. It is forward-looking, framed in terms of risk to enterprise value, and the output is a list of disclosure topics. The dominant frameworks are SASB Standards, IFRS S1 and S2, and the European Sustainability Reporting Standards.

A sustainability audit, by contrast, is operational. It examines what the entity actually did over a reporting period, against what it should have done, and translates the gap into financial terms. The output is a defensible statement about whether the entity met its obligations, what its forward exposure is, and which of its disclosed numbers can be trusted.

The two pair naturally. Materiality tells the auditor where to look. The audit confirms what is actually there. An auditor who runs the second without first running the first will spend disproportionate time on topics that don't move enterprise value. An analyst who relies only on the first, and never tests the underlying data, is building a thesis on disclosed numbers that may be unverifiable.

This page lays out the working discipline: the four data domains an audit examines, the interpretation problems that distinguish non-financial data from financial data, the KPIs that matter beyond what shows up on a P&L, and a worked example that ties everything together using NYC's published datasets.

The four data domains.

Most sustainability disclosures fall into one of four categories. Each has its own units, its own measurement standard, and its own characteristic blind spots.

Domain 01
Energy use
Total energy consumed by an entity's operations, broken out by fuel type. The headline metric is Energy Use Intensity (EUI): total energy divided by gross floor area. Site EUI counts energy delivered to the building. Source EUI accounts for upstream generation and transmission losses, with a typical multiplier of about 2.8 to 3.0 for grid electricity.[1]
Units: kBtu/sf/yr · kWh · therms · MMBtu
Domain 02
GHG emissions
Greenhouse gas emissions broken into three scopes. Scope 1: direct emissions from sources the entity owns or controls. Scope 2: indirect emissions from purchased electricity, heat, or steam. Scope 3: all other indirect emissions across the value chain, including purchased goods and services, business travel, and the use of sold products. Scope 3 typically dominates totals and is the most poorly measured.[2]
Units: metric tons CO2-equivalent (tCO2e)
Domain 03
Water and waste
Water consumption and waste generation, both typically reported as intensity ratios (per square foot, per employee, per dollar of revenue). Diversion rates measure the share of waste sent to recycling, composting, or reuse rather than landfill. Composition matters: a 17 percent organics share in the waste stream behaves differently economically than a 17 percent paper share.[3]
Units: gal/sf · tons/yr · diversion rate (%)
Domain 04
Building performance
Composite metrics that combine energy data with normalization for building characteristics. The ENERGY STAR score rates a building from 1 to 100 against peers of similar use type, climate zone, and operational profile. A score of 50 is the median; a score of 75 or above qualifies for ENERGY STAR certification. NYC's letter-grade label (A through D, plus N) is derived from this score.[4]
Units: 1 to 100 score · A/B/C/D letter grade
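
The Site and Source EUI definitions under Domain 01 can be sketched in a few lines. This is a minimal illustration, not Portfolio Manager's actual calculation: the unit conversions are standard, the 2.8 electricity ratio is the one cited above, the 1.05 natural gas ratio is an assumption that should be checked against the current EPA technical reference, and the building quantities are hypothetical.

```python
KBTU_PER_KWH = 3.412      # standard unit conversion
KBTU_PER_THERM = 100.0    # standard unit conversion

# Source-to-site ratios. Electricity is the figure cited in the text;
# the natural gas value is an assumption to verify against EPA's reference.
SOURCE_RATIO = {"electricity": 2.8, "natural_gas": 1.05}


def site_kbtu(kwh: float, therms: float) -> dict:
    """Convert annual fuel totals to site energy in kBtu, by fuel."""
    return {
        "electricity": kwh * KBTU_PER_KWH,
        "natural_gas": therms * KBTU_PER_THERM,
    }


def eui(kwh: float, therms: float, gross_sf: float) -> tuple[float, float]:
    """Return (site EUI, source EUI) in kBtu/sf/yr."""
    site = site_kbtu(kwh, therms)
    site_total = sum(site.values())
    source_total = sum(site[fuel] * SOURCE_RATIO[fuel] for fuel in site)
    return site_total / gross_sf, source_total / gross_sf


# Hypothetical building: 2,000 MWh of electricity, 50,000 therms, 200,000 sf
site_eui, source_eui = eui(kwh=2_000_000, therms=50_000, gross_sf=200_000)
print(f"Site EUI:   {site_eui:.1f} kBtu/sf/yr")
print(f"Source EUI: {source_eui:.1f} kBtu/sf/yr")
```

The gap between the two outputs is driven almost entirely by the electricity share, which is why an electricity-heavy building can look efficient on Site EUI and mediocre on Source EUI.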

Each domain produces data of different quality. Energy use is the most reliable, because utility meters produce continuous, audited records. Scope 1 GHG is mostly reliable, since it is calculated from energy use using established emission factors. Scope 2 is sensitive to methodology: location-based and market-based reporting can produce numbers that differ by an order of magnitude for the same operations. Scope 3 is largely modeled, often using industry-average emission factors applied to spend data, and should be read with substantial uncertainty bounds.
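
To make the Scope 2 sensitivity concrete, here is a minimal sketch comparing the two reporting methods for the same purchased electricity. The grid factor, the 90 percent contracted share, and the zero-emission contract factor are illustrative assumptions, and the function names are hypothetical. The uncovered portion is priced at the grid average for simplicity; in practice a residual-mix factor would be used where one is published.

```python
GRID_FACTOR = 0.000288  # tCO2e per kWh; illustrative grid-average factor


def scope2_location_based(kwh: float, grid_factor: float = GRID_FACTOR) -> float:
    """All purchased electricity at the grid-average factor."""
    return kwh * grid_factor


def scope2_market_based(
    kwh: float,
    contracted_share: float,
    contract_factor: float = 0.0,
    residual_factor: float = GRID_FACTOR,
) -> float:
    """Contracted electricity at the supplier factor, the rest at a residual factor.

    Simplification: the residual factor defaults to the grid average here.
    """
    if not 0.0 <= contracted_share <= 1.0:
        raise ValueError("contracted_share must be between 0 and 1")
    contracted_kwh = kwh * contracted_share
    return (contracted_kwh * contract_factor
            + (kwh - contracted_kwh) * residual_factor)


kwh = 1_000_000  # hypothetical annual purchase
location = scope2_location_based(kwh)
market = scope2_market_based(kwh, contracted_share=0.9)
print(f"Location-based: {location:.1f} tCO2e")
print(f"Market-based:   {market:.1f} tCO2e")
print(f"Ratio: {location / market:.1f}x")
```

With 90 percent of consumption covered by zero-emission instruments, the same meter readings yield totals roughly ten times apart, so an audit should record which method each disclosed Scope 2 figure uses.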

Why non-financial data is harder to read.

Financial statements have a single boundary, a single vintage, and a single accounting standard. Sustainability data has none of those things, and treating it as if it does is the most common analytical mistake.

Boundaries are negotiable

The GHG Protocol allows entities to report under either an operational control boundary (everything they operate) or an equity share boundary (everything proportional to ownership). The same entity can produce two emissions totals that differ by 30 to 50 percent depending on which boundary it elects. Audits must verify which boundary an entity uses and confirm it is applied consistently across reporting years.
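
A small sketch shows how the boundary election alone moves the total. The facilities, ownership shares, and emissions figures are hypothetical, and the two functions are a simplified reading of the operational control and equity share approaches rather than a full implementation of the GHG Protocol's consolidation rules.

```python
from dataclasses import dataclass


@dataclass
class Facility:
    name: str
    tco2e: float          # total facility emissions
    operated: bool        # does the reporting entity have operational control?
    equity_share: float   # reporting entity's ownership fraction, 0 to 1


def operational_control_total(facilities):
    """100 percent of emissions from facilities the entity operates."""
    return sum(f.tco2e for f in facilities if f.operated)


def equity_share_total(facilities):
    """Emissions from every facility, weighted by ownership share."""
    return sum(f.tco2e * f.equity_share for f in facilities)


# Hypothetical portfolio
portfolio = [
    Facility("Plant A", 10_000, operated=True, equity_share=0.30),
    Facility("Plant B", 6_000, operated=True, equity_share=1.00),
    Facility("Joint venture C", 8_000, operated=False, equity_share=0.25),
]

oc = operational_control_total(portfolio)
es = equity_share_total(portfolio)
print(f"Operational control: {oc:,.0f} tCO2e")
print(f"Equity share:        {es:,.0f} tCO2e")
print(f"Difference: {abs(oc - es) / oc:.0%}")
```

Nothing about the underlying operations differs between the two totals, which is why the boundary election has to be confirmed before any year-over-year or peer comparison.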

Vintages don't line up

An entity may report Scope 1 and 2 for fiscal year 2025 alongside Scope 3 for fiscal year 2023, because Scope 3 typically requires twelve to eighteen months for value-chain data to settle. This is not concealment, but it does mean a single disclosure is rarely a single snapshot. Auditors should pull the as-of date for each metric separately.

Proxy data masks performance

Where measured data is unavailable, entities use proxies: industry-average emission factors applied to spend, or modeled emissions per dollar of revenue. These are defensible but they obscure operational performance. A company that reduces actual emissions by 20 percent while revenue holds steady will show flat reported emissions if it is using a spend-based proxy. The audit signal is the gap between modeled and measured numbers, where both are available.[5]
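
The masking effect is easy to reproduce. In this sketch the spend figure, the emission factor, and the measured values are all hypothetical; the point is only that a spend-based estimate cannot move when spend does not.

```python
def spend_based_estimate(spend_musd: float, factor_t_per_musd: float) -> float:
    """Modeled emissions: spend times an industry-average factor."""
    return spend_musd * factor_t_per_musd


def modeled_vs_measured_gap(modeled: float, measured: float) -> float:
    """Relative gap; positive means the model overstates measured emissions."""
    return (modeled - measured) / measured


FACTOR = 400.0   # tCO2e per $M of spend; hypothetical industry average
SPEND = 50.0     # $M per year, held flat across both years

# Hypothetical measured emissions: a real 20 percent reduction in year 2
measured = {"year_1": 20_000.0, "year_2": 16_000.0}

for year, actual in measured.items():
    modeled = spend_based_estimate(SPEND, FACTOR)
    gap = modeled_vs_measured_gap(modeled, actual)
    print(f"{year}: modeled {modeled:,.0f}, measured {actual:,.0f}, gap {gap:+.0%}")
```

The modeled figure stays at 20,000 tCO2e in both years while the measured figure falls, so the widening gap, not the reported total, carries the information.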

Seasonality and occupancy distort raw numbers

A building's EUI moves with the weather without any change in performance: a colder-than-normal winter raises heating load, and a hotter-than-normal summer raises cooling load. Weather-normalized EUI, computed against a typical meteorological year, is the comparable metric. Occupancy-adjusted metrics matter for office buildings post-2020: a building running at 60 percent occupancy with 90 percent of its pre-pandemic energy use is performing worse than the raw EUI suggests.
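
The occupancy point reduces to one ratio. This sketch assumes both percentages are measured against the same pre-pandemic baseline and that energy should scale roughly with occupancy, which overstates the case for base loads that run regardless of headcount; the function name is hypothetical.

```python
def occupancy_adjusted_ratio(energy_ratio: float, occupancy_ratio: float) -> float:
    """Energy per unit of occupancy relative to the baseline period.

    energy_ratio: current energy use / baseline energy use
    occupancy_ratio: current occupancy / baseline occupancy
    A result above 1.0 means more energy per occupant than at baseline.
    """
    if occupancy_ratio <= 0:
        raise ValueError("occupancy_ratio must be positive")
    return energy_ratio / occupancy_ratio


# The example from the text: 90 percent of energy at 60 percent occupancy
ratio = occupancy_adjusted_ratio(energy_ratio=0.90, occupancy_ratio=0.60)
print(f"Energy per occupant vs. baseline: {ratio:.2f}x")
```

A raw EUI that fell 10 percent looks like progress; on an occupancy-adjusted basis the same building is using about half again as much energy per occupant.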

Disclosed numbers are not the same as reported numbers

A voluntary CDP response, a CSRD filing, an SEC climate disclosure, and an LL84 benchmarking submission can each contain different values for the same underlying operations. The differences come from boundary choices, methodology elections, and the definitional scope of each framework. The first job of the audit is to determine which number is the operational truth and which numbers are framework artifacts.

KPIs beyond the P&L.

Revenue, EBITDA, and operating margin tell you nothing about transition risk. The KPIs that do are the ones that translate environmental exposure into financial language without losing the underlying physics.

| KPI | Unit | What it measures | What it reveals |
|---|---|---|---|
| Carbon intensity per revenue | tCO2e / $M revenue | Total emissions divided by revenue. | Whether emissions reductions are real or just a function of business shrinkage. Decoupling from revenue growth is the signal. |
| Climate capex share | % of total capex | Capital spending on retrofits, electrification, and physical adaptation as a share of total capex. | Whether stated transition commitments are being funded. A company with net-zero targets and 2 percent climate capex is unserious. |
| Penalty exposure | % of operating margin | Annualized regulatory penalty risk under existing laws (LL97, EPR, carbon pricing). | Whether sustainability is a margin problem or a tail risk. Above 5 percent is material. |
| Disclosed-vs-modeled gap | % deviation | Difference between an entity's disclosed emissions and an independent model based on industry averages. | Quality of the reporting infrastructure. Large positive or negative gaps both warrant scrutiny. |
| EUI trajectory | % YoY change, weather-adjusted | Year-over-year change in weather-normalized energy use intensity. | Whether building performance is actually improving or whether reported gains are weather-driven. |
| Stranding distance | years to cap breach | Years of operation at current carbon intensity before the building or asset exceeds its applicable regulatory cap. | Forward-looking exposure under LL97 or analogous regimes. Useful for valuation discounts on commercial real estate. |

None of these are exotic. Each can be calculated from publicly available data plus the entity's own financial filings. The discipline is in computing them consistently and using them as a cross-check against narrative disclosures, not as a substitute.
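
As a rough sketch of how several of these KPIs might be computed, the functions below take figures an analyst would pull from filings and public datasets. The function names and example inputs are hypothetical, penalty exposure is interpreted here as annual penalty over operating income, and the cap schedule reuses the approximate LL97 office limits quoted in the worked example that follows.

```python
from typing import Optional

# Approximate LL97 office limits (tCO2e per sf per year), keyed by the
# first year of each compliance period. Illustrative, not building-specific.
CAP_SCHEDULE = [(2024, 0.00846), (2030, 0.00453)]


def carbon_intensity(tco2e: float, revenue_musd: float) -> float:
    """tCO2e per $M of revenue."""
    return tco2e / revenue_musd


def climate_capex_share(climate_capex: float, total_capex: float) -> float:
    """Climate-related capex as a fraction of total capex."""
    return climate_capex / total_capex


def penalty_exposure(annual_penalty: float, operating_income: float) -> float:
    """Annualized penalty as a fraction of operating income (an interpretation)."""
    return annual_penalty / operating_income


def stranding_distance(intensity_t_per_sf: float, current_year: int,
                       schedule=CAP_SCHEDULE) -> Optional[int]:
    """Years until the first compliance period whose cap the asset exceeds.

    Returns 0 if a cap already in force is exceeded, and None if the asset
    stays under every cap in the schedule at its current intensity.
    """
    for start_year, limit in sorted(schedule):
        if intensity_t_per_sf > limit:
            return max(0, start_year - current_year)
    return None


# Hypothetical inputs
print(f"Carbon intensity: {carbon_intensity(12_000, 400):.1f} tCO2e/$M")
print(f"Climate capex share: {climate_capex_share(2.0, 100.0):.0%}")
print(f"Penalty exposure: {penalty_exposure(26_800, 400_000):.1%}")
print(f"Stranding distance: {stranding_distance(0.006, 2025)} years")
```

Stranding distance as written holds intensity constant; a fuller version would project intensity forward under each scenario before comparing it to the schedule.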

A worked example.

A 250,000 square foot pre-war office building in Midtown Manhattan. The owner has not retrofitted recently. The audit question: what is this building's forward LL97 exposure, and what does the gap look like in dollar terms?

Audit walkthrough
Modeling LL97 exposure for one Midtown office building
Step 1: Pull the LL84 record. Look up the building on NYC Open Data's LL84 benchmarking dataset. Pull Site EUI, Source EUI, and ENERGY STAR score for the most recent reporting year. For this example, assume the building reports a Site EUI of 78 kBtu/sf/yr and an ENERGY STAR score of 38.[6]
Step 2: Compare to peer median. The NYC LL84 office median Site EUI is approximately 132 kBtu/sf based on the Kontokosta study of the cleaned LL84 dataset. The example building at 78 kBtu/sf is well below the median, but the ENERGY STAR score of 38 (below the 50th percentile) suggests its peer-normalized performance is worse than the raw EUI implies. Translation: the building's energy use per square foot is modest, most likely because occupancy or operating hours are low, but its energy use per unit of activity is higher than peers.[7]
Step 3: Pull the LL97 cap. The building's LL97 emissions cap depends on its occupancy class. For a Class B office (ENERGY STAR Portfolio Manager type), the 2024 to 2029 emissions limit is roughly 0.00846 metric tons CO2e per square foot per year, falling to 0.00453 for 2030 to 2034. For a 250,000 sf building, that translates to caps of about 2,115 tCO2e/yr (current) and 1,133 tCO2e/yr (2030).[8]
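Step 4: Estimate current emissions. Convert the building's energy use to emissions using NYC-specific emission factors. Approximate factors: 0.000288 tCO2e per kWh of electricity (NYISO-zoned grid) and 0.00531 tCO2e per therm of natural gas. Assume, for illustration, annual consumption of roughly 1,950 MWh of electricity and 28,000 therms of gas. That gives about 562 tCO2e from electricity (1,950,000 kWh x 0.000288) and about 149 tCO2e from gas (28,000 therms x 0.00531), or approximately 710 tCO2e/yr Scope 1+2. Cross-check fuel quantities against the reported Site EUI before relying on them: these illustrative quantities sum to only about 38 kBtu/sf of site energy, short of the 78 kBtu/sf assumed in Step 1, and in a real audit a mismatch of that size would need to be explained before the emissions estimate is used.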
Step 5: Identify the gap. At 710 tCO2e against a 2024 to 2029 cap of 2,115 tCO2e, the building is comfortably under its current cap. Against a 2030 to 2034 cap of 1,133 tCO2e, it is also under, but with much less headroom: about 423 tCO2e. The audit signal is not the absolute level but the trajectory: at current operations, the building is fine through 2029. If total emissions rise by roughly 60 percent (for example, through expanding occupancy or extending hours), the building breaches the 2030 cap.
Step 6: Translate to dollars. Each metric ton of overage costs $268. A 100-ton breach in 2030 is a $26,800 annual penalty, recurring every year operations remain over the cap. Treated as a perpetuity and capitalized at a 10 percent discount rate ($26,800 / 0.10), the present value of an unaddressed 100-ton overage is roughly $268,000. That number, not the abstract emissions delta, is what shows up in a credit memo or a lease negotiation.
Audit output: This building has no current LL97 exposure but limited 2030 headroom. A retrofit recommendation should be modeled against capex, payback under the current cap, and expected operating-pattern changes (occupancy growth, hours extension). The audit conclusion is forward-looking, financially translated, and falsifiable: a re-audit in 2027 either confirms the trajectory or revises it.
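
The arithmetic in Steps 3 through 6 can be reproduced with a short script. This is a sketch using only the approximate limits, emission factors, and illustrative consumption figures stated in the walkthrough; the function and variable names are hypothetical, and the perpetuity formula is one simple way to capitalize a recurring penalty.

```python
GROSS_SF = 250_000
LIMIT_2024_2029 = 0.00846    # tCO2e/sf/yr, approximate (Step 3)
LIMIT_2030_2034 = 0.00453    # tCO2e/sf/yr, approximate (Step 3)
EF_ELECTRICITY = 0.000288    # tCO2e per kWh (Step 4)
EF_GAS = 0.00531             # tCO2e per therm (Step 4)
PENALTY_PER_TON = 268.0      # dollars per tCO2e over the cap (Step 6)

ELECTRICITY_KWH = 1_950_000  # illustrative consumption (Step 4)
GAS_THERMS = 28_000          # illustrative consumption (Step 4)


def annual_penalty(emissions: float, cap: float) -> float:
    """Penalty applies only to the tons above the cap."""
    return max(0.0, emissions - cap) * PENALTY_PER_TON


def perpetuity_value(annual_amount: float, discount_rate: float) -> float:
    """Present value of a level annual amount that recurs indefinitely."""
    return annual_amount / discount_rate


cap_now = GROSS_SF * LIMIT_2024_2029
cap_2030 = GROSS_SF * LIMIT_2030_2034
emissions = ELECTRICITY_KWH * EF_ELECTRICITY + GAS_THERMS * EF_GAS
headroom_2030 = cap_2030 - emissions
growth_to_breach = cap_2030 / emissions - 1

print(f"Cap 2024-2029: {cap_now:,.0f} tCO2e/yr")
print(f"Cap 2030-2034: {cap_2030:,.0f} tCO2e/yr")
print(f"Current emissions: {emissions:,.0f} tCO2e/yr")
print(f"2030 headroom: {headroom_2030:,.0f} tCO2e")
print(f"Emissions growth needed to breach 2030 cap: {growth_to_breach:.0%}")

# A 100-ton overage, as in Step 6
penalty_100t = annual_penalty(cap_2030 + 100, cap_2030)
print(f"Annual penalty on 100 t overage: ${penalty_100t:,.0f}")
print(f"Capitalized at 10%: ${perpetuity_value(penalty_100t, 0.10):,.0f}")
```

Keeping the inputs as named constants makes the re-audit cheap: swap in the building's actual LL84 consumption and assigned limits, and the same script restates the exposure.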

This example uses one building. The same methodology applies to a portfolio, an industry, or a counterparty. The data sources change, the unit economics change, but the discipline of pulling primary records, comparing to peer norms, computing a gap, and translating to dollars holds across every sustainability audit.

The dataset map.

An audit is only as good as the data behind it. These are the public datasets that allow an outside party to reconstruct most of an NYC entity's environmental performance, and the limitations that go with each.

Building energy and emissions

| Dataset | What it covers | What it can't tell you |
|---|---|---|
| NYC Open Data: LL84 Benchmarking (2023 to Present) | Site EUI, Source EUI, ENERGY STAR score, GHG intensity for all NYC buildings 25,000 sf and larger. One row per BIN (Building Identification Number) per reporting year. | Self-reported through Portfolio Manager. Roughly 10 to 15 percent of submissions contain entry errors that DOB later flags. Pre-cleaning required before analytic use. |
| DOB Sustainability Laws Covered Buildings List | The official list of buildings subject to LL97, with assigned occupancy class and emissions limits. Updated each March. | Does not include actual emissions; that data lives in the BEAM portal and is not fully public at the building level. |
| EPA ENERGY STAR Portfolio Manager | National benchmarking infrastructure. Reference medians for over 80 building types, climate-normalized. | Building-specific data is private to the owner; only aggregate medians are public. |

Federal emissions and energy

| Dataset | What it covers | What it can't tell you |
|---|---|---|
| EPA FLIGHT (GHGRP) | Facility-level GHG emissions for all US facilities emitting 25,000+ tCO2e/yr. Broken out by source category and gas. | Threshold-gated: small and mid-size facilities are not in the dataset. Covers direct facility emissions only; Scope 2 and Scope 3 are not reported. |
| EIA Form 861 | Utility-level retail sales, customer counts, and revenue by state and customer class. Annual. | Not building-level. Useful for sector-level analysis and for estimating utility-specific emission factors. |
| EPA eGRID | Power plant emission rates by NERC region and state. The standard reference for grid emission factors used in Scope 2 calculations. | Updated on a 12 to 18 month lag. Recent grid changes (renewables additions, plant retirements) are not reflected immediately. |

State and program-level

| Dataset | What it covers | What it can't tell you |
|---|---|---|
| NYSERDA program data | Participation in commercial efficiency programs, Beneficial Electrification incentives, and retrofit financing. Aggregate and project-level reports. | Lags actual project completion by 6 to 12 months. |
| NYS DEC Statewide GHG Emissions Report | Statewide GHG inventory by sector and source. Used as the baseline for the Climate Leadership and Community Protection Act targets. | State-level, not entity-level. |

Voluntary corporate disclosure

| Dataset | What it covers | What it can't tell you |
|---|---|---|
| CDP Climate Change Responses | Voluntary corporate climate disclosures. Scope 1, 2, 3, governance, targets, and risk assessments. Searchable by company. | Self-reported and unaudited. Coverage is best for large public companies; small private companies are largely absent. |
| SBTi Target Database | Companies with science-based emissions targets, validated by the Science Based Targets initiative. Includes target year and scope. | Target setting, not progress. Many companies on the list are off-track. |

Waste and water

| Dataset | What it covers | What it can't tell you |
|---|---|---|
| DSNY Operational Reports | Tonnage of refuse, recycling, and organics collected citywide and by district. Annual. | Residential focus. Commercial waste data is more limited and largely lives with the carters. |
| DEP Water Quality and Use Reports | Citywide water consumption, sewer load, and quality monitoring. | Building-level water use data is in LL84 but is incomplete; not all buildings are required to report water. |

Pulling these datasets is the easy part. The hard part is normalization: aligning vintages, harmonizing units, mapping building IDs to entities, and reconciling emission factors across sources. A clean cross-source dataset is itself an analytical asset.
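
A minimal sketch of that normalization step, using plain Python structures: harmonize every energy quantity to kBtu, keep only one reporting year, and roll buildings up to entities through a BIN lookup. The record fields, BINs, and owner mapping are hypothetical and do not reflect the actual LL84 schema; building the BIN-to-entity map is usually the laborious part and is simply assumed here.

```python
from collections import defaultdict

KBTU_PER_UNIT = {"kBtu": 1.0, "MMBtu": 1_000.0, "kWh": 3.412, "therms": 100.0}


def to_kbtu(value: float, unit: str) -> float:
    """Harmonize an energy quantity to kBtu; fail loudly on unknown units."""
    if unit not in KBTU_PER_UNIT:
        raise ValueError(f"Unknown unit: {unit}")
    return value * KBTU_PER_UNIT[unit]


def entity_energy_kbtu(records, bin_to_entity, year):
    """Sum energy by entity for one vintage; return totals and excluded rows."""
    totals = defaultdict(float)
    excluded = []
    for rec in records:
        entity = bin_to_entity.get(rec["bin"])
        # Exclude rows from other vintages or with no known owner
        if rec["year"] != year or entity is None:
            excluded.append(rec)
            continue
        totals[entity] += to_kbtu(rec["value"], rec["unit"])
    return dict(totals), excluded


# Hypothetical records and mapping
records = [
    {"bin": "1000001", "year": 2023, "value": 1_200_000, "unit": "kWh"},
    {"bin": "1000001", "year": 2023, "value": 30_000, "unit": "therms"},
    {"bin": "1000002", "year": 2022, "value": 5_000, "unit": "MMBtu"},
    {"bin": "1000003", "year": 2023, "value": 900_000, "unit": "kWh"},
]
bin_to_entity = {"1000001": "Owner A", "1000002": "Owner A"}

totals, excluded = entity_energy_kbtu(records, bin_to_entity, year=2023)
print(totals)
print(f"{len(excluded)} rows excluded (wrong vintage or unmapped BIN)")
```

Returning the excluded rows rather than silently dropping them keeps the coverage gap visible, which matters when the entity total is later compared with a disclosed figure.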

The analytical disposition.

What separates a useful sustainability audit from a checkbox audit is not the dataset, the framework, or the software. It is the disposition the auditor brings to the work.

Treat disclosed numbers as hypotheses, not facts. Every metric in a sustainability report represents a chain of decisions about boundaries, vintages, and methodology. The audit's job is to identify which decisions were made and whether they hold up under stress. Numbers that appear precise to four significant figures should be tested at the second.

Model alternative scenarios as a matter of course. A single point estimate of forward emissions or penalty exposure is rarely defensible. The discipline is to model a base case, an upside, and a downside, with the assumptions that drive the spread documented explicitly. Where data is modeled rather than measured, this is the only intellectually honest output.
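
A three-case penalty model can be as simple as the sketch below. It reuses the approximate 2030 cap and current emissions from the worked example; the scenario growth rates are hypothetical placeholders that an auditor would replace with documented assumptions.

```python
CAP_2030 = 1_133.0         # tCO2e/yr, from the worked example
CURRENT_EMISSIONS = 710.0  # tCO2e/yr, from the worked example
PENALTY_PER_TON = 268.0    # dollars per tCO2e over the cap

# Assumed change in emissions by 2030 under each case (hypothetical)
SCENARIOS = {"upside": -0.10, "base": 0.20, "downside": 0.80}


def scenario_penalty(current: float, change: float, cap: float,
                     rate: float = PENALTY_PER_TON) -> tuple[float, float]:
    """Return (projected emissions, annual penalty) for one scenario."""
    projected = current * (1 + change)
    overage = max(0.0, projected - cap)
    return projected, overage * rate


results = {
    name: scenario_penalty(CURRENT_EMISSIONS, change, CAP_2030)
    for name, change in SCENARIOS.items()
}

for name, (projected, penalty) in results.items():
    print(f"{name:>8}: {projected:,.0f} tCO2e/yr, penalty ${penalty:,.0f}/yr")
```

Keeping the assumptions in one dictionary next to the output makes the spread auditable: anyone reviewing the result can see exactly which inputs produce the downside.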

Be specific about uncertainty. Scope 3 emissions estimates often carry uncertainty on the order of plus or minus 50 percent. Reporting them as point estimates implies a precision that does not exist. Reporting them as ranges, with the methodological basis named, allows downstream readers to weight the data appropriately.
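
One way to put this into practice is to never emit a modeled figure without its range and basis. The sketch below assumes a symmetric relative uncertainty, which is a simplification, and reads the 50 percent figure above as plus or minus 50 percent of the central estimate; the helper names and the example values are hypothetical.

```python
def as_range(point_estimate: float, relative_uncertainty: float) -> tuple[float, float]:
    """Symmetric range around a point estimate (a simplifying assumption)."""
    if relative_uncertainty < 0:
        raise ValueError("relative_uncertainty must be non-negative")
    return (point_estimate * (1 - relative_uncertainty),
            point_estimate * (1 + relative_uncertainty))


def describe(label: str, point_estimate: float,
             relative_uncertainty: float, basis: str) -> str:
    """Format a metric as a range with its methodological basis named."""
    low, high = as_range(point_estimate, relative_uncertainty)
    return (f"{label}: {low:,.0f} to {high:,.0f} tCO2e "
            f"(central {point_estimate:,.0f}; basis: {basis})")


# Hypothetical Scope 3 estimate
print(describe("Scope 3", 10_000, 0.5, "spend-based, industry-average factors"))
# prints "Scope 3: 5,000 to 15,000 tCO2e (central 10,000; basis: spend-based, industry-average factors)"
```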

Cross-check disclosed against modeled. The most diagnostic single test in a sustainability audit is the gap between an entity's disclosed emissions and an independent model. Large gaps in either direction warrant investigation. A company that reports much lower emissions than the model predicts may have genuinely better operations, or may be using boundary choices to exclude inconvenient sources. A company that reports much higher emissions may be conservative, or may be including activities that competitors exclude.
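
The cross-check itself is a one-line calculation plus a screening rule. In this sketch the company figures are hypothetical and the 25 percent threshold is an assumed screening level, not drawn from any standard; a flag means "investigate", not "misstated".

```python
THRESHOLD = 0.25  # assumed screening threshold; not from any standard


def disclosure_gap(disclosed: float, modeled: float) -> float:
    """Relative gap; negative means disclosed emissions are below the model."""
    return (disclosed - modeled) / modeled


def screen(entities, threshold: float = THRESHOLD):
    """Return entities whose gap exceeds the threshold in either direction."""
    flagged = []
    for name, disclosed, modeled in entities:
        gap = disclosure_gap(disclosed, modeled)
        if abs(gap) > threshold:
            direction = "below model" if gap < 0 else "above model"
            flagged.append((name, round(gap, 3), direction))
    return flagged


# Hypothetical peers: (name, disclosed tCO2e, independently modeled tCO2e)
peers = [
    ("Company A", 52_000, 50_000),
    ("Company B", 30_000, 50_000),
    ("Company C", 70_000, 50_000),
]

flagged = screen(peers)
for name, gap, direction in flagged:
    print(f"{name}: {gap:+.0%} ({direction})")
```

Both directions are flagged deliberately, since the paragraph above gives a benign and a problematic explanation for each.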

Document the gap between the spirit and the letter. An entity can be technically compliant with every disclosure framework and still be evading the substance of what those frameworks are trying to capture. The audit serves the underlying question, not the framework. When the framework and the question diverge, the auditor's job is to flag the divergence, not paper over it.

Sustainability auditing is a practice that has not yet stabilized in the way that financial auditing has. The standards are still evolving, the datasets are still incomplete, and the methodology is still being argued out in literature and in practice. The auditors who will matter most over the next decade are the ones who can hold rigor and humility at the same time: rigorous in method, humble about the limits of what the data can support.

Notes
  1. EPA ENERGY STAR Portfolio Manager Technical Reference, "Source Energy" methodology; Sustainability Atlas, "Building Energy Performance Benchmarks," February 2026. Source-to-site multipliers vary by grid mix; 2.8 to 3.0 reflects the range used in EPA scoring as of 2025.
  2. World Resources Institute / WBCSD GHG Protocol, Corporate Standard. Scope definitions and category breakdowns under Scope 3 are detailed in the GHG Protocol Scope 3 Standard.
  3. GrowNYC waste composition analysis; DSNY operational reporting on the 14M-ton citywide annual waste stream.
  4. EPA ENERGY STAR scoring methodology. NYC's Local Law 33 letter-grade label uses ENERGY STAR scores as the basis: A is 85+, B is 70-84, C is 55-69, D is below 55, and N applies where a score cannot be calculated.
  5. CDP, "Closing the Gap" series on disclosed-vs-actual emissions; academic literature on Scope 3 estimation methods (Stadler et al., 2018; Klaaßen and Stoll, 2021).
  6. NYC Open Data, "Energy and Water Data Disclosure for Local Law 84." The example values are illustrative; pull actual values for any specific building before drawing conclusions.
  7. Kontokosta, C., "LL84 Energy Benchmarking" technical report. The cleaned NYC LL84 dataset of office buildings (n = 948) showed median Site EUI of approximately 132 kBtu/sf and median Source EUI of approximately 213 kBtu/sf.
  8. NYC Department of Buildings, LL97 Article 320 emissions limits by occupancy classification. The figures used in this example are approximations for illustration; actual caps are assigned per building based on its specific Energy Star Portfolio Manager use type.

All datasets, methodologies, and figures referenced are publicly available. Specific values used in the worked example are illustrative.