
Higg FEM Outlier Detection

Methodology

Last update: February 2025

Summary

This document outlines the methodology for detecting unusual or potentially incorrect data points, also referred to as outliers or anomalies, in Higg FEM assessments. Worldly's "Data Check" flags outliers once a facility has completed 98% of their Higg FEM and asks the facility to double-check the flagged answers. The facility can either correct an answer or dismiss the notification and add a note explaining why they believe the reported value is correct.

Detecting outliers first relies on basic sanity checks, such as ensuring energy usage falls within a reasonable range. For example, it would be highly unlikely for a facility to use less energy than an average household, and equally unlikely for it to use more than 10 billion MJ. Outlier detection then uses statistical methods, covered in this document, to identify values that deviate significantly from typical patterns seen across similar facilities.
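As an illustration of such a range check, the short Python sketch below flags annual energy totals that fall outside a plausible window. The household-scale lower bound is an assumed placeholder, and the function and constant names are illustrative rather than Worldly's actual implementation.

```python
# Minimal sketch of a basic sanity (range) check on annual facility energy use.
# Bounds follow the examples in the text: a value below a typical household's annual
# consumption (~30,000 MJ, an assumed order-of-magnitude figure) or above 10 billion MJ
# is flagged for review. Names are illustrative only.

HOUSEHOLD_ANNUAL_MJ = 30_000        # assumed order-of-magnitude lower bound
MAX_PLAUSIBLE_MJ = 10_000_000_000   # upper bound cited in the text (10 billion MJ)

def energy_sanity_check(total_energy_mj: float) -> bool:
    """Return True if the reported annual energy value passes the basic range check."""
    return HOUSEHOLD_ANNUAL_MJ <= total_energy_mj <= MAX_PLAUSIBLE_MJ

print(energy_sanity_check(2_500_000))   # True: plausible facility-scale value
print(energy_sanity_check(12_000))      # False: below household-scale usage
print(energy_sanity_check(4e10))        # False: implausibly large
```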

There are two types of outliers: erroneous ones caused by mistakes like typos or incorrect units, and true outliers that represent rare but legitimate events. The outlier detection method described in this document successfully identified 86% of known bad data examples in testing. The model is intentionally conservative and requires human expertise to determine whether flagged values represent actual errors or unusual but valid data points.

Definition of Data Outliers

We refer to data points with unexpected or extreme values in the context of the entire data set as anomalous data points or outliers. These points often deviate significantly from typical patterns or trends in the data set and can provide important insights or signal potential issues.

Outliers can generally be categorized into two broad types:

  1. Erroneous outliers are data points that arise from errors during data collection, recording, calculation, or entry. They represent inaccuracies and do not reflect true underlying patterns in the data. For example, a typographical error that records a person’s age as 250 years instead of 25 would be an erroneous outlier, as it does not correspond to a realistic value. Similarly, recording an energy value in the wrong units without conversion (e.g., a value in megajoules, MJ, assigned units of kilowatt-hours, kWh) also leads to erroneous outliers. Such “bad data” must be identified and addressed – typically by removal or correction (e.g., via verification or replacement with imputed values) – to maintain the integrity of the analysis.

  2. True outliers are legitimate data points that are statistically rare but not erroneous. They occur naturally in the data and reflect real phenomena or occurrences. True outliers often provide valuable information about unusual but valid behaviors or events. For example, an anomalous increase in facility energy use might correspond to increased cooling implemented during a heat wave. True outliers can reveal critical trends, risks, or opportunities and should not be dismissed without analysis.

Distinguishing between true and erroneous outliers can be challenging. Context and domain expertise are essential, as the same data point could be either valid or erroneous depending on circumstances. For example, an extreme temperature might indicate a genuine heatwave or a faulty sensor. Additionally, in a normal (or near-normal) distribution, about 0.3% of data points – roughly 3 in every 1000 – are statistically expected to have values three or more standard deviations away from the mean. These extreme values, though rare, are entirely consistent with the characteristics of the distribution and are not indicative of errors. Without validation or additional information, there’s a risk of discarding data that could lead to valuable insights or failing to address errors that distort analytical results.
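For readers who want to verify the 0.3% figure, the short snippet below computes the two-sided tail probability beyond three standard deviations for a normal distribution; it is a generic statistical check, not part of the Data Check methodology itself.

```python
# Quick check of the "about 0.3%" figure: for a normal distribution, the probability
# of a value at least three standard deviations from the mean.
from scipy.stats import norm

p_beyond_3_sigma = 2 * norm.sf(3)   # two-sided tail probability beyond ±3 sigma
print(f"{p_beyond_3_sigma:.4%}")    # ~0.27%, i.e. roughly 3 in every 1,000 points
```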

At this time, the outlier detection methodology described herein is focused on conservatively identifying outliers, without distinguishing whether they are true or erroneous. Future enhancements to the methodology will include processes aimed at differentiating between true and erroneous outliers.

Initial Data Set

Cascale and Worldly used the complete set of FEM23 assessments as the initial data set for this analysis. The analysis of this data set helped establish outlier thresholds for use with the FEM24 assessments, both in situ and post hoc. These thresholds will also be used to perform a post-hoc outlier analysis of the complete FEM23 data set to retroactively identify and prioritize candidates for outlier mitigation (e.g., verification, exclusion, replacement, etc.).

Approach to Identifying Outliers

To identify outliers, a standard statistical method known as the Interquartile Range (IQR) is often used. This method compares values to thresholds based on the middle portion of the data. However, IQR gives the best results when a data set follows the familiar "bell curve" distribution. FEM data does not follow the traditional bell curve, since it combines data points clustered in a narrow range of small values with larger values spread over a much broader range. To address this, Worldly and Cascale used a modified IQR method that adjusts how the middle portion is calculated to better account for the shape of the FEM data distribution. It balances the influence of the tightly grouped small values against the more dispersed large values and recalculates the key thresholds for identifying outliers. This approach, designed for non-bell-curve data, significantly reduces the number of points that are incorrectly flagged as outliers ("false positives"), making the results more appropriate for application to the FEM data.
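The document does not specify the exact form of the adjustment, so the sketch below is only one plausible illustration of adapting IQR fences to right-skewed data: it computes the standard upper fence on raw values and, as a skew-aware alternative, a fence computed on log-transformed values and mapped back to the original scale. The sample data, function names, and the log-transform choice are assumptions for illustration, not Worldly's implementation.

```python
# Illustrative comparison of a standard IQR upper fence with a skew-adjusted
# (log-space) fence on synthetic, right-skewed data resembling facility energy use.
import numpy as np

def standard_iqr_upper_fence(values, factor=1.5):
    """Classic Tukey upper fence: Q3 + factor * IQR, computed on the raw values."""
    q1, q3 = np.percentile(values, [25, 75])
    return q3 + factor * (q3 - q1)

def log_iqr_upper_fence(values, factor=1.5):
    """Upper fence computed on log-transformed values, then mapped back."""
    logs = np.log(values)
    q1, q3 = np.percentile(logs, [25, 75])
    return np.exp(q3 + factor * (q3 - q1))

# Toy data: many small values plus a long right tail (lognormal), purely illustrative.
rng = np.random.default_rng(0)
sample = rng.lognormal(mean=13, sigma=1.2, size=2000)

std_fence = standard_iqr_upper_fence(sample)
log_fence = log_iqr_upper_fence(sample)
print(f"standard fence:  {std_fence:,.0f} MJ, flags {int((sample > std_fence).sum())} points")
print(f"log-space fence: {log_fence:,.0f} MJ, flags {int((sample > log_fence).sum())} points")
```

On data like this, the skew-aware fence sits much higher in the long right tail, which is the mechanism behind the reduction in false positives described above.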

Here is what the modified IQR threshold looks like compared to the standard IQR threshold across a sample set of responses from the Higg FEM assessment. 

[Figure: Modified IQR threshold compared to the standard IQR threshold across a sample of Higg FEM assessment responses]

By accounting for a distribution that does not follow a traditional bell curve, the modified IQR will flag fewer false positives for potential outliers. 

Cascale and Worldly used an IQR factor of 1.5 for the energy thresholds and 1.7 for water. Think of the IQR factor as a sensitivity dial for detecting unusual values: it determines how "unusual" a value needs to be before it gets flagged. A higher factor (like 2.0) is less sensitive and only flags the most extreme values, while a lower factor (like 1.0) is more sensitive and flags more values as unusual. The standard factor of 1.5 is a well-established statistical choice that hits a sweet spot: it is sensitive enough to catch most problematic data points without flagging too many normal values, balancing the identification of true outliers against false positives. For normally distributed data, this factor captures about 99.3% of data points, similar to the common 3-sigma rule. As statistician John Tukey famously explained, "Two was too big and one was too small."

While the team considered larger values like 1.75 and 2.0 for the FEM data analysis, they found no compelling reason to deviate from the standard 1.5 factor after testing. This decision aligned with both statistical best practices and the practical needs of the FEM data analysis.
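The illustrative snippet below shows the "sensitivity dial" effect on synthetic, skewed data: as the factor increases from 1.0 toward 2.0, the upper fence rises and fewer values are flagged. The sample data and printed numbers are purely illustrative, not FEM results.

```python
# Effect of the IQR factor on how many values are flagged as potential outliers.
import numpy as np

rng = np.random.default_rng(1)
values = rng.lognormal(mean=13, sigma=1.0, size=1000)  # illustrative skewed sample

q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1
for factor in (1.0, 1.5, 1.7, 2.0):
    upper = q3 + factor * iqr
    n_flagged = int((values > upper).sum())
    print(f"factor {factor}: upper fence {upper:,.0f}, {n_flagged} values flagged")
```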

Single Year-on-Year (YoY) Change

This is a simple calculation of the percentage change between FEM23 and FEM22 for each facility. At the current time, the YoY outlier thresholds are not sensitive to the small, gradual changes expected as part of long-term trends from one year to the next (e.g., a shallow long-term trend in annual energy values).

Thus, the YoY comparison will generally flag only anomalously large changes that are inconsistent with any trend resulting from the normal year-to-year operations of a given facility. Note that for water, FEM22 and FEM21 were used, because substantial changes to the water prompts in FEM23 made a direct comparison of FEM23 and FEM22 infeasible. See the tables below for the percentage-change thresholds used to flag YoY outliers.
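A minimal sketch of this check is shown below, using the energy YoY thresholds from the tables that follow (565% upper, -100% lower). The function names and the handling of a zero prior-year value are illustrative assumptions, not the actual Data Check code.

```python
# Sketch of the single year-on-year (YoY) change check: percentage change between
# two annual totals, compared against upper and lower thresholds.

def yoy_percent_change(current: float, previous: float) -> float:
    """Percentage change from the previous year's total to the current year's."""
    if previous == 0:
        # Assumed convention: any increase from zero counts as an infinite change.
        return float("inf") if current > 0 else 0.0
    return (current - previous) / previous * 100

def flag_yoy_outlier(current: float, previous: float,
                     upper_pct: float = 565, lower_pct: float = -100) -> bool:
    """Flag changes beyond the upper threshold or at/below the lower threshold."""
    change = yoy_percent_change(current, previous)
    return change > upper_pct or change <= lower_pct

print(flag_yoy_outlier(1_200_000, 1_000_000))   # +20% change: not flagged
print(flag_yoy_outlier(8_000_000, 1_000_000))   # +700% change: flagged
```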

Total Energy Outlier Thresholds

Threshold Category | Threshold Group | Threshold Value (MJ; per-unit rows in MJ per production unit; YoY rows in %)
Total Energy (upper thresholds) | finalProductAssembly_total_mj | 218,000,000
Total Energy (upper thresholds) | hardComponentTrimProduction_total_mj | 198,000,000
Total Energy (upper thresholds) | materialProduction_total_mj | 1,059,500,000
Total Energy (upper thresholds) | printingProductDyeingAndLaundering_total_mj | 645,000,000
Total Energy (upper thresholds) | rawMaterialProcessing_Collection | 660,000,000
Total Energy (upper thresholds) | domestic_total_mj | 64,500,000
Total Energy (upper thresholds) | vehicle_total_mj | 56,000,000
Facility summary (upper threshold) | assessmentResponseSum mj | 1,035,500,000
Energy per Production Unit (upper thresholds) | finalProductAssembly (per unit) | 810
Energy per Production Unit (upper thresholds) | hardComponentTrimProduction (per unit) | 870
Energy per Production Unit (upper thresholds) | materialProduction (per unit) | 405
Energy per Production Unit (upper thresholds) | printingProductDyeingAndLaundering (per unit) | 1,755
Energy per Production Unit (upper thresholds) | rawMaterialProcessing_Collection (per unit) | 270
Single Year-on-Year Change (upper and lower thresholds) | Upper (threshold determined from assessment-level sums) | 565%
Single Year-on-Year Change (upper and lower thresholds) | Lower (threshold determined from assessment-level sums) | -100%

 

Total Water Outlier Thresholds

Threshold Category | Threshold Group | Threshold Value (Liters; per-unit rows in liters per production unit; YoY rows in %)
Total Water (upper thresholds) | finalProductAssembly_total_liters | 320,800,000
Total Water (upper thresholds) | hardComponentTrimProduction_total_liters | 220,600,000
Total Water (upper thresholds) | materialProduction_total_liters | 1,907,900,000
Total Water (upper thresholds) | printingProductDyeingAndLaundering_total_liters | 1,379,700,000
Total Water (upper thresholds) | rawMaterialProcessing_Collection | 617,700,000
Total Water (upper thresholds) | domestic_total_liters | 366,700,000
Facility summary (upper threshold) | assessmentResponseSum liters | 1,957,500,000
Water per Production Unit (upper thresholds) | finalProductAssembly (per unit) | 2,288
Water per Production Unit (upper thresholds) | hardComponentTrimProduction (per unit) | 1,471
Water per Production Unit (upper thresholds) | materialProduction (per unit) | 1,059
Water per Production Unit (upper thresholds) | printingProductDyeingAndLaundering (per unit) | 5,642
Water per Production Unit (upper thresholds) | rawMaterialProcessing_Collection (per unit) | 587
Single Year-on-Year Change (upper and lower thresholds) | Upper (threshold determined from assessment-level sums) | 524%
Single Year-on-Year Change (upper and lower thresholds) | Lower (threshold determined from assessment-level sums) | -100%
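As an illustration of how the group-level upper thresholds above could be applied, the sketch below compares a facility's reported totals against the energy thresholds from the first table. The dictionary structure and function are illustrative, not the actual Data Check implementation.

```python
# Applying group-level upper thresholds: each reported total is compared against the
# threshold for its group. Keys reuse the Threshold Group labels from the energy table.

ENERGY_UPPER_THRESHOLDS_MJ = {
    "finalProductAssembly_total_mj": 218_000_000,
    "hardComponentTrimProduction_total_mj": 198_000_000,
    "materialProduction_total_mj": 1_059_500_000,
    "printingProductDyeingAndLaundering_total_mj": 645_000_000,
    "rawMaterialProcessing_Collection": 660_000_000,
    "domestic_total_mj": 64_500_000,
    "vehicle_total_mj": 56_000_000,
}

def flag_energy_totals(reported: dict) -> list:
    """Return the groups whose reported totals exceed their upper thresholds."""
    return [group for group, value in reported.items()
            if value > ENERGY_UPPER_THRESHOLDS_MJ.get(group, float("inf"))]

example = {"finalProductAssembly_total_mj": 250_000_000,   # above threshold: flagged
           "domestic_total_mj": 12_000_000}                # below threshold: not flagged
print(flag_energy_totals(example))
```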