Foundational Industry Energy Dataset (FIED)

Summary

This is an effort by the National Renewable Energy Laboratory (NREL) and Argonne National Laboratory (ANL) to create an experimental foundational industry dataset for energy and emissions analysis and modeling. The code draws from various publicly available data, primarily from the U.S. EPA, to compile a dataset on unit-level energy use and characterization for U.S. industrial facilities in 2017.

The FIED, and the accompanying technical report, can be downloaded from its Open Energy Data Initiative submission.

Getting Started

Manual Data Downloads

Because of how they are provided, several data sets must be downloaded manually before the code can run successfully. These data sets and their directory locations are:

  1. Source Classification Codes (SCCs)

  2. 2017 National Emissions Inventory (NEI)

  3. GHGRP Emissions by Unit and Fuel Type

Environment

fied_environment.yml is the conda environment used when creating the foundational dataset. Its key dependencies include:

  • python=3.9.18=h6244533_0
  • pandas=1.2.0=py39h2e25243_1
  • numpy=1.23.4=py39hbccbffa_1
  • geopandas=0.12.1=pyhd8ed1ab_1
  • openpyxl=3.0.10=py39h2bbff1b_0
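To confirm that an activated environment matches these pins, a quick check like the one below may help. It is a minimal sketch, not part of the FIED code: it splits each conda `name=version=build` spec, compares the version with what `importlib.metadata` reports, ignores build strings, and skips Python itself. It assumes the pip distribution names match the conda package names, which holds for these four packages.

```python
from importlib import metadata

# Pins copied from the list above, in conda's name=version=build form.
PINS = [
    "python=3.9.18=h6244533_0",
    "pandas=1.2.0=py39h2e25243_1",
    "numpy=1.23.4=py39hbccbffa_1",
    "geopandas=0.12.1=pyhd8ed1ab_1",
    "openpyxl=3.0.10=py39h2bbff1b_0",
]


def parse_pin(spec):
    """Split a conda 'name=version=build' spec into (name, version, build)."""
    name, version, build = spec.split("=")
    return name, version, build


def check_versions(pins, lookup=metadata.version):
    """Return {package: (pinned, installed_or_None)} for every mismatch.

    Python is not an installed distribution, so it is skipped here;
    check it separately (e.g., with platform.python_version()).
    Build strings are ignored.
    """
    mismatches = {}
    for spec in pins:
        name, pinned, _build = parse_pin(spec)
        if name == "python":
            continue
        try:
            installed = lookup(name)
        except metadata.PackageNotFoundError:
            installed = None  # package not installed at all
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches
```

Running `check_versions(PINS)` inside the activated fied environment should return an empty dict if the pins are satisfied.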

Compiling the FIED

In addition to the manual downloads above, running the calculations and compiling the data requires two steps after activating the fied environment:

  1. ./frs/frs_extraction.py. This will download, extract, and format EPA FRS data. The resulting csv should be saved in data/FRS/.
  2. fied_compilation.py. This will execute all of the remaining steps for compiling the foundational data set.

So, from the terminal or Anaconda prompt:

conda activate fied

python ./frs/frs_extraction.py

python fied_compilation.py
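If you prefer to drive both steps from one script, the sketch below first checks that the manually downloaded inputs exist and then runs the two scripts in order, stopping at the first failure. It is a hypothetical convenience wrapper, not part of the repository. The paths of the manual downloads are not listed here, so the caller must supply them, and it assumes you run it from the repository root with the fied environment active.

```python
import subprocess
import sys
from pathlib import Path

# The two compilation steps, in the order given above.
STEPS = [
    ["./frs/frs_extraction.py"],
    ["fied_compilation.py"],
]


def missing_inputs(paths):
    """Return the manually downloaded inputs that are not on disk yet."""
    return [p for p in paths if not Path(p).exists()]


def run_pipeline(manual_inputs, runner=subprocess.run, python=sys.executable):
    """Check the manual downloads, then run each step in order.

    check=True makes subprocess.run raise CalledProcessError on a
    non-zero exit code, so a failed step halts the pipeline.
    """
    missing = missing_inputs(manual_inputs)
    if missing:
        raise FileNotFoundError(f"Download these first: {missing}")
    for step in STEPS:
        runner([python, *step], check=True)
```

A call would look like `run_pipeline(["<path to SCC download>", "<path to NEI download>", "<path to GHGRP download>"])`, with the placeholders replaced by the directory locations of the three manual downloads.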

Directory Navigation

The underlying submodules and data are organized as follows:

  • analysis: Methods for analyzing and generating figures of the final dataset.
  • data: Most folders are created locally for organizing raw data. Contains a directory list.
  • energy: Not currently used. For future estimation of facility energy use based on alternative approaches.
  • frs: Methods for downloading and formatting EPA Facility Registry Service data.
  • geocoder: Methods for collecting missing geographical information for facilities.
  • ghgrp: Methods for estimating energy use from GHG emissions reported under EPA's Greenhouse Gas Reporting Program. Based on previous projects, such as the Industry Energy Data Book.
  • nei: Methods for downloading and formatting data from EPA's National Emissions Inventory and for using these data to characterize combustion units.
  • qpc: Methods for downloading and formatting operating hours reported under the Census Bureau's Quarterly Survey of Plant Capacity Utilization.
  • scc: Methods to download and apply EPA's Source Classification Codes for characterizing units.
  • tests: Testing. Currently very limited.
  • tools: Methods that act as various tools used across submodules.

Overview of FIED Data Fields

Data fields are compiled and described in FIED_datafields.yml. All facilities in the data set are represented by their unique registryID, which is their EPA Facility Registry Service ID.

Many of these data fields were included in original EPA data sources. See the FRS data dictionary for more information.

Identity

In addition to registryID, other identifying fields include:

  • eisFacilityID: EPA ID assigned to facilities reporting to the Emissions Inventory System (EIS).
  • ghgrpID: EPA ID assigned to facilities reporting under the Greenhouse Gas Reporting Program (GHGRP).
  • name: Name of facility.
  • locationDescription: Description of the facility location.
  • naicsCode: The facility's North American Industry Classification System (NAICS) code.
  • naicsCodeAdditional: A facility may have additional NAICS codes assigned (e.g., different reporting systems may have different NAICS assigned).
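Because the data are unit-level, one facility can span several rows that share a registryID. The sketch below groups rows by facility and counts distinct facilities per 3-digit NAICS subsector. It assumes each row is a dict keyed by the field names above and that naicsCode is the same on every row of a facility; the rows and IDs are hypothetical.

```python
from collections import Counter, defaultdict


def naics_subsector(naics_code):
    """Return the 3-digit NAICS subsector, e.g. '311' (Food Manufacturing)."""
    return str(naics_code)[:3]


def group_by_facility(records):
    """Collect unit-level records under each facility's registryID."""
    facilities = defaultdict(list)
    for rec in records:
        facilities[rec["registryID"]].append(rec)
    return dict(facilities)


def facilities_per_subsector(records):
    """Count distinct facilities (not units) in each NAICS subsector."""
    subsector_of = {}
    for rec in records:
        # Assumes naicsCode is identical on every row of a facility.
        subsector_of[rec["registryID"]] = naics_subsector(rec["naicsCode"])
    return dict(Counter(subsector_of.values()))


# Hypothetical rows: two units at facility "A", one unit at facility "B".
rows = [
    {"registryID": "A", "naicsCode": 311421, "eisUnitID": "u1"},
    {"registryID": "A", "naicsCode": 311421, "eisUnitID": "u2"},
    {"registryID": "B", "naicsCode": 325211, "eisUnitID": "u3"},
]
print(len(group_by_facility(rows)["A"]))  # 2
print(facilities_per_subsector(rows))     # {'311': 1, '325': 1}
```

Counting by registryID rather than by row avoids double-counting facilities that report many units.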

Geography

Various levels of geographic identifiers are included, such as

Units and Processes

Individual units are characterized (e.g., unit type, capacity, energy, throughput) where possible. Individual units may be associated with multiple processes.

  • designCapacity: design capacity of unit.
  • eisUnitID: U.S. EPA Emissions Inventory System (EIS) unit ID.
  • unitName: unit name.
  • unitType: reported or inferred unit type.
  • unitTypeStd: standardized unitType.
  • processDescription: description of process. Processes may have more than one unit associated with them.
  • eisProcessID: U.S. EPA Emissions Inventory System (EIS) process ID. Processes may have more than one unit associated with them.
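Since a unit can map to several processes and a process to several units, the relationship is many-to-many. One way to navigate it is to build lookup tables in both directions, as sketched below. This assumes each row pairs one eisUnitID with one eisProcessID; the IDs are hypothetical.

```python
from collections import defaultdict


def link_units_processes(rows):
    """Build both directions of the many-to-many unit/process mapping.

    Assumes each row pairs one eisUnitID with one eisProcessID.
    """
    processes_by_unit = defaultdict(set)
    units_by_process = defaultdict(set)
    for row in rows:
        unit, process = row.get("eisUnitID"), row.get("eisProcessID")
        if unit is None or process is None:
            continue  # rows lacking either ID cannot be linked
        processes_by_unit[unit].add(process)
        units_by_process[process].add(unit)
    return dict(processes_by_unit), dict(units_by_process)


# Hypothetical IDs: unit "u1" feeds two processes; process "p2" has two units.
rows = [
    {"eisUnitID": "u1", "eisProcessID": "p1"},
    {"eisUnitID": "u1", "eisProcessID": "p2"},
    {"eisUnitID": "u2", "eisProcessID": "p2"},
    {"eisUnitID": "u3", "eisProcessID": None},
]
by_unit, by_process = link_units_processes(rows)
print(sorted(by_unit["u1"]))     # ['p1', 'p2']
print(sorted(by_process["p2"]))  # ['u1', 'u2']
```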

Energy

Depending on the estimation approach, a unit may have a single estimate of energy use, or a range of energy estimates (i.e., minimum, median, upper quartile). Energy estimates based on the NEI are presented as a range.

  • energyMJ: energy estimate, in MJ.
  • energyMJ0: minimum of energy estimate, in MJ.
  • energyMJq2: median of energy estimate, in MJ.
  • energyMJq3: upper quartile of energy estimate, in MJ.
  • fuelType: combusted fuel type as reported by original data source.
  • fuelTypeStd: combusted fuel type, standardized.
  • energyEstimateSource: source of underlying data used to make energy estimate. Some energy values are provided directly by GHGRP data.
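Because some units carry a single energyMJ value and others only a range, totals need a rule for picking one number per row. The sketch below uses energyMJ when present and otherwise falls back to the median (energyMJq2); that fallback is an assumption for illustration, not necessarily what the FIED analysis does. It also checks that each range is ordered minimum ≤ median ≤ upper quartile. Missing values are assumed to be None; if you read the data with pandas, convert NaN to None first.

```python
from collections import defaultdict

RANGE_FIELDS = ("energyMJ0", "energyMJq2", "energyMJq3")


def point_estimate(rec):
    """energyMJ if present, else the median energyMJq2 (an assumed fallback)."""
    if rec.get("energyMJ") is not None:
        return rec["energyMJ"]
    return rec.get("energyMJq2")


def range_is_ordered(rec):
    """True unless a full range is present and min <= median <= Q3 fails."""
    low, median, upper = (rec.get(k) for k in RANGE_FIELDS)
    if any(v is None for v in (low, median, upper)):
        return True  # not a range estimate, so nothing to check
    return low <= median <= upper


def facility_energy(records):
    """Sum point estimates (MJ) by registryID, skipping rows with no estimate."""
    totals = defaultdict(float)
    for rec in records:
        value = point_estimate(rec)
        if value is not None:
            totals[rec["registryID"]] += value
    return dict(totals)


rows = [  # hypothetical values
    {"registryID": "A", "energyMJ": 1000.0},
    {"registryID": "A", "energyMJ0": 200.0, "energyMJq2": 500.0, "energyMJq3": 800.0},
    {"registryID": "B"},
]
print(facility_energy(rows))                   # {'A': 1500.0}
print(all(range_is_ordered(r) for r in rows))  # True
```

Summing medians gives a central total, but it is not the median of the facility total; keep the minimum and upper-quartile columns if you need to carry uncertainty forward.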

Greenhouse Gas (GHG) Emissions

  • ghgsTonneCO2e: GHG emissions estimate (or reported data) in metric tonnes CO2 equivalents.
  • ghgsTonneCO2eQ0: minimum of GHG emissions estimate in metric tonnes CO2 equivalents.
  • ghgsTonneCO2eQ2: median GHG emissions estimate in metric tonnes CO2 equivalents.
  • ghgsTonneCO2eQ3: upper quartile of GHG emissions estimate in metric tonnes CO2 equivalents.
  • ghgsEstimateSource: source of underlying data used to make the emissions estimate. GHGRP emissions data are used directly, as are some NEI data.
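When a row has point values for both emissions and energy, their ratio gives an emissions intensity that is handy for spotting outliers. The helper below is a hypothetical sketch: it converts metric tonnes to kilograms (1 tonne = 1,000 kg) and divides by energy in MJ. As a rough sanity check, natural gas combustion emits about 0.05 kg CO2 per MJ.

```python
KG_PER_TONNE = 1000.0


def ghg_intensity_kg_per_mj(ghgs_tonne_co2e, energy_mj):
    """Emissions intensity in kg CO2e per MJ.

    Returns None if either input is missing or energy is not positive.
    Intended for a ghgsTonneCO2e / energyMJ pair from the same row.
    """
    if ghgs_tonne_co2e is None or energy_mj is None or energy_mj <= 0:
        return None
    return ghgs_tonne_co2e * KG_PER_TONNE / energy_mj


# Hypothetical unit: 50 t CO2e from 1,000,000 MJ of fuel.
print(ghg_intensity_kg_per_mj(50.0, 1_000_000.0))  # 0.05
```

Grouping these intensities by fuelTypeStd makes values far from typical combustion factors easy to flag for review.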

Other

We've attempted to include additional descriptive fields where possible. These tend to be sparsely populated at this time.

  • hucCode8: Hydrologic Unit Code. Not currently implemented.
  • weeklyOpHours: Average weekly operating hours by quarter, including 95% confidence interval ranges.
  • sensitiveInd: Indicates whether or not the associated data is enforcement sensitive.
  • envJusticeCode: The code that identifies the type of environmental justice concern affecting the facility or enforcement action.
  • smallBusInd: Code indicating whether or not a business is requesting relief under EPA’s Small Business Policy, which applies to businesses having fewer than 100 employees.
  • throughputTonne: Estimated mass throughput.
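Average weekly operating hours by quarter can be turned into an annual estimate with simple arithmetic. The sketch below is hypothetical: the exact layout of weeklyOpHours is not described here, so it assumes a dict mapping a quarter label to an average weekly value, and it assumes 13 weeks per quarter (a 52-week year). The same functions can be applied separately to the lower and upper bounds of the 95% confidence interval.

```python
WEEKS_PER_QUARTER = 13   # assumption: 52-week year split evenly
HOURS_PER_WEEK = 24 * 7  # 168 hours in a week


def annual_op_hours(weekly_by_quarter):
    """Annualize average weekly operating hours reported per quarter."""
    if not weekly_by_quarter:
        raise ValueError("no quarterly values supplied")
    total = 0.0
    for quarter, hours in weekly_by_quarter.items():
        if not 0 <= hours <= HOURS_PER_WEEK:
            raise ValueError(f"{quarter}: {hours} h/week is outside 0 to {HOURS_PER_WEEK}")
        total += hours * WEEKS_PER_QUARTER
    return total


def utilization(weekly_by_quarter):
    """Operating hours as a fraction of all hours in the reported quarters.

    For four quarters the denominator is 52 x 168 = 8,736 hours.
    """
    max_hours = WEEKS_PER_QUARTER * len(weekly_by_quarter) * HOURS_PER_WEEK
    return annual_op_hours(weekly_by_quarter) / max_hours


# Hypothetical facility running 100-120 hours per week.
hours = {"Q1": 100, "Q2": 120, "Q3": 120, "Q4": 100}
print(annual_op_hours(hours))  # 5720.0
```

Here utilization(hours) is 5,720 / 8,736, or roughly 0.65.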