Commit b40db96

mikebannis and brtietz authored

Start converting 'scrape' to 'process' (#1)

* Start converting 'scrape' to 'process'
* Finish converting 'scrape' to 'process'

Co-authored-by: Brian Mirletz <[email protected]>
1 parent a994510 commit b40db96

File tree: 8 files changed (+158, −52 lines)

.gitignore

Lines changed: 104 additions & 0 deletions
@@ -0,0 +1,104 @@
+# Custom list
+.DS_Store
+
+# Byte-compiled / optimized / DLL files
+__pycache__/
+*.py[cod]
+*$py.class
+
+# C extensions
+*.so
+
+# Distribution / packaging
+.Python
+env/
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+
+# PyInstaller
+# Usually these files are written by a python script from a template
+# before PyInstaller builds the exe, so as to inject date/other infos into it.
+*.manifest
+*.spec
+
+# Installer logs
+pip-log.txt
+pip-delete-this-directory.txt
+
+# Unit test / coverage reports
+htmlcov/
+.tox/
+.coverage
+.coverage.*
+.cache
+nosetests.xml
+coverage.xml
+*.cover
+.hypothesis/
+
+# Translations
+*.mo
+*.pot
+
+# Django stuff:
+*.log
+local_settings.py
+
+# Flask stuff:
+instance/
+.webassets-cache
+
+# Scrapy stuff:
+.scrapy
+
+# Sphinx documentation
+docs/_build/
+
+# PyBuilder
+target/
+
+# Jupyter Notebook
+.ipynb_checkpoints
+
+# pyenv
+.python-version
+
+# celery beat schedule file
+celerybeat-schedule
+
+# SageMath parsed files
+*.sage.py
+
+# dotenv
+.env
+
+# virtualenv
+.venv
+venv/
+ENV/
+
+# Spyder project settings
+.spyderproject
+.spyproject
+
+# Rope project settings
+.ropeproject
+
+# mkdocs documentation
+/site
+
+# mypy
+.mypy_cache/

README.md

Lines changed: 6 additions & 6 deletions
@@ -3,7 +3,7 @@
 Python files and Jupyter notebooks for processing the Annual Technology Baseline (ATB) electricity data and determining LCOE and other metrics. All documentation and data for the ATB is available at the [ATB website](https://atb.nrel.gov).
 
 ## Installation and Requirements
-The pipeline requires [Python](https://www.python.org) 3.10 or newer. Dependancies can be installed using `pip`:
+The pipeline requires [Python](https://www.python.org) 3.10 or newer. Dependancies can be installed using `pip`:
 
 ```
 $ pip install -r requirements.txt
@@ -30,27 +30,27 @@ is the path and filename to the ATB electricity data workbook `xlsx` file.
 Process all techs and export to a flat file named `flat_file.csv`:
 
 ```
-$ python -m lcoe_calculator.full_scrape --save-flat flat_file.csv {PATH-TO-DATA-WORKBOOK}
+$ python -m lcoe_calculator.process_all --save-flat flat_file.csv {PATH-TO-DATA-WORKBOOK}
 ```
 
 Process only land-based wind and export pivoted data and meta data:
 
 ```
-$ python -m lcoe_calculator.full_scrape --tech LandBasedWindProc \
+$ python -m lcoe_calculator.process_all --tech LandBasedWindProc \
 --save-pivoted pivoted_file.csv --save-meta meta_file.csv {PATH-TO-DATA-WORKBOOK}
 ```
 
 Process only pumped storage hydropower and copy data to the clipboard so it may be pasted into a spreadsheet:
 
 ```
-$ python -m lcoe_calculator.full_scrape --tech PumpedStorageHydroProc \
+$ python -m lcoe_calculator.process_all --tech PumpedStorageHydroProc \
 --clipboard {PATH-TO-DATA-WORKBOOK}
 ```
 
-Help for the scraper and the names of available technologies can be viewed by running:
+Help for the processor and the names of available technologies can be viewed by running:
 
 ```
-$ python -m lcoe_calculator.full_scrape --help
+$ python -m lcoe_calculator.process_all --help
 ```
 
 ## Debt Fraction Calculator
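The renamed CLI in the README diff above keeps the same flags (`--tech`, `--save-flat`, `--clipboard`) under the new module name. As a hedged, illustrative sketch only — not the actual `process_all` source, and with the argument and function names invented here — a `click` command of that shape can be exercised in-process like this:

```python
# Hypothetical sketch of a click CLI shaped like the process_all examples
# above; option names come from the README, everything else is invented.
import click
from click.testing import CliRunner

@click.command()
@click.argument("workbook")
@click.option("--tech", default=None, help="Process a single technology.")
@click.option("--save-flat", default=None, help="Save flat data to this CSV.")
@click.option("--clipboard", is_flag=True, help="Copy results to the clipboard.")
def cli(workbook, tech, save_flat, clipboard):
    """Process ATB electricity data from WORKBOOK."""
    click.echo(f"workbook={workbook} tech={tech} save_flat={save_flat}")

# Invoke the command in-process instead of from a shell
result = CliRunner().invoke(cli, ["atb.xlsx", "--tech", "LandBasedWindProc"])
print(result.output)
```

`CliRunner` makes it easy to smoke-test such a command without touching a real workbook file.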

debt_fraction_calculator/debt_fraction_calc.py

Lines changed: 1 addition & 2 deletions
@@ -5,15 +5,14 @@
 # (see https://github.com/NREL/ATB-calc).
 #
 """
-Workflow to calculate debt fractions based on scraped data
+Workflow to calculate debt fractions based on ATB data
 
 Developed against PySAM 4.0.0
 """
 from typing import TypedDict, List, Dict, Type
 import pandas as pd
 import click
 
-
 import PySAM.Levpartflip as levpartflip
 
 from lcoe_calculator.extractor import Extractor

example_notebooks/Full work flow.ipynb

Lines changed: 14 additions & 11 deletions
@@ -31,7 +31,7 @@
 "from datetime import datetime as dt\n",
 "\n",
 "sys.path.insert(0, os.path.dirname(os.getcwd()))\n",
-"from lcoe_calculator.full_scrape import FullScrape\n",
+"from lcoe_calculator.process_all import ProcessAll\n",
 "from lcoe_calculator.tech_processors import (ALL_TECHS,\n",
 " OffShoreWindProc, LandBasedWindProc, DistributedWindProc,\n",
 " UtilityPvProc, CommPvProc, ResPvProc, UtilityPvPlusBatteryProc,\n",
@@ -79,8 +79,8 @@
 "# Or process a single technology\n",
 "techs = LandBasedWindProc\n",
 "\n",
-"# Initiate the scraper with the workbook location and desired technologies\n",
-"scraper = FullScrape(atb_electricity_workbook, techs)"
+"# Initiate the processor with the workbook location and desired technologies\n",
+"processor = ProcessAll(atb_electricity_workbook, techs)"
 ]
 },
 {
@@ -90,7 +90,10 @@
 "metadata": {},
 "source": [
 "## Run the pipeline\n",
-"Now that the scraper knows where the data workbook is and which technologies were interested in we can kick it off. Depending on the number of requested technologies, this can take a couple minutes. Note that calculated LCOE and CAPEX is automatically compared to the values in the workbook. Not all technologies have LCOE and CAPEX."
+"Now that the processor knows where the data workbook is and which technologies we are interested in, we\n",
+"can kick it off. Depending on the number of requested technologies, this can take a couple minutes.\n",
+"Note that calculated LCOE and CAPEX is automatically compared to the values in the workbook. Not all\n",
+"technologies have LCOE and CAPEX."
 ]
 },
 {
@@ -103,7 +106,7 @@
 "outputs": [],
 "source": [
 "start = dt.now()\n",
-"scraper.scrape()\n",
+"processor.process()\n",
 "print('Processing completed in ', dt.now() - start)"
 ]
 },
@@ -124,16 +127,16 @@
 "outputs": [],
 "source": [
 "# Save data to as a CSV\n",
-"scraper.to_csv('atb_data.csv')\n",
+"processor.to_csv('atb_data.csv')\n",
 "\n",
 "# Save flattened data to as a CSV\n",
-"scraper.flat_to_csv('atb_data_flat.csv')\n",
+"processor.flat_to_csv('atb_data_flat.csv')\n",
 "\n",
 "# Save meta data to as a CSV\n",
-"scraper.meta_data_to_csv('atb_meta_data.csv')\n",
+"processor.meta_data_to_csv('atb_meta_data.csv')\n",
 "\n",
 "# Copy data to the clipboard so it can be pasted in a spreadsheet \n",
-"scraper.data.to_clipboard()"
+"processor.data.to_clipboard()"
 ]
 },
 {
@@ -152,7 +155,7 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"data = scraper.data\n",
+"data = processor.data\n",
 "\n",
 "# Show available parameters\n",
 "print('Available parameters')\n",
@@ -184,7 +187,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.11.4"
+"version": "3.10.12"
 },
 "vscode": {
 "interpreter": {

example_notebooks/Process ATB electricity technology.ipynb

Lines changed: 0 additions & 1 deletion
@@ -31,7 +31,6 @@
 "from datetime import datetime as dt\n",
 "\n",
 "sys.path.insert(0, os.path.dirname(os.getcwd()))\n",
-"from lcoe_calculator.full_scrape import FullScrape\n",
 "\n",
 "# Electricity technology processors\n",
 "from lcoe_calculator.tech_processors import (\n",

lcoe_calculator/README.md

Lines changed: 6 additions & 5 deletions
@@ -1,14 +1,15 @@
 # ATB Calculator
-This code scrapes the Excel ATB data workbook, calculates LCOE and CAPEX
-for all technologies as needed, and exports data in flat or flat + pivoted formats.
+This code extracts data from the ATB Excel workbook, then calculates LCOE and CAPEX for all
+technologies as needed, and exports data in flat or flat and pivoted formats.
 
-**Note:** You will likely have to give Python access to interact with Excel. A window will automatically ask for permission the first time this script is run.
+**Note:** You will likely have to give Python access to interact with Excel. A window will
+automatically ask for permission the first time this script is run.
 
 ## Files
 Files are listed in roughly descending order of importance and approachability.
 
-- `full_scrape.py` - Class that performs full scrape with built in command line interface. See the README in the root of this repo for CLI examples.
-- `tech_processors.py` - Classes to scrape and process individual technologies. Any new ATB technologies should be added to this file.
+- `process_all.py` - Class that performs processing for all ATB technologies with a built-in command line interface. See the README in the root of this repo for CLI examples.
+- `tech_processors.py` - Classes to process individual technologies. Any new ATB technologies should be added to this file.
 - `base_processor.py` - Base processor class that is subclassed to process individual technologies.
 - `config.py` - Constant definitions including the base year and scenario names
 - `extractor.py` - Code to pull values from the workbook
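The "flat or flat and pivoted formats" mentioned in the rewritten README text can be illustrated with a small pandas sketch. The column names and values below are invented for illustration and are not the processor's actual schema:

```python
# Hypothetical illustration of flat vs. pivoted exports; the columns and
# values are invented, not the actual output schema of process_all.
import pandas as pd

# "Flat" data: one value per row
flat = pd.DataFrame({
    "Technology": ["LandBasedWind"] * 4,
    "Parameter": ["CAPEX", "CAPEX", "LCOE", "LCOE"],
    "Year": [2030, 2050, 2030, 2050],
    "Value": [1000.0, 900.0, 30.0, 25.0],
})

# "Pivoted" data: years spread across columns, one row per
# technology/parameter combination
pivoted = flat.pivot_table(
    index=["Technology", "Parameter"], columns="Year", values="Value"
)
print(pivoted)

# Either shape can then be written out, much like the CSV export
# methods shown in the notebook diff above
flat.to_csv("atb_data_flat.csv", index=False)
pivoted.to_csv("atb_data.csv")
```

The flat form is convenient for databases and filtering; the pivoted form is what reads naturally in a spreadsheet.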

0 commit comments