Commit b40db96

mikebannis and brtietz authored

Start converting 'scrape' to 'process' (#1)

* Start converting 'scrape' to 'process'
* Finish converting 'scrape' to 'process'

Co-authored-by: Brian Mirletz <[email protected]>
1 parent a994510 commit b40db96

File tree: 8 files changed (+158, −52 lines)

.gitignore

Lines changed: 104 additions & 0 deletions
@@ -0,0 +1,104 @@
+# Custom list
+.DS_Store
+
+# Byte-compiled / optimized / DLL files
+__pycache__/
+*.py[cod]
+*$py.class
+
+# C extensions
+*.so
+
+# Distribution / packaging
+.Python
+env/
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+
+# PyInstaller
+# Usually these files are written by a python script from a template
+# before PyInstaller builds the exe, so as to inject date/other infos into it.
+*.manifest
+*.spec
+
+# Installer logs
+pip-log.txt
+pip-delete-this-directory.txt
+
+# Unit test / coverage reports
+htmlcov/
+.tox/
+.coverage
+.coverage.*
+.cache
+nosetests.xml
+coverage.xml
+*.cover
+.hypothesis/
+
+# Translations
+*.mo
+*.pot
+
+# Django stuff:
+*.log
+local_settings.py
+
+# Flask stuff:
+instance/
+.webassets-cache
+
+# Scrapy stuff:
+.scrapy
+
+# Sphinx documentation
+docs/_build/
+
+# PyBuilder
+target/
+
+# Jupyter Notebook
+.ipynb_checkpoints
+
+# pyenv
+.python-version
+
+# celery beat schedule file
+celerybeat-schedule
+
+# SageMath parsed files
+*.sage.py
+
+# dotenv
+.env
+
+# virtualenv
+.venv
+venv/
+ENV/
+
+# Spyder project settings
+.spyderproject
+.spyproject
+
+# Rope project settings
+.ropeproject
+
+# mkdocs documentation
+/site
+
+# mypy
+.mypy_cache/

README.md

Lines changed: 6 additions & 6 deletions
@@ -3,7 +3,7 @@
 Python files and Jupyter notebooks for processing the Annual Technology Baseline (ATB) electricity data and determining LCOE and other metrics. All documentation and data for the ATB is available at the [ATB website](https://atb.nrel.gov).
 
 ## Installation and Requirements
-The pipeline requires [Python](https://www.python.org) 3.10 or newer. Dependancies can be installed using `pip`:
+The pipeline requires [Python](https://www.python.org) 3.10 or newer. Dependancies can be installed using `pip`:
 
 ```
 $ pip install -r requirements.txt
@@ -30,27 +30,27 @@ is the path and filename to the ATB electricity data workbook `xlsx` file.
 Process all techs and export to a flat file named `flat_file.csv`:
 
 ```
-$ python -m lcoe_calculator.full_scrape --save-flat flat_file.csv {PATH-TO-DATA-WORKBOOK}
+$ python -m lcoe_calculator.process_all --save-flat flat_file.csv {PATH-TO-DATA-WORKBOOK}
 ```
 
 Process only land-based wind and export pivoted data and meta data:
 
 ```
-$ python -m lcoe_calculator.full_scrape --tech LandBasedWindProc \
+$ python -m lcoe_calculator.process_all --tech LandBasedWindProc \
 --save-pivoted pivoted_file.csv --save-meta meta_file.csv {PATH-TO-DATA-WORKBOOK}
 ```
 
 Process only pumped storage hydropower and copy data to the clipboard so it may be pasted into a spreadsheet:
 
 ```
-$ python -m lcoe_calculator.full_scrape --tech PumpedStorageHydroProc \
+$ python -m lcoe_calculator.process_all --tech PumpedStorageHydroProc \
 --clipboard {PATH-TO-DATA-WORKBOOK}
 ```
 
-Help for the scraper and the names of available technologies can be viewed by running:
+Help for the processor and the names of available technologies can be viewed by running:
 
 ```
-$ python -m lcoe_calculator.full_scrape --help
+$ python -m lcoe_calculator.process_all --help
 ```
 
 ## Debt Fraction Calculator
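The renamed CLI in the README diff above keeps the same flags (`--tech`, `--save-flat`, `--clipboard`) under the new module name. As a hedged, illustrative sketch only — not the actual `process_all` source, and with the argument and function names invented here — a `click` command of that shape can be exercised in-process like this:

```python
# Hypothetical sketch of a click CLI shaped like the process_all examples
# above; option names come from the README, everything else is invented.
import click
from click.testing import CliRunner

@click.command()
@click.argument("workbook")
@click.option("--tech", default=None, help="Process a single technology.")
@click.option("--save-flat", default=None, help="Save flat data to this CSV.")
@click.option("--clipboard", is_flag=True, help="Copy results to the clipboard.")
def cli(workbook, tech, save_flat, clipboard):
    """Process ATB electricity data from WORKBOOK."""
    click.echo(f"workbook={workbook} tech={tech} save_flat={save_flat}")

# Invoke the command in-process instead of from a shell
result = CliRunner().invoke(cli, ["atb.xlsx", "--tech", "LandBasedWindProc"])
print(result.output)
```

`CliRunner` makes it easy to smoke-test such a command without touching a real workbook file.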

debt_fraction_calculator/debt_fraction_calc.py

Lines changed: 1 addition & 2 deletions
@@ -5,15 +5,14 @@
 # (see https://github.com/NREL/ATB-calc).
 #
 """
-Workflow to calculate debt fractions based on scraped data
+Workflow to calculate debt fractions based on ATB data
 
 Developed against PySAM 4.0.0
 """
 from typing import TypedDict, List, Dict, Type
 import pandas as pd
 import click
 
-
 import PySAM.Levpartflip as levpartflip
 
 from lcoe_calculator.extractor import Extractor

example_notebooks/Full work flow.ipynb

Lines changed: 14 additions & 11 deletions
@@ -31,7 +31,7 @@
 "from datetime import datetime as dt\n",
 "\n",
 "sys.path.insert(0, os.path.dirname(os.getcwd()))\n",
-"from lcoe_calculator.full_scrape import FullScrape\n",
+"from lcoe_calculator.process_all import ProcessAll\n",
 "from lcoe_calculator.tech_processors import (ALL_TECHS,\n",
 " OffShoreWindProc, LandBasedWindProc, DistributedWindProc,\n",
 " UtilityPvProc, CommPvProc, ResPvProc, UtilityPvPlusBatteryProc,\n",
@@ -79,8 +79,8 @@
 "# Or process a single technology\n",
 "techs = LandBasedWindProc\n",
 "\n",
-"# Initiate the scraper with the workbook location and desired technologies\n",
-"scraper = FullScrape(atb_electricity_workbook, techs)"
+"# Initiate the processor with the workbook location and desired technologies\n",
+"processor = ProcessAll(atb_electricity_workbook, techs)"
 ]
 },
 {
@@ -90,7 +90,10 @@
 "metadata": {},
 "source": [
 "## Run the pipeline\n",
-"Now that the scraper knows where the data workbook is and which technologies were interested in we can kick it off. Depending on the number of requested technologies, this can take a couple minutes. Note that calculated LCOE and CAPEX is automatically compared to the values in the workbook. Not all technologies have LCOE and CAPEX."
+"Now that the processor knows where the data workbook is and which technologies we are interested in, we\n",
+"can kick it off. Depending on the number of requested technologies, this can take a couple minutes.\n",
+"Note that calculated LCOE and CAPEX is automatically compared to the values in the workbook. Not all\n",
+"technologies have LCOE and CAPEX."
 ]
 },
 {
@@ -103,7 +106,7 @@
 "outputs": [],
 "source": [
 "start = dt.now()\n",
-"scraper.scrape()\n",
+"processor.process()\n",
 "print('Processing completed in ', dt.now() - start)"
 ]
 },
@@ -124,16 +127,16 @@
 "outputs": [],
 "source": [
 "# Save data to as a CSV\n",
-"scraper.to_csv('atb_data.csv')\n",
+"processor.to_csv('atb_data.csv')\n",
 "\n",
 "# Save flattened data to as a CSV\n",
-"scraper.flat_to_csv('atb_data_flat.csv')\n",
+"processor.flat_to_csv('atb_data_flat.csv')\n",
 "\n",
 "# Save meta data to as a CSV\n",
-"scraper.meta_data_to_csv('atb_meta_data.csv')\n",
+"processor.meta_data_to_csv('atb_meta_data.csv')\n",
 "\n",
 "# Copy data to the clipboard so it can be pasted in a spreadsheet \n",
-"scraper.data.to_clipboard()"
+"processor.data.to_clipboard()"
 ]
 },
 {
@@ -152,7 +155,7 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"data = scraper.data\n",
+"data = processor.data\n",
 "\n",
 "# Show available parameters\n",
 "print('Available parameters')\n",
@@ -184,7 +187,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.11.4"
+"version": "3.10.12"
 },
 "vscode": {
 "interpreter": {

example_notebooks/Process ATB electricity technology.ipynb

Lines changed: 0 additions & 1 deletion
@@ -31,7 +31,6 @@
 "from datetime import datetime as dt\n",
 "\n",
 "sys.path.insert(0, os.path.dirname(os.getcwd()))\n",
-"from lcoe_calculator.full_scrape import FullScrape\n",
 "\n",
 "# Electricity technology processors\n",
 "from lcoe_calculator.tech_processors import (\n",

lcoe_calculator/README.md

Lines changed: 6 additions & 5 deletions
@@ -1,14 +1,15 @@
 # ATB Calculator
-This code scrapes the Excel ATB data workbook, calculates LCOE and CAPEX
-for all technologies as needed, and exports data in flat or flat + pivoted formats.
+This code extracts data from the ATB Excel workbook, then calculates LCOE and CAPEX for all
+technologies as needed, and exports data in flat or flat and pivoted formats.
 
-**Note:** You will likely have to give Python access to interact with Excel. A window will automatically ask for permission the first time this script is run.
+**Note:** You will likely have to give Python access to interact with Excel. A window will
+automatically ask for permission the first time this script is run.
 
 ## Files
 Files are listed in roughly descending order of importance and approachability.
 
-- `full_scrape.py` - Class that performs full scrape with built in command line interface. See the README in the root of this repo for CLI examples.
-- `tech_processors.py` - Classes to scrape and process individual technologies. Any new ATB technologies should be added to this file.
+- `process_all.py` - Class that performs processing for all ATB technologies with a built-in command line interface. See the README in the root of this repo for CLI examples.
+- `tech_processors.py` - Classes to process individual technologies. Any new ATB technologies should be added to this file.
 - `base_processor.py` - Base processor class that is subclassed to process individual technologies.
 - `config.py` - Constant definitions including the base year and scenario names
 - `extractor.py` - Code to pull values from the workbook
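The "flat or flat and pivoted formats" mentioned in the rewritten README text can be illustrated with a small pandas sketch. The column names and values below are invented for illustration and are not the processor's actual schema:

```python
# Hypothetical illustration of flat vs. pivoted exports; the columns and
# values are invented, not the actual output schema of process_all.
import pandas as pd

# "Flat" data: one value per row
flat = pd.DataFrame({
    "Technology": ["LandBasedWind"] * 4,
    "Parameter": ["CAPEX", "CAPEX", "LCOE", "LCOE"],
    "Year": [2030, 2050, 2030, 2050],
    "Value": [1000.0, 900.0, 30.0, 25.0],
})

# "Pivoted" data: years spread across columns, one row per
# technology/parameter combination
pivoted = flat.pivot_table(
    index=["Technology", "Parameter"], columns="Year", values="Value"
)
print(pivoted)

# Either shape can then be written out, much like the CSV export
# methods shown in the notebook diff above
flat.to_csv("atb_data_flat.csv", index=False)
pivoted.to_csv("atb_data.csv")
```

The flat form is convenient for databases and filtering; the pivoted form is what reads naturally in a spreadsheet.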

0 commit comments