Adding Billing Rates to the config.py and rewriting references#159
Adding Billing Rates to the config.py and rewriting references#159jimmysway wants to merge 6 commits intoCCI-MOC:mainfrom
Conversation
39e759b to
b1b589e
Compare
openshift_metrics/config.py
Outdated
| RATE_GPU_H100_SU = os.getenv("RATE_GPU_H100_SU", "6.04") | ||
|
|
||
| # Legacy rate (for backward compatibility) | ||
| GPU_A100_RATE = os.getenv("GPU_A100_RATE", "1.803") |
There was a problem hiding this comment.
Where is this constant used?
There was a problem hiding this comment.
There was a problem hiding this comment.
I couldn't find it. A text search across your PR doesn't show any reference to GPU_A100_RATE
openshift_metrics/config.py
Outdated
| SU_NAMES = ["GPUV100", "GPUA100", "GPUA100SXM4", "GPUH100", "CPU"] | ||
| RESOURCE_NAMES = ["vCPUs", "RAM", "GPUs"] |
There was a problem hiding this comment.
Yes, exactly. Let's not touch these now for now. The whole SU types and their names will require some more thought.
There was a problem hiding this comment.
@jimmysway This comment is not addressed. These constants are not accessed anywhere in your PR
openshift_metrics/merge.py
Outdated
| parser.add_argument("--rate-cpu-su", type=Decimal, default=Decimal(RATE_CPU_SU)) | ||
| parser.add_argument( | ||
| "--rate-gpu-v100-su", type=Decimal, default=Decimal(RATE_GPU_V100_SU) | ||
| ) | ||
| parser.add_argument( | ||
| "--rate-gpu-a100sxm4-su", type=Decimal, default=Decimal(RATE_GPU_A100SXM4_SU) | ||
| ) | ||
| parser.add_argument( | ||
| "--rate-gpu-a100-su", type=Decimal, default=Decimal(RATE_GPU_A100_SU) | ||
| ) | ||
| parser.add_argument( | ||
| "--rate-gpu-h100-su", type=Decimal, default=Decimal(RATE_GPU_H100_SU) | ||
| ) |
There was a problem hiding this comment.
If the point of this PR is to have configuration done through env vars, I would suggest removing these parser arguments since they are now redundant.
openshift_metrics/config.py
Outdated
| RATE_CPU_SU = os.getenv("RATE_CPU_SU", "0.013") | ||
| RATE_GPU_V100_SU = os.getenv("RATE_GPU_V100_SU", "1.214") | ||
| RATE_GPU_A100SXM4_SU = os.getenv("RATE_GPU_A100SXM4_SU", "2.078") | ||
| RATE_GPU_A100_SU = os.getenv("RATE_GPU_A100_SU", "1.803") | ||
| RATE_GPU_H100_SU = os.getenv("RATE_GPU_H100_SU", "6.04") |
There was a problem hiding this comment.
@naved001 Do we still want keep the old behavior of having the rates None if not provided?
There was a problem hiding this comment.
Yeah, good point. If someone were to forget setting I'd rather the code fail than use some default values for rates.
|
@jimmysway have you had a chance to address the comments in this PR? |
…and rewriting code references. Removed CLI arguments for setting rates
b1b589e to
2967310
Compare
|
Sorry I had them sitting in my local branch. I just merged with upstream |
|
@jimmysway There are merge conflicts in the PR |
openshift_metrics/config.py
Outdated
| SU_NAMES = ["GPUV100", "GPUA100", "GPUA100SXM4", "GPUH100", "CPU"] | ||
| RESOURCE_NAMES = ["vCPUs", "RAM", "GPUs"] |
There was a problem hiding this comment.
@jimmysway This comment is not addressed. These constants are not accessed anywhere in your PR
openshift_metrics/config.py
Outdated
| USE_NERC_RATES_ENV = os.getenv("USE_NERC_RATES") | ||
| USE_NERC_RATES = USE_NERC_RATES_ENV.lower() == "true" if USE_NERC_RATES_ENV else None |
There was a problem hiding this comment.
I would suggest just having USE_NERC_RATES and removing USE_NERC_RATES_ENV. If the user does not set USE_NERC_RATES, that should be equivalent to setting the env var to false:
| USE_NERC_RATES_ENV = os.getenv("USE_NERC_RATES") | |
| USE_NERC_RATES = USE_NERC_RATES_ENV.lower() == "true" if USE_NERC_RATES_ENV else None | |
| USE_NERC_RATES = os.getenv("USE_NERC_RATES", "false").lower() == "true" |
Do you have a specific reason for why these two env vars needs to be defined seperately?
openshift_metrics/config.py
Outdated
| RATE_GPU_H100_SU = os.getenv("RATE_GPU_H100_SU") | ||
|
|
||
| # Legacy rate (for backward compatibility) | ||
| GPU_A100_RATE = os.getenv("GPU_A100_RATE") |
There was a problem hiding this comment.
I still don't see where this constant is used
openshift_metrics/merge.py
Outdated
| if USE_NERC_RATES is None: | ||
| raise ValueError( | ||
| "USE_NERC_RATES environment variable must be set to 'true' or 'false'" | ||
| ) |
There was a problem hiding this comment.
Following from my comment above, this if clause should also be removed. I think USE_NERC_RATES not being set should be interpreted as False
openshift_metrics/merge.py
Outdated
| report_start_date_dt = datetime.strptime(report_start_date, "%Y-%m-%d") | ||
| report_end_date_dt = datetime.strptime(report_end_date, "%Y-%m-%d") | ||
| ignore_hours = outage_data.get_outages_during( | ||
| report_start_date, report_end_date, cluster_name | ||
| report_start_date_dt, report_end_date_dt, cluster_name |
There was a problem hiding this comment.
The current upstream code runs fine with report_start_date and report_end_date. Is there a reason you're converting them to datetime objects here?
| if RATE_CPU_SU is None: | ||
| raise ValueError("RATE_CPU_SU environment variable must be set") | ||
| if RATE_GPU_V100_SU is None: | ||
| raise ValueError("RATE_GPU_V100_SU environment variable must be set") | ||
| if RATE_GPU_A100SXM4_SU is None: | ||
| raise ValueError("RATE_GPU_A100SXM4_SU environment variable must be set") | ||
| if RATE_GPU_A100_SU is None: | ||
| raise ValueError("RATE_GPU_A100_SU environment variable must be set") | ||
| if RATE_GPU_H100_SU is None: | ||
| raise ValueError("RATE_GPU_H100_SU environment variable must be set") | ||
|
|
There was a problem hiding this comment.
In the upstream version of the code, an error is already raised by the Rates dataclass if the SU rates are None. I suppose these a bit more informative, but I would argue they are not entirely necessary and leads to more maintenance burden down the line.
There was a problem hiding this comment.
Just tapping back into this issue sorry about the delay. I don't see where the Rates dataclass validates? But yeah all the other comments I think I've addressed
| report_start_date_dt = datetime.strptime(report_start_date, "%Y-%m-%d").replace( | ||
| tzinfo=UTC | ||
| ) | ||
| report_end_date_dt = datetime.strptime(report_end_date, "%Y-%m-%d").replace( |
There was a problem hiding this comment.
If you want to indicate that report_start_date and report_end_date are datetime objects, I would suggest using type annotation instead, which I think would be simpler to interpret than having to rename references and assignements.
| report_start_date_dt = datetime.strptime(report_start_date, "%Y-%m-%d").replace( | |
| tzinfo=UTC | |
| ) | |
| report_end_date_dt = datetime.strptime(report_end_date, "%Y-%m-%d").replace( | |
| report_start_date : datetime = datetime.strptime(report_start_date, "%Y-%m-%d").replace( | |
| tzinfo=UTC | |
| ) | |
| report_end_date : datetime = datetime.strptime(report_end_date, "%Y-%m-%d").replace( | |
| tzinfo=UTC | |
| ) |
Is there another reason why you renamed these variables?
openshift_metrics/merge.py
Outdated
| if report_start_date_dt.month != report_end_date_dt.month: | ||
| logger.warning("The report spans multiple months") | ||
| report_month += " to " + datetime.strftime(report_end_date_dt, "%Y-%m") |
There was a problem hiding this comment.
This if clause is no longer needed. It will cause an error with the call to nerc-rates in get_su_definitions
The USE_NERC_RATES None check is gone since it’s always a bool. The multi‑month block no longer changes report_month. The outage call now passes date strings to match the function.
ba84a8a to
4833009
Compare
Part 2 of #144