Refactor merge.py and add tests for it by naved001 · Pull Request #165 · CCI-MOC/openshift-usage-scripts

naved001 · 2025-11-12T19:57:05Z

This refactors merge.py a bit, by moving some things into functions. Additionally it adds some basic tests and this time I switched to using pytest.

I ended up working on this because I realized I was adding more stuff in https://github.com/CCI-MOC/openshift-usage-scripts/pull/164/files and there were no tests.

This refactors merge.py into smaller functions so it's easier to read what's going on.

Add tests for some of the important functions in merge.py. Since I wrote the tests for pytest, I switched to using pytest as the runner for the rest of the tests.

QuanMPhm

I have a few small questions before I approve

QuanMPhm · 2025-11-18T15:21:05Z

openshift_metrics/merge.py

+            if cluster_name is None:
+                cluster_name = metrics_from_file.get("cluster_name")


Is there any concern that cluster_name is not being checked? What if the provided files are from different clusters? This behavior seems to predate the refactoring, but I wanted to ask just in case.

I don't think we checked that cluster_name could be different in different files, so this behavior is unchanged. But it doesn't hurt to add that additional check. I'll add that in a different PR.
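A minimal sketch of the additional check mentioned above, assuming each input file is a JSON object with an optional "cluster_name" key, as in the diff. The helper name `resolve_cluster_name` and the choice to raise `ValueError` are hypothetical and not part of this PR.

```python
import json
from typing import List, Optional


def resolve_cluster_name(files: List[str]) -> Optional[str]:
    """Return the cluster name shared by all files, or raise if they disagree.

    Hypothetical helper: files without a "cluster_name" key are skipped.
    """
    cluster_name = None
    for file in files:
        with open(file, "r") as jsonfile:
            metrics_from_file = json.load(jsonfile)
        name_in_file = metrics_from_file.get("cluster_name")
        if name_in_file is None:
            continue
        if cluster_name is None:
            # First file that names a cluster sets the expected value
            cluster_name = name_in_file
        elif name_in_file != cluster_name:
            raise ValueError(
                f"{file} is from cluster {name_in_file!r}, expected {cluster_name!r}"
            )
    return cluster_name
```

Whether a mismatch should be a hard error or only a warning is a design choice left open in the discussion.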

QuanMPhm · 2025-11-18T15:22:42Z

openshift_metrics/merge.py

-            gpu_a100sxm4=rates_data.get_value_at(
+            cpu=rates_data.get_value_at("CPU SU Rate", report_month, Decimal),  # type: ignore
+            gpu_a100=rates_data.get_value_at("GPUA100 SU Rate", report_month, Decimal),  # type: ignore
+            gpu_a100sxm4=rates_data.get_value_at(  # type: ignore


Are you using a linter in addition to ruff, which we use in the CI? I didn't get any pre-commit errors when removing these comments.

I think it was vscode yelling at me so I put these, but I am going to remove these.

QuanMPhm · 2025-11-18T15:38:02Z

openshift_metrics/merge.py

+        with open(file, "r") as jsonfile:
+            metrics_from_file = json.load(jsonfile)
+            cpu_request_metrics = metrics_from_file["cpu_metrics"]
+            memory_request_metrics = metrics_from_file["memory_metrics"]
+            gpu_request_metrics = metrics_from_file.get("gpu_metrics", None)
+            processor.merge_metrics("cpu_request", cpu_request_metrics)
+            processor.merge_metrics("memory_request", memory_request_metrics)
+            if gpu_request_metrics is not None:
+                processor.merge_metrics("gpu_request", gpu_request_metrics)


Minor suggestion, but I think this is more concise:

Suggested change

-        with open(file, "r") as jsonfile:
-            metrics_from_file = json.load(jsonfile)
-            cpu_request_metrics = metrics_from_file["cpu_metrics"]
-            memory_request_metrics = metrics_from_file["memory_metrics"]
-            gpu_request_metrics = metrics_from_file.get("gpu_metrics", None)
-            processor.merge_metrics("cpu_request", cpu_request_metrics)
-            processor.merge_metrics("memory_request", memory_request_metrics)
-            if gpu_request_metrics is not None:
-                processor.merge_metrics("gpu_request", gpu_request_metrics)
+            for resource in ["cpu_metrics", "memory_metrics", "gpu_metrics"]:
+                if resource == "gpu_metrics":
+                    if gpu_request_metrics := metrics_from_file.get(resource):
+                        processor.merge_metrics(resource, gpu_request_metrics)
+                else:
+                    request_metrics = metrics_from_file[resource]
+                    processor.merge_metrics(resource, request_metrics)

If cpu_metrics and memory_metrics are always present, would it be fine to make the loop even simpler?

for resource in ["cpu_metrics", "memory_metrics", "gpu_metrics"]:
    if request_metrics := metrics_from_file.get(resource):
        processor.merge_metrics(resource, request_metrics)

good suggestion

uh, while this is a good suggestion, due to the old clumsy naming of things it'll break stuff. See, the files put things in cpu_metrics but the processors call it cpu_request so our neat little loop won't work. I am going to leave this as is.

Ah sorry, the strings looked similar and I thought they were the same
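For reference, a loop could still work if the file keys were mapped explicitly to the names the processor expects. This is only a sketch under assumptions: the mapping constant and the `merge_file_metrics` helper are hypothetical, and the processor is assumed to expose `merge_metrics(name, metrics)` as in the diff.

```python
from typing import Any, Dict

# Key used in the metrics files -> name the processor expects
FILE_KEY_TO_METRIC_NAME: Dict[str, str] = {
    "cpu_metrics": "cpu_request",
    "memory_metrics": "memory_request",
    "gpu_metrics": "gpu_request",
}
# Keys that the current code reads with [] and therefore treats as mandatory
REQUIRED_KEYS = ("cpu_metrics", "memory_metrics")


def merge_file_metrics(processor: Any, metrics_from_file: Dict[str, Any]) -> None:
    """Merge one parsed metrics file into the processor (hypothetical helper)."""
    for file_key, metric_name in FILE_KEY_TO_METRIC_NAME.items():
        if file_key in REQUIRED_KEYS:
            # Raises KeyError when missing, matching the existing behavior
            metrics = metrics_from_file[file_key]
        else:
            metrics = metrics_from_file.get(file_key)
            if metrics is None:
                continue
        processor.merge_metrics(metric_name, metrics)
```

This keeps the CPU and memory keys mandatory and the GPU key optional, at the cost of one extra constant compared with the explicit calls in the PR.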

naved001 · 2025-11-19T21:59:09Z

openshift_metrics/merge.py

+    return processor
+
+
+def load_metadata(files: List[str]) -> MetricsMetadata:


I think I could load data and metadata in a single loop instead of loading files twice. For that reason, I don't like what I've done, so I am going to refactor it again later.
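One possible shape for the single-loop idea, as a sketch only: parse each file once and pass the parsed dict to every step that needs it. The `load_once` name and the callback-based design are assumptions for illustration; the actual metric-merging and metadata-collection code in merge.py is not shown here.

```python
import json
from typing import Any, Callable, Dict, Iterable, List


def load_once(
    files: List[str],
    consumers: Iterable[Callable[[Dict[str, Any]], None]],
) -> None:
    """Parse each file a single time and hand the result to every consumer.

    Hypothetical helper: a consumer could be the step that merges metrics
    or the step that collects metadata such as cluster_name.
    """
    consumers = list(consumers)  # allow generators to be reused across files
    for file in files:
        with open(file, "r") as jsonfile:
            metrics_from_file = json.load(jsonfile)
        for consume in consumers:
            consume(metrics_from_file)
```

With this layout each file is opened and decoded once, regardless of how many steps read from it.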

naved001 requested review from QuanMPhm, jimmysway and knikolla November 12, 2025 19:57

naved001 mentioned this pull request Nov 13, 2025

Make setting prometheus query interval minute configurable #164

Merged

naved001 added 5 commits November 17, 2025 14:13

This refactors merge.py into smaller functions

fbff4fd


Change the default names to be the ones we use for s3

73997cf

Add tests for merge.py

35028fa


Fix formatting issue

29ef881

Update tests to work with setting interval_minutes

6d219b4

naved001 force-pushed the refactor-merge-py branch from 5fbc5c6 to 6d219b4 Compare November 17, 2025 22:05

QuanMPhm reviewed Nov 18, 2025


Remove vscode specific linting directives

717ddb2

naved001 commented Nov 19, 2025


naved001 marked this pull request as draft November 19, 2025 21:59

naved001 wants to merge 6 commits into CCI-MOC:main from naved001:refactor-merge-py


2 participants
