fix: Better ID suffix counter (#149)

* fix: better increase id counter
* fix: removed print statements
* fix: typos
* fix: typos
* chore: consistent yaml file extension
* fix: more typos
Helmholtz-AI-Energy · Nov 28, 2024 · a29fc5c
1 parent f3c42a3

Showing 10 changed files with 48 additions and 21 deletions.
diff --git a/.github/workflows/release.yml b/.github/workflows/release.yaml
diff --git a/.github/workflows/run_tests.yml b/.github/workflows/run_tests.yaml
diff --git a/README.md b/README.md
@@ -37,7 +37,7 @@ From PyPI:
 pip install perun
 ```
 
-> Extra dependencies like nvidia-smi, rocm-smi and mpi can be installed through pip as well:
+> Extra dependencies like nvidia-smi, rocm-smi and mpi can be installed using pip as well:
 ```console
 pip install perun[nvidia, rocm, mpi]
 ```
@@ -77,7 +77,7 @@ RUN ID: 2023-08-17T13:29:29.969779
 |         0 | hkn0436.localdomain | 994.847 s | 960.469 kJ | 235.162 W   | 3.239 %    | 701.588 W   | 56.934 GB  | 27.830 W     | 0.061 %    |
 |         0 | All                 | 995.967 s | 1.921 MJ   | 466.981 W   | 3.240 %    | 1.404 kW    | 112.192 GB | 57.145 W     | 0.061 %    |
 
-The application has been run 7 times. Throught its runtime, it has used 3.128 kWh, released a total of 1.307 kgCO2e into the atmosphere, and you paid 1.02 € in electricity for it.
+The application has been run 7 times. In total, it has used 3.128 kWh, released a total of 1.307 kgCO2e into the atmosphere, and you paid 1.02 € in electricity for it.
 ```
 
 Perun will keep track of the energy of your application over multiple runs.

diff --git a/docs/install.rst b/docs/install.rst
@@ -56,7 +56,7 @@ CPU
 
 Supported backends:
 
- - CPU energy: Powercap RAPL throught `powercap <https://github.com/powercap/powercap>`_ for linux machines, supports recent Intel and AMD CPUs.
+ - CPU energy: Powercap RAPL using `powercap <https://github.com/powercap/powercap>`_ for linux machines, supports recent Intel and AMD CPUs.
  - CPU utilization: `psutil <https://github.com/giampaolo/psutil>`_
 
 Currently, cpu energy readings from perun only support linux environments with read access to the *powercap-rapl* interface, which can only be read by ``root`` on Linux 5.10 and later. If that is the case, please contact you system admin for solutions. We are currently working on alternative methods to provide energy readings.
@@ -66,14 +66,14 @@ GPU
 
 Supported backends:
 
- - NVIDIA GPU power draw: `NVIDIA NVML <https://developer.nvidia.com/nvidia-management-library-nvml>`_ through nvidia-ml-py.
- - AMD GPU power draw: `ROCM SMI <https://github.com/RadeonOpenCompute/pyrsmi>`_ through pyrsmi.
+ - NVIDIA GPU power draw: `NVIDIA NVML <https://developer.nvidia.com/nvidia-management-library-nvml>`_ using nvidia-ml-py.
+ - AMD GPU power draw: `ROCM SMI <https://github.com/RadeonOpenCompute/pyrsmi>`_ using pyrsmi.
 
 DRAM
 ~~~~
 
 Supported backends:
- - DRAM energy: Intel RAPL throught `powercap <https://github.com/powercap/powercap>`_ for linux machines.
+ - DRAM energy: Intel RAPL using `powercap <https://github.com/powercap/powercap>`_ for linux machines.
 
 Misc
 ~~~~

diff --git a/docs/quickstart.rst b/docs/quickstart.rst
@@ -38,15 +38,15 @@ Once your code finishes running, you will find a new directory called ``perun_re
     |         0 | hkn0436.localdomain | 994.847 s | 960.469 kJ | 235.162 W   | 3.239 %    | 701.588 W   | 56.934 GB  | 27.830 W     | 0.061 %    |
     |         0 | All                 | 995.967 s | 1.921 MJ   | 466.981 W   | 3.240 %    | 1.404 kW    | 112.192 GB | 57.145 W     | 0.061 %    |
 
-    The application has been run 7 times. Throught its runtime, it has used 3.128 kWh, released a total of 1.307 kgCO2e into the atmosphere, and you paid 1.02 € in electricity for it.
+    The application has been run 7 times. In total, it has used 3.128 kWh, released a total of 1.307 kgCO2e into the atmosphere, and you paid 1.02 € in electricity for it.
 
 
 .. note::
 
     Depending on the hardware you are running and the available interfaces, the output might look different than the one listed here. For more details on the support data sources used by perun, check the :ref:`dependencies` section
 
 
-The the text report summarizes the data gathered throught the application run by individual host, and averaging power consumption of the full runtime. Perun also makes all the raw data gathered from the hardware on an HDF5 file that is located on the same results folder. To explore the data manually, we recommend the Visual Studio Code extension `H5Web <https://marketplace.visualstudio.com/items?itemName=h5web.vscode-h5web>`_, to process it with python using `h5py <https://www.h5py.org/>`_, or to export using the :code:`perun export` subcommand (see :ref:`usage`).
+The text report summarizes the data gathered while the application was running. Perun also makes all the raw data gathered from the hardware on an HDF5 file that is located on the same results folder. To explore the data manually, we recommend the Visual Studio Code extension `H5Web <https://marketplace.visualstudio.com/items?itemName=h5web.vscode-h5web>`_, to process it with python using `h5py <https://www.h5py.org/>`_, or to export using the :code:`perun export` subcommand (see :ref:`usage`).
 
 The hdf5 file collects information over multiple runs of the application, adding a new section every time the application is executed using perun. The simplifies studying the behaviour of the application over time, make the last line in the summary report posible.
 

diff --git a/examples/torch_mnist/README.md b/examples/torch_mnist/README.md
@@ -69,7 +69,7 @@ Monitored Functions
 |         0 | train_epoch |                  5 | 8.980±1.055 s  | 433.082±11.012 W | 0.874±0.007 %  | 2.746±0.148 %      |
 |         0 | test        |                  5 | 1.098±0.003 s  | 274.947±83.746 W | 0.804±0.030 %  | 2.808±0.025 %      |
 
-The application has been run 1 times. Throught its runtime, it has used 0.012 kWh, released a total of 0.005 kgCO2e into the atmosphere, and you paid 0.00 € in electricity for it.
+The application has been run 1 times. In total, it has used 0.012 kWh, released a total of 0.005 kgCO2e into the atmosphere, and you paid 0.00 € in electricity for it.
 ```
 
 The results display data about the functions *train*, *test_epoch* and *test*. Those functions were specialy marked using the ```@monitor()``` decorator.
@@ -135,5 +135,5 @@ Monitored Functions
 |         4 | train_epoch |                  5 | 8.555±0.011 s  | 433.582±12.606 W | 0.899±0.029 %  | 2.820±0.000 %      |
 |         4 | test        |                  5 | 1.118±0.002 s  | 233.367±2.238 W  | 0.818±0.045 %  | 2.820±0.000 %      |
 
-The application has been run 2 times. Throught its runtime, it has used 0.062 kWh, released a total of 0.026 kgCO2e into the atmosphere, and you paid 0.02 € in electricity for it.
+The application has been run 2 times. In total, it has used 0.062 kWh, released a total of 0.026 kgCO2e into the atmosphere, and you paid 0.02 € in electricity for it.
 ```
diff --git a/perun/io/text_report.py b/perun/io/text_report.py
@@ -136,7 +136,7 @@ def textReport(dataNode: DataNode, mr_id: str) -> str:
         money = dataNode.metrics[MetricType.MONEY].sum  # type: ignore
         money_icon = mr_node.metadata["post-processing.price_unit"]
 
-        app_summary_str = f"Application Summary\n\nThe application has been run {n_runs} times. Throughout its runtime, it has used {e_kWh:.3f} kWh, released a total of {kgCO2:.3f} kgCO2e into the atmosphere, and you paid {money:.2f} {money_icon} in electricity for it."
+        app_summary_str = f"Application Summary\n\nThe application has been run {n_runs} times. In total, it has used {e_kWh:.3f} kWh, released a total of {kgCO2:.3f} kgCO2e into the atmosphere, and you paid {money:.2f} {money_icon} in electricity for it."
     else:
         app_summary_str = f"The application has been run {n_runs} times."
 

diff --git a/perun/processing.py b/perun/processing.py
@@ -589,7 +589,7 @@ def addRunAndRuntimeInfoToRegion(region: Region):
 def getInterpolatedValues(
     t: np.ndarray, x: np.ndarray, start: np.number, end: np.number
 ) -> Tuple[np.ndarray, np.ndarray]:
-    """Filter timeseries with a start and end limit, and interpolate the values at the edges.
+    """Extract a time range out of a time series, and interpolate the values at the edges.
 
     Parameters
     ----------
@@ -598,14 +598,14 @@ def getInterpolatedValues(
     x : np.ndarray
         Original values
     start : np.number
-        Start of the region of interest
+        Start of the roi
     end : np.number
         End of the roi
 
     Returns
     -------
-    np.ndarray
-        ROI values
+    Tuple[np.ndarray, np.ndarray]
+        Tuple with the new time steps and values.
     """
     new_t = np.concatenate([[start], t[np.all([t >= start, t <= end], axis=0)], [end]])
     new_x = np.interp(new_t, t, x)  # type: ignore

diff --git a/perun/util.py b/perun/util.py
@@ -142,12 +142,15 @@ def increaseIdCounter(existing: List[str], newId: str) -> str:
         newId with an added counter if any matches were found.
     """
     exp = re.compile(r"^" + newId + r"(_\d+)?$")
-    count = len(list(filter(lambda x: exp.match(x), existing)))
-    if count > 0:
-        if f"{newId}_{count}" in existing:
-            return f"{newId}_{count + 1}"
-        else:
-            return f"{newId}_{count}"
+    matches: List[re.Match] = list(
+        filter(lambda m: isinstance(m, re.Match), map(lambda x: exp.match(x), existing))  # type: ignore
+    )
+    if len(matches) > 0:
+        existing_idxs = list(
+            sorted(map(lambda m: int(m.group(1)[1:]) if m.group(1) else 0, matches))
+        )
+        highest_idx = existing_idxs[-1]
+        return newId + f"_{highest_idx + 1}"
     else:
         return newId
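
Note on the hunk above: the new logic takes the highest existing numeric suffix and adds one, instead of counting matches, so gaps in the sequence (e.g. only `test_10` present) no longer produce colliding IDs. A minimal standalone sketch of that behaviour follows. The name `increase_id_counter_sketch` is hypothetical, and the `re.escape` call is an assumption added here so that IDs containing regex metacharacters are matched literally; it is not part of this commit.

```python
import re
from typing import List


def increase_id_counter_sketch(existing: List[str], new_id: str) -> str:
    """Return new_id, or new_id with the next free numeric suffix.

    Hypothetical re-statement of the logic in this commit, not the
    function from perun/util.py itself.
    """
    # Assumption: re.escape is added in this sketch only; the committed
    # code concatenates newId into the pattern unescaped.
    pattern = re.compile(r"^" + re.escape(new_id) + r"(?:_(\d+))?$")

    suffixes = []
    for candidate in existing:
        match = pattern.match(candidate)
        if match:
            # A bare ID without a suffix counts as index 0.
            suffixes.append(int(match.group(1)) if match.group(1) else 0)

    if not suffixes:
        # No existing ID matches, so the new ID can be used unchanged.
        return new_id
    # Continue after the highest suffix seen, ignoring any gaps.
    return f"{new_id}_{max(suffixes) + 1}"
```

The inputs used in the new tests in tests/perun/test_util.py can be passed to this sketch to compare its results with the committed function.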
 

diff --git a/tests/perun/test_util.py b/tests/perun/test_util.py
@@ -43,6 +43,30 @@ def test_increaseIdCounter_existing_ids_with_suffix():
     assert result == "test_4"
 
 
+def test_increaseIdCounter_existing_ids_with_missing_entries():
+    existing = ["test_1", "test", "test_3", "test_10"]
+    newId = "test"
+    result = increaseIdCounter(existing, newId)
+    assert result == "test_11"
+
+    existing = ["test_10"]
+    newId = "test"
+    result = increaseIdCounter(existing, newId)
+    assert result == "test_11"
+
+
+def test_increaseIdCounter_double_suffix():
+    existing = ["test_1", "test_2", "test_3", "test_3_1", "test_3_2"]
+    newId = "test_2"
+    result = increaseIdCounter(existing, newId)
+    assert result == "test_2_1"
+
+    existing = ["test_1", "test_2", "test_3", "test_3_1", "test_3_2"]
+    newId = "test_3"
+    result = increaseIdCounter(existing, newId)
+    assert result == "test_3_3"
+
+
 def test_filter_sensors_no_filters():
     sensors = {
         "sensor1": ("backend1",),