Commit bd48e02

docs: mlflow example
1 parent 5b41ece commit bd48e02

6 files changed, +24 -4 lines changed

CITATION.cff

+1 -1
@@ -10,7 +10,7 @@ authors:
 family-names: Gutiérrez Hermosillo Muriedas

 affiliation: >-
-Scientific Computing Centre, Karlsruhe Institute für
+Scientific Computing Center, Karlsruhe Institute für
 Technologie
 orcid: 'https://orcid.org/0000-0001-8439-7145'
 repository-code: 'https://github.com/Helmholtz-AI-Energy/perun'

README.md

+1 -1
@@ -116,7 +116,7 @@ mpirun -n 8 perun monitor path/to/your/script.py

 ## Docs

-To get more information, check out our [docs page](https://perun.readthedocs.io/en/latest/).
+To get more information, check out our [docs page](https://perun.readthedocs.io/en/latest/) or check the [examples](https://github.com/Helmholtz-AI-Energy/perun/tree/main/examples).

 ## Citing perun

docs/data.rst

+5 -1
@@ -3,8 +3,12 @@
 Data
 ====

-perun structures the data collected as a tree, with the root node containing the aggregated data of an indiviudal run of the application, and the nodes further down the tree contain the information of the indivdual compute nodes, devices and *sensors*. *Sensors* are meant as the individual API that perun uses to gather measurements, and a single API can provide information about multiple devices, and multiple devices can be monitored from multiple APIs.
+perun structures the collected data as a tree: the root node contains the aggregated data of an individual run of the application, and the nodes further down the tree contain the information of the individual compute nodes, devices and *sensors*. *Sensors* are the individual values that can be collected from the distinct monitoring backends.

 .. image:: images/data_structure.png

 Each node in the data structure, once the raw data at the bottom has been processed, contain a set of summarized metrics based on the data that was collected by its sub-nodes, and a metadata dictionary with any information that could be obtained by the application, node, device or API.
+
+Each node contains a list of metrics or stats, which represent the accumulated data, as well as metadata.
+
+The nodeType attribute indicates the type of object in the hierarchy that a node represents. At the lowest level, the leaves of the tree are individual "sensors": values collected by a single device or interface. Higher up the tree, the data nodes represent groups of devices and compute nodes. The three bottom levels of the tree represent the hardware. Further up the tree, data is accumulated per run of the application: a "run" is a single execution of the application, a "multi_run" holds the data from multiple runs when perun is run with the ``--rounds N`` option, and the root of the tree, at the highest level, is the application itself.
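
The hierarchy described in this hunk can be illustrated with a small, self-contained sketch. The `Node` class, its fields, and the `"sensor"`, `"device_group"` and `"node"` labels are hypothetical stand-ins, not perun's actual data node API; only the `run` label and the idea of a node-type attribute come from the text above. Aggregating by plain summation is also an assumption made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a perun data node (not the real API)."""

    id: str
    node_type: str  # e.g. "sensor", "device_group", "node", "run" (labels assumed)
    value: float = 0.0  # only meaningful for leaves in this sketch
    children: list[Node] = field(default_factory=list)

    def total(self) -> float:
        """Aggregate leaf values upward; plain summation is an assumption."""
        if not self.children:
            return self.value
        return sum(child.total() for child in self.children)


def leaves(node: Node) -> list[Node]:
    """Collect the leaf nodes (the "sensors") below a node."""
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


# One run on one compute node with a CPU group and a GPU group (made-up values).
cpu = Node("cpu", "device_group", children=[Node("cpu_0_energy", "sensor", 120.0)])
gpu = Node(
    "gpu",
    "device_group",
    children=[
        Node("gpu_0_energy", "sensor", 300.0),
        Node("gpu_1_energy", "sensor", 280.0),
    ],
)
run = Node("run_0", "run", children=[Node("host_a", "node", children=[cpu, gpu])])

print(run.total())  # 700.0
print([leaf.id for leaf in leaves(run)])
```

Each level only needs to know about its direct children, which is why a summary at the "run" level can be derived without touching the raw sensor data directly.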

docs/images/data_structure.png

33.4 KB

examples/mlflow/README.md

+16
@@ -0,0 +1,16 @@
+# perun + MLflow
+
+If you are already using monitoring tools like MLflow, you might want to add the data collected by perun to enrich the data you already track. This can be done easily with the ```@register_callback``` decorator. An example is shown in the train.py file:
+
+```python
+@register_callback
+def perun2mlflow(node):
+    mlflow.start_run(active_run.info.run_id)
+    for metricType, metric in node.metrics.items():
+        name = f"{metricType.value}"
+        mlflow.log_metric(name, metric.value)
+```
+
+Functions decorated with ```@register_callback``` take only one argument, ```node```. The node object is an instance of ```perun.data_model.data.DataNode```, which is a tree structure that contains all the data collected while monitoring the current script. Each node contains the accumulated data of its sub-nodes in the ```metrics``` dictionary. Each metric object contains the value itself and all the metadata relevant to it. In the example above, the summarized values for power, energy and hardware utilization are submitted as metrics to the MLflow tracking system.
+
+For more information on the data node object, [check our docs](https://perun.readthedocs.io/en/latest/data.html).
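
To try the callback's loop without perun or MLflow installed, the sketch below uses stand-ins. `FakeMetricType`, `FakeMetric`, `FakeNode`, `log_metric` and `perun2dict` are hypothetical; they only mimic the attributes the README callback relies on (`node.metrics`, `metricType.value`, `metric.value`). The metric names and numbers are invented for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class FakeMetricType(Enum):
    """Hypothetical metric kinds; the real enum lives in perun."""

    ENERGY = "energy"
    POWER = "power"


@dataclass
class FakeMetric:
    value: float


@dataclass
class FakeNode:
    metrics: dict = field(default_factory=dict)


logged = {}


def log_metric(name, value):
    """Stand-in for mlflow.log_metric that records calls in a dict."""
    logged[name] = value


def perun2dict(node):
    # Same loop as the README callback, minus the MLflow run handling.
    for metric_type, metric in node.metrics.items():
        log_metric(f"{metric_type.value}", metric.value)


node = FakeNode(
    {
        FakeMetricType.ENERGY: FakeMetric(1234.5),
        FakeMetricType.POWER: FakeMetric(87.0),
    }
)
perun2dict(node)
print(logged)  # {'energy': 1234.5, 'power': 87.0}
```

Swapping the stand-in `log_metric` for the real MLflow call is the only change needed once the stand-ins are replaced by a node passed in by perun.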

perun/processing.py

+1 -1
@@ -399,7 +399,7 @@ def processDataNode(
 pue = perunConfig.getfloat("post-processing", "pue")
 emissions_factor = perunConfig.getfloat("post-processing", "emissions_factor")
 price_factor = perunConfig.getfloat("post-processing", "price_factor")
-total_energy = dataNode.metrics[MetricType.ENERGY].value * pue
+total_energy = dataNode.metrics[MetricType.ENERGY].value * pue # type: ignore
 dataNode.metrics[MetricType.ENERGY].value = total_energy # type: ignore
 e_kWh = total_energy / (3600 * 1e3)
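
The arithmetic in this hunk can be checked in isolation. The sketch below assumes the energy metric is stored in joules (consistent with dividing by `3600 * 1e3` to obtain kWh) and uses made-up values for the measured energy and the PUE. `scale_and_convert` is a hypothetical helper, not a perun function.

```python
def scale_and_convert(energy_joules: float, pue: float) -> tuple[float, float]:
    """Apply the PUE factor, then convert joules to kilowatt-hours."""
    total_energy = energy_joules * pue  # energy including facility overhead, in J
    e_kwh = total_energy / (3600 * 1e3)  # 1 kWh = 3.6e6 J
    return total_energy, e_kwh


# Made-up inputs: 7.2 MJ measured, PUE of 1.5.
total, kwh = scale_and_convert(energy_joules=7.2e6, pue=1.5)
print(total)  # 10800000.0
print(kwh)  # 3.0
```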
