Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add power_envelope & soc_power sensors #112

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

trindenau
Copy link
Contributor

@trindenau trindenau commented Dec 23, 2024

    Add `power_envelope` and `soc_power` sensors.
    Both will derive their values from `dbugfs`.
    testd by: ipmitool -I ipmb sensors:
        soc_power        | 22.000     | Watts      | ok    | na        | 5.000     | na        | na        | na        | na
        power_envelope   | 65.000     | Watts      | ok    | na        | na        | 10.000    | 150.000   | na        |na
 Or by running the TestRedFishSensorSchema test.
 Fixes jira https://redmine.mellanox.com/issues/4016386.

lanserv/mellanox-bf/mlx-bf-base.emu Outdated Show resolved Hide resolved
endsdr


#ddr_temp
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, extra space and the comment should be #power_envelope. The comment can also be omitted.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

lanserv/mellanox-bf/set_emu_param.sh Outdated Show resolved Hide resolved
@AdiFogel
Copy link
Collaborator

Please add more description in the commit message

@trindenau trindenau force-pushed the addNewSensors branch 2 times, most recently from 404f4f9 to 77c26c4 Compare December 23, 2024 14:46
@trindenau trindenau changed the title Add 2 new sensors Add power_envelope & soc_power sensors Jan 1, 2025
@odrang
Copy link
Collaborator

odrang commented Jan 1, 2025

@trindenau - please update commit message

# Get SOC power info #
###################################
#If the file doesn’t exists try to load the module
SOC_POWER_PATH="/sys/kernel/debug/mlxbf-ptm/monitors/status/total_power"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@trindenau - is this path defined in Yochai arch doc? if not we need to make sure he define the path

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the right path according to the arch doc

#If the file doesn’t exists try to load the module
SOC_POWER_PATH="/sys/kernel/debug/mlxbf-ptm/monitors/status/total_power"
if [ ! -f "$SOC_POWER_PATH" ]; then
modprobe mlxbf-ptm
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@trindenau - arch state the we need to load the driver? what happens in secure Linux where we can't load it or driver does not exist?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No it doesn't I added it.
Do you want to get read of that or find a way to add it to secure linux?

lanserv/mellanox-bf/set_emu_param.sh Show resolved Hide resolved
else
soc_power=$(cat "$SOC_POWER_PATH")
# Remove all the number after the decimal point – it can cause issues in the ipmb
if ! [[ "$soc_power" =~ ^-?[0-9]+(\.[0-9]+)?$ ]]; then
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@trindenau:

  • power can be negative? do we have such requirement?
  • the power reading is in decimal? not hex?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The power is in decimal. and I got read of the minus

###################################
# Get power envelope info #
###################################
POWER_ENVELOPE_PATH="/sys/kernel/debug/mlxbf-ptm/monitors/status/power_envelope"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@trindenau - same as above, this need to be defined in the arch

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the right path according to the arch doc

POWER_ENVELOPE_PATH="/sys/kernel/debug/mlxbf-ptm/monitors/status/power_envelope"
if [ ! -f "$POWER_ENVELOPE_PATH" ]; then
#the module loaded in the soc_power routine
echo "Error: power_envelope file still not found after loading module"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@trindenau - indentation

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

###################################
POWER_ENVELOPE_PATH="/sys/kernel/debug/mlxbf-ptm/monitors/status/power_envelope"
if [ ! -f "$POWER_ENVELOPE_PATH" ]; then
#the module loaded in the soc_power routine
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@trindenau - this comment not clear? what does it means?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got read of that

remove_sensor "soc_power"
else
soc_power=$(cat "$SOC_POWER_PATH")
# Remove all the number after the decimal point – it can cause issues in the ipmb
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@trindenau - this is only a check it doesn't remove anything?

Copy link
Contributor Author

@trindenau trindenau Jan 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

remove_sensor "power_envelope"
else
power_envelope=$(cat "$POWER_ENVELOPE_PATH")
# Remove all the number after the decimal point – it can cause issues in the ipmb
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@trindenau - same as above

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

@trindenau trindenau force-pushed the addNewSensors branch 6 times, most recently from b0d700d to 297b205 Compare January 2, 2025 06:56
    Add `power_envelope` and `soc_power` sensors.
    Both will derive their values from `dbugfs`.
    testd by: ipmitool -I ipmb sensors:
	soc_power        | 22.000     | Watts      | ok    | na        | 5.000     | na        | na        | na        | na
        power_envelope   | 65.000     | Watts      | ok    | na        | na        | 10.000    | 150.000   | na        |na
 The change also tests whether the system is running a locked-down version of Linux. If it is not, the mlxbf_pt module will be loaded.
 Fixes jira https://redmine.mellanox.com/issues/4016386.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants