Thread 'main' panicked at 'Trick: if you are running on a vm' when machine is not a VM #131
Comments
Hi! Thanks for reporting this bug. The details you provided make me think you are running a kind of "new" version of powercap: https://docs.kernel.org/power/powercap/dtpm.html You have an extra "dtpm" folder but no RAPL domain-specific folders under intel-rapl:0 (in mainstream cases you get intel-rapl:0:0, intel-rapl:0:1, intel-rapl:0:2, for example). We need to investigate this, as we have never seen this layout on other machines. I guess this is also related to your fairly recent CPU (i9). |
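The layout difference described in this comment can be checked mechanically. The following is a minimal Python sketch, not Scaphandre's actual code: it assumes zone names of the form intel-rapl:<socket> and intel-rapl:<socket>:<domain>, and the helper names classify_zones and sockets_without_domains are hypothetical.

```python
import re

# Assumed naming convention: "intel-rapl:<socket>" for a package zone and
# "intel-rapl:<socket>:<domain>" for a domain zone inside that package.
ZONE_RE = re.compile(r"^intel-rapl:(\d+)(?::(\d+))?$")


def classify_zones(names):
    """Group powercap entry names into {socket_id: [domain_id, ...]}."""
    sockets = {}
    for name in names:
        match = ZONE_RE.match(name)
        if match is None:
            continue  # e.g. "dtpm" or the "intel-rapl" control type itself
        socket_id = int(match.group(1))
        domains = sockets.setdefault(socket_id, [])
        if match.group(2) is not None:
            domains.append(int(match.group(2)))
    return sockets


def sockets_without_domains(names):
    """Return the socket ids that expose no domain sub-zones."""
    return sorted(s for s, d in classify_zones(names).items() if not d)


# Layout reported in this thread: a "dtpm" entry and a single socket zone.
reported = ["dtpm", "intel-rapl", "intel-rapl:0"]
# Mainstream layout: the socket zone plus its domain zones.
mainstream = ["intel-rapl", "intel-rapl:0", "intel-rapl:0:0", "intel-rapl:0:1"]

print(sockets_without_domains(reported))    # [0]
print(sockets_without_domains(mainstream))  # []
```

On a real machine the list of names could come from something like os.listdir("/sys/class/powercap"); a non-empty result would point to the layout reported here.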
Thanks for the reply! Let me know if we can do anything to help. I've also linked a gist of the full folder structure for /sys/class/powercap - which may be helpful in debugging. https://gist.github.com/DavidMChan/21037f7a126f0e6a112ab691785f0e51 |
Hey
|
Thanks for noticing, I'll push info on this thread when there is something new. |
Hey, did you manage to make progress on this issue? I can provide more information, or we can schedule a working session to investigate. |
Hi @DavidMChan, @jules-delecour-dav. I should be able to work on this problem in February. The issue I'll quickly run into is that I don't have access to any machine with an i9 or equivalent CPU. Would you be able to share remote access to a server or desktop with those characteristics to help troubleshoot? (So, probably in February, then.) Thanks |
Hi @bpetit, thanks for looking into this! Do you have time to add a quick outline of what needs to be done? (i.e. where I should start looking to adjust the ingestion of the data, and what needs to be updated...) If so, I can take a look at creating a PR for this issue which might be a bit easier than trying to get you access to one of our machines. |
Hi! Of course. The first thing to do would surely be to:
A great extra would be to provide some docs for #137 in the meantime, as you will probably go through the steps that need to be documented. Just for reference, #140 seems related in some way. |
Hi @bpetit, I can create a VM and give you access to it. |
Hi @bpetit, do you still need that access? |
Hi, hitting the same issue (Ubuntu 20.04.3 LTS, GNU/Linux 5.13.0-28-generic x86_64). I am currently using a custom PowerTOP solution (which I expect tries to read the same powercap metrics?), but would like to evaluate moving to Scaphandre. Let me know if I can help in any way.
root@node:~$ lsmod | grep rapl
intel_rapl_msr 20480 0
intel_rapl_common 24576 1 intel_rapl_msr
rapl 20480 0
root@node:~$ ll /sys/class/powercap
total 0
drwxr-xr-x 2 root root 0 feb 9 10:47 ./
drwxr-xr-x 85 root root 0 feb 9 10:47 ../
lrwxrwxrwx 1 root root 0 feb 9 10:47 dtpm -> ../../devices/virtual/powercap/dtpm/
lrwxrwxrwx 1 root root 0 feb 9 10:47 intel-rapl -> ../../devices/virtual/powercap/intel-rapl/
lrwxrwxrwx 1 root root 0 feb 9 14:14 intel-rapl:0 -> ../../devices/virtual/powercap/intel-rapl/intel-rapl:0/
Can read from:
root@node:~$ cat /sys/class/powercap/intel-rapl/intel-rapl\:0/energy_uj
174924188972 |
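For readers wondering how a value like the one above turns into watts: energy_uj is a cumulative energy counter in microjoules, so power is derived from two readings taken some time apart. The sketch below is an illustration only, not Scaphandre's implementation. It assumes the counter wraps at the value exposed in the neighbouring max_energy_range_uj file, and the function names read_counter and average_power_watts, as well as the example readings, are hypothetical.

```python
from pathlib import Path


def read_counter(path):
    """Read an integer sysfs counter such as .../intel-rapl:0/energy_uj."""
    return int(Path(path).read_text().strip())


def average_power_watts(energy_start_uj, energy_end_uj, elapsed_s, max_range_uj=None):
    """Average power between two energy_uj readings taken elapsed_s seconds apart."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    delta_uj = energy_end_uj - energy_start_uj
    if delta_uj < 0:
        # Assumption: the counter wrapped around once between the two reads.
        if max_range_uj is None:
            raise ValueError("counter wrapped and max_range_uj is unknown")
        delta_uj += max_range_uj
    # Microjoules -> joules, then joules per second = watts.
    return delta_uj / 1_000_000 / elapsed_s


# Hypothetical readings: 60 J consumed over a 2 s measurement step.
print(average_power_watts(1_000_000, 61_000_000, 2.0))  # 30.0
```

Reading the counter usually requires root on recent kernels, which is consistent with the root shell used above.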
Hi! Thanks for that information. @pingenglj @jules-delecour-dav I need access to a physical machine with an i9 CPU to reproduce this error. A VM wouldn't do it. But thanks! |
@DavidMChan would it be possible, by any chance, to access your machine for the tests (or another one with the same CPU)? |
Sure! Can you send me an email/DM, and I can work out how to get you ssh access to a machine. |
Hi! Thanks a lot @DavidMChan for giving me access to the testing machine. For those of you interested in this issue, or hitting the same or a related one, could you give PR #198 a try? @jules-delecour-dav @pingenglj I've only tested the stdout and prometheus exporters for now. |
…-are-running-on-a-vm-when-machine-is-not-a-vm
Testing now |
@geertpingen Hi, thanks for the feedback. Are you running on a physical machine, and did you build #198? Thanks |
Hi @bpetit, you are too fast 😄 I was too quick and pulled the wrong branch. It seems to work on first observation! I will do a more thorough evaluation next week. Thanks so much! If I can run any specific tests for you, please let me know.
user@node:~/git/scaphandre$ sudo RUST_BACKTRACE=1 ./target/debug/scaphandre stdout -t 15
Scaphandre stdout exporter
Sending ⚡ metrics
scaphandre::sensors::powercap_rapl: Couldn't find domain folders from powercap. Fallback on socket folders.
scaphandre::sensors::powercap_rapl: Scaphandre will not be able to provide per-domain data.
Measurement step is: 2s
Host: 0 W
Top 5 consumers:
Power PID Exe
No processes found yet or filter returns no value.
------------------------------------------------------------
Host: 29.331151 W
Socket0 29.331152 W |
Top 5 consumers:
Power PID Exe
1.676065 W 918 "gitlab-runner"
0.838032 W 14 "rcu_sched"
0.838032 W 780 "systemd-resolve"
0 W 1 "systemd"
0 W 2 "kthreadd" |
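The two powercap_rapl log lines in the output above describe a fallback: when no domain folders exist, the socket folders are used and per-domain data is unavailable. The following Python sketch illustrates that idea only. It is not the code from #198; the function name energy_files is hypothetical, and it assumes the directory layout shown earlier in this thread (intel-rapl/intel-rapl:0/energy_uj, with optional intel-rapl:0:N sub-folders).

```python
from pathlib import Path


def energy_files(control_type_dir):
    """Return (paths, per_domain) for a directory like /sys/class/powercap/intel-rapl.

    per_domain is True when domain-level counters were found, False when the
    function fell back to socket-level counters.
    """
    base = Path(control_type_dir)
    socket_dirs = sorted(p for p in base.glob("intel-rapl:*") if p.is_dir())

    domain_files = []
    for socket_dir in socket_dirs:
        # Domain folders are named after their socket, e.g. intel-rapl:0:0.
        for domain_dir in sorted(socket_dir.glob(socket_dir.name + ":*")):
            counter = domain_dir / "energy_uj"
            if counter.is_file():
                domain_files.append(counter)
    if domain_files:
        return domain_files, True

    # Fallback: no domain folders, so use the socket-level counters instead.
    socket_files = [d / "energy_uj" for d in socket_dirs if (d / "energy_uj").is_file()]
    return socket_files, False
```

This sketch simplifies one case: if only some sockets expose domain folders, it returns domain counters only. A real implementation would need to decide how to handle such mixed layouts.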
Great, thanks for the feedback! Happy testing! |
A fresh test (with updates from recent dev commits) of #198 on @DavidMChan's machine seems to work properly with the stdout and prometheus exporters. |
…-at-trick-if-you-are-running-on-a-vm-when-machine-is-not-a-vm fix: first implem of fallback to using only socket powercap folders a…
Thanks @bpetit - This seems to be working on our other machines as well! (Sorry for the delayed test!) |
Thank you for the feedback ! |
Bug description
Running scaphandre on bare metal (not in a VM) produces the following error:
Adding the "--vm" argument does not help either and gives:
Reloading the rapl kernel modules and restarting the machine do not seem to change anything.
To Reproduce
Expected behavior
Something should happen :)
Environment
Linux Distribution: Ubuntu 18.04.5 LTS
Linux Kernel: 5.14.15-051415-generic (Also had issues with 5.04)