The --num-workers and --worker-cores parameters do not seem to take effect.
When I launch my script with --num-workers=1, the %CPU reported by the Linux top command is much higher than 100%.
When I change only the --worker-cores value (Y), the runtime does not change as I expected. I also could not find where --worker-cores is used in the dmlc-core code. Any idea?
Also, given that I am using a Python script, how do you track the details of a distributed run (e.g., the number of tasks per worker)? With the top command I can see the number of workers, but I don't know how to get more information.
Thanks!
@merleyc Hi, the CPU utilization reported by top on a multi-core machine can exceed 100% because it is expressed relative to a single core. For example, if you have two cores each running at 80%, it will report 160%. You may check this out for a more detailed explanation.
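The arithmetic behind that figure can be sketched in a few lines of Python. The function name `top_style_cpu_percent` and the sample loads are illustrative only; this is a sketch of how a per-process %CPU value adds up across cores, not of how top itself is implemented.

```python
def top_style_cpu_percent(per_core_percent):
    """Add up the share of each core that one process keeps busy.

    Each entry is a percentage (0-100) of a single core, so the sum can
    exceed 100 on a multi-core machine.
    """
    for value in per_core_percent:
        if not 0 <= value <= 100:
            raise ValueError("each per-core value must be between 0 and 100")
    return sum(per_core_percent)


# Two cores at 80% each add up to 160 for the process.
print(top_style_cpu_percent([80, 80]))
```

So a single worker whose script uses several threads can legitimately show well over 100% even with --num-workers=1.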
If you did not observe a significant speedup, there can be various reasons, including your algorithm, the communication medium, etc.
Personally, I use htop to monitor the workload on the different cores of the same worker. If the number of active cores matches your worker-cores setting, your script should be working fine.
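If you would rather script this check than watch htop, a rough Linux-only sketch is to sample /proc/stat twice and count the cores that were busy in between. The helper names and the 50% threshold are assumptions made for illustration, and the count covers the whole machine rather than a single worker process.

```python
import time


def read_core_times(stat_text):
    """Parse /proc/stat text into {core_name: (busy_ticks, total_ticks)}."""
    times = {}
    for line in stat_text.splitlines():
        fields = line.split()
        # Per-core lines look like "cpu0 user nice system idle iowait ...";
        # the aggregate "cpu" line and non-CPU lines are skipped.
        if not fields or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        # Keep user..steal only; guest time is already included in user/nice.
        values = [int(v) for v in fields[1:9]]
        idle = values[3] + (values[4] if len(values) > 4 else 0)  # idle + iowait
        total = sum(values)
        times[fields[0]] = (total - idle, total)
    return times


def count_active_cores(before, after, threshold=50.0):
    """Count cores whose utilization between two samples is >= threshold (%)."""
    active = 0
    for core, (busy0, total0) in before.items():
        if core not in after:
            continue
        busy1, total1 = after[core]
        elapsed = total1 - total0
        if elapsed > 0 and 100.0 * (busy1 - busy0) / elapsed >= threshold:
            active += 1
    return active


if __name__ == "__main__":
    with open("/proc/stat") as f:
        first = read_core_times(f.read())
    time.sleep(1.0)  # sampling interval, chosen arbitrarily
    with open("/proc/stat") as f:
        second = read_core_times(f.read())
    print("active cores:", count_active_cores(first, second))
```

Running it while the job is executing gives a number you can compare against the value passed to --worker-cores, keeping in mind that other processes on the machine also contribute to the count.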
I am using dmlc-submit to submit the job like this:
./dmlc-core/tracker/dmlc-submit --cluster=local --num-workers=X --worker-cores=Y python myscript.py