support cray EX compute chassis #128
Just an addendum, the powerman.conf would subsequently look like:
This could also work (assumes a particular plug order):
New information from the sys admins:
Summarizing an offline discussion (correct me if I'm wrong @chu11), we need something like:
Idea: add the following "set" command:
where
Example (presumes Node0 ... Node15 were added to the plug name list)
Problem: there is no support for the blade chassis in a Cray Shasta EX system. Fixes chaos#128
So I guess in the instructions for configuring this, we require the user to input hosts in a specific order, correct? And if they mess up, it just won't work? I guess alternatively we'd need some type of "map" option?
Yeah, an example config would look like this:
It didn't seem too onerous to me to require the hostlist to track the internal plug order. These things will be generated by scripts for a big system anyway, and if it's wrong, it likely won't work at all. Edit: well, I sort of misspoke: it happens to track the plug order in this case, but the mapping is determined by the second argument to
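To make the ordering requirement concrete, here is a minimal C sketch (not powerman code) of index-based mapping, in which each plug is bound to a position in the node hostlist. The struct, function, and host names are hypothetical. Note that nothing in the lookup can detect a misordered hostlist: swapping two hosts still produces a mapping, just to the wrong plug.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical plug table: plug i controls the host found at position
 * host_index in the configured hostlist. In the simplest case
 * host_index == i, i.e. the hostlist is written in plug order. */
struct plug_map {
    const char *plug;   /* plug name, e.g. "Node0" */
    int host_index;     /* position in the expanded hostlist */
};

/* Return the plug that controls `host`, or NULL if no plug maps to it.
 * `hosts` is the expanded hostlist in configuration order. */
static const char *plug_for_host(const struct plug_map *map, size_t nplugs,
                                 const char *const *hosts, size_t nhosts,
                                 const char *host)
{
    for (size_t i = 0; i < nplugs; i++) {
        int idx = map[i].host_index;
        if (idx >= 0 && (size_t)idx < nhosts && strcmp(hosts[idx], host) == 0)
            return map[i].plug;
    }
    return NULL;
}
```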
Realized something while prototyping:
Makes sense: Enclosure and the blades both map to index 0 as the power control address.
We actually do not need to set the paths for all the plugs. I'm contemplating whether we could do it by index instead. Although I see why this config design was chosen, as it makes more sense. If we did
it might confuse the average person b/c we're not configuring a bunch of plugs. Just debating it a bit. We'll see how the prototype code falls out.
I was trying to keep it open-ended so any plug could have a custom path and/or host mapping. Maybe adding the semantics to a man page would make it less opaque?
@chu11 have you thought about how to handle unpopulated slots? Since the "plugs" are configured with hostlist indices in this proposal, the hostlist should still be fully populated. I guess you could do something like
Honestly, I had forgotten about that possibility :-) Thinking over some alternatives, I think your suggestion might be the best one. The person doing the powerman.conf config would just have to be a little smart about it, i.e.
(You wrote "unused[18-29]" above, which I assumed was a typo for "unused[18-19]".)
Yeah, typo. We have to be sure that
Ahh, I was going to suggest we skip any hosts with a special prefix, but implementing
Minor design point: those were always plugs from powerman's perspective. When the plug list is not declared in the device script, it just means that any plug you put on the node declaration will be used. That case is sort of weird because when you have multiple instances of a .dev script, the plugs are unique for each one. But that was a convenient abuse of the design, not the design itself :-)
Ahhh ... hmmm. Maybe skipping any hosts with a special "unused" prefix would work out better for backwards compatibility? Edit: Oh wait ... I guess we only have to develop a
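A minimal sketch of the placeholder idea, assuming unpopulated slots are filled with hostnames carrying a reserved "unused" prefix (e.g. the expansion of unused[18-19]). The prefix, function names, and hostnames are hypothetical, not existing powerman behavior. The key property is that indices are kept rather than compacted, so every remaining host still lines up with its plug.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical reserved prefix marking a placeholder for an empty slot. */
#define UNUSED_PREFIX "unused"

static bool is_placeholder_host(const char *host)
{
    return strncmp(host, UNUSED_PREFIX, strlen(UNUSED_PREFIX)) == 0;
}

/* Write the hostlist indices of real (non-placeholder) hosts to `out`.
 * Indices are preserved, not renumbered, so index i still refers to
 * the same slot/plug as in the full hostlist.
 * Returns the number of indices written. */
static size_t populated_indices(const char *const *hosts, size_t nhosts,
                                size_t *out)
{
    size_t n = 0;
    for (size_t i = 0; i < nhosts; i++) {
        if (!is_placeholder_host(hosts[i]))
            out[n++] = i;
    }
    return n;
}
```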
Right, the downside is Edit: actually I think powerman is smart and will use
Or do we need a "status_ranged" equivalent script? I guess it's not implemented at the moment, but it shouldn't be too hard to add ... I somehow added
I'm guessing we don't need it, but we can keep that in mind.
Thinking about it a bit this weekend, with parent support we're sending a lot of extra messages. w/o a
if we have to query each target serially query cmmX for its status (1 total) 81 queries total if 1 query of cmmX for a total of 33 queries. But I first have to determine if I can program redfishpower to be smart.
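A rough sketch of the trade-off, assuming (hypothetically) that each target has at most one parent and that a child whose parent reports off can be reported off without being queried itself. It only counts queries on a made-up topology and is not meant to reproduce the 81 and 33 figures above; the struct layout and function name are illustrative.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical target: `parent` indexes the same array, or is -1 for a
 * root such as the chassis controller. */
struct target {
    int parent;
    bool on;    /* the state a real query would return */
};

/* Resolve every target's status, assuming parents appear before their
 * children in the array. A target whose parent is off is reported off
 * without being queried. Returns the number of queries issued. */
static size_t resolve_status(const struct target *t, size_t n, bool *status)
{
    size_t queries = 0;
    for (size_t i = 0; i < n; i++) {
        if (t[i].parent >= 0 && !status[t[i].parent]) {
            status[i] = false;  /* implied by the parent, no query */
            continue;
        }
        status[i] = t[i].on;    /* stand-in for a real Redfish query */
        queries++;
    }
    return queries;
}
```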
As an aside, I looked into whether The only trickery appears to be how to get There may be corner cases I don't know about, but I think in
it should presumably work. This is pseudo-code above, b/c
Yes, but if this is true, most of the time
It feels like a premature optimization to go down that path before we have evidence that it's a problem.
Ahhh, I didn't think about the fact that we could implement both in a device file; I thought it was an either/or situation, since most scripts only have one of them.
Hmmmm, this isn't a deal breaker by any measure, but it's non-optimal. Under most circumstances the number of plugs equals the number of host indices when configuring setplugs, e.g.
The exception, of course, is when we do plug substitution
BUT ... we don't define plug substitution in
so in I could stick a flag in to say "hey this plug needs a double check later on ..." but that's annoying to do for several reasons (most notably, when do I do the check? I don't know for sure when the login script is actually done ... unless I add a "check_my_config" option?) Going back and forth on ideas ... it's non-optimal, but livable. Edit: ... ehhh ... maybe it's not quite as annoying to add a check when a power control command is issued as I first thought, like
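A sketch of the "check when a power control command is issued" option: instead of validating the configuration when the login script finishes, look up the plug's path at the moment the command needs it and fail just that command if nothing was set. The table layout, enum, function name, and example path are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical per-plug path table, filled in by the set commands. */
struct plug_paths {
    const char *plug;
    const char *stat_path;  /* NULL if never configured */
    const char *on_path;
    const char *off_path;
};

enum power_cmd { CMD_STAT, CMD_ON, CMD_OFF };

/* Look up the path for `plug` and `cmd` when the command is issued.
 * Returns NULL if the plug is unknown or that path was never set, so
 * only this command fails instead of the whole configuration. */
static const char *path_for(const struct plug_paths *tbl, size_t n,
                            const char *plug, enum power_cmd cmd)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(tbl[i].plug, plug) != 0)
            continue;
        switch (cmd) {
        case CMD_STAT: return tbl[i].stat_path;
        case CMD_ON:   return tbl[i].on_path;
        case CMD_OFF:  return tbl[i].off_path;
        }
    }
    return NULL;
}
```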
It appears from testing that if you define both Here's my testing example where blades[4-7] (and consequently nodes[8-15]) and perif[4-7] are unpopulated.
result from
So it's expecting 17 plugs in the response. But because the redfishpower device file still configures "Blades[4-7]", "Perif[4-7]", and "Node[8-15]" plugs, far more than 17 plug statuses are output. My suspicion is that after 17 plugs are output, all the remaining stdout is combined into that last line above. And since potential solutions
Edit: another idea, select
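One way to stop the extra Blades[4-7], Perif[4-7], and Node[8-15] lines from spilling into the last reported status is to drop any plug name that wasn't part of the request before the results are handed back. A minimal C sketch, assuming a hypothetical "plug: status" line format; the helper names are made up for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if the first `len` bytes of `plug` exactly match a wanted name. */
static bool is_wanted(const char *plug, size_t len,
                      const char *const *wanted, size_t nwanted)
{
    for (size_t i = 0; i < nwanted; i++) {
        if (strlen(wanted[i]) == len && strncmp(wanted[i], plug, len) == 0)
            return true;
    }
    return false;
}

/* Keep only "plug: status" lines whose plug name is in `wanted`.
 * Lines without a ':' are dropped. Pointers to kept lines go in `out`.
 * Returns the number of lines kept. */
static size_t filter_status_lines(const char *const *lines, size_t nlines,
                                  const char *const *wanted, size_t nwanted,
                                  const char **out)
{
    size_t n = 0;
    for (size_t i = 0; i < nlines; i++) {
        const char *colon = strchr(lines[i], ':');
        if (colon == NULL)
            continue;
        if (is_wanted(lines[i], (size_t)(colon - lines[i]), wanted, nwanted))
            out[n++] = lines[i];
    }
    return n;
}
```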
I'm having a hard time parsing this sentence. Do you mean when Now that I think about it, why allow both I like the idea of making powerman a little bit smarter as described above. We could avoid any impact on existing device files that we distribute by deleting
Uh oh, I didn't notice this line in the powerman code
An _all script is used if you are targeting all hosts OR doing a query.
What happens if we just drop the
Sorry, yeah that's what I was thinking.
I think it would, but there are lots of test failures, I'm assuming b/c a lot of devices don't define a
After going down and (thus far) failing the "ignore plugs we don't care about" path ... I'm seeing promising results with:

```diff
diff --git a/src/powerman/device.c b/src/powerman/device.c
index 067d31b..cfc5be9 100644
--- a/src/powerman/device.c
+++ b/src/powerman/device.c
@@ -684,7 +684,7 @@ static int _enqueue_targeted_actions(Device * dev, int com, hostlist_t hl,
     /* Try _all version of script.
      */
-    if (all || _is_query_action(com)) {
+    if (all || (_is_query_action(com) && dev->scripts[com] == NULL)) {
         int ncom = _get_all_script(dev, com);
```
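The effect of the one-line change is easiest to see as a truth table. This standalone sketch mirrors only the condition in the diff; the function names are hypothetical, and the three booleans stand in for all, _is_query_action(com), and dev->scripts[com] != NULL.

```c
#include <stdbool.h>

/* Before the change: any query, or any action on all hosts, is routed
 * to the _all version of the script. */
static bool use_all_script_old(bool all, bool is_query, bool has_targeted)
{
    (void)has_targeted;     /* not consulted before the change */
    return all || is_query;
}

/* After the change: a query only falls back to the _all script when the
 * device file defines no targeted script for that command. */
static bool use_all_script_new(bool all, bool is_query, bool has_targeted)
{
    return all || (is_query && !has_targeted);
}
```

So a device file that only has an _all status script behaves as before, while one that defines both a targeted and an _all status script now answers a query for a subset of plugs with the targeted one.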
Ah, I like it! A one-line change!
After a short chat with @watson6282 yesterday, it turns out that a common scenario is for the chassis not to be fully populated. This would lead to To get around this, we could write specific device specifications that redefine what "all" means for those partially populated chassis. But it makes me think that
Hmm, yeah, maybe. But I'd hold off and see if it's actually going to be needed, since it further complicates an already kind of awful interface.
Yeah, agreed. I was gonna try and hack up a test given previous comments, but I now see that it's a little hairier than I thought. Notes for later: can't create execution context e->ranged_pluglist, b/c execution context could get destroyed? Edit: I had an idea for a hack ... and somehow ipmipower tests (updated to use

```diff
@@ -943,6 +946,9 @@ static bool _process_foreach(Device *dev, Action *act, ExecCtx *e)
     /* we store a plug iterator in the ExecCtx */
     if (e->plugitr == NULL) {
         if (act->com == PM_STATUS_PLUGS_RANGED) {
+            if (!(e->pluglist = pluglist_create_from_plugs(e->plugs)))
+                fprintf(stderr, "boooooooooo\n");
+            e->plugitr = pluglist_iterator_create(e->pluglist);
         }
         else
             e->plugitr = pluglist_iterator_create(dev->plugs);
```
Add device file for a HPE Cray Supercomputing EX Chassis. Fixes chaos#128
The Cray Shasta Hardware Architecture includes a compute chassis we need to support with powerman + redfishpower.
As discussed in #126, it would be cleanest if we had one device specification for the entire chassis. To do this we can define "plugs" for each controllable target (8 compute nodes, 8 switches, 1 chassis) and ask redfishpower to substitute them into the URIs configured with setstatpath, setonpath, and setoffpath (#129). Then the dev script can look something like this:
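A minimal sketch of the substitution step, assuming a hypothetical {{plug}} placeholder in each configured path. The placeholder syntax, function name, and example URI are illustrative only, not redfishpower's actual configuration format.

```c
#include <stdio.h>
#include <string.h>

#define PLUG_TOKEN "{{plug}}"   /* hypothetical placeholder */

/* Write `tmpl` into `buf` with the first PLUG_TOKEN replaced by `plug`.
 * Returns 0 on success, -1 if the template has no token or the result
 * does not fit in `buflen` bytes (including the terminating NUL). */
static int substitute_plug(char *buf, size_t buflen,
                           const char *tmpl, const char *plug)
{
    const char *tok = strstr(tmpl, PLUG_TOKEN);
    if (tok == NULL)
        return -1;
    int n = snprintf(buf, buflen, "%.*s%s%s",
                     (int)(tok - tmpl), tmpl, plug,
                     tok + strlen(PLUG_TOKEN));
    return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}
```

In this sketch, one template per action is enough to cover all 17 plugs.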
Note: we observed on test hardware (hetchy-cmm2) that when the chassis is off