redfishpower: single device specification for a chassis #126

chu11 · 2024-02-07T01:30:42Z

per conversation in #81,

> I hadn't realized that the redfish "device" specification only defines one plug. Could we solve the above by defining a device spec for one chassis and then do plug substitution in the URIs?

> OH I just realized the hostnames are the plugs. Well, then the hostname's index in the hostlist for that chassis?

chu11 · 2024-02-07T01:32:09Z

> OH I just realized the hostnames are the plugs. Well, then the hostname's index in the hostlist for that chassis?

Hmmmm, I suppose this is possible. Although ... we get into some hairy stuff b/c I think I've seen some systems where they begin to index at 1 instead of 0. So now we need a special config for that.

I'm wondering if a giant config of "these URIs for these nodes" is what is needed.

garlick · 2024-02-07T01:36:19Z

For a single chassis, I think this is fairly trivial - define plug names that are the index (0 or 1 origin, whatever), then do like you did in the httppower example and put the URI in the on/off script, but substitute the plug name using %s.
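
A minimal sketch of that substitution, written in Python purely for illustration (it is not how redfishpower is implemented). The template path and the `expand_uri` helper are assumptions, not part of redfishpower or powerman. The point is only that plug names are strings, so 0-origin and 1-origin chassis differ just in which names the device spec defines.

```python
# Hypothetical URI template with one %s slot for the plug name.
# The path layout is illustrative; a real chassis may differ.
ON_TEMPLATE = "redfish/v1/Chassis/Blade%s/Actions/Chassis.Reset"


def expand_uri(template: str, plug: str) -> str:
    """Return the template with its single %s replaced by the plug name."""
    if template.count("%s") != 1:
        raise ValueError("template must contain exactly one %s")
    return template.replace("%s", plug)


# Plug names for a 0-origin and a 1-origin 8-slot chassis.
zero_origin = [str(i) for i in range(8)]      # "0" .. "7"
one_origin = [str(i) for i in range(1, 9)]    # "1" .. "8"

uris = [expand_uri(ON_TEMPLATE, plug) for plug in zero_origin]
```

With this shape, no special "index origin" setting is needed: the origin is whatever the plug list says it is.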

garlick · 2024-02-07T01:39:19Z

Since that fits so naturally, I have to wonder how many redfishpower instances we could run concurrently if it were one per chassis in a really big system. Example: 8-slot chassis scaled out to 8K nodes would be 1024. Maybe I'll do a quick experiment just for fun.
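
The arithmetic behind that figure, as a small Python sketch. The helper name `coprocs_needed` is made up for illustration, and "8K" is taken to mean 8 * 1024 nodes.

```python
import math


def coprocs_needed(total_nodes: int, slots_per_chassis: int) -> int:
    """One redfishpower co-process per chassis; a partly filled chassis still needs one."""
    return math.ceil(total_nodes / slots_per_chassis)


eight_k = coprocs_needed(8 * 1024, 8)     # 8K nodes in 8-slot chassis
sixteen_k = coprocs_needed(16 * 1024, 8)  # same chassis size, twice the nodes
print(eight_k, sixteen_k)
```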

chu11 · 2024-02-07T02:18:18Z

> Since that fits so naturally, I have to wonder how many redfishpower instances we could run concurrently if it were one per chassis in a really big system. Example: 8-slot chassis scaled out to 8K nodes would be 1024. Maybe I'll do a quick experiment just for fun.

I was a little confused, until it occurred to me: I think you're recommending 1 chassis per redfishpower co-process? B/c I don't think we can specify a hostname and a plug on one powerman.conf line? i.e. something like

node "node1" "redfish1" "pnode1" "1"

can't be done? where the "1" is the "plug suffix" and "pnode1" is the hostname to power control.

garlick · 2024-02-07T02:39:55Z

Yes, but that is OK. You can specify a hostlist as before and just map the plugs in order. iow just specify

device "chassis0" "redfish" "redfishpower -h t[0-7] |&"
node "t[0-7]" "chassis0"

chu11 · 2024-02-07T03:22:19Z

> device "chassis0" "redfish" "redfishpower -h t[0-7] |&"
> node "t[0-7]" "chassis0"

Wait a second, are you specifying the chassis parent here? For the actual URI wouldn't we want something more like

device "chassis0" "redfish" "redfishpower -h t[8-15] |&"
node "t[8-15]" "[0-7]"

where 0-7 are the "plugs"?

garlick · 2024-02-07T03:26:17Z

No, chassis0 is the device name, and the plugs are unspecified in the node line. They are implicitly "[0-7]".
So you can say

node "t[0-7]" "chassis0" "[0-7]"
node "t[8-15]" "chassis1" "[0-7]"

or equivalently

node "t[0-7]" "chassis0"
node "t[8-15]" "chassis1"
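
A rough Python sketch of the in-order mapping described above: nodes in the hostlist are paired with plugs by position, and the plug list defaults to "[0-7]" when omitted. `expand_range` and `map_nodes_to_plugs` are hypothetical helpers that only handle the single `prefix[lo-hi]` form used in this thread; they are not powerman's hostlist code.

```python
import re


def expand_range(expr: str) -> list:
    """Expand 'prefix[lo-hi]' into names; any other string is returned as one name."""
    match = re.fullmatch(r"(.*)\[(\d+)-(\d+)\]", expr)
    if match is None:
        return [expr]
    prefix, lo, hi = match.group(1), int(match.group(2)), int(match.group(3))
    return [f"{prefix}{i}" for i in range(lo, hi + 1)]


def map_nodes_to_plugs(nodes: str, plugs: str = "[0-7]") -> dict:
    """Pair each node with the plug at the same position in the plug list."""
    node_names = expand_range(nodes)
    plug_names = expand_range(plugs)
    if len(node_names) != len(plug_names):
        raise ValueError("node and plug lists must be the same length")
    return dict(zip(node_names, plug_names))


chassis0 = map_nodes_to_plugs("t[0-7]")    # plugs left implicit
chassis1 = map_nodes_to_plugs("t[8-15]")   # t8 is plug "0" of chassis1
```

Note that in the second chassis the plug name no longer matches the node number: `t8` maps to plug `0`, which is what lets one device spec serve every chassis.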

garlick · 2024-02-07T03:27:36Z

I had another idea about the chassis address but wanted to get this point across first.

chu11 · 2024-02-07T03:36:00Z

ohhh got it got it ... I was getting confused, yeah, all the URIs go to the same chassis.

Ugh ... maybe my prototype for #81 is a waste now ... maybe this has to be solved first.

garlick · 2024-02-07T03:49:40Z

Since the URI for the chassis power control is probably different from the slots, my thought was to have a special plug name c or something that is just mapped to a different URI than the rest of the plugs in redfishpower. If it's the last plug, e.g. "0", "1", "2", ... "7", "c" then

node "t[0-7],chassis0" "chassis0"
node "t[8-15],chassis1" "chassis1"

or equivalently

node "t[0-7],chassis0" "chassis0" "[0-7,c]"
node "t[8-15],chassis1" "chassis1" "[0-7,c]"

Maybe the "setconfig" stuff at the beginning of the device script could set the config for that special plug, including the hierarchical semantics.
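
One way the special plug could be resolved, sketched in Python for illustration only. Both paths here are assumptions: the blade template mirrors the shape of URIs seen in existing device scripts, and the chassis path under `PLUG_OVERRIDES` is entirely hypothetical. The idea is just a per-plug override that takes precedence over the shared template.

```python
# Assumed template shared by the numbered slot plugs.
BLADE_TEMPLATE = "redfish/v1/Chassis/Blade%s/Actions/Chassis.Reset"

# Hypothetical per-plug overrides; "c" stands for the chassis itself.
PLUG_OVERRIDES = {
    "c": "redfish/v1/Chassis/Enclosure/Actions/Chassis.Reset",
}


def uri_for_plug(plug: str) -> str:
    """Use the override if the plug has one, else substitute into the template."""
    if plug in PLUG_OVERRIDES:
        return PLUG_OVERRIDES[plug]
    return BLADE_TEMPLATE.replace("%s", plug)


plugs = [str(i) for i in range(8)] + ["c"]   # "0" .. "7", then "c" last
uris = {plug: uri_for_plug(plug) for plug in plugs}
```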

garlick · 2024-02-07T04:21:38Z

If the URI is different for each blade, are we only talking to the chassis (one IP)?

Do we have an El Cap chassis to poke at? Because if we're only talking to the chassis, we don't care what nodes are in there!

chu11 · 2024-02-07T05:29:56Z

> If the URI is different for each blade, are we only talking to the chassis (one IP)?

In the one example I've seen, yeah: the host is the same for each of the blades, and only the path suffix differs (0, 1, 2, etc. in each path).

garlick · 2024-02-07T05:43:15Z

For that type of a chassis I wouldn't think the hierarchical semantics we discussed would be required... The chassis probably remains responsive to queries about the nodes even when off (if it can even be turned off).

chu11 · 2024-02-07T05:58:10Z

As we go around in circles on some of this stuff, I'm beginning to think "mega-config file" is the right idea, because there are so many oddball cases with redfish.

- non-bladed vs bladed
- no-parents vs parents
- different URI configuration for parents vs children
- configuring "set" paths vs using "plugs" for the paths
- different hardware in same cluster with different schemes
- different vendors with different schemes in same cluster

I can't help but look at the proliferation of device files as evidence for the need.

garlick · 2024-02-07T06:18:56Z

On the first three items - I think we are zeroing in on how to do this simply without a separate config file. It seems like we have identified two cases that we may care about (but we should verify they really exist):

- where there is a redfish chassis that you talk to to control the blades
- where there is a redfish chassis and redfish blades, and when you turn off the chassis, the blades go off and potentially can no longer be contacted

Set vs plugs isn't an either/or thing. You can set a URI template and then still substitute plugs.

On the last two items - this is what powerman does best. You can mix and match different schemes in one config. The device scripts provide the abstraction, and then you map "plugs" in each device to hostnames in the main config and powerman provides one interface to the admins.

It would feel like a design failure if we have to introduce a second config file so I think we should keep trying. Let's start by finding out exactly what we're dealing with in El Cap.

chu11 · 2024-02-07T06:35:02Z

> On the last two items - this is what powerman does best.

The point on the last two items was the potential explosion of device specifications. Unlike previous device files in powerman, it seems that copy & modify of the device files is going to be a common pattern with redfish and some of these REST interfaces, as there are quirks in every system. And with blades and parents, we might be introducing additional quirks too. So perhaps a mega config just might be easier overall?

Bullet 3 above is the one that made me go "ugh" the most ... where we are crossing the line into different URI configs for different hosts within a single redfishpower process, so there was this ... "ugh ..."

garlick · 2024-02-07T15:05:14Z

I'm not convinced a new config file is the answer, particularly to this issue. If we could stay focused on this issue, let's look at what the admins had to do on hetchy with the following device script:

redfishpower-cray-olympus-blades.dev

This is apparently for an 8-blade chassis. They cut and pasted the same specification with all its scripts 8 times within the same .dev file and gave each spec's name a suffix like -blade0, -blade1, etc. and they (only) alter the URIs in each one, e.g.

send "setonpath redfish/v1/Chassis/Blade0/Actions/Chassis.Reset {\"ResetType\":\"On\"}\n"
send "setonpath redfish/v1/Chassis/Blade1/Actions/Chassis.Reset {\"ResetType\":\"On\"}\n"
send "setonpath redfish/v1/Chassis/Blade2/Actions/Chassis.Reset {\"ResetType\":\"On\"}\n"
...

Then their config looks like this:

device "redfishpower-blade0" "redfishpower-cray-olympus-blade0" "/usr/sbin/redfishpower -h hetchy-cmm[1-2] |&"
device "redfishpower-blade1" "redfishpower-cray-olympus-blade1" "/usr/sbin/redfishpower -h hetchy-cmm[1-2] |&"
device "redfishpower-blade2" "redfishpower-cray-olympus-blade2" "/usr/sbin/redfishpower -h hetchy-cmm1 |&"
device "redfishpower-blade3" "redfishpower-cray-olympus-blade3" "/usr/sbin/redfishpower -h hetchy-cmm1 |&"

### Login/Compute Blades
node "hetchy-blade1" "redfishpower-blade0" "hetchy-cmm1"
node "hetchy-blade2" "redfishpower-blade1" "hetchy-cmm1"
node "hetchy-blade3" "redfishpower-blade2" "hetchy-cmm1"
node "hetchy-blade4" "redfishpower-blade3" "hetchy-cmm1"
node "hetchy-blade5" "redfishpower-blade0" "hetchy-cmm2"
node "hetchy-blade6" "redfishpower-blade1" "hetchy-cmm2"

So I guess they have two blade chassis, one with 4 blades installed and one with 2. They really had to stand on their heads to get this set up.

IMHO there should have been one device spec for this particular chassis with 8 plugs defined. Then their config would be more intuitive, like this

device "cmm1" "redfishpower-cray-olympus-cmm" "/usr/sbin/redfishpower -h hetchy-cmm1 |&"
device "cmm2" "redfishpower-cray-olympus-cmm" "/usr/sbin/redfishpower -h hetchy-cmm2 |&"

### Login/Compute Blades
node "hetchy-blade[1-4]" "cmm1" "[0-3]"
node "hetchy-blade[5-6]" "cmm2" "[0-1]"
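
A quick Python check, for illustration, that the proposed layout addresses the same hardware as the current one. It assumes the `-bladeN` suffix in the current device spec names is exactly the blade index that would become the plug name; the dictionaries simply transcribe the two configs above.

```python
# Current hetchy config: node -> (blade index baked into the spec name, CMM host).
current = {
    "hetchy-blade1": (0, "hetchy-cmm1"),
    "hetchy-blade2": (1, "hetchy-cmm1"),
    "hetchy-blade3": (2, "hetchy-cmm1"),
    "hetchy-blade4": (3, "hetchy-cmm1"),
    "hetchy-blade5": (0, "hetchy-cmm2"),
    "hetchy-blade6": (1, "hetchy-cmm2"),
}

# Proposed config: one device per CMM, blade index carried by the plug name.
device_host = {"cmm1": "hetchy-cmm1", "cmm2": "hetchy-cmm2"}
node_lines = [
    ([f"hetchy-blade{i}" for i in range(1, 5)], "cmm1", ["0", "1", "2", "3"]),
    ([f"hetchy-blade{i}" for i in range(5, 7)], "cmm2", ["0", "1"]),
]

proposed = {}
for nodes, device, plugs in node_lines:
    for node, plug in zip(nodes, plugs):
        proposed[node] = (int(plug), device_host[device])

same_targets = proposed == current
print(same_targets)
```

Every node resolves to the same (blade index, CMM host) pair either way; the difference is that the proposed form needs one device spec rather than one per blade slot.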

garlick · 2024-02-07T15:27:23Z

Incidentally they have a separate dev specification in another .dev file for the chassis itself:

redfishpower-cray-olympus-cmm.dev

It's another cut & paste, identical to the blades except for the URIs e.g.

send "setonpath redfish/v1/Chassis/Blade0/Actions/Chassis.Reset {\"ResetType\":\"On\"}\n"

Their config is:

device "redfishpower-cmm" "redfishpower-cray-olympus-cmm" "/usr/sbin/redfishpower -h hetchy-cmm[1-2] |&"

### CMMs
node "hetchy-cmm1" "redfishpower-cmm" "hetchy-cmm1"
node "hetchy-cmm2" "redfishpower-cmm" "hetchy-cmm2"

Ideally we would figure out a way to represent the chassis as another plug like c in the single .dev spec proposed above. Then they would not have any new devices, just a node config for the chassis, e.g.

### CMMs
node "hetchy-cmm1" "cmm1" "c"
node "hetchy-cmm2" "cmm2" "c"

Or even combined with the blades, e.g.

node "hetchy-blade[1-4],hetchy-cmm1" "cmm1" "[0-3,c]"
node "hetchy-blade[5-6],hetchy-cmm2" "cmm2" "[0-1,c]"

garlick · 2024-02-07T15:35:29Z

And the entire blade config, with all its internal cut & paste, is cut and pasted into another .dev script for the switches

redfishpower-cray-olympus-switches.dev

In this one the URIs are like

send "setonpath redfish/v1/Chassis/Perif0/Actions/Chassis.Reset {\"ResetType\":\"On\"}\n"
send "setonpath redfish/v1/Chassis/Perif1/Actions/Chassis.Reset {\"ResetType\":\"On\"}\n"
send "setonpath redfish/v1/Chassis/Perif2/Actions/Chassis.Reset {\"ResetType\":\"On\"}\n"
...

There doesn't seem to be chassis control for this one - not sure if that was just an omission or if there really isn't a capability. Anyway, 8 specs could be reduced to 1 with plugs.

garlick · 2024-02-07T15:39:59Z

So in summary I think the path forward is:

- provide a way to optionally do %s substitution in the URIs within redfishpower
- provide a way, like an alternate setonpath type command, to associate a special plug with a different URI for chassis support
- optionally implement power control hierarchy support in redfish (maybe another set command to establish a parent plug) but first check and see if that actually helps with the El Cap stuff and defer if not.
- build and test new dev scripts for the El Cap hardware
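
For the optional hierarchy item, a speculative Python sketch of what "parent plug" semantics might look like. Nothing here reflects an existing redfishpower command: `parent_of`, `query`, and the rule that a child is reported as off without being contacted when its parent is not on are all assumptions for illustration.

```python
from typing import Callable, Dict


def status_with_parent(
    plug: str,
    parent_of: Dict[str, str],
    query: Callable[[str], str],
) -> str:
    """Report plug status, skipping the query when an ancestor is not on."""
    parent = parent_of.get(plug)
    if parent is not None and status_with_parent(parent, parent_of, query) != "on":
        # Assumed rule: a child of a powered-off parent is off and may be unreachable.
        return "off"
    return query(plug)


# Tiny demonstration with a fake chassis "c" that is off.
fake_state = {"c": "off", "0": "on", "1": "on"}
queried = []


def fake_query(plug: str) -> str:
    queried.append(plug)
    return fake_state[plug]


result = status_with_parent("0", {"0": "c", "1": "c"}, fake_query)
```

In the demonstration only the chassis is queried; blade `0` is never contacted because its parent reports off.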

Edit: look at all the cut & paste this fixes! Does it go a little ways to address your concern

> it seems that copy & modify of the device files is going to be a common pattern with redfish and some of these REST interfaces, as there are quirks in every system. And with blades and parents, we might be introducing additional quirks too

chu11 · 2024-02-07T16:04:57Z

hmmmm, I guess it's just a difference of opinion. In my mind, writing out something like the following would be easier? Now everything is in one place, vs multiple .dev files?

# not blades
[login]
login.hosts = nodes[0-7]
login.auth = ...
login.statpath = ...
login.onpath = ...

[blade]
blade.hosts = nodes[8-1024]
blade.auth = ...
blade.parent = chassis[0-63]
blade.statpath = ...%s...
blade.onpath = ...%s...
blade.chassisstatpath = ...

[chassis]
chassis.hosts = chassis[0-63]
chassis.auth = ...
chassis.statpath = ...
chassis.onpath = ...

[gateway]
gateway.hosts = other_node_type[0-7]
gateway.auth = ...
gateway.statpath = ...
gateway.onpath = ...

chu11 · 2024-02-07T16:23:21Z

> look at all the cut & paste this fixes! Does it go a little ways to address your concern

Yeah. I guess here are just a few concerns:

- would this approach lead to an unnecessary number of redfishpower co-procs on the system? In my mind, 16-64 is ok, but possibly 1000s?
- I am also trying to think of systems that we haven't seen yet. Maybe this is me thinking too far ahead for imaginary scenarios we haven't witnessed, but I'm thinking more flexibility would be wise to engineer in now vs later. BUT ... I guess in the worst case, if there are strange systems that arrive in the future, admins could do what they are doing right now (i.e. -h node[0,8,16,32,...] is one kooky config, -h node[1,9,17,33] is another kooky config).

garlick · 2024-02-07T16:44:34Z

I'm not sure there is a problem with 1-2K coprocs or why we need to invest effort or add complexity to avoid it. See #127 - 2048 coprocs (for a fictitious 16K node system) even works in the tiny CI environment.

> I am also trying to think of systems that we haven't seen yet. Maybe this is me thinking too far ahead for imaginary scenarios we haven't witnessed, but I'm thinking more flexibility would be wise to engineer in now vs later. BUT ... I guess in the worst case, if there are strange systems that arrive in the future, admins could do what they are doing right now (i.e. -h node[0,8,16,32,...] is one kooky config, -h node[1,9,17,33] is another kooky config).

I'm not sure what you're referring to here. I'd say let's stay focused on use cases we have in front of us (or that we at least can find extant somewhere).

redfishpower is essentially a powerman plugin, so it really should behave like one, not go too far off in the weeds doing its own thing (only as necessary to meet specific objectives).

chu11 · 2024-02-07T17:28:04Z

> redfishpower is essentially a powerman plugin, so it really should behave like one, not go too far off in the weeds doing its own thing (only as necessary to meet specific objectives).

Good point. In my mind I might have been thinking of it like a separate utility.

garlick · 2024-02-07T19:51:42Z

Specific issues now open (#128 and #129) so let's close this one.

garlick mentioned this issue Feb 7, 2024

test a huge cray-ex configuration #127

Merged

This was referenced Feb 7, 2024

support cray EX compute chassis #128

Closed

redfishpower: need support for plug substitution in URIs #129

Closed

garlick closed this as completed Feb 7, 2024

chu11 mentioned this issue Feb 7, 2024

redfishpower: recognize hierarchies / pre-requisites #81

Closed
