slurm_variables (forked from daniel-thom/HPC)
INPUT ENVIRONMENT VARIABLES
Upon startup, sbatch will read and handle the options set in the following environment variables. Note that environment variables will override any options set in a batch script, and command line options will override any environment variables.
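The precedence rules can be sketched as follows. The partition names and job.sh below are hypothetical, and the sbatch invocations are shown commented out since they require a live cluster:

```shell
# Hypothetical batch script with an in-script directive (lowest precedence).
cat > job.sh <<'EOF'
#!/bin/bash
#SBATCH --partition=short
srun hostname
EOF
# The environment variable overrides the #SBATCH directive above.
export SBATCH_PARTITION=debug
# sbatch job.sh          # would submit to partition "debug"
# sbatch -p long job.sh  # command line wins over both: partition "long"
```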
SBATCH_ACCOUNT Same as -A, --account
SBATCH_ACCTG_FREQ Same as --acctg-freq
SBATCH_ARRAY_INX Same as -a, --array
SBATCH_BATCH Same as --batch
SBATCH_CLUSTERS or SLURM_CLUSTERS
Same as --clusters
SBATCH_CONSTRAINT Same as -C, --constraint
SBATCH_CORE_SPEC Same as --core-spec
SBATCH_CPUS_PER_GPU Same as --cpus-per-gpu
SBATCH_DEBUG Same as -v, --verbose
SBATCH_DELAY_BOOT Same as --delay-boot
SBATCH_DISTRIBUTION Same as -m, --distribution
SBATCH_EXCLUSIVE Same as --exclusive
SBATCH_EXPORT Same as --export
SBATCH_GET_USER_ENV Same as --get-user-env
SBATCH_GPUS Same as -G, --gpus
SBATCH_GPU_BIND Same as --gpu-bind
SBATCH_GPU_FREQ Same as --gpu-freq
SBATCH_GPUS_PER_NODE Same as --gpus-per-node
SBATCH_GPUS_PER_TASK Same as --gpus-per-task
SBATCH_GRES Same as --gres
SBATCH_GRES_FLAGS Same as --gres-flags
SBATCH_HINT or SLURM_HINT
Same as --hint
SBATCH_IGNORE_PBS Same as --ignore-pbs
SBATCH_JOB_NAME Same as -J, --job-name
SBATCH_MEM_BIND Same as --mem-bind
SBATCH_MEM_PER_CPU Same as --mem-per-cpu
SBATCH_MEM_PER_GPU Same as --mem-per-gpu
SBATCH_MEM_PER_NODE Same as --mem
SBATCH_NETWORK Same as --network
SBATCH_NO_KILL Same as -k, --no-kill
SBATCH_NO_REQUEUE Same as --no-requeue
SBATCH_OPEN_MODE Same as --open-mode
SBATCH_OVERCOMMIT Same as -O, --overcommit
SBATCH_PARTITION Same as -p, --partition
SBATCH_POWER Same as --power
SBATCH_PROFILE Same as --profile
SBATCH_QOS Same as --qos
SBATCH_RESERVATION Same as --reservation
SBATCH_REQ_SWITCH When a tree topology is used, this defines the maximum count of switches desired for the job allocation and optionally
the maximum time to wait for that number of switches. See --switches
SBATCH_REQUEUE Same as --requeue
SBATCH_SIGNAL Same as --signal
SBATCH_SPREAD_JOB Same as --spread-job
SBATCH_THREAD_SPEC Same as --thread-spec
SBATCH_TIMELIMIT Same as -t, --time
SBATCH_USE_MIN_NODES Same as --use-min-nodes
SBATCH_WAIT Same as -W, --wait
SBATCH_WAIT_ALL_NODES Same as --wait-all-nodes
SBATCH_WAIT4SWITCH Max time waiting for requested switches. See --switches
SBATCH_WCKEY Same as --wckey
SLURM_CONF The location of the Slurm configuration file.
SLURM_EXIT_ERROR Specifies the exit code generated when a Slurm error occurs (e.g. invalid options). This can be used by a script to distinguish application exit codes from various Slurm error conditions.
SLURM_STEP_KILLED_MSG_NODE_ID=ID
If set, only the specified node will log when the job or step is killed by a signal.
OUTPUT ENVIRONMENT VARIABLES
The Slurm controller will set the following variables in the environment of the batch script.
SBATCH_MEM_BIND
Set to value of the --mem-bind option.
SBATCH_MEM_BIND_LIST
Set to bit mask used for memory binding.
SBATCH_MEM_BIND_PREFER
Set to "prefer" if the --mem-bind option includes the prefer option.
SBATCH_MEM_BIND_TYPE
Set to the memory binding type specified with the --mem-bind option. Possible values are "none", "rank", "map_mem", "mask_mem" and "local".
SBATCH_MEM_BIND_VERBOSE
Set to "verbose" if the --mem-bind option includes the verbose option. Set to "quiet" otherwise.
SLURM_*_HET_GROUP_#
For a heterogeneous job allocation, the environment variables are set separately for each component.
SLURM_ARRAY_TASK_COUNT
Total number of tasks in a job array.
SLURM_ARRAY_TASK_ID
Job array ID (index) number.
SLURM_ARRAY_TASK_MAX
Job array's maximum ID (index) number.
SLURM_ARRAY_TASK_MIN
Job array's minimum ID (index) number.
SLURM_ARRAY_TASK_STEP
Job array's index step size.
SLURM_ARRAY_JOB_ID
Job array's master job ID number.
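The array variables above are commonly used to map each array task to its own input. A minimal sketch (the file names and the fallback default are assumptions for illustration):

```shell
#!/bin/bash
#SBATCH --array=0-3
# Pick an input file by array index (file names are hypothetical).
INPUTS=(a.dat b.dat c.dat d.dat)
# Fall back to task 0 when run outside of Slurm, so the script still works standalone.
: "${SLURM_ARRAY_TASK_ID:=0}"
echo "task ${SLURM_ARRAY_TASK_ID} of ${SLURM_ARRAY_TASK_COUNT:-1}: ${INPUTS[$SLURM_ARRAY_TASK_ID]}"
```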
SLURM_CLUSTER_NAME
Name of the cluster on which the job is executing.
SLURM_CPUS_ON_NODE
Number of CPUs on the allocated node.
SLURM_CPUS_PER_GPU
Number of CPUs requested per allocated GPU. Only set if the --cpus-per-gpu option is specified.
SLURM_CPUS_PER_TASK
Number of CPUs requested per task. Only set if the --cpus-per-task option is specified.
SLURM_DIST_PLANESIZE
Plane distribution size. Only set for plane distributions. See -m, --distribution.
SLURM_DISTRIBUTION
Same as -m, --distribution
SLURM_EXPORT_ENV
Same as --export.
SLURM_GPUS
Number of GPUs requested. Only set if the -G, --gpus option is specified.
SLURM_GPU_BIND
Requested binding of tasks to GPU. Only set if the --gpu-bind option is specified.
SLURM_GPU_FREQ
Requested GPU frequency. Only set if the --gpu-freq option is specified.
SLURM_GPUS_PER_NODE
Requested GPU count per allocated node. Only set if the --gpus-per-node option is specified.
SLURM_GPUS_PER_SOCKET
Requested GPU count per allocated socket. Only set if the --gpus-per-socket option is specified.
SLURM_GPUS_PER_TASK
Requested GPU count per allocated task. Only set if the --gpus-per-task option is specified.
SLURM_GTIDS
Global task IDs running on this node. Zero origin and comma separated.
SLURM_JOB_ACCOUNT
Account name associated with the job allocation.
SLURM_JOB_ID (and SLURM_JOBID for backwards compatibility)
The ID of the job allocation.
SLURM_JOB_CPUS_PER_NODE
Count of processors available to the job on this node. Note the select/linear plugin allocates entire nodes to jobs, so the value indicates the total count of CPUs on the node. The select/cons_res plugin allocates individual processors to jobs, so this number indicates the number of processors on this node allocated to the job.
SLURM_JOB_DEPENDENCY
Set to value of the --dependency option.
SLURM_JOB_NAME
Name of the job.
SLURM_JOB_NODELIST (and SLURM_NODELIST for backwards compatibility)
List of nodes allocated to the job.
SLURM_JOB_NUM_NODES (and SLURM_NNODES for backwards compatibility)
Total number of nodes in the job's resource allocation.
SLURM_JOB_PARTITION
Name of the partition in which the job is running.
SLURM_JOB_QOS
Quality Of Service (QOS) of the job allocation.
SLURM_JOB_RESERVATION
Advanced reservation containing the job allocation, if any.
SLURM_LOCALID
Node local task ID for the process within a job.
SLURM_MEM_PER_CPU
Same as --mem-per-cpu
SLURM_MEM_PER_GPU
Requested memory per allocated GPU. Only set if the --mem-per-gpu option is specified.
SLURM_MEM_PER_NODE
Same as --mem
SLURM_NODE_ALIASES
Sets of node name, communication address and hostname for nodes allocated to the job from the cloud. Each element in the set is colon separated and each set is comma separated. For example: SLURM_NODE_ALIASES=ec0:1.2.3.4:foo,ec1:1.2.3.5:bar
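This format can be unpacked with plain shell string splitting. A sketch using the man page's own example value:

```shell
SLURM_NODE_ALIASES="ec0:1.2.3.4:foo,ec1:1.2.3.5:bar"
# Sets are comma separated; fields within a set are colon separated.
IFS=, read -ra sets <<< "$SLURM_NODE_ALIASES"
for s in "${sets[@]}"; do
  IFS=: read -r name addr host <<< "$s"
  echo "node=$name addr=$addr host=$host"
done
```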
SLURM_NODEID
ID of the node allocated.
SLURM_NTASKS (and SLURM_NPROCS for backwards compatibility)
Same as -n, --ntasks
SLURM_NTASKS_PER_CORE
Number of tasks requested per core. Only set if the --ntasks-per-core option is specified.
SLURM_NTASKS_PER_NODE
Number of tasks requested per node. Only set if the --ntasks-per-node option is specified.
SLURM_NTASKS_PER_SOCKET
Number of tasks requested per socket. Only set if the --ntasks-per-socket option is specified.
SLURM_HET_SIZE
Set to count of components in heterogeneous job.
SLURM_PRIO_PROCESS
The scheduling priority (nice value) at the time of job submission. This value is propagated to the spawned processes.
SLURM_PROCID
The MPI rank (or relative process ID) of the current process.
SLURM_PROFILE
Same as --profile
SLURM_RESTART_COUNT
If the job has been restarted due to system failure or has been explicitly requeued, this will be set to the number of times the job has been restarted.
SLURM_SUBMIT_DIR
The directory from which sbatch was invoked or, if applicable, the directory specified by the -D, --chdir option.
SLURM_SUBMIT_HOST
The hostname of the computer from which sbatch was invoked.
SLURM_TASKS_PER_NODE
Number of tasks to be initiated on each node. Values are comma separated and in the same order as SLURM_JOB_NODELIST. If two or more
consecutive nodes are to have the same task count, that count is followed by "(x#)" where "#" is the repetition count. For example,
"SLURM_TASKS_PER_NODE=2(x3),1" indicates that the first three nodes will each execute two tasks and the fourth node will execute one
task.
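The "(x#)" shorthand can be expanded into one task count per node with a small amount of shell; a sketch using the example value above:

```shell
SLURM_TASKS_PER_NODE="2(x3),1"
expanded=()
IFS=, read -ra parts <<< "$SLURM_TASKS_PER_NODE"
for p in "${parts[@]}"; do
  if [[ $p =~ ^([0-9]+)\(x([0-9]+)\)$ ]]; then
    # "count(xN)" means the count repeats for N consecutive nodes.
    for ((i = 0; i < BASH_REMATCH[2]; i++)); do
      expanded+=("${BASH_REMATCH[1]}")
    done
  else
    expanded+=("$p")
  fi
done
echo "${expanded[@]}"   # one task count per node, in SLURM_JOB_NODELIST order
```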
SLURM_TASK_PID
The process ID of the task being started.
SLURM_TOPOLOGY_ADDR
This is set only if the system has the topology/tree plugin configured. The value will be set to the names of the network switches which may be involved in the job's communications, from the system's top level switch down to the leaf switch and ending with the node name. A period is used to separate each hardware component name.
SLURM_TOPOLOGY_ADDR_PATTERN
This is set only if the system has the topology/tree plugin configured. The value will be set to the component types listed in SLURM_TOPOLOGY_ADDR. Each component will be identified as either "switch" or "node". A period is used to separate each hardware component type.
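The two variables line up component by component, so they can be paired after splitting on the period. A sketch with hypothetical switch and node names:

```shell
# Hypothetical values; real ones come from the topology/tree plugin.
SLURM_TOPOLOGY_ADDR="root.s1.leaf3.node017"
SLURM_TOPOLOGY_ADDR_PATTERN="switch.switch.switch.node"
IFS=. read -ra names <<< "$SLURM_TOPOLOGY_ADDR"
IFS=. read -ra types <<< "$SLURM_TOPOLOGY_ADDR_PATTERN"
for i in "${!names[@]}"; do
  echo "${types[$i]}: ${names[$i]}"
done
```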
SLURMD_NODENAME
Name of the node running the job script.