Slurm: Job, Step, Task
In Slurm, job, step, and task describe different layers of work. The terminology can be confusing, so a clean breakdown helps.
🧩 Slurm Concepts: Job vs Task (and Job Step)
1. Job
A job is the top‑level unit you submit to Slurm using sbatch, srun, or salloc.
Represents the entire workload you want Slurm to run.
Has resource requests: nodes, CPUs, memory, time limit, etc.
Can contain one or more job steps.
Think of a job as the container.
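To make the "container" idea concrete, here is a minimal sketch of a script that reports the job it is running in. It relies on the SLURM_JOB_ID, SLURM_JOB_NUM_NODES, and SLURM_NTASKS environment variables that Slurm sets inside a job; the function name describe_job and the fallback message are made up for illustration.

```shell
#!/bin/bash
# describe_job (hypothetical helper): print the job-level context.
# The else branch only exists so the sketch also runs outside Slurm.
describe_job() {
    if [ -n "${SLURM_JOB_ID:-}" ]; then
        # SLURM_NTASKS is only set when a task count was requested.
        echo "job ${SLURM_JOB_ID}: ${SLURM_JOB_NUM_NODES:-unknown} node(s), ${SLURM_NTASKS:-unknown} task(s)"
    else
        echo "not inside a Slurm job"
    fi
}

describe_job
```

Run from a login shell, this should print the fallback message; submitted with sbatch, it should print the job ID and the requested sizes.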
2. Job Step
A job step is a subdivision of a job, created with srun inside a job allocation.
Each step can run a different program or phase.
Steps share the job’s allocated resources.
Steps can run sequentially or in parallel.
Example: preprocessing → simulation → postprocessing.
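The difference between sequential and parallel steps can be sketched as follows. This is an illustration under assumptions, not a recipe: run_step and demo_steps are hypothetical helpers, the echo commands stand in for real programs, and whether background steps really run side by side depends on the resources in the allocation and on site configuration.

```shell
#!/bin/bash
# run_step (hypothetical helper): launch a command as a job step when inside
# a Slurm allocation, otherwise run it directly so the sketch works anywhere.
run_step() {
    if [ -n "${SLURM_JOB_ID:-}" ] && command -v srun >/dev/null 2>&1; then
        srun --ntasks=1 "$@"
    else
        "$@"
    fi
}

demo_steps() {
    # Sequential: each step finishes before the next one starts.
    run_step echo "preprocess done"
    run_step echo "simulate done"

    # Parallel: start two steps in the background, then wait for both.
    run_step echo "analysis A done" &
    run_step echo "analysis B done" &
    wait
}

demo_steps
```

The two sequential lines always appear in order; the two analysis lines may appear in either order.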
3. Task
A task is the smallest unit: typically one process (often one MPI rank).
Created by srun or by Slurm when launching a job step.
If you request --ntasks=8, Slurm launches 8 tasks.
Each task may use one or more CPUs (--cpus-per-task), for example to run several threads.
Think of a task as a process.
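A small sketch of what a single task can see about itself. It reads SLURM_PROCID (the task's rank within the step), SLURM_NTASKS, and SLURM_CPUS_PER_TASK, which Slurm sets for launched tasks; the function name task_info and the defaults used outside Slurm are assumptions for illustration.

```shell
#!/bin/bash
# task_info (hypothetical helper): describe the current task.
# Defaults apply when the variables are unset, e.g. outside Slurm.
# SLURM_CPUS_PER_TASK is only set when --cpus-per-task was given.
task_info() {
    rank="${SLURM_PROCID:-0}"
    ntasks="${SLURM_NTASKS:-1}"
    cpus="${SLURM_CPUS_PER_TASK:-1}"
    # Total CPUs for the step = number of tasks x CPUs per task.
    echo "task ${rank} of ${ntasks}, ${cpus} CPU(s) per task, $((ntasks * cpus)) CPU(s) total"
}

task_info
```

Saved as an executable script and launched with something like srun --ntasks=8 --cpus-per-task=2, each of the 8 tasks would print its own rank and the same total of 16 CPUs.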
✔️ Summary Table
| Concept | Meaning | Created By | Typical Analogy |
|---|---|---|---|
| Job | Entire workload submitted to Slurm | sbatch, srun, salloc | A project |
| Job Step | A phase inside a job | srun inside a job | A stage in the project |
| Task | A single process (often MPI rank) | srun | A worker doing the work |
🧪 Example
Submit a job requesting 2 nodes and 8 tasks:
sbatch --nodes=2 --ntasks=8 run.sh
Inside run.sh:
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks=8
# Job step 1: preprocessing (single task)
srun --ntasks=1 preprocess.py
# Job step 2: main simulation (8 tasks)
srun --ntasks=8 ./simulate_mpi
# Job step 3: postprocessing (2 tasks)
srun --ntasks=2 postprocess.py
What happens?
Job: the entire script run.sh
Job Step 1: preprocess.py → 1 task
Job Step 2: simulate_mpi → 8 tasks (MPI ranks)
Job Step 3: postprocess.py → 2 tasks
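Each srun call in the script becomes a numbered step, starting from 0, and the batch script itself shows up as a separate .batch entry. The sketch below only prints the IDs you would expect to see; list_step_ids is a hypothetical helper and 12345 is a made-up job ID. Some sites may show additional entries (such as an .extern step), depending on configuration.

```shell
#!/bin/bash
# list_step_ids (hypothetical helper): print the step IDs expected for a
# batch job with a given number of srun steps.
list_step_ids() {
    job_id="$1"
    n_steps="$2"
    echo "${job_id}.batch"        # the batch script itself
    i=0
    while [ "$i" -lt "$n_steps" ]; do
        echo "${job_id}.${i}"     # one entry per srun call, numbered from 0
        i=$((i + 1))
    done
}

# Three srun calls in run.sh, made-up job ID 12345.
list_step_ids 12345 3
```

On a real cluster, sacct -j followed by the job ID, with --format=JobID,JobName,NTasks,State, should list the actual steps for comparison.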
🧠 Intuition
Job = allocation of resources
Job step = a command executed within that allocation
Task = a process launched by that command