Slurm: Job, Step, Task

 


In Slurm, job, step, and task describe different layers of work. The terminology can be confusing, so a clean breakdown helps.

🧩 Slurm Concepts: Job vs Task (and Job Step)

1. Job

A job is the top‑level unit you submit to Slurm using sbatch, srun, or salloc.

  • Represents the entire workload you want Slurm to run.

  • Has resource requests: nodes, CPUs, memory, time limit, etc.

  • Can contain one or more job steps.

Think of a job as the container.
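
Inside a running job, Slurm describes the allocation through environment variables, which makes the "container" idea concrete. The sketch below prints a one-line summary of it. `describe_job` is a hypothetical helper name, and the `none`/`unknown` fallbacks exist only so the snippet can also be tried outside Slurm. Note that `SLURM_NTASKS` is typically set only when a task count was requested.

```bash
#!/bin/bash
# describe_job: hypothetical helper that summarizes the current allocation.
# SLURM_JOB_ID, SLURM_JOB_NUM_NODES and SLURM_NTASKS are set by Slurm inside
# a job; the fallbacks only apply when this runs outside Slurm.
describe_job() {
    local job_id="${SLURM_JOB_ID:-none}"
    local nodes="${SLURM_JOB_NUM_NODES:-unknown}"
    local ntasks="${SLURM_NTASKS:-unknown}"
    echo "job=${job_id} nodes=${nodes} ntasks=${ntasks}"
}

describe_job
```

Calling this at the top of a batch script is a cheap way to record in the job's output file what the job was actually given.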

2. Job Step

A job step is a subdivision of a job, created with srun inside a job allocation.

  • Each step can run a different program or phase.

  • Steps share the job’s allocated resources.

  • Steps can run sequentially or in parallel.

Example: preprocessing → simulation → postprocessing.
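
The difference between sequential and parallel steps comes down to ordinary shell control flow around srun. The following is a minimal sketch, not a definitive recipe: `run_step` is a hypothetical wrapper that calls srun inside an allocation and runs the command directly elsewhere, and the echo commands stand in for real programs. Whether the two background steps truly run at the same time depends on the free resources in the allocation and on the Slurm version and configuration.

```bash
#!/bin/bash
# run_step: hypothetical wrapper. Inside an allocation it launches the
# command as a job step via srun; outside Slurm it runs the command
# directly so the control flow can be tried on any machine.
run_step() {
    local ntasks="$1"
    shift
    if [ -n "${SLURM_JOB_ID:-}" ] && command -v srun >/dev/null 2>&1; then
        srun --ntasks="$ntasks" "$@"
    else
        "$@"
    fi
}

# Sequential steps: each one finishes before the next starts.
run_step 1 echo "preprocessing"
run_step 1 echo "simulation"

# Parallel steps: start both in the background, then wait for both.
run_step 1 echo "analysis A" &
run_step 1 echo "analysis B" &
wait
echo "all steps finished"
```

The `wait` matters: without it, the batch script can reach its end while background steps are still running, and Slurm ends the job when the script exits.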

3. Task

A task is the smallest unit: typically one process (often one MPI rank).

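  • Launched by srun when it starts a job step; sbatch and salloc only reserve the resources.

  • Passing --ntasks=8 to sbatch or salloc sizes the allocation for up to 8 tasks; the tasks themselves start when an srun inside the job launches a step.

  • Each task can be given one or more CPUs (--cpus-per-task), which a multithreaded program can use for its threads.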

Think of a task as a process.
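
Each task can find out which one it is from its environment. The sketch below assumes it is launched once per task by srun; `task_info` and the file name in the comment are hypothetical, and the fallback values only matter when the script is run outside Slurm. Setting `OMP_NUM_THREADS` is a common convention for OpenMP programs, shown here as an assumption about the workload rather than something Slurm requires.

```bash
#!/bin/bash
# task_info: hypothetical helper meant to run once per task, e.g.
#   srun --ntasks=4 --cpus-per-task=2 ./task_info.sh   (illustrative name)
# SLURM_PROCID is the task's rank within the step, SLURM_NTASKS the task
# count, and SLURM_CPUS_PER_TASK is set only if --cpus-per-task was given.
task_info() {
    local rank="${SLURM_PROCID:-0}"
    local ntasks="${SLURM_NTASKS:-1}"
    local cpus="${SLURM_CPUS_PER_TASK:-1}"
    echo "task ${rank} of ${ntasks}, ${cpus} CPU(s)"
}

# Assumed convention: match the OpenMP thread count to the CPUs per task.
export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-1}"

task_info
```

Run under srun with several tasks, every task executes the same script but reports a different rank, which is the same mechanism MPI launchers build on.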

✔️ Summary Table

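| Concept  | Meaning                              | Created By          | Typical Analogy         |
|----------|--------------------------------------|---------------------|-------------------------|
| Job      | Entire workload submitted to Slurm   | sbatch, srun, salloc | A project               |
| Job Step | A phase inside a job                 | srun inside a job   | A stage in the project  |
| Task     | A single process (often an MPI rank) | srun                | A worker doing the work |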

🧪 Example

Submit a job requesting 2 nodes and 8 tasks:

bash
sbatch --nodes=2 --ntasks=8 run.sh

Inside run.sh (the #SBATCH lines repeat the command-line options above, so either place alone would be enough):

bash
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks=8

# Job step 1: preprocessing (single task)
srun --ntasks=1 preprocess.py

# Job step 2: main simulation (8 tasks)
srun --ntasks=8 ./simulate_mpi

# Job step 3: postprocessing (2 tasks)
srun --ntasks=2 postprocess.py

What happens?

  • Job: the allocation (2 nodes, room for 8 tasks) in which the script run.sh runs

  • Job Step 1: preprocess.py → 1 task

  • Job Step 2: simulate_mpi → 8 tasks (MPI ranks)

  • Job Step 3: postprocess.py → 2 tasks
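
One way to check this breakdown on a real cluster is the accounting tool sacct, which typically lists each step as its own row: `<jobid>.batch` for the script itself, then `<jobid>.0`, `<jobid>.1`, and so on for each srun. The sketch below assumes job accounting is enabled on the cluster; `show_steps` is a hypothetical helper, and when sacct is not installed it only prints the command it would have run.

```bash
#!/bin/bash
# show_steps: hypothetical helper that lists a job's steps via sacct.
# Assumes job accounting is enabled; without sacct on the PATH it prints
# the command instead of running it.
show_steps() {
    local job_id="$1"
    local fmt="JobID,JobName,NTasks,State,Elapsed"
    if command -v sacct >/dev/null 2>&1; then
        sacct -j "$job_id" --format="$fmt"
    else
        echo "sacct -j ${job_id} --format=${fmt}"
    fi
}

# Usage: show_steps <jobid>, using the ID that sbatch printed on submission.
```

For the example job, the NTasks column would be expected to show 1, 8, and 2 for the three srun steps.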

🧠 Intuition

  • Job = reservation of resources

  • Job step = a command executed within that reservation

  • Task = a process launched by that command
