Using dependencies in Slurm

Slurm supports several types of job dependencies. They are set when you submit a job and are useful for building pipelines. A job can depend on more than one other job.

To use dependencies, submit the job with the following option. Multiple dependencies should be comma separated, in which case all of them must be satisfied before the job can start:

-d, --dependency=<dependency_list>

You may want to run a set of jobs sequentially, so that the second job runs only after the first one has completed. This can be accomplished using Slurm's job dependency options. For example, if you have two jobs, job1.batch and job2.batch, you can chain them as in the example below:

[user@lnode002]$ sbatch job1.batch
Submitted batch job 111

[user@lnode002]$ sbatch --dependency=afterany:111 job2.batch
Submitted batch job 112

The flag --dependency=afterany:111 tells the batch system to start the second job only after the first job has finished. afterany means that job2 will run regardless of the exit status of job1, i.e. whether the batch system considers job1 to have completed successfully or not.

Once job 111 completes, job 112 will be released by the batch system and then will run as the appropriate nodes become available.
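
In a script, the job ID does not have to be copied by hand. The following is a minimal sketch, assuming bash and sbatch's --parsable option (which prints only the job ID, optionally followed by ";cluster"). The function name submit_chain is hypothetical and not part of Slurm.

```shell
#!/usr/bin/env bash
# submit_chain SCRIPT...
# Submits each batch script so that it starts only after the previous one
# has finished (afterany), mirroring the job1.batch / job2.batch example.
# The function name is an illustration, not a Slurm command.
submit_chain() {
    local prev="" script jobid
    for script in "$@"; do
        if [ -z "$prev" ]; then
            # First job: no dependency.
            jobid=$(sbatch --parsable "$script") || return 1
        else
            # Later jobs: wait for the previously submitted job.
            jobid=$(sbatch --parsable --dependency="afterany:${prev}" "$script") || return 1
        fi
        # --parsable prints "jobid" or "jobid;cluster"; keep only the job ID.
        jobid=${jobid%%;*}
        echo "submitted ${script} as job ${jobid}"
        prev=$jobid
    done
}

# Usage (on a login node): submit_chain job1.batch job2.batch
```

Because each submission depends only on the one before it, the same function works for a chain of any length.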

Exit status: The exit status of a job is the exit status of the last command that was run in the batch script. An exit status of '0' means that the batch system thinks the job completed successfully. It does not necessarily mean that all commands in the batch script completed successfully.
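
The exit status rule can be tried out locally without Slurm. The sketch below assumes bash and uses two throwaway inline scripts in place of a real batch script: in the first, a failing command is masked by the last command; in the second, "set -e" makes the script stop at the first failure, so the non-zero status is what the batch system would see.

```shell
#!/usr/bin/env bash
# Stand-in for a batch script: the first command fails, the last succeeds.
status_plain=0
bash -c 'false; true' || status_plain=$?

# Same commands with 'set -e': the script exits at the first failing command.
status_strict=0
bash -c 'set -e; false; true' || status_strict=$?

echo "plain=${status_plain} strict=${status_strict}"   # prints: plain=0 strict=1
```

If a later job uses afterok, adding "set -e" (or an explicit "exit 1" after a failed step) to the first batch script is one way to make sure a failed step is reported as a failed job.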

The --dependency flag accepts several dependency types, which differ in the state that job1 must reach. In the list below, job1 stands for the job ID of the first job:

--dependency=afterany:job1      job2 will start after job1 completes with any exit status
--dependency=after:job1         job2 can start any time after job1 has started (or been cancelled)
--dependency=afterok:job1       job2 will run only if job1 completed with an exit status of 0
--dependency=afternotok:job1    job2 will run only if job1 completed with a non-zero exit status
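
The afterok and afternotok types can be combined to attach a success step and a failure step to the same job. This is a sketch only: the function name submit_with_handlers and the script names in the usage line are hypothetical, and it assumes bash and sbatch's --parsable option.

```shell
#!/usr/bin/env bash
# submit_with_handlers MAIN ON_SUCCESS ON_FAILURE
# Submits MAIN, then one job that runs only if MAIN exits with status 0
# and another that runs only if MAIN exits with a non-zero status.
# Function and script names are illustrative, not part of Slurm.
submit_with_handlers() {
    local main=$1 on_success=$2 on_failure=$3 jobid
    jobid=$(sbatch --parsable "$main") || return 1
    # --parsable prints "jobid" or "jobid;cluster"; keep only the job ID.
    jobid=${jobid%%;*}
    # Runs only if the main job completed with exit status 0.
    sbatch --dependency="afterok:${jobid}" "$on_success" || return 1
    # Runs only if the main job completed with a non-zero exit status.
    sbatch --dependency="afternotok:${jobid}" "$on_failure" || return 1
}

# Usage: submit_with_handlers process.batch publish.batch notify_failure.batch
```

Only one of the two handlers can ever become eligible to run. Depending on the site configuration, the other may remain pending with a dependency that can never be satisfied; sbatch's --kill-on-invalid-dep=yes option is one way to have such a job terminated instead.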

To make a job depend on the completion of several other jobs, list all of them in the dependency, as in the example below:

[user@lnode002]$ sbatch job1.batch
Submitted batch job 201

[user@lnode002]$ sbatch job2.batch
Submitted batch job 202

[user@lnode002]$ sbatch --dependency=afterany:201,202 job3.batch
Submitted batch job 203

[user@lnode002]$ squeue -u $USER -S S,i,M -o "%12i %15j %4t %30E"
JOBID        NAME            ST   DEPENDENCY                    
201          job1.batch      R                                  
202          job2.batch      R                                  
203          job3.batch      PD   afterany:201,afterany:202 
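
When the list of job IDs is built in a script, a small helper can assemble the dependency string. The sbatch man page documents the form type:job_id[:job_id...], with colons between job IDs of the same type, so this sketch produces afterany:201:202. The function name build_dependency is hypothetical; bash is assumed.

```shell
#!/usr/bin/env bash
# build_dependency TYPE JOBID...
# Prints TYPE:ID1:ID2..., the colon-separated form documented for
# sbatch --dependency. The function name is illustrative only.
build_dependency() {
    local dep=$1
    shift
    local id
    for id in "$@"; do
        dep="${dep}:${id}"
    done
    printf '%s\n' "$dep"
}

build_dependency afterany 201 202   # prints: afterany:201:202

# Usage: sbatch --dependency="$(build_dependency afterany 201 202)" job3.batch
```

Keeping the string construction in one place avoids quoting mistakes when the number of upstream jobs varies.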

*) https://hpc.nih.gov/docs/job_dependencies.html
*) http://www.hpc.caltech.edu/documentation/faq/dependencies-and-pipelines

howtouseslurm_004.txt · Last modified: 2023/08/01 01:08 by 127.0.0.1