2.8 Batch Queuing Support

DESCRIPTION

This section describes how to use job schedulers (or batch queuing systems) in PWTK. At present, PWTK supports the SLURM, LSF, PBS, and LoadLeveler job schedulers.

TABLE OF CONTENTS

  1. Structure of a batch shell script in PWTK
  2. How to submit PWTK scripts to job schedulers
  3. PWTK configuration files for job schedulers
  4. Head and tail parts of a batch shell script
  5. Beware of a caveat

Structure of a batch shell script in PWTK

PWTK supports job schedulers by creating a small batch shell script containing the batch-queuing directives; the PWTK script itself is then executed by PWTK from within this batch shell script. The batch shell script is composed of four parts:

        profile   (always exists)
         |
        head      (optional)
         |
        pwtk      (always exists)
         |
        tail      (optional)

where:

    profile -- the batch-queuing directives of the requested profile (e.g., the #SBATCH lines; see "PWTK configuration files for job schedulers" below)
    head    -- an optional part, set with the slurm_head (lsf_head, ...) command (see "Head and tail parts of a batch shell script" below)
    pwtk    -- the PWTK invocation that executes the PWTK script
    tail    -- an optional part, set with the slurm_tail (lsf_tail, ...) command
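
To illustrate, a generated SLURM batch shell script might look roughly as follows (a schematic sketch; what PWTK actually generates depends on the requested profile and configuration):

    #!/bin/sh
    #SBATCH --nodes=1          # profile: batch-queuing directives
    #SBATCH --ntasks=16

    module load qe-7.3         # head: optional commands (slurm_head)

    pwtk job.pwtk              # pwtk: executes the PWTK script

    # tail: optional commands (slurm_tail), e.g., cleanup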

How to submit PWTK scripts to job schedulers

There are two ways to instruct PWTK to submit a PWTK script to the batch queuing system. The easiest way to submit a PWTK script (say, job.pwtk) to a job scheduler is from the terminal, i.e.:

    pwtk --slurm job.pwtk   (for SLURM)
    pwtk --lsf job.pwtk     (for LSF)
    pwtk --pbs job.pwtk     (for PBS)
    pwtk --ll job.pwtk      (for LoadLeveler)

Note that these options are configurable. For the SLURM job scheduler (the usage for other job schedulers is analogous), further details can be specified with the --slurm option, i.e.:

    pwtk --slurm=PROFILE job.pwtk

or

    pwtk --slurm="PROFILE OPTIONS" job.pwtk

where PROFILE is the name of the user-defined SLURM profile (see below), and OPTIONS are the Slurm 'sbatch' command-line options.
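
For instance, using the "parallel" profile defined in the next section together with an extra sbatch option, one might write:

    pwtk --slurm="parallel --time=12:00:00" job.pwtk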

The other way to instruct PWTK to submit to the SLURM job scheduler is within the PWTK script with the SLURM command, i.e.:

    SLURM {
       # here comes the PWTK script code, for example:
       import data.pwtk
       foreach structure $structureList {
          runPW relax.$structure
       }
    }

For other job schedulers, the corresponding commands are LSF, PBS, and LL. These commands are fully configurable; see ::pwtk::SLURM, ::pwtk::LSF, ::pwtk::PBS, ::pwtk::LL. The syntax of the SLURM command (and analogously of the others) is:

    SLURM ?profile? ?options? { ...script code... }

or:

    SLURM ?profile? ?options? file.pwtk

where the 'profile' and 'options' arguments are optional; 'profile' is the name of the SLURM profile. If it is omitted, the default profile is used, which is guaranteed to exist because it is defined in the $PWTK/config/slurm.tcl file.
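
For example, using the "long" profile defined in the next section, together with an extra sbatch option, one might write:

    SLURM long --ntasks=32 {
       runPW calc1
    }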

PWTK configuration files for job schedulers

A user can define any number of such profiles in the ~/.pwtk/slurm.tcl, ~/.pwtk/lsf.tcl, ~/.pwtk/pbs.tcl, and ~/.pwtk/ll.tcl configuration files for the respective job schedulers. Here is an example of two user-defined SLURM profiles, named "parallel" and "long":

    slurm_profile parallel {
        #!/bin/sh
        #SBATCH --nodes=1
        #SBATCH --ntasks=16
        #SBATCH --time=6:00:00
        #SBATCH --partition=parallel
    } {
        prefix mpirun -np 16
    }
    
    slurm_profile long {
        #!/bin/sh
        #SBATCH --nodes=1
        #SBATCH --ntasks=64
        #SBATCH --time=2-00:00:00
        #SBATCH --partition=long
    } {
        prefix mpirun -np 64
    }

The usage of slurm_profile is as follows (for other supported job schedulers, the analogous commands are lsf_profile, pbs_profile, ...):

    slurm_profile profileName slurmDirectives ?pwtkDirectives?

where the last pwtkDirectives argument is optional. Its purpose is to provide PWTK with a default way of running executables. For example, the above "long" profile requests 64 tasks, hence it is reasonable to run executables with "mpirun -np 64".
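
For example, an analogous PBS profile might look like this (an illustrative sketch; the actual directives depend on the cluster's PBS setup):

    pbs_profile parallel {
        #!/bin/sh
        #PBS -l nodes=1:ppn=16
        #PBS -l walltime=6:00:00
    } {
        prefix mpirun -np 16
    }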

The 'long' profile is then requested either from the terminal as:

    pwtk --slurm=long job.pwtk

or within the PWTK script as:

    SLURM long job.pwtk

Note that further options can be specified after the profile name, for example:

    SLURM long --nodes=4 --ntasks-per-node=16 job.pwtk

Head and tail parts of a batch shell script

The head and tail parts of a batch shell script are set with the slurm_head and slurm_tail commands, and analogously for the other job schedulers (e.g., lsf_head & lsf_tail, pbs_head & pbs_tail, ...). A typical usage of the slurm_head command is to load modules, e.g.:

    slurm_head {
       module load qe-7.3
    }

To clear the head, use:

    slurm_head {}
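
Analogously, the tail part can hold commands that run after the PWTK script finishes. A hypothetical example that removes scratch wavefunction files:

    slurm_tail {
       rm -f *.wfc*
    }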

Note that modules can also be specified in profiles, i.e.:

    slurm_profile long {
        #!/bin/sh
        #SBATCH --nodes=1
        #SBATCH --ntasks=64
        #SBATCH --time=2-00:00:00
        #SBATCH --partition=long

        module load qe-7.3
    }

The difference between loading a module in a given profile and in the head part of the script is that a module loaded in a profile is loaded only when that specific profile is requested, whereas a module loaded within the head applies to all profiles.

Beware of a caveat

Note that scripts specified with the SLURM, LSF, PBS, ... commands are executed as PWTK child instances. The problem with child instances is that they do not automatically inherit the state of the parent. For example, the following script will fail:

    import data.pwtk
    SYSTEM { ecutwfc = 30.0 }

    SLURM {
       runPW calc1
    }

because the script supplied to the SLURM command is a child instance that starts in an empty state. Hence, runPW will fail because it has no input data. There are two ways to deal with this issue. One possibility is to include everything inside the SLURM command, i.e.:

    SLURM {
       import data.pwtk
       SYSTEM { ecutwfc = 30.0 }
       runPW calc1
    }

The other option is to use PWTK's propagate mechanism (see ::pwtk::propagate), with which one can specify what to propagate to child instances, i.e.:

    propagate {
       import data.pwtk
       SYSTEM { ecutwfc = 30.0 }
    }

    SLURM {
       runPW calc1
    }