LoadLeveler

Batch Jobs (LoadLeveler)

Description

Using a batch system is not difficult, but writing an efficient batch job takes some practice. The principles of a batch system are described in our Batch HOWTO.

On the Virgo system, LoadLeveler is installed as the batch system.

Batch jobs are submitted to the LoadLeveler system via

llsubmit

LoadLeveler is a batch scheduling system for submitting serial and parallel jobs. It matches job requirements with the available machine resources. Submitting and running your jobs through the batch system is necessary to avoid system overload, and it enforces a fair share of the resources among users. Before you can submit a job or perform any other job-related task, you need to build a job command file.

A Job Command file is simply a text file that may contain the following different types of information:

Your Job Command file may tell LoadLeveler what you want to run through the executable keyword or you may have your Job Command file serve as the executable by not specifying an executable or by explicitly setting the executable to be the Job Command file itself. Therefore, your Job Command file may be a shell script that drives the programs that you want to execute.
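As a minimal sketch of the second case, the job command file below omits the executable keyword, so the file itself is run as a shell script. The job name, file names, and time limit are illustrative assumptions, not site defaults. The #@ lines are LoadLeveler keywords and are plain comments to the shell.

```shell
#!/bin/bash
# Job command file that doubles as the executable (no "executable" keyword).
# Job name, file names, and the time limit are illustrative assumptions.
#@ job_name         = hello
#@ job_type         = serial
#@ output           = $(job_name).$(jobid).out
#@ error            = $(job_name).$(jobid).err
#@ wall_clock_limit = 00:10:00
#@ queue

# Everything below is ordinary shell and runs once the job starts.
message="hello job running on $(uname -n)"
echo "$message"
```

Saved as, say, hello.job (a hypothetical file name), such a file would be submitted with llsubmit hello.job.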

The Job Command file may also contain different job steps that together constitute your job. You may name each job step and you must have a queue keyword entry for each job step you want to run. Unless otherwise noted, the keywords you set for the first job step will be inherited by all subsequent job steps. By default, LoadLeveler will view each job step as an independent entity but, by using the dependency keyword, you can conditionally execute different programs depending on the return value of the previous job steps.
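A two-step job with a dependency might look like the following sketch. The step names (prepare, analyse) and the work done in each step are hypothetical. The second step inherits the output and error keywords from the first and only runs if the first step returned 0. The script assumes that LoadLeveler exports the name of the running step in LOADL_STEP_NAME, and falls back to prepare when the variable is unset (for example, when the file is run by hand).

```shell
#!/bin/bash
# Sketch of a two-step job; step names and actions are hypothetical.
#@ job_name   = pipeline
#@ step_name  = prepare
#@ output     = $(job_name).$(step_name).out
#@ error      = $(job_name).$(step_name).err
#@ queue

#@ step_name  = analyse
#@ dependency = (prepare == 0)
#@ queue

# Both steps execute this same file, so dispatch on the step name.
run_step() {
  case "$1" in
    prepare) echo "preparing input" ;;
    analyse) echo "analysing results" ;;
    *)       echo "unknown step: $1" >&2; return 1 ;;
  esac
}

# Assumption: LoadLeveler sets LOADL_STEP_NAME for each job step.
run_step "${LOADL_STEP_NAME:-prepare}"
```

Each queue statement closes one job step, so the keywords above it describe that step.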

Commands

Command     Description
llclass     Returns information about classes
llq         Query information about jobs in the queues
llcancel    Cancel job from the queue
llstatus    Returns status information about nodes in the cluster
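A typical monitoring session using these commands could look like the sketch below. It only calls the commands when the LoadLeveler client tools are found on the current host. The -u option of llq (restrict the listing to one user) is the only option used, and the job ID passed to llcancel is a placeholder that you would take from the llq output.

```shell
# Only meaningful on a host where the LoadLeveler client tools are installed.
if command -v llq >/dev/null 2>&1; then
  ll_available=yes
  llclass            # which classes exist and what they allow
  llq -u "$USER"     # your own jobs and their current states
  llstatus           # state of the nodes in the cluster
  # llcancel <job_id>   # remove a job; take the ID from the llq output
else
  ll_available=no
  echo "LoadLeveler commands not found on this host"
fi
```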

Recommended best practices

Set environment variables before submitting your job

For some programs, we provide module files that modify the environment appropriately. If these are loaded in the script section, the environment is often not properly inherited. It is much safer to do the following:

$ module load

$ llsubmit file.job

Furthermore, environment variables should be explicitly copied to individual processes using the appropriate LoadLeveler keyword:

#@ environment = COPY_ALL
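In context, the keyword sits with the other keywords at the top of the job command file. The sketch below also checks at run time whether any modules were loaded when the job was submitted. It assumes that the site's module command records loaded modules in the LOADEDMODULES variable, as Environment Modules does. The job name and output file names are illustrative.

```shell
#!/bin/bash
# Sketch: copy the submit-time environment into the job and verify it arrived.
#@ job_name    = envcheck
#@ environment = COPY_ALL
#@ output      = $(job_name).$(jobid).out
#@ error       = $(job_name).$(jobid).err
#@ queue

# Assumption: the module command records loaded modules in LOADEDMODULES.
modules_seen="${LOADEDMODULES:-none}"
if [ "$modules_seen" = none ]; then
  echo "warning: no modules were loaded at submit time" >&2
else
  echo "modules inherited from the submitting shell: $modules_seen"
fi
```

A warning in the error file then points to a missing module load before llsubmit rather than to a problem in the program itself.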

What not to do

Do not request more memory or CPU resources than you need

Unless the entire node is reserved for running your job, requesting too much memory makes it more difficult for the system to schedule other jobs on that node, because fewer jobs fit into the leftover portion of memory. Part of the node may then sit idle while other researchers' jobs are needlessly delayed. Similarly, if your job only scales well to 12 cores, requesting 24 cores will delay other researchers' jobs while offering you only a marginal speedup. In this matter, please be considerate of your colleagues.

Do not request more wall time than you need

LoadLeveler (LL) uses backfill scheduling: if resources (cores, memory, or GPU devices) are reserved for a job that is still waiting for more resources, the scheduler can slip in a shorter job to use the currently idle resources. This increases the efficiency of the scheduling algorithm and lets you receive results sooner. Backfilling, however, relies on a realistic estimate of the job's wall time, which is specified with the wall_clock_limit keyword.
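One way to arrive at a realistic value is to time a representative run and add a modest safety margin. The helper below is a hypothetical convenience function, not part of LoadLeveler: it takes a measured runtime in seconds and an optional margin in percent (20 by default, an arbitrary choice) and prints the result in the hh:mm:ss form that wall_clock_limit accepts.

```shell
# Hypothetical helper: measured runtime in seconds (+ margin in percent) -> hh:mm:ss.
# The function body runs in a subshell so its variables do not leak.
wall_limit() (
  seconds=$1
  margin_pct=${2:-20}
  total=$(( seconds + seconds * margin_pct / 100 ))
  printf '%02d:%02d:%02d\n' \
    $(( total / 3600 )) $(( total % 3600 / 60 )) $(( total % 60 ))
)

# A test run took 3000 s; with the default 20% margin this prints 01:00:00.
wall_limit 3000

# The value would then go into the job command file:
#@ wall_clock_limit = 01:00:00
```

A limit that is only slightly above the real runtime gives the scheduler the best chance to backfill the job, at the cost of the job being terminated if it runs longer than expected.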

LL Classes

LL Commands

LL Job States

LL Keyword