Codes are executed by submitting a job script to a batch submission queue, which in turn schedules the jobs to run on the compute nodes of Gele. To submit a job to Gele, please use the following submission script and submission commands.
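#!/bin/bash
#PBS -l mppwidth=8
#PBS -l walltime=00:10:00
#PBS -V
# Echo each command (-x) and stop at the first failing command (-e)
set -ex
# Run from the directory the job was submitted from
cd "$PBS_O_WORKDIR"
# Launch 8 instances of the executable on the compute nodes
aprun -n 8 ./foo.exe
exit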
In this script we request 8 parallel tasks (mppwidth=8) for a maximum duration of 10 minutes (walltime=00:10:00). The -V option ensures that our environment settings are exported to the compute nodes for execution, and the cd command ensures that our execution command is invoked in the directory from which the job script was submitted.
The $PBS_O_WORKDIR environment variable always contains the path to the submission directory for the batch script.
To invoke an instance of the executable on the compute nodes, you must use the aprun command. You can run multiple instances of the same executable across multiple cores by including the -n parameter. In the sample batch script above, aprun -n 8 runs one instance of our foo executable on each of the 8 cores requested in the batch script resources.
Please ensure that your aprun command uses all the resources requested by the mppwidth parameter (for example, -n 8 with mppwidth=8); otherwise, valuable resources will be wasted.
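As a rough illustration of the point above, the following sketch compares the mppwidth request in a job script with the -n value given to aprun. The check_widths function is a hypothetical helper, not part of Gele's tooling, and it assumes the simple layout of the sample script: one "#PBS -l mppwidth=N" line and one aprun line that starts at the beginning of a line.

```shell
# Hypothetical helper (not part of Gele's tooling): compare the mppwidth
# request in a job script with the -n value passed to aprun.
check_widths() {
    job_script="$1"
    # First "#PBS -l mppwidth=N" directive in the script
    width=$(sed -n 's/^#PBS -l mppwidth=\([0-9][0-9]*\).*/\1/p' "$job_script" | head -n 1)
    # First "aprun ... -n N" launch line in the script
    tasks=$(sed -n 's/^aprun .*-n \([0-9][0-9]*\).*/\1/p' "$job_script" | head -n 1)
    if [ -z "$width" ] || [ -z "$tasks" ]; then
        echo "could not find both mppwidth and aprun -n in $job_script"
        return 2
    fi
    if [ "$width" = "$tasks" ]; then
        echo "ok: mppwidth and aprun -n both request $width tasks"
    else
        echo "mismatch: mppwidth=$width but aprun -n $tasks"
        return 1
    fi
}
```

Calling check_widths script before submission would flag a script that requests more tasks than it launches. Scripts that combine -n with other aprun placement options would need a more careful check than this one.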
You can modify most of the above settings if required. Please refer to the man pages (man qsub and man aprun) for further information.
To submit your job script to the batch queuing system, pass it to the qsub command:
qsub script
In this example the job script is named script.
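A minimal sketch of a submission that also keeps the returned job ID for later use is shown below. It assumes a PBS-style qsub is on the PATH (as the man qsub reference above suggests) and that the job script is named script; the variable names jobid and submit_msg are illustrative only.

```shell
# Sketch only: assumes a PBS-style qsub on PATH and a job script named "script".
if command -v qsub >/dev/null 2>&1; then
    jobid=$(qsub script)              # qsub prints the ID of the new job
    submit_msg="submitted job $jobid"
else
    submit_msg="qsub not found on this machine"
fi
echo "$submit_msg"
```

Keeping the ID in a variable makes it easier to query that specific job afterwards.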
When you submit your script successfully, the batch queuing system will issue you with a unique jobid for your submitted job. You can view the queuing and running status of your job by using the following commands:
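On a PBS-style batch system, job status is typically queried with qstat. The sketch below assumes the standard qstat client is available on the system; the exact commands and output format on Gele may differ, and the variable names are illustrative only.

```shell
# Sketch only: assumes the standard PBS qstat client is available.
me="${USER:-$(id -un)}"
if command -v qstat >/dev/null 2>&1; then
    qstat -u "$me"                    # list your own queued and running jobs
    status_msg="listed jobs for $me"
else
    status_msg="qstat not found on this machine"
fi
echo "$status_msg"
```

Passing a job ID to qstat instead of -u restricts the listing to that one job.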
After your job terminates, its stdout (standard output) and stderr (standard error) logs are written to your submission directory. The names of these logs are suffixed with the ID of your submitted job. Please review these logs carefully to determine whether your executable terminated successfully.
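To review both logs in one step, something like the following sketch could be used. The show_job_logs function is a hypothetical helper, and it assumes the usual PBS naming in which the logs end in .o and .e followed by the numeric job ID (for example script.o12345 and script.e12345); adjust the patterns if Gele names its logs differently.

```shell
# Hypothetical helper: print the stdout/stderr logs whose names end in the
# given job ID. Assumes PBS-style names such as <name>.o<id> and <name>.e<id>.
show_job_logs() {
    id="$1"
    dir="${2:-.}"                     # directory to search, default: current
    found=0
    for log in "$dir"/*.o"$id" "$dir"/*.e"$id"; do
        [ -f "$log" ] || continue     # skip patterns that matched nothing
        found=1
        echo "==> $log"
        cat "$log"
    done
    [ "$found" -eq 1 ] || echo "no logs found for job $id in $dir"
}
```

For example, show_job_logs 12345 run from the submission directory would print the standard output log followed by the standard error log for that job.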