Need help brainstorming how to best submit 100,000 jobs

Hi, I need to submit 100,000 runs of a program called fastsimcoal. Each run is relatively memory-light, but can take anywhere from a few minutes to several hours to complete. Since I have so many runs, it would be ideal to submit and run as many as possible at one time. I’m worried about overloading the queueing system and don’t want to break anything. If anyone has any idea how I could accomplish this, that would be helpful.

This sounds like a good use for a Slurm Array Job.

You can submit array jobs in chunks of 2500 tasks at a time (i.e. --array=1-2500, then --array=2501-5000, and so on). Within each task, the SLURM_ARRAY_TASK_ID environment variable is set and can be used to select the input for fastsimcoal.
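As a sketch, an array submission script built around SLURM_ARRAY_TASK_ID could look like the following. The input_list.txt mapping and the fastsimcoal flags are assumptions — adapt them to how your runs actually take input:

```shell
#!/bin/bash
#SBATCH --array=1-2500
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=2
#SBATCH --mem=1G
#SBATCH --time=0-08:00:00

# Pick the Nth line of a pre-built list of input files, where N is this
# task's ID. "input_list.txt" is an assumed name: one input file per line.
FILENAME=$(sed -n "${SLURM_ARRAY_TASK_ID}p" input_list.txt)

# (Made-up fastsimcoal invocation; adapt to how it really takes input.)
fastsimcoal --input "$FILENAME"
```

Submit the next chunk of 2500 with sbatch --array=2501-5000 once the first batch is moving through the queue.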

I’m not personally sure what fastsimcoal takes as input, but if it is a handful of files that isn’t easily adapted to the SLURM_ARRAY_TASK_ID variable, then another option could be to use a bash script to mass-submit jobs (maybe only a couple thousand at a time, so as not to completely overwhelm the queue). An example of this could be the following two files.

#!/bin/bash
# job.sh (filename assumed) -- runs one fastsimcoal job; takes the input file as $1
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=2
#SBATCH --mem=1G
#SBATCH --time=0-08:00:00     # 8 hours
#SBATCH --partition=epyc      # epyc, intel, or batch for long-running jobs >2 hours

FILENAME=$1    # $1 is the first argument passed to the bash script

# (I'm making up how fastsimcoal takes input, adapt to reality)
fastsimcoal --input "$FILENAME"


# This assumes there is a folder called "submit" that contains the files
# that fastsimcoal takes as input, and that the script above is saved as
# job.sh (use its real filename)
for FILENAME in ./submit/*; do
    sbatch job.sh "${FILENAME}"
done
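If you want the loop itself to enforce the "only a couple thousand at a time" limit, you can have it pause whenever your queued-job count hits a cap. This is just a sketch: "job.sh" is an assumed name for the Slurm script above, and the cap of 2000 is arbitrary.

```shell
#!/bin/bash
# Throttled submitter: waits whenever we already have MAX_QUEUED jobs
# pending or running, then submits the next input file.

submit_throttled() {
    local max_queued=${MAX_QUEUED:-2000}
    local filename
    for filename in "$@"; do
        # squeue -h -u $USER prints one line per job we have in the queue
        while [ "$(squeue -h -u "$USER" | wc -l)" -ge "$max_queued" ]; do
            sleep 60    # queue is full; check again in a minute
        done
        sbatch job.sh "$filename"
    done
}

# usage: submit_throttled ./submit/*
```

You can then leave this running in a screen/tmux session and it will drip-feed all 100,000 jobs into the queue without flooding it.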

If it’s something more advanced than that, let me know and I can see if I can adapt something 🙂

This should work, I think. Let me try it and if I need something more nuanced, I’ll let you know. Thank you!