WrapSlurm is a powerful and user-friendly wrapper for SLURM job management, designed to simplify job submission, resource querying, log monitoring, and cancellation in SLURM environments. With a suite of commands like wrun, wlog, wqueue, winfo, and wk, WrapSlurm enhances productivity for researchers and engineers working in high-performance computing (HPC) clusters.
- Simplified Job Submission (`wr`):
  - Automatically detect optimal resources (nodes, partitions, CPUs, memory, GPUs) based on the cluster's configuration.
  - Friendly summaries before each run highlight auto-detected values and log locations.
  - Persist preferred defaults (e.g., partition, account, log directory) with `--save-defaults`.
  - Automatically use the partition's maximum runtime when no explicit `--time` is provided.
  - Support for interactive and non-interactive SLURM jobs, plus a convenient `--dry-run` preview mode.
  - Customizable SLURM settings like time, tasks per node, exclusions, job names, and output directories.
- Log Monitoring (`wl`):
  - Watch real-time SLURM logs for specific job IDs or the latest job.
- Job Cancellation (`wk`):
  - Quickly terminate jobs (optionally with a signal) using a friendly wrapper around `scancel`.
- Queue Visualization (`wq`):
  - View and analyze job queues in a prettified table format with color-coded states.
- Node Resource Querying (`wi`):
  - Display detailed SLURM node information, including memory, CPU, and GPU usage.
- Help / Usage (`ws`):
  - Display a summary of all WrapSlurm commands and their usage.

WrapSlurm is available on PyPI and can be installed using pip:

    pip install wrapslurm

If the scripts wrun, wlog, wqueue, winfo, and wk are installed in a directory not included in your system's PATH (e.g., `~/.local/bin`), you may need to update your PATH environment variable:

- Add the following line to your shell configuration file (`~/.bashrc` or `~/.zshrc`):

      export PATH="$PATH:$HOME/.local/bin"

- Reload your shell:

      source ~/.bashrc  # or source ~/.zshrc

Submit a script with auto-detected resources:

    wr ./train_script.py --epochs 10

`wr` now shows a colorized summary of the resources that will be requested, including values auto-detected from `sinfo` and those loaded from saved defaults.

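To illustrate how a partition's maximum runtime could be read from `sinfo`, here is a minimal sketch. It is not WrapSlurm's actual implementation: the function names `parse_time_limits` and `query_time_limits` are hypothetical, and the sketch assumes the `sinfo` format string `"%P %l"` (partition name and time limit). The sample text stands in for real cluster output so the parsing step runs without SLURM.

```python
import subprocess


def parse_time_limits(sinfo_output: str) -> dict:
    """Map partition name -> time limit from `sinfo -h -o "%P %l"` output."""
    limits = {}
    for line in sinfo_output.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or unexpected lines
        partition, limit = parts
        # sinfo marks the default partition with a trailing "*"
        limits[partition.rstrip("*")] = limit
    return limits


def query_time_limits() -> dict:
    """Run sinfo and parse its output (only works on a SLURM cluster)."""
    result = subprocess.run(
        ["sinfo", "-h", "-o", "%P %l"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_time_limits(result.stdout)


# Hypothetical sample output; real partition names and limits will differ.
sample = "debug* 30:00\ngp4d 4-00:00:00\n"
print(parse_time_limits(sample))
```

A value found this way could then be used as the `--time` request when none is given on the command line.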
Submit a job with explicit resources:

    wr --nodes 2 --partition gp4d --account ENT212162 --cpus-per-task 8 --memory 200G --gpus 4 ./train_script.py

You can also name the job, change where helper scripts are stored, or choose a custom log directory:

    wr --job-name my-training --script-dir ./sbatch --report-dir ./logs python train.py

Start an interactive session:

    wr

Use `wr --interactive --nodes 2` to override the automatic detection while still launching an interactive shell.

You can persist frequently used settings (e.g., partition, account, log directory) so future runs pick them up automatically:

    wr --save-defaults --partition gp4d --account ENT212162 --report-dir ./slurm-report

Defaults are stored in `~/.config/wrapslurm/defaults.json`. Running `wr --save-defaults` stores the provided flags and exits without submitting a job.

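The persistence described above can be pictured as a small JSON round trip. The sketch below is an assumption about how such a mechanism might work, not WrapSlurm's code: the helper names (`load_defaults`, `save_defaults`, `resolve_options`), the flat key/value file layout, and the rule that explicit flags override saved values are all hypothetical. The demo writes to a temporary directory so it never touches a real `defaults.json`.

```python
import json
import tempfile
from pathlib import Path

# Location taken from the text above; the flat JSON layout is an assumption.
DEFAULTS_PATH = Path.home() / ".config" / "wrapslurm" / "defaults.json"


def load_defaults(path: Path = DEFAULTS_PATH) -> dict:
    """Return saved defaults, or an empty dict if none were saved yet."""
    if not path.exists():
        return {}
    return json.loads(path.read_text())


def save_defaults(values: dict, path: Path = DEFAULTS_PATH) -> None:
    """Merge the provided (non-None) values into the saved defaults."""
    path.parent.mkdir(parents=True, exist_ok=True)
    merged = load_defaults(path)
    merged.update({k: v for k, v in values.items() if v is not None})
    path.write_text(json.dumps(merged, indent=2))


def resolve_options(cli_args: dict, path: Path = DEFAULTS_PATH) -> dict:
    """Saved defaults first, then explicit command-line values on top."""
    options = load_defaults(path)
    options.update({k: v for k, v in cli_args.items() if v is not None})
    return options


# Demo against a throwaway file instead of the real config location.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "defaults.json"
    save_defaults({"partition": "gp4d", "account": "ENT212162"}, demo_path)
    print(resolve_options({"partition": None, "nodes": 2}, demo_path))
```

Treating `None` as "flag not given" keeps a saved default in place unless the user overrides it explicitly.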
View all available options:

    wr --help

Preview a job without submitting it:

    wr --dry-run python train.py

Dry runs print the exact sbatch script so you can review the environment setup before submitting.

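To give a feel for what a dry run might print, here is a minimal sketch that renders an sbatch script from a dictionary of options. It is an illustration only: `render_sbatch_script` is a hypothetical helper, and the script WrapSlurm actually generates may contain additional environment setup. The `#SBATCH` option names used in the demo (`--job-name`, `--nodes`, `--partition`, `--output`, `--error`) are standard sbatch options.

```python
import shlex


def render_sbatch_script(options: dict, command: list) -> str:
    """Render a batch script with one #SBATCH directive per option."""
    lines = ["#!/bin/bash"]
    for flag, value in options.items():
        if value is not None:  # omit options that were not set
            lines.append(f"#SBATCH --{flag}={value}")
    lines.append("")
    # Quote the user command so arguments with spaces survive intact.
    lines.append(shlex.join(command))
    return "\n".join(lines) + "\n"


script = render_sbatch_script(
    {
        "job-name": "my-training",
        "nodes": 1,
        "partition": "gp4d",
        "output": "./slurm-report/%j.out",
        "error": "./slurm-report/%j.err",
    },
    ["python", "train.py"],
)
print(script)
```

Printing the rendered text instead of passing it to `sbatch` is all a preview mode needs to do in this sketch.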
wlog streams SLURM output with `tail -n 20 -f` so you can follow job progress without the extra load from `watch`.
Logs are written to `./slurm-report/%j.out` and `./slurm-report/%j.err` by default.

Follow the latest job, or a specific job by ID:

    wl
    wl --job-id 12345678

To inspect stderr instead, open `./slurm-report/12345678.err` with your preferred tool.

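The log-following behavior can be sketched in a few lines. This is not WrapSlurm's implementation: `latest_log` and `follow` are hypothetical names, and the sketch assumes that "latest job" means the `.out` file with the highest numeric job ID in the report directory (the real tool may use a different rule, such as modification time).

```python
import subprocess
import tempfile
from pathlib import Path
from typing import Optional


def latest_log(report_dir: str = "./slurm-report") -> Optional[Path]:
    """Return the .out file with the highest numeric job ID, if any."""
    logs = [p for p in Path(report_dir).glob("*.out") if p.stem.isdigit()]
    if not logs:
        return None
    return max(logs, key=lambda p: int(p.stem))


def follow(path: Path) -> None:
    """Stream the log with tail, as described above (blocks until Ctrl-C)."""
    subprocess.run(["tail", "-n", "20", "-f", str(path)])


# Demo on a throwaway directory; follow() is not called here because it blocks.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("99.out", "100.out", "100.err"):
        (Path(tmp) / name).touch()
    print(latest_log(tmp).name)
```

Comparing job IDs as integers rather than strings matters here: as text, `99.out` would sort after `100.out`.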
Send `scancel` commands without memorizing flags:

    wk 12345678

Cancel multiple jobs in one go:

    wk 12345678 12345679

Pass through additional options such as a signal or user scope:

    wk 12345678 --signal SIGINT
    wk --user alice 12345680

All options are forwarded to `scancel`, so you can combine them as needed.

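Forwarding every argument to `scancel` is a thin pass-through, which the following sketch illustrates. The function names `build_scancel_command` and `cancel`, and the `dry_run` parameter, are hypothetical and exist only for this example; they are not part of WrapSlurm's interface.

```python
import shlex
import subprocess


def build_scancel_command(args: list) -> list:
    """Prefix the user's arguments with scancel, leaving them untouched."""
    return ["scancel", *args]


def cancel(args: list, dry_run: bool = False) -> list:
    """Run scancel with the given arguments, or just print the command."""
    cmd = build_scancel_command(args)
    if dry_run:
        print(shlex.join(cmd))  # show what would be executed
    else:
        subprocess.run(cmd, check=True)  # requires scancel on PATH
    return cmd


# Print the command only, so the demo runs without a SLURM cluster.
cancel(["12345678", "--signal", "SIGINT"], dry_run=True)
```

Because the arguments are passed as a list rather than a shell string, job IDs and options reach `scancel` exactly as typed.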
Display the job queue in a table format:

    wqueue

Query node resources with winfo:

    winfo
    winfo --include-down
    winfo --graph

- Query available resources:

      wi

- Submit a job:

      wr --account xxxxxx --time 2-00:00:00 ./train_script.py

- Monitor job logs:

      wl

- Check the queue:

      wq

Clone the repository:

    git clone https://github.com/yourusername/wrapslurm.git
    cd wrapslurm

Install the required Python packages:

    pip install -r requirements.txt

Execute unit tests:

    pytest

We welcome contributions! Please follow these steps:

- Fork the repository.
- Create a feature branch:

      git checkout -b feature-name

- Commit your changes:

      git commit -m "Add feature-name"

- Push to your fork:

      git push origin feature-name

- Submit a pull request.

This project is licensed under the MIT License.
Special thanks to the SLURM community for making HPC resource management accessible to researchers worldwide.