Frequently Asked Questions

My Job Is Aborted At Runtime Due To Excessive Disk-Swapping
If your job is aborted at runtime due to disk-swapping, this typically indicates that your simulation is starved of RAM. A general rule of thumb is to request no more than 3.2 GB of RAM for every core your simulation uses; for example, if your simulation requires 8 GB of RAM in total, request at least 3 cores (3 × 3.2 GB ≈ 9.6 GB) and specify the memory explicitly in your Slurm job submission script.
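
The exact directives depend on how Slurm is configured at your site, but a minimal submission script along the following lines shows the idea; the job name, time limit, and executable are placeholders:

    #!/bin/bash
    #SBATCH --job-name=my_simulation    # placeholder job name
    #SBATCH --ntasks=1
    #SBATCH --cpus-per-task=3           # 3 cores keeps an 8 GB job within the ~3.2 GB-per-core guideline
    #SBATCH --mem=8G                    # total RAM for the job (use --mem-per-cpu to request it per core instead)
    #SBATCH --time=01:00:00             # placeholder wall-time limit

    srun ./my_simulation                # placeholder executable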

If you are unsure how much RAM your job will require, you can use the Ganglia Memory Monitoring tool to see how much RAM your job is using. There should not be a problem as long as your job's "Actual memory" graph never reaches the "Total in-core memory" graph. If it does, the machine has run out of RAM for your job and the system will start using the hard disk to read and write data (swapping). This can cause your job to take much longer than expected and, if it reads and writes to disk excessively, can crash the node it is running on.
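
If Ganglia is not available, or you want to check a job after it has finished, Slurm's accounting tools (where accounting is enabled on the cluster) can report the peak memory a job actually used, which helps you size the request for future runs; the job ID below is a placeholder:

    sacct -j 123456 --format=JobID,Elapsed,MaxRSS,State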

The HPE DSI also provides some large-memory nodes (512 GB or 1 TB of RAM) which can be used for particularly demanding simulations.
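
Access to these nodes is usually controlled through a dedicated partition or feature constraint; the partition name below is purely illustrative, so check the local documentation for the correct one:

    #SBATCH --partition=highmem         # hypothetical large-memory partition name
    #SBATCH --mem=500G                  # request most of a 512 GB node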

 