
(v2.3.0.0) Final Validation with Slurm Job

Finalize Integration with Slurm

  1. Edit resume_xspot.sh and add a sed command for every pool:

    • user_data=$(cat /opt/slurm/etc/xvm16-_user_data | base64 -w 0)
    • becomes:

    • user_data=$(cat /opt/slurm/etc/exostellar/xvm16-_user_data | sed "s/XSPOT_NODENAME/$host/g" | base64 -w 0)

      N.B.: The cat command reads the user_data script from the ${SLURM_CONF_DIR}/exostellar directory.

  2. Edit your slurm.conf and add an include statement to pick up xspot.slurm.conf. Replace ${SLURM_CONF_DIR} with the path to the Slurm configuration directory:

    1. include ${SLURM_CONF_DIR}/exostellar/xspot.slurm.conf
  3. Verify that the ResumeProgram and SuspendProgram entries in xspot.slurm.conf point to ${SLURM_CONF_DIR}/exostellar/resume_xspot.py and ${SLURM_CONF_DIR}/exostellar/suspend_xspot.py.

  4. Introducing new nodes into a Slurm cluster requires a restart of the Slurm control daemon:

    1. systemctl restart slurmctld
  5. The integration steps are now complete. Submitting a job to the new partition is the final validation:

    1. As a cluster user, navigate to a valid job submission directory and launch a job as normal, but be sure to specify the new partition:

      1. sbatch -p NewPartitionName < job-script.sh
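
Steps 1 through 3 can be sanity-checked before restarting slurmctld. The following is a minimal sketch, not part of the Exostellar tooling: the helper names render_user_data and check_xspot_conf are hypothetical, and it assumes GNU sed, grep, and base64 (for -w 0) plus the directory layout described above. The first helper reproduces the sed pipeline from step 1 so the substitution can be inspected; the second checks only that the include line and the two program entries are present and reference the exostellar directory.

```shell
#!/usr/bin/env bash
# Hypothetical pre-restart checks; helper names are illustrative only.

# Step 1: replace the XSPOT_NODENAME placeholder with a host name and
# base64-encode the result, mirroring the pipeline in resume_xspot.sh.
# Assumes the host name contains no "/" (it is used inside a sed pattern).
render_user_data() {
    local template="$1" host="$2"
    sed "s/XSPOT_NODENAME/$host/g" "$template" | base64 -w 0
}

# Steps 2-3: confirm that slurm.conf includes exostellar/xspot.slurm.conf and
# that xspot.slurm.conf sets ResumeProgram and SuspendProgram to paths under
# an exostellar/ directory. Takes the Slurm configuration directory as $1.
check_xspot_conf() {
    local conf_dir="$1" key
    grep -Eiq '^[[:space:]]*include[[:space:]]+.*exostellar/xspot\.slurm\.conf' \
        "$conf_dir/slurm.conf" || {
        echo "include line for xspot.slurm.conf not found in $conf_dir/slurm.conf" >&2
        return 1
    }
    for key in ResumeProgram SuspendProgram; do
        grep -Eiq "^[[:space:]]*${key}[[:space:]]*=.*exostellar/" \
            "$conf_dir/exostellar/xspot.slurm.conf" || {
            echo "$key does not reference exostellar/ in xspot.slurm.conf" >&2
            return 1
        }
    done
}
```

After sourcing the file, a possible use is to run check_xspot_conf with the Slurm configuration directory as its argument and restart slurmctld only when it returns success. Piping the output of render_user_data through base64 -d shows the user_data script exactly as it would be rendered for a given node name.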