(v2.3.0.0) Customizing the Slurm Application Environment

Edit the Slurm Environment JSON for Your Purposes:

  1. Copy default-slurm-env.json to something convenient like env0.json.

    1. cp default-slurm-env.json env0.json
  2. Note: the line numbers referenced below correspond to the example file above. Once you begin making changes on your system, the line numbers may shift. A rough sketch of the file's shape follows for orientation.
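
    The sketch below is only an approximation: the field names are the ones discussed in this document, and the values slurm and az1 come from this document, but the Pools array name, the exact nesting, and all other sample values are illustrative assumptions. Consult your actual default-slurm-env.json for the authoritative layout.

      {
          "EnvName": "slurm",
          "Pools": [
              {
                  "PoolName": "vm",
                  "PoolSize": 10,
                  "ProfileName": "az1",
                  "CPUs": 4,
                  "ImageName": "slurm-compute-ami",
                  "MaxMemory": 16384,
                  "MinMemory": 0,
                  "UserData": "<Base64EncodedString>",
                  "VolumeSize": 0
              }
          ],
          "BinPath": "/opt/slurm/bin",
          "ConfPath": "/opt/slurm/etc",
          "PartitionName": "normal"
      }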

  3. Line 2: "EnvName" is set to slurm by default, but you can specify something unique if needed.

    1. NOTE: Currently, - characters are not supported in values for EnvName.

  4. Lines 5-20 can be modified for a single pool of identical compute resources, or they can be duplicated and then modified for each “hardware” configuration or “pool” you choose. When duplicating, be sure to add a comma after the closing brace on line 17 of every pool declaration except the last one (see the two-pool sketch after this list).

    1. PoolName: This sets the apparent hostnames of the compute resources provided for slurm.

      1. It is recommended that all pools share a common trunk or base in each PoolName.
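
        For example (hypothetical names), PoolName values of vm-small and vm-large share the common trunk vm-, which keeps related hostnames grouped together.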

    2. PoolSize: This is the maximum number of these compute resources.

    3. ProfileName: This is the default profile name, az1. If you change it, you will need to carry the change forward in subsequent steps.

    4. CPUs: This is the targeted CPU-core limit for this "hardware" configuration or pool.

    5. ImageName: This is tied to the AMI that will be used for your compute resources. This name will be used in subsequent steps.

    6. MaxMemory: This is the targeted memory limit for this "hardware" configuration or pool.

    7. MinMemory: reserved for future use; can be ignored currently.

    8. UserData: This string is a base64-encoded version of your user data script (user_data.sh in the examples below).

      1. To generate it:

        1. cat user_data.sh | base64 -w 0
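
          Note: -w 0 disables GNU base64's default line wrapping so the output is a single line, which is what the JSON value requires. BSD/macOS base64 does not accept -w; it produces unwrapped output by default.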

      2. To decode it:

        1. echo "<LongBase64EncodedString>" | base64 -d
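
        As a quick sanity check (a minimal sketch, assuming GNU coreutils), you can encode, decode, and diff against the original to confirm the string round-trips cleanly:

          base64 -w 0 user_data.sh | base64 -d | diff - user_data.sh && echo "round-trip OK"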

      3. It’s not required to be perfectly fine-tuned at this stage; it will be refined and corrected later.

      4. You may format user_data.sh in the usual ways. In both examples below, XSPOT_NODENAME appears to act as a placeholder token that is replaced with the actual node name when the compute resource is provisioned:

        1. Simple slurm example:

          #!/bin/bash
          set -x
          #export SLURM_BIN_DIR=/opt/slurm/bin
          export SLURM_BIN_DIR=/usr/bin
          hostname XSPOT_NODENAME
          ${SLURM_BIN_DIR}/scontrol update nodename=XSPOT_NODENAME nodeaddr=`hostname -I | cut -d" " -f1`
          systemctl start slurmd
        2. APC Example:

          #!/bin/bash
          set -x
          APCHEAD=XXX.XX.X.XXX #enter APC Head Node IP Address
          ######
          hostname XSPOT_NODENAME
          #For troubleshooting:
          #echo root:TroubleShooting | chpasswd
          #sed -i 's/PasswordAuthentication no/PasswordAuthentication yes/g' /etc/ssh/sshd_config
          #sed -i 's/UsePAM yes/UsePAM no/g' /etc/ssh/sshd_config
          sed -i 's/#PermitRootLogin yes/PermitRootLogin yes/g' /etc/ssh/sshd_config
          echo 'ssh-rsa 0101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010 root@APCHEAD' >> /root/.ssh/authorized_keys
          systemctl restart sshd
          # Mount shared filesystems from the APC head node
          mkdir -p /home /opt/parallelcluster/shared /opt/intel /opt/slurm
          for i in /home /opt/parallelcluster/shared /opt/intel /opt/slurm ; do
            echo Mounting ${APCHEAD}:${i} ${i}
            mount -t nfs ${APCHEAD}:${i} ${i}
            echo Mounting ${APCHEAD}:${i} ${i} : SUCCESS.
          done
          mkdir /exoniv
          echo 'fs-0553a8e956ccff4da.efs.us-east-1.amazonaws.com:/ /exoniv nfs4 nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=30,retrans=2,noresvport,_netdev 0 0' >> /etc/fstab
          mount -a
          #add local users, real users, and/or testing users
          groupadd -g 899 exo
          useradd -u 1001 -g 899 krs
          groupadd -g 401 slurm
          groupadd -g 402 munge
          useradd -g 401 -u 401 slurm
          useradd -g 402 -u 402 munge
          # Install and start munge using the shared munge key
          rpm -ivh /opt/parallelcluster/shared/munge/x86_64/munge-0.5.14-1.el7.x86_64.rpm
          cp -p /opt/parallelcluster/shared/munge/munge.key /etc/munge/
          chown munge.munge /etc/munge /var/log/munge
          mkdir -p /var/spool/slurmd
          chown slurm.slurm /var/spool/slurmd
          sleep 5
          systemctl start munge
          if [[ $? -ne 0 ]]; then
            sleep 10
            systemctl start munge
          fi
          # Register this node's address with the controller, then start slurmd
          SLURM_BIN_PATH=/opt/slurm/bin
          SLURM_SBIN_PATH=/opt/slurm/sbin
          SLURM_CONF_DIR=/opt/slurm/etc
          ${SLURM_BIN_PATH}/scontrol update nodename=XSPOT_NODENAME nodeaddr=`hostname -I | cut -d" " -f1`
          #systemctl start slurmd
          ${SLURM_SBIN_PATH}/slurmd -f ${SLURM_CONF_DIR}/slurm.conf -N XSPOT_NODENAME
    9. VolumeSize: reserved for future use; can be ignored currently.
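
    As referenced in step 4, here is a hedged sketch of two pool declarations. Note the comma after the closing brace of the first pool but not the last; the pool names and sizing values are illustrative assumptions:

      {
          "PoolName": "vm-small",
          "PoolSize": 10,
          "ProfileName": "az1",
          "CPUs": 4,
          "ImageName": "slurm-compute-ami",
          "MaxMemory": 16384,
          "MinMemory": 0,
          "UserData": "<Base64EncodedString>",
          "VolumeSize": 0
      },
      {
          "PoolName": "vm-large",
          "PoolSize": 5,
          "ProfileName": "az1",
          "CPUs": 16,
          "ImageName": "slurm-compute-ami",
          "MaxMemory": 65536,
          "MinMemory": 0,
          "UserData": "<Base64EncodedString>",
          "VolumeSize": 0
      }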

  5. Lines 24, 25, and 26 should be modified to match your slurm environment and your preferred partition name (see the excerpt after this list).

    1. BinPath: This is where scontrol, squeue, and other slurm binaries exist.

    2. ConfPath: This is where slurm.conf resides.

    3. PartitionName: This is for naming the new partition.
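
    A hedged sketch of those three fields (the BinPath and ConfPath values mirror the APC example above; the partition name is illustrative):

      "BinPath": "/opt/slurm/bin",
      "ConfPath": "/opt/slurm/etc",
      "PartitionName": "normal"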

  6. All other fields/lines in this asset can be ignored.
