(v2.3.0.0) Example User-Data Slurm

Example User Data Scripts

Whether you’ve rolled your own Slurm cluster or you’re relying on AWS Parallel Cluster (APC), you might find these examples helpful.

APC Example with a Base CentOS 7 AMI

#!/bin/bash set -x hostname XSPOT_NODENAME #echo 'root:AAAAAAA' |chpasswd #sed -i 's/PasswordAuthentication no/PasswordAuthentication yes/g' /etc/sshd/sshd_config #sed -i 's/UsePAM yes/UsePAM no/g' /etc/sshd/sshd_config #sed -i 's/#PermitRootLogin yes/PermitRootLogin yes/g' /etc/ssh/sshd_config #echo 'ssh-rsa AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA root@apchead' >> /root/.ssh/authorized_keys #systemctl restart sshd APCHEAD=172.31.60.46 #mounting APC NFS dirs mkdir -p /home /opt/parallelcluster/shared /opt/intel /opt/slurm for i in /home /opt/parallelcluster/shared /opt/intel /opt/slurm ; do echo Mounting ${APCHEAD}:${i} ${i} mount -t nfs ${APCHEAD}:${i} ${i} echo Mounting ${APCHEAD}:${i} ${i} : SUCCESS. done #mounting EFS mkdir /exoefs echo 'fs-AAAAAAAAAAAAAAAAAA.efs.us-east-1.amazonaws.com:/ /exoefs nfs4 nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=30,retrans=2,noresvport,_netdev 0 0' >> /etc/fstab mount -a #add krs, slurm401, munge402 users groupadd -g 899 exo useradd -u 1001 -g 899 krs groupadd -g 401 slurm groupadd -g 402 munge useradd -g 401 -u 401 slurm useradd -g 402 -u 402 munge rpm -ivh /opt/parallelcluster/shared/munge/x86_64/munge-0.5.14-1.el7.x86_64.rpm cp -p /opt/parallelcluster/shared/munge/munge.key /etc/munge/ chown munge.munge /etc/munge /var/log/munge mkdir -p /var/spool/slurmd chown slurm.slurm /var/spool/slurmd sleep 5 systemctl start munge if [[ $? 
-ne 0 ]]; then sleep 10 systemctl start munge fi SLURM_BIN_PATH=/opt/slurm/bin SLURM_SBIN_PATH=/opt/slurm/sbin SLURM_CONF_DIR=/opt/slurm/etc ${SLURM_BIN_PATH}/scontrol update nodename=XSPOT_NODENAME nodeaddr=`hostname -I | cut -d" " -f1` #systemctl start slurmd ${SLURM_SBIN_PATH}/slurmd -f ${SLURM_CONF_DIR}/slurm.conf -N XSPOT_NODENAME
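The script above starts munge, waits, and tries once more if the first attempt fails. If a single retry proves too optimistic on slow-booting instances, the same idea can be generalized. The following is a minimal sketch only: `retry` is a hypothetical helper that is not part of the original script, and the attempt count and delay are assumptions to tune for your environment.

```shell
#!/bin/bash
# retry ATTEMPTS DELAY CMD [ARGS...]
# Hypothetical helper (not in the original script): runs CMD until it
# succeeds or ATTEMPTS tries have been made, sleeping DELAY seconds
# between tries. Returns 0 on success, 1 if every attempt failed.
retry() {
    local attempts=$1 delay=$2
    shift 2
    local n=1
    until "$@"; do
        if (( n >= attempts )); then
            echo "giving up after ${n} attempts: $*" >&2
            return 1
        fi
        echo "attempt ${n} failed, retrying in ${delay}s: $*" >&2
        sleep "$delay"
        n=$(( n + 1 ))
    done
    return 0
}
```

With this helper defined near the top of the user-data script, the munge start could be written as `retry 3 10 systemctl start munge`, which makes up to three attempts, ten seconds apart.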

Note: Capturing an AMI from a running compute node booted by AWS Parallel Cluster (APC) can be tedious. Because of the complexity of how APC manages its resources, APC nodes are not an efficient starting point for generating an AMI for parsing. It is usually faster to start with a base image that has no APC dependencies and then add those dependencies via a user_data.sh script like the one above.
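When starting from a bare base image, it can help to confirm early that the tools the user-data script relies on are actually present, rather than discovering a missing one halfway through boot. The sketch below is an illustration under assumptions: `missing_commands` is a hypothetical helper, and the list of commands to check is yours to choose.

```shell
#!/bin/bash
# missing_commands CMD [CMD...]
# Hypothetical preflight helper (not in the original script): prints each
# command that is not found on PATH, one per line, and returns 1 if any
# are missing. Returns 0 and prints nothing when all are available.
missing_commands() {
    local cmd
    local missing=()
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || missing+=("$cmd")
    done
    if (( ${#missing[@]} > 0 )); then
        printf '%s\n' "${missing[@]}"
        return 1
    fi
    return 0
}
```

For the APC script above, a check such as `missing_commands mount rpm useradd groupadd systemctl || exit 1` covers the external commands it calls. It does not verify that NFS client support is installed, so that still needs a separate check on the base image.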

Example with a non-APC-compute-node-based AMI

#!/bin/bash
set -x

hostname XSPOT_NODENAME

SLURM_BIN_PATH=/usr/bin
${SLURM_BIN_PATH}/scontrol update nodename=XSPOT_NODENAME nodeaddr=$(hostname -I | cut -d" " -f1)
systemctl start slurmd
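Both scripts pass the first address reported by `hostname -I` to `scontrol update`. If the network is not up yet, that command can print nothing, and the node would then be registered with an empty `nodeaddr`. The following sketch guards against that case. `first_ip` is a hypothetical helper, not part of the original scripts, and it accepts an optional address list as an argument only to make it easy to try out.

```shell
#!/bin/bash
# first_ip [ADDRESS_LIST]
# Hypothetical helper (not in the original scripts): prints the first
# whitespace-separated address from ADDRESS_LIST, or from `hostname -I`
# when no argument is given. Returns 1 if no address is available.
first_ip() {
    local addrs ip
    addrs=${1-$(hostname -I)}
    ip=$(awk '{ print $1; exit }' <<< "$addrs")
    [[ -n $ip ]] || return 1
    printf '%s\n' "$ip"
}
```

One way to use it is `NODEADDR=$(first_ip) || exit 1`, followed by the `scontrol update` call with `nodeaddr=${NODEADDR}`. Whether to exit or to wait and try again on failure is a design choice that depends on how your instances boot.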