v2.3.0.0 Example User-Data Slurm

Example User Data Scripts

Whether you’ve rolled your own Slurm cluster or you’re relying on AWS ParallelCluster (APC), you might find these example user-data scripts helpful.

APC Example with a Base CentOS 7 AMI

Longer user_data.sh

#!/bin/bash

set -x

hostname XSPOT_NODENAME

#echo 'root:AAAAAAA' |chpasswd
#sed -i 's/PasswordAuthentication no/PasswordAuthentication yes/g' /etc/ssh/sshd_config
#sed -i 's/UsePAM yes/UsePAM no/g' /etc/ssh/sshd_config
#sed -i 's/#PermitRootLogin yes/PermitRootLogin yes/g' /etc/ssh/sshd_config
#echo 'ssh-rsa AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA root@apchead' >> /root/.ssh/authorized_keys
#systemctl restart sshd

APCHEAD=172.31.60.46

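# Mount the NFS directories exported by the APC head node
mkdir -p /home /opt/parallelcluster/shared /opt/intel /opt/slurm
for i in /home /opt/parallelcluster/shared /opt/intel /opt/slurm ; do
	echo "Mounting ${APCHEAD}:${i} ${i}"
	# Report success only if the mount command actually succeeded
	if mount -t nfs "${APCHEAD}:${i}" "${i}"; then
		echo "Mounting ${APCHEAD}:${i} ${i} : SUCCESS."
	else
		echo "Mounting ${APCHEAD}:${i} ${i} : FAILED." >&2
	fi
done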

# Mount EFS (-p avoids an error if the mount point already exists)
mkdir -p /exoefs
echo 'fs-AAAAAAAAAAAAAAAAAA.efs.us-east-1.amazonaws.com:/ /exoefs nfs4 nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=30,retrans=2,noresvport,_netdev 0 0' >> /etc/fstab
mount -a

# Add the exo group, the krs user, and the slurm (UID/GID 401) and munge (UID/GID 402) users
groupadd -g 899 exo
useradd -u 1001 -g 899 krs
groupadd -g 401 slurm
groupadd -g 402 munge
useradd -g 401 -u 401 slurm
useradd -g 402 -u 402 munge

rpm -ivh /opt/parallelcluster/shared/munge/x86_64/munge-0.5.14-1.el7.x86_64.rpm
cp -p /opt/parallelcluster/shared/munge/munge.key /etc/munge/
chown munge:munge /etc/munge /var/log/munge
mkdir -p /var/spool/slurmd
chown slurm:slurm /var/spool/slurmd
sleep 5
# Start munge; retry once after a pause if the first attempt fails
if ! systemctl start munge; then
	sleep 10
	systemctl start munge
fi

SLURM_BIN_PATH=/opt/slurm/bin
SLURM_SBIN_PATH=/opt/slurm/sbin
SLURM_CONF_DIR=/opt/slurm/etc

# Register this node's first reported IP address with the Slurm controller
${SLURM_BIN_PATH}/scontrol update nodename=XSPOT_NODENAME nodeaddr="$(hostname -I | cut -d" " -f1)"

#systemctl start slurmd
${SLURM_SBIN_PATH}/slurmd -f ${SLURM_CONF_DIR}/slurm.conf -N XSPOT_NODENAME
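
The script above starts munge, waits, and tries once more if the first attempt fails. That pattern could be generalized with a small helper along the following lines. This is only a sketch: the function name retry and its arguments are hypothetical and not part of the original script, and the attempt counts and delays shown in the comments are illustrative rather than recommended values.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original script): run a command until
# it succeeds, up to a maximum number of attempts, pausing between attempts.
# Usage: retry <max_attempts> <delay_seconds> <command> [args...]
retry() {
	local max_attempts=$1 delay=$2 attempt=1
	shift 2
	until "$@"; do
		if (( attempt >= max_attempts )); then
			echo "retry: '$*' failed after ${attempt} attempts" >&2
			return 1
		fi
		attempt=$(( attempt + 1 ))
		sleep "$delay"
	done
}

# Possible uses inside user_data.sh (commented out because they need a live node):
# retry 3 10 systemctl start munge
# retry 3 5 mount -t nfs "${APCHEAD}:/opt/slurm" /opt/slurm
```

A helper like this returns non-zero once the attempts are exhausted, so the calling script can decide whether to continue or stop.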

Note: Capturing an AMI from a running compute node booted by AWS ParallelCluster (APC) can be a very tedious task. Because of the complexities inherent in how APC manages its resources, APC nodes are not an efficient route to an AMI for parsing. It is usually quicker to start with a base image that contains no APC dependencies and then add those dependencies via a user_data.sh, as above.

Example with a non-APC-compute-node-based AMI

Simple user_data.sh

#!/bin/bash

set -x

hostname XSPOT_NODENAME

SLURM_BIN_PATH=/usr/bin

# Register this node's first reported IP address with the Slurm controller
${SLURM_BIN_PATH}/scontrol update nodename=XSPOT_NODENAME nodeaddr="$(hostname -I | cut -d" " -f1)"

systemctl start slurmd
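
Both scripts take the first field of hostname -I as the node address. The hostname man page warns against assuming any order in that output, so on a node with more than one interface it may be worth checking what was picked. The sketch below selects the first IPv4-looking entry and fails if there is none. The function name first_ipv4 and the variable NODE_ADDR are hypothetical, and the pattern check is deliberately loose (it does not validate octet ranges).

```shell
#!/bin/bash
# Hypothetical helper: print the first IPv4-looking address from a
# space-separated list such as the output of `hostname -I`.
# Returns non-zero if the list contains no such address.
first_ipv4() {
	local addr
	for addr in $1; do
		if [[ $addr =~ ^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$ ]]; then
			echo "$addr"
			return 0
		fi
	done
	return 1
}

# Possible use in user_data.sh (commented out because it needs a live node):
# NODE_ADDR=$(first_ipv4 "$(hostname -I)") || { echo "no IPv4 address found" >&2; exit 1; }
# ${SLURM_BIN_PATH}/scontrol update nodename=XSPOT_NODENAME nodeaddr="${NODE_ADDR}"
```

Failing early when no address is found avoids registering the node with an empty nodeaddr.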