To ensure optimal setup of Infrastructure Optimizer, please make a note of the following information that will be used during installation and integration:
Slurm Installation
SLURM_CONF_DIR : directory where slurm.conf is located
SLURM_BIN_DIR : directory where Slurm’s binaries are located, usually on users' PATH
Exostellar Management Server Information
MGMT_SERVER_IP : the internal (private) IP address of the Management Server, which can be found in the CloudFormation Outputs tab.
Facilitating Commands
Variables can be exported to facilitate the copy/paste commands in the next sections of this guide, either directly in the shell or by sourcing an arbitrary file that contains the exports, for example: . /root/facilitate or source /root/facilitate.
export MGMT_SERVER_IP=173.31.23.23
export SLURM_CONF_DIR=/opt/slurm/etc
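As a minimal sketch of the file-based approach, the exports can be written to a small file once and then sourced in each new shell. The FACILITATE_FILE variable and its default location are assumptions made for illustration (the default resolves to /root/facilitate when running as root), and the values are only the example values shown above; substitute the ones for your environment.

```shell
# Hypothetical helper file location; the guide's example path is /root/facilitate.
FACILITATE_FILE="${FACILITATE_FILE:-${HOME:-/tmp}/facilitate}"

# Write the exports once. These are the example values from this guide;
# replace them with the values for your cluster.
cat > "$FACILITATE_FILE" <<'EOF'
export MGMT_SERVER_IP=173.31.23.23
export SLURM_CONF_DIR=/opt/slurm/etc
EOF

# Load the variables into the current shell.
. "$FACILITATE_FILE"

# Confirm that the variables are set before copying later commands.
echo "MGMT_SERVER_IP=${MGMT_SERVER_IP}"
echo "SLURM_CONF_DIR=${SLURM_CONF_DIR}"
```

Sourcing the file again in a new session restores the same variables, so later copy/paste commands keep working after a logout.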
Compute Environment
To prepare the nested compute node, we need to process an AMI. If you have an existing AMI for your compute environment, we can leverage that. If an AMI needs to be created from an existing compute node, the following steps walk through that process.
The key concept to keep in mind is that a good AMI should boot fast and do little to no work at startup, for example in bootstrapping or user_data. It should also have everything required to run the workflows and authenticate users.
To create an AMI from a Slurm compute node:
Allocate a compute node:
salloc -N 1 -J ami-creation --no-shell --exclusive --nodelist=<NODENAME>
When the job is allocated, gather some information on the node running the job:
The salloc command above should have output a JOB_ID.
The squeue command should show the JOB_ID running on a particular node.
Issue the following command to capture information about that node:
scontrol show node <NODENAME>
Look for the NodeAddr= field in the output to find the Private IPv4 Address of the node running the ami-creation job.
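The NodeAddr= lookup can also be scripted. The following is a rough sketch: node_addr is a hypothetical helper name, and the sample text is a made-up fragment used only so the example runs without a cluster. Real scontrol output contains many more fields.

```shell
# Hypothetical helper: read `scontrol show node` output on stdin and
# print the value of the first NodeAddr= field.
node_addr() {
    grep -o 'NodeAddr=[^[:space:]]*' | head -n 1 | cut -d= -f2
}

# Usage on a cluster (NODENAME is the node reported by squeue):
#   scontrol show node "$NODENAME" | node_addr

# Self-contained demonstration with an illustrative, made-up output fragment.
sample='NodeName=compute-1 Arch=x86_64 CoresPerSocket=1
   NodeAddr=10.0.0.5 NodeHostName=compute-1'
NODE_ADDR=$(printf '%s\n' "$sample" | node_addr)
echo "$NODE_ADDR"
```

The extracted value is the address to search for on the EC2 Instances page.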
Navigate to the EC2 Instances page in the AWS console and search for the Private IPv4 Address.
Select that instance and choose “Create an image” from the Actions menu in the upper right corner.
Give the AMI a unique name, optionally a description, and accept the default values provided by the prompts. AMI creation takes several minutes to complete.
NOTE : this action will reboot the node and kill our job, which is expected.
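For readers who prefer the command line, the console steps above can be approximated with the AWS CLI. This is a sketch under assumptions rather than part of the documented procedure: it assumes the AWS CLI is installed and configured with permissions for EC2, and the run, find_instance_id, and create_ami helper names, the IP address, the instance ID, and the AMI name are all hypothetical. DRY_RUN defaults to 1, so the commands are only printed until you opt in.

```shell
# Hypothetical helper: print the command (dry run) or execute it.
# DRY_RUN defaults to 1 so that nothing is changed until you opt in.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Look up the instance ID that owns a given private IPv4 address.
find_instance_id() {
    run aws ec2 describe-instances \
        --filters "Name=private-ip-address,Values=$1" \
        --query 'Reservations[].Instances[].InstanceId' \
        --output text
}

# Create an AMI from that instance. Without --no-reboot, create-image
# reboots the instance, which matches the NOTE above.
create_ami() {
    run aws ec2 create-image --instance-id "$1" --name "$2"
}

# Illustrative values only; replace with your node's address and instance ID.
find_instance_id 10.0.0.5
create_ami i-0123456789abcdef0 compute-node-ami
```

Setting DRY_RUN=0 before calling the functions executes the commands instead of printing them.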