Integrating with Slurm

The following manual steps for EAR will be replaced with a simplified workflow for command-line users. Alternatively, the Management Console (Web UI) will be able to replace most of these steps.

Connect to Your Slurm Head Node

During Early Access, integration requires a handful of commands and root or sudo access on the Slurm controller, where slurmctld runs.

Get a shell on the head node and navigate to the slurm configuration directory, where slurm.conf resides.
1. ```
cd $SLURM_CONF_DIR
```

Make subdirectories here:

mkdir -p exostellar/json
cd exostellar/json

Pull Down the Default Slurm Environment Assets as a JSON Payload:

The packages for jq and curl are required:
1. ```
yum install jq curl
```
2. CentOS EoL: Because Red Hat has ended the life cycle of CentOS 7, you may need to take steps so that yum can still function. The following command is an example mitigation for an internet-dependent yum repository:
3. ```
sed -i -e 's|^mirrorlist=|#mirrorlist=|g' -e 's|^#baseurl=http://mirror.centos.org|baseurl=http://vault.centos.org|g' /etc/yum.repos.d/CentOS-*.repo
yum clean all
```

Pull down the default assets from the MGMT_SERVER for customization:

curl -X GET http://${MGMT_SERVER_IP}:5000/v1/env/slurm | jq . > default-slurm-env.json

The asset will look like:

default-slurm-env.json

{
  "EnvName": "slurm",
  "HeadAddress": "<HeadAddress>",
  "Pools": [
    {
      "PoolName": "xvm16-",
      "PoolSize": 10,
      "ProfileName": "az1",
      "VM": {
        "CPUs": 16,
        "ImageName": "ubuntu",
        "MaxMemory": 60000,
        "MinMemory": 4096,
        "UserData": "I2Nsb3VkLWNvbmZpZwpydW5jbWQ6CiAgLSBbc2gsIC1jLCAibWtkaXIgLXAgL3hjb21wdXRlIl0KICAtIFtzaCwgLWMsICJtb3VudCAxNzIuMzEuMjQuNToveGNvbXB1dGUgL3hjb21wdXRlIl0KICAtIFtzaCwgLWMsICJta2RpciAtcCAvaG9tZS9zbHVybSJdCiAgLSBbc2gsIC1jLCAibW91bnQgMTcyLjMxLjI0LjU6L2hvbWUvc2x1cm0gL2hvbWUvc2x1cm0iXQogIC0gW3NoLCAtYywgInJtIC1yZiAvZXRjL3NsdXJtIl0KICAtIFtzaCwgLWMsICJsbiAtcyAveGNvbXB1dGUvc2x1cm0vIC9ldGMvc2x1cm0iXQogIC0gW3NoLCAtYywgImNwIC94Y29tcHV0ZS9zbHVybS9tdW5nZS5rZXkgL2V0Yy9tdW5nZS9tdW5nZS5rZXkiXQogIC0gW3NoLCAtYywgInN5c3RlbWN0bCByZXN0YXJ0IG11bmdlIl0KICAjIEFMV0FZUyBMQVNUIQogIC0gWwogICAgICBzaCwKICAgICAgLWMsCiAgICAgICJlY2hvIFhTUE9UX05PREVOQU1FID4gL3Zhci9ydW4vbm9kZW5hbWU7IHNjb250cm9sIHVwZGF0ZSBub2RlbmFtZT1YU1BPVF9OT0RFTkFNRSBub2RlYWRkcj1gaG9zdG5hbWUgLUlgIiwKICAgIF0KCg==",
        "VolumeSize": 10
      }
    }
  ],
  "Security": {
    "User": "slurm",
    "UserKeyPem": ""
  },
  "Slurm": {
    "BinPath": "/bin",
    "ConfPath": "/etc/slurm",
    "PartitionName": "normal"
  },
  "Type": "slurm",
  "Id": "1f019d92-d356-42b4-ba2c-d65bea40474a"
}

Edit the Slurm Environment JSON for Your Purposes:

Copy default-slurm-env.json to something convenient like env0.json.
1. ```
cp default-slurm-env.json env0.json
```
Note: Line numbers listed below reference the above example file. Once changes start being made on the system, the line numbers may change.
Line 2 : "EnvName" is set to slurm by default, but you can specify something unique if needed.
Lines 5-17 can be modified for a single pool of identical compute resources or they can be duplicated and then modified for each “hardware” configuration or “pool” you choose. When duplicating, be sure to add a comma after the brace on line 17, except when it is the last brace, or the final pool declaration.
1. PoolName: This will be the apparent hostnames of the compute resources provided for slurm.
  1. It is recommended that all pools share a common trunk or base in each PoolName.
2. PoolSize: This is the maximum number of these compute resources.
3. ProfileName: This is the default profile name, az1. If this is changed, you will need to carry the change forward.
4. CPUs: This is the targeted CPU-core limit for this "hardware" configuration or pool.
5. ImageName: This is tied to the AMI that will be used for your compute resources. This name will be used in subsequent steps.
6. MaxMemory: This is the targeted memory limit for this "hardware" configuration or pool.
7. MinMemory: reserved for future use; can be ignored currently.
8. UserData: This string is a base64 encoded version of user_data.
  1. To generate it:
    1. cat user_data.sh | base64 -w 0
  2. To decode it:
    1. echo "<LongBase64EncodedString>" | base64 -d
  3. It’s not required to be perfectly fine-tuned at this stage; it will be refined and corrected later.
  4. You may format user_data.sh in the usual ways:
    1. As cloud-config:
        #cloud-config
        runcmd:
          - [sh, -c, "set -x"]
          - [sh, -c, "hostname XSPOT_NODENAME"]
          - [sh, -c, '/usr/bin/scontrol update nodename=XSPOT_NODENAME nodeaddr=`hostname -I | cut -d" " -f1`']
          - [sh, -c, "/usr/sbin/slurmd -N XSPOT_NODENAME -f /etc/slurm/slurm.conf"]
      or
    2. As a shell script:
        #!/bin/bash
        set -x
        #export SLURM_BIN_DIR=/opt/slurm/bin
        export SLURM_BIN_DIR=/usr/bin
        hostname XSPOT_NODENAME
        ${SLURM_BIN_DIR}/scontrol update nodename=XSPOT_NODENAME nodeaddr=`hostname -I | cut -d" " -f1`
        systemctl start slurmd
9. VolumeSize: reserved for future use; can be ignored currently.
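
The UserData encode and decode commands above can be checked with a quick round trip before the string is pasted into the JSON. The following is a minimal sketch, assuming GNU coreutils base64; the scratch directory and the three-line stand-in script are illustrative and not part of the product assets.

```shell
# Sketch: round-trip a user_data script through base64, the form UserData expects.
workdir=$(mktemp -d)    # scratch directory so no real file is overwritten

# Stand-in user_data.sh; the contents are illustrative only.
cat > "$workdir/user_data.sh" <<'EOF'
#!/bin/bash
set -x
hostname XSPOT_NODENAME
EOF

# Encode on a single line (-w 0 disables line wrapping in GNU base64).
encoded=$(base64 -w 0 < "$workdir/user_data.sh")

# Decode again and compare byte-for-byte against the original file.
if printf '%s' "$encoded" | base64 -d | cmp -s - "$workdir/user_data.sh"; then
  roundtrip=ok
else
  roundtrip=mismatch
fi
echo "round trip: $roundtrip"

rm -rf "$workdir"       # clean up the scratch directory
```

If the round trip reports a mismatch, the usual suspect is a wrapped or truncated base64 string.
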
Lines 24, 25, 26 should be modified for your slurm environment and according to your preference for the partition name.
1. BinPath: This is where scontrol, squeue, and other slurm binaries exist.
2. ConfPath: This is where slurm.conf resides.
3. PartitionName: This is for naming the new partition.
All other fields/lines in this asset can be ignored.

Validate and Push the Customized Environment to the MGMT_SERVER

Validate the JSON asset with jq:
1. ```
jq . env0.json
```
2. You will see well-formatted JSON if jq can read the file, indicating no errors. If you see an error message, that means the JSON is not valid.

When the JSON is valid, the file can be pushed to the MGMT_SERVER:

curl -d "@env0.json" -H 'Content-Type: application/json' -X PUT http://${MGMT_SERVER_IP}:5000/v1/env

Pull Down the Default Profile Assets as a JSON Payload:

The default is named az1.

curl -X GET http://${MGMT_SERVER_IP}:5000/v1/profile/az1 | jq . > default-profile.json

Copy it to facilitate customization, leaving the default for future reference.
1. ```
cp default-profile.json profile0.json
```
The asset will look like this:

default-profile.json

{
  "AvailabilityZone": "us-east-2c",
  "Controller": {
    "IamInstanceProfile": "arn:aws:iam::270000099005:instance-profile/io-apc05-ExostellarInstanceProfile-CmxXXD1CZAId",
    "InstanceTags": [
      {
        "Key": "exostellar.xspot-role",
        "Value": "xspot-controller"
      }
    ],
    "InstanceType": "c5d.xlarge",
    "SecurityGroupIds": [
      "sg-016bd6ead636fa5bb"
    ],
    "SubnetId": "subnet-02d2d57c0673d6a5a",
    "VolumeSize": 100,
    "ImageId": "ami-0d4c57b22746fe832"
  },
  "LogPath": "/xcompute/logs",
  "MaxControllers": 10,
  "ProfileName": "az1",
  "Region": "us-east-2",
  "Worker": {
    "IamInstanceProfile": "arn:aws:iam::270000099005:instance-profile/io-apc05-ExostellarInstanceProfile-CmxXXD1CZAId",
    "InstanceTags": [
      {
        "Key": "exostellar.xspot-role",
        "Value": "xspot-worker"
      }
    ],
   "InstanceTypes": [
      "m5:0",
      "m6i:1"
    ],
    "SecurityGroupIds": [
      "sg-016bd6ead636fa5bb"
    ],
    "SpotFleetTypes": [
      "m5:1",
      "m5d:0",
      "m5n:0",
      "m6i:2"
    ],
    "SubnetId": "subnet-02d2d57c0673d6a5a",
    "ImageId": "ami-09559faf3fc003160"
  },
  "Xspot": {
    "HyperthreadingDisabled": false
  },
  "XspotVersion": "xspot-2.2.0.1",
  "Id": "eb78d0e0-24a9-4dfd-a5ae-d7abcb9edcbd",
  "NodeGroupName": "wlby3xy1"
}

Edit the Profile JSON for Your Purposes:

Tagging instances created by the backend is controlled by two sections, depending on the function of the asset:
1. Controllers are On-Demand instances that manage other instances. By default, they are tagged as seen on lines 6-9, above, and 1-4 below.
  1.  {
        "Key": "exostellar.xspot-role",
        "Value": "xspot-controller"
      }
  2. To add additional tags, duplicate lines 1-4 as 5-8 below (as many times as you need), noting that an additional comma is added on line 4.
  3.  {
        "Key": "exostellar.xspot-role",
        "Value": "xspot-controller"
      },
      {
        "Key": "MyCustomKey",
        "Value": "MyCustomValue"
      }
  4. Don’t forget the comma between tags.
2. Workers will be created by Controllers as needed, and they can be On-Demand/Reserved instances or Spot. By default, they are tagged as seen on lines 26-29, above, and 1-4 below:
  1.  {
        "Key": "exostellar.xspot-role",
        "Value": "xspot-worker"
      }
  2. Add as many tags as needed.
  3.  {
        "Key": "exostellar.xspot-role",
        "Value": "xspot-worker"
      },
      {
        "Key": "MyCustomKey",
        "Value": "MyCustomValue"
      }
  4. Don’t forget the comma between tags.
Note: Line numbers listed below reference the above example file. Once changes start being made on the system, the line numbers may change.
Line 11 - InstanceType: Controllers do not generally require large instances.
1. In terms of performance, these On-Demand Instances can be set as c5.xlarge or m5.xlarge with no adverse effect.
Line 20 - MaxControllers : This will define an upper bound for your configuration.
1. Each Controller will manage up to 80 workers.
2. With the default "MaxControllers": 10 on line 20, the upper bound is 800 nodes (10 Controllers × 80 workers) joining your production cluster.
3. If you plan to autoscale past 800 nodes joining your production cluster, MaxControllers should be increased.
4. If you want to lower that upper bound, MaxControllers should be decreased.
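
The sizing rule above is a ceiling division. The sketch below assumes the figure of 80 workers per Controller quoted in this section; the function name required_controllers is hypothetical and is not part of the product tooling.

```shell
# Sketch: smallest MaxControllers value that covers a target node count,
# assuming each Controller manages up to 80 workers (figure from the text above).
workers_per_controller=80

required_controllers() {
  local nodes=$1
  # Integer ceiling division: (n + d - 1) / d
  echo $(( (nodes + workers_per_controller - 1) / workers_per_controller ))
}

required_controllers 800     # prints 10, the default MaxControllers
required_controllers 1000    # prints 13
```
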
Line 21 - ProfileName: This is used for your logical tracking, in the event you configure multiple profiles.
Lines 31-34 - InstanceTypes: Here in the Worker section, this refers to On-Demand instance types, that is, the instances you want to run on when there is no Spot availability.
Lines 38-43 - SpotFleetTypes: Here in the Worker section, this refers to Spot instance types. Because of the discounts, you may be comfortable with a much broader range of instance types.
1. More types and families here mean more opportunities for cost optimization.
2. Priorities can be managed by appending a : and an integer, e.g. m5:0 is a higher priority than c5:1.
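
To see how a list of type:priority entries reads in priority order, the entries can be sorted on the integer after the colon. This is only a sketch for reviewing a profile by eye: the helper name by_priority is hypothetical, and how the backend treats entries with equal priority is not stated here, so ties simply keep their input order.

```shell
# Sketch: print type:priority entries with the lowest integer (highest priority) first.
# The sample values mirror SpotFleetTypes in the example profile above.
spot_fleet_types="m5:1 m5d:0 m5n:0 m6i:2"

by_priority() {
  # One entry per line, then a stable numeric sort on the field after ':'.
  printf '%s\n' "$1" | tr ' ' '\n' | sort -s -t: -k2,2n
}

by_priority "$spot_fleet_types"
```

For the sample values this lists m5d:0 and m5n:0 first, then m5:1, then m6i:2.
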
Line 48 - HyperthreadingDisabled: This is reserved for future use; it can be ignored currently.
Line 52 - NodeGroupName: This string appears in Controller Name tagging, in the form <profile>-NGN-count, where NGN is the NodeGroupName.
All other fields/lines can be ignored in the asset.

Validate and Push the Customized Profile to the MGMT_SERVER

Validate the profile with the quick jq test.
1. ```
jq . profile0.json
```

Push the changes live.

curl -d "@profile0.json" -H 'Content-Type: application/json' -X PUT http://${MGMT_SERVER_IP}:5000/v1/profile

Download Scheduler Assets from the Management Server

curl -X GET http://${MGMT_SERVER_IP}:5000/v1/xcompute/download/slurm -o slurm.tgz

If the EnvName was changed (above, in Edit the Slurm Environment JSON for Your Purposes - Step 2), then the following command can be used with your CustomEnvironmentName:

curl -X GET "http://${MGMT_SERVER_IP}:5000/v1/xcompute/download/slurm?envName=CustomEnvironmentName" -o slurm.tgz
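
The two download commands differ only in the optional envName query parameter. A small sketch of building the URL either way follows; the function name download_url and the fallback address 10.0.0.5 are placeholders for illustration, and the endpoint path is the one shown above.

```shell
# Sketch: build the download URL, adding envName only when one is given.
# 10.0.0.5 is a placeholder; set MGMT_SERVER_IP to your Management Server address.
MGMT_SERVER_IP=${MGMT_SERVER_IP:-10.0.0.5}

download_url() {
  local env_name=${1:-}
  local url="http://${MGMT_SERVER_IP}:5000/v1/xcompute/download/slurm"
  if [ -n "$env_name" ]; then
    url="${url}?envName=${env_name}"
  fi
  printf '%s\n' "$url"
}

download_url          # default environment name
download_url env0     # custom EnvName (env0 is an example value)

# Quoting the URL keeps the shell from treating '?' as a glob character:
# curl -X GET "$(download_url env0)" -o slurm.tgz
```
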

Unpack them into the exostellar folder:

tar xf slurm.tgz -C ../
cd ..
mv assets/* .
rmdir assets

Compute AMI Preparation

An AMI is required that is capable of running your workloads. Ideally, this AMI is capable of booting quickly.

./parse_helper.sh -a <AMI-ID> -i <IMAGE_NAME>

The AMI-ID should be based on a slurm compute node from your cluster, capable of running your workloads.
The AMI should be created by this account.
The AMI should not have product codes.
The Image Name was specified in the environment set up previously and will be used in this command.
Additionally, we can pass -s script.sh if troubleshooting is required.

Validation of Migratable VM Joined to Your Slurm Cluster

The script test_createVm.sh exists for a quick validation that new compute resources can successfully connect and register with the scheduler.

./test_createVm.sh -h xvm0 -i <IMAGE_NAME> -u user_data.sh

The hostname specified with -h xvm0 is completely arbitrary.
The Image Name specified with -i <IMAGE_NAME> should correspond to the Image Name from the parse_helper.sh command and the environment setup earlier.
The -u user_data.sh is available for any customization that may be required: temporarily changing a password to facilitate logging in, for example.
The test_createVm.sh script will continuously output updates until the VM is created. When the VM is ready, the script will exit and you’ll see all the fields in the output are now filled with values:
1. ```
Waiting for xvm0... (4)
NodeName: xvm0
Controller: az1-qeuiptjx-1
Controller IP: 172.31.57.160
Vm IP: 172.31.48.108
```
This step is meant to provide a migratable VM so that sanity checking may occur:
1. Have network mounts appeared as expected?
2. Is authentication working as intended?
3. What commands are required to finish bootstrapping?
4. Et cetera.
Lastly, slurmd should be started at the end of bootstrapping.
1. Output from starting slurmd will likely show an error because the arbitrary host is unknown to the scheduler:
2. ```
/opt/slurm/sbin/slurmd -N xvm0 -f /opt/slurm/etc/slurm.conf
```
  But that is not a problem since xvm0 has not been added to the cluster yet. That will happen in subsequent steps.
To remove this temporary VM:
1. Replace VM_NAME with the name of the VM (xvm0 in the -h xvm0 example above).
2. ```
curl -X DELETE http://${MGMT_SERVER_IP}:5000/v1/xcompute/vm/VM_NAME
```
The above steps may need to be iterated through several times. When totally satisfied, stash the various commands required for successful bootstrapping and overwrite the user data scripts in the exostellar directory.
1. There will be a per-pool user_data script in the assets folder. It can be overwritten at any time a change is needed and the next time a node is instantiated from that pool, the node will get the changes.
2. A common scenario is that all the user_data scripts are identical, but it could be beneficial for different pools to have different user_data bootstrapping assets.

Finalize Integration with Slurm

Edit resume_xspot.sh and add a sed command for every pool:

user_data=$(cat /opt/slurm/etc/xvm16-_user_data | base64 -w 0)

becomes:

user_data=$(cat /opt/slurm/etc/exostellar/xvm16-_user_data | sed "s/XSPOT_NODENAME/$host/g" | base64 -w 0)
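
The effect of the added sed stage can be checked in isolation. The sketch below runs the same pipeline on an inline template instead of the per-pool file; the host value xvm16-3 and the two-line template are illustrative assumptions, since resume_xspot.sh supplies the real $host.

```shell
# Sketch of what the edited line does for a single node.
host=xvm16-3    # illustrative value; resume_xspot.sh sets the real one
template='hostname XSPOT_NODENAME
scontrol update nodename=XSPOT_NODENAME'

# Same pipeline as the edited line, reading from a variable instead of a file.
user_data=$(printf '%s\n' "$template" | sed "s/XSPOT_NODENAME/$host/g" | base64 -w 0)

# Decode to confirm every placeholder was replaced before encoding.
printf '%s' "$user_data" | base64 -d
```

Without the sed stage, the literal string XSPOT_NODENAME would reach the node unchanged.
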

Edit your slurm.conf and add an include statement to pick up xspot.slurm.conf, substituting the actual directory path for ${SLURM_CONF_DIR}:
1. ```
include ${SLURM_CONF_DIR}/exostellar/xspot.slurm.conf
```
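
Because this edit may be repeated while iterating, it can help to add the include line only when it is missing. The following is a minimal sketch that works on a scratch copy rather than the live slurm.conf; the helper name add_include and the stand-in file contents are assumptions for illustration.

```shell
# Sketch: append the include line only if it is not already present.
# Works on a scratch file; point conf at the real slurm.conf when ready.
conf_dir=$(mktemp -d)
conf="$conf_dir/slurm.conf"
echo "ClusterName=example" > "$conf"    # stand-in contents

add_include() {
  local file=$1 line=$2
  # -x matches the whole line, -F treats it as a fixed string, -q stays quiet.
  grep -qxF "$line" "$file" || echo "$line" >> "$file"
}

include_line="include $conf_dir/exostellar/xspot.slurm.conf"
add_include "$conf" "$include_line"
add_include "$conf" "$include_line"     # second call is a no-op
grep -c '^include ' "$conf"             # prints 1
```
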
Introducing new nodes into a Slurm cluster requires a restart of the Slurm control daemon:
1. ```
systemctl restart slurmctld
```
Integration steps are now complete; submitting a job to the new partition is the last validation:
1. As a user, navigate to a valid job submission directory and launch a job as normal, but be sure to specify the new partition:
  1. sbatch -p NewPartitionName < job-script.sh