...

  1. Copy default-lsf-env.json to something convenient like env0.json.

    1. Code Block
      cp default-lsf-env.json env0.json
  2. Note: Line numbers listed below reference the example file above. Once changes start being made on the system, the line numbers may change.

  3. Line 2: "EnvName" is set to lsf by default, but you can specify something unique if needed.

  4. Lines 5-17 can be modified for a single pool of identical compute resources, or they can be duplicated and then modified for each "hardware" configuration or "pool" you choose. When duplicating, be sure to add a comma after the closing brace on line 17 of every pool except the last one (a sketch of two duplicated pools appears after this list).

    1. PoolName: This determines the apparent hostnames of the compute resources provided to LSF.

      1. It is recommended that all pools share a common trunk or base in each PoolName.

    2. Priority: When priorities are equal, LSF treats all pools the same and makes scheduling decisions based on alphabetical naming. It may be beneficial to give smaller nodes a higher priority so that jobs are placed on the smallest fitting node, for example:

      1. 2-core nodes : Priority=1000

      2. 4-core nodes : Priority=100

      3. 8-core nodes : Priority=10

      4. This way, jobs are scheduled on the smallest node that fulfills the resource requirements of the job.

    3. PoolSize: This is the maximum number of these compute resources.

    4. ProfileName: This is the default profile name, az1. If this is changed, you will need to carry the change forward in later steps.

    5. CPUs: This is the targeted CPU-core limit for this "hardware" configuration or pool.

    6. ImageName: This is tied to the AMI that will be used for your compute resources. This name will be used in subsequent steps.

    7. MaxMemory: This is the targeted memory limit for this "hardware" configuration or pool.

    8. MinMemory: reserved for future use; can be ignored currently.

    9. UserData: This string is a base64 encoded version of user_data.

      1. To generate it:

        1. cat user_data.sh | base64 -w 0

      2. To decode it:

        1. echo "<LongBase64EncodedString>" | base64 -d

      3. It’s not required to be perfectly fine-tuned at this stage; it will be refined and corrected later. A quick encode/decode round-trip check is sketched after this list.

      4. You may format user_data.sh in the usual ways:

      5. Code Block
        #cloud-config
        
        runcmd:
          - [sh, -c, "set -x"]
          - [sh, -c, "hostname $( echo ip-$ (hostname -I |sed 's/\./-/g' |sed 's/ //g' ) )"]
          - [sh, -c, "echo root:AAAAAA |chpasswd"]
          - [sh, -c, "sed -i.orig '3d' /etc/hosts"]
          - [sh, -c, "echo >> /etc/hosts"]
          - [sh, -c, "echo -e \"$( hostname -I )\t\t\t$( hostname )\" >> /etc/hosts"]
          - [sh, -c, "sed -i 's/awshost/xiohost/g' /opt/lsf/conf/lsf.conf"]
          - [sh, -c, "source /opt/lsf/conf/profile.lsf"]
          - [sh, -c, "lsadmin limstartup"]
          - [sh, -c, "lsadmin resstartup"]
          - [sh, -c, "badmin hstartup"]

        or

      6. Example user_data.sh:

        1. Code Block
          #!/bin/bash
          
          set -x
          
          IP=$( hostname -I |awk '{print $1}' )
          NEW_HOSTNAME=ip-$( echo ${IP} |sed 's/\./-/g' )
          hostname ${NEW_HOSTNAME}
          
          echo >> /etc/hosts
          echo -e "${IP}\t\t${NEW_HOSTNAME}" >> /etc/hosts
          
          . /opt/lsf/conf/profile.lsf
          
          lsadmin limstartup
          lsadmin resstartup
          badmin hstartup
    10. VolumeSize: reserved for future use; can be ignored currently.

  5. All other fields/lines in this asset can be ignored currently.
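
For orientation, below is a minimal sketch of two duplicated pool declarations. Only the field names come from the walkthrough above; the example values, memory units, and PoolName strings are placeholders, so follow your own copy of default-lsf-env.json. Note the comma after the first pool's closing brace and the common "xspot-vm" trunk shared by the PoolNames:

Code Block
{
  "PoolName": "xspot-vm-small",
  "Priority": 1000,
  "PoolSize": 10,
  "ProfileName": "az1",
  "CPUs": 2,
  "ImageName": "<YourImageName>",
  "MaxMemory": 4096,
  "MinMemory": 0,
  "UserData": "<LongBase64EncodedString>",
  "VolumeSize": 0
},
{
  "PoolName": "xspot-vm-medium",
  "Priority": 100,
  "PoolSize": 10,
  "ProfileName": "az1",
  "CPUs": 4,
  "ImageName": "<YourImageName>",
  "MaxMemory": 8192,
  "MinMemory": 0,
  "UserData": "<LongBase64EncodedString>",
  "VolumeSize": 0
}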
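
To sanity-check a UserData string before pasting it into the JSON, round-trip it through base64. A minimal sketch, assuming the GNU base64 used in the commands above and a user_data.sh in the current directory:

Code Block
# encode the script exactly as it will appear in the JSON
ENCODED=$( base64 -w 0 user_data.sh )
# decode it again and compare against the original file
echo "${ENCODED}" | base64 -d | diff - user_data.sh && echo "round-trip OK"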

Validate and Push the Customized Environment to the MGMT_SERVER

...

default-profile.json
  1. Code Block
    {
      "AvailabilityZone": "us-east-2c",
      "Controller": {
        "IamInstanceProfileIdentityRole": "arn:aws:iam::270000099005:instance-profile/io-apc05-ExostellarInstanceProfile-CmxXXD1CZAId",
        "InstanceTags": [
          {
            "Key": "exostellar.xspot-role",
            "Value": "xspot-controller"
          }
        ],
        "InstanceType": "c5d.xlarge",
        "SecurityGroupIds": [
          "sg-016bd6ead636fa5bb"
        ],
        "SubnetId": "subnet-02d2d57c0673d6a5a",
        "VolumeSize": 100,
        "ImageId": "ami-0d4c57b22746fe832"
      },
      "LogPath": "/xcompute/logs",
      "MaxControllers": 10,
      "ProfileName": "az1",
      "Region": "us-east-2",
      "Worker": {
        "IamInstanceProfileIdentityRole": "arn:aws:iam::270000099005:instance-profile/io-apc05-ExostellarInstanceProfile-CmxXXD1CZAId",
        "InstanceTags": [
          {
            "Key": "exostellar.xspot-role",
            "Value": "xspot-worker"
          }
        ],
       "InstanceTypes": [
          "m5:0",
          "m6i:1"
        ],
        "SecurityGroupIds": [
          "sg-016bd6ead636fa5bb"
        ],
        "SpotFleetTypes": [
          "m5:1",
          "m5d:0",
          "m5n:0",
          "m6i:2"
        ],
        "SubnetId": "subnet-02d2d57c0673d6a5a",
        "ImageId": "ami-09559faf3fc003160"
      },
      "Xspot": {
        "EnableHyperthreading": true,
        "HyperthreadingDisabledEnableBalloon": falsetrue
      },
      "XspotVersion": "xspot-2.2.0.1",
      "Id": "eb78d0e0-24a9-4dfd-a5ae-d7abcb9edcbd",
      "NodeGroupName": "wlby3xy1",
      "Status": "idle"
    }

Edit the Profile JSON for Your Purposes:

  1. Tagging instances created by the backend is controlled by two sections, depending on the function of the asset:

    1. Controllers are On-Demand instances that manage other instances. By default, they are tagged as seen on lines 6-9 above and lines 1-4 below:

      1. Code Block
              {
                "Key": "exostellar.xspot-role",
                "Value": "xspot-controller"
              }
      2. To add additional tags, duplicate lines 1-4 as lines 5-8 below (as many times as you need), noting that a comma is added after the closing brace on line 4.

      3. Code Block
              {
                "Key": "exostellar.xspot-role",
                "Value": "xspot-controller'
              },
              {
                "Key": "MyCustomKey",
                "Value": "MyCustomValue"
              }
      4. Don’t forget the comma between tags.

    2. Workers will be created by Controllers as needed; they can be On-Demand/Reserved instances or Spot. By default, they are tagged as seen on lines 26-30 above and lines 1-4 below:

      1. Code Block
              {
                "Key": "exostellar.xspot-role",
                "Value": "xspot-worker"
              }
      2. Add as many tags as needed.

      3. Code Block
              {
                "Key": "exostellar.xspot-role",
                "Value": "xspot-worker"
              },
              {
                "Key": "MyCustomKey",
                "Value": "MyCustomValue"
              }
      4. Don’t forget the comma between tags.

  2. Note: Line numbers listed below reference the example file above. Once changes start being made on the system, the line numbers may change.

  3. Line 11 - InstanceType: Controllers do not generally require large instances.

    1. In terms of performance, these On-Demand Instances can be set as c5.xlarge or m5.xlarge with no adverse effect.

  4. Line 20 - MaxControllers: This defines an upper bound for your configuration.

    1. Each Controller will manage up to 80 workers.

    2. The default upper bound is therefore 800 nodes joining your production cluster: line 20 reads "MaxControllers": 10, and 10 controllers x 80 workers each = 800 nodes.

    3. If you plan to autoscale past 800 nodes joining your production cluster, increase MaxControllers.

    4. If you want to lower that upper bound, decrease MaxControllers.

  5. Line 21 - ProfileName: This is used for your logical tracking, in the event you configure multiple profiles.

  6. Lines 31-34 - InstanceTypes: Here in the Worker section, this lists the On-Demand instance types to use when there is no Spot availability.

  7. Lines 38-43 - SpotFleetTypes: Here in the Worker section, this lists the Spot instance types; because of the Spot discounts, you may be comfortable with a much broader range of instance types.

    1. More types and families here means more opportunities for cost optimization.

    2. Priorities can be managed by appending a : and an integer, e.g. m5:1 is a higher priority than c5:0.

  8. Line 48 - EnableHyperthreading: Set to true to enable hyperthreaded cores, or false to disable hyperthreading.

  9. Line 49 - EnableBalloon: This will always be set to true; it increases migration efficiency. Setting it to false may be useful in a troubleshooting scenario.

  10. Line 52 - NodeGroupName: This string appears in Controller Name tagging as <profile>-NGN-count.

  11. All other fields/lines in this asset can be ignored. A quick syntax check for the edited file is sketched below.
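
Before pushing the edited profile, a quick JSON syntax check will catch the misplaced-comma mistakes called out above. A minimal sketch, assuming python3 is available and the file is still named default-profile.json:

Code Block
python3 -m json.tool default-profile.json > /dev/null && echo "JSON OK"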

...

Queue Declaration
Code Block
Begin Queue
QUEUE_NAME              = xio
PRIORITY                = 90
NICE                    = 20
FAIRSHARE               = USER_SHARES[[default,10]]
RC_DEMAND_POLICY        = THRESHOLD[ [1, 1] [10, 60] [100] ]
RC_HOSTS                = xiohost
RES_REQ                 = xiohost
RC_ACCOUNT              = xio
DESCRIPTION             = xspot
NEW_JOB_SCHED_DELAY     = 0
REQUEUE_EXIT_VALUES     = 99
End Queue
  1. Add three resource lines to the Resource Definitions in lsf.shared:

    1. Code Block
      cd ${LSF_TOP}/conf
      vi lsf.shared
    2. Example with the required lines added:

      1. Code Block
        Begin Resource
        RESOURCENAME  TYPE    INTERVAL INCREASING  DESCRIPTION        # Keywords
        ...
           xiohost    Boolean    ()       ()       (instances from Infrastructure Optimizer)
           rc_account String     ()       ()       (account name for the external hosts)
           templateID   String   ()       ()       (template ID for the external hosts)
        ...
        End Resource
    2. Resource Connector can leverage xiohost, rc_account, and templateID when provisioning compute capacity.
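
After editing lsf.shared (and lsb.queues, if the queue declaration above was added), LSF must re-read its configuration before the changes take effect. A minimal sketch using standard LSF admin commands; confirm the exact procedure with your LSF Admin:

Code Block
. ${LSF_TOP}/conf/profile.lsf
lsadmin reconfig    # re-reads LIM configuration, including lsf.shared
badmin reconfig     # re-reads batch configuration, including lsb.queues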

Add Required Lines to lsf.conf

  1. Code Block
    #exostellar
    LSB_STDOUT_DIRECT=Y
    LSB_RC_EXTERNAL_HOST_FLAG=xiohost
    LSF_LOCAL_RESOURCES="[resource xiohost] [type X86_64]"
    LSF_DYNAMIC_HOST_WAIT_TIME=2
    LSF_DYNAMIC_HOST_TIMEOUT=10m
    ENABLE_DYNAMIC_HOSTS=Y
    LSF_REG_FLOAT_HOSTS=Y
    EBROKERD_HOST_CLEAN_DELAY=5
    LSF_MQ_BROKER_HOSTS=head
    LSB_RC_EXTERNAL_HOST_IDLE_TIME=2
    EGO_DEFINE_NCPUS=threads 
  2. The values assigned to variables containing TIME and DELAY may be tuned for the best timing scenario of your cluster and assets; the LSF Admin may opt for different timing than shown above.
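
A quick way to confirm the block landed intact, assuming it was appended with the #exostellar marker comment shown above:

Code Block
grep -A 11 '^#exostellar' ${LSF_TOP}/conf/lsf.conf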

Compute AMI Import

During Prepping the LSF Integration, an AMI for compute nodes was identified or created. This step imports that AMI into Infrastructure Optimizer. Ideally, this AMI is capable of booting quickly.
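
If you need to look up the AMI ID first, the sketch below assumes the AWS CLI is configured for the account that owns the image; the image-name filter is a placeholder:

Code Block
aws ec2 describe-images --owners self \
  --filters "Name=name,Values=<YourComputeAmiName>*" \
  --query 'Images[].{ID:ImageId,Name:Name,Created:CreationDate}' \
  --output table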

...