...

Code Block
# extract lsf.tgz into the parent directory
tar xf lsf.tgz -C ../
cd ..
# flatten the extracted assets/ directory into the current directory
mv assets/* .
rmdir assets
# give lsfadmin ownership of the exostellar tree
chown lsfadmin.root -R ../exostellar

Ensure lsb.modules is prepared for Resource Connector
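
For reference, Resource Connector generally requires the schmod_demand scheduler plugin to be enabled in lsb.modules. A minimal sketch of the relevant PluginModule section, to be checked against your cluster's existing lsb.modules and the steps below:

Code Block
Begin PluginModule
SCH_PLUGIN                      RB_PLUGIN                    SCH_DISABLE_PHASES
schmod_default                  ()                           ()
schmod_demand                   ()                           ()
End PluginModule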

...

  1. Code Block
    #exostellar
    LSB_STDOUT_DIRECT=Y        #optional
    LSB_RC_EXTERNAL_HOST_FLAG=xiohost
    LSF_LOCAL_RESOURCES="[resource xiohost] [type X86_64]"
    LSF_DYNAMIC_HOST_WAIT_TIME=2
    LSF_DYNAMIC_HOST_TIMEOUT=10m
    ENABLE_DYNAMIC_HOSTS=Y
    LSF_REG_FLOAT_HOSTS=Y
    EBROKERD_HOST_CLEAN_DELAY=5
    LSF_MQ_BROKER_HOSTS=head        #equivalent to LSF Master, in this example
    LSB_RC_EXTERNAL_HOST_IDLE_TIME=2
    EGO_DEFINE_NCPUS=threads 
  2. The values assigned for variables with TIME and DELAY may be tuned for the best timing scenario of your cluster and assets. The LSF Admin may opt for different timing than above.
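
After saving these lsf.conf changes, a quick sanity check is to grep the active lsf.conf for a few of the new entries. This sketch assumes $LSF_ENVDIR points at your LSF configuration directory, as is standard:

Code Block
grep -E "LSB_RC_EXTERNAL_HOST_FLAG|LSF_MQ_BROKER_HOSTS|LSF_REG_FLOAT_HOSTS|ENABLE_DYNAMIC_HOSTS" ${LSF_ENVDIR}/lsf.conf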

Compute AMI Import

During Prepping the LSF Integration, an AMI for compute nodes was identified or created. This step will import that AMI into Infrastructure Optimizer. Ideally, this AMI is capable of booting quickly.

Note: If SELinux is not already disabled in the target AMI, it will need to be disabled. Step 5 below offers a script argument to make the change.

Code Block
./parse_helper.sh -a <AMI-ID> -i <IMAGE_NAME>
  1. The AMI-ID should be based on an LSF compute node from your cluster, capable of running your workloads.

  2. The AMI should be created by this account.

  3. The AMI should not have product codes.

  4. The Image Name was specified in the environment set up previously and will be used in this command.

  5. Additionally, we can pass -s script.sh if troubleshooting is required.
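
For example, a minimal troubleshooting script addressing the SELinux note above might look like the sketch below; the name script.sh is only a placeholder, and the sed assumes a standard /etc/selinux/config layout:

Code Block
#!/bin/bash
# Hypothetical script passed to parse_helper.sh via -s: persistently disable SELinux in the imported image.
sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config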

Validation of Migratable VM Joined to Your LSF Cluster

The script test_createVm.sh exists for a quick validation that new compute resources can successfully connect and register with the scheduler.

Code Block
./test_createVm.sh -h xvm0 -i <IMAGE_NAME> -u user_data.sh
  1. The hostname specified with -h xvm0 is arbitrary.

  2. The Image Name specified with -i <IMAGE_NAME> should correspond to the Image Name from the parse_helper.sh command and the environment setup earlier.

  3. The -u user_data.sh option is available for any customization that may be required: temporarily changing a password to facilitate logging in, for example.

  4. The test_createVm.sh script will continuously output updates until the VM is created. When the VM is ready, the script will exit and you’ll see that all the fields in the output are filled with values:

    1. Code Block
      Waiting for xvm0... (4)
      NodeName: xvm0
      Controller: az1-qeuiptjx-1
      Controller IP: 172.31.57.160
      Vm IP: 172.31.48.108
  5. With the Vm IP above, ssh to the node to inspect it. This step is meant to provide a migratable VM so that sanity checks can be performed:

    1. Have network mounts appeared as expected?

    2. Is authentication working as intended?

    3. What commands are required to finish bootstrapping?

    4. Et cetera.

  6. Iterate and validate as many times as required to satisfy all requirements.

  7. Lastly, LSF services should be started at the end of bootstrapping.

    1. It may take 5 minutes or longer for the LSF services to register with the LSF Master Host.

    2. When the RC Execution Host is properly registered, it will be visible via the lshosts command.

  8. To remove this temporary VM:

    1. Replace VM_NAME with the name of the VM (xvm0 in the -h xvm0 example above).

    2. Code Block
      curl -X DELETE  http://${MGMT_SERVER_IP}:5000/v1/xcompute/vm/VM_NAME
  9. When totally satisfied, stash the various commands required for successful bootstrapping and place final versions of the user_data scripts in the ${LSF_RC_CONF_DIR}/exostellar/conf directory.

    1. There needs to be a per-pool user_data script in that folder. It can be overwritten whenever a change is needed, and the next time a node is instantiated from that pool, the node will pick up the changes.

    2. A common scenario is that all the user_data scripts are identical, but it can be beneficial for different pools to have different user_data bootstrapping assets. If they are identical, links can be placed in the ${LSF_RC_CONF_DIR}/exostellar/conf directory instead of individual files.

    3. The auto-generated user_data scripts are initially located in ${LSF_RC_CONF_DIR}/exostellar/scripts. After copying them to ${LSF_RC_CONF_DIR}/exostellar/conf and modifying them, be sure to set their permissions so LSF can use them, e.g.:

      Code Block
      chown lsfadmin.root ${LSF_RC_CONF_DIR}/exostellar/conf/*_user_data.sh

    4. Optionally, you can also add these settings to lsf.conf to facilitate reading output from the lshosts and bhosts commands:

      Code Block
      LSB_BHOSTS_FORMAT="HOST_NAME:47 status:13 max:-8 njobs:-8 run:-8 ssusp:-8 ususp:-8 comments"
      LSF_LSHOSTS_FORMAT="HOST_NAME:47 res:13 nprocs:-8 ncores:-8 nthreads:-8 ncpus:-8 maxmem:-8:S maxswp:8:S server:7 type"
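
For orientation, below is a minimal, hypothetical sketch of what a finished per-pool user_data script might contain, assuming site-specific network mounts and a directory-service client; replace each step with the commands gathered during validation:

Code Block
#!/bin/bash
# Hypothetical user_data sketch -- adapt to your site; none of these paths or services are prescribed by the integration.
mount -a                              # bring up network mounts defined in the image's /etc/fstab (assumption)
systemctl restart sssd                # example authentication service; use whatever your site requires (assumption)
# Start LSF services last so the host can register with the LSF Master Host.
source "${LSF_TOP}/conf/profile.lsf"  # LSF_TOP: your LSF installation path
lsadmin limstartup
lsadmin resstartup
badmin hstartup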

Restart Select Services on the LSF Master

Per the directions in the files modified earlier, restart and reconfigure the relevant services:

  1. Code Block
    su lsfadmin -c "lsadmin reconfig"
  2. Code Block
    su lsfadmin -c "badmin mbdrestart"
  3. Code Block
    su lsfadmin -c "badmin reconfig"
  4. Validate configuration changes with several additional commands:

    1. Verify your new queue exists:

      1. Code Block
        su lsfadmin -c "bqueues"
      2. You should see the queue you defined in the list of available queues.

    2. Verify that Resource Connector’s xioProvider has templates available for running jobs:

      1. Code Block
        su lsfadmin -c "badmin rc view -c templates -p xio"
      2. Resource Connector cycles every 30 seconds by default, and this command may need to be rerun after 30 seconds.

      3. You should now see as many templates defined in the output as PoolNames were configured in earlier steps.
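
Because of that 30-second cycle, it can be convenient to poll the template view until it populates; for example, if watch is available on the LSF Master:

Code Block
watch -n 30 'su lsfadmin -c "badmin rc view -c templates -p xio"'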

LSF and Resource Connector are now configured for use with Exostellar’s Infrastructure Optimizer.

Validate Integration with LSF

...