
This is an installation guide for deploying the Exostellar Management Server using the provided AWS CloudFormation template. Intended for DevOps, System Administrators, and Cloud Engineers, it outlines the process from prerequisites to accessing the deployed instance.

If you are installing with the CloudFormation template (CFT), first subscribe to the CFT product listing on AWS Marketplace.

Prerequisites

Before you begin, use this checklist to confirm that your tools and environment meet the requirements.

Tools:

Network

  • VPC: must contain at least one private subnet.

  • NAT Gateway: connectivity type must be public.

Use the following commands to confirm that the AWS VPC where the product will run has at least one private subnet and a public NAT Gateway.

List the private subnets suitable for running the Exostellar Workers (subnets that do not auto-assign public IP addresses):

aws ec2 describe-subnets --filters "Name=vpc-id,Values=<vpc_id>" --query 'Subnets[?MapPublicIpOnLaunch==`false`].SubnetId'

Check whether the VPC has a public NAT Gateway:

aws ec2 describe-nat-gateways --filter Name=vpc-id,Values=<vpc_id> --output json | jq '.NatGateways[] | {NatGatewayId, SubnetId, ConnectivityType}'

If the VPC has no private subnet or no public NAT Gateway, follow the AWS documentation to create them.
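The two checks above can be combined into a single pass/fail preflight. This is a minimal sketch, assuming bash and an AWS CLI already configured for the target account and region; the function names are hypothetical helpers, not part of the Exostellar product. Like the command above, it treats a subnet that does not auto-assign public IPs as private, which is a heuristic rather than a route-table check.

```shell
# Hypothetical preflight helpers; source in bash, then run: check_vpc_prereqs <vpc_id>

# Print subnets in the VPC that do not auto-assign public IPv4 addresses.
list_private_subnets() {
  aws ec2 describe-subnets \
    --filters "Name=vpc-id,Values=$1" \
    --query 'Subnets[?MapPublicIpOnLaunch==`false`].SubnetId' \
    --output text
}

# Print available NAT Gateways in the VPC whose connectivity type is public.
list_public_nat_gateways() {
  aws ec2 describe-nat-gateways \
    --filter "Name=vpc-id,Values=$1" "Name=state,Values=available" \
    --query 'NatGateways[?ConnectivityType==`public`].NatGatewayId' \
    --output text
}

# Succeed only if the VPC has at least one private subnet and one public NAT Gateway.
check_vpc_prereqs() {
  local vpc_id="$1" subnets nats
  subnets=$(list_private_subnets "$vpc_id") || return 1
  nats=$(list_public_nat_gateways "$vpc_id") || return 1
  if [ -z "$subnets" ] || [ "$subnets" = "None" ]; then
    echo "FAIL: no private subnets found in $vpc_id" >&2
    return 1
  fi
  if [ -z "$nats" ] || [ "$nats" = "None" ]; then
    echo "FAIL: no available public NAT Gateway found in $vpc_id" >&2
    return 1
  fi
  echo "OK: private subnets: $subnets; public NAT Gateways: $nats"
}
```

For example, `check_vpc_prereqs vpc-0abc1234` prints the matching IDs, or reports what is missing and returns non-zero.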

EKS users: If you already have an EKS cluster, install the Exostellar Management Server into the same VPC as the cluster and make sure that VPC meets the requirements above.
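If the EKS cluster already exists, its VPC (and a candidate value for the Shared Security Group ID in Step 4) can be read from the cluster itself. A minimal sketch, assuming bash and the AWS CLI; `eks_network_info` is a hypothetical helper. The cluster security group is only one candidate, since a cluster can also carry additional security groups under `resourcesVpcConfig.securityGroupIds`.

```shell
# Hypothetical helper: print the VPC ID and cluster security group of an EKS cluster.
# Usage: eks_network_info <cluster-name>
eks_network_info() {
  local info
  info=$(aws eks describe-cluster \
    --name "$1" \
    --query 'cluster.resourcesVpcConfig.[vpcId,clusterSecurityGroupId]' \
    --output text) || return 1
  # Text output is whitespace-separated: <vpcId> <clusterSecurityGroupId>
  set -- $info
  printf 'VPC ID: %s\nCluster security group: %s\n' "${1:-}" "${2:-}"
}
```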

Security

  • SSH Key: used to connect to the Management Server.

  • Trusted Certificate: required only if deploying in a private environment.

A pre-provisioned, user-managed SSH key pair is required to access the Exostellar Management Server.

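Use the following command to create a new SSH key pair. Key pairs are regional, so replace us-east-2 with the region where you will deploy the Management Server: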

aws ec2 create-key-pair --key-name 'my-dev-key' --query 'KeyMaterial' --output text --region us-east-2 > my-dev-key.pem

Restrict the key file's permissions so that only you can read it:

chmod 400 my-dev-key.pem 
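To make key creation safe to re-run, the two commands above can be wrapped so that an existing key pair is left alone and the private key file is never readable by others, even briefly. A minimal sketch, assuming bash and the AWS CLI; `ensure_key_pair` is a hypothetical helper.

```shell
# Hypothetical helper: create an EC2 key pair only if it does not already exist.
# Usage: ensure_key_pair <key-name> <region>
ensure_key_pair() {
  local key_name="$1" region="$2"
  if aws ec2 describe-key-pairs --key-names "$key_name" --region "$region" >/dev/null 2>&1; then
    echo "Key pair $key_name already exists in $region"
    return 0
  fi
  # umask 077 in a subshell: the .pem file is created owner-only from the start.
  if ! (umask 077 && aws ec2 create-key-pair --key-name "$key_name" --region "$region" \
        --query 'KeyMaterial' --output text > "$key_name.pem"); then
    rm -f "$key_name.pem"
    return 1
  fi
  chmod 400 "$key_name.pem"
  echo "Created $key_name.pem"
}
```

For example, `ensure_key_pair my-dev-key us-east-2` writes my-dev-key.pem to the current directory on the first run and only reports that the key exists on later runs. The private key cannot be downloaded again, so keep the .pem file safe.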

For environments with an existing PKI, you will also need the X.509 certificates and private key and, optionally, the intermediate chain certificates and CA certificates.
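Before deploying into a private environment, it can save a failed deployment to confirm that the certificate, private key, and CA files actually belong together. A minimal sketch using the standard `openssl` CLI, assuming PEM-encoded files and an unencrypted private key; `check_cert_bundle` is a hypothetical helper, and how the files are then supplied to the Management Server is not covered here. It compares the public key embedded in the certificate with the one derived from the private key, then verifies the certificate chain.

```shell
# Hypothetical helper: check that a PEM certificate, its private key, and a CA bundle fit together.
# Usage: check_cert_bundle <cert.pem> <key.pem> <ca.pem> [intermediate-chain.pem]
check_cert_bundle() {
  local cert="$1" key="$2" ca="$3" chain="${4:-}" cert_pub key_pub
  # The public key in the certificate must equal the one derived from the private key.
  cert_pub=$(openssl x509 -in "$cert" -noout -pubkey) || return 1
  key_pub=$(openssl pkey -in "$key" -pubout) || return 1
  if [ "$cert_pub" != "$key_pub" ]; then
    echo "FAIL: $key does not match $cert" >&2
    return 1
  fi
  # Verify the certificate against the CA, using the intermediate chain if one was given.
  if [ -n "$chain" ]; then
    openssl verify -CAfile "$ca" -untrusted "$chain" "$cert"
  else
    openssl verify -CAfile "$ca" "$cert"
  fi
}
```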

Compute

  • Operating System: Exostellar runs on Rocky Linux.

Permissions

The following file contains the minimum IAM permissions required by the AWS IAM principal used to install the product:

The permissions are listed below, grouped by service:
  1. Amazon EC2 (Elastic Compute Cloud)

  • Instance Management

    • ec2:RunInstances

    • ec2:DescribeInstances

    • ec2:DescribeInstanceTypes

    • ec2:DescribeInstanceStatus

    • ec2:StopInstances

    • ec2:TerminateInstances

    • ec2:ModifyInstanceAttribute

  • Network and Security

    • ec2:DescribeSubnets

    • ec2:DescribeVpcs

    • ec2:DescribeVpcAttribute

    • ec2:DescribeSecurityGroups

    • ec2:AuthorizeSecurityGroupIngress

    • ec2:CreateSecurityGroup

    • ec2:RevokeSecurityGroupIngress

    • ec2:DeleteSecurityGroup

    • ec2:DescribeSecurityGroupRules

  • Resource Tagging and Metadata

    • ec2:CreateTags

  • Others

    • ec2:DescribeKeyPairs

    • ec2:DescribeImages

    • ec2:DescribeImageAttribute

    • ec2:DescribeAvailabilityZones

    • ec2:DescribeAccountAttributes

    • ec2:DescribeRouteTables

    • ec2:DescribeNetworkAcls

    • ec2:DescribeAddresses

    • ec2:DescribeDhcpOptions

    • ec2:DescribeSnapshots

  2. Amazon S3 (Simple Storage Service)

  • Object Operations

    • s3:GetObject

    • s3:PutObject

  3. AWS CloudFormation

  • Stack Operations

    • cloudformation:CreateStack

    • cloudformation:UpdateStack

    • cloudformation:CreateUploadBucket

    • cloudformation:DescribeStackEvents

    • cloudformation:DescribeStacks

    • cloudformation:GetTemplateSummary

    • cloudformation:ListStacks

    • cloudformation:ListStackResources

    • cloudformation:DeleteStack

  4. AWS IAM (Identity and Access Management)

  • Role Management

    • iam:CreateRole

    • iam:DeleteRole

    • iam:ListRoles

    • iam:TagRole

    • iam:PutRolePolicy

    • iam:DeleteRolePolicy

    • iam:GetRole

    • iam:ListAttachedRolePolicies

    • iam:AttachRolePolicy

  • Instance Profile Operations

    • iam:CreateInstanceProfile

    • iam:AddRoleToInstanceProfile

    • iam:RemoveRoleFromInstanceProfile

    • iam:DeleteInstanceProfile

  • Policy Management

    • iam:ListPolicies

    • iam:PassRole

  • Other

    • iam:ListOpenIDConnectProviders

    • iam:GetOpenIDConnectProvider

    • iam:ListEntitiesForPolicy

    • iam:CreateServiceLinkedRole

    • iam:ListInstanceProfiles

    • iam:ListInstanceProfilesForRole

  5. Amazon EKS (Elastic Kubernetes Service)

  • Cluster Operations

    • eks:DescribeCluster

    • eks:ListClusters

    • eks:UpdateClusterConfig

    • eks:UpdateClusterVersion

  • Nodegroup Operations

    • eks:CreateNodegroup

    • eks:DescribeNodegroup

    • eks:ListNodegroups

    • eks:UpdateNodegroupConfig

    • eks:UpdateNodegroupVersion

  • Addon Operations

    • eks:DescribeAddon

    • eks:DescribeAddonVersions

    • eks:ListAddons

    • eks:UpdateAddon

  • API Access and Policy Management

    • eks:AccessKubernetesApi

    • eks:ListAccessPolicies

    • eks:AssociateAccessPolicy

    • eks:ListIdentityProviderConfigs

    • eks:DescribeAccessEntry

    • eks:ListPodIdentityAssociations

    • eks:ListAssociatedAccessPolicies

    • eks:CreateAccessEntry

  6. AWS Systems Manager (SSM)

  • Association Operations

    • ssm:ListAssociations

    • ssm:GetParametersByPath
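To check the installing principal against this list before starting, the IAM policy simulator can evaluate the actions without performing them. A minimal sketch, assuming bash, the AWS CLI, and that the caller is allowed `iam:SimulatePrincipalPolicy` (not in the list above); `missing_permissions` is a hypothetical helper. Pass an IAM user or role ARN: for an assumed-role session, use the underlying role ARN rather than the session ARN returned by `aws sts get-caller-identity`. Simulator results can differ from a real call when policies depend on specific resources or request conditions.

```shell
# Hypothetical helper: print the actions the principal is NOT allowed to perform.
# Empty output means every listed action evaluated to "allowed".
# Usage: missing_permissions <principal-arn> <action> [<action> ...]
missing_permissions() {
  local arn="$1"
  shift
  aws iam simulate-principal-policy \
    --policy-source-arn "$arn" \
    --action-names "$@" \
    --query 'EvaluationResults[?EvalDecision!=`allowed`].EvalActionName' \
    --output text
}
```

For example, `missing_permissions arn:aws:iam::123456789012:role/ExostellarInstaller ec2:RunInstances cloudformation:CreateStack iam:PassRole eks:DescribeCluster`, where the account ID and role name are placeholders.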

To manage and scale your workloads efficiently, the Exostellar Controllers and Workers require a set of IAM permissions. Use this CloudFormation template to create the necessary permissions:

When the stack is complete, note the role and instance profile ARNs in its CloudFormation outputs; you will need them in subsequent installation steps.
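The ARNs can also be read from the command line once the stack reaches CREATE_COMPLETE. A minimal sketch, assuming bash, the AWS CLI, and the name you gave the permissions stack; the output key names depend on the template and are not assumed here. `stack_outputs_env` is a hypothetical helper that prints every output as a KEY=VALUE line you can save for later steps.

```shell
# Hypothetical helper: print every output of a CloudFormation stack as KEY=VALUE.
# Usage: stack_outputs_env <stack-name> > iam-outputs.env
stack_outputs_env() {
  aws cloudformation describe-stacks \
    --stack-name "$1" \
    --query 'Stacks[0].Outputs[].[OutputKey,OutputValue]' \
    --output text |
    awk -F '\t' '{ printf "%s=%s\n", $1, $2 }'
}
```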

Installation Steps

Step 1: Preparation

Verify that your environment meets the prerequisites outlined above.

Step 2: Navigate to AWS CloudFormation

Log in to the AWS Management Console and select the CloudFormation service.

Step 3: Create a New Stack

  • Select Create stack > With new resources (standard).

  • Select Choose an existing template.

  • Select Amazon S3 URL as the template source and paste the URL below.

  • Click Next.
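Before filling in Step 4, you can list the parameters and required capabilities the template declares, using the same S3 URL. A minimal sketch, assuming bash and the AWS CLI (`cloudformation:GetTemplateSummary` is already in the permission list above); `template_summary` is a hypothetical helper. The parameter keys it prints are the template's own names; the console labels in Step 4 may differ.

```shell
# Hypothetical helper: show a template's parameters and the capabilities it requires.
# Usage: template_summary <template-s3-url>
template_summary() {
  aws cloudformation get-template-summary \
    --template-url "$1" \
    --query 'Parameters[].[ParameterKey,DefaultValue,Description]' \
    --output table || return 1
  echo "Required capabilities:"
  aws cloudformation get-template-summary \
    --template-url "$1" \
    --query 'Capabilities' \
    --output text
}
```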

Step 4: Specify Stack Details

  • Stack Name: Assign a unique name to your CloudFormation stack, e.g., IOManagementServer.

Network Configurations

  • VPC ID: Choose the VPC where the Management Server will be deployed.

  • Subnet ID: Select the appropriate subnet for deployment.

  • Is Subnet Public?: Set to true if the selected subnet is public.

  • VPC CIDR: Enter the CIDR block for the selected VPC.

  • Shared Security Group ID: Enter the Security Group ID for your EKS or Slurm cluster, or leave it blank.
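The VPC CIDR and Is Subnet Public? values can be looked up instead of typed from memory. A minimal sketch, assuming bash and the AWS CLI; both function names are hypothetical. `subnet_is_public` uses the same heuristic as the prerequisite check (whether the subnet auto-assigns public IPv4 addresses); a route to an internet gateway is the stricter definition, so check the subnet's route table if in doubt. `vpc_cidr` returns only the VPC's primary IPv4 CIDR block.

```shell
# Hypothetical helpers for the Network Configurations fields.

# Print the primary IPv4 CIDR block of a VPC (for "VPC CIDR").
vpc_cidr() {
  aws ec2 describe-vpcs \
    --vpc-ids "$1" \
    --query 'Vpcs[0].CidrBlock' \
    --output text
}

# Print "true" or "false" (for "Is Subnet Public?"), based on MapPublicIpOnLaunch.
subnet_is_public() {
  aws ec2 describe-subnets \
    --subnet-ids "$1" \
    --query 'Subnets[0].MapPublicIpOnLaunch' \
    --output text | tr '[:upper:]' '[:lower:]'
}
```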

Instance Configurations

  • EC2 Instance Type: Select a suitable instance type for the Management Server. The default is m5d.xlarge.

  • SSH Key Pair: Select an existing SSH key pair. This is used to access the Management Server.

  • TerminationProtection: We recommend enabling termination protection for your Management Server to prevent accidental deletions or stops.

  • Volume Size: Set the volume size for the Management Server. The default is 100 GB; adjust it to your needs.
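Not every instance type is offered in every Availability Zone, so it is worth confirming that the chosen type (m5d.xlarge by default) is available in the Availability Zone of the selected subnet before creating the stack. A minimal sketch, assuming bash and the AWS CLI; `instance_type_in_subnet_az` is a hypothetical helper. It calls `ec2:DescribeInstanceTypeOfferings`, which is not in the permission list above, so the caller may need that permission separately.

```shell
# Hypothetical helper: check that an instance type is offered in the AZ of a subnet.
# Usage: instance_type_in_subnet_az <instance-type> <subnet-id>
instance_type_in_subnet_az() {
  local itype="$1" subnet="$2" az offered
  az=$(aws ec2 describe-subnets \
    --subnet-ids "$subnet" \
    --query 'Subnets[0].AvailabilityZone' \
    --output text) || return 1
  offered=$(aws ec2 describe-instance-type-offerings \
    --location-type availability-zone \
    --filters "Name=instance-type,Values=$itype" "Name=location,Values=$az" \
    --query 'InstanceTypeOfferings[].Location' \
    --output text) || return 1
  if [ "$offered" = "$az" ]; then
    echo "OK: $itype is offered in $az"
  else
    echo "FAIL: $itype is not offered in $az" >&2
    return 1
  fi
}
```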

HA NFS Integration

By default, the X-IO NFS file system runs on the Management Server. To use a standalone remote NFS file share such as Amazon EFS instead, enter its DNS name and security group ID:

  • NFS DNS name: DNS name of the remote file share, e.g., fs-123456789.efs.us-east-1.amazonaws.com. If left empty, the default NFS file system on the Management Server is used.

  • NFS security group ID: ID of the security group that allows traffic between X-IO and the remote file share.

On EFS, only the default EFS file system policy is supported. See the AWS documentation for more information on this file system policy.
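For Amazon EFS, both fields can be derived from the file system itself. A minimal sketch, assuming bash, the AWS CLI, and a standard commercial AWS Region, where EFS DNS names follow `<file-system-id>.efs.<region>.amazonaws.com`; both function names are hypothetical. The mount target security groups it lists are candidates for the NFS security group ID; whichever group you use must allow NFS traffic (TCP port 2049) between X-IO and the mount targets.

```shell
# Hypothetical helpers for the HA NFS Integration fields.

# Build the EFS DNS name (for "NFS DNS name") from a file system ID and region.
efs_dns_name() {
  case "$1" in
    fs-*) printf '%s.efs.%s.amazonaws.com\n' "$1" "$2" ;;
    *) echo "not an EFS file system ID: $1" >&2; return 1 ;;
  esac
}

# List the security groups attached to the file system's mount targets, deduplicated.
efs_mount_target_security_groups() {
  local mt
  for mt in $(aws efs describe-mount-targets \
      --file-system-id "$1" \
      --query 'MountTargets[].MountTargetId' \
      --output text); do
    aws efs describe-mount-target-security-groups \
      --mount-target-id "$mt" \
      --query 'SecurityGroups' \
      --output text
  done | tr '\t' '\n' | sort -u
}
```

For example, `efs_dns_name fs-123456789 us-east-1` prints fs-123456789.efs.us-east-1.amazonaws.com, the same form as the NFS DNS name example.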

Click Next to continue.

Step 5: Configure Options

This step is optional. Configure any additional stack options such as tags, permissions, and other settings relevant to your organization. Click Next to advance.

Step 6: Review and Deploy

  • Carefully review all configurations to ensure accuracy.

  • Acknowledge the creation of IAM resources by ticking the relevant checkbox.

  • Initiate the deployment by clicking Create stack.
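Steps 3 to 6 can also be scripted for repeatable deployments. A minimal sketch, assuming bash, the AWS CLI, the template URL from Step 3, and a parameters file you write yourself in the standard CloudFormation format (a JSON list of `ParameterKey`/`ParameterValue` objects) using the keys the template actually declares; `deploy_stack` is a hypothetical helper. Passing `CAPABILITY_IAM` and `CAPABILITY_NAMED_IAM` is the CLI counterpart of ticking the IAM acknowledgement checkbox; if the template reports other required capabilities, add them too.

```shell
# Hypothetical helper: create the stack and block until creation finishes.
# Usage: deploy_stack <stack-name> <template-s3-url> <params.json>
deploy_stack() {
  local stack_name="$1" template_url="$2" params_file="$3"
  aws cloudformation create-stack \
    --stack-name "$stack_name" \
    --template-url "$template_url" \
    --parameters "file://$params_file" \
    --capabilities CAPABILITY_IAM CAPABILITY_NAMED_IAM || return 1
  # Returns non-zero if creation fails, rolls back, or the waiter times out.
  aws cloudformation wait stack-create-complete --stack-name "$stack_name"
}
```

For example: `deploy_stack IOManagementServer "$TEMPLATE_URL" params.json`, where `TEMPLATE_URL` holds the S3 URL from Step 3.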

Step 7: Access the Management Server

Upon successful deployment, navigate to the Outputs section of your CloudFormation stack to retrieve access details:

  • ExostellarMgmtServerURL: Access URL for the Management Server.

  • ExostellarMgmtServerPrivateIP: Private IP of the Management Server.

  • ExostellarAdminUsername: The initial admin username for the Management Server.

  • ExostellarOptimizerAdminPassword: The initial admin password. Make sure to change this upon your first login for security.
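The same outputs can be read from the CLI, and in case the server is not answering yet when the stack completes, the URL can be polled before you open it in a browser. A minimal sketch, assuming bash, the AWS CLI, and `curl`; `stack_output` and `wait_for_url` are hypothetical helpers, and the output keys are the ones listed above. Reading `ExostellarOptimizerAdminPassword` this way prints the password to your terminal. If the server presents a certificate your machine does not trust, curl reports status 000 until you point it at the right CA with `--cacert`.

```shell
# Hypothetical helper: print one output value from a CloudFormation stack.
# Usage: stack_output <stack-name> <output-key>
stack_output() {
  aws cloudformation describe-stacks \
    --stack-name "$1" \
    --query "Stacks[0].Outputs[?OutputKey=='$2'].OutputValue" \
    --output text
}

# Hypothetical helper: poll a URL until it returns any HTTP status (default 30 tries, 10 s apart).
# Usage: wait_for_url <url> [attempts]
wait_for_url() {
  local url="$1" attempts="${2:-30}" i=1 code
  while [ "$i" -le "$attempts" ]; do
    # curl writes 000 when no HTTP response was received (e.g. connection refused).
    code=$(curl -s -o /dev/null -w '%{http_code}' "$url" || true)
    if [ "$code" != "000" ]; then
      echo "$url responded with HTTP $code"
      return 0
    fi
    sleep 10
    i=$((i + 1))
  done
  echo "$url did not respond after $attempts attempts" >&2
  return 1
}
```

For example: `wait_for_url "$(stack_output IOManagementServer ExostellarMgmtServerURL)"`.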

Verification

After a successful deployment, log in to the Management Server with the credentials from the stack outputs. You will be prompted to change the admin password. We also recommend exploring the platform and configuring your infrastructure optimization settings to meet your organization's requirements.

How can I verify that my environment is correctly configured and the installation is successful without connecting to a cluster?

Step 1 - In the CloudFormation console, open your stack's Resources tab and select ExostellarInstance to locate the Management Server EC2 instance.
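The same lookup can be done from the CLI, which also gives you the address to SSH to in Step 2. A minimal sketch, assuming bash, the AWS CLI, and that `ExostellarInstance` is the logical resource ID shown in the Resources tab; `mgmt_server_ip` is a hypothetical helper. It prints the private IP, so run the SSH command from a machine that can reach the VPC.

```shell
# Hypothetical helper: resolve the ExostellarInstance stack resource to its private IP.
# Usage: mgmt_server_ip <stack-name>
mgmt_server_ip() {
  local instance_id
  instance_id=$(aws cloudformation describe-stack-resource \
    --stack-name "$1" \
    --logical-resource-id ExostellarInstance \
    --query 'StackResourceDetail.PhysicalResourceId' \
    --output text) || return 1
  aws ec2 describe-instances \
    --instance-ids "$instance_id" \
    --query 'Reservations[0].Instances[0].PrivateIpAddress' \
    --output text
}
```

For example: `ssh -i my-dev-key.pem rocky@"$(mgmt_server_ip IOManagementServer)"`. The `rocky` login user is an assumption based on the default user of Rocky Linux cloud images; use whichever user your image is configured with.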

Step 2 - SSH into the Management Server EC2 instance and start a Hello World VM on an Amazon EC2 Spot Instance:

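# Copy the bundled test script and point it at the local controller (127.0.0.1).
# Then write user data that appends this server's public key (~/.ssh/id_rsa.pub,
# which must exist) to the VM's /root/.ssh/authorized_keys. Because EOX is unquoted,
# $(cat ~/.ssh/id_rsa.pub) is expanded now, on the Management Server.
cp /xcompute/slurm/bin/xcompute-daemon/host-cluster/test_createVm.sh ~/test_createVm.sh && \
sed -i "s/^XCOMPUTE_HEAD_IP.*/XCOMPUTE_HEAD_IP=127.0.0.1/g" ~/test_createVm.sh && \
cat > ~/user_data.sh <<- EOX
#!/bin/bash

echo "$(cat ~/.ssh/id_rsa.pub)" >> /root/.ssh/authorized_keys

EOX
# Launch the hello_world_ubuntu test VM with the user data above.
cd ~ && ./test_createVm.sh -i ubuntu -h hello_world_ubuntu -u ./user_data.sh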

If you see a message similar to this, congratulations! You have successfully configured and installed everything correctly!

NodeName: hello_world_ubuntu ... 340s
Controller: az1-yy7jyuv5-1
Controller IP: 192.0.12.xx
Vm IP: 192.0.6.xxx
##########   done    ##########

Clean up - Shut down the Hello World VM; its instances are then terminated automatically:

curl -v -X DELETE  http://localhost:5000/v1/xcompute/vm/hello_world_ubuntu
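When scripting the clean-up, it helps to make HTTP errors visible in the exit status. A minimal sketch, assuming bash and curl; `delete_test_vm` is a hypothetical wrapper around the same DELETE call and uses only the endpoint shown above. `--fail` makes curl exit non-zero when the server answers with an HTTP error status (4xx or 5xx), so a failed request is reported instead of passing silently.

```shell
# Hypothetical wrapper around the clean-up call above; run on the Management Server.
# Usage: delete_test_vm [vm-name]   (defaults to hello_world_ubuntu)
delete_test_vm() {
  local name="${1:-hello_world_ubuntu}"
  # --fail: non-zero exit on HTTP errors; -sS: quiet, but still print errors.
  curl --fail -sS -X DELETE "http://localhost:5000/v1/xcompute/vm/${name}" >/dev/null || return 1
  echo "Delete request accepted for ${name}"
}
```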
