What happens if there are no available Spot instances?

Exostellar recommends selecting a diverse set of instance types to maximize the likelihood of finding EC2 Spot availability within the region. The scheduler dynamically searches for Spot availability across the related instance types you have identified in the same region. If no EC2 Spot capacity is available for any of those types, the system automatically and safely migrates your workloads back to diverse on-demand instances, so your operations are not disrupted.
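The selection order described above can be pictured with a small shell sketch. This is not Exostellar's implementation: `pick_capacity` and the availability checker passed to it are hypothetical names used only for illustration, and the real scheduler also handles migrating running workloads, which this sketch does not attempt.

```bash
# Hypothetical illustration of "try each Spot type, else fall back to on-demand".
# $1 is the name of a command or function that succeeds (exit 0) when Spot
# capacity is available for the instance type given as its argument.
# The remaining arguments are the candidate instance types, in preference order.
pick_capacity() {
  checker=$1
  shift
  for itype in "$@"; do
    if "$checker" "$itype"; then
      # First candidate type with Spot capacity wins.
      echo "spot $itype"
      return 0
    fi
  done
  # No candidate type had Spot capacity: fall back to on-demand.
  echo "on-demand"
}

# Usage (my_spot_check is an assumed, user-supplied checker):
#   pick_capacity my_spot_check m5.large c5.large r5.large
```

The more instance types passed as candidates, the more chances the loop has to find Spot capacity before it falls back, which is the reason for the diversification advice.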

Where can I find all logs for debugging?

For diagnosing issues related to the Exostellar Management Server, you will primarily rely on the following log locations:

  • /var/log/messages

  • /var/log/munge/munged.log

  • /var/log/slurm/aws.log

  • /var/log/slurm/slurmctld.log

  • /home/slurm

For any debugging relating to the Exostellar Controller and the Worker, the logs can be found at:

  • /xcompute/slurm/bin/xcompute-daemon/data/

  • /xcompute/logs/

    • This directory contains a messages directory and an xspot directory. The xspot directory holds X-Spot-specific logs, along with logs for the jobs and workers spawned by that controller.
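Before digging into individual files, it can help to confirm which of the locations listed above are present and readable on the host you are logged in to. The following is a minimal sketch; `check_log_paths` is a hypothetical helper name, not part of the Exostellar tooling.

```bash
# Report, for each path given, whether it exists and is readable by the current user.
check_log_paths() {
  for path in "$@"; do
    if [ -r "$path" ]; then
      echo "ok $path"
    else
      echo "missing-or-unreadable $path"
    fi
  done
}

# Usage, with the locations listed above:
#   check_log_paths /var/log/messages /var/log/munge/munged.log \
#     /var/log/slurm/aws.log /var/log/slurm/slurmctld.log /home/slurm \
#     /xcompute/slurm/bin/xcompute-daemon/data/ /xcompute/logs/
```

A path reported as missing-or-unreadable may simply require elevated privileges, so rerunning the check as a privileged user is a reasonable next step.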

If you need help troubleshooting, run the following command on the Exostellar Management Server to pack all of the logs into a single archive, then upload the resulting file with your support request:

```bash
tar -czvf exostellar-logs-$(date +%Y-%m-%d).tar.gz /var/log/messages /var/log/munge/munged.log /var/log/slurm/aws.log /var/log/slurm/slurmctld.log /home/slurm /xcompute/logs /xcompute/slurm/bin/xcompute-daemon/data
```
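If some of these paths do not exist on a given host, tar reports an error for each missing path. As a possible variation, the sketch below packs only the paths that are present and then lists the archive contents so you can confirm what was captured. `pack_logs` is a hypothetical helper name introduced here for illustration.

```bash
# Pack only the paths that exist into the archive named by $1,
# then list the archive contents as a quick check.
pack_logs() {
  archive=$1
  shift
  # Rebuild the argument list so that it holds only existing paths:
  # each iteration drops the current path from the front and re-appends
  # it at the end if it exists.
  for path in "$@"; do
    shift
    if [ -e "$path" ]; then
      set -- "$@" "$path"
    fi
  done
  if [ "$#" -eq 0 ]; then
    echo "pack_logs: none of the given paths exist" >&2
    return 1
  fi
  tar -czf "$archive" "$@" && tar -tzf "$archive"
}

# Usage, with the same paths as the command above:
#   pack_logs "exostellar-logs-$(date +%Y-%m-%d).tar.gz" /var/log/messages \
#     /var/log/munge/munged.log /var/log/slurm/aws.log \
#     /var/log/slurm/slurmctld.log /home/slurm /xcompute/logs \
#     /xcompute/slurm/bin/xcompute-daemon/data
```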