This troubleshooting guide helps you identify and resolve common issues in your Infrastructure Optimizer environment, so that you can pinpoint problems quickly and keep operations running smoothly.
To pack all logs on the Exostellar Management Server into a single archive, run the following command:
tar -czvf exostellar-logs-$(date +%Y-%m-%d).tar.gz /var/log/messages /var/log/munge/munged.log /var/log/slurm/aws.log /var/log/slurm/slurmctld.log /home/slurm /xcompute/logs /xcompute/slurm/bin/xcompute-daemon/data
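On a host where some of these paths are absent, tar reports an error for each missing path. The following is a minimal sketch, not part of the product, of a wrapper that archives only the paths that exist. The function name pack_existing_logs is hypothetical, and the sketch assumes a POSIX shell with a tar that supports -z.

```shell
# Hypothetical helper (not an Exostellar tool): archive only the paths that exist.
# Usage: pack_existing_logs ARCHIVE PATH...
pack_existing_logs() {
    archive="$1"
    shift
    original_count=$#
    # "$@" is expanded once when the loop starts, so appending to the
    # positional parameters inside the loop is safe.
    for log_path in "$@"; do
        if [ -e "$log_path" ]; then
            set -- "$@" "$log_path"
        else
            echo "skipping missing path: $log_path" >&2
        fi
    done
    # Drop the original arguments; only the existing paths remain.
    shift "$original_count"
    if [ "$#" -eq 0 ]; then
        echo "no log paths found; nothing to archive" >&2
        return 1
    fi
    tar -czf "$archive" "$@"
}

# Example call, using the paths from the command above:
# pack_existing_logs "exostellar-logs-$(date +%Y-%m-%d).tar.gz" \
#     /var/log/messages /var/log/munge/munged.log /var/log/slurm/aws.log \
#     /var/log/slurm/slurmctld.log /home/slurm /xcompute/logs \
#     /xcompute/slurm/bin/xcompute-daemon/data
```

The skipped paths are printed to standard error, which is itself useful when reporting an issue: it shows which expected log locations were not present on the server.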
Logs
For diagnosing issues related to the Exostellar Management Server, you will primarily rely on the following log locations:
/var/log/messages
/var/log/munge/munged.log
/var/log/slurm/aws.log
/var/log/slurm/slurmctld.log
/home/slurm
To debug the Exostellar Controller and the Worker, use the logs at:
/xcompute/slurm/bin/xcompute-daemon/data/
/xcompute/logs/
Within this directory you will find the messages directory and an xspot directory. This xspot directory will contain X-Spot specific logs, in addition to logs for the jobs and workers spawned by that controller.
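As a starting point for working through these locations, the sketch below searches a set of log files or directories for a pattern and prints the most recent matches from each. The function name scan_logs and the pattern 'error|fail' are illustrative assumptions; this guide does not specify the message formats of these logs, so adjust the pattern to what you actually see.

```shell
# Hypothetical helper (not an Exostellar tool): show recent matching lines.
# Usage: scan_logs PATTERN PATH...
scan_logs() {
    pattern="$1"
    shift
    for target in "$@"; do
        if [ -e "$target" ]; then
            # -r recurse into directories, -n show line numbers, -i ignore case,
            # -I skip binary files, -H always print the file name,
            # -E treat the pattern as an extended regular expression.
            grep -rniIHE -- "$pattern" "$target" | tail -n 20
        else
            echo "not found: $target" >&2
        fi
    done
}

# Example call, using locations listed in this guide:
# scan_logs 'error|fail' /var/log/slurm/slurmctld.log /xcompute/logs
```

Limiting the output to the last 20 matches per location keeps the result readable on long-running servers; raise or remove the limit when you need the full history.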