This troubleshooting guide helps you identify and resolve common issues in your Infrastructure Optimizer environment.

Logs

To diagnose issues with the Management Server, check the following locations:

  • /var/log/messages

  • /var/log/munge/munged.log

  • /var/log/slurm/aws.log

  • /var/log/slurm/slurmctld.log

  • /home/slurm
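As a starting point when reading these files, the sketch below prints the last few error-like lines from each log it is given. The helper name `recent_errors` and the search pattern `error|fail` are illustrative assumptions, not an official list of failure messages, and reading /var/log/messages may require root privileges on your system.

```shell
#!/usr/bin/env bash
# Hypothetical helper: show the last COUNT lines that look like errors
# in each log file passed as an argument.
# The pattern 'error|fail' is an assumption; adjust it to what you are hunting.
recent_errors() {
    local count="$1"; shift
    local f
    for f in "$@"; do
        if [ -r "$f" ]; then
            echo "== $f"
            # Case-insensitive match, keeping only the newest COUNT hits.
            grep -iE 'error|fail' "$f" | tail -n "$count"
        else
            echo "== $f (missing or not readable)"
        fi
    done
}

# Example call on the Management Server:
# recent_errors 20 /var/log/messages /var/log/munge/munged.log \
#     /var/log/slurm/aws.log /var/log/slurm/slurmctld.log
```

/home/slurm is a directory rather than a single log file, so it is left out of the example call; inspect its contents separately.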

To debug the Controller and the Worker, check the logs in:

  • /xcompute/slurm/bin/xcompute-daemon/data/

  • /xcompute/logs/

    • This directory contains a messages directory and an xspot directory. The xspot directory holds X-Spot-specific logs, along with logs for the jobs and workers spawned by that controller.
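Because these directories collect logs for many jobs and workers, it can help to narrow the search to files that changed recently. The sketch below assumes a `find` that supports `-mmin` (GNU and BSD find both do); the helper name `recent_logs` is hypothetical, and no particular file naming inside the xspot directory is assumed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list files under DIR modified within the last MINUTES minutes.
recent_logs() {
    local dir="$1" minutes="$2"
    # -mmin -N matches files modified less than N minutes ago.
    find "$dir" -type f -mmin "-$minutes" | sort
}

# Example call: files under /xcompute/logs touched in the last hour.
# recent_logs /xcompute/logs 60
```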

To pack all of the logs listed above into a single dated archive, run:

tar -czvf exostellar-logs-$(date +%Y-%m-%d).tar.gz /var/log/messages /var/log/munge/munged.log /var/log/slurm/aws.log /var/log/slurm/slurmctld.log /home/slurm /xcompute/logs /xcompute/slurm/bin/xcompute-daemon/data
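The command above lists every path in one call. GNU tar reports an error and exits with a non-zero status if any listed path does not exist, although it still archives the rest. On a host that holds only some of these paths, a variant that skips missing ones may be more convenient. The following is a minimal sketch under that assumption; the helper name `pack_existing` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: archive only the paths that exist on this host.
# Usage: pack_existing ARCHIVE PATH...
pack_existing() {
    local archive="$1"; shift
    local p
    # Rebuild the argument list in place: drop each path from the front
    # and re-append it at the back only if it exists.
    for p in "$@"; do
        shift
        if [ -e "$p" ]; then
            set -- "$@" "$p"
        else
            echo "skipping missing path: $p" >&2
        fi
    done
    if [ "$#" -eq 0 ]; then
        echo "no log paths found; nothing to archive" >&2
        return 1
    fi
    tar -czf "$archive" "$@"
}

# Example call with the same paths as the command above:
# pack_existing "exostellar-logs-$(date +%Y-%m-%d).tar.gz" \
#     /var/log/messages /var/log/munge/munged.log /var/log/slurm/aws.log \
#     /var/log/slurm/slurmctld.log /home/slurm /xcompute/logs \
#     /xcompute/slurm/bin/xcompute-daemon/data
```

Run it with enough privileges to read the system logs; files the current user cannot read will still cause tar to report errors.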
