Slurm History Ingestor

Overview

The Slurm History Ingestor is a standalone Go service that syncs job history from Slurm HPC clusters into PostgreSQL for analytics and reporting.

Key Features:

Incremental Syncing – Fetches only jobs that are new since the last sync
Robust Data Handling – Uses a lookback window to catch out-of-order jobs
Data Normalization – Converts TRES strings to query-friendly columns
Multi-Cluster Support – Tags records with the cluster name for centralized databases
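
To make the Data Normalization feature concrete: Slurm reports trackable resources (TRES) as a comma-separated string such as `cpu=4,mem=16G,node=1,gres/gpu=2`. The sketch below shows one way to split such a string into name/value pairs, which is the kind of step needed before the values can be stored in separate columns. The function name `split_tres` is hypothetical, and the ingestor's own Go implementation may work differently.

```shell
# split_tres: print one "name<TAB>value" line per TRES entry.
# Hypothetical helper for illustration; not part of the ingestor.
# The body runs in a subshell so IFS and globbing changes stay local.
split_tres() (
  tres=$1
  IFS=,      # split on commas only
  set -f     # disable globbing so entries are taken literally
  for pair in $tres; do
    case $pair in
      # Keep only well-formed "name=value" entries.
      *=*) printf '%s\t%s\n' "${pair%%=*}" "${pair#*=}" ;;
    esac
  done
)

split_tres 'cpu=4,mem=16G,node=1,gres/gpu=2'
```

The call at the end prints one tab-separated name/value pair per line, so `gres/gpu=2` becomes the name `gres/gpu` and the value `2`.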

Quick Start

The fastest way to get started is with the interactive setup script:

git clone https://github.com/thediymaker/slurm-history-ingestor.git
cd slurm-history-ingestor
chmod +x setup.sh
./setup.sh

The script will guide you through:

Choosing between sacct mode (recommended) and API mode
Database configuration and running all 3 migrations
Environment variable setup (.env file creation)
Building the binary from source
Optional systemd service installation

What you'll need before running the script:

PostgreSQL 13+ (can be remote)
Slurm 20.11+ with sacct command access
PostgreSQL client (psql) installed locally
Go 1.22+ (for building from source)

Installation takes about 5 minutes with the interactive script. If you prefer not to build from source or want more control, see the manual installation below.

Prerequisites

Database Requirements

| Component | Version | Notes |
| --- | --- | --- |
| PostgreSQL Server | 13+ | Can be on a different machine |
| PostgreSQL Client | Any | For running migrations (`psql` command) |

Slurm Requirements

| Component | Version | Notes |
| --- | --- | --- |
| Slurm | 20.11+ | Must have `sacct` command available |
| Access Level | User | Service must run as a user with `sacct` permissions |

Build Requirements (for setup.sh script)

The interactive setup script builds from source, so you'll need:

Go 1.22+
sqlc (automatically installed by the script)

Installing PostgreSQL Client

Install psql on the machine where you'll run the ingestor. On Debian or Ubuntu:

sudo apt update && sudo apt install -y postgresql-client

Installation (Manual Method)

For production deployments or if you prefer not to build from source, use the pre-built binary and follow these manual steps.

Note:

Setup Script vs Manual Installation:

Setup Script (setup.sh) – Builds from source, runs all migrations, and can optionally set up systemd. Best for development or if you want everything automated.
Manual Installation – Uses pre-built binary, gives you full control over each step. Best for production deployments.

Step 1: Download the Binary

# Create installation directory
sudo mkdir -p /opt/slurm-ingestor
cd /opt/slurm-ingestor

# Download the latest release (Linux x64)
sudo wget https://github.com/thediymaker/slurm-history-ingestor/releases/latest/download/slurm-ingestor-linux-amd64 -O slurm-ingestor
sudo chmod +x slurm-ingestor

Step 2: Set Up the Database

Apply all three migrations in order. Choose one of these methods:

Method A: Using local files (if you cloned the repo)

psql -h YOUR_DB_HOST -U YOUR_DB_USER -d YOUR_DB_NAME -f db/migrations/001_init.sql
psql -h YOUR_DB_HOST -U YOUR_DB_USER -d YOUR_DB_NAME -f db/migrations/002_add_gpu_fields.sql
psql -h YOUR_DB_HOST -U YOUR_DB_USER -d YOUR_DB_NAME -f db/migrations/003_add_gpu_metrics.sql

Method B: Download migrations directly

wget https://raw.githubusercontent.com/thediymaker/slurm-history-ingestor/main/db/migrations/001_init.sql
wget https://raw.githubusercontent.com/thediymaker/slurm-history-ingestor/main/db/migrations/002_add_gpu_fields.sql
wget https://raw.githubusercontent.com/thediymaker/slurm-history-ingestor/main/db/migrations/003_add_gpu_metrics.sql

psql -h YOUR_DB_HOST -U YOUR_DB_USER -d YOUR_DB_NAME -f 001_init.sql
psql -h YOUR_DB_HOST -U YOUR_DB_USER -d YOUR_DB_NAME -f 002_add_gpu_fields.sql
psql -h YOUR_DB_HOST -U YOUR_DB_USER -d YOUR_DB_NAME -f 003_add_gpu_metrics.sql
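
If you would rather script this step, the three commands can be wrapped in a loop. The sketch below is an illustration only: `apply_migrations`, `DB_HOST`, `DB_USER`, `DB_NAME`, and the `PSQL` override are hypothetical names, not part of the ingestor. It sets psql's `ON_ERROR_STOP` variable so that a failing migration stops the run instead of letting later files apply on top of a partial schema.

```shell
# apply_migrations DIR: run the three migration files from DIR in order,
# stopping at the first failure. Hypothetical helper for illustration.
# Expects DB_HOST, DB_USER and DB_NAME to be set (assumed names).
# PSQL can be overridden (for example PSQL=echo) to preview the commands.
apply_migrations() (
  dir=$1
  for f in 001_init.sql 002_add_gpu_fields.sql 003_add_gpu_metrics.sql; do
    # ON_ERROR_STOP=1 makes psql exit non-zero on the first SQL error.
    ${PSQL:-psql} -v ON_ERROR_STOP=1 \
      -h "$DB_HOST" -U "$DB_USER" -d "$DB_NAME" \
      -f "$dir/$f" || exit 1
  done
)

# Example:
#   DB_HOST=db-host DB_USER=slurm_user DB_NAME=slurm_history
#   apply_migrations db/migrations
```

Because `ON_ERROR_STOP` is set, re-running this against a database that already contains the tables may stop at the first file if the migrations are not idempotent, so the sketch is best suited to a fresh database.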

Step 3: Configure the Service

Get the configuration template and customize it:

# Download the example config (or copy from repo if cloned)
sudo wget https://raw.githubusercontent.com/thediymaker/slurm-history-ingestor/main/.env.example -O .env

# Edit with your database and cluster information
sudo vim .env

# Secure the file
sudo chmod 600 .env

Minimal configuration example:

# Database connection
DATABASE_URL=postgres://slurm_user:yourpassword@db-host:5432/slurm_history?sslmode=disable

# Cluster identification
CLUSTER_NAME=production-hpc

# Use sacct mode (recommended)
INGEST_MODE=sacct
SACCT_PATH=/usr/bin/sacct

# Sync every 5 minutes
SYNC_INTERVAL=300

# Start syncing from this date on first run
INITIAL_SYNC_DATE=2024-01-01
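
Before starting the service, it can help to confirm that the two required settings, `DATABASE_URL` and `CLUSTER_NAME`, are present and non-empty in the file. The following is a minimal sketch of such a check; `check_env` is a hypothetical helper, not something shipped with the ingestor, and it only looks for plain `NAME=value` lines.

```shell
# check_env FILE: report required settings that are missing or empty in FILE.
# Hypothetical helper for illustration; not part of the ingestor.
check_env() (
  file=$1
  missing=0
  for var in DATABASE_URL CLUSTER_NAME; do
    # Match "VAR=" followed by at least one character; commented-out
    # lines start with "#" and therefore do not match.
    if ! grep -Eq "^${var}=.+" "$file"; then
      echo "missing or empty: $var" >&2
      missing=1
    fi
  done
  exit "$missing"
)

# Example (run as a user who can read the file):
#   check_env /opt/slurm-ingestor/.env && echo "required settings present"
```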

Step 4: Create the Systemd Service

Create /etc/systemd/system/slurm-ingestor.service:

sudo vim /etc/systemd/system/slurm-ingestor.service

Add this content:

[Unit]
Description=Slurm History Ingestor
After=network.target remote-fs.target munge.service
Requires=remote-fs.target munge.service

[Service]
Type=simple
User=slurm
Group=slurm
WorkingDirectory=/opt/slurm-ingestor
ExecStart=/opt/slurm-ingestor/slurm-ingestor
Restart=on-failure
RestartSec=10
EnvironmentFile=/opt/slurm-ingestor/.env

[Install]
WantedBy=multi-user.target

Step 5: Start the Service

# Reload systemd to recognize the new service
sudo systemctl daemon-reload

# Enable and start the service
sudo systemctl enable --now slurm-ingestor

# Check that it's running
systemctl status slurm-ingestor

# Watch the logs
journalctl -u slurm-ingestor -f

Configuration Reference

Configure the ingestor using environment variables in your .env file.

Required Settings

| Variable | Description | Example |
| --- | --- | --- |
| `DATABASE_URL` | PostgreSQL connection string | `postgres://user:pass@localhost:5432/slurm_history?sslmode=disable` |
| `CLUSTER_NAME` | Unique identifier for this cluster | `production-hpc` |

Ingest Mode Selection

Choose sacct mode (recommended) or API mode:

Sacct Mode – Direct command-line interface (faster and more reliable)

| Variable | Description | Default |
| --- | --- | --- |
| `INGEST_MODE` | Set to `sacct` | - |
| `SACCT_PATH` | Path to sacct binary | `sacct` |

Note:

Sacct mode requires running on a Slurm node with sacct command access. This is the recommended mode for production.

API Mode – REST API interface (for remote access)

| Variable | Description | Example |
| --- | --- | --- |
| `INGEST_MODE` | Set to `api` | - |
| `SLURM_SERVER` | Slurm REST API URL | `http://slurm-head:6820` |
| `SLURM_API_ACCOUNT` | Username for API authentication | `slurm_api_user` |
| `SLURM_API_TOKEN` | JWT token for authentication | `eyJhbGc...` |
| `SLURM_API_VERSION` | API version string | `v0.0.41` |
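
To check API-mode settings by hand, you can call slurmrestd's ping endpoint with the same values. The sketch below assumes slurmrestd is configured for JWT authentication, which uses the `X-SLURM-USER-NAME` and `X-SLURM-USER-TOKEN` request headers. `slurmrestd_ping` and the `CURL` override are hypothetical names, and whether the ingestor sends exactly this request is an assumption.

```shell
# slurmrestd_ping: call the slurmrestd ping endpoint using the API-mode
# settings from the table above. Hypothetical helper for illustration.
# CURL can be overridden (for example CURL=echo) for a dry run.
slurmrestd_ping() {
  # --fail turns HTTP errors (such as 401) into a non-zero exit status.
  ${CURL:-curl} -sS --fail \
    -H "X-SLURM-USER-NAME: ${SLURM_API_ACCOUNT}" \
    -H "X-SLURM-USER-TOKEN: ${SLURM_API_TOKEN}" \
    "${SLURM_SERVER}/slurm/${SLURM_API_VERSION}/ping"
}

# Example (after exporting the four SLURM_* variables):
#   slurmrestd_ping
```

A connection error here points at `SLURM_SERVER` or slurmrestd itself, while an HTTP authentication error points at the account or token.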

Sync Behavior

| Variable | Default | Description |
| --- | --- | --- |
| `SYNC_INTERVAL` | `300` | Seconds between sync cycles |
| `INITIAL_SYNC_DATE` | `2024-01-01` | Start date for first sync (YYYY-MM-DD format) |
| `CHUNK_HOURS` | `24` | Hours of data to fetch per request |
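
To illustrate what `CHUNK_HOURS` implies, the sketch below splits a time range into fixed-size windows, assuming the ingestor walks from the start date toward the present one window per request. `chunk_windows` is a hypothetical helper that works on Unix timestamps; the ingestor's real chunking logic may differ.

```shell
# chunk_windows START_EPOCH END_EPOCH CHUNK_HOURS
# Print "start end" pairs (Unix seconds) covering the range in order.
# Hypothetical helper for illustration; not part of the ingestor.
chunk_windows() (
  start=$1
  end=$2
  step=$(( $3 * 3600 ))
  [ "$step" -gt 0 ] || { echo "CHUNK_HOURS must be positive" >&2; exit 1; }
  while [ "$start" -lt "$end" ]; do
    next=$(( start + step ))
    # Clamp the final window so it does not run past END.
    [ "$next" -gt "$end" ] && next=$end
    echo "$start $next"
    start=$next
  done
)

# 60 hours of history in 24-hour chunks:
# two full windows followed by a 12-hour remainder.
chunk_windows 0 216000 24
```

A smaller `CHUNK_HOURS` means more requests with less data in each, which may help on busy clusters where a full day of jobs is large.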

Advanced Options

| Variable | Default | Description |
| --- | --- | --- |
| `HTTP_TIMEOUT` | `120` | API request timeout in seconds (API mode only) |
| `DEBUG` | `false` | Enable verbose logging for troubleshooting |

Alternative Installation Methods

Docker Deployment

Note:

Docker is primarily useful for API mode or when you can mount the sacct binary into the container. For most deployments, the binary + systemd approach is simpler and more reliable.

Step 1: Clone and configure

git clone https://github.com/thediymaker/slurm-history-ingestor.git
cd slurm-history-ingestor
cp .env.example .env
vim .env

Step 2: Run database migrations

If using an external PostgreSQL server:

psql -h YOUR_DB_HOST -U YOUR_DB_USER -d YOUR_DB_NAME -f db/migrations/001_init.sql
psql -h YOUR_DB_HOST -U YOUR_DB_USER -d YOUR_DB_NAME -f db/migrations/002_add_gpu_fields.sql
psql -h YOUR_DB_HOST -U YOUR_DB_USER -d YOUR_DB_NAME -f db/migrations/003_add_gpu_metrics.sql

If using Docker Compose with bundled PostgreSQL:

docker compose up -d postgres
docker compose exec postgres psql -U slurm_user -d slurm_history -f /docker-entrypoint-initdb.d/001_init.sql
docker compose exec postgres psql -U slurm_user -d slurm_history -f /docker-entrypoint-initdb.d/002_add_gpu_fields.sql
docker compose exec postgres psql -U slurm_user -d slurm_history -f /docker-entrypoint-initdb.d/003_add_gpu_metrics.sql

Step 3: Start the ingestor

docker compose up -d ingestor
docker compose logs -f ingestor

Building from Source

For development or custom builds:

# Clone the repository
git clone https://github.com/thediymaker/slurm-history-ingestor.git
cd slurm-history-ingestor

# Generate database code (go install places sqlc in $(go env GOPATH)/bin, which must be on your PATH)
go install github.com/sqlc-dev/sqlc/cmd/sqlc@latest
sqlc generate

# Build the binary
go mod tidy
go build -o slurm-ingestor ./cmd/ingest

# Run migrations
psql -h YOUR_DB_HOST -U YOUR_DB_USER -d YOUR_DB_NAME -f db/migrations/001_init.sql
psql -h YOUR_DB_HOST -U YOUR_DB_USER -d YOUR_DB_NAME -f db/migrations/002_add_gpu_fields.sql
psql -h YOUR_DB_HOST -U YOUR_DB_USER -d YOUR_DB_NAME -f db/migrations/003_add_gpu_metrics.sql

# Configure and run
cp .env.example .env
vim .env
./slurm-ingestor

Storage Planning

Each job record uses approximately 500 bytes of storage:

| Job Count | Estimated Storage |
| --- | --- |
| 100,000 | ~50 MB |
| 1 million | ~500 MB |
| 10 million | ~5 GB |
| 100 million | ~50 GB |

Recommendations for large deployments:

Use a dedicated PostgreSQL server for >1M jobs
Consider table partitioning by submit_time for >10M jobs
Plan for index overhead (roughly 30-40% of table size)
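
The figures above can be combined into a quick estimate. The sketch below multiplies the job count by the 500-byte record size and then adds index overhead. The default of 35% is an assumption, chosen as the midpoint of the 30-40% range, and `estimate_storage_mb` is a hypothetical helper.

```shell
# estimate_storage_mb JOB_COUNT [BYTES_PER_JOB] [INDEX_PERCENT]
# Print the estimated table size plus index overhead in MB
# (1 MB = 1,000,000 bytes; integer arithmetic, so results are truncated).
# Hypothetical helper for illustration; not part of the ingestor.
estimate_storage_mb() (
  job_count=$1
  bytes_per_job=${2:-500}   # per-record size from the text above
  index_pct=${3:-35}        # assumed midpoint of the 30-40% range
  table_mb=$(( job_count * bytes_per_job / 1000000 ))
  echo $(( table_mb * (100 + index_pct) / 100 ))
)

estimate_storage_mb 10000000   # 10 million jobs
```

For 10 million jobs this prints 6750: 5,000 MB of table data plus 35% for indexes.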

Troubleshooting

Common Issues

| Problem | Solution |
| --- | --- |
| Connection Refused (API mode) | Verify that the `SLURM_SERVER` URL is correct and that slurmrestd is running |
| Authentication Failed (API mode) | Check that `SLURM_API_ACCOUNT` exists and that the token is valid and has not expired |
| No Jobs Syncing | Enable `DEBUG=true` and verify `CLUSTER_NAME` matches your cluster |
| Migration Errors | "Already exists" errors usually mean the tables were created by a previous setup and are safe to ignore; investigate any other error |
| Sacct Permission Denied | Ensure the service user has permission to run `sacct` |
| Service Won't Start | Check `journalctl -u slurm-ingestor -xe` for detailed error messages |

Debugging Tips

Enable debug logging:

# Add to .env file
DEBUG=true

# Restart service
sudo systemctl restart slurm-ingestor

Watch logs in real-time:
```
journalctl -u slurm-ingestor -f
```

Test database connection:

# DATABASE_URL must be set in your shell; quote it so the "?" in the URL is not expanded
psql "$DATABASE_URL" -c "SELECT COUNT(*) FROM slurm_jobs;"

Test sacct access:

sudo -u slurm sacct --starttime now-1hour --format=JobID,JobName,State
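
When jobs appear in sacct but not in the database, a per-state count from sacct gives you something to compare against. The sketch below summarizes pipe-delimited sacct output; `count_states` is a hypothetical helper, and the sacct options in the usage comment (`-X` for allocations only, `-P` for parsable output, `-n` for no header) are standard sacct flags rather than anything specific to the ingestor.

```shell
# count_states: read "JobID|State" lines on stdin and print a count per state,
# sorted by state name. Hypothetical helper for illustration.
count_states() {
  awk -F'|' 'NF >= 2 { n[$2]++ } END { for (s in n) print s, n[s] }' | sort
}

# Usage (run as the service user):
#   sacct -X -P -n --starttime now-1hour --format=JobID,State | count_states
```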

Getting Help

If you encounter issues not covered here:

Check the GitHub Issues
Review systemd logs: journalctl -u slurm-ingestor -n 100
Verify your .env configuration matches the examples above
