Troubleshooting

Common issues and solutions when running Stategraph.

Deployment Issues

Container won't start

Symptoms: Server container exits immediately or keeps restarting.

Check logs:

docker compose logs server

Missing required environment variables

Error: Key_error "STATEGRAPH_UI_BASE"

Solution:
- Set all required environment variables (see Environment Variables)
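
To find out which variable is missing before starting the container, you can check the values in the shell you launch it from. The helper below is a minimal sketch and not part of Stategraph; the function name check_required_vars is made up for illustration.

```shell
# Print the name of every listed environment variable that is unset or empty.
# Returns 0 when all are set, 1 otherwise.
# Hypothetical helper for illustration; not shipped with Stategraph.
check_required_vars() {
  crv_status=0
  for crv_name in "$@"; do
    # printenv only sees exported variables.
    if [ -z "$(printenv "$crv_name")" ]; then
      echo "missing: $crv_name"
      crv_status=1
    fi
  done
  return "$crv_status"
}

# Example, using only names mentioned on this page (the full required
# set may be longer; see Environment Variables):
#   check_required_vars STATEGRAPH_UI_BASE DB_HOST DB_PORT DB_USER DB_PASS DB_NAME
```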

Database connection failed

Error: Could not connect to database

Solution:
- Verify database is running and credentials are correct

Port already in use

Error: bind: address already in use

Solution:
- Stop the service using the port or change STATEGRAPH_PORT

Database connection errors

Symptoms: Server starts but can't connect to PostgreSQL.

Checklist:

  • PostgreSQL container is healthy: docker compose ps
  • DB_HOST matches the service name (e.g., db for Docker Compose)
  • DB_PORT is correct (default: 5432)
  • DB_USER, DB_PASS, DB_NAME match PostgreSQL configuration
  • Network connectivity between containers

Test connection:

docker compose exec server nc -zv db 5432

Health check failing

Symptoms: Container marked unhealthy, restarts repeatedly.

Check health endpoint:

curl http://localhost:8080/api/v1/health

Common causes:
- Database not ready yet (make the server wait with a depends_on condition of service_healthy, or raise the health check start_period or retries)
- Port mismatch between health check and actual port
- Internal service not starting
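
When the database is slow to come up, a single curl can fail even though the server becomes healthy a few seconds later. The sketch below polls the health endpoint a few times before giving up. The retry function is a generic helper written for this example, not a Stategraph or Docker command, and the attempt count and delay are arbitrary.

```shell
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Run COMMAND until it succeeds or ATTEMPTS runs have failed.
# Returns 0 on the first success, 1 if every attempt fails.
retry() {
  retry_attempts=$1
  retry_delay=$2
  shift 2
  retry_i=1
  while [ "$retry_i" -le "$retry_attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Sleep between attempts, but not after the last one.
    if [ "$retry_i" -lt "$retry_attempts" ]; then
      sleep "$retry_delay"
    fi
    retry_i=$((retry_i + 1))
  done
  return 1
}

# Example (port 8080 is the one used on this page):
#   retry 10 3 curl -fsS http://localhost:8080/api/v1/health
```

If the endpoint only answers after several attempts, the health check is probably firing before the server is ready rather than hitting a real failure.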

Authentication Issues

OAuth redirect errors

"redirect_uri_mismatch"

The callback URL doesn't match your OAuth provider configuration.

Solution:
1. Check STATEGRAPH_UI_BASE matches your access URL exactly
2. Add exact callback URL to OAuth provider:
- For Google: {STATEGRAPH_UI_BASE}/oauth2/google/callback
- For OIDC: {STATEGRAPH_UI_BASE}/oauth2/oidc/callback
3. Verify protocol (http vs https) matches

"invalid_client"

Client ID or secret is incorrect.

Solution:
- Verify credentials from OAuth provider dashboard
- Check for extra whitespace or newlines
- Regenerate secret if needed

Session not persisting

Symptoms: Login succeeds but immediately redirects back to login.

Causes:
1. URL mismatch: STATEGRAPH_UI_BASE differs from access URL
2. Cookie not set: Proxy stripping cookies, or SameSite issues
3. HTTPS mismatch: Accessing via http when configured for https

Solutions:
- Verify STATEGRAPH_UI_BASE exactly matches your browser URL
- Check for reverse proxy cookie handling
- Use consistent protocol
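
A URL or protocol mismatch is easy to miss by eye. The sketch below compares the scheme, host, and port of two URLs. Both function names are hypothetical, the comparison is deliberately simple (it is case-sensitive and treats an explicit default port as different), and it is not necessarily how Stategraph itself compares URLs.

```shell
# Print the scheme://host[:port] part of a URL, dropping path, query, and fragment.
url_origin() {
  printf '%s\n' "$1" | sed -E 's|^([A-Za-z][A-Za-z0-9+.-]*://[^/?#]+).*$|\1|'
}

# Return 0 when both URLs share the same scheme, host, and port.
same_origin() {
  [ "$(url_origin "$1")" = "$(url_origin "$2")" ]
}

# Example (stategraph.example.com is a placeholder for your browser URL):
#   same_origin "$STATEGRAPH_UI_BASE" "https://stategraph.example.com/login" \
#     || echo "STATEGRAPH_UI_BASE does not match the access URL"
```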

"Access denied" after authentication

Causes:
1. Email domain restriction: User email not in allowed domain
2. Google Groups: User not in required group
3. OAuth app not approved in organization

Solutions:
- Check STATEGRAPH_OAUTH_EMAIL_DOMAIN setting
- Verify group membership (Google Groups)
- Request app approval from organization admin

Terraform Backend Issues

"Failed to get existing workspaces"

Symptoms: terraform init fails with HTTP error.

Causes:
1. Stategraph server not running
2. URL incorrect
3. Network connectivity
4. Authentication failure

Solutions:

# Test connectivity
curl http://localhost:8080/api/v1/health

# Test with credentials
curl -H "Authorization: Bearer $STATEGRAPH_API_KEY" http://localhost:8080/api/v1/whoami
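
To narrow down which of the four causes applies, it can help to look at the HTTP status code alone. The sketch below maps a status code to a likely cause. The explain_status function and its messages are written for this example; only the whoami URL and the STATEGRAPH_API_KEY variable come from the commands above.

```shell
# Map an HTTP status code to the most likely cause.
# 000 is what curl's %{http_code} reports when no HTTP response was received.
explain_status() {
  case $1 in
    000) echo "no response: server not running, wrong URL, or network problem" ;;
    2??) echo "ok" ;;
    401|403) echo "authentication failure: check the API key" ;;
    404) echo "not found: check the URL path" ;;
    5??) echo "server error: check the server logs" ;;
    *)   echo "unexpected status $1" ;;
  esac
}

# Example:
#   code=$(curl -s -o /dev/null -w '%{http_code}' \
#     -H "Authorization: Bearer $STATEGRAPH_API_KEY" \
#     http://localhost:8080/api/v1/whoami)
#   explain_status "$code"
```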

"Error acquiring the state lock"

Symptoms: Terraform can't acquire the state lock, usually because another process is holding it.

Solutions:
1. Wait for the other operation to complete
2. If the lock is stuck, force unlock (LOCK_ID is the ID shown in the lock error message):
   terraform force-unlock LOCK_ID
3. Check for crashed Terraform processes

"HTTP error: 401 Unauthorized"

Causes:
1. API key invalid or expired
2. Username not set to "session"
3. Token has leading/trailing whitespace

Solutions:
- Create a new API key
- Verify username = "session" in backend config
- Check token value for whitespace
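
Stray whitespace in a token is invisible when printed. The sketch below tests a value for leading or trailing whitespace. The function name has_edge_whitespace is made up for this example.

```shell
# Return 0 when the value starts or ends with whitespace (space, tab, newline).
has_edge_whitespace() {
  case $1 in
    [[:space:]]*|*[[:space:]]) return 0 ;;
    *) return 1 ;;
  esac
}

# Example:
#   has_edge_whitespace "$STATEGRAPH_API_KEY" && echo "token has stray whitespace"
```

Note that reading a token with $(cat file) strips trailing newlines, so test the value in the form the client actually receives it.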

Large state timeout

Symptoms: Operations time out or fail when the state file is large.

Solutions:
1. Increase STATEGRAPH_CLIENT_MAX_BODY_SIZE (default: 512m)
2. Check reverse proxy timeouts
3. Consider splitting state into smaller files
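
To see whether a state is close to the body size limit, you can compare its size in bytes against the configured value. The sketch below assumes the limit uses k/m/g suffixes as binary multiples, which is a guess based on the 512m default and is not confirmed by this page. Both function names are hypothetical.

```shell
# Convert a size such as 512m, 64k, or 1g to bytes (binary multiples).
# Assumption: k/m/g suffix convention; a bare number is taken as bytes.
size_to_bytes() {
  stb_num=${1%[kKmMgG]}
  case $1 in
    *[kK]) echo $((stb_num * 1024)) ;;
    *[mM]) echo $((stb_num * 1024 * 1024)) ;;
    *[gG]) echo $((stb_num * 1024 * 1024 * 1024)) ;;
    *)     echo "$stb_num" ;;
  esac
}

# Return 0 when a state of STATE_BYTES ($1) fits under LIMIT ($2, e.g. 512m).
state_fits() {
  [ "$1" -le "$(size_to_bytes "$2")" ]
}

# Example:
#   state_fits "$(terraform state pull | wc -c | tr -d ' ')" 512m \
#     || echo "state exceeds the configured body size"
```

The pulled state size is only an approximation of the request body, so treat a value near the limit as a warning rather than an exact threshold.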

UI Issues

Page won't load

Symptoms: Browser shows blank page or error.

Check:
1. Browser developer console for JavaScript errors
2. Network tab for failed requests
3. Server logs for backend errors

Solutions:
- Clear browser cache
- Try incognito/private mode
- Check CORS settings if UI is separate

Query returns no results

Symptoms: MQL query runs but returns no rows.

Causes:
1. No matching data
2. Query syntax issue
3. Wrong table or column names

Debug steps:

-- Verify data exists
SELECT count(*) FROM instances

-- Check available types
SELECT DISTINCT r.type FROM resources r ORDER BY r.type

-- Simplify query
SELECT * FROM instances LIMIT 10

Graph won't render

Symptoms: Dependency graph is blank or shows an error.

Causes:
1. State has no resources
2. Very large state causing performance issues
3. Browser memory limitations

Solutions:
- Check state has resources in the list view
- Apply filters to reduce graph size
- Try a different browser

Performance Issues

Slow queries

Symptoms: MQL queries take a long time to execute.

Solutions:
1. Add a LIMIT clause:
   SELECT * FROM instances LIMIT 100
2. Use specific columns instead of *
3. Add filters early in query
4. Check database indexes

High memory usage

Causes:
1. Large state files
2. Many concurrent connections
3. Memory leak (report as bug)

Solutions:
- Increase container memory limits
- Reduce DB_MAX_POOL_SIZE
- Monitor and restart periodically if needed

Database connection exhaustion

Symptoms: "too many connections" errors.

Solutions:
1. Reduce DB_MAX_POOL_SIZE
2. Increase PostgreSQL max_connections
3. Check for connection leaks
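
Whether the pool size is the problem comes down to simple arithmetic: the number of server instances times DB_MAX_POOL_SIZE has to fit within max_connections, minus whatever PostgreSQL reserves for superusers and other clients. The sketch below does that check. The pool_fits function is written for this example, and the numbers in the usage line are illustrative only.

```shell
# pool_fits INSTANCES POOL_SIZE MAX_CONNECTIONS RESERVED
# Print the computed totals, then return 0 when INSTANCES * POOL_SIZE
# fits within MAX_CONNECTIONS - RESERVED, and 1 otherwise.
pool_fits() {
  pf_needed=$(( $1 * $2 ))
  pf_available=$(( $3 - $4 ))
  echo "needed=$pf_needed available=$pf_available"
  [ "$pf_needed" -le "$pf_available" ]
}

# Example: 3 instances with a pool of 50 against max_connections = 100
# with 3 reserved connections does not fit:
#   pool_fits 3 50 100 3
```

The current PostgreSQL limit can be read with SHOW max_connections; in a psql session.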

Gap Analysis Issues

"Not ready for gap analysis"

Symptoms: Gap analysis reports not ready.

Causes:
1. AWS Config not enabled
2. Aggregator not configured
3. Missing IAM permissions

Solutions:
- Enable AWS Config in your account
- Create a configuration aggregator
- Grant Stategraph required permissions

AWS resources not appearing

Causes:
1. AWS Config not recording resource types
2. Aggregator missing regions
3. Stale cache

Solutions:
- Verify AWS Config recording settings
- Check aggregator configuration
- Use source=no-cache to force refresh

Getting Help

Collect diagnostic information

Before reporting issues, gather:

  1. Server logs:
    docker compose logs server > server.log 2>&1

  2. Environment (redact secrets):
    docker compose config

  3. Version information:
    docker compose images

  4. Health check output:
    curl http://localhost:8080/api/v1/health
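
Redacting secrets by hand is error-prone, so a small filter can help. The sketch below masks the value of any setting whose name contains PASS, SECRET, KEY, or TOKEN in uppercase. The redact_secrets function and its name patterns are assumptions made for this example, so review the output before sharing it.

```shell
# Read text on stdin and replace the value of any NAME: value or NAME=value
# line whose NAME contains PASS, SECRET, KEY, or TOKEN with REDACTED.
# The name patterns are a guess; extend them to match your configuration.
redact_secrets() {
  sed -E 's/^([[:space:]]*[A-Za-z0-9_]*(PASS|SECRET|KEY|TOKEN)[A-Za-z0-9_]*[:=][[:space:]]*).*/\1REDACTED/'
}

# Example:
#   docker compose config | redact_secrets > config.redacted.yml
```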

Reporting issues

Report issues at: https://github.com/stategraph/releases/issues

Include:
- Description of the problem
- Steps to reproduce
- Expected vs actual behavior
- Diagnostic information (above)
- Screenshots if applicable

Common Error Messages

Error                 | Cause                        | Solution
----------------------|------------------------------|--------------------------
Key_error "..."       | Missing environment variable | Set the required variable
Connection refused    | Service not running          | Start the service
401 Unauthorized      | Invalid credentials          | Check token/session
redirect_uri_mismatch | OAuth URL mismatch           | Fix callback URL
Lock held by another  | Concurrent Terraform         | Wait or force-unlock
too many connections  | Pool exhausted               | Reduce pool size