
GSRN e3800b49b8 docs: Add comprehensive cache troubleshooting guide

- Document the root cause of cache timeout errors
- Explain all implemented solutions
- Provide step-by-step fix instructions
- Include verification and troubleshooting steps
- Add support resources and additional help

2025-09-15 16:41:11 +02:00

3.5 KiB


Cache Troubleshooting Guide

Problem Description

The LabFusion CI/CD pipelines were experiencing cache timeout errors:

```
::warning::Failed to restore: getCacheEntry failed: connect ETIMEDOUT 172.31.0.3:44029
```

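The job container cannot open a TCP connection to the runner's cache server at the address it was given (here 172.31.0.3:44029), so the cache restore times out. The cause is Docker networking, not the cached data itself.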
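
To see which address a job is actually told to use, you can inspect the cache URL from inside a job step. The sketch below assumes the runner exposes it as `ACTIONS_CACHE_URL`, the variable act-based runners conventionally set (verify on your version). It splits that URL into host and port with POSIX parameter expansion. `cache_url_host` and `cache_url_port` are hypothetical helper names.

```sh
#!/bin/sh
# Hypothetical helpers: split a cache URL such as "http://172.31.0.3:44029/"
# into host and port using only POSIX parameter expansion.
# Assumes the URL contains an explicit port (no IPv6 literals).

cache_url_host() {
  rest=${1#*://}              # drop the scheme: "172.31.0.3:44029/"
  rest=${rest%%/*}            # drop any path:   "172.31.0.3:44029"
  printf '%s\n' "${rest%:*}"  # drop ":port":    "172.31.0.3"
}

cache_url_port() {
  rest=${1#*://}
  rest=${rest%%/*}
  printf '%s\n' "${rest##*:}" # keep what follows the last ":": "44029"
}
```

In a job step, `cache_url_host "$ACTIONS_CACHE_URL"` printing an address the container cannot reach (such as 172.31.0.3 above) confirms this failure mode.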

Root Cause

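The issue has three contributing causes:

- Docker networking: job containers can't reach the cache server running on the runner host.
- Random port assignment: with `port: 0` the runner picks a random free port when it starts, so the port can't be allowed through a firewall or tested in advance.
- Cache service address: the address the runner hands to job containers for the cache server (172.31.0.3 in the error above) isn't reachable from inside those containers.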
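
For reference, act_runner's generated example configuration (`act_runner generate-config`) leaves both settings on automatic, roughly as below. The comments paraphrase that file's documentation, and details may differ between versions.

```yaml
cache:
  enabled: true
  # Not a listen address: this is the address job containers are told to
  # connect to. Empty means "detect automatically", which presumably
  # produced 172.31.0.3 in the error above.
  host: ""
  # 0 means "use a random available port".
  port: 0
```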

Solutions Implemented

1. Workflow-Level Fixes

Added `fail-on-cache-miss: false` to all cache actions in:

- `.gitea/workflows/api-gateway.yml`
- `.gitea/workflows/frontend.yml`
- `.gitea/workflows/service-adapters.yml`
- `.gitea/workflows/api-docs.yml`
- `.gitea/workflows/ci.yml`

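This makes explicit that a cache miss must not fail the pipeline. `fail-on-cache-miss` already defaults to `false` in `actions/cache`, and restore errors like the ETIMEDOUT above are reported as warnings rather than failures. The setting therefore documents intent; the runner fixes below address the timeout itself.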
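
A cache step with this setting might look like the sketch below. The step name, `path`, `key` and `restore-keys` values are hypothetical placeholders for a Node.js project. Only `fail-on-cache-miss` comes from the fix described above.

```yaml
- name: Cache npm dependencies            # hypothetical step
  uses: actions/cache@v4
  with:
    path: ~/.npm                          # placeholder cache directory
    key: ${{ runner.os }}-npm-${{ hashFiles('**/package-lock.json') }}
    restore-keys: |
      ${{ runner.os }}-npm-
    fail-on-cache-miss: false             # a cache miss must not fail the job
```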

2. Runner Configuration Fixes

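Created `runners/config_cache_fixed.yaml` with:

- Fixed host: `host.docker.internal`, a hostname that points containers at the Docker host (Docker Desktop defines it; plain Docker Engine on Linux does not unless it is mapped explicitly).
- Fixed port: `44029` instead of `0` (random).
- Host network: job containers use host networking, so they share the host's network stack instead of an isolated bridge network.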

3. Troubleshooting Tools

Created diagnostic scripts:

- `runners/fix-cache-issues.sh` (Linux/macOS)
- `runners/fix-cache-issues.ps1` (Windows)

These scripts help diagnose and fix cache issues.

How to Apply the Fixes

Option 1: Use the Fixed Configuration

Stop your current runner:
```
pkill -f act_runner
```

Start it again with the fixed configuration (run from the `runners` directory, where the file lives):
```
./act_runner daemon --config config_cache_fixed.yaml
```

Option 2: Run the Troubleshooting Script

Linux/macOS:

```
cd runners
./fix-cache-issues.sh
```

Windows:

```
cd runners
.\fix-cache-issues.ps1
```

Option 3: Manual Configuration

Update your runner configuration with these key changes:

```
cache:
  enabled: true
  host: "host.docker.internal"  # address job containers use to reach the cache server
  port: 44029                   # fixed port instead of 0 (random)

container:
  network: "host"               # job containers use host networking
```
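
Docker Desktop (Windows, macOS) provides `host.docker.internal`, but plain Docker Engine on Linux does not define it inside containers. One option on Linux, assuming Docker 20.10 or later, is to pass an `--add-host` mapping through act_runner's `container.options`. Merge this into the existing `container:` section rather than adding a second one.

```yaml
container:
  # Map host.docker.internal to the host's gateway address inside job
  # containers ("host-gateway" is a special value understood by Docker 20.10+).
  options: "--add-host=host.docker.internal:host-gateway"
```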

Verification

After applying the fixes:

- Check runner logs: look for cache server startup messages.
- Test a workflow: run a simple workflow that uses the cache twice; the second run should restore it.
- Monitor cache hits: confirm that later runs restore dependencies instead of downloading them again.
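
A minimal workflow for the "Test a Workflow" check could look like the sketch below. The file name, trigger, `runs-on` label and cache key are hypothetical placeholders to adapt to your setup. It also assumes the runner exposes `ACTIONS_CACHE_URL` to steps. The first run should miss and save the entry; a second run should report `cache-hit=true`.

```yaml
# .gitea/workflows/cache-check.yml (hypothetical file name)
name: cache-check
on: workflow_dispatch            # placeholder trigger; run it manually twice

jobs:
  cache-check:
    runs-on: ubuntu-latest       # replace with one of your runner's labels
    steps:
      - name: Show the cache URL handed to this job
        run: echo "ACTIONS_CACHE_URL=${ACTIONS_CACHE_URL:-<unset>}"

      - name: Restore (or later save) a test cache entry
        id: probe
        uses: actions/cache@v4
        with:
          path: cache-probe
          key: cache-probe-v1    # static key so the second run can hit

      - name: Create the cached file on a miss
        if: steps.probe.outputs.cache-hit != 'true'
        run: mkdir -p cache-probe && date > cache-probe/stamp.txt

      - name: Report the result
        run: echo "cache-hit=${{ steps.probe.outputs.cache-hit }}"
```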

Expected Results

✅ No more ETIMEDOUT errors
✅ Cache hits show "✅ Cache hit!" messages
✅ Faster build times due to dependency caching
✅ Workflows continue even if cache fails

Troubleshooting

If issues persist:

Check Docker Networking:

```
docker network ls
docker network inspect bridge
```

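Verify that the cache server is listening on the fixed port (run on the runner host; `ss -tlnp` is the modern equivalent where `netstat` isn't installed):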
```
netstat -tlnp | grep 44029
```

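Test connectivity from inside a container, since that is where the cache client runs. Any HTTP response, even an error status, means the port is reachable; a timeout reproduces the problem (`host-gateway` requires Docker 20.10 or later):
```
docker run --rm --add-host=host.docker.internal:host-gateway curlimages/curl -sv --connect-timeout 5 http://host.docker.internal:44029/
```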

Check the runner logs (adjust the file name to wherever your runner writes its log):
```
tail -f runner.log
```
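
When a connectivity check fails, curl's exit status tells you which failure you have. The sketch below maps curl's documented exit codes (0 success, 6 could not resolve host, 7 failed to connect, 28 timeout) to a short diagnosis. `describe_curl_exit` and `probe_cache` are hypothetical helper names, not part of the repository's scripts.

```sh
#!/bin/sh
# Hypothetical helper: translate a curl exit status into a cache diagnosis.
describe_curl_exit() {
  case "$1" in
    0)  echo "reachable: something answered HTTP on that port" ;;
    6)  echo "cannot resolve host: map the hostname or use an IP address" ;;
    7)  echo "connection failed: check that the cache server listens on that port" ;;
    28) echo "timed out: the address is unreachable from here (ETIMEDOUT)" ;;
    *)  echo "curl failed with exit code $1" ;;
  esac
}

# Probe a cache URL once with a short connect timeout.
# No -f flag, so any HTTP response (even 404) counts as reachable.
probe_cache() {
  curl -s -o /dev/null --connect-timeout 5 "$1"
  describe_curl_exit "$?"
}
```

For example, from a job step or any container with `curl`: `probe_cache "http://host.docker.internal:44029/"`.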

Additional Resources

Support

If you continue to experience cache issues after applying these fixes, please:

- Run the troubleshooting script and share the output
- Check the runner logs for any error messages
- Verify your Docker and network configuration
