Cache Troubleshooting Guide
Problem Description
The LabFusion CI/CD pipelines were experiencing cache timeout errors:
::warning::Failed to restore: getCacheEntry failed: connect ETIMEDOUT 172.31.0.3:44029
This error occurs when the cache service is not accessible from the job containers due to Docker networking issues.
Root Cause
The issue is caused by:
- Docker Networking: Containers can't reach the cache server on the host
- Random Port Assignment: Setting the cache port to 0 makes the runner pick a random available port on each start, so the port cannot be relied on
- Cache Service Location: The cache service binds to an IP that containers can't access
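When reading job logs, it helps to see which address the jobs were actually told to use for the cache. The sketch below pulls the host:port pair out of a warning line in the format shown above. The function name `extract_cache_endpoint` is hypothetical and not part of the repository scripts.

```shell
# Hypothetical helper: print the host:port from a cache ETIMEDOUT warning line.
# Prints nothing if the line does not contain such a warning.
extract_cache_endpoint() {
    printf '%s\n' "$1" |
        sed -n 's/.*connect ETIMEDOUT \([0-9.]*:[0-9]*\).*/\1/p'
}
```

Called with the warning from the problem description, it prints `172.31.0.3:44029`. An address in a Docker bridge range combined with a port that changes between runner restarts would be consistent with the causes listed above.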
Solutions Implemented
1. Workflow-Level Fixes
Added fail-on-cache-miss: false to all cache actions in:
- .gitea/workflows/api-gateway.yml
- .gitea/workflows/frontend.yml
- .gitea/workflows/service-adapters.yml
- .gitea/workflows/api-docs.yml
- .gitea/workflows/ci.yml
This ensures that cache failures don't cause the entire pipeline to fail.
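To confirm that no workflow was missed, a coarse check along these lines may help. It assumes that cache steps are referenced as `actions/cache` in a `uses:` line, which this guide does not state explicitly, and it checks whole files rather than individual steps. The function name is hypothetical.

```shell
# Hypothetical helper: list workflow files that use actions/cache but never
# mention fail-on-cache-miss. The directory defaults to .gitea/workflows.
workflows_missing_cache_guard() {
    dir="${1:-.gitea/workflows}"
    for f in "$dir"/*.yml; do
        [ -f "$f" ] || continue
        if grep -q 'uses:.*actions/cache' "$f" &&
           ! grep -q 'fail-on-cache-miss:' "$f"; then
            printf '%s\n' "$f"
        fi
    done
}
```

Run from the repository root with no arguments, it prints one path per workflow that still needs the setting and prints nothing when every file is covered. A file with several cache steps where only one sets the option would not be flagged, so treat an empty result as a hint rather than proof.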
2. Runner Configuration Fixes
Created runners/config_cache_fixed.yaml with:
- Fixed Host: host.docker.internal (allows containers to access the host)
- Fixed Port: 44029 (instead of random port 0)
- Host Network: Job containers use host networking, so they share the host's network stack and can reach the cache port directly
3. Troubleshooting Tools
Created diagnostic scripts:
- runners/fix-cache-issues.sh (Linux/macOS)
- runners/fix-cache-issues.ps1 (Windows)
These scripts help diagnose and fix cache issues.
How to Apply the Fixes
Option 1: Use the Fixed Configuration
- Stop your current runner:
  pkill -f act_runner
- Start with the fixed configuration:
  ./act_runner daemon --config config_cache_fixed.yaml
Option 2: Run the Troubleshooting Script
Linux/macOS:
cd runners
./fix-cache-issues.sh
Windows:
cd runners
.\fix-cache-issues.ps1
Option 3: Manual Configuration
Update your runner configuration with these key changes:
cache:
  enabled: true
  host: "host.docker.internal" # Fixed host
  port: 44029 # Fixed port
container:
  network: "host" # Use host networking
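Before restarting the runner, it can be worth checking that the configuration file really carries a fixed, non-zero cache port. The following is a minimal sketch that assumes the file is indented with spaces and that `port:` sits directly under a top-level `cache:` key with an unquoted value. Both function names are hypothetical.

```shell
# Hypothetical helper: print the port value found under the top-level "cache:" key.
cache_port_from_config() {
    awk '
        /^[^ #]/ { in_cache = ($1 == "cache:") }  # any top-level key opens or closes the block
        in_cache && $1 == "port:" { print $2; exit }
    ' "$1"
}

# Succeeds only if the cache port is a plain non-zero number.
has_fixed_cache_port() {
    port=$(cache_port_from_config "$1")
    case "$port" in
        ''|0|*[!0-9]*) return 1 ;;
        *) return 0 ;;
    esac
}
```

For example, `has_fixed_cache_port runners/config_cache_fixed.yaml && echo "port is fixed"` prints the message only when a fixed port is set. This is a line-based check, not a YAML parser, so an unusual layout such as a quoted port value will be reported as not fixed.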
Verification
After applying the fixes:
- Check Runner Logs: Look for cache service startup messages
- Test a Workflow: Run a simple workflow to verify cache works
- Monitor Cache Hits: Check if dependencies are being cached properly
Expected Results
- ✅ No more ETIMEDOUT errors
- ✅ Cache hits show "✅ Cache hit!" messages
- ✅ Faster build times due to dependency caching
- ✅ Workflows continue even if cache fails
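The first two expectations can be checked against a saved job log. The sketch below counts lines containing `Cache hit` and lines containing `ETIMEDOUT`. It assumes the job output has been saved to a file, and the function name is hypothetical.

```shell
# Hypothetical helper: count cache hits and cache timeouts in a saved job log.
cache_log_summary() {
    # grep -c exits non-zero when the count is 0, so neutralise that status.
    hits=$(grep -c 'Cache hit' "$1" || true)
    timeouts=$(grep -c 'ETIMEDOUT' "$1" || true)
    printf 'hits=%s timeouts=%s\n' "$hits" "$timeouts"
}
```

A healthy run should report at least one hit and `timeouts=0`. Note that the first run after a cache key changes will legitimately show no hits.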
Troubleshooting
If issues persist:
- Check Docker Networking:
  docker network ls
  docker network inspect bridge
- Verify Cache Service:
  netstat -tlnp | grep 44029
- Test Connectivity:
  curl http://host.docker.internal:44029/
- Check Runner Logs:
  tail -f runner.log
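When the connectivity test fails, curl's exit code narrows down the cause. The sketch below wraps the same curl call with a short timeout and maps the documented curl exit codes 0, 6, 7 and 28 to a likely explanation. The function names and the wording of the messages are hypothetical.

```shell
# Hypothetical helper: translate a curl exit code into a likely cause.
explain_curl_exit() {
    case "$1" in
        0)  echo "reachable: the cache server answered" ;;
        6)  echo "dns: the host name did not resolve" ;;
        7)  echo "connect: connection failed, often because nothing is listening" ;;
        28) echo "timeout: no answer in time, packets may be dropped or misrouted" ;;
        *)  echo "other: curl exit code $1" ;;
    esac
}

# Probe host $1 on port $2 with a 5 second limit and explain the result.
probe_cache() {
    code=0
    curl --silent --output /dev/null --max-time 5 "http://$1:$2/" || code=$?
    explain_curl_exit "$code"
}
```

For example, `probe_cache host.docker.internal 44029` run from the environment where the jobs execute reports whether the failure is name resolution, a refused connection, or a timeout like the original ETIMEDOUT error. An exit code of 0 only shows that something answered over HTTP, not that the cache API itself is healthy.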
Additional Resources
Support
If you continue to experience cache issues after applying these fixes, please:
- Run the troubleshooting script and share the output
- Check the runner logs for any error messages
- Verify your Docker and network configuration