Cache Troubleshooting Guide
Problem Description
The LabFusion CI/CD pipelines were experiencing cache timeout errors:
::warning::Failed to restore: getCacheEntry failed: connect ETIMEDOUT 172.31.0.3:44029
This error occurs when the cache service is not accessible from the job containers due to Docker networking issues.
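To see which address the job containers are actually trying to reach, the endpoint can be pulled out of the warning lines in a job log. The snippet below is a minimal sketch; the helper name `cache_timeout_endpoints` is hypothetical and it assumes the warning format shown above.

```shell
#!/bin/sh
# Hypothetical helper: read log text on stdin and print every unique
# host:port that appears in a "connect ETIMEDOUT <ip>:<port>" message.
cache_timeout_endpoints() {
  grep -oE 'connect ETIMEDOUT [0-9.]+:[0-9]+' | awk '{print $3}' | sort -u
}

# Example using the warning quoted above.
echo '::warning::Failed to restore: getCacheEntry failed: connect ETIMEDOUT 172.31.0.3:44029' \
  | cache_timeout_endpoints
# prints: 172.31.0.3:44029
```

If the printed port changes between runner restarts, that points at the random port assignment described under Root Cause.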
Root Cause
The issue is caused by:
- Docker Networking: Containers can't reach the cache server on the host
- Random Port Assignment: Using port 0 causes unpredictable port assignments
- Cache Service Location: The cache service binds to an IP that containers can't access
Solutions Implemented
1. Workflow-Level Fixes
Added fail-on-cache-miss: false to all cache actions in:
- .gitea/workflows/api-gateway.yml
- .gitea/workflows/frontend.yml
- .gitea/workflows/service-adapters.yml
- .gitea/workflows/api-docs.yml
- .gitea/workflows/ci.yml
This ensures that cache failures don't cause the entire pipeline to fail.
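A quick way to confirm that no workflow was missed is to scan the workflow directory for files that reference a cache action but never set the flag. This is a rough sketch under stated assumptions: the helper name is hypothetical, it assumes cache steps are declared on a `uses:` line containing the word `cache`, and it checks per file rather than per step.

```shell
#!/bin/sh
# Hypothetical helper: list workflow files in directory $1 that have a
# "uses: ...cache..." line but contain no "fail-on-cache-miss: false".
# Coarse check: it looks at whole files, not at individual steps.
workflows_missing_cache_guard() {
  dir="$1"
  for f in "$dir"/*.yml; do
    [ -f "$f" ] || continue
    if grep -q 'uses:.*cache' "$f" &&
       ! grep -Eq 'fail-on-cache-miss:[[:space:]]*false' "$f"; then
      echo "$f"
    fi
  done
}

# Run from the repository root; no output means every file has the flag.
workflows_missing_cache_guard .gitea/workflows
```

A file with several cache steps passes as soon as one of them sets the flag, so any file with more than one cache step still deserves a manual look.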
2. Runner Configuration Fixes
Updated all existing runner configuration files with:
- Auto-detect Host: Empty host field (allows act_runner to auto-detect the correct IP)
- Fixed Port: 40047 (instead of random port 0)
- Host Network: Uses host networking for better connectivity
Updated files:
- runners/config_docker.yaml
- runners/config_heavy.yaml
- runners/config_light.yaml
- runners/config_security.yaml
3. Troubleshooting Tools
Created diagnostic scripts:
- runners/fix-cache-issues.sh (Linux/macOS)
- runners/fix-cache-issues.ps1 (Windows)
These scripts help diagnose and fix cache issues.
How to Apply the Fixes
Option 1: Use the Updated Configuration
- Stop your current runner:
  pkill -f act_runner
- Start with an updated configuration:
  ./act_runner daemon --config config_docker.yaml
  # or
  ./act_runner daemon --config config_heavy.yaml
  # or
  ./act_runner daemon --config config_light.yaml
  # or
  ./act_runner daemon --config config_security.yaml
Option 2: Run the Troubleshooting Script
Linux/macOS:
cd runners
./fix-cache-issues.sh
Windows:
cd runners
.\fix-cache-issues.ps1
Option 3: Manual Configuration
Update your runner configuration with these key changes:
cache:
  enabled: true
  host: ""          # Auto-detect host IP
  port: 40047       # Fixed port
container:
  network: "host"   # Use host networking
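After editing a configuration by hand, it can help to confirm that the cache port really is fixed before restarting the runner. The following is a minimal sketch: `cache_port` and `check_cache_port` are hypothetical helpers, and the awk parsing assumes the simple two-level layout shown above rather than arbitrary YAML.

```shell
#!/bin/sh
# Hypothetical helper: print the value of "port:" found under the
# top-level "cache:" key of the config file $1.
cache_port() {
  awk '
    /^[^ \t#]/ { in_cache = ($1 == "cache:") }   # a top-level key starts here
    in_cache && $1 == "port:" { print $2; exit }
  ' "$1"
}

# Hypothetical helper: report whether the cache port is fixed;
# returns non-zero when it is missing or 0 (random assignment).
check_cache_port() {
  port=$(cache_port "$1")
  case "$port" in
    ''|0) echo "cache port is unset or 0: a random port will be used"; return 1 ;;
    *)    echo "cache port is fixed at $port" ;;
  esac
}

# Demo against a throwaway copy of the settings shown above.
sample=$(mktemp)
cat > "$sample" <<'EOF'
cache:
  enabled: true
  host: ""
  port: 40047
container:
  network: "host"
EOF
check_cache_port "$sample"   # prints: cache port is fixed at 40047
rm -f "$sample"
```

Against a real runner, the same check would be run on a file such as runners/config_docker.yaml.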
Verification
After applying the fixes:
- Check Runner Logs: Look for cache service startup messages
- Test a Workflow: Run a simple workflow to verify cache works
- Monitor Cache Hits: Check if dependencies are being cached properly
Expected Results
- ✅ No more ETIMEDOUT errors
- ✅ Cache hits show "✅ Cache hit!" messages
- ✅ Faster build times due to dependency caching
- ✅ Workflows continue even if cache fails
Troubleshooting
If issues persist:
- Check Docker Networking:
  docker network ls
  docker network inspect bridge
- Verify Cache Service (it should be listening on the fixed port 40047):
  netstat -tlnp | grep 40047
- Test Connectivity:
  curl http://host.docker.internal:40047/
- Check Runner Logs:
  tail -f runner.log
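The listener check can also be scripted so that it returns an exit status instead of needing a visual scan. This is a sketch only: `port_is_listening` is a hypothetical helper that reads `netstat -tln` or `ss -tln` style output on stdin, and it assumes the fixed port 40047 from the runner configuration.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the listener table on stdin
# (output of `netstat -tln` or `ss -tln`) has a LISTEN entry whose
# address ends in ":<port>", where <port> is $1.
port_is_listening() {
  awk -v port="$1" '
    /LISTEN/ {
      for (i = 1; i <= NF; i++)
        if ($i ~ (":" port "$")) found = 1
    }
    END { exit !found }
  '
}

# Demo with one canned netstat line; on a runner host you would pipe
# real output instead, e.g.:  netstat -tln | port_is_listening 40047
printf 'tcp  0  0 0.0.0.0:40047  0.0.0.0:*  LISTEN\n' \
  | port_is_listening 40047 && echo "cache port is listening"
# prints: cache port is listening
```

A failing check after a runner restart suggests the cache service did not start on the fixed port, which is worth confirming in the runner logs.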
Support
If you continue to experience cache issues after applying these fixes, please:
- Run the troubleshooting script and share the output
- Check the runner logs for any error messages
- Verify your Docker and network configuration