docs: Add comprehensive cache troubleshooting guide

- Document the root cause of cache timeout errors
- Explain all implemented solutions
- Provide step-by-step fix instructions
- Include verification and troubleshooting steps
- Add support resources and additional help
140
docs/CACHE_TROUBLESHOOTING.md
Normal file
@@ -0,0 +1,140 @@
# Cache Troubleshooting Guide

## Problem Description

The LabFusion CI/CD pipelines were experiencing cache timeout errors:
```
::warning::Failed to restore: getCacheEntry failed: connect ETIMEDOUT 172.31.0.3:44029
```

This error occurs when the cache service is not accessible from the job containers due to Docker networking issues.
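
When triaging a failed job, it helps to know exactly which address the container tried to reach. The following is a minimal sketch, not part of the repository's scripts: `parse_cache_endpoint` is a hypothetical helper that pulls the `host:port` pair out of a warning line shaped like the one above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the "host:port" that a cache warning tried to reach.
# Prints nothing when the line contains no ETIMEDOUT endpoint.
parse_cache_endpoint() {
  printf '%s\n' "$1" | sed -n 's/.*ETIMEDOUT \([0-9.]*:[0-9]*\).*/\1/p'
}

# Example with the warning from this guide; prints 172.31.0.3:44029
parse_cache_endpoint '::warning::Failed to restore: getCacheEntry failed: connect ETIMEDOUT 172.31.0.3:44029'
```

The extracted address can then be compared with the host and port in the runner configuration to see whether the runner advertised an endpoint that containers cannot reach.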

## Root Cause

The issue is caused by:
1. **Docker Networking**: Containers can't reach the cache server on the host
2. **Random Port Assignment**: Using port 0 causes unpredictable port assignments
3. **Cache Service Location**: The cache service binds to an IP that containers can't access

## Solutions Implemented

### 1. Workflow-Level Fixes

Added `fail-on-cache-miss: false` to all cache actions in:
- `.gitea/workflows/api-gateway.yml`
- `.gitea/workflows/frontend.yml`
- `.gitea/workflows/service-adapters.yml`
- `.gitea/workflows/api-docs.yml`
- `.gitea/workflows/ci.yml`

This ensures that cache failures don't cause the entire pipeline to fail.
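
To confirm that no workflow was missed, a quick scan can list files that use a cache step without the setting. This is a rough sketch under two assumptions: the cache steps reference `actions/cache` (the guide does not name the action), and a file-level check is good enough, so a file with several cache steps passes as soon as one of them sets the option. `find_unguarded_cache_workflows` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Sketch: list workflow files that use a cache action but never set
# fail-on-cache-miss. The directory argument defaults to .gitea/workflows.
find_unguarded_cache_workflows() {
  local dir="${1:-.gitea/workflows}"
  local file
  for file in "$dir"/*.yml; do
    [ -e "$file" ] || continue                           # glob matched nothing
    grep -q 'uses:.*actions/cache' "$file" || continue   # no cache step (assumed action name)
    grep -q 'fail-on-cache-miss:' "$file" || printf '%s\n' "$file"
  done
}

# Run from the repository root; prints nothing when every file is covered.
find_unguarded_cache_workflows "$@"
```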

### 2. Runner Configuration Fixes

Created `runners/config_cache_fixed.yaml` with:
- **Fixed Host**: `host.docker.internal` (allows containers to access host)
- **Fixed Port**: `44029` (instead of random port 0)
- **Host Network**: Uses host networking for better connectivity

### 3. Troubleshooting Tools

Created diagnostic scripts:
- `runners/fix-cache-issues.sh` (Linux/macOS)
- `runners/fix-cache-issues.ps1` (Windows)

These scripts help diagnose and fix cache issues.

## How to Apply the Fixes

### Option 1: Use the Fixed Configuration

1. Stop your current runner:
   ```bash
   pkill -f act_runner
   ```

2. Start it again with the fixed configuration (the file lives in `runners/`, so adjust the path if you run the command from another directory):
   ```bash
   ./act_runner daemon --config config_cache_fixed.yaml
   ```

### Option 2: Run the Troubleshooting Script

**Linux/macOS:**
```bash
cd runners
./fix-cache-issues.sh
```

**Windows:**
```powershell
cd runners
.\fix-cache-issues.ps1
```

### Option 3: Manual Configuration

Update your runner configuration with these key changes:

```yaml
cache:
  enabled: true
  host: "host.docker.internal"  # Fixed host
  port: 44029                   # Fixed port

container:
  network: "host"               # Use host networking
```
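
After a manual edit, it is easy to leave one of the old values behind. The sketch below flags the two settings this guide treats as problematic. It is a plain `grep` check rather than a real YAML parser, so it assumes one setting per line as in the snippet above; `check_cache_config` is a hypothetical helper, not one of the repository's scripts.

```bash
#!/usr/bin/env bash
# Sketch: warn when a runner config still has a random cache port or no cache host.
# Returns 0 when both settings look fixed, 1 otherwise.
check_cache_config() {
  local config="$1" problems=0
  # "port: 0" (optionally followed by a comment) means a random port.
  if grep -Eq '^[[:space:]]*port:[[:space:]]*0[[:space:]]*(#.*)?$' "$config"; then
    echo "cache port is 0 (random); set a fixed port such as 44029"
    problems=1
  fi
  # A usable host line has a non-empty value after "host:".
  if ! grep -Eq '^[[:space:]]*host:[[:space:]]*"?[^"[:space:]#]' "$config"; then
    echo "cache host is empty or missing; set it to an address containers can reach"
    problems=1
  fi
  return "$problems"
}

# Usage (path is an example): check_cache_config runners/config_cache_fixed.yaml
```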

## Verification

After applying the fixes:

1. **Check Runner Logs**: Look for cache service startup messages
2. **Test a Workflow**: Run a simple workflow to verify cache works
3. **Monitor Cache Hits**: Check if dependencies are being cached properly
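
The log checks above can be reduced to two counts: how many timeouts remain and how many cache hits were reported. The following is a small sketch that assumes the job or runner output has been saved to a file and that hits are logged with the words "Cache hit"; `summarize_cache_log` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch: count cache timeouts and cache hits in a saved log file.
summarize_cache_log() {
  local log="$1"
  local timeouts hits
  # grep -c exits non-zero when the count is 0, hence the "|| true".
  timeouts="$(grep -c 'ETIMEDOUT' "$log" || true)"
  hits="$(grep -ci 'cache hit' "$log" || true)"
  echo "timeouts=$timeouts hits=$hits"
}

# Usage (file name is an example): summarize_cache_log runner.log
```

A healthy run should report `timeouts=0` and, from the second run of a workflow onward, at least one hit.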

## Expected Results

- ✅ No more `ETIMEDOUT` errors
- ✅ Cache hits show "✅ Cache hit!" messages
- ✅ Faster build times due to dependency caching
- ✅ Workflows continue even if cache fails

## Troubleshooting

If issues persist:

1. **Check Docker Networking**:
   ```bash
   docker network ls
   docker network inspect bridge
   ```

2. **Verify Cache Service**:
   ```bash
   netstat -tlnp | grep 44029
   ```

3. **Test Connectivity** (run this from inside a job container; on Linux hosts, `host.docker.internal` may not resolve unless the container maps it to `host-gateway`):
   ```bash
   curl http://host.docker.internal:44029/
   ```

4. **Check Runner Logs**:
   ```bash
   tail -f runner.log
   ```
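
The connectivity test in step 3 depends on `curl` being installed in the container. As a fallback, here is a minimal sketch of a probe that needs only bash. `probe_port` is a hypothetical helper, not part of the runner or the repository's scripts, and it assumes bash with `/dev/tcp` support plus the coreutils `timeout` command.

```bash
#!/usr/bin/env bash
# Sketch: check whether a TCP port accepts connections, without curl.
# Returns 0 if reachable, 1 if not, 2 on missing arguments.
probe_port() {
  local host="$1" port="$2" seconds="${3:-3}"
  if [ -z "$host" ] || [ -z "$port" ]; then
    echo "usage: probe_port HOST PORT [TIMEOUT_SECONDS]" >&2
    return 2
  fi
  # The child shell opens a TCP connection via bash's /dev/tcp redirection;
  # timeout stops it if the connection attempt hangs.
  if timeout "$seconds" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null; then
    echo "reachable: $host:$port"
  else
    echo "unreachable: $host:$port"
    return 1
  fi
}

# Usage, from inside a job container: probe_port host.docker.internal 44029
```

This only shows that something is listening on the port; it does not confirm that the listener is the cache service.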

## Additional Resources

- [Gitea Act Runner Documentation](https://gitea.com/gitea/act_runner/src/branch/main/docs/configuration.md)
- [GitHub Actions Cache Documentation](https://docs.github.com/en/actions/using-workflows/caching-dependencies-to-speed-up-workflows)
- [Docker Networking Documentation](https://docs.docker.com/network/)

## Support

If you continue to experience cache issues after applying these fixes, please:
1. Run the troubleshooting script and share the output
2. Check the runner logs for any error messages
3. Verify your Docker and network configuration