From e3800b49b831afbd22ee757419f5adc5bb07c1f3 Mon Sep 17 00:00:00 2001
From: GSRN
Date: Mon, 15 Sep 2025 16:41:11 +0200
Subject: [PATCH] docs: Add comprehensive cache troubleshooting guide

- Document the root cause of cache timeout errors
- Explain all implemented solutions
- Provide step-by-step fix instructions
- Include verification and troubleshooting steps
- Add support resources and additional help
---
 docs/CACHE_TROUBLESHOOTING.md | 140 ++++++++++++++++++++++++++++++++++
 1 file changed, 140 insertions(+)
 create mode 100644 docs/CACHE_TROUBLESHOOTING.md

diff --git a/docs/CACHE_TROUBLESHOOTING.md b/docs/CACHE_TROUBLESHOOTING.md
new file mode 100644
index 0000000..f913490
--- /dev/null
+++ b/docs/CACHE_TROUBLESHOOTING.md
@@ -0,0 +1,140 @@
+# Cache Troubleshooting Guide
+
+## Problem Description
+
+The LabFusion CI/CD pipelines were experiencing cache timeout errors:
+```
+::warning::Failed to restore: getCacheEntry failed: connect ETIMEDOUT 172.31.0.3:44029
+```
+
+This error occurs when the cache service is not accessible from the job containers due to Docker networking issues.
+
+## Root Cause
+
+The issue is caused by:
+1. **Docker Networking**: Containers can't reach the cache server on the host
+2. **Random Port Assignment**: Using port 0 causes unpredictable port assignments
+3. **Cache Service Location**: The cache service binds to an IP that containers can't access
+
+## Solutions Implemented
+
+### 1. Workflow-Level Fixes
+
+Added `fail-on-cache-miss: false` to all cache actions in:
+- `.gitea/workflows/api-gateway.yml`
+- `.gitea/workflows/frontend.yml`
+- `.gitea/workflows/service-adapters.yml`
+- `.gitea/workflows/api-docs.yml`
+- `.gitea/workflows/ci.yml`
+
+This ensures that cache failures don't cause the entire pipeline to fail.
+
+### 2. Runner Configuration Fixes
+
+Created `runners/config_cache_fixed.yaml` with:
+- **Fixed Host**: `host.docker.internal` (allows containers to access host)
+- **Fixed Port**: `44029` (instead of random port 0)
+- **Host Network**: Uses host networking for better connectivity
+
+### 3. Troubleshooting Tools
+
+Created diagnostic scripts:
+- `runners/fix-cache-issues.sh` (Linux/macOS)
+- `runners/fix-cache-issues.ps1` (Windows)
+
+These scripts help diagnose and fix cache issues.
+
+## How to Apply the Fixes
+
+### Option 1: Use the Fixed Configuration
+
+1. Stop your current runner:
+   ```bash
+   pkill -f act_runner
+   ```
+
+2. Start with the fixed configuration:
+   ```bash
+   ./act_runner daemon --config config_cache_fixed.yaml
+   ```
+
+### Option 2: Run the Troubleshooting Script
+
+**Linux/macOS:**
+```bash
+cd runners
+./fix-cache-issues.sh
+```
+
+**Windows:**
+```powershell
+cd runners
+.\fix-cache-issues.ps1
+```
+
+### Option 3: Manual Configuration
+
+Update your runner configuration with these key changes:
+
+```yaml
+cache:
+  enabled: true
+  host: "host.docker.internal"  # Fixed host
+  port: 44029                   # Fixed port
+
+container:
+  network: "host"               # Use host networking
+```
+
+## Verification
+
+After applying the fixes:
+
+1. **Check Runner Logs**: Look for cache service startup messages
+2. **Test a Workflow**: Run a simple workflow to verify cache works
+3. **Monitor Cache Hits**: Check if dependencies are being cached properly
+
+## Expected Results
+
+- ✅ No more `ETIMEDOUT` errors
+- ✅ Cache hits show "✅ Cache hit!" messages
+- ✅ Faster build times due to dependency caching
+- ✅ Workflows continue even if cache fails
+
+## Troubleshooting
+
+If issues persist:
+
+1. **Check Docker Networking**:
+   ```bash
+   docker network ls
+   docker network inspect bridge
+   ```
+
+2. **Verify Cache Service**:
+   ```bash
+   netstat -tlnp | grep 44029
+   ```
+
+3. **Test Connectivity**:
+   ```bash
+   curl http://host.docker.internal:44029/
+   ```
+
+4. **Check Runner Logs**:
+   ```bash
+   tail -f runner.log
+   ```
+
+## Additional Resources
+
+- [Gitea Act Runner Documentation](https://gitea.com/gitea/act_runner/src/branch/main/docs/configuration.md)
+- [GitHub Actions Cache Documentation](https://docs.github.com/en/actions/using-workflows/caching-dependencies-to-speed-up-workflows)
+- [Docker Networking Documentation](https://docs.docker.com/network/)
+
+## Support
+
+If you continue to experience cache issues after applying these fixes, please:
+1. Run the troubleshooting script and share the output
+2. Check the runner logs for any error messages
+3. Verify your Docker and network configuration
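
Reviewer note (not part of the patch): the guide's **Test Connectivity** step can also be run as a plain TCP check where `curl` is not installed in the job container. The following is a minimal sketch under stated assumptions: Python 3 is available where the check runs, and the function name `cache_port_reachable` is hypothetical. It only confirms that the port accepts connections, not that the cache API behind it is healthy.

```python
import socket


def cache_port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts (ETIMEDOUT), and DNS failures.
        return False
```

Calling `cache_port_reachable("host.docker.internal", 44029)` from inside a job container, with the host and port taken from the guide's fixed configuration, should return `True` once the runner's cache server is reachable. A `False` result points at the same networking problem that produces the `ETIMEDOUT` warning.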