GSRN 79250ea3ab
2025-09-15 16:44:16 +02:00


# Cache Troubleshooting Guide
## Problem Description
The LabFusion CI/CD pipelines were experiencing cache timeout errors:
```
::warning::Failed to restore: getCacheEntry failed: connect ETIMEDOUT 172.31.0.3:44029
```
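This error occurs when the job containers cannot open a TCP connection to the cache server at the address it advertises (here `172.31.0.3:44029`). It points to a Docker networking problem, not a problem with the cached data itself.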
## Root Cause
The issue is caused by:
1. **Docker Networking**: Containers can't reach the cache server on the host
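2. **Random Port Assignment**: With port `0`, the cache server listens on whatever free port the OS assigns at startup, so the port changes on every runner restart and cannot be planned for in network or firewall rules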
3. **Cache Service Location**: The cache service binds to an IP that containers can't access
## Solutions Implemented
### 1. Workflow-Level Fixes
Added `fail-on-cache-miss: false` to all cache actions in:
- `.gitea/workflows/api-gateway.yml`
- `.gitea/workflows/frontend.yml`
- `.gitea/workflows/service-adapters.yml`
- `.gitea/workflows/api-docs.yml`
- `.gitea/workflows/ci.yml`
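This ensures that a failed or missed cache restore is logged as a warning and the job continues without the cache, instead of failing the pipeline. (`false` is also the action's default, so setting it explicitly mainly documents the intent.)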
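To confirm that every cache step in the workflow files carries the setting, a rough grep-based check can help. This is a minimal sketch, not a script from the repository: `check_cache_steps` is a hypothetical name, it assumes each step references the action literally as `uses: actions/cache@...` or `uses: actions/cache/restore@...` and that the setting is written as `fail-on-cache-miss: false`. A real YAML parser would be more robust.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report workflow files that have more actions/cache
# (or actions/cache/restore) steps than explicit "fail-on-cache-miss: false"
# lines. Plain line matching, not real YAML parsing.
check_cache_steps() {
  local dir="${1:-.gitea/workflows}" file uses settings status=0
  for file in "$dir"/*.yml; do
    [[ -f "$file" ]] || continue   # skip the literal glob when nothing matches
    # grep -c prints 0 and exits 1 when there is no match; keep the 0.
    uses=$(grep -cE 'uses:[[:space:]]*actions/cache(/restore)?@' "$file" || true)
    settings=$(grep -cE 'fail-on-cache-miss:[[:space:]]*false' "$file" || true)
    if (( uses > settings )); then
      echo "$file: $uses cache step(s), only $settings with fail-on-cache-miss: false"
      status=1
    fi
  done
  return "$status"
}
```

Run it from the repository root as `check_cache_steps .gitea/workflows`; it prints nothing and succeeds when every counted cache step has the setting.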
### 2. Runner Configuration Fixes
Updated all existing runner configuration files with:
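- **Fixed Host**: `host.docker.internal`, the address job containers use to reach the cache server on the host (Docker Desktop resolves this name automatically; on Linux Docker Engine it usually needs an explicit `host-gateway` mapping)
- **Fixed Port**: `44029` (instead of port `0`, which picks a random port)
- **Host Network**: job containers use host networking, so they share the host's network stack and can reach services listening on the host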
Updated files:
- `runners/config_docker.yaml`
- `runners/config_heavy.yaml`
- `runners/config_light.yaml`
- `runners/config_security.yaml`
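To check that a runner config file actually contains these three settings, a line-based check like the one below can be run over each file. It is a sketch under stated assumptions: `check_runner_config` is a hypothetical helper, and it assumes the keys are written one per line and indented under their sections as in the manual configuration example in this guide. It does not verify which section a key sits in.

```bash
#!/usr/bin/env bash
# Hypothetical helper: check a runner config for the cache host, cache port
# and container network values described in this guide.
# Line-based grep; assumes one indented key per line, does not parse YAML.
check_runner_config() {
  local file="$1" missing=()
  grep -qE '^[[:space:]]+host:[[:space:]]*"?host\.docker\.internal"?[[:space:]]*(#.*)?$' "$file" \
    || missing+=("cache.host")
  grep -qE '^[[:space:]]+port:[[:space:]]*44029[[:space:]]*(#.*)?$' "$file" \
    || missing+=("cache.port")
  grep -qE '^[[:space:]]+network:[[:space:]]*"?host"?[[:space:]]*(#.*)?$' "$file" \
    || missing+=("container.network")
  if (( ${#missing[@]} > 0 )); then
    echo "$file: missing ${missing[*]}"
    return 1
  fi
  echo "$file: ok"
}
```

For example, `for f in runners/config_*.yaml; do check_runner_config "$f"; done` prints one line per file.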
### 3. Troubleshooting Tools
Created diagnostic scripts:
- `runners/fix-cache-issues.sh` (Linux/macOS)
- `runners/fix-cache-issues.ps1` (Windows)
These scripts help diagnose and fix cache issues.
## How to Apply the Fixes
### Option 1: Use the Updated Configuration
1. Stop your current runner:
```bash
pkill -f act_runner
```
2. Start with an updated configuration:
```bash
./act_runner daemon --config config_docker.yaml
# or
./act_runner daemon --config config_heavy.yaml
# or
./act_runner daemon --config config_light.yaml
# or
./act_runner daemon --config config_security.yaml
```
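Because the cache port is now fixed, starting the new daemon before the old one has exited may leave the new cache server unable to bind port `44029` (with port `0` each daemon got its own random port, so this collision could not happen). The sketch below waits for the old process to exit before restarting. It is a hypothetical helper, not part of the repository: `restart_runner` and the `ACT_RUNNER_BIN` variable are illustrative names, and it assumes `pkill`/`pgrep` from procps, a daemon started as `act_runner daemon ...`, and logging to `runner.log` as in the troubleshooting steps.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not shipped in runners/): restart act_runner with the
# given config, waiting for the old daemon to release the fixed cache port.
restart_runner() {
  local config="$1" bin="${ACT_RUNNER_BIN:-./act_runner}" waited=0
  if [[ ! -f "$config" ]]; then
    echo "config not found: $config" >&2
    return 1
  fi
  # Send SIGTERM to any running daemon; pkill exits 1 if nothing matched.
  pkill -f 'act_runner daemon' || true
  # Wait up to 10 seconds for the old process to exit.
  while pgrep -f 'act_runner daemon' >/dev/null; do
    if (( waited >= 10 )); then
      echo "old runner still running after ${waited}s" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  # Start the new daemon in the background, appending to runner.log.
  nohup "$bin" daemon --config "$config" >>runner.log 2>&1 &
  echo "started act_runner (pid $!) with $config"
}
```

Typical use is `cd runners && restart_runner config_docker.yaml`.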
### Option 2: Run the Troubleshooting Script
**Linux/macOS:**
```bash
cd runners
./fix-cache-issues.sh
```
**Windows:**
```powershell
cd runners
.\fix-cache-issues.ps1
```
### Option 3: Manual Configuration
Update your runner configuration with these key changes:
```yaml
cache:
  enabled: true
  host: "host.docker.internal" # Fixed host
  port: 44029                  # Fixed port
container:
  network: "host"              # Use host networking
```
## Verification
After applying the fixes:
1. **Check Runner Logs**: Look for cache service startup messages
2. **Test a Workflow**: Run a simple workflow to verify cache works
3. **Monitor Cache Hits**: Check if dependencies are being cached properly
## Expected Results
- ✅ No more `ETIMEDOUT` errors
- ✅ Cache hits show "✅ Cache hit!" messages
- ✅ Faster build times due to dependency caching
- ✅ Workflows continue even if cache fails
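To check these results against a real run, a saved job log can be searched for the warning quoted under Problem Description and for the "✅ Cache hit!" message. A minimal sketch, assuming the log has been downloaded to a local file and that both strings appear verbatim; `cache_log_summary` is a hypothetical name:

```bash
#!/usr/bin/env bash
# Hypothetical helper: summarise cache behaviour in a saved job log.
# Assumes the restore warning and the workflows' "Cache hit!" message
# appear verbatim in the log.
cache_log_summary() {
  local log="$1" timeouts hits
  [[ -r "$log" ]] || { echo "cannot read $log" >&2; return 2; }
  # grep -c prints 0 and exits 1 on no match; keep the count.
  timeouts=$(grep -c 'getCacheEntry failed: connect ETIMEDOUT' "$log" || true)
  hits=$(grep -c 'Cache hit!' "$log" || true)
  echo "timeouts=$timeouts hits=$hits"
  # Succeed only when the original timeout warning is gone.
  (( timeouts == 0 ))
}
```

A non-zero exit status means the `ETIMEDOUT` warning is still present in that run.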
## Troubleshooting
If issues persist:
1. **Check Docker Networking**:
```bash
docker network ls
docker network inspect bridge
```
2. **Verify Cache Service**:
```bash
netstat -tlnp | grep 44029
```
3. **Test Connectivity** from inside a job container, since that is the path the cache client uses:
```bash
curl http://host.docker.internal:44029/
```
4. **Check Runner Logs**:
```bash
tail -f runner.log
```
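The network checks above can be scripted so their results are easier to read. The sketch below adds two hypothetical helpers: `classify_listeners` reads `ss -Htln` output (the iproute2 counterpart of the `netstat` check) and flags a loopback-only bind, which matches the third root cause; `probe_port` attempts a TCP connection with a deadline, so a silently dropped connection is reported as a timeout instead of hanging like the original `ETIMEDOUT`. It assumes bash built with `/dev/tcp` support and coreutils `timeout`.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for the connectivity checks in this guide.

# Read `ss -Htln` output on stdin and report how the given port is bound.
# A loopback-only bind is reachable from the host but not from containers
# on a bridge network.
classify_listeners() {
  local port="$1" found=0 _state _recvq _sendq addr _rest
  while read -r _state _recvq _sendq addr _rest; do
    [[ "$addr" == *":$port" ]] || continue
    found=1
    case "$addr" in
      127.*|"[::1]":*) echo "$addr loopback-only" ;;
      *) echo "$addr reachable-from-network" ;;
    esac
  done
  (( found )) || echo "nothing listening on port $port"
}

# Try a TCP connection with a deadline (default 3 s).
# timeout exits 124 when the deadline expires.
probe_port() {
  local host="$1" port="$2" secs="${3:-3}" rc=0
  timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null || rc=$?
  case "$rc" in
    0) echo "open" ;;
    124) echo "timeout" ;;
    *) echo "refused-or-unresolvable" ;;
  esac
  return "$rc"
}
```

For example, run `ss -Htln | classify_listeners 44029` on the host, then `probe_port host.docker.internal 44029` inside a job container. A `timeout` result there corresponds to the original error, while `refused-or-unresolvable` suggests nothing is listening or the name does not resolve.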
## Additional Resources
- [Gitea Act Runner Documentation](https://gitea.com/gitea/act_runner/src/branch/main/docs/configuration.md)
- [GitHub Actions Cache Documentation](https://docs.github.com/en/actions/using-workflows/caching-dependencies-to-speed-up-workflows)
- [Docker Networking Documentation](https://docs.docker.com/network/)
## Support
If you continue to experience cache issues after applying these fixes, please:
1. Run the troubleshooting script and share the output
2. Check the runner logs for any error messages
3. Verify your Docker and network configuration
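When sharing output, it can help to bundle the results of the Troubleshooting commands into a single file. The sketch below uses hypothetical helpers (`collect_cache_diagnostics` and `run_section` are illustrative names) that record each command's output, or its failure, so a missing tool or a stopped Docker daemon does not abort the collection. It uses `ss -tlnp` in place of `netstat -tlnp` and reads the last 200 lines of `runner.log` instead of following it. Review the file before sharing, since `docker network inspect` output includes internal IP addresses.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: bundle the troubleshooting command outputs into one file.

# Append one command's output (or its exit status on failure) to a file.
run_section() {
  local out="$1"
  shift
  {
    echo "== $* =="
    "$@" 2>&1 || echo "(exit status $?)"
    echo
  } >>"$out"
}

collect_cache_diagnostics() {
  local out="${1:-cache-diagnostics.txt}" log="${2:-runner.log}"
  : >"$out"   # truncate any previous report
  run_section "$out" docker network ls
  run_section "$out" docker network inspect bridge
  run_section "$out" ss -tlnp
  run_section "$out" tail -n 200 "$log"
  echo "wrote $out"
}
```

Running `collect_cache_diagnostics` in the `runners` directory writes `cache-diagnostics.txt` next to the runner log.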