GSRN 65c93ae685 fix: Resolve host.docker.internal hostname resolution issue

- Change cache host from 'host.docker.internal' to empty string
- Allow act_runner to auto-detect the correct host IP address
- Update all runner configs: docker, heavy, light, security
- Improve troubleshooting scripts with host IP detection:
  - Linux/macOS: Use ip route, hostname -I, or ifconfig
  - Windows: Use Get-NetIPAddress PowerShell cmdlets
- Update documentation to reflect auto-detection approach

This resolves the 'getaddrinfo ENOTFOUND host.docker.internal' error
by using a more compatible approach that works across different
Docker setups and operating systems.

2025-09-15 17:00:04 +02:00

Cache Troubleshooting Guide

Problem Description

The LabFusion CI/CD pipelines were experiencing cache timeout errors:

::warning::Failed to restore: getCacheEntry failed: connect ETIMEDOUT 172.31.0.3:44029

This error occurs when job containers cannot reach the cache service because of Docker networking issues.
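
The address and port that the job container tried to reach can be pulled out of a warning like the one above. This is a minimal sketch, assuming the message keeps the `connect ETIMEDOUT <ip>:<port>` shape shown here; `extract_cache_endpoint` is a hypothetical helper name, not part of act_runner.

```shell
# Hypothetical helper: print the ip:port that a cache ETIMEDOUT warning refers to.
# Prints nothing when the line does not contain "connect ETIMEDOUT <ip>:<port>".
extract_cache_endpoint() {
  printf '%s\n' "$1" | sed -n 's/.*connect ETIMEDOUT \([0-9.]*:[0-9]*\).*/\1/p'
}

# Example with the warning from this guide
extract_cache_endpoint '::warning::Failed to restore: getCacheEntry failed: connect ETIMEDOUT 172.31.0.3:44029'
# prints 172.31.0.3:44029
```

Comparing the printed address with the runner host's addresses shows whether the cache was advertised on an interface the job container can route to.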

Root Cause

The issue is caused by:

Docker Networking: Containers can't reach the cache server on the host
Random Port Assignment: With the cache port set to 0, act_runner picks a random available port, so the port cannot be known in advance
Cache Service Location: The cache service binds to an IP that containers can't access
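
The third cause can be checked by looking at which address the cache port is bound to on the host. The sketch below assumes Linux-style `ss -tln` or `netstat -tln` output on standard input; `listen_addresses_for_port` is a hypothetical helper name.

```shell
# Hypothetical helper: read "ss -tln" or "netstat -tln" output on stdin and
# print the local address:port of every listener on the given port.
listen_addresses_for_port() {
  port=$1
  awk -v p="$port" '
    $1 ~ /^(tcp|LISTEN)/ {
      # The first field ending in ":<port>" is the local address column.
      for (i = 1; i <= NF; i++) {
        if ($i ~ (":" p "$")) { print $i; break }
      }
    }'
}

# Usage on the runner host (either command works):
#   ss -tln | listen_addresses_for_port 44029
#   netstat -tln | listen_addresses_for_port 44029
```

A result such as `127.0.0.1:44029` means only processes in the host's own network namespace can connect, while `0.0.0.0:44029` means the service listens on all host interfaces.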

Solutions Implemented

1. Workflow-Level Fixes

Added fail-on-cache-miss: false to all cache actions in:

.gitea/workflows/api-gateway.yml
.gitea/workflows/frontend.yml
.gitea/workflows/service-adapters.yml
.gitea/workflows/api-docs.yml
.gitea/workflows/ci.yml

With this setting, a cache miss or a failed restore is reported as a warning and the job continues, so cache failures don't cause the entire pipeline to fail.
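
To confirm that no workflow was missed, the workflow directory can be scanned for files that use a cache action without the setting. This is a sketch only: it assumes the cache steps are written as `uses: actions/cache...`, which this guide does not state, and `find_unguarded_cache_workflows` is a hypothetical helper name.

```shell
# Hypothetical helper: list workflow files that use a cache action
# (assumed to be "actions/cache") but never set fail-on-cache-miss.
find_unguarded_cache_workflows() {
  dir=${1:-.gitea/workflows}
  for f in "$dir"/*.yml; do
    [ -f "$f" ] || continue
    if grep -q 'uses: *actions/cache' "$f" && ! grep -q 'fail-on-cache-miss:' "$f"; then
      printf '%s\n' "$f"
    fi
  done
}

# Usage from the repository root:
#   find_unguarded_cache_workflows .gitea/workflows
```

The check is per file, not per step, so a file with two cache steps and only one `fail-on-cache-miss` entry would not be reported.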

2. Runner Configuration Fixes

Updated all existing runner configuration files with:

Auto-detect Host: Empty host field (allows act_runner to auto-detect the correct IP)
Fixed Port: 44029 (instead of random port 0)
Host Network: Job containers use host networking (network: "host"), so they share the host's network stack and can reach the cache server directly

Updated files:

runners/config_docker.yaml
runners/config_heavy.yaml
runners/config_light.yaml
runners/config_security.yaml

3. Troubleshooting Tools

Created diagnostic scripts:

runners/fix-cache-issues.sh (Linux/macOS)
runners/fix-cache-issues.ps1 (Windows)

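These scripts diagnose and fix cache issues. They detect the host IP address with ip route, hostname -I, or ifconfig on Linux/macOS, and with the Get-NetIPAddress PowerShell cmdlets on Windows.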
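
For illustration, host IP detection on Linux could look roughly like the sketch below. It is not the contents of `fix-cache-issues.sh`; the function names `parse_route_src` and `detect_host_ip` are hypothetical, and the target address `1.1.1.1` is only used to ask the kernel which source address it would pick for outbound traffic.

```shell
# Hypothetical helper: print the address that follows "src" in "ip route get" output.
parse_route_src() {
  awk '{ for (i = 1; i < NF; i++) if ($i == "src") { print $(i + 1); exit } }'
}

# Hypothetical helper: try "ip route" first, then fall back to "hostname -I".
# Prints an empty line when neither method yields an address.
detect_host_ip() {
  ip_addr=""
  if command -v ip >/dev/null 2>&1; then
    ip_addr=$(ip route get 1.1.1.1 2>/dev/null | parse_route_src)
  fi
  if [ -z "$ip_addr" ] && command -v hostname >/dev/null 2>&1; then
    # "hostname -I" is Linux-specific; errors on other systems are ignored.
    ip_addr=$(hostname -I 2>/dev/null | awk '{ print $1 }')
  fi
  printf '%s\n' "$ip_addr"
}

detect_host_ip
```

The parsing step is kept separate from the command call so it can be checked against sample output without network access.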

How to Apply the Fixes

Option 1: Use the Updated Configuration

Stop your current runner:
```
pkill -f act_runner
```

Then restart it with one of the updated configurations:

./act_runner daemon --config config_docker.yaml
# or
./act_runner daemon --config config_heavy.yaml
# or
./act_runner daemon --config config_light.yaml
# or
./act_runner daemon --config config_security.yaml

Option 2: Run the Troubleshooting Script

Linux/macOS:

cd runners
./fix-cache-issues.sh

Windows:

cd runners
.\fix-cache-issues.ps1

Option 3: Manual Configuration

Update your runner configuration with these key changes:

cache:
  enabled: true
  host: ""                      # Auto-detect host IP
  port: 44029                   # Fixed port

container:
  network: "host"               # Use host networking
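
A quick way to confirm that a configuration file carries these values is to grep for them. The sketch below assumes the keys appear on their own lines as in the snippet above; `check_runner_config` is a hypothetical helper name, and the check is textual, so it does not verify which YAML section a key belongs to.

```shell
# Hypothetical helper: report whether a runner config sets the fixed cache port
# and host networking. Returns 0 when both are present, 1 otherwise.
check_runner_config() {
  cfg=$1
  rc=0
  if ! grep -Eq '^[[:space:]]*port:[[:space:]]*44029([[:space:]]|$)' "$cfg"; then
    echo "cache port is not 44029: $cfg"
    rc=1
  fi
  if ! grep -Eq '^[[:space:]]*network:[[:space:]]*"?host"?([[:space:]]|$)' "$cfg"; then
    echo "container network is not host: $cfg"
    rc=1
  fi
  return "$rc"
}

# Usage from the repository root:
#   for cfg in runners/config_*.yaml; do check_runner_config "$cfg"; done
```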

Verification

After applying the fixes:

Check Runner Logs: Look for cache service startup messages
Test a Workflow: Run a simple workflow to verify cache works
Monitor Cache Hits: Check if dependencies are being cached properly

Expected Results

✅ No more ETIMEDOUT errors
✅ Cache hits show "✅ Cache hit!" messages
✅ Faster build times due to dependency caching
✅ Workflows continue even if cache fails

Troubleshooting

If issues persist:

Check Docker Networking:

docker network ls
docker network inspect bridge

Verify Cache Service:
```
netstat -tlnp | grep 44029
```

Test Connectivity:

# host.docker.internal does not resolve on every Docker setup; set HOST_IP to the runner host's IP address
curl "http://${HOST_IP}:44029/"

Check Runner Logs:
```
tail -f runner.log
```
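
The connectivity test can be wrapped in a small function so it is easy to repeat from inside a job container. This is a sketch under assumptions: `cache_reachable` is a hypothetical helper name, and what the cache server answers on `/` is not documented here, so any HTTP response at all is treated as proof that the TCP connection works.

```shell
# Hypothetical helper: succeed if an HTTP request to host:port gets any response.
# Without --fail, curl exits 0 even for 4xx/5xx replies, which is what we want:
# only connection errors and timeouts count as unreachable.
cache_reachable() {
  host=$1
  port=${2:-44029}
  curl --silent --output /dev/null --connect-timeout 3 --max-time 5 \
    "http://${host}:${port}/"
}

# Usage, with the runner host's IP address in place of 192.0.2.10:
#   if cache_reachable 192.0.2.10 44029; then echo "cache reachable"; else echo "cache NOT reachable"; fi
```

Running the same check on the host and inside a container helps separate a stopped cache service from a networking problem between the container and the host.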

Additional Resources

Support

If you continue to experience cache issues after applying these fixes, please:

Run the troubleshooting script and share the output
Check the runner logs for any error messages
Verify your Docker and network configuration
