labFusion/docs/CACHE_TROUBLESHOOTING.md
GSRN 65c93ae685 fix: Resolve host.docker.internal hostname resolution issue
- Change cache host from 'host.docker.internal' to empty string
- Allow act_runner to auto-detect the correct host IP address
- Update all runner configs: docker, heavy, light, security
- Improve troubleshooting scripts with host IP detection:
  - Linux/macOS: Use ip route, hostname -I, or ifconfig
  - Windows: Use Get-NetIPAddress PowerShell cmdlets
- Update documentation to reflect auto-detection approach

This resolves the 'getaddrinfo ENOTFOUND host.docker.internal' error
by using a more compatible approach that works across different
Docker setups and operating systems.
2025-09-15 17:00:04 +02:00

Cache Troubleshooting Guide

Problem Description

The LabFusion CI/CD pipelines were experiencing cache timeout errors:

::warning::Failed to restore: getCacheEntry failed: connect ETIMEDOUT 172.31.0.3:44029

This error occurs when the cache service is not accessible from the job containers due to Docker networking issues.
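
When triaging a failed job, it can help to pull the address and port out of the warning so the same endpoint can be probed by hand. The function below is a minimal sketch; `extract_cache_endpoint` is a hypothetical helper name, and it assumes the warning has the `ETIMEDOUT <ip>:<port>` shape shown above.

```shell
# Hypothetical helper: print "IP:PORT" from an ETIMEDOUT warning line.
# Prints nothing if the line does not contain such an endpoint.
extract_cache_endpoint() {
  printf '%s\n' "$1" | sed -n 's/.*ETIMEDOUT \([0-9.]*:[0-9]*\).*/\1/p'
}

extract_cache_endpoint '::warning::Failed to restore: getCacheEntry failed: connect ETIMEDOUT 172.31.0.3:44029'
```

The extracted endpoint is the address the job container tried to reach, which is the one to test when checking connectivity.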

Root Cause

The issue is caused by:

  1. Docker Networking: Job containers can't reach the cache server running on the host
  2. Random Port Assignment: With port 0, the cache server gets a random free port on each start, so the port is unpredictable
  3. Cache Service Location: The cache service binds to an IP that containers can't access

Solutions Implemented

1. Workflow-Level Fixes

Added fail-on-cache-miss: false to all cache actions in:

  • .gitea/workflows/api-gateway.yml
  • .gitea/workflows/frontend.yml
  • .gitea/workflows/service-adapters.yml
  • .gitea/workflows/api-docs.yml
  • .gitea/workflows/ci.yml

This ensures that cache failures don't cause the entire pipeline to fail.
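
To confirm that no workflow was missed, a quick scan along these lines may be useful. It is a rough sketch, not part of the repository: `workflows_missing_flag` is a hypothetical name, the `uses:.*cache` pattern is an assumption about how cache steps are declared, and the check is per file rather than per step.

```shell
# Hypothetical helper: list workflow files that reference a cache action
# but never set fail-on-cache-miss: false anywhere in the file.
workflows_missing_flag() {
  dir=${1:-.gitea/workflows}
  for f in "$dir"/*.yml; do
    [ -f "$f" ] || continue
    if grep -q 'uses:.*cache' "$f" && ! grep -q 'fail-on-cache-miss:[ ]*false' "$f"; then
      printf '%s\n' "$f"
    fi
  done
}

workflows_missing_flag .gitea/workflows
```

An empty result means every file that mentions a cache action also sets the flag at least once.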

2. Runner Configuration Fixes

Updated all existing runner configuration files with:

  • Auto-detect Host: Empty host field (allows act_runner to auto-detect the correct IP)
  • Fixed Port: 44029 (instead of random port 0)
  • Host Network: Job containers use the host network (container network set to "host"), so they can reach the cache server on the host directly

Updated files:

  • runners/config_docker.yaml
  • runners/config_heavy.yaml
  • runners/config_light.yaml
  • runners/config_security.yaml
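
To review what each file currently contains, the cache section can be printed side by side. This is a sketch under the assumption that `cache:` is a top-level key in each file; `show_cache_section` is a hypothetical helper name.

```shell
# Hypothetical helper: print the top-level "cache:" section of a runner config,
# prefixing each line with the file name.
show_cache_section() {
  awk '
    /^[^ \t#]/ { in_cache = ($0 ~ /^cache:/) }  # a top-level key starts or ends the section
    in_cache && NF { print FILENAME ": " $0 }
  ' "$1"
}

for cfg in runners/config_docker.yaml runners/config_heavy.yaml \
           runners/config_light.yaml runners/config_security.yaml; do
  [ -f "$cfg" ] && show_cache_section "$cfg"
done
```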

3. Troubleshooting Tools

Created diagnostic scripts:

  • runners/fix-cache-issues.sh (Linux/macOS)
  • runners/fix-cache-issues.ps1 (Windows)

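These scripts detect the host IP (using ip route, hostname -I, or ifconfig on Linux/macOS, and Get-NetIPAddress on Windows) and help diagnose and fix cache issues.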
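
The host IP detection on Linux/macOS could look roughly like the sketch below. It is not the contents of fix-cache-issues.sh; `detect_host_ip` is a hypothetical name, and the exact commands (including the `1.1.1.1` route lookup target) are assumptions that follow the ip route, hostname -I, ifconfig fallback order.

```shell
# Hypothetical sketch: print the host's primary IPv4 address, or an empty line.
detect_host_ip() {
  addr=""
  if command -v ip >/dev/null 2>&1; then
    # "ip route get" only performs a route lookup; "src" is the local address used
    addr=$(ip route get 1.1.1.1 2>/dev/null | sed -n 's/.* src \([0-9.]*\).*/\1/p' | head -n 1) || true
  fi
  if [ -z "$addr" ] && hostname -I >/dev/null 2>&1; then
    # hostname -I (Linux) lists all addresses; keep the first IPv4 one
    addr=$(hostname -I | tr ' ' '\n' | grep -E '^[0-9]+(\.[0-9]+){3}$' | head -n 1) || true
  fi
  if [ -z "$addr" ] && command -v ifconfig >/dev/null 2>&1; then
    # Fall back to ifconfig, skipping the loopback address
    addr=$(ifconfig 2>/dev/null | awk '/inet / { sub(/^addr:/, "", $2); if ($2 != "127.0.0.1") { print $2; exit } }') || true
  fi
  printf '%s\n' "$addr"
}

detect_host_ip
```

If the function prints an empty line, none of the three tools produced an address and the host IP has to be set by hand.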

How to Apply the Fixes

Option 1: Use the Updated Configuration

  1. Stop your current runner:

    pkill -f act_runner
    
  2. Start with an updated configuration:

    ./act_runner daemon --config config_docker.yaml
    # or
    ./act_runner daemon --config config_heavy.yaml
    # or
    ./act_runner daemon --config config_light.yaml
    # or
    ./act_runner daemon --config config_security.yaml
    

Option 2: Run the Troubleshooting Script

Linux/macOS:

cd runners
./fix-cache-issues.sh

Windows:

cd runners
.\fix-cache-issues.ps1

Option 3: Manual Configuration

Update your runner configuration with these key changes:

cache:
  enabled: true
  host: ""                      # Auto-detect host IP
  port: 44029                   # Fixed port

container:
  network: "host"               # Use host networking
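
For a config that still uses a random port, the port change can be scripted. The sketch below assumes `cache:` is a top-level key with an indented `port:` entry; `set_cache_port` is a hypothetical helper that writes the modified config to standard output and leaves the original file untouched. Any trailing comment on the `port:` line is dropped.

```shell
# Hypothetical helper: rewrite the port inside the top-level "cache:" section only.
set_cache_port() {
  file=$1
  port=$2
  awk -v port="$port" '
    /^[^ \t#]/ { in_cache = ($0 ~ /^cache:/) }        # track the current top-level section
    in_cache && /^[ \t]+port:/ { sub(/port:.*/, "port: " port) }
    { print }
  ' "$file"
}

# Example: write a patched copy next to the original for review
# set_cache_port config_docker.yaml 44029 > config_docker.patched.yaml
```

Writing to a separate file first makes it easy to diff the result before replacing the live config.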

Verification

After applying the fixes:

  1. Check Runner Logs: Look for cache service startup messages
  2. Test a Workflow: Run a simple workflow to verify cache works
  3. Monitor Cache Hits: Check if dependencies are being cached properly

Expected Results

  • No more ETIMEDOUT errors
  • Cache hits show "Cache hit!" messages
  • Faster build times due to dependency caching
  • Workflows continue even if cache fails
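
One way to compare a job log against these expectations is to count the two message types. This is a small sketch that assumes the job log has been saved to a file; `summarize_cache_log` and the `job.log` path are hypothetical, and the strings matched are the ones quoted in this guide.

```shell
# Hypothetical helper: count cache hits and timeouts in a saved job log.
summarize_cache_log() {
  log=$1
  # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true"
  hits=$(grep -c 'Cache hit' "$log" || true)
  timeouts=$(grep -c 'ETIMEDOUT' "$log" || true)
  printf 'cache hits: %s, timeouts: %s\n' "$hits" "$timeouts"
}

# summarize_cache_log job.log
```

A healthy run should report zero timeouts. Zero hits on a first run is normal, since the cache has not been populated yet.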

Troubleshooting

If issues persist:

  1. Check Docker Networking:

    docker network ls
    docker network inspect bridge
    
  2. Verify Cache Service:

    netstat -tlnp | grep 44029
    
  3. Test Connectivity:

    # host.docker.internal may not resolve; use the host's IP instead (Linux example)
    curl http://$(hostname -I | awk '{print $1}'):44029/
    
  4. Check Runner Logs:

    tail -f runner.log
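
If curl is not available, for example inside a minimal job container, a plain TCP probe gives the same yes/no answer. The sketch below assumes bash and the coreutils `timeout` command are present; `check_port` is a hypothetical helper name.

```shell
# Hypothetical helper: report whether a TCP port accepts connections within 3 seconds.
check_port() {
  host=$1
  port=$2
  # bash's /dev/tcp pseudo-device opens a TCP connection; timeout caps the wait
  if timeout 3 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
    echo "reachable: $host:$port"
  else
    echo "unreachable: $host:$port"
    return 1
  fi
}

# check_port <host-ip> 44029
```

The result is most meaningful when the probe runs from inside a job container, since that is where the ETIMEDOUT error originates.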
    

Support

If you continue to experience cache issues after applying these fixes, please:

  1. Run the troubleshooting script and share the output
  2. Check the runner logs for any error messages
  3. Verify your Docker and network configuration