labFusion/services/service-adapters/HEALTH_CHECKING.md
GSRN 7373ccfa1d
2025-09-18 11:09:51 +02:00


Health Checking System

This document describes the generalized health checking system for LabFusion Service Adapters.

Overview

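The health checking system is designed to be flexible and extensible, supporting different types of health checks for different services. It uses the strategy pattern: each type of check is implemented by a pluggable checker class, and the checker used for a given service is selected from that service's name and configuration.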

Architecture

Core Components

  1. BaseHealthChecker: Abstract base class for all health checkers
  2. HealthCheckResult: Standardized result object
  3. HealthCheckerRegistry: Registry for different checker types
  4. HealthCheckerFactory: Factory for creating checker instances
  5. ServiceStatusChecker: Main orchestrator
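To make the relationships between these components concrete, here is a minimal, self-contained sketch of how the first four could fit together (the ServiceStatusChecker orchestrator is omitted). Only the class names and the check_health, register, and create_checker names come from this document; the constructor signatures, the get method, and the toy AlwaysHealthyChecker are assumptions for illustration and will not match the real services.health_checkers package exactly.

```python
import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Dict, Optional, Type


@dataclass
class HealthCheckResult:
    """Standardized outcome of one health check (simplified)."""

    status: str
    response_time: float = 0.0
    error: Optional[str] = None
    metadata: Dict[str, Any] = field(default_factory=dict)


class BaseHealthChecker(ABC):
    """Every checker strategy implements check_health()."""

    def __init__(self, timeout: float = 5.0) -> None:
        self.timeout = timeout

    @abstractmethod
    async def check_health(self, service_name: str, config: Dict[str, Any]) -> HealthCheckResult:
        """Check one service and return a standardized result."""


class HealthCheckerRegistry:
    """Maps a health_check_type string to a checker class."""

    def __init__(self) -> None:
        self._checkers: Dict[str, Type[BaseHealthChecker]] = {}

    def register(self, name: str, checker_cls: Type[BaseHealthChecker]) -> None:
        self._checkers[name] = checker_cls

    def get(self, name: str) -> Type[BaseHealthChecker]:
        if name not in self._checkers:
            raise ValueError(f"Unknown health check type: {name!r}")
        return self._checkers[name]


class HealthCheckerFactory:
    """Instantiates checkers looked up in the registry."""

    def __init__(self, registry: HealthCheckerRegistry) -> None:
        self._registry = registry

    def create_checker(self, checker_type: str, **kwargs: Any) -> BaseHealthChecker:
        return self._registry.get(checker_type)(**kwargs)


class AlwaysHealthyChecker(BaseHealthChecker):
    """Toy strategy that only demonstrates the wiring (hypothetical)."""

    async def check_health(self, service_name: str, config: Dict[str, Any]) -> HealthCheckResult:
        if not config.get("enabled", True):
            return HealthCheckResult(status="disabled")
        return HealthCheckResult(status="healthy", metadata={"service": service_name})


registry = HealthCheckerRegistry()
registry.register("always_healthy", AlwaysHealthyChecker)
factory = HealthCheckerFactory(registry)

if __name__ == "__main__":
    checker = factory.create_checker("always_healthy", timeout=2.0)
    print(asyncio.run(checker.check_health("demo", {"enabled": True})))
```

Adding a new check type then only touches the registry; the factory and the orchestrator stay unchanged.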

Health Checker Types

1. API Health Checker (APIHealthChecker)

  • Purpose: Check services with HTTP health endpoints
  • Use Case: Most REST APIs, microservices
  • Configuration:
    {
        "health_check_type": "api",
        "health_endpoint": "/api/health",
        "url": "https://service.example.com"
    }
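A sketch of how an API checker might turn this configuration into a request URL and headers. It assumes a /health default when health_endpoint is missing, sends token with the Bearer scheme, and sends api_key in an x-api-key header; these conventions are assumptions (services differ, for example in their API key header name), and build_api_request is a hypothetical helper.

```python
from typing import Any, Dict, Tuple


def build_api_request(
    config: Dict[str, Any], api_key_header: str = "x-api-key"
) -> Tuple[str, Dict[str, str]]:
    """Return the health-check URL and request headers for an API service."""
    base = config["url"].rstrip("/")
    # Assumed default; the document does not state one
    endpoint = config.get("health_endpoint", "/health")
    # Join without doubling or dropping the slash between base and endpoint
    url = f"{base}/{endpoint.lstrip('/')}"

    headers = {"Accept": "application/json"}
    if config.get("token"):
        # Assumption: tokens are sent with the Bearer scheme
        headers["Authorization"] = f"Bearer {config['token']}"
    if config.get("api_key"):
        # Assumption: the header name varies per service, so it is a parameter
        headers[api_key_header] = config["api_key"]
    return url, headers


url, headers = build_api_request(
    {"url": "https://service.example.com", "health_endpoint": "/api/health"}
)
print(url)  # https://service.example.com/api/health
```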
    

2. Sensor Health Checker (SensorHealthChecker)

  • Purpose: Check services via sensor data (e.g., Home Assistant entities)
  • Use Case: Home Assistant, IoT devices, sensor-based monitoring
  • Configuration:
    {
        "health_check_type": "sensor",
        "sensor_entity": "sensor.system_uptime",
        "url": "https://homeassistant.example.com"
    }
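For Home Assistant, an entity's state can be read from its REST API with GET /api/states/<entity_id>, authenticated by a long-lived access token sent as a Bearer token. The sketch below builds that request and interprets the returned JSON. Treating the special states unavailable and unknown as unhealthy is an assumption, and sensor_request and interpret_sensor_state are hypothetical names; the sensor_state and last_updated metadata keys follow the HealthCheckResult structure in this document.

```python
from typing import Any, Dict, Tuple

# Assumption: these Home Assistant special states mean the sensor is not reporting
UNHEALTHY_STATES = {"unavailable", "unknown"}


def sensor_request(config: Dict[str, Any]) -> Tuple[str, Dict[str, str]]:
    """Build the Home Assistant REST URL and headers for one entity."""
    base = config["url"].rstrip("/")
    url = f"{base}/api/states/{config['sensor_entity']}"
    headers = {"Content-Type": "application/json"}
    if config.get("token"):
        # The REST API expects a long-lived access token as a Bearer token
        headers["Authorization"] = f"Bearer {config['token']}"
    return url, headers


def interpret_sensor_state(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Map the JSON body of GET /api/states/<entity_id> to a status and metadata."""
    state = payload.get("state")
    healthy = state is not None and state not in UNHEALTHY_STATES
    return {
        "status": "healthy" if healthy else "unhealthy",
        "metadata": {
            "sensor_state": state,
            "last_updated": payload.get("last_updated"),
        },
    }
```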
    

3. Custom Health Checker (CustomHealthChecker)

  • Purpose: Complex health checks with multiple validation steps
  • Use Case: Services requiring multiple checks, custom logic
  • Configuration:
    {
        "health_check_type": "custom",
        "health_checks": [
            {
                "type": "api",
                "name": "main_api",
                "url": "https://service.example.com/api/health"
            },
            {
                "type": "sensor",
                "name": "uptime_sensor",
                "sensor_entity": "sensor.service_uptime"
            }
        ]
    }
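One plausible way to combine the sub-checks is to run them concurrently and report healthy only if every sub-check is healthy. The all-must-pass rule, the runners mapping, and run_custom_check are assumptions for this sketch; the check_results metadata key is the one this document uses for individual results.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List

# A runner performs one sub-check and returns a status string
Runner = Callable[[Dict[str, Any]], Awaitable[str]]


async def run_custom_check(
    health_checks: List[Dict[str, Any]], runners: Dict[str, Runner]
) -> Dict[str, Any]:
    """Run every sub-check concurrently; healthy only if all sub-checks pass."""
    if not health_checks:
        return {"status": "error", "error": "no health_checks configured", "metadata": {}}

    async def run_one(check: Dict[str, Any]) -> str:
        runner = runners.get(check.get("type"))
        if runner is None:
            return "error"  # unknown sub-check type
        return await runner(check)

    statuses = await asyncio.gather(*(run_one(check) for check in health_checks))
    check_results = {check["name"]: status for check, status in zip(health_checks, statuses)}
    overall = "healthy" if all(s == "healthy" for s in statuses) else "unhealthy"
    return {"status": overall, "error": None, "metadata": {"check_results": check_results}}
```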
    

Configuration

Service Configuration Structure

SERVICES = {
    "service_name": {
        "url": "https://service.example.com",
        "enabled": True,
        "health_check_type": "api|sensor|custom",
        
        # API-specific
        "health_endpoint": "/api/health",
        "token": "auth_token",
        "api_key": "api_key",
        
        # Sensor-specific
        "sensor_entity": "sensor.entity_name",
        
        # Custom-specific
        "health_checks": [
            {
                "type": "api",
                "name": "check_name",
                "url": "https://endpoint.com/health"
            }
        ]
    }
}
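Because a mistyped or missing key otherwise tends to surface only as a failed check at runtime, it can help to validate this structure up front. A minimal sketch: the required keys in REQUIRED_KEYS are inferred from this document's examples rather than taken from the real code, it only knows the three built-in types, and defaulting to api when health_check_type is absent is an assumption. validate_service_config is a hypothetical helper.

```python
from typing import Any, Dict, List

# Required keys per built-in check type, inferred from this document's examples (assumption)
REQUIRED_KEYS = {
    "api": ["url"],
    "sensor": ["url", "sensor_entity"],
    "custom": ["health_checks"],
}


def validate_service_config(name: str, config: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    check_type = config.get("health_check_type", "api")  # assumed default
    if check_type not in REQUIRED_KEYS:
        return [f"{name}: unknown health_check_type {check_type!r}"]

    problems = []
    for key in REQUIRED_KEYS[check_type]:
        if not config.get(key):
            problems.append(f"{name}: missing {key!r} for {check_type!r} check")

    if check_type == "custom":
        # Every sub-check needs at least a type and a name
        for index, sub_check in enumerate(config.get("health_checks") or []):
            for key in ("type", "name"):
                if key not in sub_check:
                    problems.append(f"{name}: health_checks[{index}] missing {key!r}")
    return problems
```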

Environment Variables

# Service URLs
HOME_ASSISTANT_URL=https://ha.example.com
FRIGATE_URL=http://frigate.local:5000
IMMICH_URL=http://immich.local:2283
N8N_URL=http://n8n.local:5678

# Authentication
HOME_ASSISTANT_TOKEN=your_token
FRIGATE_TOKEN=your_token
IMMICH_API_KEY=your_key
N8N_API_KEY=your_key
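A sketch of turning these variables into the SERVICES structure. The service keys, the choice of a sensor check for Home Assistant, and the rule that a service is enabled only when its URL is set are assumptions; load_services is a hypothetical helper that accepts any mapping, so it can be exercised without touching the real environment.

```python
import os
from typing import Any, Dict, Mapping, Optional


def load_services(env: Optional[Mapping[str, str]] = None) -> Dict[str, Dict[str, Any]]:
    """Build a SERVICES-style dict from this document's environment variables."""
    env = os.environ if env is None else env
    services: Dict[str, Dict[str, Any]] = {
        "home_assistant": {
            "url": env.get("HOME_ASSISTANT_URL"),
            "token": env.get("HOME_ASSISTANT_TOKEN"),
            "health_check_type": "sensor",
            "sensor_entity": "sensor.system_uptime",
        },
        "frigate": {"url": env.get("FRIGATE_URL"), "token": env.get("FRIGATE_TOKEN")},
        "immich": {"url": env.get("IMMICH_URL"), "api_key": env.get("IMMICH_API_KEY")},
        "n8n": {"url": env.get("N8N_URL"), "api_key": env.get("N8N_API_KEY")},
    }
    for config in services.values():
        config.setdefault("health_check_type", "api")
        # Assumption: a service without a configured URL is treated as disabled
        config["enabled"] = bool(config["url"])
    return services
```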

Usage Examples

Basic API Health Check

import asyncio

from services.health_checkers import factory


async def main() -> None:
    # Create an API checker with a 5-second timeout
    checker = factory.create_checker("api", timeout=5.0)

    # Check the service's health endpoint
    config = {
        "url": "https://api.example.com",
        "health_endpoint": "/health",
        "enabled": True,
    }
    result = await checker.check_health("example_service", config)
    print(f"Status: {result.status}")
    print(f"Response time: {result.response_time}s")


asyncio.run(main())

Sensor-Based Health Check

import asyncio

from services.health_checkers import factory


async def main() -> None:
    # Create a sensor checker
    checker = factory.create_checker("sensor", timeout=5.0)

    # Check a Home Assistant sensor entity
    config = {
        "url": "https://ha.example.com",
        "sensor_entity": "sensor.system_uptime",
        "token": "your_token",
        "enabled": True,
    }
    result = await checker.check_health("home_assistant", config)
    print(f"Uptime: {result.metadata.get('sensor_state')}")


asyncio.run(main())

Custom Health Check

# Create custom checker
checker = factory.create_checker("custom", timeout=10.0)

# Check with multiple validations
config = {
    "url": "https://service.example.com",
    "enabled": True,
    "health_checks": [
        {
            "type": "api",
            "name": "main_api",
            "url": "https://service.example.com/api/health"
        },
        {
            "type": "api",
            "name": "database",
            "url": "https://service.example.com/api/db/health"
        }
    ]
}
result = await checker.check_health("complex_service", config)
print(f"Overall status: {result.status}")
print(f"Individual checks: {result.metadata.get('check_results')}")

Health Check Results

HealthCheckResult Structure

{
    "status": "healthy|unhealthy|disabled|error|timeout|unauthorized|forbidden",
    "response_time": 0.123,  # seconds
    "error": "Error message if applicable",
    "metadata": {
        "http_status": 200,
        "response_size": 1024,
        "sensor_state": "12345",
        "last_updated": "2024-01-15T10:30:00Z"
    }
}

Status Values

  • healthy: Service is responding normally
  • unhealthy: Service responded but with error status
  • disabled: Service is disabled in configuration
  • timeout: Request timed out
  • unauthorized: Authentication required (HTTP 401)
  • forbidden: Access forbidden (HTTP 403)
  • error: Network or other error occurred

Extending the System

Adding a New Health Checker

  1. Create the checker class:

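    from typing import Any, Dict

    from .base import BaseHealthChecker, HealthCheckResult

    class MyCustomChecker(BaseHealthChecker):
        async def check_health(self, service_name: str, config: Dict[str, Any]) -> HealthCheckResult:
            # Run the service-specific check here and return a HealthCheckResult
            raise NotImplementedError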
    
  2. Register the checker:

    from services.health_checkers import registry
    
    registry.register("my_custom", MyCustomChecker)
    
  3. Use in configuration:

    {
        "health_check_type": "my_custom",
        "custom_param": "value"
    }
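As a fuller illustration of step 1, here is a hypothetical checker that tests whether a TCP port accepts connections. To stay self-contained it defines a simplified HealthCheckResult stand-in and does not subclass the real BaseHealthChecker; the host and port configuration keys are assumptions.

```python
import asyncio
import time
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class HealthCheckResult:
    """Simplified stand-in for the real HealthCheckResult."""

    status: str
    response_time: float = 0.0
    error: Optional[str] = None
    metadata: Dict[str, Any] = field(default_factory=dict)


class TcpPortChecker:
    """Hypothetical checker: healthy if host:port accepts a TCP connection.

    In the real package this class would subclass BaseHealthChecker.
    """

    def __init__(self, timeout: float = 5.0) -> None:
        self.timeout = timeout

    async def check_health(self, service_name: str, config: Dict[str, Any]) -> HealthCheckResult:
        if not config.get("enabled", True):
            return HealthCheckResult(status="disabled")

        host, port = config["host"], int(config["port"])
        start = time.perf_counter()
        try:
            _, writer = await asyncio.wait_for(
                asyncio.open_connection(host, port), timeout=self.timeout
            )
        except asyncio.TimeoutError:
            return HealthCheckResult(
                status="timeout",
                response_time=time.perf_counter() - start,
                error=f"{service_name}: no connection to {host}:{port} within {self.timeout}s",
            )
        except OSError as exc:  # e.g. connection refused or host unreachable
            return HealthCheckResult(
                status="error", response_time=time.perf_counter() - start, error=str(exc)
            )

        elapsed = time.perf_counter() - start
        # Close the probe connection so no sockets leak
        writer.close()
        try:
            await writer.wait_closed()
        except OSError:
            pass  # the connection already succeeded; a reset during close is ignored
        return HealthCheckResult(
            status="healthy", response_time=elapsed, metadata={"host": host, "port": port}
        )
```

Registration would then follow step 2, for example registry.register("tcp_port", TcpPortChecker), with "health_check_type": "tcp_port" plus host and port in the service configuration.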
    

Service-Specific Logic

The factory automatically selects the appropriate checker based on:

  1. health_check_type in configuration
  2. Service name patterns
  3. Configuration presence (e.g., sensor_entity → sensor checker)
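The selection order above can be sketched as a small function. The name-pattern table is hypothetical (the real patterns are not listed in this document), and checking health_checks before sensor_entity and falling back to api are assumptions.

```python
from typing import Any, Dict

# Hypothetical pattern table: substring of the service name -> checker type
NAME_PATTERNS = {"home_assistant": "sensor", "homeassistant": "sensor"}


def select_checker_type(service_name: str, config: Dict[str, Any]) -> str:
    """Pick a checker type: explicit type, then name patterns, then config keys."""
    # 1. An explicit health_check_type always wins
    explicit = config.get("health_check_type")
    if explicit:
        return explicit

    # 2. Known service name patterns
    lowered = service_name.lower()
    for pattern, checker_type in NAME_PATTERNS.items():
        if pattern in lowered:
            return checker_type

    # 3. Presence of type-specific configuration keys
    if "health_checks" in config:
        return "custom"
    if "sensor_entity" in config:
        return "sensor"
    return "api"  # assumed fallback
```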

Performance Considerations

  • Concurrent Checking: All services are checked concurrently rather than one after another
  • Checker Caching: Checkers are cached per service to avoid recreation
  • Timeout Management: Configurable timeouts per checker type
  • Resource Cleanup: Proper cleanup of HTTP clients
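A minimal sketch of concurrent checking with a per-check timeout and a per-service checker cache, built on asyncio.gather and asyncio.wait_for. StatusOrchestrator is a hypothetical, simplified stand-in for ServiceStatusChecker: its checks return plain status strings, and mapping any unexpected exception to error is a design choice of the sketch.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict

# A check takes (service_name, config) and returns a status string
CheckFn = Callable[[str, Dict[str, Any]], Awaitable[str]]


class StatusOrchestrator:
    """Hypothetical, simplified stand-in for ServiceStatusChecker."""

    def __init__(self, make_check: Callable[[str], CheckFn], timeout: float = 5.0) -> None:
        self._make_check = make_check
        self._cache: Dict[str, CheckFn] = {}  # one checker per service, reused
        self.timeout = timeout

    def _checker_for(self, service_name: str) -> CheckFn:
        if service_name not in self._cache:
            self._cache[service_name] = self._make_check(service_name)
        return self._cache[service_name]

    async def _run_one(self, service_name: str, config: Dict[str, Any]) -> str:
        check = self._checker_for(service_name)
        try:
            # wait_for cancels the check if it exceeds the timeout
            return await asyncio.wait_for(check(service_name, config), timeout=self.timeout)
        except asyncio.TimeoutError:
            return "timeout"
        except Exception:  # one failing check must not abort the others
            return "error"

    async def check_all(self, services: Dict[str, Dict[str, Any]]) -> Dict[str, str]:
        """Check every service concurrently and return {service_name: status}."""
        names = list(services)
        statuses = await asyncio.gather(*(self._run_one(n, services[n]) for n in names))
        return dict(zip(names, statuses))
```

Because the checks run concurrently, the total time is bounded by the slowest check (or the timeout), not by the sum of all checks.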

Monitoring and Logging

  • Debug Logs: Detailed operation logs for troubleshooting
  • Performance Metrics: Response times and success rates
  • Error Tracking: Comprehensive error logging with context
  • Health Summary: Overall system health statistics

Best Practices

  1. Choose Appropriate Checker: Use the right checker type for your service
  2. Set Reasonable Timeouts: Balance responsiveness with reliability
  3. Handle Errors Gracefully: Always provide meaningful error messages
  4. Monitor Performance: Track response times and success rates
  5. Test Thoroughly: Verify health checks work in all scenarios
  6. Document Configuration: Keep service configurations well-documented

Troubleshooting

Common Issues

  1. Timeout Errors: Increase timeout or check network connectivity
  2. Authentication Failures: Verify tokens and API keys
  3. Sensor Not Found: Check entity names and permissions
  4. Configuration Errors: Validate service configuration structure

Debug Tools

  • Debug Endpoint: /debug/logging to test logging configuration
  • Health Check Logs: Detailed logs for each health check operation
  • Metadata Inspection: Check metadata for additional context