# Health Checking System
This document describes the generalized health checking system for LabFusion Service Adapters.
## Overview
The health checking system is designed to be flexible and extensible, supporting different types of health checks for different services. It uses a strategy pattern with pluggable health checkers.
## Architecture
### Core Components
1. **BaseHealthChecker**: Abstract base class for all health checkers
2. **HealthCheckResult**: Standardized result object
3. **HealthCheckerRegistry**: Registry for different checker types
4. **HealthCheckerFactory**: Factory for creating checker instances
5. **ServiceStatusChecker**: Main orchestrator
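
The first four components can be wired together under the strategy pattern roughly as follows. This is a minimal, self-contained sketch and not the LabFusion code. The class bodies, the `ValueError` on unknown types, and the demo `AlwaysHealthyChecker` are assumptions for illustration. Only the names `register`, `create_checker`, `check_health`, and the `timeout` keyword come from this document.

```python
import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Dict, Optional, Type


@dataclass
class HealthCheckResult:
    # Standardized result returned by every checker
    status: str
    response_time: float = 0.0
    error: Optional[str] = None
    metadata: Dict[str, Any] = field(default_factory=dict)


class BaseHealthChecker(ABC):
    # Strategy interface: every concrete checker implements check_health
    def __init__(self, timeout: float = 5.0) -> None:
        self.timeout = timeout

    @abstractmethod
    async def check_health(self, service_name: str, config: Dict[str, Any]) -> HealthCheckResult:
        ...


class HealthCheckerRegistry:
    # Maps a health_check_type string to a checker class
    def __init__(self) -> None:
        self._checkers: Dict[str, Type[BaseHealthChecker]] = {}

    def register(self, name: str, checker_cls: Type[BaseHealthChecker]) -> None:
        self._checkers[name] = checker_cls

    def get(self, name: str) -> Type[BaseHealthChecker]:
        if name not in self._checkers:
            raise ValueError(f"Unknown health check type: {name}")
        return self._checkers[name]


class HealthCheckerFactory:
    # Instantiates registered checker classes with keyword options such as timeout
    def __init__(self, registry: HealthCheckerRegistry) -> None:
        self.registry = registry

    def create_checker(self, checker_type: str, **kwargs: Any) -> BaseHealthChecker:
        return self.registry.get(checker_type)(**kwargs)


class AlwaysHealthyChecker(BaseHealthChecker):
    # Hypothetical demo strategy that never touches the network
    async def check_health(self, service_name: str, config: Dict[str, Any]) -> HealthCheckResult:
        if not config.get("enabled", True):
            return HealthCheckResult(status="disabled")
        return HealthCheckResult(status="healthy", metadata={"service": service_name})


registry = HealthCheckerRegistry()
registry.register("always_healthy", AlwaysHealthyChecker)
factory = HealthCheckerFactory(registry)

checker = factory.create_checker("always_healthy", timeout=2.0)
result = asyncio.run(checker.check_health("demo", {"enabled": True}))
print(result.status)
```

Registering classes rather than instances lets the factory build a fresh checker with its own options (such as `timeout`) whenever one is needed.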
### Health Checker Types
#### 1. API Health Checker (`APIHealthChecker`)
- **Purpose**: Check services with HTTP health endpoints
- **Use Case**: Most REST APIs, microservices
- **Configuration**:
```python
{
    "health_check_type": "api",
    "health_endpoint": "/api/health",
    "url": "https://service.example.com"
}
```
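The API checker combines `url` with `health_endpoint`. One plausible way to do this, sketched below as an assumption rather than the actual implementation, is to join the two while tolerating stray slashes. `build_health_url` is a hypothetical helper, and the `/health` default is an assumed fallback.

```python
from typing import Any, Dict


def build_health_url(config: Dict[str, Any], default_endpoint: str = "/health") -> str:
    """Join the service base URL and its health endpoint (hypothetical helper)."""
    base = config["url"].rstrip("/")
    endpoint = config.get("health_endpoint", default_endpoint)
    # Accept endpoints written with or without a leading slash
    return f"{base}/{endpoint.lstrip('/')}"


print(build_health_url({"url": "https://service.example.com", "health_endpoint": "/api/health"}))
```

Plain string joining is used here instead of `urllib.parse.urljoin` because `urljoin` discards any path on the base URL when the endpoint starts with `/` (a base of `https://host/prefix` would lose `/prefix`).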
#### 2. Sensor Health Checker (`SensorHealthChecker`)
- **Purpose**: Check services via sensor data (e.g., Home Assistant entities)
- **Use Case**: Home Assistant, IoT devices, sensor-based monitoring
- **Configuration**:
```python
{
    "health_check_type": "sensor",
    "sensor_entity": "sensor.system_uptime",
    "url": "https://homeassistant.example.com"
}
```
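For Home Assistant, a sensor check can read the entity through the Home Assistant REST API (`GET /api/states/<entity_id>` with a bearer token). The sketch below only builds the request and interprets a response body, so it runs offline. The helper names and the rule that `unavailable` or `unknown` states count as unhealthy are assumptions, not confirmed behavior of `SensorHealthChecker`.

```python
import json
import urllib.request
from typing import Any, Dict


def build_sensor_request(config: Dict[str, Any]) -> urllib.request.Request:
    """Build a GET request for a Home Assistant entity state (hypothetical helper)."""
    url = f"{config['url'].rstrip('/')}/api/states/{config['sensor_entity']}"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {config['token']}"},
        method="GET",
    )


def interpret_sensor_state(body: str) -> Dict[str, Any]:
    """Map an entity-state JSON body to a status and metadata (assumed rules)."""
    data = json.loads(body)
    state = data.get("state")
    # Assumption: missing or offline entities report "unavailable" or "unknown"
    status = "unhealthy" if state in (None, "unavailable", "unknown") else "healthy"
    return {
        "status": status,
        "metadata": {"sensor_state": state, "last_updated": data.get("last_updated")},
    }


req = build_sensor_request({
    "url": "https://ha.example.com",
    "sensor_entity": "sensor.system_uptime",
    "token": "your_token",
})
print(req.full_url)
print(interpret_sensor_state('{"state": "12345", "last_updated": "2024-01-15T10:30:00Z"}'))
```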
#### 3. Custom Health Checker (`CustomHealthChecker`)
- **Purpose**: Complex health checks with multiple validation steps
- **Use Case**: Services requiring multiple checks, custom logic
- **Configuration**:
```python
{
    "health_check_type": "custom",
    "health_checks": [
        {
            "type": "api",
            "name": "main_api",
            "url": "https://service.example.com/api/health"
        },
        {
            "type": "sensor",
            "name": "uptime_sensor",
            "sensor_entity": "sensor.service_uptime"
        }
    ]
}
```
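The custom checker runs several sub-checks and reports one overall status. A reasonable aggregation rule, assumed here rather than taken from the code, is that the service is `healthy` only when every sub-check is healthy. The sub-checks run concurrently, and `fake_check` is a hypothetical stand-in so the sketch runs without network access.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List

SubCheck = Callable[[Dict[str, Any]], Awaitable[str]]


async def run_custom_checks(checks: List[Dict[str, Any]], run_one: SubCheck) -> Dict[str, Any]:
    """Run all sub-checks concurrently and aggregate them (assumed all-must-pass rule)."""
    statuses = await asyncio.gather(*(run_one(check) for check in checks))
    check_results = {check["name"]: status for check, status in zip(checks, statuses)}
    overall = "healthy" if all(s == "healthy" for s in statuses) else "unhealthy"
    return {"status": overall, "metadata": {"check_results": check_results}}


async def fake_check(check: Dict[str, Any]) -> str:
    # Hypothetical stand-in: pretend the uptime sensor is unavailable
    await asyncio.sleep(0)
    return "unhealthy" if check["name"] == "uptime_sensor" else "healthy"


checks = [
    {"type": "api", "name": "main_api", "url": "https://service.example.com/api/health"},
    {"type": "sensor", "name": "uptime_sensor", "sensor_entity": "sensor.service_uptime"},
]
result = asyncio.run(run_custom_checks(checks, fake_check))
print(result)
```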
## Configuration
### Service Configuration Structure
```python
SERVICES = {
    "service_name": {
        "url": "https://service.example.com",
        "enabled": True,
        "health_check_type": "api",  # or "sensor" / "custom"
        # API-specific
        "health_endpoint": "/api/health",
        # Authentication (used by API and sensor checks, as the service requires)
        "token": "auth_token",
        "api_key": "api_key",
        # Sensor-specific
        "sensor_entity": "sensor.entity_name",
        # Custom-specific
        "health_checks": [
            {
                "type": "api",
                "name": "check_name",
                "url": "https://endpoint.com/health"
            }
        ]
    }
}
```
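Since each `health_check_type` relies on different keys, a small check at startup can surface configuration errors before any health check runs. The sketch below is hypothetical: `validate_service_config`, the required-key sets (inferred from the examples in this document), and the fallback to `api` when no type is given are all assumptions.

```python
from typing import Any, Dict, List

# Assumed required keys per checker type, inferred from the examples in this document
REQUIRED_KEYS = {
    "api": ["url"],
    "sensor": ["url", "sensor_entity"],
    "custom": ["health_checks"],
}


def validate_service_config(name: str, config: Dict[str, Any]) -> List[str]:
    """Return human-readable problems; an empty list means the config looks valid."""
    problems: List[str] = []
    check_type = config.get("health_check_type", "api")  # assumed default
    if check_type not in REQUIRED_KEYS:
        return [f"{name}: unknown health_check_type {check_type!r}"]
    for key in REQUIRED_KEYS[check_type]:
        if not config.get(key):
            problems.append(f"{name}: missing required key {key!r} for {check_type!r} checks")
    if check_type == "custom":
        for i, sub in enumerate(config.get("health_checks") or []):
            if "type" not in sub or "name" not in sub:
                problems.append(f"{name}: health_checks[{i}] needs 'type' and 'name'")
    return problems


print(validate_service_config("ha", {"health_check_type": "sensor", "url": "https://ha.example.com"}))
```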
### Environment Variables
```bash
# Service URLs
HOME_ASSISTANT_URL=https://ha.example.com
FRIGATE_URL=http://frigate.local:5000
IMMICH_URL=http://immich.local:2283
N8N_URL=http://n8n.local:5678
# Authentication
HOME_ASSISTANT_TOKEN=your_token
FRIGATE_TOKEN=your_token
IMMICH_API_KEY=your_key
N8N_API_KEY=your_key
```
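The variables above could be read into the `SERVICES` structure roughly as follows. This is a sketch only: the `load_services` helper, the rule that a service is enabled only when its URL is set, the checker type chosen for each service, and the `sensor.system_uptime` entity for Home Assistant are all assumptions.

```python
import os
from typing import Any, Dict


def load_services() -> Dict[str, Dict[str, Any]]:
    """Build service configs from environment variables (hypothetical layout)."""
    def service(url_var: str, **extra: Any) -> Dict[str, Any]:
        url = os.getenv(url_var, "")
        # Assumption: a service is enabled only when its URL is configured
        return {"url": url, "enabled": bool(url), **extra}

    return {
        "home_assistant": service(
            "HOME_ASSISTANT_URL",
            health_check_type="sensor",
            sensor_entity="sensor.system_uptime",  # assumed entity
            token=os.getenv("HOME_ASSISTANT_TOKEN"),
        ),
        "frigate": service("FRIGATE_URL", health_check_type="api", token=os.getenv("FRIGATE_TOKEN")),
        "immich": service("IMMICH_URL", health_check_type="api", api_key=os.getenv("IMMICH_API_KEY")),
        "n8n": service("N8N_URL", health_check_type="api", api_key=os.getenv("N8N_API_KEY")),
    }


SERVICES = load_services()
print({name: cfg["enabled"] for name, cfg in SERVICES.items()})
```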
## Usage Examples
### Basic API Health Check
```python
import asyncio

from services.health_checkers import factory


async def main() -> None:
    # Create an API checker with a 5-second timeout
    checker = factory.create_checker("api", timeout=5.0)
    # Check a service that exposes an HTTP health endpoint
    config = {
        "url": "https://api.example.com",
        "health_endpoint": "/health",
        "enabled": True
    }
    result = await checker.check_health("example_service", config)
    print(f"Status: {result.status}")
    print(f"Response time: {result.response_time}s")


asyncio.run(main())
```
### Sensor-Based Health Check
```python
import asyncio

from services.health_checkers import factory


async def main() -> None:
    # Create a sensor checker
    checker = factory.create_checker("sensor", timeout=5.0)
    # Check Home Assistant via a sensor entity
    config = {
        "url": "https://ha.example.com",
        "sensor_entity": "sensor.system_uptime",
        "token": "your_token",
        "enabled": True
    }
    result = await checker.check_health("home_assistant", config)
    print(f"Uptime: {result.metadata.get('sensor_state')}")


asyncio.run(main())
```
### Custom Health Check
```python
import asyncio

from services.health_checkers import factory


async def main() -> None:
    # Create a custom checker with a longer timeout for multi-step checks
    checker = factory.create_checker("custom", timeout=10.0)
    # Check with multiple validations
    config = {
        "url": "https://service.example.com",
        "enabled": True,
        "health_checks": [
            {
                "type": "api",
                "name": "main_api",
                "url": "https://service.example.com/api/health"
            },
            {
                "type": "api",
                "name": "database",
                "url": "https://service.example.com/api/db/health"
            }
        ]
    }
    result = await checker.check_health("complex_service", config)
    print(f"Overall status: {result.status}")
    print(f"Individual checks: {result.metadata.get('check_results')}")


asyncio.run(main())
```
## Health Check Results
### HealthCheckResult Structure
```python
{
    "status": "healthy|unhealthy|disabled|error|timeout|unauthorized|forbidden",
    "response_time": 0.123,  # seconds
    "error": "Error message if applicable",
    "metadata": {
        "http_status": 200,
        "response_size": 1024,
        "sensor_state": "12345",
        "last_updated": "2024-01-15T10:30:00Z"
    }
}
```
### Status Values
- **healthy**: Service is responding normally
- **unhealthy**: Service responded, but with an error status
- **disabled**: Service is disabled in configuration
- **timeout**: Request timed out
- **unauthorized**: Authentication required (HTTP 401)
- **forbidden**: Access forbidden (HTTP 403)
- **error**: Network or other error occurred
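
One way a checker could derive these statuses is sketched below. The mapping (2xx is healthy, 401 and 403 map to `unauthorized` and `forbidden`, any other code is `unhealthy`, timeouts map to `timeout`, and other exceptions to `error`) is an assumption consistent with the list above, not confirmed logic. The function names are hypothetical.

```python
import asyncio
from typing import Optional


def status_from_http(code: int) -> str:
    """Map an HTTP status code to a health status (assumed mapping)."""
    if code == 401:
        return "unauthorized"
    if code == 403:
        return "forbidden"
    if 200 <= code < 300:
        return "healthy"
    # Any other code means the service answered, but not successfully
    return "unhealthy"


def status_from_exception(exc: BaseException) -> str:
    """Map a failure raised during a check to a health status."""
    if isinstance(exc, (asyncio.TimeoutError, TimeoutError)):
        return "timeout"
    return "error"


def status_for(enabled: bool, http_code: Optional[int] = None,
               exc: Optional[BaseException] = None) -> str:
    # Disabled services are never contacted
    if not enabled:
        return "disabled"
    if exc is not None:
        return status_from_exception(exc)
    if http_code is None:
        return "error"
    return status_from_http(http_code)


print(status_for(True, http_code=401))
```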
## Extending the System
### Adding a New Health Checker
1. **Create the checker class**:
```python
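from typing import Any, Dict

from .base import BaseHealthChecker, HealthCheckResult


class MyCustomChecker(BaseHealthChecker):
    async def check_health(self, service_name: str, config: Dict[str, Any]) -> HealthCheckResult:
        # Implement the service-specific check here and return a HealthCheckResult
        raise NotImplementedError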
```
2. **Register the checker**:
```python
from services.health_checkers import registry
registry.register("my_custom", MyCustomChecker)
```
3. **Use in configuration**:
```python
{
    "health_check_type": "my_custom",
    "custom_param": "value"
}
```
### Service-Specific Logic
The factory automatically selects the appropriate checker based on:
1. `health_check_type` in configuration
2. Service name patterns
3. Configuration presence (e.g., `sensor_entity` → sensor checker)
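
The selection rules could look like the sketch below. The precedence (explicit type, then name pattern, then key presence), the `NAME_PATTERNS` table, and the final fallback to `api` are assumptions; this document does not list the actual name patterns.

```python
from typing import Any, Dict

# Hypothetical name patterns; the real patterns are not documented here
NAME_PATTERNS = {
    "home_assistant": "sensor",
}


def select_checker_type(service_name: str, config: Dict[str, Any]) -> str:
    """Pick a checker type using an assumed precedence order."""
    # 1. An explicit health_check_type always wins
    if config.get("health_check_type"):
        return config["health_check_type"]
    # 2. Fall back to known service-name patterns
    for pattern, checker_type in NAME_PATTERNS.items():
        if pattern in service_name.lower():
            return checker_type
    # 3. Infer from which keys are present
    if "health_checks" in config:
        return "custom"
    if "sensor_entity" in config:
        return "sensor"
    return "api"


print(select_checker_type("svc", {"sensor_entity": "sensor.service_uptime"}))
```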
## Performance Considerations
- **Concurrent Checking**: All services are checked concurrently rather than one after another
- **Checker Caching**: Checkers are cached per service to avoid recreation
- **Timeout Management**: Configurable timeouts per checker type
- **Resource Cleanup**: Proper cleanup of HTTP clients
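
Concurrent checking, per-service checker caching, and per-check timeouts can be combined with `asyncio.gather` and `asyncio.wait_for`. The sketch below uses plain coroutine functions as checkers and is not the actual `ServiceStatusChecker`; `ServiceStatusSketch`, `fake_checker`, and `make_checker` are hypothetical names.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List

CheckFn = Callable[[Dict[str, Any]], Awaitable[str]]


class ServiceStatusSketch:
    """Hypothetical orchestrator: concurrent checks, cached checkers, per-check timeout."""

    def __init__(self, make_checker: Callable[[str], CheckFn], timeout: float = 5.0) -> None:
        self._make_checker = make_checker
        self._timeout = timeout
        self._cache: Dict[str, CheckFn] = {}  # one checker per service, reused across rounds

    def _checker_for(self, name: str) -> CheckFn:
        if name not in self._cache:
            self._cache[name] = self._make_checker(name)
        return self._cache[name]

    async def _check_one(self, name: str, config: Dict[str, Any]) -> str:
        try:
            return await asyncio.wait_for(self._checker_for(name)(config), self._timeout)
        except asyncio.TimeoutError:
            return "timeout"
        except Exception:
            return "error"

    async def check_all(self, services: Dict[str, Dict[str, Any]]) -> Dict[str, str]:
        names = list(services)
        # gather runs every check at once, so a round lasts about as long as the slowest check
        statuses = await asyncio.gather(*(self._check_one(n, services[n]) for n in names))
        return dict(zip(names, statuses))


async def fake_checker(config: Dict[str, Any]) -> str:
    # Stand-in for a real checker: waits config["delay"] seconds, then reports healthy
    await asyncio.sleep(config["delay"])
    return "healthy"


created: List[str] = []


def make_checker(name: str) -> CheckFn:
    created.append(name)  # record each time a checker is actually built
    return fake_checker


status_checker = ServiceStatusSketch(make_checker, timeout=0.05)
services = {"fast": {"delay": 0}, "slow": {"delay": 1}}
print(asyncio.run(status_checker.check_all(services)))
print(created)
```

With a 0.05 s timeout, the slow service is reported as `timeout` while the fast one stays `healthy`, and later rounds reuse the cached checkers instead of building new ones.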
## Monitoring and Logging
- **Debug Logs**: Detailed operation logs for troubleshooting
- **Performance Metrics**: Response times and success rates
- **Error Tracking**: Comprehensive error logging with context
- **Health Summary**: Overall system health statistics
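
A health summary can be derived from per-service results shaped like the `HealthCheckResult` structure. `summarize` is a hypothetical helper, and excluding disabled services from the success rate and average response time is an assumed design choice.

```python
from collections import Counter
from typing import Any, Dict


def summarize(results: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Aggregate per-service results into overall statistics (hypothetical helper)."""
    counts = Counter(r["status"] for r in results.values())
    # Disabled services were never checked, so they are left out of the rates
    checked = [r for r in results.values() if r["status"] != "disabled"]
    times = [r["response_time"] for r in checked if r.get("response_time") is not None]
    healthy = counts.get("healthy", 0)
    return {
        "total": len(results),
        "by_status": dict(counts),
        "success_rate": healthy / len(checked) if checked else None,
        "avg_response_time": sum(times) / len(times) if times else None,
    }


results = {
    "home_assistant": {"status": "healthy", "response_time": 0.1},
    "frigate": {"status": "timeout", "response_time": 0.3},
    "immich": {"status": "disabled", "response_time": None},
    "n8n": {"status": "healthy", "response_time": 0.2},
}
print(summarize(results))
```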
## Best Practices
1. **Choose Appropriate Checker**: Use the right checker type for your service
2. **Set Reasonable Timeouts**: Balance responsiveness with reliability
3. **Handle Errors Gracefully**: Always provide meaningful error messages
4. **Monitor Performance**: Track response times and success rates
5. **Test Thoroughly**: Verify health checks work in all scenarios
6. **Document Configuration**: Keep service configurations well-documented
## Troubleshooting
### Common Issues
1. **Timeout Errors**: Increase timeout or check network connectivity
2. **Authentication Failures**: Verify tokens and API keys
3. **Sensor Not Found**: Check entity names and permissions
4. **Configuration Errors**: Validate service configuration structure
### Debug Tools
- **Debug Endpoint**: `/debug/logging` to test logging configuration
- **Health Check Logs**: Detailed logs for each health check operation
- **Metadata Inspection**: Check metadata for additional context