# Health Checking System

This document describes the generalized health checking system for LabFusion Service Adapters.

## Overview

The health checking system is designed to be flexible and extensible, supporting different types of health checks for different services. It uses a strategy pattern with pluggable health checkers.

## Architecture

### Core Components

1. **BaseHealthChecker**: Abstract base class for all health checkers
2. **HealthCheckResult**: Standardized result object
3. **HealthCheckerRegistry**: Registry for different checker types
4. **HealthCheckerFactory**: Factory for creating checker instances
5. **ServiceStatusChecker**: Main orchestrator

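
To make the relationship between these components concrete, here is a minimal sketch of how a strategy-pattern layout like this could fit together. It is not the LabFusion implementation: the class bodies, the `get` method on the registry, and the `AlwaysHealthyChecker` stand-in are assumptions made for illustration. Only the class names, `check_health`, `register`, and `create_checker` appear in this document.

```python
import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Dict, Optional, Type


@dataclass
class HealthCheckResult:
    """Standardized result object (field names follow this document)."""
    status: str
    response_time: Optional[float] = None
    error: Optional[str] = None
    metadata: Dict[str, Any] = field(default_factory=dict)


class BaseHealthChecker(ABC):
    """Strategy interface: every checker implements check_health()."""

    def __init__(self, timeout: float = 5.0) -> None:
        self.timeout = timeout

    @abstractmethod
    async def check_health(self, service_name: str, config: Dict[str, Any]) -> HealthCheckResult:
        ...


class HealthCheckerRegistry:
    """Maps a checker type name to a checker class."""

    def __init__(self) -> None:
        self._checkers: Dict[str, Type[BaseHealthChecker]] = {}

    def register(self, name: str, checker_class: Type[BaseHealthChecker]) -> None:
        self._checkers[name] = checker_class

    def get(self, name: str) -> Type[BaseHealthChecker]:  # assumed helper
        if name not in self._checkers:
            raise ValueError(f"Unknown health checker type: {name}")
        return self._checkers[name]


class HealthCheckerFactory:
    """Creates checker instances from the types known to a registry."""

    def __init__(self, registry: HealthCheckerRegistry) -> None:
        self._registry = registry

    def create_checker(self, checker_type: str, timeout: float = 5.0) -> BaseHealthChecker:
        return self._registry.get(checker_type)(timeout=timeout)


class AlwaysHealthyChecker(BaseHealthChecker):
    """Hypothetical checker used only to exercise the sketch."""

    async def check_health(self, service_name: str, config: Dict[str, Any]) -> HealthCheckResult:
        if not config.get("enabled", True):
            return HealthCheckResult(status="disabled")
        return HealthCheckResult(status="healthy", response_time=0.0)


registry = HealthCheckerRegistry()
registry.register("always_healthy", AlwaysHealthyChecker)
factory = HealthCheckerFactory(registry)

checker = factory.create_checker("always_healthy", timeout=2.0)
result = asyncio.run(checker.check_health("demo", {"enabled": True}))
print(result.status)  # healthy
```

The orchestrator (`ServiceStatusChecker`) is left out of the sketch; its role would be to walk the service configuration, ask the factory for a checker per service, and collect the results.
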
### Health Checker Types

#### 1. API Health Checker (`APIHealthChecker`)
- **Purpose**: Check services with HTTP health endpoints
- **Use Case**: Most REST APIs, microservices
- **Configuration**:
  ```python
  {
      "health_check_type": "api",
      "health_endpoint": "/api/health",
      "url": "https://service.example.com"
  }
  ```

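
As a rough illustration of how such a configuration could be turned into an HTTP request, the sketch below joins `url` and `health_endpoint` and attaches credentials. The function name `build_health_request`, the `/health` default, and the header scheme (`Authorization: Bearer` for `token`, `X-API-Key` for `api_key`) are assumptions; the document does not say how `APIHealthChecker` sends credentials.

```python
from typing import Any, Dict, Tuple


def build_health_request(config: Dict[str, Any]) -> Tuple[str, Dict[str, str]]:
    """Return the (url, headers) pair for an API health check (hypothetical helper)."""
    base = config["url"].rstrip("/")
    # Assumed default endpoint when none is configured.
    endpoint = config.get("health_endpoint", "/health")
    url = f"{base}/{endpoint.lstrip('/')}"

    headers: Dict[str, str] = {}
    if config.get("token"):
        # Assumption: tokens are sent as a bearer token.
        headers["Authorization"] = f"Bearer {config['token']}"
    if config.get("api_key"):
        # Assumption: API keys are sent in an X-API-Key header.
        headers["X-API-Key"] = config["api_key"]
    return url, headers


url, headers = build_health_request({
    "health_check_type": "api",
    "health_endpoint": "/api/health",
    "url": "https://service.example.com/",
    "token": "auth_token",
})
print(url)      # https://service.example.com/api/health
print(headers)  # {'Authorization': 'Bearer auth_token'}
```

Normalizing the slashes on both sides avoids a double `//` when the base URL is configured with a trailing slash.
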
#### 2. Sensor Health Checker (`SensorHealthChecker`)
- **Purpose**: Check services via sensor data (e.g., Home Assistant entities)
- **Use Case**: Home Assistant, IoT devices, sensor-based monitoring
- **Configuration**:
  ```python
  {
      "health_check_type": "sensor",
      "sensor_entity": "sensor.system_uptime",
      "url": "https://homeassistant.example.com"
  }
  ```

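
For Home Assistant, an entity's state can be read from the REST API at `/api/states/<entity_id>` using a bearer token. The sketch below shows one way a sensor check could build that request and interpret the response. The helper names and the rule that `unavailable` or `unknown` states count as unhealthy are assumptions, not the documented behavior of `SensorHealthChecker`.

```python
from typing import Any, Dict, Tuple

# States Home Assistant reports when an entity has no usable value.
UNAVAILABLE_STATES = {"unavailable", "unknown"}


def build_sensor_request(config: Dict[str, Any]) -> Tuple[str, Dict[str, str]]:
    """Return the (url, headers) pair for reading one entity state."""
    base = config["url"].rstrip("/")
    url = f"{base}/api/states/{config['sensor_entity']}"
    headers: Dict[str, str] = {}
    if config.get("token"):
        headers["Authorization"] = f"Bearer {config['token']}"
    return url, headers


def interpret_sensor_payload(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Map a state object to a status plus metadata (assumed policy)."""
    state = str(payload.get("state", "unknown"))
    status = "unhealthy" if state in UNAVAILABLE_STATES else "healthy"
    return {
        "status": status,
        "metadata": {
            "sensor_state": state,
            "last_updated": payload.get("last_updated"),
        },
    }


url, headers = build_sensor_request({
    "health_check_type": "sensor",
    "sensor_entity": "sensor.system_uptime",
    "url": "https://homeassistant.example.com",
    "token": "your_token",
})
print(url)  # https://homeassistant.example.com/api/states/sensor.system_uptime
print(interpret_sensor_payload({"state": "12345", "last_updated": "2024-01-15T10:30:00Z"}))
```
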
#### 3. Custom Health Checker (`CustomHealthChecker`)
- **Purpose**: Complex health checks with multiple validation steps
- **Use Case**: Services requiring multiple checks, custom logic
- **Configuration**:
  ```python
  {
      "health_check_type": "custom",
      "health_checks": [
          {
              "type": "api",
              "name": "main_api",
              "url": "https://service.example.com/api/health"
          },
          {
              "type": "sensor",
              "name": "uptime_sensor",
              "sensor_entity": "sensor.service_uptime"
          }
      ]
  }
  ```

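
A custom check has to reduce several sub-check outcomes to one overall status. The document does not state the aggregation rule, so the sketch below assumes the strictest one: the service is healthy only if every sub-check is healthy. The function name `aggregate_status` is hypothetical.

```python
from typing import Dict


def aggregate_status(check_results: Dict[str, str]) -> str:
    """Combine per-check statuses into one overall status (assumed policy)."""
    if not check_results:
        # Assumption: a custom check with nothing to evaluate is an error.
        return "error"
    if all(status == "healthy" for status in check_results.values()):
        return "healthy"
    return "unhealthy"


check_results = {"main_api": "healthy", "uptime_sensor": "timeout"}
overall = aggregate_status(check_results)
print(overall)  # unhealthy
```

Keeping the per-check statuses alongside the overall result (for example under a `check_results` metadata key) lets callers see which sub-check caused a failure.
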
## Configuration

### Service Configuration Structure

```python
SERVICES = {
    "service_name": {
        "url": "https://service.example.com",
        "enabled": True,
        "health_check_type": "api|sensor|custom",

        # API-specific
        "health_endpoint": "/api/health",
        "token": "auth_token",
        "api_key": "api_key",

        # Sensor-specific
        "sensor_entity": "sensor.entity_name",

        # Custom-specific
        "health_checks": [
            {
                "type": "api",
                "name": "check_name",
                "url": "https://endpoint.com/health"
            }
        ]
    }
}
```

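
Because each checker type relies on different keys, it can help to validate a service entry before running checks. The following is a sketch of such a validator under stated assumptions: the function name `validate_service_config` is hypothetical, `api` is assumed to be the default type, and only the three built-in types are accepted (a real validator would also need to allow types added through the registry).

```python
from typing import Any, Dict, List

BUILT_IN_TYPES = {"api", "sensor", "custom"}


def validate_service_config(name: str, config: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    problems: List[str] = []
    if not config.get("url"):
        problems.append(f"{name}: missing 'url'")

    # Assumption: services without an explicit type use the API checker.
    check_type = config.get("health_check_type", "api")
    if check_type not in BUILT_IN_TYPES:
        problems.append(f"{name}: unknown health_check_type '{check_type}'")
    elif check_type == "sensor" and not config.get("sensor_entity"):
        problems.append(f"{name}: sensor checks need 'sensor_entity'")
    elif check_type == "custom":
        checks = config.get("health_checks")
        if not isinstance(checks, list) or not checks:
            problems.append(f"{name}: custom checks need a non-empty 'health_checks' list")
        else:
            for index, check in enumerate(checks):
                for key in ("type", "name"):
                    if key not in check:
                        problems.append(f"{name}: health_checks[{index}] missing '{key}'")
    return problems


good = {"url": "https://service.example.com", "health_check_type": "api"}
bad = {"health_check_type": "sensor"}
print(validate_service_config("good_service", good))  # []
print(validate_service_config("bad_service", bad))
```
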
### Environment Variables

```bash
# Service URLs
HOME_ASSISTANT_URL=https://ha.example.com
FRIGATE_URL=http://frigate.local:5000
IMMICH_URL=http://immich.local:2283
N8N_URL=http://n8n.local:5678

# Authentication
HOME_ASSISTANT_TOKEN=your_token
FRIGATE_TOKEN=your_token
IMMICH_API_KEY=your_key
N8N_API_KEY=your_key
```

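
One plausible way to turn these variables into the `SERVICES` structure is sketched below. The `load_services` function, the service keys (`home_assistant`, `frigate`, `immich`, `n8n`), and the rule that a service without a URL is disabled are assumptions for illustration; the variable names come from the list above.

```python
import os
from typing import Any, Dict, Mapping

# (service key, URL variable, credential variable, config key for the credential)
SERVICE_ENV = [
    ("home_assistant", "HOME_ASSISTANT_URL", "HOME_ASSISTANT_TOKEN", "token"),
    ("frigate", "FRIGATE_URL", "FRIGATE_TOKEN", "token"),
    ("immich", "IMMICH_URL", "IMMICH_API_KEY", "api_key"),
    ("n8n", "N8N_URL", "N8N_API_KEY", "api_key"),
]


def load_services(env: Mapping[str, str] = os.environ) -> Dict[str, Dict[str, Any]]:
    """Build a SERVICES-style dict from environment variables."""
    services: Dict[str, Dict[str, Any]] = {}
    for name, url_var, secret_var, secret_key in SERVICE_ENV:
        url = env.get(url_var, "")
        services[name] = {
            "url": url,
            # Assumption: a service with no URL configured is disabled.
            "enabled": bool(url),
            secret_key: env.get(secret_var, ""),
        }
    return services


example_env = {
    "HOME_ASSISTANT_URL": "https://ha.example.com",
    "HOME_ASSISTANT_TOKEN": "your_token",
    "IMMICH_URL": "http://immich.local:2283",
    "IMMICH_API_KEY": "your_key",
}
services = load_services(example_env)
print(services["home_assistant"])
print(services["frigate"]["enabled"])  # False
```

Passing the environment as a parameter keeps the function easy to test without touching the real process environment.
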
## Usage Examples

### Basic API Health Check

```python
import asyncio

from services.health_checkers import factory


async def main() -> None:
    # Create API checker
    checker = factory.create_checker("api", timeout=5.0)

    # Check service
    config = {
        "url": "https://api.example.com",
        "health_endpoint": "/health",
        "enabled": True
    }
    result = await checker.check_health("example_service", config)
    print(f"Status: {result.status}")
    print(f"Response time: {result.response_time}s")


# check_health is a coroutine, so it must run inside an event loop
asyncio.run(main())
```

### Sensor-Based Health Check

```python
import asyncio

from services.health_checkers import factory


async def main() -> None:
    # Create sensor checker
    checker = factory.create_checker("sensor", timeout=5.0)

    # Check Home Assistant sensor
    config = {
        "url": "https://ha.example.com",
        "sensor_entity": "sensor.system_uptime",
        "token": "your_token",
        "enabled": True
    }
    result = await checker.check_health("home_assistant", config)
    print(f"Uptime: {result.metadata.get('sensor_state')}")


asyncio.run(main())
```

### Custom Health Check

```python
import asyncio

from services.health_checkers import factory


async def main() -> None:
    # Create custom checker
    checker = factory.create_checker("custom", timeout=10.0)

    # Check with multiple validations
    config = {
        "url": "https://service.example.com",
        "enabled": True,
        "health_checks": [
            {
                "type": "api",
                "name": "main_api",
                "url": "https://service.example.com/api/health"
            },
            {
                "type": "api",
                "name": "database",
                "url": "https://service.example.com/api/db/health"
            }
        ]
    }
    result = await checker.check_health("complex_service", config)
    print(f"Overall status: {result.status}")
    print(f"Individual checks: {result.metadata.get('check_results')}")


asyncio.run(main())
```

## Health Check Results

### HealthCheckResult Structure

```python
{
    "status": "healthy|unhealthy|disabled|error|timeout|unauthorized|forbidden",
    "response_time": 0.123,  # seconds
    "error": "Error message if applicable",
    "metadata": {
        "http_status": 200,
        "response_size": 1024,
        "sensor_state": "12345",
        "last_updated": "2024-01-15T10:30:00Z"
    }
}
```

### Status Values

- **healthy**: Service is responding normally
- **unhealthy**: Service responded but with error status
- **disabled**: Service is disabled in configuration
- **timeout**: Request timed out
- **unauthorized**: Authentication required (HTTP 401)
- **forbidden**: Access forbidden (HTTP 403)
- **error**: Network or other error occurred

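
The status values above suggest a mapping from HTTP outcomes to status strings. The sketch below is one such mapping, not the project's actual code: the function names are hypothetical, and treating every 2xx response as `healthy` and every other non-401/403 response as `unhealthy` is an assumption.

```python
import asyncio


def status_from_http(status_code: int) -> str:
    """Map an HTTP status code to a health status string."""
    if status_code == 401:
        return "unauthorized"
    if status_code == 403:
        return "forbidden"
    if 200 <= status_code < 300:  # assumption: any 2xx counts as healthy
        return "healthy"
    return "unhealthy"


def status_from_exception(exc: BaseException) -> str:
    """Map a failure raised while checking to a health status string."""
    if isinstance(exc, (asyncio.TimeoutError, TimeoutError)):
        return "timeout"
    return "error"


print(status_from_http(200))                        # healthy
print(status_from_http(503))                        # unhealthy
print(status_from_exception(ConnectionError("x")))  # error
```
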
## Extending the System

### Adding a New Health Checker

1. **Create the checker class**:
   ```python
   from typing import Dict

   from .base import BaseHealthChecker, HealthCheckResult


   class MyCustomChecker(BaseHealthChecker):
       async def check_health(self, service_name: str, config: Dict) -> HealthCheckResult:
           # Implementation goes here; it must return a HealthCheckResult
           raise NotImplementedError
   ```

2. **Register the checker**:
   ```python
   from services.health_checkers import registry

   # MyCustomChecker is the class created in step 1
   registry.register("my_custom", MyCustomChecker)
   ```

3. **Use in configuration**:
   ```python
   {
       "health_check_type": "my_custom",
       "custom_param": "value"
   }
   ```

### Service-Specific Logic

The factory automatically selects the appropriate checker based on:
1. `health_check_type` in configuration
2. Service name patterns
3. Configuration presence (e.g., `sensor_entity` → sensor checker)

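
The selection rules can be sketched as a small function. This assumes the three criteria are applied in the order listed and that `api` is the fallback; the document states neither. The `NAME_PATTERNS` table is hypothetical, since the actual service name patterns are not given here.

```python
from typing import Any, Dict

# Hypothetical name patterns; the real patterns are not listed in this document.
NAME_PATTERNS = {"home_assistant": "sensor"}


def select_checker_type(service_name: str, config: Dict[str, Any]) -> str:
    """Pick a checker type for a service (assumed precedence)."""
    # 1. An explicit health_check_type always wins.
    explicit = config.get("health_check_type")
    if explicit:
        return explicit
    # 2. Fall back to service name patterns.
    for pattern, checker_type in NAME_PATTERNS.items():
        if pattern in service_name.lower():
            return checker_type
    # 3. Infer the type from which keys are present.
    if "sensor_entity" in config:
        return "sensor"
    if "health_checks" in config:
        return "custom"
    # Assumption: plain HTTP checking is the default.
    return "api"


print(select_checker_type("frigate", {"url": "http://frigate.local:5000"}))  # api
print(select_checker_type("weather", {"sensor_entity": "sensor.outdoor"}))   # sensor
```
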
## Performance Considerations

- **Concurrent Checking**: All services are checked simultaneously
- **Checker Caching**: Checkers are cached per service to avoid recreation
- **Timeout Management**: Configurable timeouts per checker type
- **Resource Cleanup**: Proper cleanup of HTTP clients

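
Concurrent checking with per-check timeouts can be illustrated with `asyncio.gather` and `asyncio.wait_for`. This is a sketch of the general technique rather than the `ServiceStatusChecker` code; `check_all` and the three fake checks are hypothetical and replace real network calls with `asyncio.sleep`.

```python
import asyncio
from typing import Awaitable, Callable, Dict

Check = Callable[[], Awaitable[str]]


async def check_all(checks: Dict[str, Check], timeout: float) -> Dict[str, str]:
    """Run every check concurrently; one slow or failing check never blocks the rest."""

    async def run_one(check: Check) -> str:
        try:
            return await asyncio.wait_for(check(), timeout=timeout)
        except asyncio.TimeoutError:
            return "timeout"
        except Exception:
            return "error"

    names = list(checks)
    statuses = await asyncio.gather(*(run_one(checks[name]) for name in names))
    return dict(zip(names, statuses))


async def fast() -> str:
    await asyncio.sleep(0.01)
    return "healthy"


async def slow() -> str:
    await asyncio.sleep(1.0)  # exceeds the timeout used below
    return "healthy"


async def broken() -> str:
    raise ConnectionError("connection refused")


results = asyncio.run(check_all({"fast": fast, "slow": slow, "broken": broken}, timeout=0.1))
print(results)  # {'fast': 'healthy', 'slow': 'timeout', 'broken': 'error'}
```

Because the checks run concurrently, the total wall time is bounded by the timeout rather than by the sum of the individual response times.
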
## Monitoring and Logging

- **Debug Logs**: Detailed operation logs for troubleshooting
- **Performance Metrics**: Response times and success rates
- **Error Tracking**: Comprehensive error logging with context
- **Health Summary**: Overall system health statistics

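
A health summary can be derived from the per-service statuses. The sketch below is illustrative only: the `summarize` function and the shape of its output are assumptions, as is the choice to exclude disabled services from the healthy ratio.

```python
from collections import Counter
from typing import Any, Dict


def summarize(statuses: Dict[str, str]) -> Dict[str, Any]:
    """Aggregate per-service statuses into overall statistics."""
    counts = Counter(statuses.values())
    # Assumption: disabled services do not count against the healthy ratio.
    enabled = len(statuses) - counts.get("disabled", 0)
    healthy = counts.get("healthy", 0)
    return {
        "total": len(statuses),
        "by_status": dict(counts),
        "healthy_ratio": healthy / enabled if enabled else 0.0,
    }


summary = summarize({
    "home_assistant": "healthy",
    "frigate": "timeout",
    "immich": "disabled",
    "n8n": "healthy",
})
print(summary["total"])      # 4
print(summary["by_status"])  # {'healthy': 2, 'timeout': 1, 'disabled': 1}
```
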
## Best Practices

1. **Choose Appropriate Checker**: Use the right checker type for your service
2. **Set Reasonable Timeouts**: Balance responsiveness with reliability
3. **Handle Errors Gracefully**: Always provide meaningful error messages
4. **Monitor Performance**: Track response times and success rates
5. **Test Thoroughly**: Verify health checks work in all scenarios
6. **Document Configuration**: Keep service configurations well-documented

## Troubleshooting

### Common Issues

1. **Timeout Errors**: Increase timeout or check network connectivity
2. **Authentication Failures**: Verify tokens and API keys
3. **Sensor Not Found**: Check entity names and permissions
4. **Configuration Errors**: Validate service configuration structure

### Debug Tools

- **Debug Endpoint**: `/debug/logging` to test logging configuration
- **Health Check Logs**: Detailed logs for each health check operation
- **Metadata Inspection**: Check metadata for additional context