Performance Analysis Demo
System Architecture Design
┌───────────────────────────────────────────────────────────────┐
│                         Client Layer                          │
│  ┌────────────┐ ┌────────────┐ ┌────────────┐ ┌────────────┐  │
│  │   Web UI   │ │ CLI Tools  │ │  REST API  │ │ MCP Client │  │
│  └─────┬──────┘ └─────┬──────┘ └─────┬──────┘ └─────┬──────┘  │
└────────┼──────────────┼──────────────┼──────────────┼─────────┘
         └──────────────┼──────────────┼──────────────┘
                        │              │
┌───────────────────────┼──────────────┼────────────────────────┐
│             AI Agent Layer (FastAPI + LangGraph)              │
│  ┌─────────────────────────────────────────────────────────┐  │
│  │ Chat Agent (Conversational Interaction)                 │  │
│  │ Report Agent (Auto Report Generation)                   │  │
│  │ Storage Agent (Data Persistence)                        │  │
│  │ • Streaming Response (SSE)  • Tool Orchestration        │  │
│  │ • Context Memory                                        │  │
│  └─────────────────────────────────────────────────────────┘  │
└───────────────────────┬──────────────┬────────────────────────┘
                        │              │  MCP Protocol Communication
┌───────────────────────┼──────────────┼────────────────────────┐
│             MCP Server Layer (FastMCP Framework)              │
│  ┌─────────────────────────────────────────────────────────┐  │
│  │ ETCD Analyzer Server (Port 8001)                        │  │
│  │ • 15 Analysis Tools                                     │  │
│  │ • Cluster Status  • WAL Fsync  • Backend Commit         │  │
│  │ • Disk I/O  • Network I/O  • Deep Analysis              │  │
│  └─────────────────────────────────────────────────────────┘  │
└───────────────────────┬──────────────┬────────────────────────┘
                        │              │
┌───────────────────────┼──────────────┼────────────────────────┐
│               Tools Layer (Tools & Collectors)                │
│  ┌────────────┐ ┌────────────┐ ┌────────────┐ ┌────────────┐  │
│  │ Prometheus │ │   PromQL   │ │   Metric   │ │    Data    │  │
│  │Integration │ │   Query    │ │ Collector  │ │ Transform  │  │
│  │            │ │ Execution  │ │   (51+)    │ │   (ELT)    │  │
│  └────────────┘ └────────────┘ └────────────┘ └────────────┘  │
└───────────────────────┬──────────────┬────────────────────────┘
                        │              │
┌───────────────────────┼──────────────┼────────────────────────┐
│              Analysis Layer (Analysis & Storage)              │
│  ┌────────────┐ ┌────────────┐ ┌────────────┐ ┌────────────┐  │
│  │ Bottleneck │ │Performance │ │   DuckDB   │ │    HTML    │  │
│  │ Detection  │ │  Analysis  │ │  Storage   │ │   Report   │  │
│  │   Engine   │ │ Algorithm  │ │(TimeSeries)│ │ Generation │  │
│  └────────────┘ └────────────┘ └────────────┘ └────────────┘  │
└───────────────────────┬───────────────────────────────────────┘
                        │
┌───────────────────────┼───────────────────────────────────────┐
│          OpenShift/Kubernetes Cluster Infrastructure          │
│  • ETCD Cluster (3-5 nodes)  • Prometheus/Thanos Monitoring   │
│  • Master Nodes              • Kubernetes API Server          │
│  • OVN-Kubernetes Network    • Container Runtime              │
└───────────────────────────────────────────────────────────────┘
Key Component Description
1. MCP Server (FastMCP) - Model Context Protocol-based server exposing 15+ analysis tools for AI agent invocation
2. AI Agent (LangGraph) - Intelligent agent using OpenAI/compatible LLM for conversational analysis and automated report generation
3. Metric Collector - Queries Prometheus via PromQL to collect 51+ ETCD performance metrics
4. ELT Pipeline - Transforms raw JSON data into structured HTML tables for easy display and analysis
5. DuckDB Storage - Embedded analytical database storing historical performance data as time series for trend analysis
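As a rough illustration of the ELT step in item 4, the sketch below renders a list of JSON-style metric records as an HTML table using only the Python standard library. The helper name `records_to_html_table` and the record fields are hypothetical and are not taken from the project.

```python
from html import escape


def records_to_html_table(records):
    """Render a list of flat dicts as an HTML table (hypothetical helper)."""
    if not records:
        return "<table></table>"
    # Use the keys of the first record as the column order.
    columns = list(records[0].keys())
    header = "".join(f"<th>{escape(str(col))}</th>" for col in columns)
    rows = []
    for rec in records:
        # Missing keys become empty cells; all values are HTML-escaped.
        cells = "".join(f"<td>{escape(str(rec.get(col, '')))}</td>" for col in columns)
        rows.append(f"<tr>{cells}</tr>")
    body = "".join(rows)
    return f"<table><tr>{header}</tr>{body}</table>"


# Illustrative input; the field names are assumptions.
sample = [
    {"metric": "wal_fsync_p99_ms", "value": 8.2},
    {"metric": "backend_commit_p99_ms", "value": 32.5},
]
print(records_to_html_table(sample))
```

Escaping every cell keeps metric labels that contain characters such as `<` or `&` from breaking the generated markup.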
Core Features Demo
1. Real-time Performance Monitoring
Feature: Real-time ETCD cluster performance metrics collection via Prometheus
Metric Coverage: 51 core ETCD metrics covering disk, network, CPU, memory, etc.
Update Frequency: 2-minute sampling window with real-time streaming response
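A minimal sketch of how one such metric might be queried, assuming the collector uses the standard etcd histogram `etcd_disk_wal_fsync_duration_seconds_bucket` and the Prometheus HTTP API instant-query endpoint. The function names, the grouping labels, and the base URL are illustrative assumptions; no request is actually sent.

```python
from urllib.parse import urlencode


def wal_fsync_quantile_query(quantile, window="2m"):
    """Build a PromQL expression for a WAL fsync latency quantile."""
    return (
        f"histogram_quantile({quantile}, "
        f"sum(rate(etcd_disk_wal_fsync_duration_seconds_bucket[{window}])) "
        "by (le, instance))"
    )


def instant_query_url(base_url, promql):
    """Build a Prometheus instant-query URL (the request itself is not sent)."""
    params = urlencode({"query": promql})
    return base_url.rstrip("/") + "/api/v1/query?" + params


query = wal_fsync_quantile_query(0.99)
print(query)
# The base URL below is a placeholder, not a real endpoint.
print(instant_query_url("http://prometheus.example:9090", query))
```

The default `window="2m"` mirrors the 2-minute sampling window described above.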
2. AI Conversational Analysis
# User Query Example
User: "Analyze ETCD WAL fsync performance over the past 1 hour"
# AI Execution Flow
1. Call Tool: get_etcd_disk_wal_fsync(duration="1h")
2. Collect Metrics: P50/P90/P99 latency distribution
3. Analyze Data: Compare against thresholds (target: P99 < 10ms)
4. Generate Recommendations: Provide optimization suggestions if thresholds exceeded
# Response
Data Result:
- WAL Fsync P99: 8.2ms (Excellent)
- WAL Fsync P90: 5.1ms
- WAL Fsync P50: 2.3ms
AI Analysis:
WAL fsync performance is excellent: the P99 latency of 8.2ms is below the 10ms threshold.
Disk write performance is stable, so no optimization is needed.
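Steps 2 to 4 of the execution flow above can be sketched as a small function. This is an illustration only: the tool call from step 1 is replaced by passing the percentiles in directly, and the function name, the result shape, and the recommendation text are assumptions rather than the agent's actual behavior.

```python
WAL_FSYNC_P99_TARGET_MS = 10.0  # target stated in the flow above


def analyze_wal_fsync(percentiles_ms):
    """Compare P99 against the target and return a small findings dict."""
    p99 = percentiles_ms["p99"]
    within_target = p99 < WAL_FSYNC_P99_TARGET_MS
    recommendations = []
    if not within_target:
        # Placeholder suggestion; the real agent's wording is not known here.
        recommendations.append("Investigate disk write latency on the etcd nodes")
    return {
        "p99_ms": p99,
        "within_target": within_target,
        "recommendations": recommendations,
    }


# Percentiles taken from the example response above.
result = analyze_wal_fsync({"p50": 2.3, "p90": 5.1, "p99": 8.2})
print(result)
```

Recommendations are produced only when the threshold is exceeded, matching step 4.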
3. Automatic Bottleneck Detection
Detection Dimensions:
• CPU Usage (Threshold: Warning 70%, Critical 85%)
• Memory Usage (Threshold: Warning 70%, Critical 85%)
• WAL Fsync Latency (Target: P99 < 10ms)
• Backend Commit Latency (Target: P99 < 25ms)
• Disk I/O Wait (Target: < 10%)
• Network Latency (Target: Peer < 50ms)
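The detection dimensions above can be expressed as a threshold table plus two small checks. This is a sketch under stated assumptions: the metric key names are hypothetical, and treating a value exactly equal to a threshold as a violation is a choice made here, not something the source specifies.

```python
# Warning/critical thresholds in percent, from the list above.
USAGE_THRESHOLDS = {"cpu": (70.0, 85.0), "memory": (70.0, 85.0)}

# Upper-bound targets from the list above; key names are hypothetical.
TARGETS = {
    "wal_fsync_p99_ms": 10.0,
    "backend_commit_p99_ms": 25.0,
    "disk_io_wait_pct": 10.0,
    "peer_network_latency_ms": 50.0,
}


def classify_usage(name, value):
    """Map a usage percentage to 'ok', 'warning', or 'critical'."""
    warning, critical = USAGE_THRESHOLDS[name]
    if value >= critical:
        return "critical"
    if value >= warning:
        return "warning"
    return "ok"


def exceeded_targets(metrics):
    """Return the names of metrics at or above their target limit."""
    return [
        name
        for name, limit in TARGETS.items()
        if name in metrics and metrics[name] >= limit
    ]


# Illustrative values only.
observed = {"wal_fsync_p99_ms": 8.2, "backend_commit_p99_ms": 32.5, "disk_io_wait_pct": 15.0}
print(classify_usage("cpu", 45.0), classify_usage("memory", 58.0))
print(exceeded_targets(observed))
```

Keeping the thresholds in plain dictionaries makes it straightforward to adjust a limit without touching the checking logic.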
4. Complete Performance Report
# Report Structure
┌────────────────────────────────────────┐
│ ETCD Performance Analysis Report       │
│ Time Range: 2026-04-12 14:00-15:00     │
├────────────────────────────────────────┤
│ 1. Executive Summary                   │
│    • Cluster Status: Healthy           │
│    • Issues Found: 1 warning           │
│    • Priority: Medium                  │
├────────────────────────────────────────┤
│ 2. Key Metrics                         │
│    WAL Fsync P99: 8.2ms                │
│    Backend Commit P99: 32.5ms          │
│    CPU Usage: 45%                      │
│    Memory Usage: 58%                   │
│    Disk I/O Wait: 15%                  │
├────────────────────────────────────────┤
│ 3. Bottleneck Analysis                 │
│    Disk I/O Bottleneck                 │
│    - I/O Wait 15% too high             │
│    - Affects Backend Commit            │
├────────────────────────────────────────┤
│ 4. Optimization Recommendations        │
│    1. Use faster SSD (NVMe)            │
│    2. Tune I/O scheduler to deadline   │
│    3. Check disk contention            │
└────────────────────────────────────────┘
Tool List (15 Tools)
| Tool Name | Function Description | Return Data |
|---|---|---|
| get_server_health | Server health check | Status, timestamp |
| get_etcd_cluster_status | ETCD cluster status | Member list, health status |
| get_ocp_cluster_info | OpenShift cluster info | Version, node count, resources |
| get_etcd_general_info | ETCD general metrics | CPU, memory, DB size |
| get_etcd_disk_wal_fsync | WAL Fsync performance | P50/P90/P99 latency |
| get_etcd_disk_backend_commit | Backend Commit performance | P50/P90/P99 latency |
| get_node_disk_io | Node disk I/O | IOPS, throughput, latency |
| get_etcd_network_io | Network I/O metrics | Bandwidth, packet rate |
| get_etcd_performance_deep_drive | Deep performance analysis | Multi-dimensional analysis |
| get_etcd_bottleneck_analysis | Bottleneck detection | Bottleneck identification & suggestions |
| generate_etcd_performance_report | Generate performance report | Complete HTML report |