Runtime Operations and Observability Plan¶
Why This Page Exists¶
The architecture docs should describe what is implemented today. This page tracks runtime features that are desirable but not yet fully shipped.
Current Baseline (April 2026)¶
Implemented today:
- async MCP tool handlers across the server and tool modules
- fixed-size connection pooling via
ConnectionPoolAdapter - circuit breaker protection via
CircuitBreakerAdapter - adapter health checks and basic pool metrics
- batch-oriented tools for export, file operations, template application, macro execution, and automation
- agent-side SQLite logging for prompt runs, tool events, and error history
Not fully implemented today:
- multi-level runtime caches for results, feature trees, properties, or queries
- TTL/event/manual cache invalidation workflow
- explicit cancellation of long-running MCP operations
- generic progress streaming for long-running operations
- alerting or notification hooks for failures and degraded service
- autoscaling or distributed runtime orchestration
Planned Areas¶
1. Runtime Caching¶
Potential scope:
- result cache for expensive read operations
- feature-tree cache keyed by active file + change fingerprint
- property cache for repeated metadata/mass-property reads
- query cache for expensive docs/search analysis
Open design questions:
- what events should invalidate cache entries?
- should cache be per-session, per-file, or shared across sessions?
- should the server expose explicit cache-clear tools?
2. Progress and Cancellation¶
Potential scope:
- progress callbacks or structured progress events for long-running batch tools
- cancellation tokens for long-running operations
- checkpoint support for multi-step workflows
- resumable batch execution for partial failures
Candidate first targets:
batch_exportbatch_process_filesbatch_file_operationsbatch_apply_templatebatch_execute_macros
3. Observability and Alerting¶
Potential scope:
- runtime metrics sink for health and latency
- adapter pool dashboards
- warning/error thresholds with notification hooks
- structured aggregation of retry and circuit-breaker trips
- per-tool latency percentiles and failure rates
Candidate first metrics:
- adapter acquisition latency
- per-tool execution time
- retry counts
- circuit breaker open/half-open durations
- pool utilization
- batch success/failure counts
4. Deployment and Scaling¶
Potential scope:
- multiple runtime instances with external coordination
- distributed state/caching if remote deployments become common
- queue-backed worker model for background operations
- deployment-specific configuration profiles
Documentation Rule¶
If a feature is described as implemented in the architecture guide, it should have a concrete code path in src/.
If it is only an intended capability, it belongs in planning docs like this page.
Candidate Milestones¶
Milestone A¶
- document per-tool runtime support matrix
- add explicit progress fields to selected batch tools
- add a cache strategy proposal with invalidation rules
Milestone B¶
- implement one real cache for repeated read-only model queries
- add manual cache clear command/tool
- add runtime metric aggregation for adapter pool health
Milestone C¶
- add cancellation for long-running batch operations where feasible
- add notification hooks or external alert adapters
- validate the model with real SolidWorks multi-file workflows