Interactive Design Foundation Tracker

Last updated: 2026-04-11

Purpose

Track what is done vs. what still needs work for:

Interactive UI research (Prefab/FastMCP app direction)
RAG + orchestration research for stronger tool selection and planning

Current Status Snapshot

| Workstream | Status | Notes |
| --- | --- | --- |
| Dedicated feature-tree reconstruction skill | Done (rough cut) | Added workspace skill file for inspect-classify-delegate routing |
| Docs migration from silhouette-first | In progress | Core plan exists; tutorial/demo pages still need updates |
| Code-backed feature-family classification helper | Done (initial) | `feature_tree_classifier` exists and is wired in tooling; needs more evals and fixtures |
| Interactive UI for guided choices | In progress | Added concrete app workflow and prompt/response patterns |
| Local memory DB for errors/remediation | Done (expanded) | Added sessions/checkpoints/tool calls/evidence/snapshots/sketch-graph persistence |
| ResearchTODO: Open CAD datasets and LMM path | In progress | Added dataset/model landscape and evaluation questions for open-source direction |

Prefab App Component Status (2026-04-11)

| Component | Current Status | Notes |
| --- | --- | --- |
| Backend API server (FastAPI) | Working | Routes respond and `/docs` is available |
| Frontend launch (Prefab) | Working with caveats | UTF-8/console launch issues were mitigated; validate on user machine each run |
| Design Intent panel | Partially working | Goal input and brief approval wired |
| Clarifying Questions action | Partially working | Calls backend LLM path; depends on provider credentials/config |
| Strategy checkpoint card (renamed from Family Classification Gate) | Partially working | `Go: Plan Next Steps`, inspect, and accept actions wired |
| Checkpoint queue rendering | Working with guards | Structured table restored behind schema/array guards with safe text fallback |
| Evidence rendering | Working with guards | Structured table restored behind schema/array guards with safe text fallback |
| Execute next checkpoint | Partially working | Runs supported adapter tools; unsupported operations are marked mocked |
| Manual SolidWorks sync | Partially working | Reconcile endpoint exists; diff logic is currently simplified |
| 3D model preview | Partially working | PNG snapshot refresh works when export path is valid; embedded 3D viewport is mocked |
| Context window bar | Cosmetic only | Static placeholder values |
| LLM provider selection in UI | Partially working | Provider + model profile controls now persist in dashboard metadata |
| Local model dropdown (Gemma-class models) | Partially working | Local small/balanced/large defaults map to Gemma-family model names |
| Assumptions editor | Working | Dedicated editable assumptions section with persistent save action |
| Readiness panel | Working | Reports provider credentials, adapter mode, preview readiness, and DB readiness |
| Inline card-level error surface | Working | `latest_error_text` + `remediation_hint` shown in key cards |
| RAG ingestion for user-owned engineering books | Partially working | Path-based PDF/text ingestion scaffold writes a local retrieval index and provenance summary |
| Web research fallback | Missing | No constrained web-search tool integration in UI loop yet |
| Feature-tree target highlighting (`@feature-id`) | Partially working | UI now persists grounded feature-target refs and validates them against the attached model tree |
| Attached target-model workflow | Working | Users can attach a `.sldprt`/`.sldasm`, inspect it, and seed planning/preview from that model |

Prefab App Todo Backlog (Prioritized)

P0 — Make core loop reliable

Add an explicit "Plan Next Steps" primary path from Design Intent to strategy checkpoint (done).
Restore structured checkpoint/evidence rendering safely (done: guarded DataTable + fallback text).
Add a backend/UI readiness panel that reports: provider configured, SolidWorks adapter mode, preview export readiness, and DB session status (done).
Add robust error surfaces in UI cards (inline error + remediation hint), not only toasts (done).
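
A minimal sketch of the "schema/array guard with safe text fallback" idea from the P0 items above. The function name `rows_or_fallback` and the example keys are hypothetical, not the dashboard's actual code; the point is only that a table is rendered when the payload is a list of dicts with the expected keys, and readable text is shown otherwise.

```python
import json
from typing import Any, Optional


def rows_or_fallback(
    payload: Any, required_keys: tuple
) -> tuple[list, Optional[str]]:
    """Return (rows, None) when payload is a list of dicts that all contain
    the required keys; otherwise return ([], fallback_text)."""
    if isinstance(payload, list) and all(
        isinstance(row, dict) and all(key in row for key in required_keys)
        for row in payload
    ):
        return payload, None
    # Guard failed: produce readable text instead of breaking the table.
    try:
        text = json.dumps(payload, indent=2, default=str)
    except (TypeError, ValueError):
        text = repr(payload)
    return [], text
```

A card could render the structured table when the second element is `None` and fall back to a plain text block otherwise.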

P1 — Clarify user workflow and naming

Keep the renamed "Modeling Strategy Checkpoint" wording and add helper text for family meaning and confidence interpretation.
Add a linear flow header: Goal -> Assumptions -> Clarify -> Plan -> Execute (done).
Add a dedicated assumptions editor section so users can inspect/edit accepted assumptions before planning (done).

P2 — LLM/model controls

Add model/provider selector in UI state and backend request payloads (done: preferences endpoint + persisted UI state).
Add local-model profiles (small/medium/large) with hardware-aware defaults and warnings (partially done: profile selection and default mapping).
Add provider adapters for local inference endpoints (OpenAI-compatible local server first) and test with Gemma-family models where available (in progress).

CADAM-derived note:

CADAM's parametric-control UX reinforces keeping assumptions/model controls editable at runtime. We now mirror that pattern in the dashboard and should extend it to geometry-level slider controls in P4/P5.

P3 — RAG for user-owned technical books

Build a "bring-your-own-content" ingestion flow: file picker, chunking profile, embedding profile, and index namespace (partially done: path-based scaffold and local JSON index).
Add explicit copyright-safe mode: only index user-provided files; no bundled proprietary corpora.
Add retrieval provenance panel in UI for plan steps (source file/chunk/score) (partially done: session-level provenance summary).
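
For illustration, the chunking profile plus local JSON index could be sketched as below. This is an assumption-laden sketch, not the existing scaffold: `chunk_text`, `build_index`, the character-window chunking, and the record fields are all hypothetical, and PDF text extraction is left out.

```python
import json


def chunk_text(text: str, size: int = 800, overlap: int = 100) -> list[str]:
    """Split text into fixed-size character windows that overlap."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]


def build_index(text: str, source_file: str, namespace: str,
                size: int = 800, overlap: int = 100) -> dict:
    """Build a JSON-serializable index that keeps provenance per chunk."""
    step = size - overlap
    return {
        "namespace": namespace,
        "source_file": source_file,
        "chunks": [
            {"chunk_id": i, "char_start": i * step, "text": chunk}
            for i, chunk in enumerate(chunk_text(text, size, overlap))
        ],
    }


# The index is plain data, so json.dumps(index) can be written to a local file.
index = build_index("abcdefghij", "book.txt", "user-books", size=4, overlap=1)
serialized = json.dumps(index)
```

Keeping `source_file`, `chunk_id`, and `char_start` on every chunk is what lets a provenance panel point a plan step back to its source.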

P4 — Research fallback and advanced interaction

Add constrained engineering web-research fallback (query templates + source allowlist + citation capture).
Add feature-tree selection handoff (@feature-id) from UI to planning context (partially done: persisted target refs + validation against attached model tree).
Add assembly mate targeting by selected component references.
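
A small sketch of how `@feature-id` references might be pulled from a prompt and checked against the attached model tree. The regex and both function names are assumptions; in particular, the pattern only covers names made of letters, digits, underscores, and hyphens (such as `Boss-Extrude1`), not names containing spaces.

```python
import re

# Assumed reference syntax: "@" followed by a name like Boss-Extrude1.
FEATURE_REF = re.compile(r"@([A-Za-z][\w-]*)")


def extract_feature_refs(prompt: str) -> list[str]:
    """Return feature names referenced as @name, in order of appearance."""
    return FEATURE_REF.findall(prompt)


def validate_refs(refs: list[str], tree_names: list[str]) -> tuple[list[str], list[str]]:
    """Split refs into (grounded, unknown) against the model's feature tree."""
    known = set(tree_names)
    grounded = [ref for ref in refs if ref in known]
    unknown = [ref for ref in refs if ref not in known]
    return grounded, unknown


refs = extract_feature_refs("Shorten @Boss-Extrude1 and fillet @Sketch2.")
grounded, unknown = validate_refs(refs, ["Boss-Extrude1", "Fillet1"])
```

Unknown references can then be surfaced to the user rather than passed into the planning context.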

P5 — Validation and evaluation

Add a benchmark run mode driven by UI session logs (Baseball Bat, U-Joint Pin, Paper Airplane, practical printable part).
Track metrics: classification accuracy, first-feature correctness, correction count, rollback success.
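
As a sketch of how the four metrics could be computed from per-run records, assuming each benchmark run is reduced to the hypothetical `RunRecord` below (the field names and family labels are illustrative, not an existing log format):

```python
from dataclasses import dataclass


@dataclass
class RunRecord:
    predicted_family: str
    expected_family: str
    first_feature_ok: bool
    corrections: int
    rollbacks_attempted: int
    rollbacks_succeeded: int


def summarize(runs: list[RunRecord]) -> dict[str, float]:
    """Aggregate the tracked metrics over a list of benchmark runs."""
    if not runs:
        raise ValueError("no runs to summarize")
    n = len(runs)
    attempted = sum(r.rollbacks_attempted for r in runs)
    succeeded = sum(r.rollbacks_succeeded for r in runs)
    return {
        "classification_accuracy": sum(
            r.predicted_family == r.expected_family for r in runs) / n,
        "first_feature_correctness": sum(r.first_feature_ok for r in runs) / n,
        "mean_correction_count": sum(r.corrections for r in runs) / n,
        # Undefined when no rollback was attempted in any run.
        "rollback_success_rate": succeeded / attempted if attempted else float("nan"),
    }


summary = summarize([
    RunRecord("revolve", "revolve", True, 1, 1, 1),
    RunRecord("extrude", "sheet_metal", False, 3, 1, 0),
])
```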

Execution Todo Ledger (Do-Not-Lose)

This section is the durable running list for in-flight implementation work so context-window truncation does not lose requirements.

Active now

Apply Pydantic-first contracts across dashboard UI/backend payloads (shared schema module, explicit field docs, validation).
Complete P0 reliability items: readiness panel, safe structured checkpoint/evidence rendering, and inline per-card error surfaces.
Keep the core interaction loop clear for users: Design Intent -> Assumptions -> Clarify -> Plan -> Execute and an explicit primary action button for planning.

Requested by user and accepted into backlog

Use pydantic-ai where LLM response structures are generated/validated (clarify/inspect/planning payloads).
Add local model selection UX: provider dropdown, small/medium/large profile selector, and Gemma-family local endpoint integration path.
Add user-owned engineering-book RAG ingestion (copyright-safe BYO corpus only).
Add constrained engineering web research fallback with source provenance.
Add feature-tree targeting workflow (@feature-id) and assembly mate target selection.
Add global parameter/slider experimentation workflow for part-feature sensitivity.
Evaluate Python-native simple 3D rendering equivalents for frontend integration (pyvista, trimesh, plotly mesh viewers).
Test the dashboard against a how-to document and record how much retrieved guidance actually changes planning quality.
Document separate workflows clearly: MCP server only, UI-driven workflow, and hybrid workflow.
Validate the concrete saved-part path workflow using .generated/part_1.sldprt and a grounded target such as @Boss-Extrude1.

Context Window Budget

Use a predictable budget per turn so the workflow remains stable as retrieval and tool logs grow.

Global target: keep active prompt + retrieved evidence under 14k tokens before tool execution.
Hard cap policy: trim low-relevance evidence once context exceeds 16k tokens.
Orchestrator budget: 6k-8k tokens.
Classifier + routing budget: 2k-3k tokens.
Printability/clearance specialist budget: 2k-3k tokens each.
Tool-call trace budget: 1k-2k tokens (summarized, never raw full logs by default).

Token allocation order:

mandatory: current goal, accepted family, latest checkpoint, rollback pointer
high priority: top 3-6 evidence chunks with provenance
medium priority: prior similar failures + remediation
low priority: older conversation turns and verbose intermediate output

When over budget:

collapse prior turns to structured summaries
keep only top-scoring evidence per source type
replace raw payloads with compact key-value extracts
defer non-critical explanation until after execution
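
The allocation order and the 14k target above could be applied with a simple greedy pass, sketched here. `ContextItem`, `fit_to_budget`, and the example item names are hypothetical; token counts are assumed to be computed elsewhere, and the collapse-to-summary steps are not modeled.

```python
from dataclasses import dataclass

PRIORITY_ORDER = ("mandatory", "high", "medium", "low")


@dataclass
class ContextItem:
    name: str
    priority: str  # one of PRIORITY_ORDER
    tokens: int    # pre-computed token count (assumed available)


def fit_to_budget(items: list[ContextItem], budget: int = 14_000):
    """Walk items in priority order; always keep mandatory items and keep
    any other item only if it still fits under the budget."""
    kept, used = [], 0
    for item in sorted(items, key=lambda it: PRIORITY_ORDER.index(it.priority)):
        if item.priority == "mandatory" or used + item.tokens <= budget:
            kept.append(item)
            used += item.tokens
    return kept, used


kept, used = fit_to_budget([
    ContextItem("old_turns", "low", 4000),
    ContextItem("goal_and_checkpoint", "mandatory", 500),
    ContextItem("prior_failures", "medium", 5000),
    ContextItem("top_evidence", "high", 6000),
    ContextItem("turn_summary", "low", 2000),
])
```

In this example the raw older turns are dropped while their compact summary still fits, which matches the "collapse prior turns to structured summaries" rule.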

What Was Implemented (Items 1, 2, 3)

1) Dedicated reconstruction skill

Added: .github/skills/feature-tree-reconstruction/SKILL.md
Purpose: enforce inspect-classify-delegate behavior, confidence gating, and checkpoint execution.
Includes triggers for reverse engineering, feature-tree classification, and VBA fallback routing.

2) Family-gated docs page and nav wiring

Added: docs/agents/family-gated-tool-routing.md
Added to docs nav under Agents and Skills in mkdocs.yml.
Includes:
family -> allowed tool shortlist table
checkpoint handoff prompts
human-in-SolidWorks edit handback and diff workflow
printability handoff assumptions/outputs

3) SQLite schema/API expansion

Extended src/solidworks_mcp/agents/history_db.py with new persistent entities:
DesignSession
PlanCheckpoint
ToolCallRecord
EvidenceLink
ModelStateSnapshot
SketchGraphSnapshot
Added helper APIs for insert/list/update/upsert operations.
Exported the new APIs via src/solidworks_mcp/agents/__init__.py.
Added tests to tests/test_agents_history_db.py for new tables and APIs.

Section F implementation note:

SketchGraphs-style relational data is now persisted in lightweight SQLite via SketchGraphSnapshot (nodes_json, edges_json, metadata).
This keeps storage local and simple while still exposing graph semantics to retrieval/planning.
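
To make the storage shape concrete, here is a sketch of persisting and querying a sketch graph with the standard `sqlite3` module. The table layout, the function names, and the node/edge dictionaries are assumptions for illustration; the real `SketchGraphSnapshot` entity in `history_db.py` may be defined differently.

```python
import json
import sqlite3


def save_sketch_graph(conn, session_id, nodes, edges, metadata) -> int:
    """Store one sketch graph as JSON columns and return its row id."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS sketch_graph_snapshot (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               session_id TEXT NOT NULL,
               nodes_json TEXT NOT NULL,
               edges_json TEXT NOT NULL,
               metadata_json TEXT NOT NULL)"""
    )
    cur = conn.execute(
        "INSERT INTO sketch_graph_snapshot "
        "(session_id, nodes_json, edges_json, metadata_json) VALUES (?, ?, ?, ?)",
        (session_id, json.dumps(nodes), json.dumps(edges), json.dumps(metadata)),
    )
    conn.commit()
    return cur.lastrowid


def constraints_for(conn, snapshot_id: int, node_id: str) -> list[dict]:
    """Return the constraint edges that touch a given sketch entity."""
    row = conn.execute(
        "SELECT edges_json FROM sketch_graph_snapshot WHERE id = ?", (snapshot_id,)
    ).fetchone()
    if row is None:
        return []
    return [edge for edge in json.loads(row[0]) if node_id in edge["nodes"]]


conn = sqlite3.connect(":memory:")
nodes = [{"id": "l1", "type": "line"}, {"id": "l2", "type": "line"},
         {"id": "c1", "type": "circle"}]
edges = [{"type": "perpendicular", "nodes": ["l1", "l2"]},
         {"type": "tangent", "nodes": ["l2", "c1"]}]
snapshot_id = save_sketch_graph(conn, "session-1", nodes, edges, {"sketch": "Sketch1"})
```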

Final Research Leg: Market + Research Signals

These are reference notes only (non-actionable by themselves), added to ground architecture decisions.

Market/workflow signals

Prefab (Prefect)
- Link: https://github.com/PrefectHQ/prefab
- Takeaway: Python-declared, protocol-first UI for MCP apps; strong fit for agent-readable, interactive checkpoint UIs.
Onshape ecosystem trend toward AI + automation + branching workflows
- AI/agents blog index signal: https://www.onshape.com/en/blog
- Example post references from index:
- https://www.onshape.com/en/blog/ai-artificial-intelligence-cloud-native-cad-pdm-platform
- https://www.onshape.com/en/blog/adam-ai-app-store-cad-co-pilot
- Takeaway: practical CAD AI workflows are converging on collaboration primitives (branch/merge/history), not one-shot generation.

Research signals

SketchGraphs (constraint-graph representation)
- Link: https://arxiv.org/abs/2007.08506
- Takeaway: sketches are relational constraint graphs; retrieval should index entities + constraints, not just text.
DeepCAD (operation-sequence representation)
- Link: https://arxiv.org/abs/2105.09492
- Takeaway: CAD as operation sequences is a viable modeling space; supports plan/checkpoint generation grounded in feature history.
ReAct (interleaved reason/act loops)
- Link: https://arxiv.org/abs/2210.03629
- Takeaway: inspect/act/observe loops are better than one-shot planning for tool-rich CAD tasks.
RAG (evidence retrieval over latent memory)
- Link: https://arxiv.org/abs/2005.11401
- Takeaway: retrieval with provenance is critical for safe tool routing and explainable CAD planning.

ResearchTODO: Open CAD Dataset and LMM Landscape

Goal: evaluate whether we can build an open, SolidWorks-compatible "Large Mechanical Model" style stack using open datasets, open methods, and local retrieval.

Why this is added now

Commercial positioning (example: Leo) points to an LMM concept with multi-modal input and engineering-aware generation. We need an open-source path that remains auditable and reproducible.

Reference:

Leo about page: https://www.getleo.ai/about

Candidate open datasets to evaluate first

Dataset: SketchGraphs.
Link: https://arxiv.org/abs/2007.08506.
Signal: large-scale relational sketch constraints suitable for sketch-level reasoning.
Dataset: DeepCAD dataset and sequence formulation.
Link: https://arxiv.org/abs/2105.09492.
Signal: CAD operation-sequence representation (Transformer-friendly) and public dataset claim.
Dataset: Fusion 360 Gallery.
Link: https://arxiv.org/abs/2010.02392.
Signal: human design sequences and a programmatic reconstruction environment.

Research questions (non-actionable for now)

Which dataset best transfers to SolidWorks feature semantics (extrude/revolve/sheet metal/assembly)?
How do we map dataset operation vocabularies to SolidWorks MCP tool families and VBA fallback boundaries?
Can a hybrid model work better than one monolithic LMM?
Candidate architecture for that hybrid: retrieval-first planner + small CAD sequence model + strict tool router.
What is the minimum viable benchmark to compare open approach vs commercial copilots?
Candidate benchmark metric: family classification accuracy.
Candidate benchmark metric: first-feature correctness.
Candidate benchmark metric: correction count before valid build.
Candidate benchmark metric: rollback success rate after manual SolidWorks edits.

Suggested open architecture experiments

Retrieval-only baseline.
Method: no fine-tuned CAD model; rely on retrieval + classifier + tool gating.
Sequence-model augmentation.
Method: add a lightweight CAD-sequence model for checkpoint suggestions only.
Sketch-graph augmentation.
Method: use stored sketch graph snapshots from SQLite as structured evidence in prompts.
Human-edit reconciliation loop.
Method: measure plan recovery quality after out-of-band manual SolidWorks changes.

Data and governance checks to include in future research phase

Licensing and redistribution limits for each dataset.
PII/IP leakage risk in training examples.
Reproducibility of preprocessing pipelines and tokenization rules.
Traceability from generated plan step back to evidence chunk.

Rough-Cut App Workflow (CAD Assistant + Orchestrator)

This is the current target behavior for an interactive assistant that works with SolidWorks.

Phase 0: Idea capture / optional 2D-to-3D concept preview

User enters goal and constraints (function, dimensions, material, printer profile).
Assistant can generate a concept preview path (text + sketch concept), clearly marked provisional.
No direct model execution until family is accepted.

Phase 1: Inspect and classify

If model exists, agent runs inspect tools and classifies family with confidence + evidence.
User gets prompted to approve family or request re-inspection.

Phase 2: Orchestrated planning

Orchestrator generates 3-6 checkpoint plan.
Each checkpoint has allowed tools, success criteria, and rollback target.
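
One way to express the plan shape described above (3-6 checkpoints, each with allowed tools, success criteria, and a rollback target) is a small dataclass plus a validator. This is a sketch; the `Checkpoint` fields beyond those three requirements, the tool name, and the snapshot labels are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Checkpoint:
    step_index: int
    planned_action: str
    allowed_tools: list[str] = field(default_factory=list)
    success_criteria: list[str] = field(default_factory=list)
    rollback_target: Optional[str] = None


def validate_plan(plan: list[Checkpoint]) -> list[str]:
    """Return a list of problems; an empty list means the plan is well-formed."""
    problems = []
    if not 3 <= len(plan) <= 6:
        problems.append(f"plan has {len(plan)} checkpoints; expected 3-6")
    for cp in plan:
        if not cp.allowed_tools:
            problems.append(f"checkpoint {cp.step_index}: no allowed tools")
        if not cp.success_criteria:
            problems.append(f"checkpoint {cp.step_index}: no success criteria")
        if not cp.rollback_target:
            problems.append(f"checkpoint {cp.step_index}: no rollback target")
    return problems


good_plan = [
    Checkpoint(i, f"step {i}", ["create_sketch"], ["sketch fully defined"],
               f"snapshot-{i - 1}")
    for i in range(1, 4)
]
bad_plan = [Checkpoint(1, "extrude base")]
```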

Phase 3: Interactive execution and specialist sub-agents

For each checkpoint, orchestrator calls specialists as needed:
printability/tolerance agent
clearance/fit checker
VBA fallback reconstructor for unsupported families
Results are surfaced in Q/A style before execution:
"Here is what I found"
"Here are options and risks"
"Approve option 1/2/3 or request changes"

Phase 4: Human SolidWorks edit handoff and diff sync

User can pause and edit directly in SolidWorks.
User signals completion of manual edits.
Agent runs a diff pass against last accepted snapshot and offers reconciliation:
accept user edits and update remaining plan
patch only deltas to stay on goal
rollback to prior snapshot
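
A simplified sketch of the diff pass, assuming each snapshot can be reduced to a mapping of feature name to a flat parameter dict. That snapshot shape and the function name are assumptions; a real reconcile step would also need feature order and parent-child links.

```python
def diff_feature_trees(accepted: dict, current: dict) -> dict:
    """Compare two {feature_name: {param: value}} snapshots."""
    added = sorted(current.keys() - accepted.keys())
    removed = sorted(accepted.keys() - current.keys())
    changed = {}
    for name in sorted(accepted.keys() & current.keys()):
        before, after = accepted[name], current[name]
        # Record (old, new) for every parameter whose value differs.
        delta = {
            key: (before.get(key), after.get(key))
            for key in sorted(before.keys() | after.keys())
            if before.get(key) != after.get(key)
        }
        if delta:
            changed[name] = delta
    return {"added": added, "removed": removed, "changed": changed}


accepted = {"Sketch1": {"width": 40}, "Boss-Extrude1": {"depth": 10}}
current = {"Sketch1": {"width": 40}, "Boss-Extrude1": {"depth": 12},
           "Fillet1": {"radius": 2}}
delta = diff_feature_trees(accepted, current)
```

The three reconciliation options then map onto this result: accept the whole delta, patch only the `changed` entries that conflict with the goal, or discard it by rolling back.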

Phase 5: Persist and learn

Persist session/checkpoints/tool calls/evidence/snapshots/sketch-graph records.
Store failures with root cause/remediation and reuse in future retrieval.

UI Composition Draft (Prefab-Style)

This composes the app directly from the rough-cut workflow so we can test behavior quickly.

UI layout

Panel A: goal and constraints
part intent
printer/material/nozzle/layer-height
required envelope and joint type
Panel B: inspect -> classify -> checkpoints
family card with confidence and warnings
checkpoint queue with approve/reject controls
execution status per checkpoint
Panel C: evidence and diffs
retrieved evidence list with source links
latest model snapshot and delta summary after manual SolidWorks edits
rollback target selector

Prompt and response flow (rough cut)

Prompt A (intent capture):
"Design a printable U-bracket assembly for [use case] with [constraints]."
Assistant response A:
asks only missing constraints and returns a normalized design brief
Prompt B (classification gate):
"Inspect current model state and classify family before any build actions."
Assistant response B:
family, confidence, evidence, warnings, and first 3 checkpoints
Prompt C (checkpoint execution):
"Execute checkpoint [n] with allowed tools only and report verification."
Assistant response C:
tool calls, verification summary, and next checkpoint options
Prompt D (manual edit sync):
"I finished edits in SolidWorks. Reconcile changes with the goal."
Assistant response D:
diff summary and options:
- accept manual edits and replan forward
- patch to realign with goal
- rollback to checkpoint snapshot

Try-now path

Step 1: install the Prefab package in the test environment: `pip install prefab-ui`.
Step 2: validate basic component composition using the welcome-card pattern (Card, Input, Rx) from Prefab docs/README.
Step 3: map the three-panel workflow above into a first internal UI prototype.
Step 4: wire panel actions to existing persistence APIs in history_db.py.
Step 5: run one end-to-end U-bracket session and capture:
checkpoint approvals
tool-call logs
snapshot diffs after manual SolidWorks edits

Current limitation:

The crawler extracted the Prefab docs pages only partially, so confirm the local app-run command details against the live Prefab docs and examples before implementing the first executable UI file.

UI Prototype Status (Implemented)

Prototype file created:

examples/prefab_cad_assistant/cad_assistant_dashboard.py

What it currently demonstrates:

Design-intent capture card and classification gate
Checkpoint queue with allowed-tool visibility
Context-window progress card (token budget visualization)
Evidence/retrieval panel
Manual SolidWorks edit sync card with diff/reconcile trigger

How to try it locally:

Install the Prefab package in the active environment: `python -m pip install prefab-ui`.
Serve the prototype (repo-local venv): `.venv\Scripts\prefab.exe serve examples/prefab_cad_assistant/cad_assistant_dashboard.py`.
Export static output (repo-local venv): `.venv\Scripts\prefab.exe export examples/prefab_cad_assistant/cad_assistant_dashboard.py`.

First integration wiring targets:

Replace static checkpoint table rows with records from PlanCheckpoint
Write execute/rollback actions into ToolCallRecord and ModelStateSnapshot
Populate evidence panel from EvidenceLink
On manual sync, run diff workflow and persist reconciliation decision

1) Research: Prefab-Like Interactive App for Prompting and Agent Guidance

What Prefab appears to be

Based on current README/docs:

Python-first declarative UI framework (prefab-ui) with prebuilt components
Built for MCP app workflows and agent-generated interfaces
Compiles component tree to a protocol rendered by a bundled React frontend
Reactive state model in Python (no direct JS authoring required for many interactions)

Fit for this repository

High fit for a guided "human-in-the-loop CAD planning" UI because we need:

explicit user checkpoints (family classification confirmation)
quick choice chips/cards (hinge type, joint strategy, tolerancing mode)
evidence panes (retrieved docs, failures, feature tree snapshots)
execution gating controls (approve step, replan, delegate to VBA path)

Recommended UX architecture (MVP)

Left pane: "Intent + Constraints"

part goal, printer profile, envelope limits, material

Center pane: "Inspect -> Classify -> Plan"

feature-family card with confidence/evidence
first 3-6 planned operations only

Right pane: "Evidence + Errors"

retrieved chunks with provenance
prior similar failures and remediations

Bottom action rail:

Approve classification
Request alternatives
Execute next step
Roll back to checkpoint

Prefab POC scope (small)

Build one app focused on a single task: "Reconstruct existing part from feature tree"
Use mocked adapter outputs first
UI emits strict action payloads (approve/reject/replan/delegate)
Persist each decision and correction into DesignIntentSession
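
A sketch of what a strict action payload could look like. It uses only the standard library to stay dependency-free; given the Pydantic-first direction noted elsewhere in this tracker, a Pydantic model would be the more likely production form. The field names `session_id` and `checkpoint_index` are assumptions; the four action values come from the scope list above.

```python
from dataclasses import dataclass
from enum import Enum


class Action(str, Enum):
    APPROVE = "approve"
    REJECT = "reject"
    REPLAN = "replan"
    DELEGATE = "delegate"


@dataclass(frozen=True)
class ActionPayload:
    session_id: str
    checkpoint_index: int
    action: Action


def parse_action(raw: dict) -> ActionPayload:
    """Validate a raw UI payload; raise ValueError on anything unexpected."""
    unexpected = set(raw) - {"session_id", "checkpoint_index", "action"}
    if unexpected:
        raise ValueError(f"unexpected fields: {sorted(unexpected)}")
    session_id = raw.get("session_id")
    index = raw.get("checkpoint_index")
    if not isinstance(session_id, str) or not session_id:
        raise ValueError("session_id must be a non-empty string")
    # bool is a subclass of int, so reject it explicitly.
    if not isinstance(index, int) or isinstance(index, bool) or index < 0:
        raise ValueError("checkpoint_index must be a non-negative integer")
    # Enum lookup raises ValueError for values outside the four actions.
    return ActionPayload(session_id, index, Action(raw.get("action")))


payload = parse_action(
    {"session_id": "s1", "checkpoint_index": 2, "action": "replan"}
)
```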

Risks

New UI stack adds maintenance overhead
Agent-generated UI still needs strict schema checks to avoid malformed controls
Should not bypass existing docs-first workflows; must be additive and optional

3) Research: RAG + Orchestration for CAD Idea -> Plan -> Build

Problem to solve

The tool count is high, and LLM tool selection degrades when many tools are exposed without tight context and routing. We need constrained retrieval and explicit delegation boundaries.

Recommended architecture (practical)

A. Orchestrator-first flow

Observe state (model/image/doc)
Classify feature family
Retrieve bounded evidence set
Produce short checkpoint plan
Request human approval at boundaries
Execute next checkpoint only
Verify and log outcomes

B. Retrieval tiers and storage

Structured evidence (highest priority)

feature tree snapshots
mass properties
tool call traces + normalized params
error records with root cause + remediation

Semi-structured docs

local how-tos, worked examples, tool-catalog pages

External/tutorial corpus

transcript chunks with operation/family tags

C. Suggested SQLite schema expansion (incremental)

design_sessions
id, objective, accepted_family, status, created_at, updated_at
plan_checkpoints
session_id, step_index, planned_action, approved_by_user, executed, result
tool_calls
session_id, checkpoint_id, tool_name, input_json, output_json, success, latency_ms
failures
session_id, tool_call_id, error_type, root_cause, remediation, recovered
evidence_links
session_id, checkpoint_id, source_type, source_id, relevance_score, rationale
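
The table list above could be written as SQLite DDL roughly as follows. Column types, defaults, foreign keys, and the `id` primary keys on tables where the list does not name one are assumptions made so the references between tables resolve; this is a sketch, not the schema in `history_db.py`.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS design_sessions (
    id INTEGER PRIMARY KEY,
    objective TEXT NOT NULL,
    accepted_family TEXT,
    status TEXT NOT NULL DEFAULT 'open',
    created_at TEXT DEFAULT CURRENT_TIMESTAMP,
    updated_at TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE IF NOT EXISTS plan_checkpoints (
    id INTEGER PRIMARY KEY,
    session_id INTEGER NOT NULL REFERENCES design_sessions(id),
    step_index INTEGER NOT NULL,
    planned_action TEXT NOT NULL,
    approved_by_user INTEGER NOT NULL DEFAULT 0,
    executed INTEGER NOT NULL DEFAULT 0,
    result TEXT
);
CREATE TABLE IF NOT EXISTS tool_calls (
    id INTEGER PRIMARY KEY,
    session_id INTEGER NOT NULL REFERENCES design_sessions(id),
    checkpoint_id INTEGER REFERENCES plan_checkpoints(id),
    tool_name TEXT NOT NULL,
    input_json TEXT,
    output_json TEXT,
    success INTEGER,
    latency_ms INTEGER
);
CREATE TABLE IF NOT EXISTS failures (
    id INTEGER PRIMARY KEY,
    session_id INTEGER NOT NULL REFERENCES design_sessions(id),
    tool_call_id INTEGER REFERENCES tool_calls(id),
    error_type TEXT NOT NULL,
    root_cause TEXT,
    remediation TEXT,
    recovered INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE IF NOT EXISTS evidence_links (
    id INTEGER PRIMARY KEY,
    session_id INTEGER NOT NULL REFERENCES design_sessions(id),
    checkpoint_id INTEGER REFERENCES plan_checkpoints(id),
    source_type TEXT NOT NULL,
    source_id TEXT NOT NULL,
    relevance_score REAL,
    rationale TEXT
);
"""


def init_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open a connection, enable foreign keys, and create the tables."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn
```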

D. Retrieval strategy

Hybrid retrieval with explicit filters:

lexical exact-match for API/tool names
embeddings for conceptual similarity
metadata filtering by feature family + document type
failure-memory retrieval keyed by tool and error class
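
A toy sketch of how the filters and the two scoring signals might combine: metadata filtering first, then a score that adds exact-match hits for tool names to cosine similarity over embeddings. Everything here is illustrative: the `Doc` fields, the additive weighting, the tool name `create_extrusion`, and the two-dimensional vectors standing in for real embeddings.

```python
import math
from dataclasses import dataclass


@dataclass
class Doc:
    doc_id: str
    text: str
    family: str
    doc_type: str
    embedding: tuple  # assumed to be pre-computed by some embedding model


def cosine(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(docs, query_terms, query_embedding, family, doc_types,
             top_k=3, lexical_weight=1.0):
    """Filter by metadata, then rank by exact-term hits plus cosine similarity."""
    scored = []
    for doc in docs:
        if doc.family != family or doc.doc_type not in doc_types:
            continue  # metadata filter: wrong family or document type
        exact_hits = sum(term in doc.text for term in query_terms)
        score = lexical_weight * exact_hits + cosine(query_embedding, doc.embedding)
        scored.append((score, doc.doc_id))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored[:top_k]


docs = [
    Doc("d1", "create_extrusion fails on open profile", "extrude", "failure", (1.0, 0.0)),
    Doc("d2", "how to close a sketch profile", "extrude", "howto", (0.6, 0.8)),
    Doc("d3", "create_extrusion in sheet metal", "sheet_metal", "howto", (1.0, 0.0)),
]
hits = retrieve(docs, ["create_extrusion"], (1.0, 0.0), "extrude", {"failure", "howto"})
```

Here `d3` is excluded by the family filter even though it matches the tool name, which is the behavior the explicit filters are meant to guarantee.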

E. Tool overload mitigation

Family-gated tool shortlist per step (do not expose all tools at once)
Confidence thresholds that force additional inspect steps
Delegate unsupported families (sheet metal/surface-heavy) to VBA-aware reconstructor
Prefer checkpoint plans of 3-6 steps over monolithic 20-step plans
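
The first three mitigations can be combined into one routing function, sketched below. The tool names, family labels, and the 0.7 threshold are placeholders chosen for the example, not values from the existing router.

```python
# Hypothetical family -> tool shortlist table (names are placeholders).
FAMILY_TOOLS = {
    "extrude": ["create_sketch", "create_extrusion", "add_fillet"],
    "revolve": ["create_sketch", "create_revolve"],
}
DELEGATED_FAMILIES = {"sheet_metal", "surface"}
INSPECT_TOOLS = ["inspect_feature_tree", "get_mass_properties"]


def shortlist(family: str, confidence: float, min_confidence: float = 0.7) -> dict:
    """Decide which tools, if any, are exposed for the next step."""
    if confidence < min_confidence:
        # Low confidence forces another inspect pass before any build tools.
        return {"mode": "inspect", "tools": INSPECT_TOOLS}
    if family in DELEGATED_FAMILIES:
        # Unsupported families go to the VBA-aware reconstructor.
        return {"mode": "delegate_vba", "tools": []}
    if family not in FAMILY_TOOLS:
        return {"mode": "inspect", "tools": INSPECT_TOOLS}
    return {"mode": "execute", "tools": FAMILY_TOOLS[family]}
```

For example, `shortlist("extrude", 0.9)` exposes only the three extrude-family tools, while `shortlist("extrude", 0.4)` exposes only inspect tools.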

F. SketchGraphs-informed direction

Treat sketches/features as relational structures, not only text/image:

index entities + constraints + parent-child links
store sketch completeness flags (under-constrained, unresolved references)
retrieve prior sketch patterns when proposing first feature

U-Bracket Assembly Demo Path (Target End-to-End Example)

Use this as the benchmark narrative for prompting and UX.

Goal

Go from an idea prompt to a robust U-bracket assembly plan with inspectable evidence and minimal rework.

Recommended prompt flow

User intent prompt

"Design a printable U-bracket assembly with a pin joint, sized for [load/use], printer [model/build volume], material [PLA/PETG/etc]."

Orchestrator asks for missing constraints

clearances, screw standard, wall thickness target, max envelope

Classification/planning prompt

identify if single-part bracket vs multi-part assembly
show confidence and evidence

Checkpoint plan prompt

show first 4-6 operations (base profile, extrusion, fillets, hole pattern, mating strategy)

Printability prompt

orientation, supports, tolerance band by joint type, split strategy if needed

Execute with approval

one checkpoint at a time, with rollback option

Verify

mass/size checks, fit assumptions, assembly mate sanity checks

Example high-value options UI for U-bracket

Bracket style: plain U / gusseted U / lightened pocketed U
Joint style: through pin / shoulder bolt / captive nut pivot
Fastener source strategy: user-provided dimensions vs catalog-driven placeholders
Print profile: speed draft / balanced / strength priority

Success criteria for this demo

Correct family and delegation chosen before modeling
User can inspect evidence for each key decision
At least one correction loop completes without losing model state
Final part(s) fit printer envelope and tolerance assumptions are explicit

Next Action Items (Prioritized)

Implement dedicated feature-tree reconstruction skill file and bind it to inspect-classify-delegate triggers (done: item 1 under What Was Implemented).
Add a docs page with "family-gated tool shortlist" tables used by orchestrator prompts (done: item 2 under What Was Implemented).
Extend SQLite schema to include design_sessions, plan_checkpoints, and tool_calls (done: item 3 under What Was Implemented).
Add capture_part_state workflow output as structured JSON fixture for benchmark parts.
Create Prefab/FastMCP UI POC page for classification approval and checkpoint execution.
Build benchmark set entries for Baseball Bat, U-Joint Pin, Paper Airplane, and U-bracket assembly.
Add evaluation scripts for: family accuracy, first-feature correctness, correction count before valid build.
Add printability rubric (material/nozzle/layer-height aware) to checkpoint verification.

Open Questions

Should the first UI POC be pure Prefab, FastMCP native app UI, or a hybrid where Prefab is optional?
Which U-bracket variant should be canonical for the benchmark (single-part clamp bracket vs multi-part hinge bracket)?
Do we want a strict "no execute until family accepted" hard gate in code, or a soft warning gate during early experimentation?