Interactive Design Foundation Plan¶
Objective¶
Build a reliable human-in-the-loop SolidWorks design workflow where an LLM can:
- inspect an existing part, screenshot, or mock-up drawing
- retrieve the right CAD knowledge instead of guessing
- classify the feature family before building
- propose a sequence the human can critique and refine
- execute only after the workflow is grounded in evidence
- learn from failures so the system improves across parts and users
This is explicitly not a one-off “LLM drives SolidWorks” demo. The goal is a reusable foundation for others.
Progress tracking note:
- Active execution tracker:
docs/planning/INTERACTIVE_DESIGN_FOUNDATION_TRACKER.md
Problem Statement¶
The Paper Airplane failure exposed three root issues:
- Visual similarity is not enough. A sheet metal airplane and a flat plate silhouette can look similar in one view while requiring completely different feature sequences.
- Raw tool access is not enough. An LLM can call
create_sketchandcreate_extrusion, but still misunderstand the modeling root and produce the wrong dependency chain. - Docs-only guidance is not enough. We need code-backed classification, retrieval, and evaluation, not just hand-written prompt advice.
The system must learn the difference between:
- “looks like a bat, so revolve is likely correct”
- “looks like a thin plate, but the feature tree proves it is sheet metal”
- “looks like an assembly, so part-level modeling should stop and assembly planning should begin”
Research Takeaways¶
Retrieval-Augmented Generation (Lewis et al., 2020)¶
The main practical takeaway is that parametric LLM memory is not enough for knowledge-intensive tasks. For SolidWorks, that means the model should retrieve from explicit sources rather than rely on latent memory for API signatures, feature-order rules, or CAD best practices.
Applied here:
- keep a dense and indexable knowledge base of API docs, worked examples, failure patterns, and sample-part audits
- retrieve the most relevant evidence at planning time
- include provenance in the answer so the human can inspect why the plan was chosen
Source:
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks— https://arxiv.org/abs/2005.11401
ReAct (Yao et al., 2022)¶
The main practical takeaway is that reasoning and acting should be interleaved. In our setting, the model should alternate between thought and observation:
- think: “This might be a revolve family”
- act: call
list_features,get_mass_properties,classify_feature_tree - observe: revise the plan from the returned evidence
That is better than either pure chain-of-thought or pure tool execution.
Applied here:
- use inspect-classify-delegate loops instead of “generate a full plan from one prompt”
- keep agent trajectories reviewable by the human
- make failure recovery editable at the reasoning step, not only after geometry is already wrong
Sources:
ReAct: Synergizing Reasoning and Acting in Language Models— https://arxiv.org/abs/2210.03629- project page — https://react-lm.github.io/
SketchGraphs (Seff et al., 2020)¶
The main practical takeaway is that CAD sketches are not just images. They are relational geometry graphs: entities plus constraints plus downstream construction meaning. That supports a design where we index and reason over structured geometry instead of only screenshots and prose.
Applied here:
- store sketches and feature trees as structured records, not just text blobs
- treat lines, arcs, dimensions, constraints, planes, and parent-child relationships as retrieval targets
- use graph-like evidence to classify families and suggest missing constraints
Source:
SketchGraphs: A Large-Scale Dataset for Modeling Relational Geometry in Computer-Aided Design— https://arxiv.org/abs/2007.08506
Foundation Principles¶
- Read before write.
- Classify before plan.
- Retrieve before guess.
- Keep the human in the loop at decision boundaries.
- Evaluate on real sample parts and practical print-ready parts.
- Preserve provenance so users can inspect why the system chose a path.
Proposed System Architecture¶
User image / part / drawing
↓
Observation pass
- open_model / image upload
- get_model_info
- list_features
- get_mass_properties
- classify_feature_tree
↓
Knowledge retrieval
- API docs
- sample-part audits
- tool docs
- error-memory database
- video transcript chunks
↓
Interactive planning
- propose family
- explain evidence
- present next 3–10 steps
- human approves or corrects
↓
Execution
- direct MCP or VBA-backed workflow
↓
Verification
- feature-family match
- mass properties
- image equivalence
- user review
↓
Memory
- save failures, corrections, good plans, and demo artifacts
Knowledge Base Design¶
Tier 1 — Structured Local Evidence¶
This should be the highest-priority retrieval source because it is closest to the actual SolidWorks environment.
- API signatures and tool docs from the repo and generated SolidWorks docs index
- feature-tree snapshots from sample parts
- mass-property snapshots and image exports from validated runs
- tool failure catalog from
.solidworks_mcp/agent_memory.sqlite3 - prompt/plan pairs that successfully rebuilt known parts
Tier 2 — Semi-Structured Learning Assets¶
- docs pages in this repo
- how-to guides written around real sample parts
- macro snippets and VBA generation examples
- evaluation notes from integration runs
Tier 3 — External Instructional Content¶
- SolidWorks how-to video transcripts
- blog/tutorial text chunks about specific feature families
- external CAD best-practice notes for sheet metal, assemblies, and printability
For videos, do not index the raw video URL as one chunk. Build an index of:
- transcript chunk
- inferred operation type
- linked feature family
- timestamp range
- screenshot or slide thumbnail if available
That makes retrieval action-oriented instead of “generic video search”.
Retrieval Strategy¶
Use hybrid retrieval, not embeddings alone.
What to index¶
- exact API names:
FeatureRevolve2,FeatureExtrusion3,Base-Flange,Sketched Bend - repo tool names:
list_features,classify_feature_tree,generate_vba_part_modeling - feature-family tags:
revolve,extrude,sheet_metal,assembly,advanced_solid - part-domain tags:
baseball_bat,u_joint,sprinkler,bracket,enclosure
Retrieval modes¶
- lexical lookup for exact tool/API names
- embedding search for conceptual similarity
- graph/metadata filtering for feature families and document types
- failure-memory lookup for known tool mistakes
Retrieval outputs¶
Every planning cycle should return:
- top evidence used
- why it was selected
- confidence level
- any contradictory evidence
Interactive Human-LLM Design Loop¶
Phase A — Observe¶
If the original model exists:
open_modelget_model_infolist_features(include_suppressed=True)get_mass_propertiesclassify_feature_tree
If only an image exists:
- describe the geometry provisionally
- explicitly mark the result as provisional
- retrieve similar feature-family examples from the knowledge base
- ask the human for one correction or confirmation before build planning
Phase B — Classify¶
Output should be concise and inspectable:
- likely family
- confidence
- evidence
- warnings
- recommended workflow
Phase C — Plan Together¶
The plan should be incremental, not monolithic.
Good:
- “I think this is a revolve family with medium confidence. Here are the first four steps and what evidence they depend on.”
Bad:
- “Here is a 20-step build plan” before the modeling root is agreed.
Phase D — Execute Conservatively¶
Execute only after:
- the family is accepted
- the modeling root is accepted
- unsupported features are clearly routed to VBA or deferred
Phase E — Verify and Learn¶
Capture:
- final family
- whether the classifier was right
- which retrieval items were useful
- which user correction mattered most
- which tool call or assumption failed
Data Models We Should Add¶
FeatureTreeSnapshot¶
- document type
- active configuration
- feature list
- evidence confidence
FeatureFamilyClassification¶
- family
- confidence
- evidence
- warnings
- recommended workflow
DesignIntentSession¶
- user goal
- retrieved evidence
- accepted classification
- accepted plan checkpoints
- execution results
- human corrections
RetrievalEvidence¶
- source type
- source id/path/url
- chunk text or structured data
- relevance score
- why selected
Evaluation Plan¶
We should not call this successful until it passes a repeatable eval set.
Evaluation dimensions¶
- family classification accuracy
- correct delegation path accuracy
- first-feature correctness
- parent-child dependency preservation
- tool-call success rate
- human correction count before first valid build
Suggested benchmark set¶
Easier / should work first¶
- Baseball Bat
- U-Joint Pin
- simple bracket
- print-ready battery cover or snap-fit enclosure
Intermediate¶
- U-Joint yoke or spider
- garden trowel approximation with explicit VBA boundary
- revolve + cut combinations
Advanced¶
- Paper Airplane sheet metal workflow
- Sprinkler sub-parts
- full U-Joint assembly
Demo Roadmap¶
Demo 1 — Baseball Bat¶
Why first:
- clear revolve family
- easy to inspect manually
- direct MCP path exists
- good example of “classifier confidence high, direct MCP okay”
Demo 2 — U-Joint Pin or Yoke¶
Why next:
- introduces more geometric ambiguity
- still bounded enough for interactive planning
Demo 3 — Practical 3D-Printed Part¶
Recommended practical candidate:
- mounting bracket
- snap-fit battery cover
- Raspberry Pi enclosure lid or hinge
These are useful because they combine CAD correctness with printability and are easier to explain to users than some SolidWorks sample-library parts.
Demo 4 — Sprinkler or Full U-Joint¶
Why later:
- good stress test for assembly planning
- valuable as a showcase once the foundation is stable
Implementation Phases¶
Phase 1 — Grounding and Delegation¶
- dedicated feature-tree reconstruction prompt or skill
- code-backed feature-family classifier around
list_features - docs updated to teach inspect-classify-delegate
- Baseball Bat walkthrough migrated to the new flow
Phase 2 — Structured Capture¶
capture_part_stateworkflow- structured feature-tree JSON
- named-sketch geometry read tools
- eval fixtures for sample parts
Phase 3 — Knowledge Base¶
- local retrieval index over docs, audits, failures, and tool docs
- video transcript ingestion with timestamps and operation tags
- source provenance shown in agent answers
Phase 4 — Interactive Design Sessions¶
- persistent
DesignIntentSessionmemory - user correction tracking
- critique-and-replan loop
- executable checkpoint plans instead of one-shot plans
Immediate Next Steps¶
These are the concrete next steps to execute now:
- Add a dedicated feature-tree reconstruction skill or prompt so the main agent can invoke the same inspect-classify-delegate workflow with less ambiguity.
- Update the docs nav and any remaining sample/tutorial pages that still assume silhouette-first reconstruction.
- Build a small structured “feature family classifier” helper around
list_featuresso delegation can become code-backed instead of doc-backed.
Implementation status (2026-04-11):
- Step 1 is implemented via
.github/skills/feature-tree-reconstruction/SKILL.mdand active routing docs. - Step 2 is partially implemented: docs navigation and the family-gated page are updated; remaining tutorial migrations continue.
- Step 3 is implemented as
src/solidworks_mcp/utils/feature_tree_classifier.pyand wired into tooling; evaluation fixtures remain in progress.
Current continuation focus:
- Reliability hardening in the interactive dashboard: readiness checks, guarded structured rendering, and inline remediation surfaces.
- User-guided planning UX: explicit flow (
Goal -> Assumptions -> Clarify -> Plan -> Execute) with persistent assumptions editing. - Provider/model controls for pydantic-ai paths, including local Gemma-family profile defaults and OpenAI-compatible local endpoint routing.
Additional near-term steps:
- Add
capture_part_stateas a first-class workflow target in the roadmap and tooling plans. - Build a small benchmark set around Baseball Bat, U-Joint Pin, Paper Airplane, and one practical print-ready part.
- Start indexing validated feature-tree audits and failure remediations as the first local retrieval corpus.
- Prototype transcript ingestion for a small number of SolidWorks how-to videos, indexed by operation type and feature family instead of by whole-video blobs.
- Select and highlight items from the feature tree so the LLM receives explicit modeling targets. For example: "make the baseplate @baseplate-boss bigger by 2.0 mm on the X axis" should pass
@baseplate-bossinto context so the model can choose the correct MCP tool and apply it to the correct sketch or feature. In assemblies, selecting the parts to mate in the tree should similarly provide grounded references so mate operations are applied to the intended components.
Success Criteria¶
We should consider the foundation solid when:
- the agent can classify the correct family for the benchmark set with high accuracy
- the human can see the evidence behind the chosen workflow
- simpler parts like Baseball Bat and U-Joint Pin can be rebuilt through an inspect-classify-delegate loop with minimal correction
- complex parts like Paper Airplane fail safely into “inspect more” or “VBA-backed sheet metal plan” instead of producing confidently wrong geometry
- the same foundation can support both sample-part reconstruction and practical 3D-printable design sessions