solidworks_mcp.ui.local_llm¶
Local LLM integration helpers for the SolidWorks MCP UI.
Provides three layers of typed abstraction:
1. Hardware: detect GPU VRAM / system RAM and pick the right Gemma tier.
2. Config: LocalLLMConfig is the single source of truth for endpoint, model name, and tier choice, shared by the UI, server endpoints, and the pydantic-ai agent runner.
3. Agent runner: run_local_agent() mirrors _run_structured_agent in service.py but routes exclusively to a local Ollama server. Both accept any BaseModel subclass as result_type, so callers get a fully typed, validated response regardless of which backend they use.
Model tiers (Gemma 4 family via Ollama's OpenAI-compatible API):
- small: gemma4:e2b (~0-4 GB VRAM / CPU-capable), for edge and smoke tests
- balanced: gemma4:e4b (~8 GB VRAM), the recommended default for local planning
- large: gemma4:26b (~18 GB VRAM), for workstation-class local evaluation
Usage::

    from solidworks_mcp.ui.local_llm import probe_local_model, run_local_agent
    from solidworks_mcp.agents.schemas import ClarificationResponse

    probe = await probe_local_model()  # LocalModelProbeResult
    result = await run_local_agent(
        system_prompt="You are a SolidWorks CAD assistant.",
        user_prompt="How many sketch constraints are needed for a slot?",
        result_type=ClarificationResponse,
        config=probe.to_config(),
    )

Attributes¶
GEMMA_TIERS module-attribute ¶

    GEMMA_TIERS: dict[str, GemmaTierSpec] = {
        'small': GemmaTierSpec(
            ollama='gemma4:e2b',
            service='local:gemma4:e2b',
            label='Gemma 4 E2B (small — CPU or 4 GB VRAM)',
            min_vram_gb=0,
            min_ram_gb=8,
        ),
        'balanced': GemmaTierSpec(
            ollama='gemma4:e4b',
            service='local:gemma4:e4b',
            label='Gemma 4 E4B (balanced — 8 GB VRAM)',
            min_vram_gb=8,
            min_ram_gb=16,
        ),
        'large': GemmaTierSpec(
            ollama='gemma4:26b',
            service='local:gemma4:26b',
            label='Gemma 4 26B (large — 18 GB VRAM)',
            min_vram_gb=18,
            min_ram_gb=32,
        ),
    }

Classes¶
GemmaTierSpec ¶
Bases: BaseModel
Hardware and model metadata for a single Gemma inference tier.
Attributes:
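| Name | Type | Description |
|---|---|---|
| label | str | Human-readable label for the tier. |
| min_ram_gb | float | Minimum system RAM for the tier, in GB. |
| min_vram_gb | float | Minimum GPU VRAM for the tier, in GB (0 for the CPU-capable small tier). |
| ollama | str | Ollama model tag, e.g. gemma4:e2b. |
| service | str | Service-layer model identifier, e.g. local:gemma4:e2b. |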
LocalAgentResult ¶
Bases: BaseModel, Generic[_T]
Typed envelope wrapping a structured pydantic-ai agent response.
data holds the validated result_type instance; config echoes back the
LocalLLMConfig used so callers can log or audit provenance. Set success=False
and error when the agent returned a RecoverableFailure or raised an exception.
Attributes:
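| Name | Type | Description |
|---|---|---|
| config | LocalLLMConfig | The LocalLLMConfig used for the run, echoed back for logging or auditing. |
| data | Any | The validated result_type instance. |
| error | str \| None | Error message, set when success is False. |
| retry_hint | str \| None | Optional hint for a retry. |
| success | bool | False when the agent returned a RecoverableFailure or raised an exception. |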
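The envelope pattern described above can be sketched with the standard library alone. This is only an illustration: the real LocalAgentResult is a pydantic BaseModel and also carries config, while AgentEnvelope and unwrap are hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Generic, Optional, TypeVar

T = TypeVar("T")


@dataclass
class AgentEnvelope(Generic[T]):
    """Stdlib stand-in for LocalAgentResult (hypothetical; config field omitted)."""

    success: bool
    data: Optional[T] = None
    error: Optional[str] = None
    retry_hint: Optional[str] = None


def unwrap(result: AgentEnvelope[T]) -> T:
    """Return the payload, or raise with the recorded error message."""
    if not result.success or result.data is None:
        raise RuntimeError(result.error or "local agent returned no data")
    return result.data


# A successful run carries data; a failed run carries error instead.
ok = AgentEnvelope(success=True, data={"answer": "4"})
failed = AgentEnvelope(success=False, error="Ollama unreachable")
```

Callers that prefer exceptions can wrap the result in a helper like unwrap; callers that prefer explicit branching can check success directly.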
LocalLLMConfig ¶
Bases: BaseModel
Runtime configuration for a local Ollama LLM connection.
Passed from the probe result into run_local_agent() or directly into
_build_agent_model() in service.py to keep settings consistent across all layers (UI
state, server endpoints, pydantic-ai agent runner).
Attributes:
| Name | Type | Description |
|---|---|---|
| api_key | str | API key used for the connection. |
| endpoint | str | Ollama server endpoint. |
| ollama_model | str | Ollama model name, e.g. gemma4:e4b. |
| openai_endpoint | str | Ollama's OpenAI-compatible API endpoint. |
| service_model | str | Service-layer model identifier, e.g. local:gemma4:e4b. |
| tier | Literal['small', 'balanced', 'large'] | The selected model tier. |
Functions¶
from_env classmethod ¶
Build config from environment variables, falling back to defaults.
Returns:
| Name | Type | Description |
|---|---|---|
| LocalLLMConfig | LocalLLMConfig | Config built from environment variables, with defaults for anything unset. |
Source code in src/solidworks_mcp/ui/local_llm.py
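A minimal sketch of the environment-with-defaults pattern, using only the standard library. The variable names (EXAMPLE_*), the LocalConfigSketch class, the default endpoint, and the placeholder API key are all assumptions for illustration; the real from_env() may read different variables and use different defaults. The /v1 suffix is the path of Ollama's OpenAI-compatible API.

```python
import os
from dataclasses import dataclass

# Tier -> Ollama tag, copied from GEMMA_TIERS.
TIER_MODELS = {"small": "gemma4:e2b", "balanced": "gemma4:e4b", "large": "gemma4:26b"}


@dataclass(frozen=True)
class LocalConfigSketch:
    """Hypothetical stand-in for LocalLLMConfig."""

    tier: str
    endpoint: str
    ollama_model: str
    service_model: str
    openai_endpoint: str
    api_key: str


def config_from_env(env=None) -> LocalConfigSketch:
    env = os.environ if env is None else env
    # Hypothetical variable names; unknown tiers fall back to the default.
    tier = env.get("EXAMPLE_LLM_TIER", "balanced")
    if tier not in TIER_MODELS:
        tier = "balanced"
    # Assumed default: Ollama's standard local port.
    endpoint = env.get("EXAMPLE_OLLAMA_ENDPOINT", "http://localhost:11434").rstrip("/")
    model = env.get("EXAMPLE_OLLAMA_MODEL", TIER_MODELS[tier])
    return LocalConfigSketch(
        tier=tier,
        endpoint=endpoint,
        ollama_model=model,
        service_model=f"local:{model}",
        openai_endpoint=f"{endpoint}/v1",
        api_key=env.get("EXAMPLE_OLLAMA_API_KEY", "ollama"),  # placeholder value
    )
```

Accepting an optional mapping instead of reading os.environ directly keeps the function easy to test.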
LocalModelProbeResult ¶
Bases: BaseModel
Full hardware-detection and Ollama availability result.
Returned by probe_local_model() and serialised as the JSON response from GET
/api/ui/local-model/probe. The to_config() helper converts directly into a
LocalLLMConfig ready for run_local_agent().
Attributes:
| Name | Type | Description |
|---|---|---|
| all_tiers | dict[str, str] | All tiers, as a string-to-string mapping. |
| available | bool | Whether the Ollama server is available. |
| endpoint | str | The Ollama endpoint that was probed. |
| label | str | Human-readable label of the recommended tier. |
| ollama_model | str | Ollama model name for the recommended tier. |
| openai_endpoint | str | Ollama's OpenAI-compatible API endpoint. |
| pull_command | str | Command for pulling the recommended model. |
| pulled_models | list[str] | Model names currently pulled in Ollama. |
| ram_gb | float | Detected total system RAM, in GB. |
| service_model | str | Service-layer model identifier for the recommended tier. |
| status_message | str | Human-readable status message. |
| tier | Literal['small', 'balanced', 'large'] | The recommended tier. |
| tier_already_pulled | bool | Whether the recommended tier's model is already pulled. |
| vram_gb | float | Detected GPU VRAM, in GB (best-effort estimate). |
Functions¶
to_config ¶
Convert probe result into a ready-to-use LocalLLMConfig.
Returns:
| Name | Type | Description |
|---|---|---|
| LocalLLMConfig | LocalLLMConfig | A config ready to pass to run_local_agent(). |
Source code in src/solidworks_mcp/ui/local_llm.py
LocalModelPullRequest ¶
Bases: BaseModel
Request body for POST /api/ui/local-model/pull.
Attributes:
| Name | Type | Description |
|---|---|---|
| endpoint | str \| None | Optional Ollama endpoint. |
| model | str | Name of the model to pull. |
LocalModelPullResult ¶
Bases: BaseModel
Result from POST /api/ui/local-model/pull.
Attributes:
| Name | Type | Description |
|---|---|---|
| error | str \| None | Error message, if any. |
| model | str | Name of the model. |
| queued | bool | True when the pull was queued successfully. |
| response | dict[str, Any] \| None | Response payload, if any. |
LocalModelQueryRequest ¶
Bases: BaseModel
Request body for POST /api/ui/local-model/query.
Attributes:
| Name | Type | Description |
|---|---|---|
| endpoint | str \| None | Optional Ollama endpoint. |
| model | str \| None | Optional model name. |
| prompt | str | The user prompt. |
| system_prompt | str | The system prompt. |
Functions¶
_detect_gpu_vram_gb ¶
Return best-effort GPU VRAM estimate in GB, or 0.0 on failure.
Returns:
| Name | Type | Description |
|---|---|---|
| float | float | Estimated GPU VRAM in GB, or 0.0 on failure. |
Source code in src/solidworks_mcp/ui/local_llm.py
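One way to get a best-effort VRAM figure that degrades to 0.0 is to shell out to nvidia-smi. This is a sketch under that assumption, covering NVIDIA GPUs only; the module's actual detection method is not shown here, and parse_vram_mib and detect_gpu_vram_gb are illustrative names.

```python
import subprocess


def parse_vram_mib(output: str) -> float:
    """Convert nvidia-smi memory.total lines (MiB, one per GPU) to GB for the largest GPU."""
    values = []
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            values.append(float(line))
        except ValueError:
            continue  # skip entries such as "[N/A]"
    return max(values) / 1024.0 if values else 0.0


def detect_gpu_vram_gb() -> float:
    """Return the detected VRAM in GB, or 0.0 if nvidia-smi is missing or fails."""
    try:
        proc = subprocess.run(
            ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
            capture_output=True,
            text=True,
            timeout=5,
            check=True,
        )
    except (OSError, subprocess.SubprocessError):
        return 0.0
    return parse_vram_mib(proc.stdout)
```

Keeping the parsing separate from the subprocess call makes the fallback behaviour testable on machines without a GPU.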
_detect_system_ram_gb ¶
Return total system RAM in GB.
Returns:
| Name | Type | Description |
|---|---|---|
| float | float | Total system RAM in GB. |
Source code in src/solidworks_mcp/ui/local_llm.py
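Total RAM can be read without third-party packages on POSIX systems. The sketch below assumes os.sysconf is available and returns 0.0 otherwise (for example on Windows, where a real implementation would more likely use psutil or a Win32 call); it is not the module's actual code.

```python
import os


def detect_system_ram_gb() -> float:
    """Total physical RAM in GB via POSIX sysconf; 0.0 where unsupported."""
    try:
        page_size = os.sysconf("SC_PAGE_SIZE")
        page_count = os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        # os.sysconf is absent on Windows; unknown names raise ValueError.
        return 0.0
    if page_size <= 0 or page_count <= 0:
        return 0.0
    return page_size * page_count / (1024 ** 3)
```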
_ollama_health async ¶
Return True if the Ollama HTTP server is responding.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| endpoint | str | The Ollama endpoint to check. | OLLAMA_DEFAULT_ENDPOINT |

Returns:
| Name | Type | Description |
|---|---|---|
| bool | bool | True if the Ollama HTTP server is responding, otherwise False. |
Source code in src/solidworks_mcp/ui/local_llm.py
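A health check of this kind can be sketched with urllib run in a worker thread. The choice of /api/tags as the probed path, the default endpoint, and the function names are assumptions; the real helper may use a different HTTP client or path.

```python
import asyncio
import urllib.error
import urllib.request


def _get_ok(url: str, timeout: float) -> bool:
    """Blocking GET that reports success without raising."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error, or malformed URL.
        return False


async def ollama_health(endpoint: str = "http://localhost:11434", timeout: float = 2.0) -> bool:
    """Return True if the server answers GET /api/tags with HTTP 200."""
    url = endpoint.rstrip("/") + "/api/tags"
    # Run the blocking call in a thread so the event loop stays responsive.
    return await asyncio.to_thread(_get_ok, url, timeout)
```

Swallowing every network error and returning False keeps the probe safe to call from a UI route even when Ollama is not installed.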
_ollama_list_models async ¶
Return the list of model names currently pulled in Ollama.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| endpoint | str | The Ollama endpoint to query. | OLLAMA_DEFAULT_ENDPOINT |

Returns:
| Type | Description |
|---|---|
| list[str] | Names of the models currently pulled in Ollama. |
Source code in src/solidworks_mcp/ui/local_llm.py
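Ollama reports its local models from GET /api/tags as a JSON object with a "models" array whose entries carry a "name". A minimal sketch of turning that body into a name list follows; parse_model_names and is_pulled are hypothetical helpers, and the module's own parsing may differ.

```python
import json


def parse_model_names(body: str) -> list[str]:
    """Extract model names from an /api/tags response body; [] on bad input."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return []
    models = (payload.get("models") or []) if isinstance(payload, dict) else []
    return [m["name"] for m in models if isinstance(m, dict) and "name" in m]


def is_pulled(tag: str, pulled: list[str]) -> bool:
    """Check whether a tier's Ollama tag is already present locally."""
    return tag in pulled


sample = '{"models": [{"name": "gemma4:e4b"}, {"name": "gemma4:e2b"}]}'
names = parse_model_names(sample)
```

A check like is_pulled is the sort of comparison that could back the tier_already_pulled field of LocalModelProbeResult.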
probe_local_model async ¶
Probe Ollama for availability and return a typed recommendation result.
The returned LocalModelProbeResult can be forwarded directly as a FastAPI JSON
response (it is a BaseModel). Call .to_config() on the result to build a
LocalLLMConfig for run_local_agent().
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| endpoint | str \| None | The Ollama endpoint to probe. | None |

Returns:
| Name | Type | Description |
|---|---|---|
| LocalModelProbeResult | LocalModelProbeResult | Hardware detection, Ollama availability, and the recommended tier. |
Source code in src/solidworks_mcp/ui/local_llm.py
pull_ollama_model async ¶
Trigger an Ollama model pull. Runs in a thread; returns immediately.
Returns a typed LocalModelPullResult with queued=True on success.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| model | str | Name of the model to pull. | required |
| endpoint | str \| None | The Ollama endpoint to use. | None |

Returns:
| Name | Type | Description |
|---|---|---|
| LocalModelPullResult | LocalModelPullResult | The pull result, with queued=True on success. |
Source code in src/solidworks_mcp/ui/local_llm.py
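The "run in a thread, return immediately" behaviour can be sketched with urllib and threading against Ollama's POST /api/pull endpoint. The helper names, the default endpoint, and the choice to ignore errors in the worker are assumptions for illustration, not the module's actual implementation.

```python
import json
import threading
import urllib.request


def build_pull_request(model: str, endpoint: str = "http://localhost:11434") -> urllib.request.Request:
    """Build the POST /api/pull request for a model tag."""
    body = json.dumps({"model": model, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        endpoint.rstrip("/") + "/api/pull",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def queue_pull(model: str, endpoint: str = "http://localhost:11434") -> threading.Thread:
    """Start the pull in a daemon thread and return without waiting for it."""
    request = build_pull_request(model, endpoint)

    def _worker() -> None:
        try:
            with urllib.request.urlopen(request) as resp:
                resp.read()
        except OSError:
            pass  # a real implementation would record the failure

    thread = threading.Thread(target=_worker, daemon=True)
    thread.start()
    return thread
```

Because the download can take minutes, the caller only learns that the pull was queued; checking the pulled model list afterwards confirms completion.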
recommend_model_tier ¶
Return 'small' | 'balanced' | 'large' based on available hardware.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| vram_gb | float | Available GPU VRAM, in GB. | 0.0 |
| ram_gb | float | Available system RAM, in GB. | 0.0 |

Returns:
| Name | Type | Description |
|---|---|---|
| str | str | The recommended tier: 'small', 'balanced', or 'large'. |
Source code in src/solidworks_mcp/ui/local_llm.py
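One plausible selection rule, sketched with the thresholds listed in GEMMA_TIERS: pick the largest tier whose VRAM and RAM minimums are both met, and fall back to small otherwise. Requiring both minimums is an assumption; the real recommend_model_tier() may weigh VRAM and RAM differently.

```python
# (min_vram_gb, min_ram_gb) per tier, copied from GEMMA_TIERS.
TIER_MINIMUMS = {
    "small": (0, 8),
    "balanced": (8, 16),
    "large": (18, 32),
}


def recommend_tier(vram_gb: float = 0.0, ram_gb: float = 0.0) -> str:
    """Return the largest tier whose minimums are satisfied (assumed rule)."""
    for tier in ("large", "balanced"):
        min_vram, min_ram = TIER_MINIMUMS[tier]
        if vram_gb >= min_vram and ram_gb >= min_ram:
            return tier
    # Small is the CPU-capable fallback, so it is returned unconditionally.
    return "small"
```

Under this rule a machine with a 24 GB GPU but only 16 GB of RAM lands on balanced rather than large.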
run_local_agent async ¶
run_local_agent(*, system_prompt: str, user_prompt: str, result_type: type[_T], config: LocalLLMConfig | None = None, rag_query: str | None = None, rag_namespace: str = 'solidworks-api-docs') -> LocalAgentResult[_T]
Run a pydantic-ai Agent against the local Ollama server and return a typed
LocalAgentResult.
This mirrors _run_structured_agent in service.py but is self-contained in this
module, so any layer (UI route, service function, or CLI) can call local inference
without importing the full service graph.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| system_prompt | str | Instruction preamble for the LLM. | required |
| user_prompt | str | The concrete question or task. | required |
| result_type | type[_T] | A BaseModel subclass. pydantic-ai validates the LLM output against this schema and retries automatically on parse failures. | required |
| config | LocalLLMConfig \| None | Connection settings. Defaults to LocalLLMConfig.from_env(). | None |
| rag_query | str \| None | If provided, the FAISS solidworks-api-docs namespace is queried with this string and the top results are prepended to system_prompt as grounded API context for the model. Pass the same text as user_prompt for a simple "augment with API docs" pattern, or a more specific sub-query for targeted retrieval. | None |
| rag_namespace | str | FAISS namespace to query when rag_query is set. Defaults to "solidworks-api-docs" (the COM/VBA surface index). | 'solidworks-api-docs' |

Returns:
| Type | Description |
|---|---|
| LocalAgentResult[_T] | success=True with data set to a validated result_type instance, or success=False with an error message. |
Source code in src/solidworks_mcp/ui/local_llm.py
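The rag_query behaviour, where retrieved snippets are prepended to system_prompt, can be illustrated without the FAISS index. In this sketch augment_system_prompt, the snippet numbering, and the header text are all hypothetical; only the "prepend top results to the system prompt" idea comes from the description above.

```python
def augment_system_prompt(system_prompt: str, snippets: list[str], max_snippets: int = 3) -> str:
    """Prepend up to max_snippets non-empty retrieval results to the system prompt."""
    selected = [s.strip() for s in snippets if s.strip()][:max_snippets]
    if not selected:
        # Nothing retrieved: leave the prompt untouched.
        return system_prompt
    context = "\n\n".join(f"[{i}] {text}" for i, text in enumerate(selected, start=1))
    return f"Relevant SolidWorks API context:\n{context}\n\n{system_prompt}"
```

Capping the number of snippets keeps the augmented prompt small, which matters more for the small and balanced tiers than for hosted models.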