Interaction Modes
How the LLM Chooses
Each time an MCP session starts, the server sends a block of instructions via WithInstructions(). These instructions guide the LLM toward the right tool style for the current task. The text adapts to whichever mode is active — if a style isn't registered, it isn't mentioned.
In all mode (the default), the LLM sees:
This server provides access to the neuroflash API.
- Direct tools (list_twins, get_brand_voice, generate_text, etc.): Use for single, specific operations when you know exactly what you need.
- execute_code + search_api: Use for multi-step workflows requiring 3+ API calls, conditional logic, or data transformation. Write Python code — the server executes it in a sandbox.
- discover → query → compare: Use when exploring what's available or comparing data across resources. Start with discover if unsure which API to call.
Choose the simplest style that fits the task.
In code mode, only the execute_code bullet appears. In tools mode, only the direct tools bullet. This prevents the LLM from trying to use tools that aren't registered and keeps the instruction overhead minimal.
The WithInstructions() call is part of the MCP server initialization. The server builds the instruction string dynamically based on the configured MCP_MODE, so the LLM always receives guidance that matches the tools it can actually use. The user doesn't need to think about modes at all — they just ask their question and the LLM picks the best approach.
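A minimal sketch of this mode-dependent instruction builder, assuming a simple mode-to-bullets mapping (the bullet texts are abridged from the instructions shown above; the real server assembles this string in its own initialization code):

```python
# Hypothetical sketch: build the WithInstructions() text from MCP_MODE.
# Bullet wording is abridged from this page; the mapping is an assumption.
DIRECT = "- Direct tools (list_twins, get_brand_voice, ...): single, specific operations."
CODE = "- execute_code + search_api: multi-step workflows; write Python, runs in a sandbox."
LAYERED = "- discover -> query -> compare: exploring or comparing data across resources."

BULLETS = {
    "tools": [DIRECT],
    "code": [CODE],
    "layered": [LAYERED],
    "smart": [CODE, LAYERED],          # alternative tools only
    "all": [DIRECT, CODE, LAYERED],    # everything (the default)
}

def build_instructions(mode: str) -> str:
    """Return the instruction block for the given MCP_MODE."""
    lines = ["This server provides access to the neuroflash API."]
    lines += BULLETS[mode]
    if len(BULLETS[mode]) > 1:
        # Only worth saying when the LLM actually has a choice to make.
        lines.append("Choose the simplest style that fits the task.")
    return "\n".join(lines)

print(build_instructions("code"))
```

Because unregistered styles never appear in the string, the LLM cannot be tempted toward tools that do not exist in the current session.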
Traditional Mode
MCP_MODE=tools — 46 tools — ~10k tokens
Each neuroflash API endpoint is registered as a separate MCP tool with its own full JSON schema, including all parameter names, types, descriptions, and required flags. The LLM calls tools like list_brand_voices, get_twin, generate_text, and get_workspace_quotas directly.
This mode is the best fit for simple, single-step tasks where the user knows exactly what they want. It has the highest token cost because all 46 schemas are injected into the context window on every request.
Code Mode BETA
MCP_MODE=code — 2 tools — ~1k tokens
Instead of 46 individual tool schemas, Code Mode gives the LLM two tools and a compact 47-line API reference. The LLM writes Python code; the server executes it.
search_api
Keyword search against the endpoint catalog. The LLM uses this to discover available endpoints before writing code. No API calls are made — it reads from the in-process registry.
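One way such a keyword lookup could work is sketched below; the registry entries are illustrative stand-ins, not the full catalog, and the matching rule (all keywords must appear) is an assumption:

```python
# Hypothetical sketch of search_api: match keywords against an in-process
# endpoint registry. Entries here are illustrative, not the real catalog.
REGISTRY = [
    ("nf.brand_voice.list_brand_voices(workspace_id)",
     "List all brand voices in a workspace"),
    ("nf.brand_voice.get_brand_voice(workspace_id, brand_voice_id)",
     "Get a single brand voice"),
    ("nf.usage.get_workspace_quotas(workspace_id)",
     "Get quota usage for a workspace"),
]

def search_api(query: str) -> list[str]:
    """Return entries whose signature or description contains every keyword."""
    words = query.lower().split()
    hits = []
    for sig, desc in REGISTRY:
        haystack = (sig + " " + desc).lower()
        if all(w in haystack for w in words):
            hits.append(f"{sig} -> {desc}")
    return hits

for line in search_api("brand voice list"):
    print(line)
```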
search_api(query="brand voice list")
→ nf.brand_voice.list_brand_voices(workspace_id) → List all brand voices in a workspace
→ nf.brand_voice.get_brand_voice(workspace_id, brand_voice_id) → ...

execute_code
The LLM writes Python code using the nf module. The code runs in a sandboxed subprocess. All API calls go through nf.* functions; there is no direct network access from the sandbox.
import nf
voices = nf.brand_voice.list_brand_voices(workspace_id="ws-123")
quotas = nf.usage.get_workspace_quotas(workspace_id="ws-123")
print(f"Brand voices: {len(voices)}")
print(f"Words used: {quotas['words_used']}/{quotas['words_limit']}")

print() output is returned as the tool result.
The code runs in an isolated sandbox with no network access or credentials — all API calls are validated and executed by the server.
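The capture-the-stdout mechanics can be sketched as below, assuming a plain Python subprocess in isolated mode; the real server additionally injects the nf module, withholds credentials, and blocks network access, none of which this sketch implements:

```python
# Hypothetical sketch of execute_code: run untrusted code in a separate
# Python process and return whatever it prints. `-I` runs Python in
# isolated mode (no user site-packages, no environment-based path tricks);
# a production sandbox needs much stronger isolation than this.
import subprocess
import sys

def execute_code(code: str, timeout: float = 5.0) -> str:
    """Run `code` in a fresh isolated interpreter and return its stdout."""
    result = subprocess.run(
        [sys.executable, "-I", "-c", code],
        capture_output=True, text=True, timeout=timeout,
    )
    if result.returncode != 0:
        return f"error:\n{result.stderr}"
    return result.stdout

print(execute_code("print(2 + 2)"))
```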
Why Fewer Tokens
2 tools + the 47-line api_reference.txt (1k tokens) versus 46 full JSON schemas (10k tokens). The token saving is constant regardless of how many endpoints are added.
Exploratory Mode BETA
MCP_MODE=layered — 3 tools — ~500 tokens
Exploratory Mode uses progressive disclosure. The LLM loads context about the API incrementally — it does not need to know all 46 endpoints upfront. Three tools work together in sequence.
discover
Returns available domains and their actions. Without a domain argument, returns all 7 domains with action name lists. With a domain argument, returns full action details including method, description, and parameters.
discover()
→ 7 domains: digital_twins, workspaces, brand_voice, audience, content, image, usage
discover(domain="usage")
→ domain: usage
→ actions: list_usage_types, get_workspace_quotas, get_workspace_quota
(each with method, description, and parameter list)

query
Executes a single API call by specifying domain, action, and parameters.
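The discover/query pair can be sketched over a tiny in-process registry; the domain, action, and handler below are illustrative stand-ins that return canned data instead of calling the real API:

```python
# Hypothetical sketch of the layered discover/query pair. The registry
# shape (domain -> action -> spec) is an assumption; handlers return
# canned data rather than making real API calls.
import json

REGISTRY = {
    "usage": {
        "get_workspace_quotas": {
            "method": "GET",
            "description": "Get quota usage for a workspace",
            "params": ["workspace_id"],
            "handler": lambda p: {"words_used": 12400, "words_limit": 50000},
        },
    },
}

def discover(domain=None):
    """No argument: domain -> action names. With a domain: full action specs."""
    if domain is None:
        return {d: sorted(actions) for d, actions in REGISTRY.items()}
    return {a: {k: v for k, v in spec.items() if k != "handler"}
            for a, spec in REGISTRY[domain].items()}

def query(domain: str, action: str, params: str):
    """Execute one action with JSON-encoded params."""
    spec = REGISTRY[domain][action]
    return spec["handler"](json.loads(params))

print(discover())
print(query("usage", "get_workspace_quotas", '{"workspace_id": "ws-123"}'))
```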
query(domain="usage", action="get_workspace_quotas", params='{"workspace_id": "ws-123"}')
→ { "words_used": 12400, "words_limit": 50000, ... }

compare
Executes multiple queries concurrently and returns all results in order. This saves round-trips for tasks that need data from several sources at once. Up to 5 queries run in parallel. If one query fails, the others still complete — partial failure is handled gracefully.
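The fan-out and partial-failure behavior might look like the following sketch, where run_query is a hypothetical stand-in for the server's real per-query executor:

```python
# Hypothetical sketch of compare: run queries concurrently (capped at 5
# workers), keep results in input order, and report failures per label
# instead of failing the whole call.
import json
from concurrent.futures import ThreadPoolExecutor

def run_query(q: dict) -> dict:
    # Stand-in executor: fail for one workspace, return canned data otherwise.
    if q["params"]["workspace_id"] == "ws-missing":
        raise KeyError("workspace not found")
    return {"words_used": 12400, "words_limit": 50000}

def compare(queries: str) -> list[dict]:
    items = json.loads(queries)

    def run_one(q):
        try:
            return {"label": q["label"], "status": "ok", "data": run_query(q)}
        except Exception as e:
            # One bad query must not sink the others.
            return {"label": q["label"], "status": "error", "error": str(e)}

    with ThreadPoolExecutor(max_workers=5) as pool:
        return list(pool.map(run_one, items))  # map preserves input order

results = compare(json.dumps([
    {"label": "Team A", "params": {"workspace_id": "ws-1"}},
    {"label": "Team B", "params": {"workspace_id": "ws-missing"}},
]))
for r in results:
    print(r["label"], r["status"])
```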
compare(queries='[
{"label": "Team A", "domain": "usage", "action": "get_workspace_quotas",
"params": {"workspace_id": "ws-1"}},
{"label": "Team B", "domain": "usage", "action": "get_workspace_quotas",
"params": {"workspace_id": "ws-2"}}
]')
→ [
{"label": "Team A", "status": "ok", "data": {...}},
{"label": "Team B", "status": "ok", "data": {...}}
]

A Typical Exploratory Flow
discover() → 7 domains
discover(domain="brand_voice") → 8 actions with params
query(domain="brand_voice",
action="list_brand_voices",
params='{"workspace_id":"ws-123"}')
→ list of brand voices
compare(queries=[...]) → side-by-side results for multiple workspaces

Security
All modes enforce the same security model: authentication via OAuth, input validation, and rate limiting through nf-gateway.
Configuring Modes
MCP_MODE selects which tools are registered at startup. The mapping is:
| MCP_MODE | Registers |
|---|---|
| tools | 46 traditional tools |
| code | search_api + execute_code |
| layered | discover + query + compare |
| smart | All 5 alternative tools (search_api, execute_code, discover, query, compare); no traditional tools |
| all | All 51 tools (traditional + both alternative sets) |
smart is the highest-efficiency choice for production use when full flexibility isn't needed. all is the default and gives the LLM every available option.
See the Setup page for how to configure MCP_MODE.