Interaction Modes
How the LLM Chooses
Each time an MCP session starts, the server sends a block of instructions via WithInstructions(). These instructions guide the LLM toward the right tool style for the current task. The text adapts to whichever mode is active — if a style isn't registered, it isn't mentioned.
In all mode (the default), the LLM sees:
```
This server provides access to the neuroflash API.

- Direct tools (list_twins, get_brand_voice, generate_text, etc.): Use for single, specific operations when you know exactly what you need.
- execute_plan + search_api: Use for multi-step workflows requiring 3+ API calls, conditional logic, parallel fan-out, or data transformation. Submit a typed JSON plan — the server executes it server-side.
- discover → query → compare: Use when exploring what's available or comparing data across resources. Start with discover if unsure which API to call.

Choose the simplest style that fits the task.
```
In plan mode, only the execute_plan bullet appears. In tools mode, only the direct tools bullet. This prevents the LLM from trying to use tools that aren't registered and keeps the instruction overhead minimal.
The WithInstructions() call is part of the MCP server initialization. The server builds the instruction string dynamically based on the configured MCP_MODE, so the LLM always receives guidance that matches the tools it can actually use. The user doesn't need to think about modes at all — they just ask their question and the LLM picks the best approach.
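The mode switch in that instruction builder is easy to picture. A minimal Python sketch of the idea (the function name, bullet texts, and structure are illustrative assumptions, not the server's actual code):

```python
def build_instructions(mode: str) -> str:
    """Assemble mode-aware instructions; only registered styles are mentioned."""
    bullets = []
    if mode in ("tools", "all"):
        bullets.append("- Direct tools: single, specific operations.")
    if mode in ("plan", "all"):
        bullets.append("- execute_plan + search_api: multi-step workflows.")
    if mode in ("layered", "all"):
        bullets.append("- discover -> query -> compare: exploration.")
    header = "This server provides access to the neuroflash API."
    footer = "Choose the simplest style that fits the task."
    return "\n".join([header, *bullets, footer])
```

Because the string is assembled from the same mode flag that controls tool registration, the guidance and the registered tools can never drift apart.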
Traditional Mode
MCP_MODE=tools — 90 tools — ~17k tokens
Each neuroflash API endpoint is registered as a separate MCP tool with its own full JSON schema, including all parameter names, types, descriptions, and required flags. The LLM calls tools like list_brand_voices, get_twin, generate_text, and get_workspace_quotas directly. execute_plan and search_api are also registered so the LLM can fall back to a plan when a workflow needs more than one call.
This mode is the best fit for simple, single-step tasks where the user knows exactly what they want. It has the highest token cost because all 90 schemas are injected into the context window on every request.
Plan Mode
MCP_MODE=plan — 2 tools — ~2k tokens
Plan Mode replaces multi-step orchestration with a single typed JSON DSL. Instead of round-tripping per call, the LLM submits one plan and the server runs every step server-side. This is the recommended mode for any workflow that needs more than one API call.
search_api
Keyword search against the endpoint catalog. The LLM uses this to discover available endpoints before writing a plan. No API calls are made — it reads from the in-process registry.
```
search_api(query="brand voice list")
→ GET /api/brand-voice-service/v1/workspaces/{workspace_id}/brand-voices → list brand voices
→ GET /api/brand-voice-service/v1/workspaces/{workspace_id}/brand-voices/{brand_voice_id} → get one
```

execute_plan
The LLM submits a typed JSON plan. The server validates it against the endpoint registry, then evaluates each step in order — calling the neuroflash API, transforming intermediate values, branching on conditions, fanning out in parallel, or looping over collections.
```json
{
  "version": "1",
  "steps": [
    {"id": "bv",
     "call": {"method": "GET", "path": "/api/brand-voice-service/v1/workspaces/{workspace_id}/brand-voices"},
     "args": {"workspace_id": "$ctx.workspace_id"}},
    {"id": "first", "let": {"op": "first", "from": "$bv.data"}},
    {"id": "post",
     "call": {"method": "POST", "path": "/api/ds-prototypes/content_generation/chat/completions"},
     "args": {"workspace_id": "$ctx.workspace_id", "brand_voice_id": "$first.id",
              "prompt": "Write a short blog post about Claude Desktop."}}
  ],
  "return": "$post"
}
```

The plan is precompiled before execution — paths are matched against the registry, references are resolved, and everything is bounded by hard limits before any API call is made.
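Reference resolution ($ctx.workspace_id, $bv.data, $first.id) amounts to a walk over previously produced values. A hedged Python sketch of the mechanism, not the server's actual resolver:

```python
def resolve(ref, ctx, results):
    """Resolve "$ctx.x" / "$step.field" style references (illustrative sketch)."""
    if not isinstance(ref, str) or not ref.startswith("$"):
        return ref  # literal value, pass through unchanged
    head, *path = ref[1:].split(".")
    # "$ctx.*" reads the execution context; anything else reads a prior step's output
    value = ctx if head == "ctx" else results[head]
    for key in path:
        value = value[key]
    return value
```

So after the let step above runs, "$first.id" simply looks up results["first"]["id"].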
Step Kinds
| Step | Purpose |
|---|---|
| call | Invoke a registered API endpoint with substituted args. |
| let | Apply a transformation op to a previous value (pick, map, filter, sort_by, take, count, first, last, format). |
| if / then / else | Branch on a predicate (eq, gt, in, exists, and, or, not, …). |
| foreach | Iterate up to 20 times over a list, accumulating results. |
| parallel | Fan out up to 5 independent calls concurrently. |
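The let ops are ordinary collection transforms. A sketch of how a few of them might behave (op names come from the table; these one-line implementations are assumptions, not the server's code):

```python
OPS = {
    "sort_by": lambda xs, key: sorted(xs, key=lambda x: x[key]),
    "take":    lambda xs, n: xs[:n],
    "count":   lambda xs: len(xs),
    "first":   lambda xs: xs[0],
    "last":    lambda xs: xs[-1],
}

# e.g. "pick the best-ranked twin" as two chained let steps
twins = [{"id": "a", "rank": 2}, {"id": "b", "rank": 1}]
best = OPS["first"](OPS["sort_by"](twins, "rank"))  # {"id": "b", "rank": 1}
```

Chaining ops through intermediate step ids is what lets one plan replace several round-trips of client-side list munging.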
Limits and Safety
- Total plan ≤ 32 KiB, ≤ 50 steps, nesting depth ≤ 5.
- Total runtime 60s, per-call 30s.
- Paths must match a registered endpoint exactly (full service prefix included). Invalid paths are rejected at validation time.
- Plans run server-side — no sandbox boot, no Python runtime, no extra network surface.
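These bounds are cheap to check before any step runs. A sketch of such a validation gate (the limit values come from the list above; the function shape and depth metric are assumptions):

```python
import json

MAX_BYTES, MAX_STEPS, MAX_DEPTH = 32 * 1024, 50, 5

def depth(node, d=0):
    """Nesting depth of a JSON value (dicts and lists count as one level each)."""
    if isinstance(node, dict):
        return max([depth(v, d + 1) for v in node.values()] or [d + 1])
    if isinstance(node, list):
        return max([depth(v, d + 1) for v in node] or [d + 1])
    return d

def check_limits(plan: dict) -> list:
    """Return a list of limit violations; empty means the plan may proceed."""
    errors = []
    if len(json.dumps(plan).encode()) > MAX_BYTES:
        errors.append("plan exceeds 32 KiB")
    if len(plan.get("steps", [])) > MAX_STEPS:
        errors.append("more than 50 steps")
    if depth(plan.get("steps", [])) > MAX_DEPTH:
        errors.append("nesting deeper than 5")
    return errors
```

Rejecting at validation time means a malformed or oversized plan costs nothing but the validation itself.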
Why Fewer Tokens
2 tools plus the compact endpoint reference (~2k tokens) versus 88 full JSON schemas (~17k tokens). Plan mode's overhead stays roughly constant no matter how many endpoints are added, so the saving only grows as the API does, and the LLM doesn't pay a round-trip for each step the way it would with sequential direct tool calls.
Exploratory Mode BETA
MCP_MODE=layered — 3 exploratory tools — ~500 tokens
Exploratory Mode uses progressive disclosure. The LLM loads context about the API incrementally — it does not need to know all 88 endpoints upfront. Three tools work together in sequence.
discover
Returns available domains and their actions. Without a domain argument, returns all 7 domains with action name lists. With a domain argument, returns full action details including method, description, and parameters.
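Progressive disclosure boils down to two views over one in-process catalog. A hedged Python sketch (the registry contents and parameter lists below are assumptions based on this section's examples):

```python
REGISTRY = {
    "usage": {
        "list_usage_types":     {"method": "GET", "params": []},
        "get_workspace_quotas": {"method": "GET", "params": ["workspace_id"]},
    },
    # ... the real catalog has 7 domains
}

def discover(domain=None):
    """No argument: cheap overview of action names. With a domain: full details."""
    if domain is None:
        return {d: sorted(actions) for d, actions in REGISTRY.items()}
    return REGISTRY[domain]
```

The overview call is what keeps the upfront cost near ~500 tokens: full parameter details are only paid for the one domain the LLM actually drills into.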
```
discover()
→ 7 domains: digital_twins, workspaces, brand_voice, audience, content, image, usage

discover(domain="usage")
→ domain: usage
→ actions: list_usage_types, get_workspace_quotas, get_workspace_quota
  (each with method, description, and parameter list)
```

query
Executes a single API call by specifying domain, action, and parameters.
```
query(domain="usage", action="get_workspace_quotas", params='{"workspace_id": "ws-123"}')
→ { "words_used": 12400, "words_limit": 50000, ... }
```

compare
Executes multiple queries concurrently and returns all results in order. This saves round-trips for tasks that need data from several sources at once. Up to 5 queries run in parallel. If one query fails, the others still complete — partial failure is handled gracefully.
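The fan-out-with-partial-failure behavior just described can be sketched with a thread pool (run_query and the error shape here are illustrative assumptions, not the server's implementation):

```python
from concurrent.futures import ThreadPoolExecutor

def compare(queries, run_query, max_parallel=5):
    """Run queries concurrently; report per-query status instead of failing the batch."""
    def attempt(q):
        try:
            return {"label": q["label"], "status": "ok", "data": run_query(q)}
        except Exception as exc:
            return {"label": q["label"], "status": "error", "error": str(exc)}
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(attempt, queries))  # map preserves input order
```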
```
compare(queries='[
  {"label": "Team A", "domain": "usage", "action": "get_workspace_quotas",
   "params": {"workspace_id": "ws-1"}},
  {"label": "Team B", "domain": "usage", "action": "get_workspace_quotas",
   "params": {"workspace_id": "ws-2"}}
]')
→ [
  {"label": "Team A", "status": "ok", "data": {...}},
  {"label": "Team B", "status": "ok", "data": {...}}
]
```

A Typical Exploratory Flow
```
discover() → 7 domains
discover(domain="brand_voice") → 19 actions with params
query(domain="brand_voice",
      action="list_brand_voices",
      params='{"workspace_id":"ws-123"}')
→ list of brand voices
compare(queries=[...]) → side-by-side results for multiple workspaces
```

Security
All modes enforce the same security model: authentication via OAuth, input validation, and rate limiting through nf-gateway.
Configuring Modes
MCP_MODE selects which tools are registered at startup. The mapping is:
| MCP_MODE | Registers |
|---|---|
| tools | 88 traditional tools + execute_plan + search_api |
| plan | execute_plan + search_api |
| layered | discover + query + compare + execute_plan + search_api |
| all | Everything (default) |
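Startup registration reduces to a lookup over this table. A sketch (tool names come from the table above; the function and the tool_N stand-ins are hypothetical):

```python
PLAN_TOOLS = ["execute_plan", "search_api"]
LAYERED_TOOLS = ["discover", "query", "compare"]
TRADITIONAL = [f"tool_{i}" for i in range(88)]  # stand-in for the 88 endpoint tools

def tools_for(mode: str) -> list:
    """Map MCP_MODE to the tool set registered at startup."""
    if mode == "tools":
        return TRADITIONAL + PLAN_TOOLS
    if mode == "plan":
        return list(PLAN_TOOLS)
    if mode == "layered":
        return LAYERED_TOOLS + PLAN_TOOLS
    if mode == "all":
        return TRADITIONAL + LAYERED_TOOLS + PLAN_TOOLS
    raise ValueError(f"unknown MCP_MODE: {mode}")
```

Rejecting unknown values at startup (rather than silently defaulting) makes a typo in MCP_MODE fail loudly instead of registering the wrong tool set.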
plan is the highest-efficiency choice for production use when full flexibility isn't needed. all is the default and registers every option.
See the Setup page for how to configure MCP_MODE.