Interaction Modes

How the LLM Chooses

Each time an MCP session starts, the server sends a block of instructions via WithInstructions(). These instructions guide the LLM toward the right tool style for the current task. The text adapts to whichever mode is active — if a style isn't registered, it isn't mentioned.

In all mode (the default), the LLM sees:

text
This server provides access to the neuroflash API.

Choose the simplest style that fits the task.

In code mode, only the execute_code bullet appears. In tools mode, only the direct tools bullet. This prevents the LLM from trying to use tools that aren't registered and keeps the instruction overhead minimal.

The WithInstructions() call is part of the MCP server initialization. The server builds the instruction string dynamically based on the configured MCP_MODE, so the LLM always receives guidance that matches the tools it can actually use. The user doesn't need to think about modes at all — they just ask their question and the LLM picks the best approach.
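The mode-dependent instruction building can be pictured with a short sketch. The function name build_instructions and the bullet wording are illustrative assumptions; only the MCP_MODE values and the framing sentences come from this page:

```python
# Hypothetical sketch of mode-dependent instruction building.
# Bullet texts are placeholders, not the server's actual wording.
BULLETS = {
    "tools": "- Call the direct per-endpoint tools for simple, single-step tasks.",
    "code": "- Use search_api + execute_code to script multi-step work in Python.",
    "layered": "- Use discover/query/compare to explore the API progressively.",
}

def build_instructions(mode: str) -> str:
    header = "This server provides access to the neuroflash API.\n"
    if mode == "all":
        styles = ["tools", "code", "layered"]
    elif mode == "smart":
        styles = ["code", "layered"]
    else:
        styles = [mode]  # only the registered style is mentioned
    bullets = "\n".join(BULLETS[s] for s in styles)
    return header + bullets + "\nChoose the simplest style that fits the task."

print(build_instructions("code"))
```

In code mode only the execute_code bullet survives, so the LLM is never pointed at tools that were not registered.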


Traditional Mode

MCP_MODE=tools — 46 tools — ~10k tokens

Each neuroflash API endpoint is registered as a separate MCP tool with its own full JSON schema, including all parameter names, types, descriptions, and required flags. The LLM calls tools like list_brand_voices, get_twin, generate_text, and get_workspace_quotas directly.

This mode is the best fit for simple, single-step tasks where the user knows exactly what they want. It has the highest token cost because all 46 schemas are injected into the context window on every request.
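For a sense of where those ~10k tokens go, a single tool registration might look like the following. The schema shown is a plausible shape for list_brand_voices, not the server's exact definition:

```python
# Hypothetical JSON schema for one of the 46 direct tools.
# Field descriptions are illustrative assumptions.
list_brand_voices_tool = {
    "name": "list_brand_voices",
    "description": "List all brand voices in a workspace.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "workspace_id": {
                "type": "string",
                "description": "ID of the workspace to list brand voices for.",
            },
        },
        "required": ["workspace_id"],
    },
}
# A schema of roughly this size, repeated for 46 endpoints and
# injected on every request, lands in the ~10k-token range.
```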


Code Mode BETA

MCP_MODE=code — 2 tools — ~1k tokens

Instead of 46 individual tool schemas, Code Mode gives the LLM two tools and a compact 47-line API reference. The LLM writes Python code; the server executes it.

search_api

Keyword search against the endpoint catalog. The LLM uses this to discover available endpoints before writing code. No API calls are made — it reads from the in-process registry.

text
search_api(query="brand voice list")
→ nf.brand_voice.list_brand_voices(workspace_id) → List all brand voices in a workspace
→ nf.brand_voice.get_brand_voice(workspace_id, brand_voice_id) → ...
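A minimal keyword search over an in-process registry could look like the sketch below. The registry contents and the scoring rule are assumptions; the real catalog covers all 46 endpoints:

```python
# Hypothetical in-process endpoint registry: (signature, description) pairs.
REGISTRY = [
    ("nf.brand_voice.list_brand_voices(workspace_id)",
     "List all brand voices in a workspace"),
    ("nf.usage.get_workspace_quotas(workspace_id)",
     "Get word quotas for a workspace"),
]

def search_api(query: str) -> list[str]:
    """Rank registry entries by how many query keywords they contain."""
    words = query.lower().split()
    scored = []
    for signature, desc in REGISTRY:
        haystack = (signature + " " + desc).lower()
        score = sum(1 for w in words if w in haystack)
        if score:  # drop entries matching no keyword at all
            scored.append((score, f"{signature} -> {desc}"))
    # Highest keyword overlap first; no network or API calls involved.
    return [line for _, line in sorted(scored, reverse=True)]

print(search_api("brand voice list"))
```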

execute_code

The LLM writes Python code using the nf module. The code runs in a sandboxed subprocess. All API calls go through nf.* functions; there is no direct network access from the sandbox.

python
import nf

voices = nf.brand_voice.list_brand_voices(workspace_id="ws-123")
quotas = nf.usage.get_workspace_quotas(workspace_id="ws-123")

print(f"Brand voices: {len(voices)}")
print(f"Words used: {quotas['words_used']}/{quotas['words_limit']}")

print() output is returned as the tool result.

The sandbox also holds no credentials: every nf.* call is forwarded to the server, which validates and executes it on the code's behalf.
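Server-side, execute_code can be pictured as a child interpreter launched with a stripped environment. The timeout and env handling below are assumptions for illustration; the real sandbox additionally blocks network access and injects the nf module:

```python
import subprocess
import sys

def execute_code(code: str, timeout: float = 10.0) -> str:
    """Run LLM-authored code in a subprocess and return its stdout.

    Hypothetical sketch: env={} drops all inherited credentials.
    """
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True, text=True, timeout=timeout, env={},
    )
    if result.returncode != 0:
        return f"error:\n{result.stderr}"
    return result.stdout  # print() output becomes the tool result

print(execute_code("print(2 + 2)"))
```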

Why Fewer Tokens

2 tools + the 47-line api_reference.txt (~1k tokens) versus 46 full JSON schemas (~10k tokens). Because the two-tool surface never grows, the context cost stays roughly constant as endpoints are added, so the saving widens with the size of the API.


Exploratory Mode BETA

MCP_MODE=layered — 3 tools — ~500 tokens

Exploratory Mode uses progressive disclosure. The LLM loads context about the API incrementally — it does not need to know all 46 endpoints upfront. Three tools work together in sequence.

discover

Returns available domains and their actions. Without a domain argument, returns all 7 domains with action name lists. With a domain argument, returns full action details including method, description, and parameters.

text
discover()
→ 7 domains: digital_twins, workspaces, brand_voice, audience, content, image, usage

discover(domain="usage")
→ domain: usage
→ actions: list_usage_types, get_workspace_quotas, get_workspace_quota
   (each with method, description, and parameter list)
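The two-level shape of discover can be sketched over a small registry. The catalog below is abbreviated to one domain with illustrative metadata; the real server exposes all 7 domains:

```python
# Hypothetical action catalog, abbreviated to the "usage" domain.
CATALOG = {
    "usage": {
        "get_workspace_quotas": {
            "method": "GET",
            "description": "Get word quotas for a workspace",
            "params": ["workspace_id"],
        },
        "list_usage_types": {
            "method": "GET",
            "description": "List available usage types",
            "params": [],
        },
    },
}

def discover(domain=None):
    if domain is None:
        # Level 1: domain names with bare action-name lists.
        return {d: sorted(actions) for d, actions in CATALOG.items()}
    # Level 2: full action details for a single domain.
    return CATALOG[domain]

print(discover())
print(discover(domain="usage"))
```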

query

Executes a single API call by specifying domain, action, and parameters.

text
query(domain="usage", action="get_workspace_quotas", params='{"workspace_id": "ws-123"}')
→ { "words_used": 12400, "words_limit": 50000, ... }

compare

Executes multiple queries concurrently and returns all results in order. This saves round-trips for tasks that need data from several sources at once. Up to 5 queries run in parallel. If one query fails, the others still complete — partial failure is handled gracefully.

text
compare(queries='[
  {"label": "Team A", "domain": "usage", "action": "get_workspace_quotas",
   "params": {"workspace_id": "ws-1"}},
  {"label": "Team B", "domain": "usage", "action": "get_workspace_quotas",
   "params": {"workspace_id": "ws-2"}}
]')
→ [
    {"label": "Team A", "status": "ok", "data": {...}},
    {"label": "Team B", "status": "ok", "data": {...}}
  ]
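Concurrent fan-out with graceful partial failure can be sketched with a thread pool. The 5-worker cap and the ok/error result shape come from this page; run_query and its stub behavior are assumptions standing in for the real dispatcher:

```python
import json
from concurrent.futures import ThreadPoolExecutor

def run_query(domain: str, action: str, params: dict) -> dict:
    # Stand-in for the real dispatcher; raises to simulate a failed call.
    if domain != "usage":
        raise ValueError(f"unknown domain: {domain}")
    return {"words_used": 12400, "words_limit": 50000}

def compare(queries: str) -> list[dict]:
    """Run up to 5 queries in parallel; one failure doesn't sink the batch."""
    specs = json.loads(queries)

    def one(spec: dict) -> dict:
        try:
            data = run_query(spec["domain"], spec["action"], spec.get("params", {}))
            return {"label": spec["label"], "status": "ok", "data": data}
        except Exception as exc:
            return {"label": spec["label"], "status": "error", "error": str(exc)}

    with ThreadPoolExecutor(max_workers=5) as pool:
        return list(pool.map(one, specs))  # map preserves input order

print(compare('[{"label": "Team A", "domain": "usage", '
              '"action": "get_workspace_quotas", '
              '"params": {"workspace_id": "ws-1"}}]'))
```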

A Typical Exploratory Flow

text
discover()                          → 7 domains
discover(domain="brand_voice")      → 8 actions with params
query(domain="brand_voice",
      action="list_brand_voices",
      params='{"workspace_id":"ws-123"}')
                                    → list of brand voices
compare(queries=[...])              → side-by-side results for multiple workspaces

Security

All modes enforce the same security model: authentication via OAuth, input validation, and rate limiting through nf-gateway.


Configuring Modes

MCP_MODE selects which tools are registered at startup. The mapping is:

MCP_MODE   Registers
tools      46 traditional tools
code       search_api + execute_code
layered    discover + query + compare
smart      All 5 of the above alternative tools (no traditional tools)
all        All 51 tools (traditional + both alternative sets)

smart keeps both alternative styles while avoiding the ~10k-token cost of the traditional schemas, making it a strong production choice when the direct tools aren't needed. all is the default and gives the LLM every available option.
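Startup registration can be pictured as a dispatch over MCP_MODE. The register_* helpers below are placeholders for the actual registration code; here they just report what would be registered:

```python
import os

# Hypothetical registration helpers; each would register its tool set
# with the MCP server.
def register_traditional():
    return ["<46 traditional tools>"]

def register_code():
    return ["search_api", "execute_code"]

def register_layered():
    return ["discover", "query", "compare"]

def register_tools(mode: str) -> list[str]:
    registry = {
        "tools":   [register_traditional],
        "code":    [register_code],
        "layered": [register_layered],
        "smart":   [register_code, register_layered],  # 5 alternative tools
        "all":     [register_traditional, register_code, register_layered],
    }
    tools = []
    for register in registry[mode]:
        tools.extend(register())
    return tools

print(register_tools(os.environ.get("MCP_MODE", "all")))
```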

See the Setup page for how to configure MCP_MODE.