MCP Session and Auth Troubleshooting Runbook

Use this runbook when MCP clients receive 400 or 401 responses, see tools missing from tools/list, or operate in an unexpected tenant/business context.

Scope covered by this runbook:

  • apps/backend/src/mcp/interfaces/mcp.controller.ts
  • apps/backend/src/mcp/interfaces/guards/mcp-auth.guard.ts
  • apps/backend/src/mcp/interfaces/guards/mcp-api-key.guard.ts
  • apps/backend/src/mcp/interfaces/guards/mcp-token.guard.ts
  • apps/backend/src/mcp/application/mcp-token.service.ts
  • apps/backend/src/mcp/application/mcp-session.service.ts

1. Symptom triage

Start from the first matching symptom:

| Symptom | Likely layer |
|---|---|
| 401 Authentication required: provide a valid MCP API key or access token on POST /mcp | McpAuthGuard (both API key and JWT rejected) |
| 401 on POST /mcp/token | Firebase ID token validation in McpTokenService.exchangeFirebaseToken |
| 401 on POST /mcp/token/refresh | Invalid MCP JWT or refresh outside 7-day grace window |
| 400 mcp-session-id header required | Request is not initialize and missing session header |
| 400 Unknown session ID | Session not found in in-memory map (often routing/affinity issue) |
| Tool missing from tools/list | Scope or role filtering in ToolRegistry.listFor |
| Tool appears but returns isError: true | Tool handler validation or owning module use-case error |
| set_active_business has no effect | Session principal update not persisted or new session started |
| AI client expects /mcp/oauth/authorize or /mcp/oauth/token | Client is following a proposed OAuth design, not the current backend implementation |

2. Validate endpoint usage quickly

Expected flow

  1. POST /mcp/token (V2 only) or obtain V1 API key
  2. POST /mcp with initialize body and Authorization
  3. Save mcp-session-id response header
  4. POST /mcp tool calls with both Authorization and mcp-session-id
  5. Optional: GET /mcp SSE with both headers
  6. DELETE /mcp with both headers
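
The header and body rules for this flow can be sketched as small request builders. This is a minimal sketch, not the backend's own client: the Accept header, the protocolVersion value, and the clientInfo fields are assumptions based on the MCP Streamable HTTP convention, and `runbook-probe` is a placeholder name.

```python
import json

# Host taken from the curl examples in this runbook.
MCP_URL = "https://api.flowandgrow.tech/mcp"


def build_headers(credential, session_id=None):
    """Headers for a /mcp request; session_id is omitted only for initialize."""
    headers = {
        "Authorization": f"Bearer {credential}",
        "Content-Type": "application/json",
        # Assumption: the server follows the MCP Streamable HTTP convention.
        "Accept": "application/json, text/event-stream",
    }
    if session_id is not None:
        headers["mcp-session-id"] = session_id
    return headers


def initialize_body(request_id=1):
    """JSON-RPC initialize request (step 2). Version and clientInfo are placeholders."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "initialize",
        "params": {
            "protocolVersion": "2025-03-26",
            "capabilities": {},
            "clientInfo": {"name": "runbook-probe", "version": "0.0.1"},
        },
    })


def tools_list_body(request_id=2):
    """JSON-RPC tools/list request, sent with both headers (step 4)."""
    return json.dumps({"jsonrpc": "2.0", "id": request_id, "method": "tools/list"})
```

Step 2 uses `build_headers(credential)` with `initialize_body()`; every later call uses `build_headers(credential, session_id)` with the mcp-session-id value saved in step 3.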

Common misuse checks

  • GET /mcp and DELETE /mcp also require Authorization (same McpAuthGuard as POST /mcp)
  • Missing mcp-session-id is valid only for the initial initialize request
  • Reusing a session ID after instance restart or route change will fail with Unknown session ID
  • If clients auto-discover OAuth metadata, verify:
    • GET /.well-known/oauth-authorization-server
    • GET /.well-known/oauth-protected-resource
  • Current discovery metadata points to POST /mcp/token; there is no current /mcp/oauth/authorize or /mcp/oauth/token controller.
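
The header checks above can be run against a captured request before looking at the backend. The following is a hypothetical client-side helper that only encodes the rules listed in this section; it does not reproduce McpAuthGuard.

```python
def preflight(method, headers, rpc_method=None):
    """Return the header problems for a request to /mcp, per the checks above."""
    names = {name.lower() for name in headers}  # HTTP header names are case-insensitive
    problems = []
    if "authorization" not in names:
        problems.append("missing Authorization")
    # Only the initial initialize POST may omit the session header.
    is_initialize = method == "POST" and rpc_method == "initialize"
    if "mcp-session-id" not in names and not is_initialize:
        problems.append("missing mcp-session-id")
    return problems
```

For example, `preflight("GET", {"mcp-session-id": "abc"})` reports the missing Authorization header that an SSE client may have dropped.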

3. 401 debugging checklist

A) V1 API key path

  1. Confirm header format:
     Authorization: Bearer fp_mcp_<hex>
  2. Verify key status through API:
     curl -s "https://api.flowandgrow.tech/mcp/keys?businessId=<business-uuid>" \
       -H "Authorization: Bearer <firebase-id-token>"
  3. Confirm:
    • isActive = true
    • expiresAt is null or in the future
    • key scopes include required tool scopes
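
The three confirmation checks can be applied to a key record programmatically. This sketch assumes each record returned by GET /mcp/keys is a JSON object with `isActive`, `expiresAt` (an ISO 8601 string with a timezone, or null), and `scopes` (a list of strings); the exact response shape is not confirmed by this runbook.

```python
from datetime import datetime, timezone


def key_problems(key, required_scopes, now=None):
    """Apply the three checks above to one key record (assumed shape, see lead-in)."""
    now = now or datetime.now(timezone.utc)
    problems = []
    if not key.get("isActive"):
        problems.append("key is inactive")
    expires_at = key.get("expiresAt")
    if expires_at is not None:
        # Normalise a trailing Z so datetime.fromisoformat accepts it on older Pythons.
        expiry = datetime.fromisoformat(expires_at.replace("Z", "+00:00"))
        if expiry <= now:
            problems.append("key has expired")
    missing = sorted(set(required_scopes) - set(key.get("scopes", [])))
    if missing:
        problems.append("missing scopes: " + ", ".join(missing))
    return problems
```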

B) V2 token path

  1. Exchange Firebase token again:
     curl -s -X POST https://api.flowandgrow.tech/mcp/token \
       -H "Content-Type: application/json" \
       -d '{"firebaseIdToken":"<firebase-id-token>"}'
  2. If token exchange fails:
    • 401: Firebase token invalid/expired
    • 403: user has no active memberships (business_user.is_active)
  3. If MCP token is expired, refresh:
     curl -s -X POST https://api.flowandgrow.tech/mcp/token/refresh \
       -H "Content-Type: application/json" \
       -d '{"token":"<expired-or-valid-mcp-token>"}'
  4. Refresh constraints from code:
    • JWT must be validly signed
    • token can be expired, but not more than 7 days
    • memberships are re-resolved from DB at refresh time
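
To tell whether a failing token can still be refreshed or needs a fresh exchange, the expiry can be read from the token itself. This is a diagnostic sketch that assumes the MCP JWT carries a standard `exp` claim in seconds; it decodes the payload without verifying the signature, so it cannot detect the "validly signed" failure case.

```python
import base64
import json
import time

REFRESH_GRACE_SECONDS = 7 * 24 * 60 * 60  # 7-day grace window stated above


def jwt_claims(token):
    """Decode the JWT payload WITHOUT verifying the signature (diagnostics only)."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore base64url padding
    return json.loads(base64.urlsafe_b64decode(payload))


def refresh_status(token, now=None):
    """Classify a token as 'valid', 'refreshable', or 're-exchange' from its exp claim."""
    now = time.time() if now is None else now
    exp = jwt_claims(token)["exp"]  # assumption: standard exp claim, seconds since epoch
    if now < exp:
        return "valid"
    if now - exp <= REFRESH_GRACE_SECONDS:
        return "refreshable"  # POST /mcp/token/refresh should still accept it
    return "re-exchange"  # outside the grace window: POST /mcp/token again
```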

4. 400 Unknown session ID debugging checklist

This error is returned when McpSessionService.getSession(sessionId) cannot find an active in-memory session.

Step-by-step checks

  1. Verify client sends exact mcp-session-id returned by initialize response
  2. Verify the request includes Authorization (required before session lookup)
  3. Confirm no backend restart occurred between initialize and tool call
  4. On Cloud Run, confirm --session-affinity is enabled
  5. Confirm the client does not send initialize to one host and tool calls to another host/alias

Important constraint

Redis in V2 stores principal state (mcp:session:{sessionId}), not the full streamable transport. If in-memory session state is gone, Redis alone cannot resurrect the session; client must re-initialize.
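
Because only re-initialization recovers from this error, clients can handle it with a single retry. The wrapper below is a sketch: `send` and `initialize` are hypothetical hooks supplied by the caller's HTTP layer, and matching on the "Unknown session ID" text assumes the error message appears in the response body.

```python
class McpClientSession:
    """Re-initializes once when the server no longer knows the session ID."""

    def __init__(self, send, initialize):
        # send(session_id, payload) -> (status, body_text)
        # initialize() -> new mcp-session-id
        # Both callables are hypothetical hooks, not part of the backend.
        self._send = send
        self._initialize = initialize
        self.session_id = None

    def call(self, payload):
        if self.session_id is None:
            self.session_id = self._initialize()
        status, body = self._send(self.session_id, payload)
        if status == 400 and "Unknown session ID" in body:
            # The in-memory transport is gone; start a new session and retry once.
            self.session_id = self._initialize()
            status, body = self._send(self.session_id, payload)
        return status, body
```

A new session starts with a fresh principal, so a client that had called set_active_business may need to repeat that call after the retry.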


5. Missing tools in tools/list

Tool visibility is filtered at session initialization by role/scopes.

Check principal shape

  • role: platform_operator | tenant_developer | merchant
  • scopes: includes required scopes (for example pos:intents)
  • authorizedBusinessIds: affects set_active_business visibility

Known visibility rules

  • set_active_business only appears if authorizedBusinessIds.length > 1
  • Intent tools require pos:intents and are never shown to tenant_developer
  • Operator tools only appear for platform_operator
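
The rules above can be modeled to predict what a given principal should see before comparing against the real tools/list output. This is a sketch of the stated rules only, not the ToolRegistry.listFor implementation; the grouping keys `intent_tools` and `operator_tools` are illustrative labels.

```python
def expected_visibility(principal):
    """Model the three visibility rules above for a principal dict."""
    role = principal["role"]
    scopes = set(principal.get("scopes", []))
    business_ids = principal.get("authorizedBusinessIds", [])
    return {
        # Shown only when there is more than one business to choose from.
        "set_active_business": len(business_ids) > 1,
        # Needs the pos:intents scope and is never shown to tenant_developer.
        "intent_tools": "pos:intents" in scopes and role != "tenant_developer",
        "operator_tools": role == "platform_operator",
    }
```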

If scopes changed on the backend, create a new session (initialize) so visibility is recalculated.


6. Tool call returns isError: true

At this point auth/session routing worked and the tool handler ran.

Common source-backed cases

  • void_transaction without confirm: true intentionally returns a preview message. Call it again with the same transactionId and confirm: true only after user confirmation.
  • log_hours requires a resolvable MCP principal userId. Merchant callers cannot pass userId; only platform_operator can override it.
  • log_hours delegates to Implementation Portal time-entry rules, including hourly-only steps and minimum 0.25 hours.
  • Date arguments use ISO 8601 date-times on tool schemas such as summarize_period, list_sales, and list_purchases.
  • Detail-by-ID tools delegate to the owning module service. If the tool returns not found for an ID that exists, verify the owning service path and tenant expectations before changing MCP wrapper code.
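
Several of these cases can be caught on the client before the call is sent. The helpers below are a sketch: `transactionId`, `confirm`, and `userId` come from this runbook, while the `hours` argument name is an assumption, so check it against the tool schema.

```python
from datetime import datetime


def void_transaction_args(transaction_id, user_confirmed):
    """The first call previews; confirm: true is sent only after the user agrees."""
    args = {"transactionId": transaction_id}
    if user_confirmed:
        args["confirm"] = True
    return args


def log_hours_problems(args, caller_role):
    """Client-side checks mirroring the log_hours rules listed above."""
    problems = []
    if args.get("hours", 0) < 0.25:  # 'hours' is an assumed argument name
        problems.append("hours below the 0.25 minimum")
    if "userId" in args and caller_role != "platform_operator":
        problems.append("only platform_operator may pass userId")
    return problems


def is_iso_datetime(value):
    """True for ISO 8601 date-times such as 2025-01-31T00:00:00Z; date-only strings fail."""
    if "T" not in value:
        return False
    try:
        datetime.fromisoformat(value.replace("Z", "+00:00"))
        return True
    except ValueError:
        return False
```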

Next checks

  1. Compare the AI-client arguments against the tool argument table in MCP API Reference.
  2. Re-run the same call with a minimal JSON-RPC payload and saved mcp-session-id.
  3. Inspect the owning module use case or repository path named in the tool factory under apps/backend/src/mcp/tools/.
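
For check 2, a minimal payload and a response classifier help separate protocol failures from tool failures. This sketch assumes the standard MCP `tools/call` request shape and a parsed JSON-RPC reply; the helper names are illustrative.

```python
import json


def tools_call_payload(name, arguments, request_id=1):
    """Minimal JSON-RPC tools/call request body."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


def classify_response(message):
    """Separate JSON-RPC failures from tool-level failures in a parsed reply."""
    if "error" in message:
        # Rejected at the JSON-RPC layer, for example unknown method or invalid params.
        return "protocol_error"
    if message.get("result", {}).get("isError"):
        # The handler ran and reported a failure; inspect the owning module.
        return "tool_error"
    return "ok"
```

Only a `tool_error` result points at the tool factory or owning module; a `protocol_error` should be resolved against the argument table first.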

7. Production-safe recovery actions

Use this order to minimize disruption:

  1. Regenerate token/key (credential reset only)
  2. Re-initialize MCP session (new mcp-session-id)
  3. Reconnect client transport (Cursor/Claude restart if needed)
  4. Validate Cloud Run session affinity and single host usage
  5. Rotate/revoke old API keys if compromise is suspected

8. Preventive practices

  • Always include both headers on non-initialize calls:
    • Authorization
    • mcp-session-id
  • Keep one canonical MCP base URL per environment
  • Re-initialize sessions after deploys/restarts
  • For V2 clients, implement proactive refresh before expiresIn hits zero
  • Record changes to MCP_TOKEN_SECRET and MCP_TOKEN_TTL_SECONDS in release notes
  • Re-open MCP sessions when role/scope assignments change (tool list is computed at initialize time)
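
The proactive-refresh practice for V2 clients can be reduced to one timing check. This is a sketch that assumes `expiresIn` is a lifetime in seconds returned with the token and that the client records when it received the token; the 20% margin is an arbitrary illustrative default.

```python
def should_refresh(issued_at, expires_in, now, margin=0.2):
    """True once no more than `margin` of the token lifetime remains."""
    remaining = issued_at + expires_in - now
    return remaining <= expires_in * margin
```

With a one-hour token, this triggers a refresh in the last 12 minutes, which leaves room for a retry before the token expires.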