
Response

Response Modes

stream: "delta"

Content-Type: text/event-stream. The server streams SSE events as they are produced. Text is sent as incremental text_delta events; thinking is sent as incremental thinking_delta events.

stream: "message"

Content-Type: text/event-stream. The server streams SSE events, but text is sent as a single complete text/thinking event per message rather than incremental deltas. Tool call events still arrive as they happen.

stream: "none" (default)

Content-Type: application/json. The server returns a single JSON response after the agent finishes. If the agent needs a client-side tool result, it returns a tool_use stop reason in the JSON body and the client re-submits with results — same flow as SSE, just without streaming.
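The three modes differ only in transport. A minimal non-streaming call might look like the sketch below; the `runTurn` wrapper name and base URL are assumptions, while the `PUT /session` path, the `stream` field, and the response shape come from this spec.

```typescript
// Shape of the stream: "none" JSON response described above.
interface JsonTurnResponse {
  sessionId?: string; // present on PUT /session responses
  stopReason: "end_turn" | "tool_use" | "max_tokens" | "refusal" | "error";
  messages: unknown[];
}

// Hypothetical wrapper around PUT /session with stream: "none".
async function runTurn(baseUrl: string, body: object): Promise<JsonTurnResponse> {
  const res = await fetch(`${baseUrl}/session`, {
    method: "PUT",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ ...body, stream: "none" }),
  });
  return (await res.json()) as JsonTurnResponse;
}

// The client re-submits with tool results whenever this returns true.
function needsClientAction(res: JsonTurnResponse): boolean {
  return res.stopReason === "tool_use";
}
```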

SSE Events (stream: "delta" and stream: "message")

Each event's payload is a JSON object carried in the data: field.
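A client can consume these events with a small SSE frame parser. This is a minimal sketch that assumes well-formed frames with single-line `data:` fields (the SSE spec also allows multi-line data and other fields, which it ignores):

```typescript
interface SseEvent {
  event: string;
  data: unknown; // parsed JSON payload from the data: field
}

// Splits a buffered chunk into frames on blank lines, then reads the
// `event:` and `data:` fields from each frame.
function parseSseChunk(chunk: string): SseEvent[] {
  const events: SseEvent[] = [];
  for (const frame of chunk.split("\n\n")) {
    let event = "";
    let data = "";
    for (const line of frame.split("\n")) {
      if (line.startsWith("event: ")) event = line.slice(7);
      else if (line.startsWith("data: ")) data = line.slice(6);
    }
    if (event && data) events.push({ event, data: JSON.parse(data) });
  }
  return events;
}
```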

session_start

The first event in a PUT /session stream, always preceding turn_start. Contains the sessionId the client must store for subsequent turns.

event: session_start
data: {"sessionId": "sess_abc123"}

turn_start

Marks the beginning of the agent's response. For PUT /session, emitted immediately after session_start. For POST /session/:id, this is the first event in the stream.

event: turn_start
data: {}

text_delta

(delta mode only) An incremental delta of the agent's text response. Only emitted between turn_start and turn_stop.

event: text_delta
data: {"delta": "The weather in Tokyo is..."}

thinking_delta

(delta mode only) An incremental delta of the agent's thinking/reasoning. Only emitted between turn_start and turn_stop.

event: thinking_delta
data: {"delta": "The user is asking about Tokyo weather, I should..."}

text

(message mode only) The complete agent text response. Only emitted between turn_start and turn_stop.

event: text
data: {"text": "The weather in Tokyo is 18°C, partly cloudy."}

thinking

(message mode only) The complete agent thinking/reasoning. Only emitted between turn_start and turn_stop.

event: thinking
data: {"thinking": "The user is asking about Tokyo weather, I should use the weather tool..."}

tool_call

Only emitted between turn_start and turn_stop. The agent wants to invoke a tool. Multiple tool_call events may be emitted before turn_stop; the client should collect all of them and handle them in parallel.

For application-side tools, the client executes the tool and submits the results in a subsequent POST /session/:id request.

For server-side tools where trust: true, the server invokes the tool inline and emits a tool_result event with the result — no client round-trip needed. The agent continues streaming without stopping.

For server-side tools where trust: false, the server stops and the client submits a permission decision in a subsequent POST /session/:id request. Once the decision is submitted, the agent continues either way; if the call was denied, the LLM is informed that the tool was not permitted.

The agent only emits turn_stop with stopReason: "tool_use" if there is at least one application-side tool call or one untrusted server-side tool call that requires client action. If all tool calls are trusted server-side tools, the agent handles them inline and continues without stopping.

The client must collect all application-side tool results and untrusted server-side tool permissions and submit them together in a single subsequent POST /session/:id request.

event: tool_call
data: {"toolCallId": "call_001", "name": "get_weather", "input": {"location": "Tokyo"}}

Tool names must be unique across application tools and agent tools in a single request. The client identifies whether a tool call is application-side or server-side by matching the name against its request.
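Since names are unique across both tool kinds, the matching step above reduces to a set lookup. A sketch, where `applicationToolNames` is assumed to be the set of tool names the client registered in its request:

```typescript
interface ToolCall {
  toolCallId: string;
  name: string;
  input: unknown;
}

// Partitions collected tool_call events: names the client registered are
// application-side (client executes them); everything else is server-side.
function partitionToolCalls(
  calls: ToolCall[],
  applicationToolNames: Set<string>,
): { application: ToolCall[]; server: ToolCall[] } {
  const application = calls.filter((c) => applicationToolNames.has(c.name));
  const server = calls.filter((c) => !applicationToolNames.has(c.name));
  return { application, server };
}
```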

tool_result

(server-side trusted tools only) Only emitted between turn_start and turn_stop. Emitted after the server executes a trusted tool inline. The agent continues streaming after this event.

event: tool_result
data: {"toolCallId": "call_001", "content": "Tokyo: 18°C, partly cloudy"}

turn_stop

Always the final event in the stream. Marks the end of the agent's response.

event: turn_stop
data: {"stopReason": "end_turn"}

Stop reasons:

  • end_turn: Agent finished normally
  • tool_use: Agent emitted one or more tool_call events requiring client action (application-side tool or untrusted server-side tool)
  • max_tokens: Hit the token limit. Agents should implement their own history-compaction strategy to avoid this; if no compaction is in place and the context window overflows, this stop reason is returned.
  • refusal: LLM refused to respond (e.g. safety policy)
  • error: Server encountered an error mid-stream
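A client's turn_stop handler typically dispatches on the stop reason. A minimal sketch of that mapping, with illustrative action names (the action vocabulary is an assumption; the stop reasons are from the table above):

```typescript
type StopReason = "end_turn" | "tool_use" | "max_tokens" | "refusal" | "error";
type ClientAction = "done" | "submit_tool_results" | "compact_history" | "surface_error";

// Maps a turn_stop stopReason to what the client should do next.
function nextAction(reason: StopReason): ClientAction {
  switch (reason) {
    case "end_turn":
      return "done";
    case "tool_use":
      return "submit_tool_results"; // POST /session/:id with results/permissions
    case "max_tokens":
      return "compact_history"; // then retry the turn
    case "refusal":
    case "error":
      return "surface_error";
  }
}
```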

JSON Response (stream: "none")

Normal response:

json
{
  "stopReason": "end_turn",
  "messages": [
    {
      "role": "assistant",
      "content": "The weather in Tokyo is 18°C, partly cloudy."
    }
  ]
}

PUT /session additionally includes sessionId:

json
{
  "sessionId": "sess_abc123",
  "stopReason": "end_turn",
  "messages": [
    {
      "role": "assistant",
      "content": "The weather in Tokyo is 18°C, partly cloudy."
    }
  ]
}

With thinking:

json
{
  "stopReason": "end_turn",
  "messages": [
    {
      "role": "assistant",
      "content": [
        {
          "type": "thinking",
          "thinking": "The user wants Tokyo weather. I should use the get_weather tool."
        },
        {
          "type": "text",
          "text": "The weather in Tokyo is 18°C, partly cloudy."
        }
      ]
    }
  ]
}

When an application-side tool is needed, or an untrusted server-side tool requires permission:

json
{
  "stopReason": "tool_use",
  "messages": [
    {
      "role": "assistant",
      "content": [
        {
          "type": "tool_use",
          "toolCallId": "call_001",
          "name": "get_weather",
          "input": { "location": "Tokyo" }
        }
      ]
    }
  ]
}
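The resulting loop for stream: "none" can be sketched as follows. The transport is injected as a plain function so the control flow is testable without a live server; a real client would call fetch against POST /session/:id, and the `runTool`/`submit` names are illustrative, not part of the API.

```typescript
interface ToolUseBlock {
  type: "tool_use";
  toolCallId: string;
  name: string;
  input: unknown;
}

interface TurnResult {
  stopReason: string;
  messages: any[];
}

// Repeats until the agent stops for a reason other than tool_use:
// collect all tool_use blocks, execute each tool, and re-submit every
// result together in a single follow-up request.
function resolveToolUse(
  first: TurnResult,
  runTool: (call: ToolUseBlock) => string,
  submit: (messages: object[]) => TurnResult,
): TurnResult {
  let res = first;
  while (res.stopReason === "tool_use") {
    const calls = res.messages
      .flatMap((m) => (Array.isArray(m.content) ? m.content : []))
      .filter((b: any): b is ToolUseBlock => b.type === "tool_use");
    const results = calls.map((c) => ({
      role: "tool",
      toolCallId: c.toolCallId,
      content: runTool(c),
    }));
    res = submit(results);
  }
  return res;
}
```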

When a trusted server-side tool was called inline, the full exchange is included in the returned messages:

json
{
  "stopReason": "end_turn",
  "messages": [
    {
      "role": "assistant",
      "content": [
        {
          "type": "tool_use",
          "toolCallId": "call_002",
          "name": "web_search",
          "input": { "query": "Tokyo weather today" }
        }
      ]
    },
    {
      "role": "tool",
      "toolCallId": "call_002",
      "content": "Tokyo: 18°C, partly cloudy"
    },
    {
      "role": "assistant",
      "content": "The weather in Tokyo is 18°C, partly cloudy."
    }
  ]
}

Message Format

Messages follow OpenAI-compatible roles.

System message

json
{
  "role": "system",
  "content": "You are a helpful assistant that responds concisely."
}

User message

json
{ "role": "user", "content": "What's the weather in Tokyo?" }

content may be a string or an array of content blocks.

Assistant message

json
{
  "role": "assistant",
  "content": [
    {
      "type": "thinking",
      "thinking": "The user wants the weather in Tokyo. I should use the get_weather tool."
    },
    { "type": "text", "text": "Let me check that for you." },
    {
      "type": "tool_use",
      "toolCallId": "call_001",
      "name": "get_weather",
      "input": { "location": "Tokyo" }
    }
  ]
}
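The content-block union above can be modeled directly in types. These definitions mirror the message format in this section; the type names are illustrative, not from an official SDK:

```typescript
// Content blocks as shown in the assistant message example above.
type ContentBlock =
  | { type: "thinking"; thinking: string }
  | { type: "text"; text: string }
  | { type: "tool_use"; toolCallId: string; name: string; input: unknown };

interface AssistantMessage {
  role: "assistant";
  content: string | ContentBlock[]; // string or array of content blocks
}

// Extracts only the user-visible text, skipping thinking and tool_use blocks.
function visibleText(msg: AssistantMessage): string {
  if (typeof msg.content === "string") return msg.content;
  return msg.content
    .filter((b): b is Extract<ContentBlock, { type: "text" }> => b.type === "text")
    .map((b) => b.text)
    .join("");
}
```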

Tool result message

Used in two cases:

  • Trusted server-side tool: the agent executes the tool inline, stores the result in history, and includes it in the returned messages.
  • Application-side tool: the client executes the tool and submits the result via POST /session/:id.

json
{
  "role": "tool",
  "toolCallId": "call_001",
  "content": "Tokyo: 18°C, partly cloudy"
}

content may be a string or an array of content blocks.

Tool permission message

Used to submit a permission decision for an untrusted server-side tool call via POST /session/:id. The agent continues and informs the LLM of the decision. These messages are never stored in session history.

When granted: true, the agent executes the tool and stores the tool result in history. When granted: false, the agent stores a message in history indicating the tool was denied by the user. The client may include an optional reason string that the agent will relay to the LLM.

json
{ "role": "tool_permission", "toolCallId": "call_002", "granted": true }
json
{
  "role": "tool_permission",
  "toolCallId": "call_002",
  "granted": false,
  "reason": "User declined"
}
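A small helper can construct these messages from a user's decision. A sketch, following the examples above (attaching `reason` only on denial, where it is relayed to the LLM):

```typescript
interface ToolPermission {
  role: "tool_permission";
  toolCallId: string;
  granted: boolean;
  reason?: string; // optional, relayed to the LLM on denial
}

// Builds a tool_permission message for POST /session/:id.
function permissionMessage(toolCallId: string, granted: boolean, reason?: string): ToolPermission {
  const msg: ToolPermission = { role: "tool_permission", toolCallId, granted };
  if (!granted && reason) msg.reason = reason;
  return msg;
}
```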

Sequence Diagram