Designing an agent runtime that lives inside the messenger

In one line, vooy is an AI agent that runs inside the messenger your team already uses. "Don't make people install a new app" isn't a marketing slogan; it's the constraint that shaped every runtime decision. This post starts from that constraint and walks through the shape our agent runtime ended up with.

Why the messenger

Most agent products ship their own chat UI. It's clean, but it asks users to adopt yet another tab they might open once a day. We went the other way: drop the agent into the window that's already open all day, the team messenger (Slack, KakaoWork).

That choice forces two constraints onto the runtime.

Sessions live long. A conversation can span days. We're not dealing with stateless request/response functions; we're dealing with long-lived sessions.
Responses must not stall. Messenger users are trained on "typing…". If you don't stream tokens the instant they're generated, the bot looks dead.

One cycle of the runtime

The heart of the runtime is a plain loop. A message arrives, we gather context, call the model, run tools, feed the results back. Repeat until there are no more tool calls.

runtime/loop.ts

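async function runTurn(session: Session, input: UserMessage) {
  // Rebuild the model context for this turn (system prompt, tools, history, memory).
  const ctx = await buildContext(session, input);

  // MAX_STEPS caps model/tool round trips so a misbehaving turn cannot loop forever.
  for (let step = 0; step < MAX_STEPS; step++) {
    const response = await model.stream(ctx, { tools: session.tools });

    // Stream tokens to the messenger the moment they're produced.
    for await (const chunk of response.text) {
      session.transport.push(chunk);
    }

    // No tool calls means the model is done: finalize the message.
    if (response.toolCalls.length === 0) {
      return session.transport.commit();
    }

    // Run the requested tools and feed their results into the next step.
    const results = await executeTools(response.toolCalls, session);
    ctx.append(response.assistant, results);
  }

  // Step budget exhausted. Assumption: commit what was streamed so far
  // rather than leaving the message open and returning nothing.
  return session.transport.commit();
}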

Sessions and context

A session is the blob of state tied to one conversation channel: who's in it, which connectors it's authenticated against, what was said recently. We map each session to a single Durable Object: one channel = one object = one serialized executor. In our experience, a large share of concurrency bugs come from "two turns hit the same conversation at once," and this mapping rules out that class of bug by construction.

The cheapest way to handle concurrency is to avoid it. Guarantee one executor per channel and you never need a lock.
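To make "one executor per channel" concrete, here is a minimal sketch of the same guarantee in plain TypeScript. It is not our Durable Object code: ChannelExecutor and its run method are hypothetical names, and the sketch only shows the idea of chaining each turn behind the previous one for the same channel, so no lock is ever taken.

```typescript
// Hypothetical illustration: serialize turns per channel with a promise chain.
class ChannelExecutor {
  // Tail of the queue for each channel. Tails never reject.
  private tails = new Map<string, Promise<unknown>>();

  run<T>(channelId: string, turn: () => Promise<T>): Promise<T> {
    const prev = this.tails.get(channelId) ?? Promise.resolve();

    // Start this turn only after the previous turn on the channel has settled.
    const result = prev.then(() => turn());

    // Store a non-rejecting tail so one failed turn does not poison the queue.
    const tail = result.catch(() => undefined);
    this.tails.set(channelId, tail);

    // Drop the entry once the queue drains, so idle channels cost nothing.
    void tail.then(() => {
      if (this.tails.get(channelId) === tail) {
        this.tails.delete(channelId);
      }
    });

    return result;
  }
}
```

Turns on different channels still run concurrently; only turns that share a channel ID wait for each other.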

Context is reassembled every turn: the system prompt, tool definitions from active connectors, a compacted history, and any retrieved memory. As history grows, older spans collapse into summaries.
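As a rough sketch of that last step, the function below keeps the most recent messages verbatim and collapses everything older into a single summary entry. The Message shape, the keepRecent parameter, and the injected summarize function (in practice probably a model call) are all assumptions made for illustration, not our actual compaction policy.

```typescript
// Hypothetical message shape; "summary" marks a collapsed span of history.
interface Message {
  role: "user" | "assistant" | "summary";
  text: string;
}

// Keep the last `keepRecent` messages as-is and replace older ones with a summary.
async function compactHistory(
  history: Message[],
  keepRecent: number,
  summarize: (older: Message[]) => Promise<string>,
): Promise<Message[]> {
  // Short histories need no compaction.
  if (history.length <= keepRecent) {
    return history;
  }

  const cut = history.length - keepRecent;
  const older = history.slice(0, cut);
  const recent = history.slice(cut);

  // The summarizer is injected so the policy stays testable without a model.
  const summary = await summarize(older);
  return [{ role: "summary", text: summary }, ...recent];
}
```

Because an earlier summary is itself a message, running this again on a later turn folds the old summary into the new one along with the next batch of older messages.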

The tool-call loop

Every tool follows the same interface. Input schemas are defined with Zod and converted to JSON Schema for the model.

tools/define.ts

export const sendCalendarInvite = defineTool({
  name: "send_calendar_invite",
  description: "Send a calendar invite to attendees",
  input: z.object({
    title: z.string(),
    attendees: z.array(z.string().email()),
    startsAt: z.string().datetime(),
  }),
  run: async ({ title, attendees, startsAt }, { connectors }) => {
    return connectors.google.calendar.invite({ title, attendees, startsAt });
  },
});

Tool execution is always isolated. So that one tool's exception can't kill the whole turn, results return to the model as a tagged success/failure structure. The model can see the failure and retry, or ask the user.
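One way to picture the tagged structure is a discriminated union with a wrapper that turns every exception into data. This is a sketch under assumptions: ToolResult, ToolCall, ToolFn, and executeToolsSafely are illustrative names, and running the calls in parallel is a choice made here, not something the runtime above is known to do.

```typescript
// Hypothetical tagged result: the model sees either an output or an error message.
type ToolResult =
  | { status: "ok"; toolCallId: string; output: unknown }
  | { status: "error"; toolCallId: string; message: string };

interface ToolCall {
  id: string;
  name: string;
  args: unknown;
}

type ToolFn = (args: unknown) => Promise<unknown>;

async function executeToolsSafely(
  calls: ToolCall[],
  registry: Map<string, ToolFn>,
): Promise<ToolResult[]> {
  // Assumption: calls are independent, so they run in parallel.
  return Promise.all(
    calls.map(async (call): Promise<ToolResult> => {
      const tool = registry.get(call.name);
      if (!tool) {
        return { status: "error", toolCallId: call.id, message: `Unknown tool: ${call.name}` };
      }
      try {
        const output = await tool(call.args);
        return { status: "ok", toolCallId: call.id, output };
      } catch (err) {
        // The exception stops here; it becomes a value the model can react to.
        const message = err instanceof Error ? err.message : String(err);
        return { status: "error", toolCallId: call.id, message };
      }
    }),
  );
}
```

Since no branch rethrows, the Promise.all itself never rejects, and the turn always gets one result per tool call.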

Streaming as a first-class citizen

This is where we spent the most time. There's an impedance mismatch between the model's token stream and the messenger's message-edit API. The model emits tokens; the messenger edits whole messages, and editing dozens of times per second trips the rate limit.

So we put a coalescing buffer in the transport layer.

Signal	Behavior
Token arrives	Accumulate in buffer
80ms elapsed	Flush the accumulated text at once
Tool call starts	Flush immediately, switch to a status message
Turn ends	Final commit, clear buffer

The net result: users see a smoothly flowing response, and token streaming alone produces at most one edit per 80 ms window, so the messenger API receives roughly 12 edits per second at peak.
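The table maps fairly directly onto a small class. The sketch below is a simplified stand-in, not our transport layer: CoalescingBuffer, the Scheduler interface, and the editMessage callback are hypothetical names, and the timer is injected so the behavior can be exercised without real clocks. It assumes each flush sends the full message text so far, since the messenger edits whole messages.

```typescript
// Hypothetical timer abstraction; production code would wrap setTimeout/clearTimeout.
interface Scheduler {
  setTimer(fn: () => void, ms: number): unknown;
  clearTimer(handle: unknown): void;
}

class CoalescingBuffer {
  private text = "";      // full message text accumulated this turn
  private dirty = false;  // true when text has changes not yet sent
  private timer: { handle: unknown } | null = null;
  private editMessage: (fullText: string) => void;
  private scheduler: Scheduler;
  private windowMs: number;

  constructor(editMessage: (fullText: string) => void, scheduler: Scheduler, windowMs = 80) {
    this.editMessage = editMessage;
    this.scheduler = scheduler;
    this.windowMs = windowMs;
  }

  // Token arrives: accumulate, and arm the flush timer if it is not running.
  push(chunk: string): void {
    this.text += chunk;
    this.dirty = true;
    if (this.timer === null) {
      const handle = this.scheduler.setTimer(() => {
        this.timer = null;
        this.flush();
      }, this.windowMs);
      this.timer = { handle };
    }
  }

  // Window elapsed or tool call starts: send everything accumulated in one edit.
  flush(): void {
    if (this.timer !== null) {
      this.scheduler.clearTimer(this.timer.handle);
      this.timer = null;
    }
    if (!this.dirty) return;
    this.dirty = false;
    this.editMessage(this.text);
  }

  // Turn ends: final flush, then clear the buffer for the next turn.
  commit(): void {
    this.flush();
    this.text = "";
  }
}
```

A caller would invoke flush() directly when a tool call starts, before switching the message to its status text; that switch is left out of the sketch.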

Designing failure in

In a long-lived session, everything eventually fails: model timeouts, connector 401s, worker restarts. We hold to three principles.

Turns are idempotent. Re-running with the same input message ID must not duplicate side effects, so we pass idempotency keys down to tools.
Persist partial progress. Tool results are committed to the session before being fed to the model. A restart never redoes work that already happened.
Be honest with the user. Unrecoverable failures aren't hidden — we surface them as-is: "I can't reach your calendar right now."
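The first two principles combine naturally: derive a stable key per tool call, persist the result under that key, and check the store before running anything. The following is a minimal sketch with assumed names (ResultStore, idempotencyKey, runToolOnce) and an assumed key format; the real session storage and key scheme may look different.

```typescript
// Hypothetical persistent store for tool results, keyed by idempotency key.
interface StoredResult {
  value: unknown;
}

interface ResultStore {
  get(key: string): Promise<StoredResult | undefined>;
  put(key: string, result: StoredResult): Promise<void>;
}

// Assumed key format: the same message, step, and call position give the same key.
function idempotencyKey(messageId: string, step: number, callIndex: number): string {
  return `${messageId}:${step}:${callIndex}`;
}

async function runToolOnce(
  key: string,
  store: ResultStore,
  run: (key: string) => Promise<unknown>,
): Promise<unknown> {
  // A re-run after a restart finds the committed result and skips the side effect.
  const existing = await store.get(key);
  if (existing !== undefined) {
    return existing.value;
  }

  // The key is passed down so the external API can deduplicate as well.
  const value = await run(key);

  // Commit before the result is fed back to the model.
  await store.put(key, { value });
  return value;
}
```

The stored result alone does not cover a crash between the tool finishing and the commit, which is why the key is also handed to the tool: the downstream service needs to recognize the repeated request.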

Today's runtime isn't flashy: one loop, one session, one buffer. But it's what falls out of taking "inside the messenger, without stalling, for a long time" head-on. In the next post we'll cover how this runtime attaches hundreds of external tools — the design of connector-hub.