Token accounting is the feature, not the chat
How we reconcile per-user token spend in MongoDB against Anthropic's monthly invoice — and why we do it server-side at tool-call boundaries.
When I built the T4A chat platform, the brief was "internal AI chat." The interesting engineering problem turned out to be: Anthropic bills in tokens, the company budgets in euros, and the finance team needs a monthly reconciliation that doesn't require a PhD to verify.
What not to do
Don't count tokens client-side. Don't estimate from prompt length. Don't trust the model's usage field from a streaming response until the stream is done.
Do count server-side, at tool-call boundaries, after the response is complete.
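That rule can be sketched as a guard that refuses to record usage until the response is final. This is illustrative code, not the production handler; the `message` shape loosely follows Anthropic's Messages API, where the authoritative usage totals arrive with the completed message.

```javascript
// Sketch: only extract usage from a *completed* response. A message that is
// still streaming has no stop_reason yet, and its usage totals are not final.
function extractFinalUsage(message) {
  if (message.stop_reason == null) {
    // Stream still in flight: refuse to record partial counts.
    throw new Error('response not complete; refusing to record usage');
  }
  const { input_tokens, output_tokens } = message.usage;
  return {
    inputTokens: input_tokens,
    outputTokens: output_tokens,
    cacheReadTokens: message.usage.cache_read_input_tokens ?? 0,
  };
}
```

The ledger update below runs only on the object this guard returns, never on intermediate stream events.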
The ledger
Each user has a MongoDB document that accumulates token spend:
```javascript
await db.collection('token_ledger').updateOne(
  { userId, billingMonth: toYYYYMM(new Date()) },
  {
    $inc: {
      inputTokens: usage.input_tokens,
      outputTokens: usage.output_tokens,
      cacheReadTokens: usage.cache_read_input_tokens ?? 0,
      // Cache writes are billed at their own rate; accumulate them too,
      // or the ledger can't reproduce the cost formula below.
      cacheCreationTokens: usage.cache_creation_input_tokens ?? 0,
    },
  },
  { upsert: true },
);
```

At month close, we aggregate by billing month and compare the total against the Anthropic invoice line items. The delta has been under 1% in every cycle so far: rounding from token pricing, not bugs.
Prompt caching
Cache reads are cheaper than cache writes, which are cheaper than uncached input. Track them separately or your per-user cost attribution will be wrong by a constant factor that looks like a rounding error but isn't.
```javascript
const cost =
  usage.input_tokens * INPUT_RATE +
  usage.output_tokens * OUTPUT_RATE +
  (usage.cache_read_input_tokens ?? 0) * CACHE_READ_RATE +
  (usage.cache_creation_input_tokens ?? 0) * CACHE_WRITE_RATE;
```

What this unlocked
Budget alerts per user. Department-level rollups. A per-model breakdown when we started mixing Haiku and Sonnet for different tasks. None of this was in the original spec — it emerged because the ledger existed.
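A budget alert, for instance, reduces to one comparison against the ledger. Everything named here is hypothetical (per-token euro rates, a per-user monthly budget), not the production alerting code:

```javascript
// Hypothetical budget check: convert a user's month-to-date token counts
// to euros and flag when they cross a per-user threshold.
function overBudget(ledger, rates, budgetEur) {
  const spend =
    ledger.inputTokens * rates.input +
    ledger.outputTokens * rates.output +
    (ledger.cacheReadTokens ?? 0) * rates.cacheRead;
  return spend > budgetEur;
}
```

Department rollups are the same computation grouped one level higher.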
The chat is just the UI. The reconciliation is the product.