Token accounting is the feature, not the chat
How we reconcile per-user token spend in MongoDB against Anthropic's monthly invoice — and why we do it server-side at tool-call boundaries.
When I built the T4A chat platform, the brief was "internal AI chat." The interesting engineering problem turned out to be: Anthropic bills in tokens, the company budgets in euros, and the finance team needs a monthly reconciliation that doesn't require a PhD to verify.
What not to do
Don't count tokens client-side. Don't estimate from prompt length. Don't trust the model's usage field from a streaming response until the stream is done.
Do count server-side, at tool-call boundaries, after the response is complete.
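That rule can be sketched as a guard that refuses to record usage until the response is final. This is illustrative code, not the production handler; the `message` shape loosely follows Anthropic's Messages API, where the authoritative usage totals arrive with the completed message.

```javascript
// Sketch: only extract usage from a *completed* response. A message that is
// still streaming has no stop_reason yet, and its usage totals are not final.
function extractFinalUsage(message) {
  if (message.stop_reason == null) {
    // Stream still in flight: refuse to record partial counts.
    throw new Error('response not complete; refusing to record usage');
  }
  const { input_tokens, output_tokens } = message.usage;
  return {
    inputTokens: input_tokens,
    outputTokens: output_tokens,
    cacheReadTokens: message.usage.cache_read_input_tokens ?? 0,
  };
}
```

The ledger update below runs only on the object this guard returns, never on intermediate stream events.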
The ledger
Each user has a MongoDB document that accumulates token spend:
```javascript
await db.collection('token_ledger').updateOne(
  { userId, billingMonth: toYYYYMM(new Date()) },
  {
    $inc: {
      inputTokens: usage.input_tokens,
      outputTokens: usage.output_tokens,
      cacheReadTokens: usage.cache_read_input_tokens ?? 0,
      // Cache writes are billed at their own rate; accumulate them too,
      // or the ledger can't reproduce the cost formula below.
      cacheCreationTokens: usage.cache_creation_input_tokens ?? 0,
    },
  },
  { upsert: true },
);
```

At month close, we aggregate by billing month and compare the total against the Anthropic invoice line items. The delta has been under 1% in every cycle so far: rounding from token pricing, not bugs.
Prompt caching
Cache reads are cheaper than cache writes, which are cheaper than uncached input. Track them separately or your per-user cost attribution will be wrong by a constant factor that looks like a rounding error but isn't.
```javascript
const cost =
  usage.input_tokens * INPUT_RATE +
  usage.output_tokens * OUTPUT_RATE +
  (usage.cache_read_input_tokens ?? 0) * CACHE_READ_RATE +
  (usage.cache_creation_input_tokens ?? 0) * CACHE_WRITE_RATE;
```

What this unlocked
Budget alerts per user. Department-level rollups. A per-model breakdown when we started mixing Haiku and Sonnet for different tasks. None of this was in the original spec — it emerged because the ledger existed.
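A budget alert, for instance, reduces to one comparison against the ledger. Everything named here is hypothetical (per-token euro rates, a per-user monthly budget), not the production alerting code:

```javascript
// Hypothetical budget check: convert a user's month-to-date token counts
// to euros and flag when they cross a per-user threshold.
function overBudget(ledger, rates, budgetEur) {
  const spend =
    ledger.inputTokens * rates.input +
    ledger.outputTokens * rates.output +
    (ledger.cacheReadTokens ?? 0) * rates.cacheRead;
  return spend > budgetEur;
}
```

Department rollups are the same computation grouped one level higher.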
The chat is just the UI. The reconciliation is the product.