yeah the json token counts are super misleading. i run a bunch of claude agents ...

kleton · 2026-03-10T15:39:11 1773157151

I proxy all of my LLM completion subscriptions. Token counts for a typical 7-day span:

model            completions  read        write      cached_read    cache_write
claude-opus-4-6  11,000       16,900,000  5,840,000  1,312,000,000  66,120,000

tgrowazay · 2026-03-10T19:37:42 1773171462

17M uncached reads (input) and 6M uncached writes (output), priced per MTok, come to

  $5x17+$25x6=$235 for Opus 4.6

  $2x17+$12x6=$106 for Gemini 3 Pro

  $0.60x17+$3.6x6=$31.80 for Qwen3.5 397B-A17B via Huggingface API
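For anyone who wants to redo this comparison with their own numbers, here is a minimal Python sketch of the same arithmetic. The per-MTok prices are just the ones quoted above, not checked against any price list, and `PRICES` and `uncached_cost` are made-up names for illustration.

```python
# USD per million tokens as (input, output), copied from the figures above.
# These are the thread's numbers, not verified against any provider's pricing page.
PRICES = {
    "Opus 4.6": (5.00, 25.00),
    "Gemini 3 Pro": (2.00, 12.00),
    "Qwen3.5 397B-A17B": (0.60, 3.60),
}


def uncached_cost(input_mtok, output_mtok, input_price, output_price):
    """Cost in USD of uncached input and output, token counts in millions."""
    return input_mtok * input_price + output_mtok * output_price


# 17M input and 6M output tokens, rounded from the table earlier in the thread.
costs = {name: round(uncached_cost(17, 6, *p), 2) for name, p in PRICES.items()}

for name, cost in costs.items():
    print(f"{name}: ${cost:.2f}")
```

This only covers uncached input and output; cache traffic is billed separately and is not included.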

kleton · 2026-03-12T21:29:23 1773350963

You left out cache writes, which are $6.25 / MTok. At 66.12M tokens that is another ~$400.
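A rough sketch of the Opus bill using the exact token counts from the table rather than the rounded ones, with cache writes included. The $5 and $25 rates are the ones quoted upthread and $6.25 is the cache-write rate above; cached reads are left out because nobody in the thread quoted a price for them. `TOKENS`, `USD_PER_MTOK` and `line_costs` are illustrative names.

```python
# Token counts from the 7-day table for claude-opus-4-6.
TOKENS = {"read": 16_900_000, "write": 5_840_000, "cache_write": 66_120_000}

# USD per million tokens, as quoted in this thread (assumed, not verified).
# cached_read is omitted: no price for it appears in the thread.
USD_PER_MTOK = {"read": 5.00, "write": 25.00, "cache_write": 6.25}


def line_costs(tokens, prices):
    """Per-category cost in USD for every category that has a token count."""
    return {kind: tokens[kind] / 1_000_000 * prices[kind] for kind in tokens}


costs = line_costs(TOKENS, USD_PER_MTOK)
total = sum(costs.values())

for kind, cost in costs.items():
    print(f"{kind:12s} ${cost:,.2f}")
print(f"{'total':12s} ${total:,.2f}")
```

With these inputs the cache-write line alone is larger than uncached input and output combined, which is the point being made here.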