AI · Cost Control · Playbook

Per-Token Pricing Is Breaking Your Budget: A Finance Playbook

AI doesn't bill per seat; it bills per token and per call. Finance teams trained on subscriptions can't model that. Here's a playbook that works.

Easy Entropy Team

Editorial Team

Practitioner notes from the Easy Entropy team. We write about renewal management, SaaS spend control, and the workflows that keep contract owners ahead of notice deadlines.


Why Per-Seat Math Does Not Work

Per-seat SaaS pricing has one beautiful property: it is predictable. You buy 50 seats at $20 per seat per month, your annual cost is $12,000, and the only thing that changes is the seat count. Finance teams have been modelling SaaS this way for fifteen years.

AI breaks the model entirely. There is no seat. The unit is a token, an API call, an inference, or an autonomous agent execution. The cost varies with usage, not with headcount. A team of 5 engineers using AI coding tools heavily can outspend a team of 50 marketers using a per-seat CRM. Forecasting models built around seat counts do not capture this.

The Four AI Pricing Models You Will Encounter

You will see four distinct pricing models in AI vendor contracts. Each one needs to be modelled differently.

  • Token-based: priced per million input/output tokens. Used by foundation model providers. Cost scales with prompt and response length.
  • API call-based: priced per request, regardless of size. Common in feature-specific APIs (image generation, transcription, embedding).
  • Inference-based: priced per generated output (per image, per video second, per audio minute).
  • Agent execution-based: priced per autonomous workflow run, often with internal token costs bundled in.
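
To make the difference between the four models concrete, here is a minimal Python sketch of how each one turns usage into a monthly cost. Every rate and volume below is a made-up placeholder chosen for easy arithmetic, not a real vendor price, and the function names are our own.

```python
# Hypothetical cost functions for the four pricing models.
# All rates and volumes are illustrative placeholders, not vendor prices.

def token_cost(input_tokens: int, output_tokens: int,
               usd_per_m_input: float, usd_per_m_output: float) -> float:
    """Token-based: priced per million input and output tokens."""
    return ((input_tokens / 1_000_000) * usd_per_m_input
            + (output_tokens / 1_000_000) * usd_per_m_output)


def api_call_cost(calls: int, usd_per_call: float) -> float:
    """API call-based: flat price per request, regardless of size."""
    return calls * usd_per_call


def inference_cost(units: float, usd_per_unit: float) -> float:
    """Inference-based: priced per output unit (image, video second, audio minute)."""
    return units * usd_per_unit


def agent_run_cost(runs: int, usd_per_run: float) -> float:
    """Agent execution-based: priced per workflow run, token costs bundled in."""
    return runs * usd_per_run


# One month of assumed usage across four hypothetical vendors.
monthly_costs = {
    "token": token_cost(400_000_000, 100_000_000, 3.0, 15.0),
    "api_call": api_call_cost(250_000, 0.002),
    "inference": inference_cost(3_000, 0.01),   # e.g. audio minutes
    "agent": agent_run_cost(1_200, 0.25),
}
total_monthly_cost = sum(monthly_costs.values())

for model, usd in monthly_costs.items():
    print(f"{model:>10}: ${usd:,.2f}")
print(f"{'total':>10}: ${total_monthly_cost:,.2f}")
```

Note that each function needs a different usage driver as input. That is the practical reason the four models cannot share a single forecast line.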

Why Finance Cannot Forecast These

The forecasting failure is not the finance team's fault. The problem is that consumption data lives in engineering dashboards, not in finance systems. Tokens consumed last month are knowable, but they are invisible to anyone who is not logged into the vendor's console.

Worse, consumption is non-stationary. If your engineers ship a new feature that uses AI internally, your monthly token cost can double overnight with no procurement event. The vendor will not warn you. The invoice will land at month-end. By the time finance sees it, the spend has already happened.

Step 1: Set Hard Consumption Ceilings in Every Contract

Every AI vendor contract should include a hard consumption ceiling: a maximum monthly or quarterly volume above which the vendor will alert you or cut off service rather than continue billing. Without this, your downside is unbounded.

Vendors will resist this initially because their incentive is consumption growth. Push back. Frame it as a budget control requirement, not a usage cap. If the vendor will not include a hard ceiling, include a "vendor must notify if usage exceeds X" clause as a fallback.
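
Whichever clause you end up with, it helps to mirror it internally so you know where you stand before the vendor tells you. The sketch below is one possible way to do that in Python. The 80 percent notification threshold, the function name, and the status labels are our assumptions, not terms from any contract.

```python
# Hypothetical internal mirror of a contractual ceiling and notify clause.
# The 0.8 notify fraction and the status labels are assumptions.

def ceiling_status(month_to_date_usd: float, ceiling_usd: float,
                   notify_fraction: float = 0.8) -> str:
    """Return 'ok', 'notify', or 'cutoff' for the current billing period."""
    if ceiling_usd <= 0:
        raise ValueError("ceiling_usd must be positive")
    used = month_to_date_usd / ceiling_usd
    if used >= 1.0:
        return "cutoff"   # at or above the hard ceiling
    if used >= notify_fraction:
        return "notify"   # the fallback "usage exceeds X" clause
    return "ok"


for spend in (5_000, 8_500, 10_000):
    print(spend, ceiling_status(spend, ceiling_usd=10_000))
```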

Step 2: Instrument Internal Usage Tracking

The vendor invoice arrives at month-end. By then it is too late to act. You need internal usage tracking that surfaces consumption daily or in real time, before the invoice catches up.

For foundation model spend, this usually means a thin proxy layer or gateway that all internal AI calls pass through, logging token usage by team and project. For other AI tools, the vendor often provides usage dashboards. Assign someone to pull them weekly.
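
As a rough illustration of what the gateway needs to record, here is a minimal in-memory ledger in Python. It is a sketch under assumptions: the class and field names are hypothetical, the rates are placeholders, and a real gateway would write to durable storage and read token counts from the provider's response rather than take them as arguments.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class UsageRecord:
    """One logged AI call. Field names are hypothetical."""
    day: date
    team: str
    project: str
    input_tokens: int
    output_tokens: int


class UsageLedger:
    """In-memory stand-in for the gateway's usage log."""

    def __init__(self, usd_per_m_input: float, usd_per_m_output: float):
        self.usd_per_m_input = usd_per_m_input
        self.usd_per_m_output = usd_per_m_output
        self._records: list[UsageRecord] = []

    def log(self, record: UsageRecord) -> None:
        self._records.append(record)

    def cost_of(self, r: UsageRecord) -> float:
        return ((r.input_tokens / 1_000_000) * self.usd_per_m_input
                + (r.output_tokens / 1_000_000) * self.usd_per_m_output)

    def spend_by(self, key: str) -> dict:
        """Total spend grouped by 'team', 'project', or 'day'."""
        totals = defaultdict(float)
        for r in self._records:
            totals[getattr(r, key)] += self.cost_of(r)
        return dict(totals)


# Placeholder rates and made-up traffic, for illustration only.
ledger = UsageLedger(usd_per_m_input=3.0, usd_per_m_output=15.0)
ledger.log(UsageRecord(date(2025, 1, 6), "platform", "code-assist", 2_000_000, 500_000))
ledger.log(UsageRecord(date(2025, 1, 6), "support", "ticket-summaries", 1_000_000, 200_000))
ledger.log(UsageRecord(date(2025, 1, 7), "platform", "code-assist", 4_000_000, 1_000_000))

print(ledger.spend_by("team"))
print(ledger.spend_by("day"))
```

Grouping by day is what lets finance see a jump the morning after it happens instead of at month-end.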

Step 3: Build the Budget Model in Two Layers

Layer one is the committed base: the minimum you will pay regardless of usage. Most AI vendors will discount in exchange for a usage commitment, so this number is often non-zero. Layer two is the variable overage: the cost above the committed base, modelled as a range rather than a point estimate.

Your forecast then becomes: committed base (known) plus expected variable usage (modelled). For multi-year contracts, model variable usage growth conservatively. A range of 30 to 60 percent per year is realistic given how fast AI usage is expanding.
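
The two-layer model can be sketched in a few lines of Python. The dollar figures are invented for illustration, and the sketch assumes the committed base stays flat across the contract term, which your own contract may not. The 30 and 60 percent growth bounds come from the range above.

```python
# Two-layer budget sketch: committed base plus a variable range.
# Dollar inputs are illustrative; a flat committed base is an assumption.

def forecast(committed_base_usd: float,
             variable_low_usd: float, variable_high_usd: float,
             growth_low: float = 0.30, growth_high: float = 0.60,
             years: int = 3) -> list[tuple[int, float, float]]:
    """Return (year, low_total, high_total) for each contract year."""
    rows = []
    for year in range(years):
        # Year 1 uses the starting variable range; later years compound growth.
        low = committed_base_usd + variable_low_usd * (1 + growth_low) ** year
        high = committed_base_usd + variable_high_usd * (1 + growth_high) ** year
        rows.append((year + 1, round(low, 2), round(high, 2)))
    return rows


budget_rows = forecast(committed_base_usd=60_000,
                       variable_low_usd=20_000,
                       variable_high_usd=40_000)
for year, low, high in budget_rows:
    print(f"Year {year}: ${low:,.0f} to ${high:,.0f}")
```

With these inputs the range widens from $80,000 to $100,000 in year one to $93,800 to $162,400 in year three, which is why the variable layer is better presented as a band than as a single number.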

Step 4: Negotiate the Right Volume Commitment

Volume commitments are the lever that converts unpredictable usage cost into a discount. Commit to less than your projected usage, leave room for overage, and ask for rollover of unused commitment to the next period. Vendors will quote a 20 to 30 percent discount in exchange for a 12-month commitment.

Avoid the temptation to commit to peak usage. The cost of over-committing is real cash that walks out the door regardless of whether you used the volume. The cost of slight under-committing is the higher overage rate, usually 10 to 20 percent above the committed rate, which is manageable.
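
A quick worked comparison shows the asymmetry. The Python sketch below assumes a simple contract shape: the commitment is paid in full regardless of usage, overage is billed at the committed rate plus a premium, and rollover is ignored. The rate, volumes, and the 15 percent premium (the midpoint of the range above) are illustrative.

```python
# Over-commit vs under-commit, under an assumed simple contract shape.
# Units could be millions of tokens; the rate and volumes are made up.

def period_cost(usage_units: float, committed_units: float,
                committed_rate: float, overage_premium: float = 0.15) -> float:
    """Commitment is paid in full; usage above it is billed at a premium."""
    base = committed_units * committed_rate
    overage_units = max(0.0, usage_units - committed_units)
    return base + overage_units * committed_rate * (1 + overage_premium)


actual_usage = 1_000   # units actually consumed in the period
rate = 8.0             # USD per unit at the committed rate (placeholder)

commit_scenarios = {
    "under-commit (800)": period_cost(actual_usage, 800, rate),
    "exact (1,000)": period_cost(actual_usage, 1_000, rate),
    "peak commit (1,500)": period_cost(actual_usage, 1_500, rate),
}
for label, usd in commit_scenarios.items():
    print(f"{label:>20}: ${usd:,.2f}")
```

With these numbers, under-committing by 20 percent costs about $240 more than a perfect commitment, while committing to a peak of 1,500 units costs $4,000 more. Negotiated rollover would narrow that gap, but only for volume you eventually use.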
