Salesforce's generative AI capabilities (Einstein Copilot, Prompt Builder, generative summary, generative replies, and the broader Einstein 1 prompt orchestration layer) are priced on a consumption model rather than a flat per-user fee. The consumption unit is denominated in Einstein Requests, AI Credits, or a SKU-specific equivalent, each of which ultimately reduces to a per-unit charge. The model is straightforward in concept and difficult to forecast in practice, which is the source of most enterprise AI cost surprises in the Salesforce stack.
This guide explains how the consumption model actually works, where consumption surprises come from, what good consumption governance looks like, and how to negotiate the consumption envelope so it does not become a runaway cost line. Across more than 500 engagements representing over $420M in negotiated savings, our team has observed that AI consumption is now the most common source of mid-term budget shock on Salesforce deployments.
The mechanics of consumption pricing
Every Einstein generative interaction consumes a defined unit of capacity. A simple Copilot turn that summarizes a case might consume a few units; a complex multi-step prompt with grounding to Data Cloud might consume many more. Salesforce bundles a defined volume of units in the per-user SKU and then charges overage above the bundled volume.
The complication is that not every interaction consumes the same number of units. Unit consumption is influenced by prompt length, response length, the number of grounding steps the prompt orchestration performs, and the specific model invoked. Two seemingly similar use cases — a sales call summary and a service case wrap-up — can consume materially different unit volumes even when the user experience is comparable.
| Interaction type | Typical unit consumption | Drivers of variance |
|---|---|---|
| Short generative reply | 1-3 units | Response length, grounding depth |
| Case summary | 3-8 units | Case interaction length, model choice |
| Sales call summary | 5-15 units | Transcript length, summary detail |
| Multi-step copilot workflow | 10-40 units | Number of orchestration steps, grounding |
| Account research with grounding | 20-80 units | Data Cloud queries, model orchestration |
| Bulk classification or scoring | 0.5-2 units per record | Volume of records, model size |
The variance is meaningful for forecasting. A simple model that assumes constant unit consumption per interaction will badly miss; a model that segments by interaction type and uses observed consumption from comparable deployments produces materially better forecasts.
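As a rough illustration of the gap between the two approaches, the sketch below forecasts monthly units from the ranges in the table and compares the result with a flat per-interaction assumption. The monthly interaction volumes and the flat figure of 2 units per interaction are hypothetical placeholders, not observed data. Bulk classification is left out because it is metered per record rather than per interaction.

```python
# Unit ranges (low, high) per interaction, copied from the table above.
UNIT_RANGES = {
    "short_reply": (1, 3),
    "case_summary": (3, 8),
    "sales_call_summary": (5, 15),
    "multi_step_workflow": (10, 40),
    "account_research": (20, 80),
}

# Hypothetical monthly interaction counts; replace with observed volumes.
MONTHLY_VOLUME = {
    "short_reply": 40_000,
    "case_summary": 15_000,
    "sales_call_summary": 6_000,
    "multi_step_workflow": 2_000,
    "account_research": 1_000,
}


def segmented_forecast(volumes, ranges):
    """Return (low, mid, high) monthly unit totals summed across interaction types."""
    low = sum(volumes[kind] * ranges[kind][0] for kind in volumes)
    high = sum(volumes[kind] * ranges[kind][1] for kind in volumes)
    return low, (low + high) / 2, high


def flat_forecast(volumes, units_per_interaction):
    """Naive forecast: every interaction is assumed to cost the same number of units."""
    return sum(volumes.values()) * units_per_interaction


low, mid, high = segmented_forecast(MONTHLY_VOLUME, UNIT_RANGES)
flat = flat_forecast(MONTHLY_VOLUME, 2)  # hypothetical flat assumption

print(f"segmented: {low:,} to {high:,} units (midpoint {mid:,.0f})")
print(f"flat:      {flat:,} units")
```

With this particular mix, the flat estimate falls below even the low end of the segmented range, because the small number of heavy interactions dominates total consumption.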
Where consumption surprises come from
Four patterns drive most unanticipated consumption growth on Einstein deployments.
Adoption faster than forecast
Sellers and agents adopt Copilot faster than the deployment plan predicted. The capability is useful, word spreads, more users experiment, and per-user interaction volume rises. The deployment that forecast 30 interactions per user per month and budgeted accordingly discovers six months in that actual volume is 80-120 per user. The consumption envelope is exhausted long before renewal.
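To make the timing concrete, here is a minimal sketch of how quickly an envelope sized at the forecast rate runs out once adoption ramps. It counts interactions per user as a stand-in for units. The 36-month term, the six-month linear ramp, and the steady state of 100 interactions (the middle of the 80-120 range above) are all assumptions chosen for illustration.

```python
from itertools import accumulate


def month_envelope_exhausted(envelope, monthly_usage):
    """Return the 1-based month in which cumulative usage first exceeds the envelope, or None."""
    for month, cumulative in enumerate(accumulate(monthly_usage), start=1):
        if cumulative > envelope:
            return month
    return None


TERM_MONTHS = 36         # assumed contract term
FORECAST_PER_MONTH = 30  # forecast interactions per user per month

# Envelope sized as if the forecast held for the whole term.
envelope = FORECAST_PER_MONTH * TERM_MONTHS

# Hypothetical actuals: linear ramp from 30 to 100 over six months, then flat at 100.
ramp = [30 + 14 * i for i in range(6)]
actual = ramp + [100] * (TERM_MONTHS - len(ramp))
as_forecast = [FORECAST_PER_MONTH] * TERM_MONTHS

print(month_envelope_exhausted(envelope, actual))       # 13
print(month_envelope_exhausted(envelope, as_forecast))  # None
```

Under these assumptions the per-user envelope intended to last 36 months is gone in month 13.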
Use-case expansion without budget
The original deployment scope is one or two use cases. Operations adds a third use case that uses Copilot for case escalation routing, then a fourth for marketing content drafting, then a fifth for renewal-risk surfacing. Each use case looks reasonable in isolation; the cumulative consumption is the issue. Without a consumption envelope for each use case, consumption sprawls.
Complex prompts not optimized
Early prompts are often longer and less efficient than they need to be. Mature prompt engineering reduces consumption per interaction by 30-60% without compromising output quality. Deployments without prompt engineering discipline pay for inefficient prompts at scale.
Model selection sprawl
The platform supports multiple model families with different per-unit costs. Defaulting all use cases to the largest model consumes capacity faster than necessary; using a smaller model for use cases that do not require frontier capability saves materially. Deployments without model selection discipline overspend on capacity that does not change output quality.
Good consumption forecasting
A defensible consumption forecast for an Einstein deployment has several components.
First, segment users by role and define interaction-per-user-per-month assumptions for each segment. Sellers do not consume at the same rate as service agents, and inside-sales reps consume differently from enterprise field reps. A blended assumption misses the variance.
Second, segment interactions by type and apply differentiated unit consumption assumptions. Case summaries consume differently from account research; bulk classification consumes differently from interactive copilot. Use observed consumption from comparable deployments as the baseline.
Third, model adoption curves explicitly. The forecast at month 6 should differ from the forecast at month 18, and from the forecast at month 36. Step-function increases that match planned use-case launches should be modeled explicitly. The S-curve adoption pattern is more accurate than linear growth.
Fourth, run sensitivity analysis. Under three scenarios — slow adoption, expected adoption, fast adoption — what is the consumption envelope? The deal should be sized so that the expected case is comfortably within the envelope and the fast-adoption case does not catastrophically exceed it.
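The third and fourth components can be sketched together as a scenario model. The code below uses a logistic function as one common way to represent an S-curve, adds step increases for planned use-case launches, and totals units over the term for three scenarios. Every number in it (user count, units per interaction, ceilings, midpoints, the launch step, the envelope) is a hypothetical input. For brevity it collapses the role and interaction-type segmentation into single blended values; a real model would run one curve per segment.

```python
import math


def s_curve(month, ceiling, midpoint, steepness=0.5):
    """Logistic adoption curve: interactions per user per month, approaching `ceiling`."""
    return ceiling / (1 + math.exp(-steepness * (month - midpoint)))


def interactions_per_user(month, ceiling, midpoint, launches=()):
    """S-curve baseline plus a step for each use case launched on or before `month`."""
    steps = sum(extra for launch_month, extra in launches if month >= launch_month)
    return s_curve(month, ceiling, midpoint) + steps


def term_units(users, units_per_interaction, months, ceiling, midpoint, launches=()):
    """Total units consumed over the term for one scenario."""
    return sum(
        users * units_per_interaction * interactions_per_user(m, ceiling, midpoint, launches)
        for m in range(1, months + 1)
    )


# Hypothetical scenario parameters.
SCENARIOS = {
    "slow": {"ceiling": 60, "midpoint": 14},
    "expected": {"ceiling": 90, "midpoint": 10},
    "fast": {"ceiling": 120, "midpoint": 7},
}
LAUNCHES = ((12, 10),)  # hypothetical: new use case in month 12 adds 10 interactions/user
ENVELOPE = 15_000_000   # hypothetical contracted units for the term

totals = {
    name: term_units(1_000, 5, 36, launches=LAUNCHES, **params)
    for name, params in SCENARIOS.items()
}

for name, units in totals.items():
    print(f"{name:>8}: {units:,.0f} units ({units / ENVELOPE:.0%} of envelope)")
```

The useful output is the ratio for each scenario: the expected case should sit comfortably below 100% and the fast case should not exceed it by a margin the budget cannot absorb.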
Negotiating the consumption envelope
Effective negotiation of the consumption envelope addresses six elements explicitly.
Included entitlement
The volume of units bundled in the per-user SKU should be written into the order form, not inferred from datasheet defaults that may shift over time. Salesforce sometimes revises bundled entitlements between SKU generations; written commitment protects the buyer from changes during the contract term.
Overage pricing
The per-unit cost of consumption above the entitlement should be in the contract, with a discount deep enough to make overage tolerable. Salesforce list overage pricing is materially higher than the effective per-unit rate inside the bundled envelope; negotiated overage pricing should converge those two figures.
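A short worked example shows why the gap between the two rates matters. All prices and volumes below are invented for illustration and are not Salesforce list prices. The sketch also assumes, as a simplification, that the entire seat price is attributed to the bundled units.

```python
def effective_bundled_rate(seat_price, bundled_units_per_seat):
    """Per-unit rate implied by the seat price (assumes the whole price buys units)."""
    return seat_price / bundled_units_per_seat


def overage_cost(units_used, entitlement_units, overage_rate):
    """Cost of consumption above the entitlement; zero when usage is within it."""
    return max(0, units_used - entitlement_units) * overage_rate


# Hypothetical deal terms.
SEATS = 500
SEAT_PRICE = 600          # per seat per year
BUNDLED_UNITS = 24_000    # per seat per year
LIST_OVERAGE = 0.06       # per unit
NEGOTIATED_OVERAGE = 0.03  # per unit

entitlement = SEATS * BUNDLED_UNITS
units_used = 15_000_000   # hypothetical annual consumption

bundled_rate = effective_bundled_rate(SEAT_PRICE, BUNDLED_UNITS)
at_list = overage_cost(units_used, entitlement, LIST_OVERAGE)
at_negotiated = overage_cost(units_used, entitlement, NEGOTIATED_OVERAGE)

print(f"effective bundled rate: {bundled_rate:.3f} per unit")
print(f"overage at list:        {at_list:,.0f}")
print(f"overage at negotiated:  {at_negotiated:,.0f}")
```

In this example the negotiated overage rate sits just above the effective bundled rate, so the same 3,000,000 units of overage cost half as much as they would at list.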
True-up timing
True-up should happen at renewal, not mid-term. A mid-term true-up creates an in-period invoice that disrupts budget planning and gives Salesforce a coercion point. A renewal-time true-up consolidates the conversation into a single negotiation.
Carryover
Some carryover of unused entitlement — month-to-month, or quarter-to-quarter — is achievable on larger deals. Without carryover, under-consumption in a low-activity period is value lost; with carryover, the envelope absorbs lumpy consumption patterns.
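The effect of carryover on lumpy consumption can be simulated in a few lines. This sketch assumes a monthly entitlement and unlimited month-to-month carryover; actual contracts may cap or expire the carried balance. The usage series is hypothetical.

```python
def total_overage(monthly_entitlement, usage, carryover=True):
    """Sum overage units across months, optionally rolling unused units forward."""
    balance = 0   # unused units carried in from earlier months
    overage = 0
    for used in usage:
        available = monthly_entitlement + (balance if carryover else 0)
        if used > available:
            overage += used - available
            balance = 0
        else:
            balance = available - used
    return overage


# Hypothetical usage in thousands of units: two quiet months, then a busy stretch.
usage = [60, 70, 140, 130, 80, 120]

print(total_overage(100, usage, carryover=False))  # 90
print(total_overage(100, usage, carryover=True))   # 0
```

Total usage equals total entitlement over the six months (600 in both cases), yet without carryover the buyer pays overage on 90 while 90 goes unused in other months.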
Pooling across business units
Pooling consumption across the enterprise, rather than allocating it per business unit, preserves flexibility. The marketing team may underconsume while the sales organization overconsumes; pooled entitlement balances the variance internally without triggering overage.
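The difference is easy to see with a small example. The allocations and consumption figures below are hypothetical.

```python
def overage_units(entitlement, consumption):
    """Units consumed above the entitlement; zero when within it."""
    return max(0, consumption - entitlement)


# Hypothetical per-business-unit allocations and actual consumption.
allocations = {"marketing": 100_000, "sales": 100_000, "service": 100_000}
consumption = {"marketing": 60_000, "sales": 130_000, "service": 95_000}

# Per-business-unit allocation: each unit is measured against its own slice.
per_bu_overage = sum(
    overage_units(allocations[bu], consumption[bu]) for bu in allocations
)

# Pooled: one enterprise-wide entitlement against total consumption.
pooled_overage = overage_units(sum(allocations.values()), sum(consumption.values()))

print(per_bu_overage)  # 30000
print(pooled_overage)  # 0
```

The enterprise consumes 285,000 of 300,000 units in both cases; only the allocation structure determines whether sales triggers 30,000 units of overage.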
Use-case-level reporting
The contract should require reporting that lets the buyer see consumption by use case, team, and user segment. Aggregate consumption reports do not support governance; segmented reporting does. Salesforce will provide the reporting if asked; deals that do not ask do not get it.
Operating governance
Beyond the contract, four operating disciplines distinguish deployments that govern consumption from deployments that do not.
First, name an accountable owner for the AI consumption budget, typically a senior leader in platform or AI operations, and set a monthly cadence for reviewing consumption against budget. Without a single accountable owner, consumption growth is everyone's problem and no one's job.
Second, a use-case-level envelope. Each use case has a consumption budget and a measurement of value delivered. Use cases that exceed budget without delivering value are retired or rescoped. Use cases that exceed budget but deliver disproportionate value have their envelope expanded explicitly.
Third, prompt engineering discipline. New prompts go through a review process that checks length, grounding scope, and model selection. Reviews catch the inefficient defaults that drive consumption growth without commensurate value. The function need not be elaborate; a checklist applied consistently catches most of the waste.
Fourth, model selection guidance: a default model for routine tasks and a premium model for high-value tasks, with clear criteria for choosing between them. The platform supports the selection; the organization has to enforce it.
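The use-case-level envelope in the second discipline lends itself to a simple monthly review rule. The sketch below is one possible encoding, not a prescribed method: the `UseCase` fields, the value-per-unit threshold, and the sample figures are all hypothetical, and how an organization measures value delivered is left open.

```python
from dataclasses import dataclass


@dataclass
class UseCase:
    name: str
    budget_units: int
    consumed_units: int
    value_delivered: float  # in whatever measure the organization uses


def review(use_case, min_value_per_unit):
    """Classify a use case for the monthly consumption review."""
    if use_case.consumed_units <= use_case.budget_units:
        return "within budget"
    value_per_unit = use_case.value_delivered / use_case.consumed_units
    if value_per_unit >= min_value_per_unit:
        return "expand envelope"
    return "retire or rescope"


# Hypothetical portfolio.
portfolio = [
    UseCase("case summaries", 50_000, 42_000, 30_000.0),
    UseCase("account research", 20_000, 35_000, 140_000.0),
    UseCase("content drafting", 10_000, 18_000, 4_000.0),
]

for uc in portfolio:
    print(f"{uc.name}: {review(uc, min_value_per_unit=1.0)}")
```

The threshold forces the explicit decision described above: an over-budget use case either justifies a larger envelope or is retired or rescoped.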
Common pitfalls in consumption budgeting
Several patterns produce systematically inaccurate consumption budgets.
Anchoring on the bundled entitlement as if it were the budget. The bundled entitlement is a price point, not a usage estimate. Budgets should be built from expected consumption, not from the included entitlement. If the budget exceeds the entitlement, that is information, not error.
Assuming linear growth. Adoption curves are S-shaped or step-shaped, not linear. A linear forecast underestimates the inflection that occurs around months 6 to 12, when the deployment becomes broadly used.
Ignoring background consumption. Background jobs (bulk classification, scoring, eventing) consume capacity even when no user is interacting. They are easy to overlook in user-centric forecasts and can scale faster than anyone notices.
Treating all interactions as equal. The variance in unit consumption across interaction types is several-fold; collapsing to a per-interaction average misses the variance.
The competitive context
Consumption-based pricing is not unique to Salesforce. Microsoft Copilot, Google Workspace AI, ServiceNow Now Assist, and most enterprise AI offerings use similar metering. Comparing total cost of ownership across platforms requires comparing the consumption models, not just the headline per-seat prices. A platform with lower per-seat pricing and higher per-interaction consumption may cost more at scale than a platform with the inverse structure.
Buyers evaluating Salesforce's Einstein consumption model against alternatives should model both the per-seat and per-interaction components together, with adoption curves applied. The platform that looks cheaper on a per-seat basis frequently looks more expensive once volume is applied; the platform that looks expensive per seat sometimes amortizes well at scale. The variance matters more on large deployments than on small ones.
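A two-part cost comparison of this kind reduces to a breakeven calculation. The sketch below compares two anonymous platforms with invented prices; neither set of figures describes any real vendor.

```python
def monthly_cost_per_seat(seat_price, cost_per_interaction, interactions):
    """Per-seat monthly cost: fixed seat fee plus metered interactions."""
    return seat_price + cost_per_interaction * interactions


def breakeven_interactions(seat_a, per_a, seat_b, per_b):
    """Interactions per seat per month at which A and B cost the same, or None."""
    if per_a == per_b:
        return None  # parallel cost lines never cross
    return (seat_b - seat_a) / (per_a - per_b)


# Hypothetical platforms: A is cheap per seat, B is cheap per interaction.
A = {"seat_price": 30.0, "cost_per_interaction": 0.40}
B = {"seat_price": 50.0, "cost_per_interaction": 0.10}

crossover = breakeven_interactions(
    A["seat_price"], A["cost_per_interaction"],
    B["seat_price"], B["cost_per_interaction"],
)
print(f"breakeven at about {crossover:.0f} interactions per seat per month")

for volume in (30, 100):
    cost_a = monthly_cost_per_seat(interactions=volume, **A)
    cost_b = monthly_cost_per_seat(interactions=volume, **B)
    cheaper = "A" if cost_a < cost_b else "B"
    print(f"{volume:>3} interactions: A={cost_a:.2f}  B={cost_b:.2f}  cheaper: {cheaper}")
```

Feeding the adoption curve's monthly volumes into the same functions shows when in the term the cheaper platform changes, which a single-point comparison hides.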
What good looks like at renewal
At renewal, the deployment with mature consumption discipline has several artifacts ready: a 24-month consumption history segmented by use case and team; a value-delivered measurement for each major use case; a forward forecast for the next 24 months with adoption assumptions explicit; a list of use cases retired during the prior term and the reasons; and a proposed envelope structure for the renewal that reflects observed consumption rather than vendor defaults.
These artifacts change the renewal conversation from a vendor-led pricing exercise into a buyer-led envelope discussion. The buyer is asking Salesforce to support a forecasted consumption envelope with appropriate per-unit pricing and protections, rather than accepting the envelope the vendor proposes. Across our portfolio, deployments that enter renewal with these artifacts achieve 18-32% better renewal economics than deployments that do not. The 34% average reduction figure across the broader practice is heavily weighted by these well-governed AI renewals; the deals without governance discipline pull the average down.