When Your Analytics Starts Guessing: Data Sampling Thresholds Compared
Key Takeaways
- GA4 (free) starts sampling at ~10M events per query in Exploration reports; avoiding it requires upgrading to GA360 at $150K+/year.
- A 10% sample introduces approximately ±3% margin of error; at $10M revenue, that is roughly ±$300K of uncertainty in business decisions.
- SealMetrics, Plausible, Fathom, and Simple Analytics never apply data sampling — every number is a count, not an extrapolation.
- Sampling functions as a pricing lever in enterprise tools: companies must pay more to see their actual data instead of estimates.
When your analytics tool says “10,432 conversions last month,” is that a count or an estimate? For GA4 users above a certain traffic threshold, it is an estimate. And the tool does not always tell you clearly.
We reviewed official documentation, support articles, and community forums for 10 analytics platforms to document exactly when each one starts sampling, what triggers it, how users are informed, and what it costs to avoid.
What is data sampling?
Data sampling is when your analytics tool counts a subset of events and extrapolates to produce the full number. Instead of querying every row in the database, the tool reads a fraction of the data, for example a 10% slice, and multiplies the result by 10. The result is an estimate, not a count.
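To make the mechanism concrete, here is a minimal simulation of sample-and-scale. It assumes each event is kept independently at random, which is a simplification and not a description of how any particular vendor samples; the function name, the seed, and the synthetic data are all hypothetical.

```python
import random

def sampled_estimate(events, sample_rate, seed=0):
    """Count conversions in a random subset, then scale up to an estimate."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    # Assumption: each event is kept independently with probability sample_rate.
    sampled_conversions = sum(
        1 for is_conversion in events
        if rng.random() < sample_rate and is_conversion
    )
    return round(sampled_conversions / sample_rate)

# Hypothetical data: 1,000,000 events, 10,000 of which are conversions.
events = [i % 100 == 0 for i in range(1_000_000)]
true_count = sum(events)
estimate = sampled_estimate(events, sample_rate=0.10)
print(f"true count: {true_count}, sampled estimate: {estimate}")
```

The estimate will usually land near the true count without matching it, and a different seed gives a different estimate. That run-to-run variation is the uncertainty a sampled report carries.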
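The margin of error grows as the sample shrinks. A 10% sample introduces approximately ±3% error at best. A 1% sample can swing ±10% or more. The exact error depends on how many of the events you care about end up in the sample, so rare events such as conversions are hit hardest. The smaller the sample, the less you can trust the number.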
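The relationship between sample size and error can be sketched with a standard formula. The example below assumes simple random sampling, where each event is kept independently with probability `sample_rate`; real tools may sample differently, and the function name is hypothetical. Under that assumption, the relative standard error of a scaled-up count is sqrt((1 - rate) / (rate × true count)).

```python
import math

def relative_standard_error(true_count, sample_rate):
    """Relative standard error of a count scaled up from a random sample.

    Assumes each event is kept independently with probability sample_rate.
    """
    return math.sqrt((1 - sample_rate) / (sample_rate * true_count))

# Hypothetical metric with about 10,000 true events (e.g. monthly conversions).
for rate in (0.10, 0.01):
    rse = relative_standard_error(10_000, rate)
    print(f"{rate:.0%} sample: about ±{rse:.1%}")
```

For a metric with about 10,000 true events, this gives roughly ±3% at a 10% sample and roughly ±10% at a 1% sample, in line with the figures above. These are one-standard-error values; a 95% interval is about twice as wide, and metrics with fewer events fare worse.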
Sampling thresholds compared
| Tool | Sampling starts at | What triggers it | User informed? | Way to avoid |
|---|---|---|---|---|
| SealMetrics | Never | — | — | No sampling by design |
| Plausible | Never | — | — | No sampling |
| Fathom | Never | — | — | No sampling |
| Simple Analytics | Never | — | — | No sampling |
| Piwik PRO | Never (up to plan limit) | Plan event cap | Yes | Upgrade plan |
| Mixpanel | Custom (plan-dependent) | Report complexity | Sometimes | Upgrade |
| PostHog | ~1M events/month (free) | Event volume | Yes | Pay per event |
| GA4 (Free) | ~10M events/query | Exploration reports exceeding 10M events | Small shield icon | Upgrade to GA360 ($150K+/yr) |
| GA4 (GA360) | ~1B events/query | Very high volume + complex queries | Shield icon | Use BigQuery export |
| Adobe Analytics | Contract-dependent | Server call volume + report complexity | Processing indicator | Contract negotiation |
Methodology: we reviewed official documentation, published support articles, and community forums for each platform. We documented when sampling starts, what triggers it, how users are notified, and what options exist to avoid it.
GA4's sampling problem in detail
The ~10M threshold in GA4 applies to Exploration reports — the advanced analysis section, not standard reports. Standard reports use pre-aggregated data and are typically unsampled. But the moment you build a custom Exploration, add segments, extend the date range, or compare multiple dimensions, you can exceed 10 million events and trigger sampling — often without realizing it.
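A rough way to gauge whether an Exploration is likely to cross the threshold is to multiply daily event volume by the length of the date range. The sketch below makes two simplifying assumptions that are not documented GA4 behavior: events are spread evenly across days, and once a query is over the limit the tool reads about the limit's worth of events. The constant and function names are hypothetical.

```python
GA4_FREE_EVENT_LIMIT = 10_000_000  # approximate per-query threshold cited above

def implied_sample_rate(daily_events, days, limit=GA4_FREE_EVENT_LIMIT):
    """Fraction of events a query could read under the limit (1.0 = unsampled).

    Assumption: over the limit, roughly `limit` events are read.
    """
    events_in_range = daily_events * days
    if events_in_range <= limit:
        return 1.0
    return limit / events_in_range

# Hypothetical property with 200,000 events per day.
print(implied_sample_rate(200_000, 30))  # 6M events in range: under the limit
print(implied_sample_rate(200_000, 90))  # 18M events in range: over the limit
```

With these numbers, the same report is unsampled over 30 days and reads roughly 56% of events over 90 days, which shows how extending the date range alone can trigger sampling.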
GA4 shows data quality with a small icon in the report header: a green checkmark when the report is unsampled, a shield when it is sampled. It is easy to miss, especially for marketers who are not trained to look for it. Many teams present sampled data in board reports without realizing the numbers are estimates, not counts. We covered this in detail in GA4 Data Sampling: Why Your Numbers Are Wrong.
Why sampling matters for business decisions
A 10% sample introduces approximately ±3% margin of error at best. If your measured conversion rate is 2.5%, the real number could be anywhere from roughly 2.4% to 2.6%. That sounds small until you apply it to revenue. At $10M annual revenue, ±3% is roughly ±$300K: the difference between a campaign that looks profitable and one that does not.
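The arithmetic can be checked with a few lines of code. This sketch assumes the ±3% is a symmetric relative error that carries through proportionally to revenue, which is a simplification; `error_band` is a hypothetical helper.

```python
def error_band(value, relative_error):
    """Return (low, high) for a value with a symmetric relative error."""
    return value * (1 - relative_error), value * (1 + relative_error)

# ±3% applied to a 2.5% conversion rate and to $10M in annual revenue.
low_rate, high_rate = error_band(0.025, 0.03)
low_rev, high_rev = error_band(10_000_000, 0.03)

print(f"conversion rate: {low_rate:.3%} to {high_rate:.3%}")
print(f"revenue: ${low_rev:,.0f} to ${high_rev:,.0f}")
```

Under that assumption the conversion rate falls between about 2.4% and 2.6%, and the revenue figure between about $9.7M and $10.3M.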
Budget allocation, campaign optimization, funnel analysis, A/B test results: all of these depend on accurate counts. When the underlying data is sampled, every downstream decision inherits that uncertainty. You are not optimizing your funnel. You are optimizing an approximation of your funnel.
The enterprise pricing wall
GA4's solution to sampling is GA360 at $150K+ per year. Adobe's solution is contract negotiation. In both cases, the sampling threshold functions as a pricing lever — pay more to see your actual data.
This creates a two-tier system: companies that can afford complete data, and companies that make decisions on estimates without knowing it. The irony is that mid-market companies — the ones most sensitive to marketing ROI — are precisely the ones most likely to hit sampling thresholds without the budget to escape them.
The alternative: no sampling by design
Some tools simply do not sample. They store every event and query the full dataset every time. SealMetrics, Plausible, Fathom, and Simple Analytics fall in this category. When you see a number, it is a count — not an extrapolation.
The difference between SealMetrics and the lightweight, privacy-focused alternatives is scope. SealMetrics combines zero sampling with enterprise features: multi-touch attribution, LENS AI supervision, cookieless first-party collection, and full funnel analysis, all on 100% of your data.
How to check if you are being sampled
GA4 (Free and GA360)
Look for the green checkmark or shield icon in the top-left corner of any Exploration report. A green checkmark means unsampled. A yellow or orange shield means sampled. Hover over the icon for the sample percentage. Standard reports use pre-aggregated data and are typically not sampled, but Explorations, custom reports, and API queries can be.
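For API queries there is no icon to look for, so the check has to happen in code. The sketch below assumes a GA4 Data API `runReport` response that has already been parsed from JSON into a dict, and that sampling is reported under `metadata.samplingMetadatas` with `samplesReadCount` and `samplingSpaceSize` fields; verify those field names against the current API reference before relying on them. The `sampling_summary` helper and the example payload are hypothetical.

```python
def sampling_summary(report_response):
    """Summarize sampling info from a parsed runReport response dict.

    Returns one entry per sampled date range; an empty list means the
    response reported no sampling.
    """
    metadata = report_response.get("metadata", {})
    summaries = []
    for entry in metadata.get("samplingMetadatas", []):
        # 64-bit integers are typically serialized as strings in JSON.
        events_read = int(entry["samplesReadCount"])
        events_total = int(entry["samplingSpaceSize"])
        summaries.append({
            "events_read": events_read,
            "events_total": events_total,
            "sample_rate": events_read / events_total,
        })
    return summaries

# Hypothetical response fragment: 10M events read out of 40M available.
example = {"metadata": {"samplingMetadatas": [
    {"samplesReadCount": "10000000", "samplingSpaceSize": "40000000"}
]}}
print(sampling_summary(example))
```

A pipeline could log a warning or refuse to publish a dashboard whenever the returned list is non-empty, so that sampled numbers are never presented as counts by accident.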
Adobe Analytics
Check for a “processing” indicator or data quality flag in Analysis Workspace. Adobe's sampling behavior depends on your contract tier and server call volume. If a report that normally takes a long time to generate comes back unusually quickly with round numbers, sampling may be active. Contact your Adobe account manager for your specific thresholds.
Piwik PRO
Piwik PRO does not sample within your plan limits, but it stops collecting data once you exceed your event cap. Check your plan usage in the administration panel. If you are consistently near your limit, reports at the end of the billing period may be incomplete — not sampled, but truncated.
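One way to judge the truncation risk is to project current usage to the end of the billing period. This is a back-of-the-envelope sketch that assumes a constant daily event pace; the function and all input numbers are hypothetical, and the usage figures would have to be read manually from the administration panel.

```python
import math

def projected_cap_day(events_used, days_elapsed, days_in_period, event_cap):
    """Estimate the day of the billing period on which the event cap is hit.

    Returns None if the current pace stays under the cap for the whole period.
    Assumes a constant daily pace.
    """
    daily_pace = events_used / days_elapsed
    projected_total = daily_pace * days_in_period
    if projected_total <= event_cap:
        return None
    return math.ceil(event_cap / daily_pace)

# Hypothetical plan: 1,000,000 events per 30-day period.
print(projected_cap_day(400_000, 10, 30, 1_000_000))  # over pace
print(projected_cap_day(300_000, 10, 30, 1_000_000))  # within the cap
```

In the first case the cap would be reached around day 25, so any report covering the final days of the period would be missing data.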
The bottom line
Data sampling is a trade-off between infrastructure cost and data accuracy. Some tools make that trade-off for you. Others let you choose. And a few never sample at all.
If your business makes decisions based on analytics data — budget allocation, campaign optimization, conversion analysis — you need to know whether those numbers are counts or estimates. See how SealMetrics captures 100% of your data without sampling, or calculate how much data you are losing to sampling and consent gaps today.