
When Your Analytics Starts Guessing: Data Sampling Thresholds Compared

5 min read · By Rafa Jimenez

Key Takeaways

  • GA4 (free) starts sampling at ~10M events per query in Exploration reports; avoiding it requires upgrading to GA360 at $150K+/year.
  • A 10% sample introduces approximately ±3% margin of error; at $10M in revenue, that is about $300K of uncertainty in either direction.
  • SealMetrics, Plausible, Fathom, and Simple Analytics never apply data sampling — every number is a count, not an extrapolation.
  • Sampling functions as a pricing lever in enterprise tools: companies must pay more to see their actual data instead of estimates.

When your analytics tool says “10,432 conversions last month,” is that a count or an estimate? For GA4 users above a certain traffic threshold, it is an estimate. And the tool does not always tell you clearly.

We reviewed official documentation, support articles, and community forums for 10 analytics platforms to document exactly when each one starts sampling, what triggers it, how users are informed, and what it costs to avoid.

What is data sampling?

Data sampling is when your analytics tool counts a subset of events and extrapolates to produce the full number. Instead of querying every row in the database, the tool takes a 10% slice and multiplies by 10. The result is an estimate, not a count.
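
To make the mechanics concrete, here is a minimal Python sketch of counting versus extrapolating. It assumes simple random sampling of individual events, which is a simplification: real tools choose their samples in their own ways. The function names and the synthetic event list are illustrative, not taken from any product.

```python
import random


def exact_count(events):
    """Scan every event and count the conversions."""
    return sum(1 for converted in events if converted)


def sampled_estimate(events, one_in, rng):
    """Keep roughly 1 in `one_in` events at random, count conversions
    in that subset, then multiply back up to estimate the full total."""
    kept = sum(
        1 for converted in events
        if rng.randrange(one_in) == 0 and converted
    )
    return kept * one_in


# Synthetic data: 100,000 events, every 40th one is a conversion (2.5%).
events = [i % 40 == 0 for i in range(100_000)]
rng = random.Random(7)  # fixed seed so the run is repeatable

print("count:   ", exact_count(events))
print("estimate:", sampled_estimate(events, 10, rng))
```

The count is 2,500 on every run. The estimate is always a multiple of 10 that lands near 2,500, and it moves when the seed changes. That movement is the sampling error.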

The margin of error grows as the sample shrinks. A 10% sample introduces approximately ±3% error at best. A 1% sample can swing ±10% or more. The smaller the sample, the less you can trust the number.
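
How quickly the error grows can be approximated with a standard formula. The sketch below assumes each event is kept independently with the same probability and uses a normal approximation at 95% confidence. Under those assumptions the error depends on how many matching events exist, not only on the sample rate. The function name and the example count of 38,416 conversions are illustrative choices, picked because they land close to the ±3% and ±10% figures above.

```python
import math


def relative_margin_of_error(true_count, sample_rate, z=1.96):
    """Approximate 95% relative margin of error for a count that was
    estimated from a random sample and scaled up by 1 / sample_rate.

    Assumes independent (Bernoulli) sampling of events; z=1.96 is the
    usual normal-approximation multiplier for 95% confidence.
    """
    if true_count <= 0:
        raise ValueError("true_count must be positive")
    if not 0 < sample_rate <= 1:
        raise ValueError("sample_rate must be in (0, 1]")
    return z * math.sqrt((1 - sample_rate) / (true_count * sample_rate))


# Hypothetical report with 38,416 matching conversions.
for rate in (1.0, 0.10, 0.01):
    moe = relative_margin_of_error(38_416, rate)
    print(f"{rate:>4.0%} sample -> ±{moe:.1%}")
```

With fewer matching events the same sample rate gives a wider margin, which is why narrow segments and rare conversions suffer most from sampling.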

Sampling thresholds compared

| Tool | Sampling starts at | What triggers it | User informed? | Way to avoid |
|---|---|---|---|---|
| SealMetrics | Never | No sampling by design | | |
| Plausible | Never | No sampling | | |
| Fathom | Never | No sampling | | |
| Simple Analytics | Never | No sampling | | |
| Piwik PRO | Never (up to plan limit) | Plan event cap | Yes | Upgrade plan |
| Mixpanel | Custom (plan-dependent) | Report complexity | Sometimes | Upgrade |
| PostHog | ~1M events/month (free) | Event volume | Yes | Pay per event |
| GA4 (Free) | ~10M events/query | Exploration reports exceeding 10M events | Small shield icon | Upgrade to GA360 ($150K+/yr) |
| GA4 (GA360) | ~1B events/property | Very high volume + complex queries | Shield icon | Use BigQuery export |
| Adobe Analytics | Contract-dependent | Server call volume + report complexity | Processing indicator | Contract negotiation |

Methodology: we reviewed official documentation, published support articles, and community forums for each platform. We documented when sampling starts, what triggers it, how users are notified, and what options exist to avoid it.

GA4's sampling problem in detail

The ~10M threshold in GA4 applies to Exploration reports — the advanced analysis section, not standard reports. Standard reports use pre-aggregated data and are typically unsampled. But the moment you build a custom Exploration, add segments, extend the date range, or compare multiple dimensions, you can exceed 10 million events and trigger sampling — often without realizing it.

GA4 shows sampling status with a small icon in the report header: a green checkmark when the report is unsampled, a shield icon when it is sampled. It is easy to miss, especially for marketers who are not trained to look for it. Many teams present sampled data in board reports without realizing the numbers are estimates, not counts. We covered this in detail in GA4 Data Sampling: Why Your Numbers Are Wrong.

Why sampling matters for business decisions

A 10% sample introduces approximately ±3% margin of error at best. If your measured conversion rate is 2.5%, the real number could be anywhere from roughly 2.43% to 2.58%. That sounds small until you apply it to revenue. At $10M annual revenue, a ±3% error is about $300K in either direction, which can be the difference between a campaign that looks profitable and one that does not.
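
The arithmetic is easy to rerun with your own figures. This is a minimal sketch that assumes the ±3% is a relative error and that it applies equally to the conversion rate and to revenue. The helper name is hypothetical.

```python
def error_band(value, relative_error):
    """Return the (low, high) range for value ± relative_error."""
    delta = value * relative_error
    return value - delta, value + delta


# Both inputs are the example figures used in this article.
cr_low, cr_high = error_band(0.025, 0.03)         # 2.5% conversion rate
rev_low, rev_high = error_band(10_000_000, 0.03)  # $10M annual revenue

print(f"conversion rate: {cr_low:.3%} to {cr_high:.3%}")
print(f"revenue: ${rev_low:,.0f} to ${rev_high:,.0f}")
```

Under that assumption the conversion rate falls between about 2.43% and 2.58%, and revenue between $9.7M and $10.3M.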

Budget allocation, campaign optimization, funnel analysis, A/B test results: all of these depend on accurate counts. When the underlying numbers are sampled, every downstream decision inherits that uncertainty. You are not optimizing your funnel. You are optimizing an approximation of your funnel.

The enterprise pricing wall

Google's answer to GA4 sampling is GA360 at $150K+ per year. Adobe's answer is contract negotiation. In both cases, the sampling threshold functions as a pricing lever: pay more to see your actual data.

This creates a two-tier system: companies that can afford complete data, and companies that make decisions on estimates without knowing it. The irony is that mid-market companies — the ones most sensitive to marketing ROI — are precisely the ones most likely to hit sampling thresholds without the budget to escape them.

The alternative: no sampling by design

Some tools simply do not sample. They store every event and query the full dataset every time. SealMetrics, Plausible, Fathom, and Simple Analytics fall in this category. When you see a number, it is a count — not an extrapolation.

What separates SealMetrics from the lightweight, privacy-focused alternatives is scope. SealMetrics combines zero sampling with enterprise features (multi-touch attribution, LENS AI supervision, cookieless first-party collection, and full funnel analysis), all on 100% of your data.

How to check if you are being sampled

GA4 (Free and GA360)

Look for the green checkmark or shield icon in the top-left corner of any Explorations report. A green checkmark means unsampled. A yellow or orange shield means sampled. Hover over the icon to see the sample percentage. Standard reports use pre-aggregated data and are typically not sampled, but Explorations, custom reports, and API queries can be.
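
For API queries there is no icon to hover over, so the check has to happen in code. The sketch below assumes you are reading the JSON returned by a GA4 Data API `runReport` call, and that a sampled response carries a `metadata.samplingMetadatas` list with `samplesReadCount` and `samplingSpaceSize` fields, as described in the API's response metadata reference. Verify those field names against the current documentation before relying on them. The example response is made up for illustration.

```python
def sampling_percentages(response):
    """Return the share of events read for each date range, in percent.

    An empty list means the response carried no sampling metadata,
    which we treat as "not sampled" (an assumption of this sketch).
    """
    metadata = response.get("metadata", {})
    percentages = []
    for entry in metadata.get("samplingMetadatas", []):
        # int64 values arrive as strings in JSON, so convert explicitly.
        read = int(entry["samplesReadCount"])
        space = int(entry["samplingSpaceSize"])
        percentages.append(100.0 * read / space)
    return percentages


# Hypothetical response fragment, not real data.
example = {
    "metadata": {
        "samplingMetadatas": [
            {"samplesReadCount": "1000000", "samplingSpaceSize": "10000000"}
        ]
    }
}

for pct in sampling_percentages(example):
    print(f"sampled: based on {pct:.1f}% of events")
```

Logging this value next to every exported number is a cheap way to keep sampled figures from reaching a board report unlabeled.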

Adobe Analytics

Check for a “processing” indicator or data quality flag in Analysis Workspace. Adobe's sampling behavior depends on your contract tier and server call volume. If a report that usually takes a long time to generate comes back unusually fast with suspiciously round numbers, sampling may be active. Contact your Adobe account manager for your specific thresholds.

Piwik PRO

Piwik PRO does not sample within your plan limits, but it stops collecting data once you exceed your event cap. Check your plan usage in the administration panel. If you are consistently near your limit, reports at the end of the billing period may be incomplete — not sampled, but truncated.

The bottom line

Data sampling is a trade-off between infrastructure cost and data accuracy. Some tools make that trade-off for you. Others let you choose. And a few never sample at all.

If your business makes decisions based on analytics data — budget allocation, campaign optimization, conversion analysis — you need to know whether those numbers are counts or estimates. See how SealMetrics captures 100% of your data without sampling, or calculate how much data you are losing to sampling and consent gaps today.