Version 1.0 · 2026-05-04
This document defines the methodology for MainMarket's published monthly price indices. It is the authoritative reference for institutional buyers, journalists, and analysts who need to understand exactly how each index is computed.
Indices currently published:
- eggs_cage_free_large_dozen: single-SKU egg index
- soda_12pk_12oz_cans: multi-SKU soda category index
- basket_low_income: USDA Thrifty Food Plan-anchored basket at value-tier chains
- basket_high_income: USDA Liberal Food Plan-anchored basket at premium-tier chains

For machine-readable definitions of each index (constituents, floor/ceiling bounds, chain filters), see the published_indices registry exposed at GET /v1/indices/{slug} (returned in meta.constituents).
MainMarket publishes three types of indices. Each uses a distinct aggregation method.
A single-SKU index tracks the price of one canonical product (one UPC) over time. The Egg Index is single-SKU.
Formula. For a given (slug, period, region):
value_usd = MEDIAN(regular_price) over rows P satisfying:
canonical_products.upc = constituent.upc
AND store.state ∈ region_states (or any state if region = 'national')
AND store.chain matches published_indices.chain_filter (or any chain if filter is NULL)
AND last_scraped_at within period (i.e. scrape month = period)
AND floor_usd ≤ regular_price ≤ ceiling_usd

We use the median rather than the mean because the median is robust to outliers and matches BLS practice for the Average Retail Food Prices series. Outliers are excluded before aggregation via the floor/ceiling bounds (§3), but the median additionally protects against any residual data-quality issues.
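The single-SKU rule is "filter, then take the median." The minimal Python sketch below illustrates it. It is not the production query. The `Observation` record, its field names, and the `single_sku_value` function are hypothetical stand-ins for the joined SPP/store rows. The sketch also assumes that `scrape_month` is the "YYYY-MM" month of last_scraped_at and that a chain filter can be represented as a set of chain names.

```python
from dataclasses import dataclass
from statistics import median
from typing import List, Optional, Set


@dataclass
class Observation:
    # Hypothetical flattened row: one scraped regular price for one UPC at one store.
    upc: str
    store_id: str
    state: str
    chain: str
    scrape_month: str  # "YYYY-MM", assumed derived from last_scraped_at
    regular_price: float


def single_sku_value(
    rows: List[Observation],
    upc: str,
    period: str,
    region_states: Optional[Set[str]],  # None stands for 'national' (any state)
    chain_filter: Optional[Set[str]],   # None stands for a NULL chain_filter (any chain)
    floor_usd: float,
    ceiling_usd: float,
) -> Optional[float]:
    """Median regular_price over the rows that pass every filter in the formula."""
    prices = [
        r.regular_price
        for r in rows
        if r.upc == upc
        and (region_states is None or r.state in region_states)
        and (chain_filter is None or r.chain in chain_filter)
        and r.scrape_month == period
        and floor_usd <= r.regular_price <= ceiling_usd
    ]
    # With no qualifying observations, return None instead of a number.
    return median(prices) if prices else None
```

Applying the bounds inside the same filter means an out-of-range row never reaches the median.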
A multi-SKU same-format index tracks an aggregate price across N UPCs that are sold in identical pack format. The Soda Index is multi-SKU same-format (5 UPCs, all in 12pk × 12oz cans).
Formula. For each constituent UPC i, compute the per-SKU median exactly as in §1.1:
median_i = MEDIAN(regular_price for UPC_i across stores in region/chain_filter)

Compute the store-distribution weight per SKU:
n_stores_i = COUNT(DISTINCT store_id) for UPC_i in region
weight_i = n_stores_i / Σ n_stores_j (sum across all constituent SKUs)

Then:
value_usd = Σ (median_i × weight_i)

This is the distribution-weighted basket formula. It captures the de facto market by weighting each SKU by how broadly it is stocked, which preserves a consumer's-eye view of category pricing. A SKU stocked at 700 stores contributes more to the index than one stocked at 400.
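The following Python sketch shows the distribution-weighted formula. It assumes the per-UPC prices have already passed the region, chain, period, and bound filters. It also assumes the store sets hold the distinct store_ids carrying each UPC in the region. The function name is hypothetical.

```python
from statistics import median
from typing import Dict, List, Set


def distribution_weighted_value(
    prices_by_upc: Dict[str, List[float]],
    stores_by_upc: Dict[str, Set[str]],
) -> float:
    """value_usd = sum(median_i * weight_i), with weight_i = n_stores_i / sum_j n_stores_j.

    prices_by_upc: already-filtered regular prices for each constituent UPC.
    stores_by_upc: distinct store_ids carrying each constituent UPC in the region.
    """
    n_stores = {upc: len(stores) for upc, stores in stores_by_upc.items()}
    total = sum(n_stores.values())
    if total == 0:
        raise ValueError("no store carries any constituent UPC")
    # Each SKU's median is weighted by its share of total store distribution.
    return sum(median(prices_by_upc[upc]) * n / total for upc, n in n_stores.items())
```

With one SKU at 700 stores and another at 400, the weights come out to 700/1100 ≈ 0.636 and 400/1100 ≈ 0.364.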
A basket total cost index reports the aggregate dollar cost of a basket of N constituent UPCs at a quantity of 1 each (or at specified quantities). The Low Income and High Income baskets are of type basket_total_cost.
Formula. For each constituent UPC i, compute the per-SKU median as in §1.1, filtered by chain_filter:
median_i = MEDIAN(regular_price for UPC_i across stores in region
WHERE chain.market_position matches chain_filter)

Then:
value_usd = Σ (median_i × qty_i)

This fixed-quantity aggregation follows the same fixed-basket principle used within FAO Food Price Index sub-indices and across items in the BLS food-at-home CPI. The result is an actual dollar amount (the cost of buying the basket once at the basket's eligible chains), not a unitless index value.
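Below is a minimal Python sketch of the basket total. The `basket_total_cost` helper is hypothetical. It assumes the price lists contain only observations from stores whose chain.market_position matches the basket's chain_filter. The "milk"/"bread" keys in the usage check stand in for hypothetical constituents.

```python
from statistics import median
from typing import Dict, List


def basket_total_cost(
    prices_by_upc: Dict[str, List[float]],
    qty_by_upc: Dict[str, int],
) -> float:
    """value_usd = sum(median_i * qty_i) over the basket's constituent UPCs.

    prices_by_upc is assumed to contain only prices from stores whose
    chain.market_position matches the basket's chain_filter, already
    floor/ceiling filtered.
    """
    return sum(median(prices_by_upc[upc]) * qty for upc, qty in qty_by_upc.items())
```

Unlike the soda formula, there is no normalization by store counts here. The output is the dollar cost of the whole basket.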
A normalized indexed value is computed at query time (§5).
Indices are computed nationally and for each of the four US Census Bureau regions used by BLS:
| Region | States |
|---|---|
| Northeast | CT, ME, MA, NH, NJ, NY, PA, RI, VT |
| Midwest | IL, IN, IA, KS, MI, MN, MO, NE, ND, OH, SD, WI |
| South | AL, AR, DE, DC, FL, GA, KY, LA, MD, MS, NC, OK, SC, TN, TX, VA, WV |
| West | AK, AZ, CA, CO, HI, ID, MT, NV, NM, OR, UT, WA, WY |
The national region aggregates across all 50 states and DC (no region filter).
A store contributes to exactly one regional index (the region containing its state) and to the national index. State assignment is from stores.state.
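The region table can be inverted into a state lookup. This Python sketch copies the table's state lists, and the names `REGION_STATES`, `STATE_TO_REGION`, and `regions_for_store` are hypothetical. It shows that each store feeds exactly one regional index plus the national one.

```python
from typing import Dict, List, Set

# Census region membership, copied from the region table.
REGION_STATES: Dict[str, Set[str]] = {
    "Northeast": {"CT", "ME", "MA", "NH", "NJ", "NY", "PA", "RI", "VT"},
    "Midwest": {"IL", "IN", "IA", "KS", "MI", "MN", "MO", "NE", "ND", "OH", "SD", "WI"},
    "South": {"AL", "AR", "DE", "DC", "FL", "GA", "KY", "LA", "MD", "MS",
              "NC", "OK", "SC", "TN", "TX", "VA", "WV"},
    "West": {"AK", "AZ", "CA", "CO", "HI", "ID", "MT", "NV", "NM", "OR", "UT", "WA", "WY"},
}

# Inverted lookup; each state appears in exactly one region.
STATE_TO_REGION: Dict[str, str] = {
    state: region for region, states in REGION_STATES.items() for state in states
}


def regions_for_store(state: str) -> List[str]:
    """Indices a store with this stores.state value feeds: its region plus 'national'."""
    # Raises KeyError for a state code outside the table.
    return [STATE_TO_REGION[state], "national"]
```

The table covers 51 codes (50 states plus DC), and none appears in two regions.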
Every constituent UPC has a floor_usd and ceiling_usd defined in published_indices.constituents. Any SPP row outside [floor_usd, ceiling_usd] is excluded from the median calculation and logged to index_snapshot_dropped_rows with the reason ('below_floor' or 'above_ceiling').
Bounds are set per UPC based on observed P5/P95 of historical scrapes plus a margin for promotional volatility. Bounds are conservative: their goal is to reject obvious data errors (case-of-cases mislistings, decimal-point typos) without rejecting legitimate sale prices.
Example bounds in v1:
| Index | UPC | Floor | Ceiling | Rationale |
|---|---|---|---|---|
| Egg | 715141514643 | $1.00 | $12.00 | Observed range $2.00–$7.49; padded for promo + premium markets |
| Soda (each SKU) | various | $3.00 | $15.00 | Observed range $4.00–$14.99 across all 5 SKUs |
The initial bounds caught a single $45.00 Mountain Dew row at Save-A-Lot Crossville, TN, scraped 2026-04-20 — almost certainly a case-of-cases mislisting. That row appears in the audit log but was not used in any published value.
The full audit log is queryable for transparency. When a snapshot is computed, its API response includes meta.dropped_rows: N so consumers know how many observations were excluded.
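Applying the bounds amounts to a split into kept prices and logged drops. In the minimal Python sketch below, the `DroppedRow` shape is a hypothetical approximation of an index_snapshot_dropped_rows entry rather than the actual table schema. The count of dropped rows is what would surface as meta.dropped_rows.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class DroppedRow:
    # Hypothetical shape of an index_snapshot_dropped_rows entry.
    store_id: str
    regular_price: float
    reason: str  # 'below_floor' or 'above_ceiling'


def apply_bounds(
    rows: List[Tuple[str, float]], floor_usd: float, ceiling_usd: float
) -> Tuple[List[float], List[DroppedRow]]:
    """Split (store_id, regular_price) rows into kept prices and dropped rows."""
    kept: List[float] = []
    dropped: List[DroppedRow] = []
    for store_id, price in rows:
        if price < floor_usd:
            dropped.append(DroppedRow(store_id, price, "below_floor"))
        elif price > ceiling_usd:
            dropped.append(DroppedRow(store_id, price, "above_ceiling"))
        else:
            kept.append(price)  # inclusive bounds: floor_usd <= price <= ceiling_usd
    return kept, dropped
```

With the soda bounds ($3.00 to $15.00), a $45.00 row lands in `dropped` with reason 'above_ceiling' and never reaches the median.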
Every snapshot reports coverage_score, a 0–1 metric representing the fraction of in-scope stores that contributed to the snapshot.
For single-SKU and multi-SKU indices:
coverage_score = n_stores_with_observation_this_period / n_stores_carrying_this_constituent_ever

For basket indices:
coverage_score = MEAN over constituents of (per-constituent coverage)

Each published_indices row carries a coverage_threshold (default 0.6 for SKU indices, 0.5 for baskets). Snapshots below threshold are still computed and stored. Their publish_status may be set to 'forced_publish' (manual override), or they may simply be included in the API response with the low coverage shown transparently in meta. The API response always includes coverage_score so consumers can weight thin snapshots appropriately.
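The coverage formulas are sketched below in Python. The store-id sets and all function names are hypothetical. The check `coverage_score < threshold` mirrors the threshold comparison described above, with 0.6 and 0.5 as the stated defaults.

```python
from typing import List, Set


def sku_coverage(observed_this_period: Set[str], ever_carrying: Set[str]) -> float:
    """Fraction of stores that ever carried the constituent and were observed this period."""
    if not ever_carrying:
        return 0.0
    # Intersect defensively so stray store_ids cannot push coverage above 1.
    return len(observed_this_period & ever_carrying) / len(ever_carrying)


def basket_coverage(per_constituent: List[float]) -> float:
    """Basket coverage: plain mean of per-constituent coverage scores."""
    return sum(per_constituent) / len(per_constituent)


def is_below_threshold(coverage_score: float, threshold: float) -> bool:
    """True when a snapshot falls below its coverage_threshold."""
    return coverage_score < threshold
```

A score exactly at the threshold is not below it, so a SKU index at 0.6 coverage clears the default bar.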
April 2026 snapshots are computed with publish_status = 'beta' and are excluded from the default API response unless ?include_beta=true is set. Some chain crons did not fire on their normal schedule during April, so coverage is non-uniform.

May 2026 is the official live inception for all v1 indices. Snapshots from May onward have publish_status = 'auto' and form the published series. The base period (base_period) is set to '2026-05' for all v1 indices; the indexed value (§5) is normalized against the May 2026 absolute USD value.
Each snapshot stores its absolute value_usd. The indexed value (where base period = 100) is computed at query time:
value_indexed = value_usd / published_indices.base_value_usd × 100

base_value_usd is locked at the first publishable snapshot for the base period (typically May 2026 for v1 indices). It does not change unless the index methodology version is bumped (§6), at which point a new base may be set.
The API response always returns both value_usd (absolute) and value_indexed (normalized) so consumers can use whichever is appropriate for their purpose. Hedge funds and economists typically prefer indexed; journalists and consumers typically prefer absolute.
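Query-time normalization is a single division. The minimal Python sketch below returns `None` when no base is locked yet, which corresponds to the April 2026 beta snapshots. The function name and the $10.00 base in the check are hypothetical.

```python
from typing import Optional


def value_indexed(value_usd: float, base_value_usd: Optional[float]) -> Optional[float]:
    """Query-time normalization with the base period = 100."""
    if base_value_usd is None:
        # No locked base yet, so there is no indexed value to report.
        return None
    return value_usd / base_value_usd * 100
```

Because only value_usd is stored, a later change of base (after a methodology bump) does not require rewriting any snapshot row.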
Constituent UPCs, floor/ceiling bounds, and aggregation methods are immutable per methodology_version. When any of these change, we bump the version on published_indices.current_methodology_version. Snapshots store the version active at compute time and stay frozen at that version forever.
This means:
- methodology_version = 1 always uses the same constituents and rules.

When a constituent UPC is discontinued (e.g. Eggland's Best ceases producing Brown Cage Free 12ct), the policy is:
The methodology document is also versioned. The current document version is shown at the top.
Snapshots are immutable at the row level and append-only at the series level. If a published value is later found to be wrong (e.g. a chain reported corrupted prices that passed our floor/ceiling checks), the policy is:
- The original snapshot is marked revoked = true, with revoked_at and revoked_reason set.
- A corrected snapshot is written with replaces_snapshot_id pointing back at the revoked one.
- API consumers can pass ?include_revoked=true to see the revision history.

This mirrors institutional revision practice (BEA, BLS, IMF). Published values are never silently overwritten.
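The revision policy can be modeled as "mark, then append." In the Python sketch below, the `Snapshot` dataclass is a hypothetical subset of the index_snapshots columns. The `revise` and `visible` helpers are illustrative names, not the actual API. The check uses a hypothetical corrected value of 10.31.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class Snapshot:
    # Hypothetical subset of index_snapshots columns.
    snapshot_id: int
    period: str
    value_usd: float
    revoked: bool = False
    revoked_at: Optional[datetime] = None
    revoked_reason: Optional[str] = None
    replaces_snapshot_id: Optional[int] = None


def revise(series: List[Snapshot], bad_id: int, corrected_value: float,
           reason: str, new_id: int) -> List[Snapshot]:
    """Mark one snapshot revoked and append a corrected one; value_usd is never overwritten."""
    out: List[Snapshot] = []
    original: Optional[Snapshot] = None
    for snap in series:
        if snap.snapshot_id == bad_id:
            original = snap
            snap = replace(snap, revoked=True,
                           revoked_at=datetime.now(timezone.utc),
                           revoked_reason=reason)
        out.append(snap)
    if original is None:
        raise KeyError(f"snapshot {bad_id} not found")
    # The correction is a new row that points back at the row it replaces.
    out.append(replace(original, snapshot_id=new_id, value_usd=corrected_value,
                       replaces_snapshot_id=bad_id))
    return out


def visible(series: List[Snapshot], include_revoked: bool = False) -> List[Snapshot]:
    """Default view hides revoked rows; include_revoked mirrors ?include_revoked=true."""
    return [s for s in series if include_revoked or not s.revoked]
```

The revoked row keeps its original value_usd, so the revision history shows both what was published and what replaced it.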
Internal collection cadence: weekly. Every Sunday at 02:00 UTC, the web_price_snapshots cron (migration 170) freezes a copy of all currently-scraped store prices.
Published index cadence: monthly. On the 1st of each month at 04:00 UTC, the compute_index_snapshot cron computes the prior month's snapshot for every active index × region combination. The published series therefore lags the calendar by about one day.
Within-month snapshots may be available via ?include_preview=true for QA but are not part of the published series.
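Each monthly run computes the period before its run date. The Python sketch below shows that mapping. The cron strings are standard five-field equivalents of the stated schedules. They assume the scheduler runs in UTC and are not taken from the actual deployment configuration. The function name is hypothetical.

```python
from datetime import date

# Standard five-field cron equivalents of the stated schedules (assumed, not
# copied from the real deployment config; scheduler timezone assumed UTC).
WEEKLY_SNAPSHOT_CRON = "0 2 * * 0"   # Sundays at 02:00
MONTHLY_INDEX_CRON = "0 4 1 * *"     # 1st of each month at 04:00


def period_for_run(run_date: date) -> str:
    """Period ('YYYY-MM') that the monthly run on run_date computes: the prior month."""
    year, month = run_date.year, run_date.month - 1
    if month == 0:
        # A January run computes December of the previous year.
        year, month = year - 1, 12
    return f"{year:04d}-{month:02d}"
```

For example, the run on 2026-06-01 computes period '2026-05', the first period of the live series.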
When a new snapshot is computed, two anomaly checks run automatically:
- Month-over-month change: if |value_usd - prior_period_value_usd| / prior_period_value_usd > 0.10, the snapshot is flagged with flagged_for_review = true and flag_reason = 'mom_delta_>10pct'.
- Low coverage: if coverage_score < published_indices.coverage_threshold, the snapshot is flagged with flag_reason = 'low_coverage'.

Flagged snapshots are still published by default but include meta.flagged: true and meta.flag_reason: ... in the API response. Flagging triggers a manual review by the data team to confirm the value before consumers act on it. Investigating flagged snapshots is part of the standard monthly QA cadence.
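The Python sketch below runs both checks. It returns a list of reasons because the methodology does not say how a single flag_reason is chosen when both checks trip. That choice, the skipping of the month-over-month check when there is no prior period, and the function name are all assumptions.

```python
from typing import List, Optional


def anomaly_flags(value_usd: float, prior_value_usd: Optional[float],
                  coverage_score: float, coverage_threshold: float) -> List[str]:
    """Return the flag_reason values triggered for a newly computed snapshot."""
    flags: List[str] = []
    # Assumption: skip the month-over-month check when there is no prior value (or it is 0).
    if prior_value_usd:
        if abs(value_usd - prior_value_usd) / prior_value_usd > 0.10:
            flags.append("mom_delta_>10pct")
    if coverage_score < coverage_threshold:
        flags.append("low_coverage")
    return flags
```

The comparison is strict, so a move of exactly 10% is not flagged.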
MainMarket's index methodology draws on the following authoritative sources:
- BLS APU series. The Egg Index methodology (single-SKU, monthly, regional medians) is adapted from BLS practice for APU0000708111 (Eggs, Grade A, Large per dozen).

Worked example: Soda Index (soda_12pk_12oz_cans), national, April 2026.

Inputs. The April 2026 SPP data for the 5 soda constituent UPCs:
| SKU | UPC | n_stores (April) | Median price |
|---|---|---|---|
| Pepsi Real Sugar Cola 12pk | 00012000030680 | 663 | $9.99 |
| Mountain Dew Diet 12pk | 00012000809972 | 720 | $9.99 |
| Dr Pepper Zero Sugar 12pk | 078000035261 | 506 | $10.49 |
| Diet Coke Caffeine Free 12pk | 00049000006131 | 460 | $10.49 |
| Diet Coke Soda Fridge Pack 12pk | 049000028911 | 698 | $10.49 |
(Medians shown reflect actual April 2026 SPP data; the $45.00 Mountain Dew outlier row is excluded by the ceiling check.)
Step 1: total store count for distribution weights:
Σ n_stores = 663 + 720 + 506 + 460 + 698 = 3047

Step 2: per-SKU weights:
| SKU | n_stores | weight |
|---|---|---|
| Pepsi Real Sugar | 663 | 0.2176 |
| Mtn Dew Diet | 720 | 0.2363 |
| Dr Pepper Zero | 506 | 0.1661 |
| Diet Coke Caf-Free | 460 | 0.1510 |
| Diet Coke Fridge | 698 | 0.2291 |
Step 3: weighted sum:
value_usd = (9.99 × 0.2176) + (9.99 × 0.2363) + (10.49 × 0.1661)
+ (10.49 × 0.1510) + (10.49 × 0.2291)
≈ 2.174 + 2.361 + 1.742 + 1.584 + 2.403
≈ $10.26

Snapshot row written to index_snapshots:
slug: soda_12pk_12oz_cans
period: 2026-04
region: national
methodology_version: 1
value_usd: 10.26
n_observations: 3047 (sum of per-SKU stores; same store may appear under multiple SKUs)
n_stores: ~1500 (DISTINCT stores across all 5 SKUs)
n_chains: ~25
coverage_score: ~1.0 (full coverage)
method: distribution_weighted_basket
publish_status: beta (April is beta period)
constituents_breakdown: [...] (per-SKU stats as JSONB)

This snapshot's value_indexed (§5) is undefined until the May 2026 base value is set.
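The worked example can be reproduced from the inputs table with the Python script below. It uses unrounded weights, so the weighted sum comes out to about $10.263, which rounds to the same $10.26 as the hand calculation. Only the table's figures are used; the dictionary keys are just SKU labels.

```python
# April 2026 national inputs from the worked example (medians after the ceiling check).
medians = {
    "Pepsi Real Sugar Cola 12pk": 9.99,
    "Mountain Dew Diet 12pk": 9.99,
    "Dr Pepper Zero Sugar 12pk": 10.49,
    "Diet Coke Caffeine Free 12pk": 10.49,
    "Diet Coke Soda Fridge Pack 12pk": 10.49,
}
n_stores = {
    "Pepsi Real Sugar Cola 12pk": 663,
    "Mountain Dew Diet 12pk": 720,
    "Dr Pepper Zero Sugar 12pk": 506,
    "Diet Coke Caffeine Free 12pk": 460,
    "Diet Coke Soda Fridge Pack 12pk": 698,
}

# Step 1: total store count for the distribution weights.
total = sum(n_stores.values())
# Step 2: per-SKU weights (kept unrounded here).
weights = {sku: n / total for sku, n in n_stores.items()}
# Step 3: weighted sum of the per-SKU medians.
value_usd = sum(medians[sku] * weights[sku] for sku in medians)

print(total, round(value_usd, 2))  # 3047 10.26
```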
- chain_level chains. Some chains use uniform pricing across all stores. For those chains, only one representative store contributes to the index. This is disclosed via meta.constituents_breakdown[i].chain_level_chains for transparency.
- Each published_indices row's component data may have been scraped on different days within the period. The index value reflects the "best available current price" within the month, not a single-instant snapshot.
- The basket chain_filter reflects the dominant shopping environment per income tier, not the only one. We may publish a "blended income basket" in v1.5 that mixes chain tiers according to real shopping behavior.

Methodology questions, audit requests, or institutional licensing inquiries: hello@mainmarket.com.
The full published_indices registry (including current constituents, bounds, and chain filters per index) is queryable via GET /v1/indices (coming soon) or directly inspectable in meta.constituents of any GET /v1/indices/{slug} response.
Audit log of dropped rows is available on request.
Methodology v1.0. Last updated 2026-05-04.