
MainMarket Published Price Indices — Methodology

Version 1.0 · 2026-05-04

This document defines the methodology for MainMarket's published monthly price indices. It is the authoritative reference for institutional buyers, journalists, and analysts who need to understand exactly how each index is computed.

Indices currently published:

  • eggs_cage_free_large_dozen — single-SKU egg index
  • soda_12pk_12oz_cans — multi-SKU soda category index
  • basket_low_income — USDA Thrifty Food Plan-anchored basket at value-tier chains
  • basket_high_income — USDA Liberal Food Plan-anchored basket at premium-tier chains

For machine-readable definitions of each index (constituents, floor/ceiling bounds, chain filters), see the published_indices registry exposed at GET /v1/indices/{slug} (returned in meta.constituents).

1. Index types and aggregation formulas

MainMarket publishes three types of indices. Each uses a distinct aggregation method.

single_sku: One canonical product UPC. Index value = median of regional store prices for that UPC.
multi_sku_same_format: N constituents in identical pack format. Distribution-weighted basket of per-SKU medians.
basket_total_cost: N constituents, possibly across formats. Sum of per-SKU medians × quantity, filtered by chain market position.

1.1 Single-SKU index (single_sku)

A single-SKU index tracks the price of one canonical product (one UPC) over time. The Egg Index is single-SKU.

Formula. For a given (slug, period, region):

text
value_usd = MEDIAN(regular_price)  over rows P satisfying:
    canonical_products.upc = constituent.upc
  AND store.state ∈ region_states (or any state if region = 'national')
  AND store.chain matches published_indices.chain_filter (or any chain if filter is NULL)
  AND last_scraped_at within period  (i.e. scrape month = period)
  AND floor_usd ≤ regular_price ≤ ceiling_usd

We use the median rather than the mean because the median is robust to outliers and matches BLS practice for the Average Retail Food Prices series. Outliers are excluded prior to aggregation via the floor/ceiling bounds (§3), but the median additionally protects against any residual data-quality issues.
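
The filter-then-median rule above can be sketched in a few lines of Python. This is a minimal illustration, not MainMarket's implementation: the PriceRow shape, the scrape_month field, and modelling chain_filter as a set of chain names are all assumptions made for the example.

```python
from dataclasses import dataclass
from statistics import median
from typing import Iterable, Optional, Set


@dataclass(frozen=True)
class PriceRow:
    """One scraped store price (hypothetical shape, not the real schema)."""
    upc: str
    state: str
    chain: str
    scrape_month: str      # "YYYY-MM", assumed to be derived from last_scraped_at
    regular_price: float


def single_sku_value(
    rows: Iterable[PriceRow],
    upc: str,
    period: str,
    floor_usd: float,
    ceiling_usd: float,
    region_states: Optional[Set[str]] = None,  # None stands for region = 'national'
    chain_filter: Optional[Set[str]] = None,   # None stands for a NULL chain filter
) -> Optional[float]:
    """Median regular_price over rows passing every filter; None if nothing passes."""
    prices = [
        r.regular_price
        for r in rows
        if r.upc == upc
        and r.scrape_month == period
        and (region_states is None or r.state in region_states)
        and (chain_filter is None or r.chain in chain_filter)
        and floor_usd <= r.regular_price <= ceiling_usd
    ]
    return median(prices) if prices else None
```

Returning None when no row survives the filters is a choice made for this sketch; the source does not say how an empty period is represented.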

1.2 Multi-SKU same-format index (multi_sku_same_format)

A multi-SKU same-format index tracks an aggregate price across N UPCs that are sold in identical pack format. The Soda Index is multi-SKU same-format (5 UPCs, all in 12pk × 12oz cans).

Formula. For each constituent UPC i, compute the per-SKU median exactly as in §1.1:

text
median_i = MEDIAN(regular_price for UPC_i across stores in region/chain_filter)

Compute store-distribution weight per SKU:

text
n_stores_i = COUNT(DISTINCT store_id) for UPC_i in region
weight_i   = n_stores_i / Σ n_stores_j  (sum across all constituent SKUs)

Then:

text
value_usd = Σ (median_i × weight_i)

This is the distribution-weighted basket formula. It captures the de facto market by weighting each SKU by how broadly it's stocked, preserving the consumer's-eye view of category pricing. A SKU stocked at 700 stores contributes more to the index than one stocked at 400.
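
As a rough sketch of the weighting step, the function below takes per-SKU medians and store counts that are assumed to be computed already. The function name and the input shape (UPC mapped to a median and a store count) are illustrative only.

```python
from typing import Mapping, Tuple


def distribution_weighted_value(skus: Mapping[str, Tuple[float, int]]) -> float:
    """skus maps UPC -> (median_price, n_stores); returns sum(median_i * weight_i)."""
    total_stores = sum(n for _, n in skus.values())
    if total_stores == 0:
        raise ValueError("no stores observed for any constituent")
    # weight_i = n_stores_i / total_stores, applied inline
    return sum(med * n / total_stores for med, n in skus.values())


# Two made-up SKUs: the one stocked at 700 stores pulls the value toward its price.
example = distribution_weighted_value({"sku_a": (10.00, 700), "sku_b": (12.00, 400)})
```

With these invented inputs the result sits closer to 10.00 than to 12.00, which is the behaviour the paragraph describes.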

1.3 Basket total cost (basket_total_cost)

A basket total cost index reports the aggregate dollar cost of a basket of N constituent UPCs, each at its specified quantity (1 unless stated otherwise). The Low Income and High Income baskets are of type basket_total_cost.

Formula. For each constituent UPC i, compute the per-SKU median as in §1.1, filtered by chain_filter:

text
median_i = MEDIAN(regular_price for UPC_i across stores in region
                  WHERE chain.market_position matches chain_filter)

Then:

text
value_usd = Σ (median_i × qty_i)

This is a fixed-quantity basket sum, similar in spirit to the fixed-weight aggregation used within an FAO Food Price Index sub-index and across items in the BLS food-at-home CPI. The result is a real dollar amount (the cost of buying the basket once at the basket's eligible chains), not a unitless index value.

A normalized indexed value is computed at query time (§5).
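
The basket sum can be sketched as follows, assuming the per-SKU medians have already been computed with the chain filter applied. Raising an error when a constituent has no median is an assumption of this sketch; the source does not specify that case.

```python
from typing import Mapping


def basket_total_cost(medians: Mapping[str, float], quantities: Mapping[str, int]) -> float:
    """Sum of median_i * qty_i over the basket's constituents (keys of `quantities`)."""
    missing = [upc for upc in quantities if upc not in medians]
    if missing:
        # Assumption: refuse to price an incomplete basket rather than skip items.
        raise KeyError(f"no median for constituents: {missing}")
    return sum(medians[upc] * qty for upc, qty in quantities.items())


# Made-up two-item basket: 2 units at $2.50 plus 1 unit at $4.00.
cost = basket_total_cost({"upc_a": 2.50, "upc_b": 4.00}, {"upc_a": 2, "upc_b": 1})
```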

2. Region definitions

Indices are computed for national plus the 4 US Census Bureau regions used by BLS:

Region     States
Northeast  CT, ME, MA, NH, NJ, NY, PA, RI, VT
Midwest    IL, IN, IA, KS, MI, MN, MO, NE, ND, OH, SD, WI
South      AL, AR, DE, DC, FL, GA, KY, LA, MD, MS, NC, OK, SC, TN, TX, VA, WV
West       AK, AZ, CA, CO, HI, ID, MT, NV, NM, OR, UT, WA, WY

national aggregates across all 50 states and DC (no region filter).

A store contributes to exactly one regional index (the region containing its state) and to the national index. State assignment is from stores.state.
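
A small lookup built from the table above shows the one-region-plus-national rule. The lowercase region keys are an assumption; only 'national' appears as a literal value in this document.

```python
# Region membership copied from the table above; key spelling is illustrative.
REGION_STATES = {
    "northeast": "CT ME MA NH NJ NY PA RI VT".split(),
    "midwest": "IL IN IA KS MI MN MO NE ND OH SD WI".split(),
    "south": "AL AR DE DC FL GA KY LA MD MS NC OK SC TN TX VA WV".split(),
    "west": "AK AZ CA CO HI ID MT NV NM OR UT WA WY".split(),
}

# Invert to state -> region so each state resolves to exactly one region.
STATE_TO_REGION = {
    state: region for region, states in REGION_STATES.items() for state in states
}


def indices_for_state(state: str) -> tuple:
    """Return the two regional series a store in `state` contributes to."""
    return (STATE_TO_REGION[state.upper()], "national")
```

For example, a store with stores.state = 'TN' would feed the South and national series.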

3. Sanity bounds (per-constituent floor and ceiling)

Every constituent UPC has a floor_usd and ceiling_usd defined in published_indices.constituents. Any SPP row outside [floor_usd, ceiling_usd] is excluded from the median calculation and logged to index_snapshot_dropped_rows with the reason ('below_floor' or 'above_ceiling').

Bounds are set per UPC based on observed P5/P95 of historical scrapes plus a margin for promotional volatility. Bounds are conservative: their goal is to reject obvious data errors (case-of-cases mislistings, decimal-point typos) without rejecting legitimate sale prices.

Example bounds in v1:

Index            UPC           Floor  Ceiling  Rationale
Egg              715141514643  $1.00  $12.00   Observed range $2.00–$7.49; padded for promo + premium markets
Soda (each SKU)  various       $3.00  $15.00   Observed range $4.00–$14.99 across all 5 SKUs

The initial bounds caught a single $45.00 Mountain Dew row at Save-A-Lot Crossville, TN, scraped 2026-04-20 — almost certainly a case-of-cases mislisting. That row appears in the audit log but was not used in any published value.

The full audit log is queryable for transparency. When a snapshot is computed, its API response includes meta.dropped_rows: N so consumers know how many observations were excluded.
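
The bounds check and its audit trail can be sketched like this. The function name and the tuple-based drop log are illustrative stand-ins for the index_snapshot_dropped_rows table; the reason strings are the ones given above.

```python
from typing import Iterable, List, Tuple


def apply_bounds(
    prices: Iterable[float], floor_usd: float, ceiling_usd: float
) -> Tuple[List[float], List[Tuple[float, str]]]:
    """Split prices into kept values and (price, reason) drop records."""
    kept: List[float] = []
    dropped: List[Tuple[float, str]] = []
    for price in prices:
        if price < floor_usd:
            dropped.append((price, "below_floor"))
        elif price > ceiling_usd:
            dropped.append((price, "above_ceiling"))
        else:
            kept.append(price)  # inside [floor_usd, ceiling_usd], bounds inclusive
    return kept, dropped


# Soda bounds from the table ($3.00 to $15.00) applied to a few sample prices,
# including a $45.00 row like the one described above.
kept, dropped = apply_bounds([9.99, 45.00, 10.49], 3.00, 15.00)
dropped_rows = len(dropped)  # the count a response would expose as meta.dropped_rows
```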

4. Coverage scoring

Every snapshot reports coverage_score, a 0–1 metric representing the fraction of in-scope stores that contributed to the snapshot.

For single-SKU and multi-SKU indices:

text
coverage_score = n_stores_with_observation_this_period / n_stores_carrying_this_constituent_ever

For basket indices:

text
coverage_score = MEAN over constituents of (per-constituent coverage)

Each published_indices row carries a coverage_threshold (default 0.6 for SKU indices, 0.5 for baskets). Snapshots below threshold are still computed and stored. Their publish_status may be set to 'forced_publish' (a manual override); otherwise they are simply included in the API response with the low coverage shown in meta. The API response always includes coverage_score so consumers can weight thin snapshots appropriately.
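
The two coverage formulas and the threshold comparison can be sketched as below. Returning 0.0 when a constituent has never been observed is an assumption of this sketch, as are the function names.

```python
from statistics import mean
from typing import Sequence


def sku_coverage(n_stores_this_period: int, n_stores_ever: int) -> float:
    """Fraction of stores that ever carried the constituent and reported this period."""
    if n_stores_ever == 0:
        return 0.0  # assumption: no history counts as zero coverage
    return n_stores_this_period / n_stores_ever


def basket_coverage(per_constituent: Sequence[float]) -> float:
    """Basket coverage is the plain mean of per-constituent coverage scores."""
    return mean(per_constituent)


def is_below_threshold(coverage_score: float, coverage_threshold: float) -> bool:
    """True when a snapshot falls under its index's coverage_threshold."""
    return coverage_score < coverage_threshold
```

With made-up counts, sku_coverage(450, 600) gives 0.75, which clears the 0.6 default for SKU indices.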

Beta period (April 2026) vs Live period (May 2026 onward)

⚠ April 2026 snapshots are beta
April 2026 snapshots are stored with publish_status = 'beta' and excluded from the default API response unless ?include_beta=true is set. Some chain crons did not fire on a normal schedule during April; coverage is therefore non-uniform.

May 2026 is the official live inception for all v1 indices. Snapshots from May onward are publish_status = 'auto' and form the published series. The base period (base_period) is set to '2026-05' for all v1 indices; the indexed value (§5) is normalized against the May 2026 absolute USD value.

5. Indexed value normalization

Each snapshot stores its absolute value_usd. The indexed value (where base period = 100) is computed at query time:

text
value_indexed = value_usd / published_indices.base_value_usd × 100

base_value_usd is locked at the first publishable snapshot for the base period (typically May 2026 for v1 indices). It does not change unless the index methodology version is bumped (§6), at which point a new base may be set.

The API response always returns both value_usd (absolute) and value_indexed (normalized) so consumers can use whichever is appropriate for their purpose. Hedge funds and economists typically prefer indexed; journalists and consumers typically prefer absolute.
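
A minimal sketch of the normalization, assuming the base value may not exist yet (as is the case for snapshots computed before the base period). Returning None in that case is a choice made for this example.

```python
from typing import Optional


def value_indexed(value_usd: float, base_value_usd: Optional[float]) -> Optional[float]:
    """Normalize an absolute USD value so that the base period equals 100."""
    if base_value_usd is None:
        return None  # base period not locked yet, so the indexed value is undefined
    return value_usd / base_value_usd * 100
```

For instance, with an invented base of $10.00, a later snapshot of $10.50 would index to roughly 105.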

6. Methodology versioning

Constituent UPCs, floor/ceiling bounds, and aggregation methods are immutable per methodology_version. When any of these change, we bump the version on published_indices.current_methodology_version. Snapshots store the version active at compute time and stay frozen at that version forever.

This means:

  • A series labeled methodology_version = 1 always uses the same constituents and rules
  • A version bump effectively starts a new series alongside the old one
  • Old snapshots are never silently revised when the methodology changes
  • The API can serve historical snapshots at their original methodology version

When a constituent UPC is discontinued (e.g. Eggland's Best ceases producing Brown Cage Free 12ct), the policy is:

  1. Continue computing using the discontinued UPC for as long as residual SPP coverage allows (typically 3 months)
  2. Add the replacement UPC as an additional constituent for an overlap period (3 months)
  3. Bump methodology_version to remove the discontinued UPC

The methodology document is also versioned. The current document version is shown at the top.

7. Revision policy

Snapshots are immutable at the row level and append-only at the series level. If a published value is later found to be wrong (e.g. a chain reported corrupted prices that passed our floor/ceiling checks), the policy is:

  1. Mark the original snapshot revoked = true, set revoked_at and revoked_reason
  2. Compute a new corrected snapshot for the same (slug, period, region, methodology_version), with replaces_snapshot_id pointing back at the revoked one
  3. Default API responses return only non-revoked snapshots
  4. Audit/transparency consumers can request ?include_revoked=true to see the revision history

This mirrors institutional revision practice (BEA, BLS, IMF). Published values are never silently overwritten.
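
The revoke-and-replace flow above might look like this in code. The field names revoked, revoked_at, revoked_reason, and replaces_snapshot_id come from the policy; the Snapshot class, the in-memory list, and the id assignment are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class Snapshot:
    id: int
    slug: str
    period: str
    region: str
    methodology_version: int
    value_usd: float
    revoked: bool = False
    revoked_at: Optional[datetime] = None
    revoked_reason: Optional[str] = None
    replaces_snapshot_id: Optional[int] = None


def revise(series: List[Snapshot], snapshot_id: int,
           corrected_value_usd: float, reason: str) -> Snapshot:
    """Revoke one snapshot and append a corrected one; the old value is never overwritten."""
    old = next(s for s in series if s.id == snapshot_id)
    old.revoked = True
    old.revoked_at = datetime.now(timezone.utc)
    old.revoked_reason = reason
    new = Snapshot(
        id=max(s.id for s in series) + 1,  # assumption: ids increase monotonically
        slug=old.slug, period=old.period, region=old.region,
        methodology_version=old.methodology_version,
        value_usd=corrected_value_usd,
        replaces_snapshot_id=old.id,
    )
    series.append(new)
    return new


def visible(series: List[Snapshot], include_revoked: bool = False) -> List[Snapshot]:
    """Default view hides revoked rows; include_revoked=True shows the full history."""
    return [s for s in series if include_revoked or not s.revoked]
```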

8. Refresh cadence and snapshot timing

Internal collection cadence: weekly. Every Sunday at 02:00 UTC, the web_price_snapshots cron (migration 170) freezes a copy of all currently-scraped store prices.

Published index cadence: monthly. On the 1st of each month at 04:00 UTC, the compute_index_snapshot cron computes the prior month's snapshot for every active index × region combination. The published series therefore lags the calendar by ~1 day.

Within-month snapshots may be available via ?include_preview=true for QA but are not part of the published series.
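
To make the "prior month" rule concrete, here is a small helper that derives the period label from the date a monthly job runs. The function name is hypothetical, and the "YYYY-MM" format follows the period values shown elsewhere in this document.

```python
from datetime import date


def prior_month_period(run_date: date) -> str:
    """Return the 'YYYY-MM' period that a job running on run_date should compute."""
    if run_date.month == 1:
        year, month = run_date.year - 1, 12  # January run covers the previous December
    else:
        year, month = run_date.year, run_date.month - 1
    return f"{year:04d}-{month:02d}"
```

A run on 2026-06-01 would therefore compute the 2026-05 snapshot.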

9. Anomaly detection

When a new snapshot is computed, two anomaly checks run automatically:

  1. MoM delta check. If |value_usd - prior_period_value_usd| / prior_period_value_usd > 0.10, the snapshot is marked flagged_for_review = true with flag_reason = 'mom_delta_>10pct'.
  2. Coverage check. If coverage_score < published_indices.coverage_threshold, the snapshot is flagged with flag_reason = 'low_coverage'.

Flagged snapshots are still published by default but include meta.flagged: true, meta.flag_reason: ... in the API response. Flagging triggers a manual review by the data team to confirm the value before consumers act on it. Investigating flagged snapshots is part of the standard monthly QA cadence.
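
Both checks can be sketched in one function. Returning a list of reasons (so that both checks can fire on the same snapshot) and skipping the MoM check when there is no prior period are assumptions of this sketch.

```python
from typing import List, Optional


def anomaly_flags(
    value_usd: float,
    prior_period_value_usd: Optional[float],
    coverage_score: float,
    coverage_threshold: float,
) -> List[str]:
    """Return the flag_reason values that apply to a freshly computed snapshot."""
    reasons: List[str] = []
    # MoM delta check: relative change versus the prior period above 10%.
    if prior_period_value_usd:
        delta = abs(value_usd - prior_period_value_usd) / prior_period_value_usd
        if delta > 0.10:
            reasons.append("mom_delta_>10pct")
    # Coverage check: snapshot is thinner than the index's threshold.
    if coverage_score < coverage_threshold:
        reasons.append("low_coverage")
    return reasons
```

A snapshot would carry flagged_for_review = true whenever the returned list is non-empty.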

10. Data sources and methodology citations

MainMarket's index methodology draws on the following authoritative sources:

  • U.S. Bureau of Labor Statistics (BLS) — Average Retail Food Prices, APU series. The Egg Index methodology (single-SKU, monthly, regional medians) is adapted from BLS practice for APU0000708111 (Eggs, Grade A, Large per dozen).
  • U.S. Department of Agriculture, Economic Research Service (USDA ERS) — Food Plans, specifically the Thrifty Food Plan 2021 Reevaluation and the Liberal Food Plan. The Low Income Basket constituents are anchored on Thrifty Plan categories; the High Income Basket on Liberal Plan categories.
  • U.S. Census Bureau — Census Region definitions used for regional rollups.
  • Food and Agriculture Organization of the UN (FAO) — Food Price Index methodology. The MainMarket index types and the convention of publishing both absolute and indexed values are modeled on FAO practice.

11. Worked example — Soda Index, April 2026, National (beta)

ℹ Verify the formula by hand
This worked example walks through one snapshot computation against the live MainMarket data so a reader can independently verify the formula.

Inputs. The April 2026 SPP data for the 5 soda constituent UPCs:

SKU                              UPC             n_stores (April)  Median price
Pepsi Real Sugar Cola 12pk       00012000030680  663               $9.99
Mountain Dew Diet 12pk           00012000809972  720               $9.99
Dr Pepper Zero Sugar 12pk        078000035261    506               $10.49
Diet Coke Caffeine Free 12pk     00049000006131  460               $10.49
Diet Coke Soda Fridge Pack 12pk  049000028911    698               $10.49

(Medians shown reflect actual April 2026 SPP data; the per-SKU outlier $45 Mountain Dew row is excluded by the ceiling check.)

Step 1: total store count for distribution weights:

text
Σ n_stores = 663 + 720 + 506 + 460 + 698 = 3047

Step 2: per-SKU weights:

SKU                 n_stores  weight
Pepsi Real Sugar    663       0.2176
Mtn Dew Diet        720       0.2363
Dr Pepper Zero      506       0.1660
Diet Coke Caf-Free  460       0.1510
Diet Coke Fridge    698       0.2291

Step 3: weighted sum:

text
value_usd = (9.99 × 0.2176) + (9.99 × 0.2363) + (10.49 × 0.1660)
          + (10.49 × 0.1510) + (10.49 × 0.2291)
        ≈ 2.174 + 2.361 + 1.741 + 1.584 + 2.403
        ≈ $10.26
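
The three steps can also be re-run in a few lines of Python. The medians and store counts are the ones in the inputs table; the script uses unrounded weights, so intermediate terms may differ from the hand calculation in the last decimal place while the rounded total agrees.

```python
# (median price in USD, n_stores in April) per constituent, from the inputs table
constituents = {
    "Pepsi Real Sugar Cola 12pk": (9.99, 663),
    "Mountain Dew Diet 12pk": (9.99, 720),
    "Dr Pepper Zero Sugar 12pk": (10.49, 506),
    "Diet Coke Caffeine Free 12pk": (10.49, 460),
    "Diet Coke Soda Fridge Pack 12pk": (10.49, 698),
}

# Step 1: total store count across SKUs
total_stores = sum(n for _, n in constituents.values())

# Step 2: per-SKU distribution weights
weights = {name: n / total_stores for name, (_, n) in constituents.items()}

# Step 3: weighted sum of medians
value_usd = sum(price * weights[name] for name, (price, _) in constituents.items())

print(total_stores, round(value_usd, 2))  # prints: 3047 10.26
```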

Snapshot row written to index_snapshots:

text
slug:                  soda_12pk_12oz_cans
period:                2026-04
region:                national
methodology_version:   1
value_usd:             10.26
n_observations:        3047  (sum of per-SKU stores; same store may appear under multiple SKUs)
n_stores:              ~1500 (DISTINCT stores across all 5 SKUs)
n_chains:              ~25
coverage_score:        ~1.0 (full coverage)
method:                distribution_weighted_basket
publish_status:        beta  (April is beta period)
constituents_breakdown: [...]  (per-SKU stats as JSONB)

This snapshot's value_indexed (§5) is undefined until the May 2026 base value is set.

12. Limitations (honest disclosure)

  • US grocery only. No restaurants, no convenience stores, no international markets.
  • Packaged retail goods. Prepared deli items, weighted produce (sold by the pound at variable rates), and in-store-only specials (not on the chain's online catalog) are generally outside the catalog and outside the indices.
  • chain_level chains. Some chains use uniform pricing across all stores. For those chains, only one representative store contributes to the index. This is disclosed via meta.constituents_breakdown[i].chain_level_chains for transparency.
  • Freshness varies by chain. Each published_indices row's component data may have been scraped on different days within the period. The index value reflects "best available current price" within the month, not a single-instant snapshot.
  • Beta period (April 2026) coverage is non-uniform. Some chain crons did not fire on a normal schedule. Beta snapshots are excluded from the default API response and should be treated as preview data.
  • Income basket chain assignments are categorical. A real low-income shopper may shop at multiple chain tiers. The chain_filter reflects the dominant shopping environment per income tier, not the only one. We may publish a "blended income basket" in v1.5 that mixes chain tiers per real shopping behavior.

13. Contact + data access

Methodology questions, audit requests, or institutional licensing inquiries: hello@mainmarket.com.

The full published_indices registry (including current constituents, bounds, and chain filters per index) is queryable via GET /v1/indices (coming soon) or directly inspectable in meta.constituents of any GET /v1/indices/{slug} response.
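
For readers scripting against the endpoint, a request URL for GET /v1/indices/{slug} could be assembled as sketched below. The host name is a placeholder (the API base URL is not stated in this document), and the assumption that the three query flags mentioned in earlier sections can be combined freely is untested.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://api.mainmarket.example"  # hypothetical host, not a real endpoint


def index_url(
    slug: str,
    include_beta: bool = False,
    include_revoked: bool = False,
    include_preview: bool = False,
) -> str:
    """Build the URL for GET /v1/indices/{slug} with the optional flags set to true."""
    flags = {
        "include_beta": include_beta,
        "include_revoked": include_revoked,
        "include_preview": include_preview,
    }
    query = urlencode({name: "true" for name, enabled in flags.items() if enabled})
    url = f"{BASE_URL}/v1/indices/{quote(slug, safe='')}"
    return f"{url}?{query}" if query else url
```

Calling index_url("soda_12pk_12oz_cans", include_beta=True) yields a URL ending in /v1/indices/soda_12pk_12oz_cans?include_beta=true.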

Audit log of dropped rows is available on request.


Methodology v1.0. Last updated 2026-05-04.