MainMarket Published Price Indices — Methodology

Version 1.0 · 2026-05-04

This document defines the methodology for MainMarket's published monthly price indices. It is the authoritative reference for institutional buyers, journalists, and analysts who need to understand exactly how each index is computed.

Indices currently published:

eggs_cage_free_large_dozen — single-SKU egg index
soda_12pk_12oz_cans — multi-SKU soda category index
basket_low_income — USDA Thrifty Food Plan-anchored basket at value-tier chains
basket_high_income — USDA Liberal Food Plan-anchored basket at premium-tier chains

For machine-readable definitions of each index (constituents, floor/ceiling bounds, chain filters), see the published_indices registry exposed at GET /v1/indices/{slug} (returned in meta.constituents).

1. Index types and aggregation formulas

MainMarket publishes three types of indices. Each uses a distinct aggregation method.

single_sku: One canonical product UPC. Index value = median of regional store prices for that UPC.
multi_sku_same_format: N constituents in identical pack format. Distribution-weighted basket of per-SKU medians.
basket_total_cost: N constituents, possibly across formats. Sum of per-SKU medians × quantity, filtered by chain market position.

1.1 Single-SKU index (single_sku)

A single-SKU index tracks the price of one canonical product (one UPC) over time. The Egg Index is single-SKU.

Formula. For a given (slug, period, region):

text

value_usd = MEDIAN(regular_price)  over rows P satisfying:
    canonical_products.upc = constituent.upc
  AND store.state ∈ region_states (or any state if region = 'national')
  AND store.chain matches published_indices.chain_filter (or any chain if filter is NULL)
  AND last_scraped_at within period  (i.e. scrape month = period)
  AND floor_usd ≤ regular_price ≤ ceiling_usd

We use the median rather than the mean because the median is robust to outliers and matches BLS practice for the Average Retail Food Prices series. Outliers are excluded before aggregation by the floor/ceiling bounds (§3), and the median additionally guards against any residual data-quality issues.
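As an illustration, the filter-then-median formula above could be sketched in Python as follows. The PriceRow record, its field names, and the choice to model chain_filter as a set of chain names are assumptions made for this example, not the production schema.

```python
from dataclasses import dataclass
from statistics import median
from typing import Iterable, Optional, Set


@dataclass(frozen=True)
class PriceRow:
    """One scraped price observation (hypothetical shape, for illustration)."""
    upc: str
    state: str
    chain: str
    scrape_month: str      # "YYYY-MM", derived from last_scraped_at
    regular_price: float


def single_sku_value(
    rows: Iterable[PriceRow],
    upc: str,
    period: str,
    floor_usd: float,
    ceiling_usd: float,
    region_states: Optional[Set[str]] = None,   # None stands for 'national'
    chain_filter: Optional[Set[str]] = None,    # None stands for any chain
) -> Optional[float]:
    """Median regular_price over the rows that pass every filter in the formula."""
    prices = [
        r.regular_price
        for r in rows
        if r.upc == upc
        and (region_states is None or r.state in region_states)
        and (chain_filter is None or r.chain in chain_filter)
        and r.scrape_month == period
        and floor_usd <= r.regular_price <= ceiling_usd
    ]
    # No surviving observations means no value for this (slug, period, region).
    return median(prices) if prices else None
```

Returning None when no row survives the filters is a choice made for this sketch; the document does not specify how an empty result is stored.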

1.2 Multi-SKU same-format index (multi_sku_same_format)

A multi-SKU same-format index tracks an aggregate price across N UPCs that are sold in identical pack format. The Soda Index is multi-SKU same-format (5 UPCs, all in 12pk × 12oz cans).

Formula. For each constituent UPC i, compute the per-SKU median exactly as in §1.1:

text

median_i = MEDIAN(regular_price for UPC_i across stores in region/chain_filter)

Compute store-distribution weight per SKU:

text

n_stores_i = COUNT(DISTINCT store_id) for UPC_i in region
weight_i   = n_stores_i / Σ n_stores_j  (sum across all constituent SKUs)

Then:

text

value_usd = Σ (median_i × weight_i)

This is the distribution-weighted basket formula. It captures the de facto market by weighting each SKU by how broadly it is stocked, preserving the consumer's-eye view of category pricing. A SKU stocked at 700 stores contributes more to the index than one stocked at 400.
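A minimal sketch of the weighting step, assuming the per-SKU medians and distinct-store counts have already been computed. The function name and input shape are illustrative only.

```python
from typing import Mapping, Tuple


def distribution_weighted_value(per_sku: Mapping[str, Tuple[float, int]]) -> float:
    """per_sku maps UPC -> (median_price, n_stores); returns sum(median_i * weight_i)."""
    total_stores = sum(n for _, n in per_sku.values())
    if total_stores == 0:
        raise ValueError("no stores observed for any constituent")
    # weight_i = n_stores_i / total_stores, so the weights sum to 1.
    return sum(m * n / total_stores for m, n in per_sku.values())


# Two hypothetical SKUs: the 700-store SKU pulls the value toward its median.
example = distribution_weighted_value({"sku_a": (10.00, 700), "sku_b": (12.00, 400)})
print(round(example, 2))
```

With these made-up inputs the result sits closer to 10.00 than to 12.00, because the first SKU carries 700/1100 of the weight.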

1.3 Basket total cost (basket_total_cost)

A basket total cost index reports the aggregate dollar cost of a basket of N constituent UPCs, each at its specified quantity (qty 1 unless stated otherwise). The Low Income and High Income baskets are basket_total_cost type.

Formula. For each constituent UPC i, compute the per-SKU median as in §1.1, filtered by chain_filter:

text

median_i = MEDIAN(regular_price for UPC_i across stores in region
                  WHERE chain.market_position matches chain_filter)

Then:

text

value_usd = Σ (median_i × qty_i)

This is analogous to the fixed-basket aggregation used within an FAO Food Price Index sub-index and across items in the BLS food-at-home CPI, though those series apply their own weighting schemes. The result is a real dollar amount (the cost of buying the basket once at the basket's eligible chains), not a unitless index value.

A normalized indexed value is computed at query time (§5).
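The basket sum can be sketched in a few lines of Python. The UPC labels, prices, and quantities below are placeholders rather than real basket constituents, and the per-SKU medians are assumed to be already filtered by chain market position.

```python
from typing import Mapping


def basket_total_cost(medians: Mapping[str, float], quantities: Mapping[str, int]) -> float:
    """Sum of median_i * qty_i over every constituent listed in quantities."""
    missing = [upc for upc in quantities if upc not in medians]
    if missing:
        # A basket with an unpriced constituent has no well-defined total.
        raise KeyError(f"no median available for: {missing}")
    return sum(medians[upc] * qty for upc, qty in quantities.items())


# Placeholder basket: one unit of item A and two units of item B.
total = basket_total_cost({"A": 3.99, "B": 2.49}, {"A": 1, "B": 2})
print(round(total, 2))
```

Raising an error for a missing constituent is a design choice of this sketch; the document does not say how the production job treats a constituent with no observations.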

2. Region definitions

Indices are computed for national plus the 4 US Census Bureau regions used by BLS:

Region	States
Northeast	CT, ME, MA, NH, NJ, NY, PA, RI, VT
Midwest	IL, IN, IA, KS, MI, MN, MO, NE, ND, OH, SD, WI
South	AL, AR, DE, DC, FL, GA, KY, LA, MD, MS, NC, OK, SC, TN, TX, VA, WV
West	AK, AZ, CA, CO, HI, ID, MT, NV, NM, OR, UT, WA, WY

national aggregates across all 50 states and DC (no region filter).

A store contributes to exactly one regional index (the region containing its state) and to the national index. State assignment is from stores.state.
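The region table above translates directly into a lookup. In this sketch the lowercase region keys are an assumed naming convention; only 'national' appears as a literal value in this document.

```python
# Census regions as listed in the table above (state postal codes, plus DC).
CENSUS_REGIONS = {
    "northeast": {"CT", "ME", "MA", "NH", "NJ", "NY", "PA", "RI", "VT"},
    "midwest": {"IL", "IN", "IA", "KS", "MI", "MN", "MO", "NE", "ND", "OH", "SD", "WI"},
    "south": {"AL", "AR", "DE", "DC", "FL", "GA", "KY", "LA", "MD", "MS", "NC",
              "OK", "SC", "TN", "TX", "VA", "WV"},
    "west": {"AK", "AZ", "CA", "CO", "HI", "ID", "MT", "NV", "NM", "OR", "UT", "WA", "WY"},
}

# Invert to state -> region; each state belongs to exactly one region.
STATE_TO_REGION = {
    state: region for region, states in CENSUS_REGIONS.items() for state in states
}


def regions_for_store(state: str) -> tuple:
    """A store feeds its own Census region and the national index."""
    return (STATE_TO_REGION[state], "national")


print(regions_for_store("TN"))
```

The inverted map has 51 entries (50 states plus DC), which is a quick way to check that no state is listed twice or omitted.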

3. Sanity bounds (per-constituent floor and ceiling)

Every constituent UPC has a floor_usd and ceiling_usd defined in published_indices.constituents. Any SPP row outside [floor_usd, ceiling_usd] is excluded from the median calculation and logged to index_snapshot_dropped_rows with the reason ('below_floor' or 'above_ceiling').

Bounds are set per UPC based on observed P5/P95 of historical scrapes plus a margin for promotional volatility. Bounds are conservative: their goal is to reject obvious data errors (case-of-cases mislistings, decimal-point typos) without rejecting legitimate sale prices.

Example bounds in v1:

Index	UPC	Floor	Ceiling	Rationale
Egg	715141514643	$1.00	$12.00	Observed range $2.00–$7.49; padded for promo + premium markets
Soda (each SKU)	various	$3.00	$15.00	Observed range $4.00–$14.99 across all 5 SKUs

The initial bounds caught a single $45.00 Mountain Dew row at Save-A-Lot Crossville, TN, scraped 2026-04-20 — almost certainly a case-of-cases mislisting. That row appears in the audit log but was not used in any published value.

The full audit log is queryable for transparency. When a snapshot is computed, its API response includes meta.dropped_rows: N so consumers know how many observations were excluded.
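A sketch of the bounds check, assuming prices arrive as plain floats. The reason strings match those given above; the function name and return shape are illustrative.

```python
from typing import Iterable, List, Tuple


def apply_bounds(
    prices: Iterable[float], floor_usd: float, ceiling_usd: float
) -> Tuple[List[float], List[Tuple[float, str]]]:
    """Split prices into kept values and (price, reason) pairs for the audit log."""
    kept: List[float] = []
    dropped: List[Tuple[float, str]] = []
    for price in prices:
        if price < floor_usd:
            dropped.append((price, "below_floor"))
        elif price > ceiling_usd:
            dropped.append((price, "above_ceiling"))
        else:
            kept.append(price)   # bounds are inclusive: floor <= price <= ceiling
    return kept, dropped


# Soda bounds from the table above; the 0.99 row is a made-up low outlier.
kept, dropped = apply_bounds([9.99, 45.00, 10.49, 0.99], floor_usd=3.00, ceiling_usd=15.00)
print(kept, dropped, len(dropped))
```

Here len(dropped) plays the role of the meta.dropped_rows count reported in the API response.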

4. Coverage scoring

Every snapshot reports coverage_score, a 0–1 metric representing the fraction of in-scope stores that contributed to the snapshot.

For single-SKU and multi-SKU indices:

text

coverage_score = n_stores_with_observation_this_period / n_stores_carrying_this_constituent_ever

For basket indices:

text

coverage_score = MEAN over constituents of (per-constituent coverage)

Each published_indices row carries a coverage_threshold (default 0.6 for SKU indices, 0.5 for baskets). Snapshots below the threshold are still computed and stored. Their publish_status may be set to 'forced_publish' (manual override), or they may simply be included in the API response with the low coverage shown transparently in meta. The API response always includes coverage_score so consumers can weight thin snapshots appropriately.
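The two coverage formulas and the threshold comparison could look like the sketch below. Returning 0.0 when a constituent has never been observed anywhere is an assumption of this example, as are the function names.

```python
from typing import Sequence


def constituent_coverage(n_stores_this_period: int, n_stores_ever: int) -> float:
    """Fraction of stores that ever carried the constituent and reported this period."""
    if n_stores_ever == 0:
        return 0.0   # assumption: an unobserved constituent counts as zero coverage
    return n_stores_this_period / n_stores_ever


def basket_coverage(per_constituent: Sequence[float]) -> float:
    """Basket coverage is the unweighted mean of per-constituent coverage."""
    if not per_constituent:
        raise ValueError("basket has no constituents")
    return sum(per_constituent) / len(per_constituent)


def is_below_threshold(coverage_score: float, coverage_threshold: float) -> bool:
    """True when the snapshot is thinner than the index's configured threshold."""
    return coverage_score < coverage_threshold


# Made-up counts: 450 of 600 known stores reported this period.
score = constituent_coverage(450, 600)
print(score, is_below_threshold(score, 0.6))
```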

Beta period (April 2026) vs Live period (May 2026 onward)

⚠ April 2026 snapshots are beta

April 2026 snapshots are stored with publish_status = 'beta' and excluded from the default API response unless ?include_beta=true is set. Some chain crons did not fire on a normal schedule during April; coverage is therefore non-uniform.

May 2026 is the official live inception for all v1 indices. Snapshots from May onward are publish_status = 'auto' and form the published series. The base period (base_period) is set to '2026-05' for all v1 indices; the indexed value (§5) is normalized against the May 2026 absolute USD value.

5. Indexed value normalization

Each snapshot stores its absolute value_usd. The indexed value (where base period = 100) is computed at query time:

text

value_indexed = value_usd / published_indices.base_value_usd × 100

base_value_usd is locked at the first publishable snapshot for the base period (typically May 2026 for v1 indices). It does not change unless the index methodology version is bumped (§6), at which point a new base may be set.

The API response always returns both value_usd (absolute) and value_indexed (normalized) so consumers can use whichever is appropriate for their purpose. Hedge funds and economists typically prefer indexed; journalists and consumers typically prefer absolute.
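The normalization is a single division. The sketch below also shows one way to represent an index whose base value has not been locked yet; using None for that case is an assumption, and the dollar figures are hypothetical.

```python
from typing import Optional


def value_indexed(value_usd: float, base_value_usd: Optional[float]) -> Optional[float]:
    """Indexed value with base period = 100; None while no base value is locked."""
    if base_value_usd is None:
        return None
    return value_usd / base_value_usd * 100


# Hypothetical figures: a $10.00 base and a later $10.26 snapshot.
print(round(value_indexed(10.26, 10.00), 1))   # 2.6% above the base period
print(value_indexed(10.26, None))              # base not yet set
```

A snapshot in the base period itself indexes to exactly 100, since its value_usd equals base_value_usd.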

6. Methodology versioning

Constituent UPCs, floor/ceiling bounds, and aggregation methods are immutable per methodology_version. When any of these change, we bump the version on published_indices.current_methodology_version. Snapshots store the version active at compute time and stay frozen at that version forever.

This means:

A series labeled methodology_version = 1 always uses the same constituents and rules
A version bump effectively starts a new series alongside the old one
Old snapshots are never silently revised when the methodology changes
The API can serve historical snapshots at their original methodology version

When a constituent UPC is discontinued (e.g. Eggland's Best ceases producing Brown Cage Free 12ct), the policy is:

Continue computing using the discontinued UPC for as long as residual SPP coverage allows (typically 3 months)
Add the replacement UPC as an additional constituent for an overlap period (3 months)
Bump methodology_version to remove the discontinued UPC

The methodology document is also versioned. The current document version is shown at the top.

7. Revision policy

Snapshots are immutable at the row level and append-only at the series level. If a published value is later found to be wrong (e.g. a chain reported corrupted prices that passed our floor/ceiling checks), the policy is:

Mark the original snapshot revoked = true, set revoked_at and revoked_reason
Compute a new corrected snapshot for the same (slug, period, region, methodology_version), with replaces_snapshot_id pointing back at the revoked one
Default API responses return only non-revoked snapshots
Audit/transparency consumers can request ?include_revoked=true to see the revision history

This mirrors institutional revision practice (BEA, BLS, IMF). Published values are never silently overwritten.
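The revoke-and-replace steps above can be sketched with an in-memory list standing in for the snapshot table. The Snapshot class is a hypothetical, trimmed-down row; only the field names revoked, revoked_at, revoked_reason, and replaces_snapshot_id come from this document, and the values used are invented.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class Snapshot:
    id: int
    slug: str
    period: str
    region: str
    methodology_version: int
    value_usd: float
    revoked: bool = False
    revoked_at: Optional[str] = None
    revoked_reason: Optional[str] = None
    replaces_snapshot_id: Optional[int] = None


def revise(series: List[Snapshot], bad_id: int, corrected_value_usd: float,
           reason: str, revoked_at: str, new_id: int) -> List[Snapshot]:
    """Return a new series: the bad snapshot marked revoked, plus a corrected one appended."""
    bad = next(s for s in series if s.id == bad_id)
    revoked = replace(bad, revoked=True, revoked_at=revoked_at, revoked_reason=reason)
    # Same (slug, period, region, methodology_version); only id, value, and back-pointer change.
    corrected = replace(bad, id=new_id, value_usd=corrected_value_usd,
                        replaces_snapshot_id=bad.id)
    return [revoked if s.id == bad_id else s for s in series] + [corrected]


def api_view(series: List[Snapshot], include_revoked: bool = False) -> List[Snapshot]:
    """Default responses hide revoked rows; include_revoked exposes the history."""
    return [s for s in series if include_revoked or not s.revoked]
```

The published value is never edited in place: the old value stays readable through api_view(series, include_revoked=True), and the corrected row points back at it.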

8. Refresh cadence and snapshot timing

Internal collection cadence: weekly. Every Sunday at 02:00 UTC, the web_price_snapshots cron (migration 170) freezes a copy of all currently-scraped store prices.

Published index cadence: monthly. On the 1st of each month at 04:00 UTC, the compute_index_snapshot cron computes the prior month's snapshot for every active index × region combination. The published series therefore lags the calendar by ~1 day.

Within-month snapshots may be available via ?include_preview=true for QA but are not part of the published series.
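Deriving the period a monthly run should compute is a small date calculation. This sketch assumes periods are written as 'YYYY-MM', as in the base_period value '2026-05'; the function name is illustrative.

```python
from datetime import date


def prior_period(run_date: date) -> str:
    """Period ('YYYY-MM') of the month before run_date, handling the January rollover."""
    year, month = run_date.year, run_date.month - 1
    if month == 0:
        year, month = year - 1, 12
    return f"{year:04d}-{month:02d}"


print(prior_period(date(2026, 6, 1)))   # run on 1 June computes the May period
print(prior_period(date(2027, 1, 1)))   # run on 1 January computes the prior December
```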

9. Anomaly detection

When a new snapshot is computed, two anomaly checks run automatically:

MoM delta check. If |value_usd - prior_period_value_usd| / prior_period_value_usd > 0.10, the snapshot is flagged flagged_for_review = true with flag_reason = 'mom_delta_>10pct'.
Coverage check. If coverage_score < published_indices.coverage_threshold, the snapshot is flagged with flag_reason = 'low_coverage'.

Flagged snapshots are still published by default but include meta.flagged: true, meta.flag_reason: ... in the API response. Flagging triggers a manual review by the data team to confirm the value before consumers act on it. Investigating flagged snapshots is part of the standard monthly QA cadence.
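Both checks can be expressed as one small function. Returning a list of reasons and skipping the month-over-month check when there is no prior period are assumptions of this sketch; the document does not say how a snapshot that trips both checks stores its flag_reason.

```python
from typing import List, Optional

MOM_DELTA_LIMIT = 0.10   # 10% month-over-month threshold from the check above


def anomaly_flags(value_usd: float, prior_value_usd: Optional[float],
                  coverage_score: float, coverage_threshold: float) -> List[str]:
    """Return the flag reasons a new snapshot would receive (empty list if none)."""
    reasons: List[str] = []
    # Skip the MoM check for a series' first period (no prior value to compare).
    if prior_value_usd:
        delta = abs(value_usd - prior_value_usd) / prior_value_usd
        if delta > MOM_DELTA_LIMIT:
            reasons.append("mom_delta_>10pct")
    if coverage_score < coverage_threshold:
        reasons.append("low_coverage")
    return reasons


# Hypothetical snapshot: a jump from $10.26 to $11.50 with healthy coverage.
print(anomaly_flags(11.50, 10.26, coverage_score=0.9, coverage_threshold=0.6))
```

A snapshot would then be flagged for review whenever the returned list is non-empty.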

10. Data sources and methodology citations

MainMarket's index methodology draws on the following authoritative sources:

U.S. Bureau of Labor Statistics (BLS) — Average Retail Food Prices, APU series. The Egg Index methodology (single-SKU, monthly, regional medians) is adapted from BLS practice for APU0000708111 (Eggs, Grade A, Large per dozen).
U.S. Department of Agriculture, Economic Research Service (USDA ERS) — Food Plans, specifically the Thrifty Food Plan 2021 Reevaluation and the Liberal Food Plan. The Low Income Basket constituents are anchored on Thrifty Plan categories; the High Income Basket on Liberal Plan categories.
U.S. Census Bureau — Census Region definitions used for regional rollups.
Food and Agriculture Organization of the UN (FAO) — Food Price Index methodology. The MainMarket index types and the convention of publishing both absolute and indexed values are modeled on FAO practice.

11. Worked example — Soda Index, April 2026, National (beta)

ℹ Verify the formula by hand

This worked example walks through one snapshot computation against the live MainMarket data so a reader can independently verify the formula.

Inputs. The April 2026 SPP data for the 5 soda constituent UPCs:

SKU	UPC	n_stores (April)	Median price
Pepsi Real Sugar Cola 12pk	00012000030680	663	$9.99
Mountain Dew Diet 12pk	00012000809972	720	$9.99
Dr Pepper Zero Sugar 12pk	078000035261	506	$10.49
Diet Coke Caffeine Free 12pk	00049000006131	460	$10.49
Diet Coke Soda Fridge Pack 12pk	049000028911	698	$10.49

(Medians shown reflect actual April 2026 SPP data; the per-SKU outlier $45 Mountain Dew row is excluded by the ceiling check.)

Step 1: total store count for distribution weights:

text

Σ n_stores = 663 + 720 + 506 + 460 + 698 = 3047

Step 2: per-SKU weights:

SKU	n_stores	weight
Pepsi Real Sugar	663	0.2176
Mtn Dew Diet	720	0.2363
Dr Pepper Zero	506	0.1661
Diet Coke Caf-Free	460	0.1510
Diet Coke Fridge	698	0.2291

Step 3: weighted sum:

text

value_usd = (9.99 × 0.2176) + (9.99 × 0.2363) + (10.49 × 0.1661)
          + (10.49 × 0.1510) + (10.49 × 0.2291)
        ≈ 2.174 + 2.361 + 1.742 + 1.584 + 2.403
        ≈ $10.26

Snapshot row written to index_snapshots:

text

slug:                  soda_12pk_12oz_cans
period:                2026-04
region:                national
methodology_version:   1
value_usd:             10.26
n_observations:        3047  (sum of per-SKU stores; same store may appear under multiple SKUs)
n_stores:              ~1500 (DISTINCT stores across all 5 SKUs)
n_chains:              ~25
coverage_score:        ~1.0 (full coverage)
method:                distribution_weighted_basket
publish_status:        beta  (April is beta period)
constituents_breakdown: [...]  (per-SKU stats as JSONB)

This snapshot's value_indexed (§5) is undefined until the May 2026 base value is set.
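For readers who prefer to check the arithmetic in code, the short script below recomputes the example from the input table. It works from the raw store counts rather than the rounded weights, so it is a cross-check of the published figure and not the production job.

```python
# (SKU, n_stores in April, median price) copied from the input table above.
constituents = [
    ("Pepsi Real Sugar Cola 12pk", 663, 9.99),
    ("Mountain Dew Diet 12pk", 720, 9.99),
    ("Dr Pepper Zero Sugar 12pk", 506, 10.49),
    ("Diet Coke Caffeine Free 12pk", 460, 10.49),
    ("Diet Coke Soda Fridge Pack 12pk", 698, 10.49),
]

# Step 1: total store count across constituents.
total_stores = sum(n for _, n, _ in constituents)

# Steps 2 and 3: weight each median by its share of stores and sum.
value_usd = sum(median * n / total_stores for _, n, median in constituents)

print(total_stores)         # 3047
print(round(value_usd, 2))  # 10.26
```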

12. Limitations (honest disclosure)

US grocery only. No restaurants, no convenience stores, no international markets.
Packaged retail goods. Prepared deli items, weighed produce (sold by the pound at variable rates), and in-store-only specials (not on the chain's online catalog) are generally outside the catalog and outside the indices.
chain_level chains. Some chains use uniform pricing across all stores. For those chains, only one representative store contributes to the index. This is disclosed via meta.constituents_breakdown[i].chain_level_chains for transparency.
Freshness varies by chain. Each published_indices row's component data may have been scraped on different days within the period. The index value reflects "best available current price" within the month, not a single-instant snapshot.
Beta period (April 2026) coverage is non-uniform. Some chain crons did not fire on a normal schedule. Beta snapshots are excluded from the default API response and should be treated as preview data.
Income basket chain assignments are categorical. A real low-income shopper may shop at multiple chain tiers. The chain_filter reflects the dominant shopping environment per income tier, not the only one. We may publish a "blended income basket" in v1.5 that mixes chain tiers per real shopping behavior.

13. Contact + data access

Methodology questions, audit requests, or institutional licensing inquiries: hello@mainmarket.com.

The full published_indices registry (including current constituents, bounds, and chain filters per index) is queryable via GET /v1/indices (coming soon) or directly inspectable in meta.constituents of any GET /v1/indices/{slug} response.

Audit log of dropped rows is available on request.

Methodology v1.0. Last updated 2026-05-04.
