Data Sources & Health

Master data quality report — every field in the Datahub at a glance.
93%
Overall Health Score
17
GREEN
5
YELLOW
1
RED / MISSING
389K
TOTAL ROWS
Net Revenue
$6.45M
Shopify $5.34M · Amazon $1.11M
Contribution Margin
$2.65M
98.3% coverage
Ad Spend
$2.00M
Meta $1.56M · Google $324K · Amz $118K
Shipping Cost
$989K
95% matched · Dec 2023–Mar 2026
Products
186
150 with COGS · $0 unmapped
Master Data Health Report — the single source of truth for every field in the Datahub
Data Field Source Rows Coverage Drift Known Concerns How to Verify / Close the Gap
Shopify Orders Shopify CSV export → load_shopify.py 307,108 100% 0.3–1.5% Returns tracked by order date (CSV) not refund date (Analytics), so monthly return timing can differ 8-10%. Compare monthly gross to Shopify Analytics → Sales → Total Sales. Should be within 1.5%.
Amazon Orders Amazon flat file CSV → load_amazon.py 82,469 100% 0.4–3% 3.4% of orders are multi-product where CSV captured partial line items (export issue). Settlement shows slightly higher gross. Compare to Amazon Business Reports → Sales & Traffic. Monthly should be within 3%.
Shopify Discounts Order CSV "Discount Amount" → proportional allocation 142,920 100% <0.4% Penny-perfect for 2025, within 0.4% for 2024. Proportionally allocated across line items in multi-item orders. Compare monthly to Shopify Analytics → Discounts. Should be within 0.5%.
Shopify Returns Order CSV "Refunded Amount" → capped at line gross 3,703 ~90% 8–10% CSV tracks by order date, Analytics by refund date. Timing mismatch. Capped at gross to exclude shipping refunds. 2026 partial. Compare annual to Shopify Analytics → Returns. Monthly will drift; annual should be within 5%.
Shopify Fees Shopify Payments payout CSV (953 payouts) 100% <0.1% None. Actual rates per year: 3.11% (2024), 3.18% (2025), 3.14% (2026). Spot-check: sum platform_fees ÷ gross_revenue by year. Should match rates above.
Amazon Fees (2025) Settlement reports → fact_amazon_settlement_fees 33,264 94.6% 2.4% 34.0% blended (15% referral + 19.1% FBA). 5.4% unmatched orders use 34.0% average. fact_orders shows 34.8% — within $4K. Download new settlement from Seller Central → Reports → Payments. Compare total fees.
Amazon Fees (2024) Settlement reports → fact_amazon_settlement_fees 35,757 92% ~5% 39.9% blended. 8% unmatched use average. fact_orders shows 44.4% — gap due to CSV gross revenue being ~$21K lower than settlement gross. Cross-check: SUM(total_fees) in settlements vs SUM(platform_fees) in fact_orders for 2024.
Amazon Fees (2026) 6 settlement files (Dec 16–Mar 10) 4,543 67% est. 7% 67% at 24.8% actual rate, 33% (post-Mar 10) on 32.2% flat estimate. Fee trend: 39.9% → 34.0% → 24.8%. Upload next settlement report from Seller Central. Each one closes ~2 weeks of estimates. ▸ Green when full Q1 covered.
Meta Ad Spend Meta Ads Manager CSV → load_meta_ads.py 415 100% 2.5% Weekly granularity causes bucket-alignment drift vs daily Windsor pulls. link_clicks used (not clicks all). Compare monthly totals to Meta Ads Manager → Campaigns → Date filter. Penny-exact at monthly level.
Google Ad Spend Google Ads console CSV → load_google_ads.py 89 100% $0.00 None. Penny-perfect vs both Adzviser and Windsor pulls. Compare to Google Ads → Campaigns → Overview. Should be exact.
Amazon Ad Spend (2025-26) Amazon Ads console + Adzviser MCP 577 100% <1% 2025: $105K from console (all 12 months). 2026 Q1: $13K complete (Jan 1-21 console export + Jan 22–Mar 22 Adzviser). No gaps. Compare to Amazon Ads console → Campaign Manager → date filter. Jan 1-21 from downloaded report.
Amazon Ad Spend (2024) 0 0% 100% All 12 months missing (~$82K est). Amazon API only goes back 60 days. Console won't export historical. Triple Whale is only fallback. Try Triple Whale export (Jan 1–Dec 31, 2024). If unavailable, estimate from 2025 seasonality pattern scaled to $82K. ▸ Only Red field in the system.
Shipping Cost ShipStation CSV export → load_shipstation.py 293,340 95.4% <5% Shopify 99.4% matched ($988K). Amazon FBM only 100 rows (most Amazon is FBA — shipping in platform fees). 4.6% unmatched are likely cancelled/sample orders. Compare monthly totals to ShipStation → Reports → Shipment Cost. Should be within 5% per month. Re-export monthly for ongoing coverage.
COGS (Shopify) Nikita's BOM Excel → dim_products.cogs_per_unit 126 SKUs 98.3% Unknown Raw material costs unverified against current supplier pricing — may have drifted since BOM was built. Labor flat $1.16/unit for all 5oz sprays (directional, not task-level accurate). Nikita to provide updated BOM with current supplier quotes. Cross-check 5 top-volume SKUs against latest invoices. ▸ Green when verified.
COGS (Amazon) Derived: Shopify spray COGS + $1.00/pack → update_amazon_cogs.py 25 ASINs 100% est. 5-15% Rough estimate: component spray COGS from BOM + $0.50 labor + $0.50 materials per pack, split by units. Inherits BOM drift. Packaging cost is a flat estimate. Build proper Amazon BOMs with actual pack materials cost per ASIN. Same BOM structure, just more precise packaging/labor. ▸ Green when BOMs built.
Contribution Margin Calculated: net_rev − fees − shipping − COGS 380,963 98.3% <5% Shopify 48.9% margin, Amazon 13.6%. Amazon margin heavily compressed by platform fees (24-44%). Inherits COGS drift. 1.7% uncovered = free/sample items with no product match. Spot-check: pick 5 high-volume SKUs, manually compute margin from source numbers. Should match within $0.50/unit.
Product Catalog BOM Excel + map_amazon_packs + manual adds 186 100% $0 Zero unmapped revenue. All ASINs mapped. 36 free/sample items included. New products need manual adds. Run: SELECT product_id FROM fact_orders WHERE product_id NOT IN (SELECT product_id FROM dim_products). Should return 0.
Customer Data Shopify order CSV → dim_customers 65,657 100% 0% Shopify customers only. Amazon customers not trackable (no PII). Cohort-ready with order history. Compare COUNT(DISTINCT customer_id) to Shopify Admin → Customers → total.
Finance (17 tables) Nikita's Excel → load_finance.py, load_budget_and_launch.py 12,125 100% <1% P&L, BOM, cash flow, budget, seasonal launch, revenue projections. ⚠️ P&L has duplicate-label rows for gross/net — must filter by category column. Compare P&L totals to QuickBooks monthly. Must filter by category to avoid 2x. See export_finance_json.py for correct patterns.
Windsor MCP Windsor.ai ($99/mo) — 7 sources connected 7/7 Shopify row cap (~1,400/query — use weekly chunks). Amazon Ads has same 60-day lookback as Adzviser. Klaviyo revenue data flowing. Test pull: Windsor get_data for Shopify, last 3 days. Should return order-level data. If empty, check connector status.
Green — validated, safe for reporting Yellow — directional, annotate in output Red — missing or placeholder, never report
DuckDB totals (Session 14 — March 23, 2026): 389,577 order rows (307K Shopify + 82K Amazon) · 1,081 ad spend rows ($2.0M) · 186 products (150 with COGS) · 65,657 customers · 73,564 settlement fees · 17 finance tables (12,125 rows). Zero date gaps. Zero unmapped revenue. 98% contribution margin coverage. One Red field remaining (Amazon 2024 ad spend).
What's Next — Closing Remaining Gaps

P0 — Last Red Field

DanAmazon 2024 ad spend — Console export (Jan 1–Dec 31, 2024). The only remaining Red field in the entire system (~$82K estimated). Amazon won't provide historical data via API — requires manual console export. Enables full 2-year blended MER.Now

P1 — Improves Precision (Yellow → Green)

Dan/NikitaUpdated BOM per SKU — Complete COGS database from costing system. Replaces current Yellow BOM data with verified RM costs and task-level labor.Soon
DanAmazon BOMs — Build proper pack-level BOMs with actual packaging/labeling materials cost per ASIN. Replaces current $1.00/pack flat estimate.Soon
DanAmazon 2026 settlement (post-Mar 10) — Upload remaining Q1 settlement reports. Currently 67% matched to actuals (24.8%), rest on 32.2% flat estimate.Soon

Recently Closed

ShipStation shipping costs — 108K shipments, $989K carrier fees, Dec 2023–Mar 2026. 95% Shopify match. Upgraded to Green. March 23.Done
Amazon derived COGS — 25 ASINs updated via spray component COGS + $1.00/pack packaging. Amazon CM coverage 64% → 100%. March 23.Done
Amazon 2026 settlement — 6 reports loaded, 24.8% actual fee rate. March 23.Done
8 unmapped ASINs — All added to dim_products. 186 products, zero unmapped. March 23.Done
Amazon 2026 Q1 ad spend — Jan 1-21 loaded from console. Full Q1, $13K. Upgraded to Green. March 23.Done

Ongoing Pipeline — Keeping Data Current

Windsor MCPOrders: Shopify + Amazon orders pullable on demand. Windsor single-day calls for Amazon, Shopify via API.Automated
Windsor MCPAd Spend: Meta (weekly), Google (monthly), Amazon Ads — all live via Windsor connectors.Automated
ManualFinance: Nikita's Excel → load_finance.py + export_finance_json.py → Finance dashboard.Monthly
ManualSettlement Reports: Amazon Seller Central → CSV upload → per-order fee replacement.Bi-weekly
ManualShipStation: Periodic CSV export for shipping cost refresh. No MCP available.Monthly
ReferenceFull pipeline documentation → Every field, source, method, and refresh cadence.Docs