How Rawcomps sources, weights, and discloses sold-comp data
What Rawcomps does
Rawcomps is a condition-aware sold-comp engine for pre-1980 vintage sports cards. It is currently MLB-only: as of 2026-05-17, 127,813 baseball cards are indexed and 175 GREEN-tier populations have completed end-to-end comp processing. The product's moat is raw-title parsing. Most comp tools anchor on graded sales because graded data is structured and easy to scrape, yet the majority of vintage cards trade raw, and existing pricing tools either ignore the raw market or substitute graded medians as a proxy. Rawcomps parses condition out of free-text eBay sold titles with an NLP pipeline and reports raw and graded sold history side by side. The data stack is free-source-only by design: ADR-014, the locked product thesis, defers the $171–221/mo paid stack (Card Ladder Pro, VCP Gold, Scrapfly, WorthPoint, Beckett MDR) until cohort feedback confirms that comp coverage is the binding constraint.
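The production parser is described only as an NLP pipeline, so the sketch below swaps in something much simpler to illustrate the idea: a regex pass that separates graded titles (a grader name followed by a numeric grade) from raw titles, and maps common raw condition abbreviations to a band. The pattern lists, band names, `ParsedCondition`, and `parse_condition` are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical patterns: a regex stand-in for the NLP pipeline, not the real one.
# Grader name followed by a numeric grade, e.g. "PSA 8", "BGS 8.5", "SGC 84".
GRADED_RE = re.compile(r"\b(PSA|SGC|BGS|CGC)\s*(\d{1,2}(?:\.5)?)\b", re.IGNORECASE)

# Raw condition terms. Compound grades come first so "EX-MT" is not read as "EX"
# and "VG-EX" is not read as "VG" or "EX".
RAW_BANDS = [
    ("NM-MT", re.compile(r"\bNM[-/ ]?MT\b", re.IGNORECASE)),
    ("EX-MT", re.compile(r"\bEX[-/ ]?MT\b", re.IGNORECASE)),
    ("VG-EX", re.compile(r"\bVG[-/ ]?EX\b", re.IGNORECASE)),
    ("NM", re.compile(r"\b(NM|NEAR MINT)\b", re.IGNORECASE)),
    ("EX", re.compile(r"\b(EX|EXCELLENT)\b", re.IGNORECASE)),
    ("VG", re.compile(r"\b(VG|VERY GOOD)\b", re.IGNORECASE)),
    ("GOOD", re.compile(r"\b(GOOD|GD)\b", re.IGNORECASE)),
    ("POOR", re.compile(r"\bPOOR\b", re.IGNORECASE)),
]


@dataclass
class ParsedCondition:
    graded: bool
    grader: Optional[str] = None    # e.g. "PSA" for graded titles
    grade: Optional[str] = None     # numeric grade as written, e.g. "8"
    raw_band: Optional[str] = None  # e.g. "EX-MT" for raw titles


def parse_condition(title: str) -> ParsedCondition:
    """Classify a sold title as graded or raw and extract what it claims."""
    match = GRADED_RE.search(title)
    if match:
        return ParsedCondition(graded=True, grader=match.group(1).upper(), grade=match.group(2))
    for band, pattern in RAW_BANDS:
        if pattern.search(title):
            return ParsedCondition(graded=False, raw_band=band)
    # No condition language found: treated as raw with an unknown band.
    return ParsedCondition(graded=False)
```

For example, `parse_condition("1965 Topps #135 Deron Johnson PSA 8")` yields a graded result with grader `"PSA"` and grade `"8"`. Note that, like the real pipeline, this reads only what the title claims about the card.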
How we gather data
Twelve sources feed the comp engine. Each per-card walker run queries the auction-archive sources serially and writes the matched comps plus funnel telemetry as JSON Lines to data/comps/per_card_walker/<year>/<card_id>.jsonl. The aggregator at services/ingest/scrapers/baseball/comp_aggregator.py then normalizes record shapes across sources before tiering.
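Normalizing shapes across sources can be pictured roughly as follows. This is a minimal sketch, not the code in comp_aggregator.py: the per-source field names (`lot_title`, `price`, `close_date`, `sold_price`, `sold_date`), the `Comp` shape, the `kind` key, and the telemetry fields are all hypothetical. Each source-specific converter maps its rows into one shared `Comp` shape, and the per-card output holds one JSON object per line: every comp, then a single telemetry line. For brevity the sketch normalizes before writing, whereas the pipeline described above normalizes in the aggregator.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Any, Callable, Dict, Iterable, List, Tuple

Row = Dict[str, Any]


@dataclass
class Comp:
    """Shared comp shape every source is normalized into (hypothetical)."""
    source: str
    title: str
    price_usd: float
    sold_date: str  # ISO 8601, e.g. "2026-05-01"


def _auction_archive(source: str) -> Callable[[Row], Comp]:
    # Hypothetical auction-archive row shape: lot_title / price / close_date.
    def convert(row: Row) -> Comp:
        return Comp(source, row["lot_title"], float(row["price"]), row["close_date"])
    return convert


def _ebay(row: Row) -> Comp:
    # Hypothetical Terapeak export shape: title / sold_price / sold_date.
    return Comp("ebay", row["title"], float(row["sold_price"]), row["sold_date"])


NORMALIZERS: Dict[str, Callable[[Row], Comp]] = {
    "goldin": _auction_archive("goldin"),
    "rea": _auction_archive("rea"),
    "ebay": _ebay,
}


def to_jsonl_lines(rows: Iterable[Tuple[str, Row]], telemetry: Row) -> List[str]:
    """Normalize (source, row) pairs and emit one JSON object per line."""
    lines = [json.dumps({"kind": "comp", **asdict(NORMALIZERS[src](row))}) for src, row in rows]
    lines.append(json.dumps({"kind": "telemetry", **telemetry}))
    return lines


def write_card_file(root: Path, year: int, card_id: str,
                    rows: Iterable[Tuple[str, Row]], telemetry: Row) -> Path:
    """Write data/comps/per_card_walker/<year>/<card_id>.jsonl under `root`."""
    out = root / "data" / "comps" / "per_card_walker" / str(year) / f"{card_id}.jsonl"
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text("\n".join(to_jsonl_lines(rows, telemetry)) + "\n", encoding="utf-8")
    return out
```

Adding a source then means adding one converter to `NORMALIZERS`; nothing downstream of the shared shape has to change.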
- Goldin Auctions — Major auction house archive; well-indexed for vintage HOFers.
- Heritage Auctions Sports — Premier vintage auction archive; access intermittent (Akamai).
- Robert Edward Auctions (REA) — Vintage-specialist archive; strongest pre-1940 coverage.
- SCP Auctions — Pre-war + post-war auction archive.
- PWCC Marketplace — Comprehensive 1950–1980 sold history.
- Memory Lane Inc. — Long-running vintage card auction archive.
- Huggins & Scott — Vintage auction house; pre-war specialty.
- Mile High Card Co. — Vintage cards + sets; weekly auction archive.
- Love of the Game Auctions — Photography and pre-war specialty house.
- PSA APR — Authoritative graded-only sold history across all houses.
- GemRate — Population + grading-rate aggregator across PSA/SGC/BGS/CGC.
- eBay (via Terapeak Product Research) — Sold listings via authenticated Seller Hub session; the largest single source of raw-condition data.
What we do not scrape:
- Card Ladder — Commercial pricing-index product; competitor.
- Vintage Card Prices (VCP) — Subscription-only paid feed; not in the free-source stack.
- COMC — Live listings, not sold history; different signal.
eBay's February 2026 Terms of Service explicitly prohibit anonymous scraping and LLM-driven bots, so we no longer request the public sold-listing pages. eBay sold history now comes only through Terapeak Product Research on a paid eBay Store account. Contractually, that path is use of a tool eBay sells to its subscribers, not anonymous data extraction. Source: eBay User Agreement.
The three honesty flags
Every comp panel on Rawcomps carries up to three flags. Together they form the “disclose the gap” layer on which the product's wedge is built.
Stale (>180 days)
A card's comp panel shows stale-180d when the most recent sale in the dataset is more than 180 days old. The threshold is deliberate: card markets shift with grading-population announcements, hobby-news cycles, and swings in macro collector spending, so a price more than six months old should not be reported as “current.” The badge is on by default for every card whose newest sale crosses that mark. Source: PSA APR.
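The rule reduces to one date comparison. A minimal sketch, assuming "more than 180 days" means strictly greater than 180 and that a card with no sales at all does not get this badge (the text does not cover that case); `is_stale` is a hypothetical name.

```python
from datetime import date, timedelta
from typing import Iterable

STALE_AFTER = timedelta(days=180)


def is_stale(sale_dates: Iterable[date], today: date) -> bool:
    """True when the newest sale is more than 180 days before `today`."""
    newest = max(sale_dates, default=None)
    if newest is None:
        # Assumption: with no sales at all, this badge does not apply.
        return False
    return today - newest > STALE_AFTER
```

Only the newest sale matters; a card with one sale last week and fifty sales from 2019 is not stale under this rule.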
High variance (IQR ÷ median > 2.0)
A card's comp panel shows high-variance when the interquartile range of its sold prices exceeds 2× the median — i.e., the middle 50% of sales spans more than twice the median price. This typically happens when (a) the card has a wide condition spread (PSA 1 to PSA 9) trading in the same window, (b) a single outlier sale (provenance, error, autograph) dominates the distribution, or (c) the sample is too thin for the median to be stable. The badge reminds buyers that the headline median is not a reliable single number.
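The ratio can be computed with Python's standard `statistics` module. This is a sketch, assuming quartiles from `statistics.quantiles(..., method="inclusive")`; the document does not say which quartile convention Rawcomps uses, and different conventions give slightly different IQRs on small samples. The function and constant names are hypothetical.

```python
import statistics
from typing import Sequence

HIGH_VARIANCE_RATIO = 2.0


def iqr_over_median(prices: Sequence[float]) -> float:
    """Interquartile range divided by the median of the sold prices."""
    # method="inclusive" treats the prices as the whole population of matched sales;
    # with it, the middle cut point q2 equals statistics.median(prices).
    q1, q2, q3 = statistics.quantiles(prices, n=4, method="inclusive")
    return (q3 - q1) / q2


def is_high_variance(prices: Sequence[float]) -> bool:
    # Quartiles need at least two prices; a non-positive median makes the ratio meaningless.
    if len(prices) < 2 or statistics.median(prices) <= 0:
        return False
    return iqr_over_median(prices) > HIGH_VARIANCE_RATIO
```

A tight band such as `[20, 25, 30, 35, 40]` gives a ratio of about 0.33, while a mix of low raw-grade and high slabbed prices in the same window can push the ratio well past 2.0.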
Graded-only substitution
A card's comp panel shows graded-only when 100% of the matched sales are graded (PSA, SGC, BGS, or CGC) and zero are raw. This is the gap the entire product was built to surface: many pricing tools quietly use graded medians as a stand-in for raw-card prices, which can overstate raw value by 40–300% depending on the card. When you see this badge, the median reported is graded-only — do not use it as a raw-card pricing reference. Source: PSA Grading Services.
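The check itself is simple; the hard part is classifying each sale as graded or raw upstream. A sketch, assuming each matched sale already carries a grader name or `None` for a raw sale, and assuming an empty comp set does not get the badge. `Sale` and `is_graded_only` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Sale:
    price_usd: float
    grader: Optional[str]  # "PSA", "SGC", "BGS", "CGC", or None for a raw sale


def is_graded_only(sales: Sequence[Sale]) -> bool:
    """True when every matched sale is graded and none is raw."""
    return bool(sales) and all(sale.grader is not None for sale in sales)
```

A single raw sale in the set is enough to clear the badge, which matches the "zero are raw" condition.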
We surface additional auxiliary badges (thin-data when n<10, floor-price when a Heritage row is a Make Offer floor, reprint-flag / defect-flag / lot-flag for title pattern matches), but the three above are the original wedge.
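The thin-data rule and the title-pattern flags can be sketched the same way. The patterns below are hypothetical examples, since the document does not publish the actual reprint, defect, or lot patterns. Floor-price is omitted because it depends on Heritage listing metadata rather than on the title.

```python
import re
from typing import List

THIN_DATA_MIN = 10  # thin-data when fewer than 10 matched sales

# Hypothetical title patterns for the per-row flags.
TITLE_FLAGS = {
    "reprint-flag": re.compile(r"\b(reprint|facsimile)\b", re.IGNORECASE),
    "defect-flag": re.compile(r"\b(crease[ds]?|trimmed|pin ?hole|paper loss)\b", re.IGNORECASE),
    "lot-flag": re.compile(r"\blot\b", re.IGNORECASE),
}


def is_thin(matched_sales: int) -> bool:
    return matched_sales < THIN_DATA_MIN


def title_flags(title: str) -> List[str]:
    """Names of every title-pattern flag that matches, in a fixed order."""
    return [name for name, pattern in TITLE_FLAGS.items() if pattern.search(title)]
```

A title can trip more than one flag; "Lot of 3 1965 Topps, one creased" returns both the defect and lot flags.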
Raw versus graded
Roughly 70–90% of pre-1980 vintage cards trade raw at any given time. Grading costs $25–100 per card, takes 2–9 months at PSA, and only makes economic sense for cards worth several hundred dollars. The implication: most actual transactions in the vintage market are raw, but most published “values” are graded.
That gap is the wedge. A 1965 Topps #135 Deron Johnson in PSA 8 might sell for $200; the same card raw, judged by a knowledgeable buyer to be roughly EX-MT, might sell for $25–40. Reporting the $200 median as “the value” misleads a dealer pricing inventory for a card show. Reporting the $25–40 raw band, with the graded $200 in a separate column and the condition assumption disclosed, is the honest report.
Rawcomps does not interpolate between raw and graded. We report each independently, source-by-source, with the honesty flags above. The pricing tools that quietly substitute graded medians for raw — without disclosing the substitution — are the failure mode this site exists to correct.
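Reporting each market independently amounts to bucketing sales by source and by raw versus graded, then taking a median inside each bucket and never across buckets. A minimal sketch with hypothetical names and illustrative prices:

```python
import statistics
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

# (source, grader or None for raw, price in USD)
SaleRow = Tuple[str, Optional[str], float]


def medians_by_source(sales: Iterable[SaleRow]) -> Dict[Tuple[str, str], float]:
    """Median per (source, "raw" | "graded") bucket; buckets are never blended."""
    buckets: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for source, grader, price in sales:
        market = "raw" if grader is None else "graded"
        buckets[(source, market)].append(price)
    return {key: statistics.median(prices) for key, prices in buckets.items()}
```

There is deliberately no step that combines a raw bucket with a graded one. A card that ends up with only graded buckets is exactly the case the graded-only badge discloses.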
What we don't claim
- Not real-time. Walker runs are paced at 30 requests per minute, and per-source freshness lags hours to days behind auction close.
- Not a grading service. Rawcomps reports sold prices; it does not grade cards. Submit cards to PSA, SGC, BGS, or CGC for an authenticated grade.
- Not financial advice. Vintage cards are illiquid, condition-sensitive, and subject to grading-population shocks. Do not treat any reported median as an investment recommendation.
- Comps are historical, not predictive. A 12-month median is what the market did, not what it will do. Forward returns are not in our data.
- Not a card-condition assessment tool. Our condition parsing reads what a listing said the card was; it does not look at the card itself.
- Not affiliated with Topps, Bowman, Upper Deck, Panini, PSA, SGC, BGS, CGC, or any card manufacturer or grading service.
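Pacing at 30 requests per minute, as in the "Not real-time" point above, means spacing requests at least two seconds apart. A minimal sketch of such a pacer, assuming a single-threaded walker; `Pacer` is hypothetical, and its clock and sleep functions are injectable so the spacing can be checked without real waiting.

```python
import time
from typing import Callable, Optional


class Pacer:
    """Spaces successive calls to wait() at least 60 / per_minute seconds apart."""

    def __init__(self, per_minute: float = 30,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.interval = 60.0 / per_minute  # 30 per minute -> 2.0 seconds
        self._clock = clock
        self._sleep = sleep
        self._next_allowed: Optional[float] = None

    def wait(self) -> None:
        """Block until the next request may start, then reserve the following slot."""
        now = self._clock()
        if self._next_allowed is not None and now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.interval
```

Calling `pacer.wait()` immediately before each outbound request caps the walker at the stated rate; time spent parsing a response counts toward the gap, so no extra sleep is added when requests are already slow.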
Update cadence
The walker chain runs continuously, walking the corpus one year at a time at about 92 cards per hour. Most auction-archive sources surface a new sale within 7 days of the auction's close; PSA APR is typically same-day. Each card's comp panel shows its own freshness in the Updated stamp at the top of the page. If you see stale-180d on a card you know just sold, the most likely explanation is that the walker hasn't revisited that year yet: re-walks run in chronological order, except that the high-comp-gap years (1969–1975) take priority.
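One reading of that schedule is sketched below: the priority years 1969–1975 first, then every other year in chronological order. The ordering rule, `rewalk_order`, and `walk_hours` are assumptions for illustration; only the ~92 cards/hour rate comes from the text.

```python
from typing import Iterable, List

PRIORITY_YEARS = range(1969, 1976)  # high-comp-gap years 1969–1975 inclusive
CARDS_PER_HOUR = 92


def rewalk_order(years: Iterable[int]) -> List[int]:
    """Priority years first, then the rest; chronological within each group."""
    return sorted(years, key=lambda year: (year not in PRIORITY_YEARS, year))


def walk_hours(card_count: int, cards_per_hour: float = CARDS_PER_HOUR) -> float:
    """Hours to walk `card_count` cards at a steady rate, ignoring downtime."""
    return card_count / cards_per_hour
```

As a back-of-the-envelope figure, `walk_hours(127_813)` is about 1,389 hours, or roughly 58 days, for one pass over every indexed card, assuming the walker covers all of them at a steady 92 cards/hour.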
Conflicts and corrections
If you see a wrong sale price, a misidentified card, a missing variation, or a mislabeled condition band, email corrections@rawcomps.com with the card URL and the specific row. We acknowledge corrections within 7 days. Verified corrections roll into the next walker pass; the underlying source row stays in the archive with a corrected: true flag so the public history is honest about its own revisions.
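Keeping the source row while flagging it can be sketched as an overlay: the archived values stay untouched, the row gains `corrected: true`, and the verified values sit alongside. The `correction` field and both helpers are hypothetical; the document specifies only the `corrected: true` flag.

```python
from typing import Any, Dict

Row = Dict[str, Any]


def apply_correction(row: Row, verified: Row) -> Row:
    """Return a copy of an archived row flagged corrected: true, original values intact."""
    flagged = dict(row)                      # the source's own values are not overwritten
    flagged["corrected"] = True
    flagged["correction"] = dict(verified)   # hypothetical field for the verified values
    return flagged


def effective(row: Row, field: str) -> Any:
    """Value to display: the verified correction if present, else the source value."""
    return row.get("correction", {}).get(field, row[field])
```

Because the original values survive next to the correction, anyone auditing the history can see both what the source reported and what was changed.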
Affiliate conflicts: Rawcomps participates in the eBay Partner Network — see our affiliate disclosure for the full statement. The comp data is independent of affiliate revenue; the EPN integration only wraps outbound listing links.