Why bet on private data?
We have entered a world where LLMs have made public data much less special. To gain an edge, more and more eyes will turn towards private and alternative data, following the 95% of hedge funds, VCs, and consultants who plan to increase their investment in private/alt data in 2025.
What is private vs public data?
Definitions
- Public data: Data anyone can access with low friction.
- Semi-private data: Data whose origin is public, but whose value is predominantly from a proprietary / high friction transformation.
- Private data: Data that requires permission, relationships, or infrastructure to obtain. It may have transformations on top of it. The higher the friction to obtain the data or to reproduce the transformations on top of it, the more ‘private’ it is.
Examples:
- Public: The company’s publicly filed 10-K states that net revenue retention (NRR) is 105%.
- Semi-private: A third-party dataset built by carefully scraping and aggregating 10 million public user reviews shows that mid-market customers of the company frequently mention pricing dissatisfaction.
- Private: The company’s internal dashboard shows that NRR is 120% for enterprise clients but 85% for mid-market accounts, and that churn has accelerated over the last 60 days.
What does private data provide?
Private data provides an edge in ways public data cannot, by enabling:
- Foresight — See what’s likely to happen before others do.
- Example: Bridgewater uses proprietary labor market and supply chain data (e.g., shipping delays, supplier purchasing behavior) to predict inflation trends and interest rate movements earlier than broad CPI reports.
- Discovery — Find and qualify opportunities others don’t see or overlook
- Example: Thoma Bravo has sourced mid-market SaaS acquisitions by leveraging proprietary benchmarks (e.g., gross margin by ACV band) and user telemetry from customer interviews to spot undervalued vendors with strong retention but weak GTM, a combination that looks unattractive in public comps but attractive post-fix.
- Better execution — Act more effectively on what you own or pursue
- Example: In 2012, Facebook’s leadership shifted its priority to mobile based on proprietary data showing that the majority of Facebook users were transitioning to mobile usage, even though ad revenue lagged there.
Notably,
- Private data tends to be much less biased than public data.
- Data is often made public only after it has been massaged to fit a narrative for broad-based consumption.
- Private data is (inherently) not meant for broad consumption.
- Example: Despite regulation requiring accurate financial statements from public companies, many financial statements fit a narrative through how they break out certain line items.
- More private data directly compounds a durable edge.
- The edge derived from private data tends to grow ~linearly with the amount of private data you have. The more you can productively pull together, the better!
- Analogy: Public data is like common textbooks everyone can read. You can put more effort into studying to reach novel insight/understanding, but so can everyone else, and eventually the value of your interpretation converges with everyone else’s. Private data is access to alternative books no one else has, so every additional book you read directly stacks more insights with edge.
In a world of LLMs/agents, private data becomes more valuable
LLMs commoditize insights from public information. This makes semi-private and private data much more important for finding an edge: not because private data has become scarcer, but because the value of everything else has collapsed.
Before LLMs, it took skill to research, synthesize, and interpret all the public data that was out there. This created significant arbitrage opportunities: if you could research, synthesize, and understand public data quicker than others, you had a significant edge.
With LLMs, that arbitrage opportunity has shrunk severely. It has not disappeared (LLMs aren’t perfect, and another actor, usually a human, is still required to act on the reasoning they offer), but the trend is clear: private data is going to become much more important.