London Herald

The new markets for AI data

By Jaxon Bennett | May 19, 2025 | Tech | 4 Mins Read

The writer is the global co-head of investment banking at Goldman Sachs.

Data is the foundation of the artificial intelligence revolution, but AI is also revolutionising the market for data. Developers are racing to invest billions of dollars to build the infrastructure to power vast AI systems. That rapid expansion has led to a surge in demand for data, creating the potential for companies to generate significant economic value.

AI systems are typically described as having three main components — power, compute and data. These refer to the electricity required to power data centres, the chips needed to conduct computations at mind-boggling speeds, and the data necessary to train AI models. Of these critical components, it is data that is least discussed, perhaps because data centres and semiconductors are physical things you can see and touch. (It’s admittedly difficult to hold up a data packet during an onstage keynote.)

But sourcing data is an essential aspect of the rapidly expanding AI ecosystem. According to some estimates, the world is running out of “organic” data, with model developers reaching the limits of publicly available data — essentially copies of the entire internet — to pre-train ever-bigger models.

After AI models are constructed and pre-trained on huge data sets, they still require additional “test-time compute”, in which a model is asked to answer specific questions or solve problems. This requires the right kind of data, which is sometimes lacking.

There is not enough training data that captures humans “showing their work” as they step through complex problems. This is where companies with focused, well-organised, or highly logical data sets can become newly relevant. Imagine how a textbook company might use its archives of technical manuals and coursework to train an AI system to carry out complex scientific processes.

Recent data licensing deals show how different companies are selling access to their data to AI companies. Expect this trend to accelerate as companies get even more creative in doing so. So far, these deals have been negotiated individually with special terms, but you can imagine a marketplace — or multiple markets — for training data emerging.

Synthetic data, or data created at least in part by AI systems, is a critical part of the development of large language models and has emerged as one path for expanding the set of options for developers looking for new data sets.

For example, as robotic technology becomes more sophisticated, AI systems can increasingly create maps of our physical environment. Synthetic data for self-driving might involve setting up a “digital twin” of Los Angeles and having millions of “mock” vehicles navigate the city in a virtual space as training data.

And it is possible that types of data that have previously been difficult to analyse or use become newly accessible and valuable with the incredible computational power of AI systems. Think about what data we’ve collected about complex systems such as weather, quantum mechanics or viral mutations. As robots can perceive entire categories of data that are imperceptible to humans, collections of video and spatial data may also suddenly have a newfound value.


Tesla uses the data collected by its fleet of autonomous driving vehicles to train the AI models that power its underlying self-driving technology. And Nvidia recently announced an expansion of its robot simulation environment, where it trains its robots in a virtual, digital representation of the physical world.

One of the most valuable repositories of data is human-generated data that remains locked away — proprietary research behind corporate and government firewalls. Today, the holders of this data are reluctant to make it accessible without knowing the implications. But the right structures and incentives can invite more deals.

In practical terms, different companies will devise different strategies. Some will treat data as a core business asset, not a byproduct, and work to monetise it through licensing or subscriptions. Others will need to upgrade their data infrastructure to make the best use of future AI capabilities.

How different jurisdictions decide to regulate AI and further regulate data usage will have profound implications for how those markets evolve, and where. Data privacy and security, along with questions about data provenance, ownership and authentication, are all potential areas for new legislation.

This period of incredible innovation and upheaval offers opportunities for the companies that get their data strategy right.



© 2025 London Herald.