Let’s Spend $10 Trillion on AI That Improves the Real World, Not Just Ads
This article was first published in The Information.
Four years ago, right around the time the concept of the metaverse became fashionable, I wrote a post calling it a “dystopian nightmare.” I argued that instead of diving into virtual worlds, we should all do more to connect with the real world around us.
Time will tell if the metaverse will eventually come to fruition, but with global business and leisure travel, as well as live event attendance, blowing past pre-pandemic levels in 2024 and 2025, it’s pretty clear humans weren’t meant to live inside a headset.
Fast-forward to today, when some seek to create a world where content increasingly doesn’t come from friends or even human influencers, but rather straight from an AI model, an engineered confection of pixels and waveforms optimized to capture your attention until the next ad arrives.
When the prospect of advanced AI emerged a few short years ago, there was a surplus of enthusiasm about a future of human prosperity, one in which radical advances in healthcare, materials, manufacturing and robotics would transform the world for the better, raising the quality of life for all people on the planet.
I think it’s fair to ask at this point if that’s really where we are headed. How much of our gargantuan investment into AI will be used to truly better the human condition, and how much will be subverted to create better ways to entertain and distract us?
Why It Matters Now
We all understand the momentous era we are entering as large language models have emerged as one of the most disruptive innovations in the history of technology. They are upending every aspect of the tech market, from startups to established enterprises, from chips to software, data centers and even power generation.
Since the founding of OpenAI and the advent of the modern AI industry, as much as $1 trillion has been invested, and that number is growing daily, much of it now in the form of infrastructure: chips, servers, data centers and power to scale solutions in anticipation of massive, lucrative applications. Analysts believe AI investment could reach a total of $10 trillion by 2030. This is substantial, even when considered against global gross domestic product, which is projected to exceed $150 trillion by that same year.
Perhaps even more importantly, the process of building that infrastructure is consuming vast amounts of natural resources, including oil and gas and precious supplies of fresh water. There is growing pressure to demonstrate a return on the investment, and that pressure will only increase as these massive, leveraged investments continue to mount.
Where will those returns come from? Is there a path that can generate the needed economic returns and truly build a better future for humanity?
One Answer Lies in the Real World
Online goods and services represent about 20% of the global economy. AI will certainly make online ads, social networks and gaming better and more lucrative, and it will streamline white-collar work in professions including software engineering, customer support, marketing, law and medicine.
However, the other 80% of the global economy is outside that realm, out in the real world, in industries like energy, agriculture, manufacturing, construction, transportation and logistics—in other words, the acts of extracting, refining, growing, assembling, combining and shipping the atoms that warm us, shelter us, feed us and generally make life possible for human beings. These are our most essential human needs, not chatbots.
To justify the massive investment in overall AI spending, you have to believe AI can transform not just the 20% of the economy that is online but the 80% that is not. If we can unleash AI on that part of the economy—and assuming we can manage the transition in how humans work alongside machines (not a small matter)—then we have a real shot at a future that can increase the standard of living for humanity as a whole. That would be worth the trillions needed to bring AI into existence, not to mention the use of precious resources like power and water.
The problem is that AI is in many ways trapped inside the screen, deeply knowledgeable about concepts derived from the mountains of text on the internet, and yet woefully ignorant about the world outside the door of the data center, much less the factory floor, the farm, the construction site, the oil refinery and the cities in which we live. To unleash the power of AI on this massive swath of the economy, we must give AI knowledge of the world, skills to interact with it and embodied forms to manipulate it physically. It needs a brain adapted for the real world and a body to move through it.
LLMs and World Models Are Not Enough
This is the opportunity at hand. It’s why the AI industry is excited about what people are calling physical AI, world models and spatial intelligence. This is why Nvidia’s Jensen Huang is so excited about humanoid robots, calling them “the next multitrillion-dollar industry.” We can adapt AI to increase productivity and do real, meaningful work in the physical world—often tasks that are either undesirable or dangerous to humans.
LLMs alone aren’t enough to make this vision a reality. Models for physical AI (trained on video and other inputs to control robot movement), world models (which attempt to simulate how environments function and evolve, often generating synthetic 3D simulations of scenes) and spatial models (which capture and re-create the physical world) will all play a part in realizing this vision.
In the realm of physical AI, breakthroughs in simulation and transfer learning are bringing fluid movement to robots, enabling them to amaze us with new skills involving moving and manipulating real-world objects. World models help by making simulation training easier and more realistic, conjuring an infinite variety of synthetic training environments.
All of these advances are necessary, but they are not sufficient to bring AI fully into the real world.
Building the Large Geospatial Model
At Niantic Spatial, we are focused on the final missing piece, spatial intelligence. To reason, plan and act on problems involving the world, AIs must know it. But they lack the kind of intuitive spatial understanding that human hunter-gatherers naturally evolved. And the textual sources they train on do little to give them a coherent, accurate grasp of the physical world’s structure, shape, contents and topology.
For the past several years, we’ve been building a large geospatial model that acts as a living, breathing map of the world, one that is native to robots and AI.
Unlike what I’ve worked on previously, it is a map built not for people but for machines—to assist robots in navigation and task planning, and to help AIs complete tasks and answer questions that require grounding in real-world data. This map can help a robot figure out the safest path to take through an urban maze, transport supplies over rugged terrain to a remote destination, or move within a job site or factory complex to perform work at different locations. This kind of map can also help AI agents solve complex real-world problems like computing fire risk in suburban environments or optimizing a city to improve quality of life.
We are building on everything we have learned from building maps for people—Google Maps, Earth, Local, and Street View—but reimagining it in a world where AI understanding is the primary goal. Just as Google Maps became a key building block of Web 2.0, we seek to make the Niantic geospatial model a building block for the future of AI, working alongside physical AI models from companies like Physical Intelligence, Skild AI, and Flexion Robotics, and world models from companies such as World Labs, General Intuition, and Nvidia. These are all part of a burgeoning ecosystem that also includes robotics firms like Boston Dynamics, Agility Robotics, and Apptronik and a host of companies creating industry-specific mobile robots for manufacturing, agriculture, healthcare and other sectors.
Where We Go Next
It’s an exciting world, with many separate innovations pursued concurrently. Over the next two months, we’ll be launching new versions of our model that can reconstruct reality in a way humans can interact with and that allow machines to “see” and navigate with pinpoint accuracy. Future versions will add the semantics needed for deeper understanding, planning and problem solving.
AI truly has great potential. But it will be up to all of us to make sure we channel this massive investment into tech that will not merely entertain and distract us but truly help create a better reality. That’s something I think we can all get excited about.
-jh