From Scanning Coca-Cola Cans to Building an AI Map of the Real World

Ground Truth: An Interview with Niantic Spatial's Chief Scientist Victor Prisacariu
This post is part of our Ground Truth series exploring the future of AI and the physical world.
From scanning a Coca-Cola can with a webcam in 2008 to mapping millions of real-world locations today, Victor Prisacariu has been at the forefront of computer vision and spatial AI. As Chief Scientist for Niantic Spatial and newly appointed Professor of Artificial Intelligence at the University of Oxford, he bridges cutting-edge academic research with real-world applications.
In this Q&A, Victor reflects on the rapid evolution of spatial computing, the promise of Large Geospatial Models (LGMs), and how AI is transforming the way people and machines understand and interact with the physical world.
What excites you most about the future of spatial computing and its potential impact on how we experience and interact with the world around us?
While Niantic Spatial’s vision is hugely exciting, what personally excites me most about the future of spatial computing is the incredible progress we’re witnessing.
Back in 2008, when I first started in this field, I could scan and reconstruct a known object like a can of Coca-Cola using a webcam. Over time, this capability expanded from small objects to half-rooms, then entire buildings, and eventually to working seamlessly on mobile phones without specialized depth cameras. At Niantic Spatial, we’ve scaled this to millions of locations, adding precise geo-location to a system that 15 years ago would only have been able to track local movement. Now, we have positioning systems that allow you to know exactly where you are in the geo-aligned world.
I always stress that while reconstruction and localization are vital basic services, the truly exciting part is the layer of understanding built on top of them: the semantics. We've progressed from simple object recognition to open-set understanding, where I can now type a free-form query and get a relevant answer about the environment. As we develop large models that are precisely positioned, detailed, and well-mapped, adding this layer of understanding will unlock far more interesting possibilities than ever before. The sheer pace of progress is what truly excites me.
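A "type a query, get a relevant answer" capability is typically built on a joint vision-language embedding. As a minimal sketch of the open-set idea (assuming a CLIP-style model loaded via Hugging Face transformers, and pre-computed embeddings for mapped scene regions; this is illustrative, not Niantic Spatial's actual pipeline), one can embed a free-form text query and rank scene regions by cosine similarity:

```python
# Minimal open-set query sketch: embed free-form text with a CLIP-style
# model and rank pre-computed scene-region embeddings by similarity.
# CLIP here is a stand-in for whatever joint model a real system uses.
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def embed_query(text: str) -> torch.Tensor:
    """Map a free-form query ("a red door", "bike rack") into CLIP space."""
    inputs = processor(text=[text], return_tensors="pt", padding=True)
    with torch.no_grad():
        feats = model.get_text_features(**inputs)
    return torch.nn.functional.normalize(feats, dim=-1)  # (1, D), unit norm

def rank_regions(query: str, region_embeddings: torch.Tensor) -> torch.Tensor:
    """region_embeddings: (N, D) unit-norm embeddings, one per mapped scene
    region. Returns region indices sorted by relevance to the query."""
    scores = region_embeddings @ embed_query(query).T  # cosine similarity
    return scores.squeeze(-1).argsort(descending=True)
```

Because neither the queries nor the region labels are drawn from a fixed class list, the same index answers questions it was never explicitly trained on, which is what distinguishes open-set understanding from classic object recognition.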
You hold roles as Chief Scientist at Niantic Spatial and now Professor of Artificial Intelligence at the University of Oxford. What unique advantages do you find in having a foot in both the cutting-edge research of academia and the practical application and product development of a company like Niantic Spatial?
The academic side keeps me informed about broader developments beyond the immediate focus of the company. Niantic Spatial's work, like our initial focus on the Visual Positioning System (VPS), tends to be very targeted. However, at the university, my research can be much more diverse, covering a wider breadth of topics. This helps me stay more informed about general advancements in the field.
Another advantage of the university is the ability to take more risks. The primary goal of company research is to improve the product, whereas university research aims to push science forward. While both contribute to scientific progress, the university's mission allows for greater experimentation and for pursuing unusual ideas that might be too risky for a company.
Can you share an example of how one role has directly informed or accelerated your work in the other?
While the two worlds are complementary, I largely keep them separate. That said, multiple students from Oxford have undertaken internships at Niantic Spatial, resulting in co-authored publications that also formed part of their PhD theses. For example, one student applied their PhD research on Absolute Pose Regression to Niantic Spatial's system, resulting in a paper recognized as a Highlight at CVPR, an honor reserved for only the top few percent of submissions. These instances demonstrate how students can apply what they learn across both environments, pushing scientific boundaries through academic output while gaining industry experience.
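For context, Absolute Pose Regression (APR) trains a network to map a single image directly to a 6-DoF camera pose in a known scene. A minimal PoseNet-style sketch of the general idea follows; it illustrates the family of methods only, not the student's published work:

```python
# PoseNet-style absolute pose regressor: a CNN backbone with two heads that
# regress camera position (x, y, z) and orientation (unit quaternion)
# directly from one image of a known scene.
import torch
import torch.nn as nn
import torchvision.models as models

class AbsolutePoseRegressor(nn.Module):
    def __init__(self):
        super().__init__()
        backbone = models.resnet18(weights=None)
        backbone.fc = nn.Identity()          # keep the 512-d global feature
        self.backbone = backbone
        self.fc_xyz = nn.Linear(512, 3)      # camera position in scene frame
        self.fc_quat = nn.Linear(512, 4)     # camera orientation

    def forward(self, image: torch.Tensor):
        feat = self.backbone(image)                            # (B, 512)
        xyz = self.fc_xyz(feat)
        quat = nn.functional.normalize(self.fc_quat(feat), dim=-1)
        return xyz, quat                     # 6-DoF pose, quaternion rotation

pose_net = AbsolutePoseRegressor()
xyz, quat = pose_net(torch.randn(1, 3, 224, 224))  # one RGB frame in, pose out
```

The appeal is speed and simplicity: one forward pass replaces explicit feature matching against a stored map, at the cost of being tied to the scenes the network was trained on.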
What does your new appointment as Professor of Artificial Intelligence at the University of Oxford signify for you personally and for the field of AI?
My new appointment as Professor of Artificial Intelligence doesn't signify a gigantic change in my mission. While much of my past work was rooted in 3D, I'd argue that since the early days, my focus has increasingly shifted towards Artificial Intelligence, particularly using neural networks for 3D. At the university, we're now expanding into areas like LLMs in 3D. So, while we retain a connection to 3D, we're definitely broadening our scope.
What specific areas of research are you excited about?
Currently, our key research areas are quite diverse. For example, we're working on measuring the accuracy of diffusion methods (2D and 3D generative methods), which is challenging because there's often no ground truth. We're also actively exploring LLMs in 3D. Additionally, we're researching how to generate consistent 3D data across multiple views, which has led to interesting interactions between Niantic Spatial and the university.
Niantic Spatial is pioneering the Large Geospatial Model (LGM). Can you explain, in layperson's terms, what an LGM is and why it's such a significant undertaking?
The best way to explain the Large Geospatial Model (LGM) is to relate it to a “world model,” but with a key distinction. Many other world models aim to learn how the world works in a generic sense. For example, they might predict actions based on an environment or generate a generic room from text. These generated environments, while potentially realistic, are not specific places that actually exist.
I view our LGM as a world model that neurally describes the world itself by capturing highly specific, local information. While it will eventually need to predict how the world works from incomplete data, its core function is to describe specific, real-world locations like my street, or the areas around our offices in San Francisco and London. It focuses on specific areas of the world, enabling capabilities like localization, reconstruction, and semantic understanding within those precise contexts. So it's more of a geo-specific, “global local” model, distinct from generic world models that describe how the world works.
What are the key applications and industries where you foresee the most impactful use cases for the Large Geospatial Model?
One of our core applications is localization. Two main client categories need precise localization in a virtual map: people and machines. For people, especially with emerging technologies like smart glasses, knowing exactly where you are and interacting with the virtual world is critical; this relevance is why companies like Snap want to work with us, since augmented reality experiences on phones or glasses need to understand your precise location and what's around you. For machines, like robots, our world model describes the world in a way they can interact with. It tells them, for example, "that's the door you need to go towards," rather than explaining how a door operates generically. Our technology helps robots navigate and interact with specific objects and places in a given real-world location.
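To make the robot example concrete, here is a purely hypothetical sketch of the kind of interface a geo-specific model could expose: a pose in a shared geo-aligned frame plus semantically labeled anchors nearby. Every name and type here is invented for illustration; this is not Niantic Spatial's API.

```python
# Hypothetical interface sketch (invented names, not a real API): a
# localization result pairs a geo-aligned pose with labeled anchors,
# so a robot can be told "that's the door you need to go towards".
from dataclasses import dataclass

@dataclass
class Anchor:
    label: str                                    # e.g. "door", "bike rack"
    position: tuple[float, float, float]          # metres, geo-aligned frame

@dataclass
class LocalizationResult:
    position: tuple[float, float, float]          # where the robot is
    rotation: tuple[float, float, float, float]   # unit quaternion
    anchors: list[Anchor]                         # semantic landmarks nearby

def navigate_target(result: LocalizationResult, label: str) -> Anchor | None:
    """Pick the nearest anchor matching a semantic label: the 'go towards
    that door' step from the interview, reduced to a distance check."""
    matches = [a for a in result.anchors if a.label == label]
    if not matches:
        return None
    px, py, pz = result.position
    return min(matches, key=lambda a: (a.position[0] - px) ** 2
                                    + (a.position[1] - py) ** 2
                                    + (a.position[2] - pz) ** 2)
```

The point of the sketch is the division of labor: the map supplies location-specific facts ("a door is 3 m ahead"), while generic knowledge of how doors work stays in the robot's own policy.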
About Victor Prisacariu
Victor Prisacariu is Chief Scientist at Niantic Spatial and Professor of Artificial Intelligence at the University of Oxford, where he leads pioneering research in computer vision, 3D reconstruction, and spatial AI. His work bridges cutting-edge academic research with large-scale real-world applications, advancing how people and machines understand and interact with the physical world.
Over the past 15+ years, Victor’s research has evolved from reconstructing simple objects to developing technologies that map and localize millions of real-world locations with centimeter-level precision. At Niantic Spatial, he has been instrumental in advancing the Visual Positioning System (VPS) and in shaping the company’s Large Geospatial Model (LGM), a neurally powered map of the real world.