Why AI has great vision but a terrible sense of direction

Date: 6/23/2025
Author: Niantic Spatial
Category: TECHNOLOGY

This month, Niantic Spatial announced a partnership with Snap to build an AI map that will serve as the foundation for AR glasses and AI agents to understand, navigate and interact with the real world.

With generative vision models improving fast, people ask: do AR glasses and robots still need a map? Isn’t AI vision enough?

The answer, from our perspective, is a definitive no: AI vision alone is not enough.

AI vision models are great at taking in an image and describing what they see in broad strokes (“a city street with shops and traffic lights”). They can pair that image with a GPS location.

But AR glasses and robots need much more than a generic description, and a GPS location might be off by as much as half a city block. This is the difference between geotypical understanding and geo-specific certainty.
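
To make the contrast concrete, here's a minimal Python sketch of the two kinds of answers a device might get back. The field names and numbers are illustrative assumptions, not real API output:

```python
from dataclasses import dataclass

@dataclass
class GeotypicalResult:
    """Roughly what an AI vision model plus GPS can offer."""
    caption: str               # broad-strokes scene description
    latitude: float            # GPS fix
    longitude: float
    horizontal_error_m: float  # often tens of meters in cities

@dataclass
class GeospecificResult:
    """Roughly what a VPS-backed geospatial map can offer."""
    position_m: tuple          # (x, y, z) in a shared map frame
    rotation_quat: tuple       # full orientation, not just a compass heading
    position_error_m: float    # centimeter-scale

# A vision model tells you *what kind* of place this is...
vision = GeotypicalResult(
    caption="a city street with shops and traffic lights",
    latitude=40.7411, longitude=-73.9897,
    horizontal_error_m=30.0,   # can be off by half a city block
)

# ...a VPS pose tells you *exactly where* you are and which way you face.
vps = GeospecificResult(
    position_m=(12.031, 1.612, -4.087),
    rotation_quat=(0.0, 0.383, 0.0, 0.924),
    position_error_m=0.02,
)
```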

Why VPS-powered geospatial maps fill the gap

Precision localization
AR glasses and other devices have to know exactly where you are and where you’re looking – GPS and a compass aren’t close enough, and AI vision can’t infer pose to centimeter accuracy from a single frame.
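
To put rough numbers on “not close enough”, here’s a back-of-the-envelope Python sketch. The error figures are illustrative assumptions:

```python
import math

def placement_error(pos_err_m: float, heading_err_deg: float,
                    target_dist_m: float) -> float:
    """Worst-case offset, in meters, of virtual content placed at
    target_dist_m, given position and heading uncertainty."""
    angular = target_dist_m * math.tan(math.radians(heading_err_deg))
    return pos_err_m + angular

# GPS + compass: tens of meters of position error, ~10 deg of heading error.
print(placement_error(30.0, 10.0, 10.0))  # ~31.8 m: the wrong block entirely
# VPS-grade pose: centimeters of position error, a fraction of a degree.
print(placement_error(0.02, 0.5, 10.0))   # ~0.11 m: pinned to the storefront
```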

Orientation and occlusion
To place virtual objects believably, the system must understand depth, surfaces and occluders in real time. It must also know the exact orientation of the user, so it can tell, for example, that a subway entrance is behind you or slightly to your right. That comes from a 3D point cloud or mesh, not from an LLM’s description of a scene.
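
As a toy illustration of what a precise pose buys you, this sketch classifies where a mapped point sits relative to the user. The coordinates, thresholds and phrasing are assumptions for illustration:

```python
import math

def describe_direction(user_pos, user_heading_deg, target_pos):
    """Say where a mapped point sits relative to the user.
    Positions are (east, north) in meters in a shared map frame;
    heading is degrees clockwise from north."""
    de = target_pos[0] - user_pos[0]
    dn = target_pos[1] - user_pos[1]
    bearing = math.degrees(math.atan2(de, dn))           # 0 deg = due north
    relative = (bearing - user_heading_deg + 180) % 360 - 180
    if abs(relative) <= 30:
        return "ahead of you"
    if abs(relative) >= 150:
        return "behind you"
    return "to your right" if relative > 0 else "to your left"

# Facing north, with a subway entrance a few meters to the south-east:
print(describe_direction((0, 0), 0.0, (3, -8)))  # -> "behind you"
```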

Shared AR
Because VPS gives every device the same centimeter-accurate coordinate system, multiple users at the same spot see the same virtual object anchored to the exact same location.
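
Here’s a minimal 2D sketch of that idea: one anchor in the shared map frame, resolved into each device’s local frame from that device’s own pose. The positions, yaws and 2D simplification are assumptions for illustration:

```python
import math

def world_to_local(anchor_world, device_pos, device_yaw_deg):
    """Express a shared world-frame anchor in a device's local frame.
    2D for brevity: (east, north) positions, yaw in degrees."""
    t = math.radians(device_yaw_deg)
    dx = anchor_world[0] - device_pos[0]
    dy = anchor_world[1] - device_pos[1]
    # Rotate the world-frame offset by -yaw into the device's frame.
    return (math.cos(t) * dx + math.sin(t) * dy,
            -math.sin(t) * dx + math.cos(t) * dy)

anchor = (5.0, 10.0)  # one virtual object, one shared map coordinate

# Two users near the same spot, facing different directions:
print(world_to_local(anchor, (4.0, 8.0), 0.0))   # device A's local coords
print(world_to_local(anchor, (6.0, 9.0), 90.0))  # device B's local coords
# The local numbers differ, but both devices render the object at the
# same physical point, because they share one coordinate system.
```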

Scalable, fresh coverage
Building out the map through everyday devices – phones running apps like Scaniverse, or dedicated scanners like Photon – lets us reach places that camera-equipped cars can’t access, private enterprise spaces, and interiors that no public data sets cover. It also means we can build a shared map that stays fresh through regular updates.

So while AI vision can tell you roughly what you’re looking at, a geo-specific map paired with a Visual Positioning System can tell devices and machines exactly where they are and what is around them, and can help annotate the world at cm-scale.

Learn more about the Large Geospatial Model we’re building and Niantic VPS.
