Building Towards a Large Geospatial Model: Niantic Spatial at CVPR 2025
Artificial intelligence has made remarkable progress in language, images, and code. But it still struggles with something humans find intuitive: understanding the physical world.
At CVPR 2025 this week, Niantic Spatial is presenting two research papers that address this challenge. The papers focus on core problems in 3D perception and scene generation — how to infer depth from a handful of images, and how to reconstruct and stylize real-world spaces in 3D.
Both are early building blocks in our broader effort to create a Large Geospatial Model to fuse the physical and digital worlds.
Spatial understanding is critical for the next generation of AI-powered systems, from augmented reality to robotics. The two papers Niantic Spatial is presenting at CVPR take on different but related parts of this challenge: MVSAnywhere focuses on perception and depth, while Morpheus explores transformation and generative reconstruction.
MVSAnywhere: Zero-Shot Multi-View Stereo
Multi-view stereo (MVS) – the task of estimating 3D structure from multiple images – is a foundational problem in computer vision. Most models in this space work well only in tightly defined domains: indoor environments, static objects, fixed depth ranges.
MVSAnywhere combines two types of depth estimation: single-image (monocular) cues and multi-view geometry. The system starts with a main image and compares it to several others taken from different angles. By analyzing how the scene shifts between images, it can infer depth – much like how our two eyes create depth perception. But unlike traditional systems, MVSAnywhere also draws on what it has learned from single images – such as typical room layouts or outdoor patterns – to improve its guesses.
All of this information is processed by a transformer-based neural network that flexibly adapts to different numbers of images and a wide range of scene depths. This makes the model far more general and reliable: it works even when the number of input photos varies or when the depth range of the scene is unknown in advance, a key capability for building a scalable, world-scale geospatial model.
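For intuition, here is a highly simplified sketch, in PyTorch, of how per-pixel multi-view matching costs and monocular features might be fused by a small transformer and turned into depth via a soft argmax over a shared set of depth hypotheses. This is not the released MVSAnywhere code: the warping of source features into the reference view is assumed to happen elsewhere, and all names and dimensions are illustrative.

```python
# Illustrative sketch only (not the MVSAnywhere implementation): fusing
# multi-view matching costs with monocular cues, assuming source features
# have already been warped into the reference view at each depth hypothesis.
import torch
import torch.nn as nn


class FusedDepthHead(nn.Module):
    def __init__(self, feat_dim=32, num_depths=64, d_model=128):
        super().__init__()
        # Project per-depth matching costs + monocular features to tokens.
        self.proj = nn.Linear(num_depths + feat_dim, d_model)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=4, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)
        # Predict a probability over the depth hypotheses per pixel.
        self.head = nn.Linear(d_model, num_depths)

    def forward(self, ref_feat, warped_src_feats, mono_feat, depth_values):
        # ref_feat:         (B, C, H, W)       reference-view features
        # warped_src_feats: (B, V, D, C, H, W) source features warped to the
        #                   reference view at D depth hypotheses (assumed given)
        # mono_feat:        (B, C, H, W)       single-image (monocular) features
        # depth_values:     (B, D)             candidate depths per image
        B, V, D, C, H, W = warped_src_feats.shape

        # Multi-view matching cost: dot product with the reference features,
        # averaged over however many source views are available -> (B, D, H, W).
        cost = (ref_feat[:, None, None] * warped_src_feats).sum(3).mean(1)

        # Per-pixel tokens: matching costs across depths + monocular cues.
        tokens = torch.cat(
            [cost.flatten(2).transpose(1, 2),        # (B, H*W, D)
             mono_feat.flatten(2).transpose(1, 2)],  # (B, H*W, C)
            dim=-1,
        )
        prob = self.head(self.encoder(self.proj(tokens))).softmax(-1)

        # Soft argmax over the depth hypotheses gives a per-pixel depth.
        depth = (prob * depth_values[:, None, :]).sum(-1)
        return depth.view(B, H, W)
```

Because the matching cost is averaged over the available source views and depth is read out over an explicit set of hypotheses, a design along these lines is indifferent to how many images are supplied and to the metric range the hypotheses cover, which is the kind of flexibility described above.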
The model outperforms prior methods on the Robust Multi-View Depth Benchmark and produces 3D reconstructions with sharper edges and more consistent geometry (useful for downstream applications such as meshing, AR occlusion, and robotic navigation), as well as Gaussian splats with higher geometric and visual quality.
More importantly, it shows that it's possible to build one depth model that works anywhere – an essential step toward a single, scalable geospatial model of the world.
[Figure: comparison of reconstructions from 3DGS, 3DGS w/ MVSAnywhere, and 3DGS w/ Metric3D]
To read more, see the team’s GitHub page here.
Morpheus: Reconstructing the World with Your Own Take
While MVSAnywhere focuses on perception, Morpheus explores generation. Specifically, it investigates how to take real-world scans and not just reconstruct them but also transform them into stylized, view-consistent 3D scenes.
Morpheus is built on Gaussian Splatting and uses a novel RGBD diffusion model, one that learns to generate both color and depth, to modify scenes at the level of both appearance and geometry.
The system allows users to apply text-driven prompts (e.g. “ancient ruins”, “ice castle”) to real-world environments, creating new versions that remain coherent across different viewpoints.
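To make the idea concrete, here is a conceptual sketch of such a stylization loop, written against hypothetical callables for the splat renderer and the text-conditioned RGBD diffusion editor. It is not the authors' implementation, only an illustration of the render, edit, and re-fit cycle that keeps color and depth in step.

```python
# Conceptual sketch of a Morpheus-style stylization loop (not the authors' code).
# The renderer, the RGBD diffusion editor, and the splat parameterization are
# assumed to exist elsewhere and are passed in as callables; names are illustrative.
from typing import Callable, Iterable

import torch
import torch.nn.functional as F


def stylize_scene(
    splats: torch.nn.Module,          # 3D Gaussian scene parameters
    cameras: Iterable,                # training viewpoints
    render_rgbd: Callable,            # (splats, cam) -> (rgb, depth), differentiable
    edit_rgbd: Callable,              # (rgb, depth, prompt) -> (rgb', depth')
    prompt: str,                      # e.g. "ancient ruins"
    num_iters: int = 1000,
    lr: float = 1e-2,
) -> torch.nn.Module:
    """Repeatedly re-render, edit with a text-conditioned RGBD diffusion model,
    and fit the Gaussians to the edited color and depth."""
    optimizer = torch.optim.Adam(splats.parameters(), lr=lr)
    cameras = list(cameras)

    for it in range(num_iters):
        cam = cameras[it % len(cameras)]

        # 1. Render the current scene from this viewpoint.
        rgb, depth = render_rgbd(splats, cam)

        # 2. Ask the diffusion model for a stylized color + depth target
        #    (no gradients flow through the editing step).
        with torch.no_grad():
            rgb_target, depth_target = edit_rgbd(rgb, depth, prompt)

        # 3. Pull the splats toward the edited appearance and geometry, so the
        #    style stays consistent as targets accumulate across viewpoints.
        loss = (F.l1_loss(rgb, rgb_target)
                + F.l1_loss(depth, depth_target))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    return splats
```

Supervising depth alongside color is what lets edits change geometry as well as appearance, rather than merely repainting the original surfaces.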
This opens the door to interactive 3D spaces that are not just mapped but reimagined, blending real-world structure with expressive visual design.
To read more, see the team’s GitHub page here.
A Path Toward Geospatial Intelligence
Together, these projects reflect a larger ambition: to make AI systems that are spatially aware, able to perceive, interpret, and understand the physical world. That means moving beyond maps-as-basemaps and toward maps-as-models: representations of space, structure, and semantics.
While both MVSAnywhere and Morpheus are research contributions, their implications stretch into real-world applications: logistics, AR, entertainment, simulation, and the broader field of physical AI.