
The Large Geospatial Model Powering Embodied AI at Scale

Date: 2/19/2026
Author: Hugh Hayden
Category: Featured News

Executive Summary

We are entering the era of embodied AI – the integration of artificial intelligence into physical systems, including robots, AR glasses, phones, and drones, enabling them to understand and interact with the physical world at scale. To deploy millions, and perhaps billions, of these systems across complex environments and in coordination with one another, a foundational layer of spatial intelligence is required. This is only possible with a shared, persistent map and understanding of the physical world, provided by the Niantic Spatial Large Geospatial Model (LGM).

Today, AI systems do not have a grounded and contextual understanding of the physical world. When machines perceive the environment at all, they do so in isolation. Each robot, drone, or AR device generally builds its own local map, localizes independently, and reasons about its surroundings as if no other machines exist. While sufficient for small-scale deployments or demonstrations, this approach fails as soon as scale, coordination, or long-term persistence is required.

We are building the LGM to enable embodied AI in any environment through a three-step process. First, spatial capture occurs using all available sensors within an area of interest. These sensors may be ground-based (e.g., robots, AR glasses, 360-degree cameras, fixed cameras), aerial (e.g., drones, fixed-wing aircraft), or space-based (e.g., satellites). Second, multisensor fusion and reconstruction generate a comprehensive 3D model of the environment, supporting visualization, localization, semantic understanding, and inference. Finally, machines operating in the environment query the LGM for localization and semantics to gain shared spatial awareness and context.
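To make the loop concrete, here is a minimal sketch in Python. The class names, method signatures, and data structures are hypothetical, invented only to illustrate the capture, fuse, and query steps; they are not a Niantic Spatial API.

```python
from dataclasses import dataclass, field

@dataclass
class SensorFrame:
    source: str       # e.g. "robot", "drone", or "satellite"
    timestamp: float
    payload: bytes    # raw imagery, depth, or other sensor data

@dataclass
class LGMSketch:
    """Toy stand-in for the capture -> fuse -> query cycle described above."""
    model_3d: dict = field(default_factory=dict)  # placeholder for the fused 3D model

    def capture(self, frames: list[SensorFrame]) -> list[SensorFrame]:
        # Step 1: spatial capture from ground, aerial, and space-based sensors.
        return [f for f in frames if f.payload]

    def fuse(self, frames: list[SensorFrame]) -> None:
        # Step 2: multisensor fusion and reconstruction into a shared 3D model.
        for frame in frames:
            self.model_3d.setdefault(frame.source, []).append(frame.timestamp)

    def query(self, device_id: str) -> dict:
        # Step 3: a machine queries the fused model for localization and semantics.
        return {"device": device_id, "pose": None, "semantics": [], "sources": list(self.model_3d)}
```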

Existing approaches such as SLAM, world models, and isolated perception systems are important, but insufficient on their own. Niantic Spatial is building the spatial intelligence to provide persistent, shared understanding across machines and agents: the foundation required for embodied AI to exist.

Shared Spatial Context for Machines and AI

Embodied AI is the integration of artificial intelligence into physical systems, including robots and drones, enabling them to understand and interact with the physical world. Niantic Spatial is building the LGM, which will allow machines to talk to a 3D map of the world. To enable this, the LGM can provide a shared coordinate system, giving common spatial context to machines, AI agents, and human operators. This will enable machines to reason not only about their own position and surroundings, but also about the positions, trajectories, and activities of other machines or agents operating nearby. This is how we are enabling embodied AI at scale.

This living 3D reconstruction of the world is based on terrestrial, aerial, and space-based data sources. Spatial data is fused at the edge and in the cloud to create a continuously updating 3D model. Fleets of machines will contribute data captured by their onboard sensors to maintain freshness and coverage, while simultaneously querying the LGM for localization and semantic understanding. Typical queries include: Where am I? How am I oriented? What am I observing? What is important or changing, and how does this inform my next decision?

As machines move through an environment, their observations can be made available to others operating nearby. For instance, the Niantic Spatial LGM will enable a robot entering an area to inherit and build on the spatial understanding established by previous robots, drones, or human-operated devices. Sensor data captured by each device is processed through the reconstruction pipeline and fused into the shared 3D model, while the same device continuously queries the LGM for localization and semantics. Eventually, real-time sensor fusion (RTSF) will stitch together fragmented observations from multiple sensors to provide a complete picture of an environment. This capture-and-query loop will repeat across all devices throughout an operation, inspection, or visit.
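A single device's side of this capture-and-query loop might look like the sketch below. The client object, its contribute, localize, and semantics calls, and the sensor interface are assumptions made for illustration, not actual LGM endpoints.

```python
import time

def plan_next_action(pose, context):
    """Placeholder for device-specific decision-making."""
    pass

def run_device_loop(client, sensor, device_id: str, iterations: int = 10, hz: float = 10.0):
    """Hypothetical per-device loop: contribute observations, then query the LGM."""
    for _ in range(iterations):
        frame = sensor.read()                     # capture onboard sensor data
        client.contribute(device_id, frame)       # feed the shared 3D model
        pose = client.localize(device_id, frame)  # Where am I? How am I oriented?
        context = client.semantics(pose)          # What am I observing? What is changing?
        plan_next_action(pose, context)           # inform the device's next decision
        time.sleep(1.0 / hz)                      # query cadence
```

The same loop shape applies whether the device is a robot on patrol, a drone in flight, or a phone carried by a human operator; only the sensor and the decision logic change.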

Capture and Processing

Capture

At scale, spatial data will arrive continuously from thousands of devices: robots, drones, mobile phones, AI glasses, satellites, and fixed sensors. Each source contributes incremental updates as data is captured. Satellite passes, drone flights, robot patrols, or human movement through an area all improve model freshness, coverage, and resolution over time. This can be thought of as an RSS feed for geospatial data – continuous and shared across machines, AI agents, and human users. The output of this stage is a fused, semantically encoded 3D model of the world. Today at Niantic Spatial, we ingest data from a variety of sensors as well as existing customer and partner data to build these 3D maps, or digital twins.

Data Ingestion

The volume and variability of data are significant. Sensor inputs are noisy and subject to environmental effects. The ingestion pipeline must handle many sensor modalities, quality variation, partial observations, and large areas. Niantic Spatial’s experience working with user-generated data across millions of iOS, Android, and other devices has provided an excellent basis for this processing pipeline.
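As a rough sketch of what that ingestion step has to do, the example below filters heterogeneous observations by modality, quality, and completeness. The modality list, quality score, and record structure are illustrative assumptions rather than the actual pipeline.

```python
from dataclasses import dataclass
from typing import Optional

SUPPORTED_MODALITIES = {"rgb", "depth", "lidar", "satellite", "imu"}  # illustrative set

@dataclass
class RawObservation:
    modality: str
    quality: float           # 0.0 (unusable) .. 1.0 (pristine), assumed score
    data: bytes
    geohash: Optional[str]   # coarse location tag, if available

def ingest(observations: list[RawObservation], min_quality: float = 0.3) -> list[RawObservation]:
    """Filter and normalize noisy, partial, multi-modality input before fusion."""
    accepted = []
    for obs in observations:
        if obs.modality not in SUPPORTED_MODALITIES:
            continue          # unknown sensor type: skip or route for review
        if obs.quality < min_quality:
            continue          # too noisy to improve the model
        if not obs.data:
            continue          # partial or empty observation
        accepted.append(obs)
    return accepted
```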

Multisensor Fusion

Today, spatial data from AR devices, robots, drones, and satellites is rarely interoperable and is often siloed or not retained at all. The LGM works with diverse data sources and enables interoperability and sharing across systems for large-scale 3D reconstruction and spatial intelligence. Niantic Spatial is expanding the data types and formats it can ingest, building toward this vision. Our work in the public sector is heavily focused on this type of unified data layer.

The Operational Backbone of Embodied AI

Operational constraints like connectivity limitations, adversarial interference, latency requirements, and data volume inform how processing is distributed across devices, edge infrastructure, and the cloud. The objective is to deliver fused spatial models with minimal latency for navigation and coordination use cases, while also supporting compute-intensive tasks such as change detection and large-area analysis. This hybrid processing model exists today in parts, but not at the scale required for embodied AI.
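One simplified way to picture that distribution is a routing rule like the sketch below; the latency thresholds and tier names are hypothetical, chosen only to illustrate the trade-off between responsiveness and compute.

```python
from enum import Enum

class ComputeTier(Enum):
    ON_DEVICE = "on_device"   # lowest latency, least compute
    EDGE = "edge"             # nearby infrastructure
    CLOUD = "cloud"           # highest compute, highest latency

def route_task(latency_budget_ms: float, data_volume_mb: float, connected: bool) -> ComputeTier:
    """Illustrative policy: where should a spatial-processing task run?"""
    if not connected or latency_budget_ms < 50:
        # Navigation and coordination tasks: keep them on the device.
        return ComputeTier.ON_DEVICE
    if latency_budget_ms < 500 and data_volume_mb < 100:
        return ComputeTier.EDGE
    # Change detection and large-area analysis tolerate latency but need compute.
    return ComputeTier.CLOUD
```

In practice the decision would weigh more factors, such as bandwidth, adversarial interference, and power, but the shape of the policy is the same.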

Machine Queries & LGM Services

Reconstruction / Visualization

The fused 3D model enables visualization of physical environments on personal computers or head-mounted displays. Visualization supports training, planning, and area familiarization. The multisensor nature of the reconstruction enables large-scale, high-resolution environments using splats or meshes. Reconstruction inference through the Niantic Spatial LGM is working toward doing more with less data: once fully trained, the LGM will be able to generate realistic environments from incomplete data.

Increasingly, visualization is needed throughout planning and preparation phases for familiarization, analytics, and immersive rehearsal. It is also critical during execution, giving defense and enterprise operators command and control of an operation by showing where resources are located within the fused 3D environment.

Localization

Machines continuously query the LGM for localization, enabling precise position and orientation within a shared coordinate system. This shared spatial context aligns machines, AI agents, and human operators. Localization services support high-frequency queries for use cases such as AR overlays, visual navigation, and operation in GPS-denied environments.

Localization functionality allows machines not only to locate themselves, but also to understand the positioning of their counterparts in the environment. Our customers consistently share that the hyper-precise understanding of location and pose provided by the LGM is needed in industrial and defense contexts. The Visual Positioning System (VPS) also enables localization, navigation, and coordination in GPS-denied environments, which is increasingly required.
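For illustration, a localization query of this kind might look like the following sketch. The request path, response fields, and coordinate-frame identifier are assumptions, not the actual VPS or LGM interface.

```python
from dataclasses import dataclass

@dataclass
class Pose:
    """6-DoF pose in a shared coordinate frame (illustrative representation)."""
    frame_id: str                                    # e.g. "shared_enu", an assumed identifier
    position: tuple[float, float, float]             # meters
    orientation: tuple[float, float, float, float]   # quaternion (x, y, z, w)
    confidence: float                                # 0..1, assumed score

def localize(client, device_id: str, image_bytes: bytes) -> Pose:
    """Hypothetical high-frequency localization query against the LGM."""
    response = client.post("/localize", {"device": device_id, "image": image_bytes})
    return Pose(
        frame_id=response["frame_id"],
        position=tuple(response["position"]),
        orientation=tuple(response["orientation"]),
        confidence=response["confidence"],
    )
```

Because every device receives its pose in the same frame, comparing two machines' positions reduces to simple arithmetic rather than map alignment.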

Semantics

Semantic services enable machines to understand their physical environment. The LGM recognizes objects, detects changes over time, and incorporates historical and contextual data about buildings, infrastructure, and terrain. The world is encoded semantically at the voxel level, making each voxel meaningful to the machine. As we continue to train and improve the LGM, this will enable analytics such as line-of-sight analysis, airflow or plume modeling, and terrain reasoning to be performed at the edge or in the cloud. Semantic understanding supports rapid AI decision-making, automated analysis, and human-in-the-loop operations.
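As a rough illustration of voxel-level semantics, the sketch below attaches labels to a sparse voxel grid and runs a crude line-of-sight check. The grid layout, labels, and sampling scheme are simplifying assumptions, not how the LGM actually encodes or analyzes the world.

```python
import math

# Hypothetical sparse voxel grid: (x, y, z) index -> semantic label.
voxels = {
    (0, 0, 0): "road",
    (3, 0, 1): "building",
    (6, 0, 0): "terrain",
}

VOXEL_SIZE = 1.0  # meters per voxel edge (assumed)

def label_at(point):
    """Look up the semantic label of the voxel containing a 3D point."""
    idx = tuple(int(math.floor(c / VOXEL_SIZE)) for c in point)
    return voxels.get(idx, "empty")

def line_of_sight(a, b, step: float = 0.25, blockers=("building", "terrain")) -> bool:
    """Crude line-of-sight test: sample along the segment and stop at blocking voxels."""
    n = max(1, int(math.dist(a, b) / step))
    for i in range(1, n):
        t = i / n
        p = tuple(ai + t * (bi - ai) for ai, bi in zip(a, b))
        if label_at(p) in blockers:
            return False
    return True

# Example: the segment crosses the "building" voxel, so sight is blocked.
print(line_of_sight((0.5, 0.5, 1.5), (6.5, 0.5, 1.5)))  # False
```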

Semantic understanding is powerful during planning and preparation: open queries or automated insights inform planning and assessments of the 3D reconstructed area. It becomes critical operationally, informing real-time decisions with an intuition for the physical environment occupied by machines, AI agents, and humans.

The LGM for Embodied and Spatially Aware AI at Scale

Single-device SLAM systems cannot provide the level of spatial awareness required for large-scale, coordinated operations. Likewise, traditional computer vision approaches that enabled early robotics and AR systems are insufficient for the future of embodied AI.

World foundation models (WFMs) are an important component of spatial intelligence, particularly for training and generalization. However, they do not provide the shared coordinate system, persistence, or real-time spatial context required for machines to operate together at scale.

At Niantic Spatial, we envision a future in which large language models, world foundation models, and the LGM operate together to enable shared spatial intelligence. By providing a foundational layer of real-world spatial data and services, the LGM enables AI agents and machines to operate autonomously, collaboratively, and persistently in the physical world. Over time, hundreds of thousands of sensors will contribute spatial data to the LGM, while a comparable number of machines and AI agents query it for spatial awareness, enabling embodied AI at global scale.