Enabling World-Aware XR Through Advanced Geospatial Intelligence

Table of Contents

Abstract
Introduction
State of the XR Industry
Unpacking Geospatial Intelligence
Geospatial Intelligence on XR Headsets
Summary

Abstract

“Spatial AI is turbocharging AI’s value by fusing 3D context with physics,” Gartner notes, underscoring the essential role of spatial AI and geospatial services in making extended reality (XR) practical at scale. The XR industry is steadily advancing toward more practical applications. A key trend that makes XR more practical is the increasing "world-awareness" of the devices. Traditional virtual reality (VR) headsets are evolving into mixed reality (MR) devices that can perceive and interact with the physical world. AI glasses are emerging as a new category of wearable tech, using cameras to understand the world and guide the user, in both indoor and outdoor environments. This hardware evolution is supported by new software based on geospatial AI. At the forefront of this transformation is Niantic Spatial, whose Niantic Spatial SDK provides the geospatial foundation that enables developers to build contextually aware, location-based experiences across devices. By combining tools for 3D scanning, scene understanding, and precise geospatial positioning, Niantic Spatial makes it possible to create contextual, location-based experiences, anywhere. These tools can be used on various devices - from XR headsets to mobile phones and PCs - opening up new markets and enabling innovative solutions across a wider range of industries.

Introduction

Extended reality (XR) is an overarching term for all technologies that combine digital and physical realities. This includes virtual reality (VR), augmented reality (AR), mixed reality (MR) and AI smart glasses. These technologies create immersive and interactive experiences by combining virtual and real-world elements, visually, audibly or both.

Recent hardware advancements have made the real world a more central part of the XR experience. New generations of devices that were previously VR-only are now equipped with high-resolution cameras, transforming them into MR devices that allow users to see their surroundings. Likewise, AR glasses are becoming smaller, lighter, and more affordable, making them suitable for outdoor use by both businesses and consumers. The rapid advancement of AI technologies has given rise to a new category of smart glasses, which rely on audio and AI services to provide contextual and geospatial information.

Because of these changes, XR experiences are increasingly dependent on understanding the real world. In MR and AR, the app must recognize the geometry of the physical environment and the objects within it. Similarly, AI glasses, which are used mostly outdoors, need precise location data to offer users contextual information and intuitive guidance. Niantic Spatial’s Visual Positioning System (VPS) plays a key role here, providing accurate localization and orientation even in GPS-denied or compromised environments and enabling spatially consistent experiences across diverse conditions.

The Niantic Spatial SDK builds on this foundation by providing the “world intelligence” layer often missing from XR platforms. The SDK provides the ability to capture and reconstruct the world, from small objects to large areas, to create 3D models or Gaussian splats, or to map locations for localization. Once scanned, these locations can be saved to a public or private map, allowing users to be seamlessly localized within them with an accurate 6DOF (six degrees of freedom) pose. To enable interaction with the real world's 3D geometry, the SDK provides data like depth maps, occlusions, and real-time meshing. To understand the world, developers can use object detection and semantic segmentation. The combination of these spatial layers enables novel and insightful ways to contextualize and visualize data in the real world or on a digital twin. These powerful tools have been deployed on mobile devices and head-mounted displays over the past three years, with developers testing them for a wide range of applications.

State of the XR Industry: A Leap from Novelty to Practicality with Geospatial Intelligence

The Extended Reality (XR) industry is undergoing a significant transformation, moving beyond its initial phase of novelty to establish itself as a practical and indispensable technology. XR devices have evolved considerably in the last few years. Virtual Reality (VR) headsets are developing into Mixed Reality (MR) systems, gaining the ability to sense and interact with the physical world around them. Smart glasses are increasingly leveraging AI to emerge as a distinct and powerful category of wearable technology. AR headsets are becoming more suitable for consumer and enterprise uses, indoors and outdoors. This pivotal shift is driven by the evolution of hardware and software technologies, but also by integration with spatial intelligence, which is essential in enabling devices to become truly “world-aware”. A recent and profound transformation in this domain has been the move from traditional computer vision algorithms to sophisticated neural networks, further enhanced by the capabilities of generative AI.

For machines to comprehend and operate within our complex world, multiple layers of intelligence must be deployed. Over the past few decades, and at an accelerating pace in recent years, these layers have evolved from models that understand language to models that understand the world. Large Language Models (LLMs), trained on vast corpora of text, revolutionized human-computer interaction by enabling systems to generate and reason over language. Building upon this, World Foundation Models (WFMs) extend understanding into dynamic, physics-informed 3D environments, allowing machines to simulate and predict real-world behaviors. The next frontier is represented by Large Geospatial Models (LGMs), which operate at planetary scale using scans, splats, drone imagery, and GIS data to provide geospatial context and predictive spatial mapping. These models enable systems to anchor intelligence to specific places in the physical world.

The combination of all these new capabilities allows machines to achieve a much deeper understanding of the user's environment, contextualize questions more effectively, and proactively offer insightful suggestions, implications, and predictions. This marks a crucial advancement from merely responding to explicit queries; it represents the core of what is known as spatial intelligence.

Spatial intelligence is already profoundly reshaping various business operations. In logistics, it optimizes routes, manages inventory with greater precision, and improves overall workflow efficiency. In manufacturing, it enables the design and simulation of projects in digital twin environments, allowing for virtual testing before physical construction, and facilitates the real-time monitoring and management of existing facilities with rich 3D information. In field services, it empowers workers with real-time location-based information and helps them detect potential problems and mitigate them swiftly. Furthermore, spatial intelligence is a critical enabler for outdoor robotics, whether airborne drones or land-based robots, allowing them to navigate and interact with complex environments, even in GPS-denied or compromised environments, through technologies like VPS that, as Gartner notes, “provide precise localization when GNSS and wireless signals are limited or unavailable for spatial or physical AI use cases.”

Unpacking Geospatial Intelligence: Making Sense of the World

Geospatial intelligence is the cornerstone of enabling machines to perceive, understand, and infer information about the physical world, even when presented with incomplete data. This sophisticated capability is powered by large geospatial models (LGMs). Unlike Large Language Models (LLMs) that process text from the internet, LGMs are rigorously trained on vast datasets of real-world imagery, each datum intrinsically linked to its geographical coordinates. This training regimen equips machines with a profound contextual understanding of space and structures.

A critical component of spatial intelligence involves two distinct types of geospatial world models: Geotypical and Geospecific.

Geotypical models focus on understanding the inherent behavior of objects within the world. This includes comprehending material properties, the dynamics of interactions between objects, and the principles of cause and effect. They provide a general framework for how things operate.
Geospecific models, on the other hand, are much closer to traditional maps. They capture the exact state of the world at a particular moment in time, describing it in rich 3D detail. These models not only depict objects but also understand their relationships and allow both humans and machines to effectively use and interact with the map. They are meticulously constructed from diverse data sources, including satellites, drones, human input, and specialized data collection devices. Depending on the task and requirements, machines and robots operating in the real world may utilize either one or both of these model types to perform tasks and assist humans.

Niantic Spatial is actively developing and implementing a comprehensive geospecific model for both indoor and outdoor environments. The sophisticated processing of this data yields two crucial outcomes: visually rich output that is readily understandable by humans, and the activation of places for precise machine localization, navigation, and world understanding. To ensure the accuracy and relevance of this continually evolving map, it is constantly refreshed with new information sourced from a diverse array of inputs, including satellites, drones, human contributions, robots, and other dedicated data collection mechanisms.

These advanced geospatial models and the new generation of maps serve as the foundation for developing a suite of spatial services that enable immersive real-world XR experiences:

Geofencing and Spatial Boundaries: This involves defining virtual perimeters within real-world geographic areas. When a device (and its user) enters, exits, or moves within a geofenced zone, the system can automatically trigger Augmented Reality (AR) content, send notifications, or initiate specific actions. This allows for the restriction of features or content to designated locations, or the adaptation of experiences based on the user's precise position, offering contextually relevant information or interactions tied to the physical environment. For example:
- AR Scavenger Hunts: Geofences conceal clues or virtual items that are only discoverable within specific real-world locations.
- Location-Based Games: Popular XR games often restrict or enable in-game actions only when players are within defined geofenced real-world areas.
- Room-Scale VR: Spatial boundaries, established during initial setup, ensure users remain within a safe play area, preventing collisions with physical obstacles.
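
To make the enter/exit trigger logic concrete, the following is a minimal Python sketch of a circular geofence check. It is illustrative only and does not represent the Niantic Spatial SDK API: the `Geofence` class, the `update` helper, the example coordinates, and the spherical-Earth haversine distance are all assumptions chosen for simplicity. A production system would likely support polygonal or 3D boundaries and a more precise positioning source than raw latitude and longitude.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius (spherical approximation)


@dataclass(frozen=True)
class Geofence:
    """Hypothetical circular geofence: a center point and a radius in meters."""
    name: str
    lat_deg: float
    lon_deg: float
    radius_m: float


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def is_inside(fence, lat, lon):
    """True when the position lies within the fence radius."""
    return haversine_m(fence.lat_deg, fence.lon_deg, lat, lon) <= fence.radius_m


def update(fence, was_inside, lat, lon):
    """Compare the previous state with the new position.

    Returns (event, now_inside), where event is "enter", "exit", or None.
    """
    now_inside = is_inside(fence, lat, lon)
    if now_inside and not was_inside:
        return "enter", now_inside
    if was_inside and not now_inside:
        return "exit", now_inside
    return None, now_inside


# Example: a 100 m fence around a made-up clue location.
clue_zone = Geofence("clue-1", lat_deg=37.0, lon_deg=-122.0, radius_m=100.0)
event, inside = update(clue_zone, False, 37.0005, -122.0)  # roughly 56 m north
if event == "enter":
    print(f"Entered {clue_zone.name}: reveal AR content")
```

Tracking the previous inside/outside state, rather than only testing the current position, is what lets an application fire content once on entry instead of on every position update.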

Spatial Anchoring: This powerful technique binds digital content to a stable, real-world coordinate frame or a universal map. The ideal public scenario involves a universal and sharable map, ensuring that digital content appears in the exact same physical place across multiple sessions and different devices. Professional users can scan and activate their own locations and manage the access permissions. For example:
- Planning and Simulation in Industrial Settings: Proposed layouts and simulations are anchored to the real environment, allowing for the validation of placements and optimization of configurations before any physical changes are committed.
- Persistent Geocached Content: Digital items are stored at real-world locations using persistent anchors, allowing users to discover and revisit them over time, creating a sense of digital permanence.
- Site Navigation and Operations Management: Visual navigation cues and spatial annotations are anchored within large facilities, facilitating coordinated shift handoffs and real-time status tracking within a live digital twin.
- Retail/Entertainment Activations: Interactive brand or game content is precisely placed at partner locations using persistent anchors, driving repeat engagement and increasing foot traffic.
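
The idea that anchored content appears in the same physical place for every device can be sketched with simple frame transforms. The Python example below is a simplified illustration under stated assumptions, not the SDK's anchoring API: it uses planar poses (x, y, yaw) instead of full 6DOF poses, and the `Pose2D` class, the `content_in_device_frame` helper, and all coordinates are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Pose2D:
    """Pose of a frame in its parent frame: x, y in meters, yaw in radians."""
    x: float
    y: float
    yaw: float

    def to_parent(self, px, py):
        """Map a point from this frame into the parent frame."""
        c, s = math.cos(self.yaw), math.sin(self.yaw)
        return (self.x + c * px - s * py, self.y + s * px + c * py)

    def from_parent(self, px, py):
        """Map a point from the parent frame into this frame (inverse transform)."""
        c, s = math.cos(self.yaw), math.sin(self.yaw)
        dx, dy = px - self.x, py - self.y
        return (c * dx + s * dy, -s * dx + c * dy)


def content_in_device_frame(anchor_in_map, offset, device_in_map):
    """Where content stored relative to an anchor appears for a localized device."""
    map_point = anchor_in_map.to_parent(*offset)   # anchor frame -> shared map
    return device_in_map.from_parent(*map_point)   # shared map -> device frame


# Content sits 1 m in front of an anchor placed at (10, 5) in the shared map.
anchor = Pose2D(10.0, 5.0, 0.0)
offset = (1.0, 0.0)

# Two devices localized at different poses in the same map.
device_a = Pose2D(0.0, 0.0, 0.0)
device_b = Pose2D(11.0, 0.0, math.pi / 2)

local_a = content_in_device_frame(anchor, offset, device_a)
local_b = content_in_device_frame(anchor, offset, device_b)
```

Each device sees the content at different local coordinates, but mapping either result back through that device's pose yields the same point in the shared map. That shared frame is what makes content persistent across sessions and consistent across devices.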

Navigation without GPS: There are frequent scenarios where satellite positioning is either unavailable (e.g., indoors, dense urban canyons) or too inaccurate to be of practical use. In these cases, VPS provides a critical alternative, using computer vision to localize the user relative to visual features in the environment. Localization relies on referencing previous scans of the area captured through various tools and leveraging other available map data, ensuring continuous and accurate positioning.
Path Planning and Wayfinding Services: Path planning is the intricate process of determining the optimal route for a user, taking into account detailed map data, a comprehensive understanding of the user's immediate environment, and the specified target destination. Wayfinding, on the other hand, focuses on creating intuitive and natural guidance for the user to reach that destination, dynamically adapting to real-time changes and conditions within the user's environment.
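
As an illustration of path planning over map data, the sketch below runs Dijkstra's algorithm on a small waypoint graph and then re-plans when a connection becomes unavailable. It is a minimal example under stated assumptions: the facility graph, node names, and distances are invented, the `shortest_path` and `without_edge` helpers are hypothetical, and Dijkstra's algorithm is used here as a common textbook choice rather than as the method the platform actually employs.

```python
import heapq


def shortest_path(graph, start, goal):
    """Dijkstra's algorithm on {node: {neighbor: cost_in_meters}}.

    Returns (total_cost, [nodes]) or None when the goal is unreachable.
    """
    best = {start: 0.0}
    queue = [(0.0, start, [start])]
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry; a cheaper route was already found
        for neighbor, step in graph.get(node, {}).items():
            new_cost = cost + step
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                heapq.heappush(queue, (new_cost, neighbor, path + [neighbor]))
    return None


def without_edge(graph, a, b):
    """Copy of the graph with the a<->b connection removed (e.g. a blocked aisle)."""
    return {
        node: {n: c for n, c in edges.items() if {node, n} != {a, b}}
        for node, edges in graph.items()
    }


# Hypothetical facility: walkable connections with distances in meters.
site = {
    "entrance": {"lobby": 10.0},
    "lobby": {"entrance": 10.0, "corridor_a": 20.0, "corridor_b": 35.0},
    "corridor_a": {"lobby": 20.0, "dock": 15.0},
    "corridor_b": {"lobby": 35.0, "dock": 10.0},
    "dock": {"corridor_a": 15.0, "corridor_b": 10.0},
}

planned = shortest_path(site, "entrance", "dock")

# Wayfinding adapts to real-time conditions: re-plan when corridor_a is blocked.
replanned = shortest_path(without_edge(site, "corridor_a", "dock"), "entrance", "dock")
```

In this split, the planner only produces a sequence of waypoints. Turning that sequence into natural guidance (arrows, audio cues, anchored markers) is the wayfinding layer's responsibility.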

Geospatial Intelligence on XR Headsets: Fueling Growth and Innovation

The XR market, though still emerging, is experiencing significant growth and attracting substantial investment from both public and private sectors. While the underlying technologies for XR have been in development for some time, only recently have they reached a sufficient level of maturity and quality to support the creation of suitable and valuable devices. Concurrently, a vast array of new use cases for XR technologies has been identified, making it increasingly clear that a single "one-size-fits-all" device cannot adequately address every need. Instead, a diverse range of XR products is required to achieve optimal product-market fit across different applications. The ideal product for each specific use case will carefully balance specifications, price point, ergonomics, and numerous other factors, ultimately aiming to dismantle common adoption barriers for end customers.

Beyond the device itself, which must seamlessly integrate into users' existing workflows, the delivered functionality needs to be precisely tailored to their specific requirements. For those use cases where an XR device is deemed necessary, spatial intelligence will serve as an instrumental and indispensable component of the overall experience.

Niantic Spatial is strategically focused on providing advanced Geospatial Services to industrial sectors. We are dedicated to developing the foundational technologies and then meticulously tailoring these solutions to meet the precise and often unique needs of each customer. Niantic Spatial’s core service offerings, for mobile devices and XR headsets, include:

Capture: This comprehensive service captures objects, rooms, facilities, and even expansive outdoor sites as high-fidelity 3D assets, which can be rendered as meshes or Gaussian splats. These assets are then meticulously prepared for precise localization and seamless AR integration.
- On-Demand Scanning (Turnkey Service): Niantic Spatial manages the entire on-site capture process, utilizing mobile devices, Photon technology, or drones. We then expertly process the collected data and deliver production-ready spatial datasets specifically tailored to the client's site and use case.
- Scanning SDK: This offers on-device reconstruction modules that are fully integrated with VPS. Developers can leverage these modules to scan, reconstruct, and localize directly from their own applications, forming an integral part of the broader Niantic Spatial SDK stack.
- Scaniverse for Enterprise: This provides dedicated enterprise support for Scaniverse capture, enabling fast interactions with mapping and activation pipelines, all seamlessly coordinated with Niantic's platform services.

Reconstruct: This service transforms diverse input data from drones, robots, handheld devices, or third-party captures into accurate, geo-referenced 3D digital twins of objects, buildings, or large areas. Raw sensor data is cleaned, filtered, normalized, and aligned (merging multi-pass scans) to ensure centimeter-level spatial accuracy. Once aligned, the data is geo-referenced, anchoring the reconstruction to real-world coordinate systems, making it interoperable with GIS platforms, robotics, simulation, or autonomy systems.
- On-Device (on iOS or Android): Provides real-time, offline generation of textured meshes or Gaussian splats (ideal for field capture, edge computing, or AR experiences without cloud dependency).
- Cloud-Based: High-fidelity, geo-referenced models (meshes or splats enriched by semantic understanding) are produced for large-scale mapping, multi-user projects, or enterprise digital twins.
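
The geo-referencing step described under Reconstruct can be illustrated with a small alignment example. The sketch below estimates the rotation and translation that map scan-local coordinates onto surveyed world coordinates using a closed-form least-squares fit. It is a simplified, hypothetical illustration: it works in 2D with a handful of invented control points, while a real pipeline would operate in 3D, may estimate scale, and would typically use robust estimation over many correspondences. The function names are assumptions and not part of any Niantic Spatial API.

```python
import math


def fit_rigid_2d(local_pts, world_pts):
    """Least-squares rotation and translation mapping local_pts onto world_pts.

    Returns (theta_rad, tx, ty) such that world ~= R(theta) * local + t.
    """
    n = len(local_pts)
    lcx = sum(x for x, _ in local_pts) / n
    lcy = sum(y for _, y in local_pts) / n
    wcx = sum(x for x, _ in world_pts) / n
    wcy = sum(y for _, y in world_pts) / n
    cos_sum = sin_sum = 0.0
    for (lx, ly), (wx, wy) in zip(local_pts, world_pts):
        ax, ay = lx - lcx, ly - lcy        # centered local point
        bx, by = wx - wcx, wy - wcy        # centered world point
        cos_sum += ax * bx + ay * by       # dot product term
        sin_sum += ax * by - ay * bx       # cross product term
    theta = math.atan2(sin_sum, cos_sum)
    c, s = math.cos(theta), math.sin(theta)
    tx = wcx - (c * lcx - s * lcy)
    ty = wcy - (s * lcx + c * lcy)
    return theta, tx, ty


def apply_rigid_2d(theta, tx, ty, pts):
    """Apply the fitted transform to a list of (x, y) points."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in pts]


def rms_error(a_pts, b_pts):
    """Root-mean-square distance between corresponding points, in meters."""
    sq = [(ax - bx) ** 2 + (ay - by) ** 2 for (ax, ay), (bx, by) in zip(a_pts, b_pts)]
    return math.sqrt(sum(sq) / len(sq))


# Invented control points: scan-local coordinates and their surveyed
# easting/northing in a local projected coordinate system (meters).
local_ctrl = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
world_ctrl = [(100.0, 200.0), (100.0, 201.0), (99.0, 200.0)]

theta, tx, ty = fit_rigid_2d(local_ctrl, world_ctrl)
residual = rms_error(apply_rigid_2d(theta, tx, ty, local_ctrl), world_ctrl)
```

The residual gives a quick check on alignment quality: a value well above the target accuracy suggests bad correspondences or distortion that a rigid transform cannot absorb.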

Localize: This crucial service precisely determines a device's georeferenced pose (its exact position and orientation) within the real world. This ensures that XR content and guidance align with unparalleled accuracy, whether indoors or outdoors. The service supports multiple localization modes and outputs globally referenced coordinates (e.g., ECEF/WGS84) for effortless integration with existing maps, routing systems, and simulation platforms. This is a universal localization service that ingeniously employs multiple modalities under the hood, guaranteeing the user the best possible result based on available map sources and additional sensor information.
Understand: This advanced service discerns both spatial and semantic context from scans, meshes, and splats. This empowers people and AI agents to query the world using open-ended questions and receive structured, georeferenced answers. The underlying technology performs sophisticated object detection and segmentation on the map and constructs detailed scene graphs that meticulously label and structure environments for efficient search and automation. In essence, this service creates an entirely new paradigm of map that serves both human users and AI agents across any type of device.
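
To clarify what the globally referenced coordinates mentioned under Localize look like in practice, the sketch below converts a WGS84 geodetic position (latitude, longitude, ellipsoidal height) into Earth-Centered, Earth-Fixed (ECEF) coordinates using the standard closed-form equations. The ellipsoid constants are the published WGS84 values. The function name and usage are illustrative assumptions and do not reflect the localization service's actual interface or output format.

```python
import math

WGS84_A = 6378137.0                    # semi-major axis in meters
WGS84_F = 1.0 / 298.257223563          # flattening
WGS84_E2 = WGS84_F * (2.0 - WGS84_F)   # first eccentricity squared


def geodetic_to_ecef(lat_deg, lon_deg, height_m):
    """Convert WGS84 latitude/longitude (degrees) and height (m) to ECEF x, y, z (m)."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    sin_lat, cos_lat = math.sin(lat), math.cos(lat)
    # Prime vertical radius of curvature at this latitude.
    n = WGS84_A / math.sqrt(1.0 - WGS84_E2 * sin_lat * sin_lat)
    x = (n + height_m) * cos_lat * math.cos(lon)
    y = (n + height_m) * cos_lat * math.sin(lon)
    z = (n * (1.0 - WGS84_E2) + height_m) * sin_lat
    return x, y, z


# Example: a point on the equator at the prime meridian, at ellipsoid height 0.
x, y, z = geodetic_to_ecef(0.0, 0.0, 0.0)
```

ECEF is a Cartesian frame, so distances and rigid transforms between georeferenced poses reduce to ordinary vector arithmetic, which is convenient when exchanging positions with routing, robotics, or simulation systems.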

These services are offered as part of the Niantic Spatial SDK, which is developed for Unity, native Android and iOS mobile devices, Quest 3, and Magic Leap 2. Support for more XR devices is planned.

Summary

The Extended Reality (XR) industry is rapidly transitioning from a phase of novelty to one of practicality, driven by hardware advancements that give devices greater world-awareness. This shift necessitates a new layer of software intelligence, with geospatial services and Spatial AI serving as the foundational enablers.

Niantic Spatial provides this essential geospatial foundation through the Niantic Spatial SDK. This platform is crucial for building contextually aware, location-based experiences across a wide range of devices. The core offering is the Visual Positioning System (VPS), which delivers precise localization and orientation, overcoming the limitations of GPS in dense urban or indoor environments.

The Niantic Spatial SDK introduces a "world intelligence" layer, enabling three core capabilities:

Reconstruct: Creating high-fidelity 3D assets (meshes or Gaussian splats) of objects, rooms, and expansive sites for precise localization and AR integration.
Localize: Determining a device's accurate georeferenced pose (position and orientation) anywhere, ensuring XR content aligns with the physical world.
Understand: Utilizing advanced semantic segmentation and object detection on 3D maps to enable AI agents and users to query and interact with the world contextually, creating a new paradigm of map for both human and machine use.

This geospatial intelligence is powered by Large Geospatial Models (LGMs), which operate at a planetary scale, integrating data from satellites, drones, and user input to create both Geotypical (understanding object behavior) and Geospecific (capturing the precise 3D state of the world) models.

By offering these advanced geospatial services, Niantic Spatial is strategically focused on enterprises and the public sector, aiming to dismantle adoption barriers and fuel significant growth across logistics, manufacturing, field services, and defense. The company's comprehensive SDK supports Unity, native mobile devices, and major XR platforms like Quest 3 and Magic Leap 2, ensuring that spatially aware experiences can be developed and deployed across the increasingly diverse landscape of XR hardware.
