Ground Truth, Now from the Sky
Infrastructure teams, construction managers, and site planners ask a lot of digital twins: the models must not only look accurate but also yield accurate measurements, often across large and complex areas.
In partnership with Spexi and its globally standardized drone imagery network, we’ve proven that Niantic Spatial’s industry-leading Gaussian splat 3D reconstruction can provide industrial-scale digital twins. Affordably and at scale.
Engineered for efficiency, built for scale
A pilot in Spexi's network takes their off-the-shelf sub-250g drone and follows a fully automated flight plan precision-engineered for splat processing: a disciplined grid pattern at 80 meters altitude above ground level, covering roughly 100,000 square meters (about 25 acres) in 20 minutes. The drone flies at 10 meters per second and collects approximately 400 frames of nadir and oblique aerial imagery, so that each area of the scene gets consistent, reliable coverage from multiple angles.
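As a rough sanity check, the sketch below derives a few implied quantities from the figures quoted above. It assumes constant speed and evenly spaced frames for the entire flight, which a real flight plan with takeoff, turns, and landing will not match exactly, so the path length is an upper bound. The function and key names are illustrative, not part of any Spexi tooling.

```python
SQUARE_METERS_PER_ACRE = 4046.8564224

def flight_summary(area_m2, duration_min, speed_mps, frames):
    """Derive rough per-flight quantities from the headline capture figures."""
    duration_s = duration_min * 60
    path_length_m = speed_mps * duration_s  # upper bound: assumes constant speed
    return {
        "path_length_m": path_length_m,
        "seconds_per_frame": duration_s / frames,
        "meters_between_frames": path_length_m / frames,
        "new_ground_per_frame_m2": area_m2 / frames,
        "acres": area_m2 / SQUARE_METERS_PER_ACRE,
    }

# Figures quoted in the text: 100,000 m^2, 20 minutes, 10 m/s, ~400 frames.
summary = flight_summary(area_m2=100_000, duration_min=20, speed_mps=10, frames=400)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Under these assumptions the drone captures a frame about every three seconds, or every 30 meters of travel, and each frame accounts for roughly 250 square meters of newly covered ground.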
This is efficient capture by design. No low passes, no ground-level crews, no LiDAR. It’s a well-executed, fully automated aerial survey that balances coverage, speed, and cost at a scale that would be impractical with more labor-intensive methods. The result is a focused, high-quality, and fully standardized dataset.
From that input alone, we produce a model that looks remarkably close to photorealistic from a wide range of viewpoints. Ten or so city blocks where you can read the street signs, count how many girders have been installed, and measure the height and width of a wall.
As Bill Lakeland, Spexi Geospatial’s CEO, puts it, “The fidelity of these reconstructions is unprecedented. By pairing our standardized capture network with Niantic Spatial’s technology, we’re delivering 3D models that aren’t just photorealistic – they are geometrically grounded and packed with the granular detail needed for high-stakes infrastructure and site analysis.”
Here’s how we do it. But first, a bit about splats.
The right tool, and what makes it work
If you're new to Gaussian splatting, our primer is a good place to start. To summarize: as opposed to meshes, which define 3D objects or scenes explicitly by their surfaces, splats represent them as millions of small semi-transparent volumes, each with a position, size, orientation, and color. Rendered together, they produce photorealistic output viewable from any angle.
For complex, fine-grained structures, splats have a natural advantage. A fence, for example, is difficult to represent faithfully as a mesh: the geometry is intricate, with thin lines that intermittently occlude what’s behind. In a splat, thicker fences render each bar faithfully; thinner ones, like chain-link, appear semi-transparent: you can see through them to what's behind, at the right depth, from any angle. Such fidelity would be prohibitively difficult to generate automatically as a mesh.
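To make the see-through behavior concrete, here is a minimal sketch of front-to-back alpha compositing along a single camera ray, the standard way semi-transparent splats are blended into a pixel. It is heavily simplified: a real splat also carries a 3D position, size, and orientation, and its opacity on a given ray depends on where the ray passes through it. The `Splat` class and the sample values are assumptions for illustration only.

```python
from dataclasses import dataclass

@dataclass
class Splat:
    depth: float   # distance from the camera along this ray, in meters
    color: tuple   # (r, g, b), each in the range 0..1
    alpha: float   # opacity this splat contributes on this ray, 0..1

def composite(splats):
    """Blend splats front to back; return the pixel color and leftover transmittance."""
    pixel = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light still passing through
    for splat in sorted(splats, key=lambda s: s.depth):
        weight = transmittance * splat.alpha
        for channel in range(3):
            pixel[channel] += weight * splat.color[channel]
        transmittance *= 1.0 - splat.alpha
    return tuple(pixel), transmittance

# A semi-transparent gray fence splat in front of an opaque red wall splat.
fence = Splat(depth=5.0, color=(0.5, 0.5, 0.5), alpha=0.4)
wall = Splat(depth=12.0, color=(1.0, 0.0, 0.0), alpha=1.0)
pixel, remaining = composite([wall, fence])
print(pixel, remaining)
```

The fence contributes 40% of the pixel and the wall behind it fills in the remaining 60%, so the wall stays visible through the fence without any explicit fence geometry.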
Even more important is that objects render faithfully from any angle. A splat that looks right is not the same as a splat that is right. Most pipelines optimize purely for visual appearance: they find whatever arrangement of Gaussian splats matches the training views, but tilt the view slightly and you often see disembodied blobs that look nothing like the flat wall you were just looking at. That’s not good enough for commercial applications.
What we bring to it
Everything Niantic Spatial has built for ground-level reconstruction applies directly here. Recovering precise camera positions from intermittent, wide-baseline aerial imagery (variable lighting, limited texture, large distances between frames) relies on the same processes we’ve developed for unstructured captures on consumer hardware. Published research, including ACE, MicKey, and ACE-G, is the foundation on which we've built a system that knows, with high precision, exactly where each camera was when it captured each frame.
The second layer is depth. Our pipeline applies geometric constraints derived from our depth estimation research, including MVSAnywhere: a prior understanding of where surfaces are in 3D space, baked into the splat before training begins. As engineering director Filipe Gaspar puts it, “There are probably millions of Gaussian splat arrangements that can explain a scene from the training views, but not all of them will generalize well as a user navigates freely in the scene. We leverage our accurate camera poses and our depth estimation methods to produce Gaussian splats that are both geometrically accurate and multiview consistent in appearance.”
The power of constraint reflects a principle that’s (re)gaining currency in the age of generative AI. Unconstrained optimization finds solutions that look right but often crumble under scrutiny. Constraints get you to solutions that are right – consistent not just with the input data, but also with physics. The result is a reconstruction that is stable at edges, solid in low-texture areas (standard splat reconstruction struggles with white walls!), and geometrically consistent regardless of the viewpoint.
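One generic way to express this kind of constraint is to add a depth term to an appearance-only training loss. The sketch below is an assumption for illustration, not a description of Niantic Spatial's actual pipeline: the function names, the mean-absolute-difference terms, and the `depth_weight` value are all hypothetical. It shows two candidate reconstructions of a white wall that match the training image equally well, and how a depth prior separates them.

```python
def mean_abs_diff(a, b):
    """Average absolute difference between two equal-length sequences."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def constrained_loss(rendered_rgb, target_rgb, rendered_depth, prior_depth,
                     depth_weight=0.5):
    """Appearance term plus a weighted depth term (hypothetical formulation)."""
    photometric = mean_abs_diff(rendered_rgb, target_rgb)
    geometric = mean_abs_diff(rendered_depth, prior_depth)
    return photometric + depth_weight * geometric

# Four pixels of a uniformly white wall, 10 m away according to the depth prior.
target_rgb = [0.9, 0.9, 0.9, 0.9]
prior_depth = [10.0, 10.0, 10.0, 10.0]

# Both candidates reproduce the colors exactly, but only one is a flat wall.
flat_depth = [10.0, 10.0, 10.0, 10.0]
blobby_depth = [10.0, 7.0, 12.0, 10.0]

flat_loss = constrained_loss(target_rgb, target_rgb, flat_depth, prior_depth)
blobby_loss = constrained_loss(target_rgb, target_rgb, blobby_depth, prior_depth)
print(flat_loss, blobby_loss)
```

With `depth_weight=0` the two candidates would be indistinguishable, which is why appearance-only optimization can settle on floating blobs in low-texture areas.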
See it yourself
These are the frames the reconstruction was built from. Snapshots at 30 degrees from 80 meters up, at 12 megapixels, the same as an iPhone 12’s camera. Around 8 to 10 views per area of the scene. This is the full extent of the input.
Above is a full-resolution sample from one image. Below is a video of the reconstructed Gaussian splat; the camera path is virtual, and no drone flew it. If you’re familiar with meshes, you’ll be impressed by how faithfully the girder framework is rendered, including its see-through gaps; if you’ve seen splats turn into nonsensical blobs at various angles, you’ll appreciate how the girders maintain their geometry at all angles. Their grounding in reality makes them reliable for measurement.
These details are not decorative. The geometry is real, which means the reconstruction supports measurement, remote inspection, change detection over time, and simulation use cases. All from an inexpensive 22-minute flight on a drone with a mid-grade camera.
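When the geometry is real, measurement reduces to arithmetic on 3D coordinates. The sketch below assumes points picked from the model are expressed in meters in a frame whose third axis points up; both assumptions, the helper names, and the sample coordinates are made up for illustration.

```python
import math

def distance_m(p, q):
    """Straight-line distance between two 3D points, in meters."""
    return math.dist(p, q)

def height_m(base, top):
    """Vertical extent between two points, assuming the z axis points up."""
    return top[2] - base[2]

# Hypothetical points picked on a wall in the reconstructed model.
wall_base_left = (0.0, 0.0, 0.0)
wall_base_right = (4.0, 3.0, 0.0)
wall_top_left = (0.0, 0.0, 2.5)

width = distance_m(wall_base_left, wall_base_right)
height = height_m(wall_base_left, wall_top_left)
print(f"wall width: {width:.2f} m, height: {height:.2f} m")
```

The same distance function applied to matching points from two captures taken months apart gives a simple starting point for change detection.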
Here’s another flyover showing the broader scene, again based on a reconstructed splat built entirely from still images.
What this is for
A model you can only view is a visualization. A model you can measure, compare against last quarter, and hand to a field team or simulation system is an operational asset.
For infrastructure inspection, that means assessing asset condition without sending someone to site. For construction monitoring, it means tracking progress and catching discrepancies across large areas as work develops. For site planning, it means spatial decisions grounded in geometry that reflects what's actually there. You can imagine how this could be useful for insurance, search and rescue, real estate sales, and beyond.
This is what geometric accuracy makes possible, and what we're now delivering at city scale.
Interested in what this pipeline could do for your operations? Talk to Spexi about their affordable drone-capture service, or get in touch with our reconstruction team to see how you can put our applied research to commercial benefit with your existing and future imagery capture.