PlaceIt3D: Teaching AI to Put Things Where You Mean (ICCV 2025)

Date: 9/30/2025
Author: Ahmed Abdelreheem, Filippo Aleotti, Jamie Watson, Zawar Qureshi, Abdelrahman Eldesokey, Peter Wonka, Gabriel Brostow, Sara Vicente, Guillermo Garcia-Hernando
Category: RESEARCH

Imagine asking a robot to “put the chair between the sofa and the window, facing the table” or an AR system to “place this character where it can’t be seen from the doorway.” These instructions feel natural to us, but they require deep reasoning about objects, space, and user intent — challenges that remain unsolved for AI.

That’s why Niantic Spatial, together with researchers at KAUST, is introducing PlaceIt3D — a new benchmark, dataset, and baseline method designed to advance language-guided 3D object placement. Given a text instruction, a 3D scene, and an object to place, the task is to determine a position and orientation for that object that satisfies the instruction.
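
To make the task concrete, here is a minimal sketch of its inputs and expected output. The names below (PlacementQuery, Placement, propose_placement, the point-cloud shapes) are illustrative assumptions on our part, not the interface used in the paper; they simply mirror the task definition above: a text instruction, a scene, and an asset go in, and a 3D position plus an orientation come out.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class PlacementQuery:
    """Inputs to the language-guided placement task (names are illustrative)."""
    instruction: str          # e.g. "put the chair between the sofa and the window, facing the table"
    scene_points: np.ndarray  # (N, 6) point cloud of the room: xyz + rgb
    asset_points: np.ndarray  # (M, 3) point cloud / mesh vertices of the object to place

@dataclass
class Placement:
    """Output: where the object goes and how it is oriented."""
    position: np.ndarray      # (3,) translation of the asset in scene coordinates
    yaw_degrees: float        # rotation about the vertical axis, enough for upright objects

def propose_placement(query: PlacementQuery) -> Placement:
    """A model tackling the benchmark task would go here.

    This stub just drops the asset at the scene centroid with no rotation,
    the kind of trivial baseline a learned method should easily beat.
    """
    centroid = query.scene_points[:, :3].mean(axis=0)
    return Placement(position=centroid, yaw_degrees=0.0)
```

Note that many different placements can satisfy a single instruction, which is part of what makes both the task and its evaluation non-trivial.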

Why It Matters

Large Language Models (LLMs) are rapidly moving beyond text into multimodal domains like vision, audio, and now 3D. But while they excel at reasoning in 2D, extending them to 3D is harder. Even for a simple instruction like “put the chair between the sofa and the window, facing the table”, a model must:

  • Understand the 3D semantics of the scene — not just recognize objects like “sofa” or “window,” but also locate them accurately in a 3D environment. Unlike text or 2D images, this requires grounding language in geometry, depth, and spatial context.

  • Interpret spatial relationships — words like “between,” “behind,” or “facing” are intuitive for people but ambiguous in 3D space. The model needs to reason about regions, orientations, and multiple possible valid placements, then choose one that matches intent.

  • Reason about free space and geometry — an instruction only works if the placement is physically plausible. The model must account for object size, shape, and orientation while avoiding collisions, respecting scene layout, and still following the user’s instructions.

PlaceIt3D provides one of the first systematic ways to train and evaluate models on exactly this frontier — combining natural language, 3D perception, and physical reasoning into a unified task.
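
To give a flavor of the free-space and geometry reasoning described above, the sketch below checks whether a candidate placement is physically plausible by testing the object’s axis-aligned bounding box against the scene’s point cloud. This is a hypothetical illustration, not the validity check used in the PlaceIt3D benchmark; a real system would use finer collision tests and support-surface checks on top of the language grounding.

```python
import numpy as np

def aabb_from_points(points: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Axis-aligned bounding box (min corner, max corner) of an (N, 3) point set."""
    return points.min(axis=0), points.max(axis=0)

def placement_is_plausible(asset_points: np.ndarray,
                           position: np.ndarray,
                           scene_points: np.ndarray,
                           clearance: float = 0.02) -> bool:
    """Very coarse plausibility test for a candidate placement.

    Assumes z is the up axis. The asset is translated to `position`, and we
    require that no scene point falls inside its slightly inflated bounding
    box, i.e. the object does not intersect existing geometry.
    """
    moved = asset_points + position
    lo, hi = aabb_from_points(moved)
    lo, hi = lo - clearance, hi + clearance

    xyz = scene_points[:, :3]
    # Ignore points near or below the asset's base so the supporting surface
    # (e.g. the floor it rests on) is not counted as a collision.
    candidates = xyz[xyz[:, 2] > moved[:, 2].min() + clearance]
    inside = np.all((candidates >= lo) & (candidates <= hi), axis=1)
    return not inside.any()
```

A check like this only answers “does it fit here?”; the harder part of the benchmark is choosing, among all the placements that fit, one that also matches the user’s instruction.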

Real-World Impact: The Now and the Future

Bridging language and 3D placement unlocks natural collaboration with machines across industries. As larger 3D scenes and richer models become available, we envision a wide range of applications:

  • Robotics: interact with and direct robots using natural language, e.g. “place the chair between the sofa and the window.”

  • Augmented & Virtual Reality: create content for AR/VR experiences using intuitive language, e.g. “put this virtual object so it’s hidden from the doorway.”

  • Navigation & Assistance: receive suggestions about where to stand or move in 3D space, e.g. “where could I stand to get a clear view of the stage?”

  • Digital Twins & Simulation: automate layouts and object placement in virtual spaces.

As robots, AR glasses, and digital assistants become part of daily life, the ability to follow natural instructions in 3D will be essential.

Looking Ahead

PlaceIt3D marks an early step toward generalist 3D LLMs — models that jointly understand language, 3D objects, and 3D space. Our baseline method, PlaceWizard, shows what’s possible today, but we invite the research community to build on this foundation and push the boundaries of 3D reasoning. With our new dataset and benchmarks, researchers can now train and evaluate models directly on this challenging task.

📢 This work will be presented in October at ICCV 2025 in Honolulu, Hawaii, one of the premier conferences in computer vision.

Explore the full paper and resources here: PlaceIt3D Project Page