SCOUT enables efficient open-world interactive object search by leveraging a 3D scene graph representation and relational semantics distilled from LLM knowledge as search heuristics, achieving LLM-level semantic reasoning while remaining computationally efficient for real-world robotic deployment.

Open-world interactive object search in household environments requires understanding semantic relationships between objects and their surrounding context to guide exploration efficiently. Prior methods either rely on vision-language embedding similarity, which does not reliably capture task-relevant relational semantics, or on large language models (LLMs), which are too slow and costly for real-time deployment. We introduce SCOUT: Scene Graph-Based Exploration with Learned Utility for Open-World Interactive Object Search, a novel method that searches directly over 3D scene graphs by assigning utility scores to rooms, frontiers, and objects using relational exploration heuristics such as room-object containment and object-object co-occurrence. To make this practical without sacrificing open-vocabulary generalization, we propose an offline procedural distillation framework that extracts structured relational knowledge from LLMs into lightweight models for on-robot inference. Furthermore, we present SymSearch, a scalable symbolic benchmark for evaluating semantic reasoning in interactive object search tasks. Extensive evaluations across symbolic and simulation environments show that SCOUT outperforms embedding similarity-based methods and matches LLM-level performance while remaining computationally efficient. Finally, real-world experiments demonstrate effective transfer to physical environments, enabling open-world interactive object search under realistic sensing and navigation constraints.

Overview

Overview of our approach
SCOUT: Scene Graph-Based Exploration with Learned Utility for Open-World Interactive Object Search. SCOUT procedurally distills structured, relational semantic knowledge between scene elements from large language models into lightweight models (1–2). During exploration, the agent assigns utility scores to scene graph nodes based on exploration heuristics previously learned (3) and grounds high-level actions through low-level navigation and manipulation policies (4). SymSearch, our symbolic benchmark (5), enables scalable evaluation of relational semantic reasoning over scene graphs with no simulation overhead.
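The offline distillation step (1–2) can be sketched as a one-time extraction of relational probabilities from an LLM into a lookup structure usable on the robot. This is a minimal illustration, not the authors' implementation: `query_llm` is a stub standing in for a real LLM call, and all values are made up.

```python
def query_llm(prompt: str) -> float:
    # Placeholder for an actual LLM query; in the real pipeline the model
    # would be asked to rate relational likelihood. Values here are invented
    # purely for illustration.
    fake_answers = {
        "How likely is a mug to be found in a kitchen?": 0.9,
        "How likely is a mug to be found in a bathroom?": 0.1,
    }
    return fake_answers.get(prompt, 0.0)

def distill_containment(objects, rooms):
    """Offline: query room-object containment probabilities once, so that
    no LLM is needed at deployment time (only the cached table)."""
    table = {}
    for obj in objects:
        for room in rooms:
            prompt = f"How likely is a {obj} to be found in a {room}?"
            table[(room, obj)] = query_llm(prompt)
    return table

table = distill_containment(["mug"], ["kitchen", "bathroom"])
```

In practice such a table (or a lightweight model trained on it) replaces online LLM calls, which is what makes on-robot inference cheap.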

Technical Approach

Full pipeline of SCOUT. From left to right: a 3D scene graph (3DSG) is constructed online from raw observations. Scene graph nodes are then scored by their utility for finding the query, where utility measures how informative a node is for localizing the queried object and is approximated with object-object co-occurrence and room-object containment probabilities. Among actionable nodes, i.e., nodes with exploration affordances, we select the closest high-scoring node. Once the node to explore is selected, we map its affordances to the associated low-level policies.
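The scoring-and-selection step can be sketched as follows. This is a simplified illustration under assumed data structures (the `Node` class, probability tables, threshold, and frontier prior are all hypothetical, not the authors' API): rooms are scored by containment probability, objects by co-occurrence, and among actionable high-utility nodes the closest is chosen.

```python
from dataclasses import dataclass

# Illustrative probability tables; in SCOUT these come from the
# distilled lightweight models.
CONTAINMENT = {("kitchen", "mug"): 0.9, ("bathroom", "mug"): 0.1}
CO_OCCURRENCE = {("coffee_machine", "mug"): 0.8}

@dataclass
class Node:
    name: str
    kind: str          # "room", "object", or "frontier"
    distance: float    # travel cost from the agent's current pose
    actionable: bool   # whether the node has an exploration affordance

def utility(node: Node, query: str) -> float:
    """Approximate how informative a node is for localizing the query."""
    if node.kind == "room":
        return CONTAINMENT.get((node.name, query), 0.0)
    if node.kind == "object":
        return CO_OCCURRENCE.get((node.name, query), 0.0)
    return 0.2  # assumed fixed exploration prior for frontiers

def select_node(nodes, query, threshold=0.5):
    """Among actionable high-utility nodes, pick the closest one."""
    candidates = [n for n in nodes
                  if n.actionable and utility(n, query) >= threshold]
    if not candidates:  # fall back to all actionable nodes
        candidates = [n for n in nodes if n.actionable]
    return min(candidates, key=lambda n: n.distance)

nodes = [Node("kitchen", "room", 5.0, True),
         Node("bathroom", "room", 2.0, True),
         Node("coffee_machine", "object", 6.0, True)]
best = select_node(nodes, "mug")
```

Here the bathroom is nearest but scores low for "mug", so the kitchen (high containment, closer than the coffee machine) is selected. The selected node's affordances would then be handed to the low-level navigation and manipulation policies.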

Code

This work is released under the CC BY-NC-SA license. A software implementation of this project can be found on GitHub.

Publications

If you find our work useful, please consider citing our paper:

Imen Mahdi, Matteo Cassinelli, Fabien Despinoy, Tim Welschehold, and Abhinav Valada
Relational Semantic Reasoning on 3D Scene Graphs for Open World Interactive Object Search
Under review, 2026.

(PDF) (BibTeX)

Authors

Imen Mahdi

University of Freiburg

Matteo Cassinelli

Toyota Motor Europe

Fabien Despinoy

Toyota Motor Europe

Tim Welschehold

University of Freiburg

Abhinav Valada

University of Freiburg

Acknowledgment

This work was funded by Toyota Motor Europe.