KinScene: Model-Based Mobile Manipulation of Articulated Scenes

Cheng-Chun Hsu    Ben Abbatematteo    Zhenyu Jiang   
Yuke Zhu    Roberto Martín-Martín    Joydeep Biswas   

The University of Texas at Austin   

Sequentially interacting with articulated objects is crucial for a mobile manipulator to operate effectively in everyday environments. To enable long-horizon tasks involving articulated objects, this study explores building scene-level articulation models for indoor scenes through autonomous exploration. While previous research has studied mobile manipulation with articulated objects by considering object kinematic constraints, it primarily focuses on individual-object scenarios and does not extend to the scene-level context required for task-level planning. To manipulate multiple object parts sequentially, the robot needs to reason about the resultant motion of each part and anticipate its impact on future actions. We introduce KinScene, a full-stack approach for long-horizon manipulation tasks with articulated objects. The robot maps the scene, detects and physically interacts with articulated objects, collects observations, and infers their articulation properties. For sequential tasks, the robot plans a feasible series of object interactions based on the inferred articulation model. We demonstrate that our approach repeatably constructs accurate scene-level kinematic and geometric models, enabling long-horizon mobile manipulation in a real-world scene.



Problem Definition

Our goal is to change the kinematic state of the scene to a desired configuration. We assume this configuration is generated by a higher-level task planner and is needed to perform a long-horizon task. For example, to unload the dishwasher the robot needs to open the dishwasher door before picking up the clean dishes, and open the cabinet doors before putting the dishes away. Our objective is thus to achieve these states, i.e., to plan and execute a sequence of single-joint interactions that brings the environment to the desired configuration, taking into account the constraints introduced by the scene-level articulation.
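To make this formulation concrete, the sketch below represents the scene's kinematic state, a desired configuration, and a plan as an ordered sequence of single-joint interactions. All names here (SceneState, Interaction, goal_reached, the example joint values) are illustrative assumptions, not the paper's notation or interface.

from dataclasses import dataclass
from typing import Dict, List

# Kinematic state of the scene: one joint value (rad or m) per articulated part.
SceneState = Dict[str, float]


@dataclass
class Interaction:
    """A single-joint interaction: move the part `joint_id` to value `target`."""
    joint_id: str
    target: float


def goal_reached(current: SceneState, desired: SceneState,
                 tol: float = 1e-2) -> bool:
    """True when every joint named in the desired configuration is within
    tolerance of its target value."""
    return all(abs(current.get(j, 0.0) - q) <= tol for j, q in desired.items())


# Hypothetical example: unloading the dishwasher requires the dishwasher door
# and one cabinet door to be open. The planner must also order these
# interactions so that, e.g., an already-open door does not block the base
# from reaching the next handle.
desired: SceneState = {"dishwasher_door": 1.4, "cabinet_left_door": 1.6}
plan: List[Interaction] = [Interaction("dishwasher_door", 1.4),
                           Interaction("cabinet_left_door", 1.6)]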



Framework

Our approach involves three stages: a mapping stage (left), where the robot conducts a 3D scan and detects handles; an articulation discovery stage (middle), where the robot navigates, interacts, collects observations, and estimates the scene-level articulation model; and a scene-level manipulation stage (right), where the robot plans tasks using the scene-level articulation model and executes long-horizon actions through the articulation planner.
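The sketch below outlines this three-stage control flow under stated assumptions: every name (KinScenePipeline, Handle, ArticulationModel, and the stage methods) is an illustrative placeholder rather than the authors' actual implementation or API.

from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Handle:
    handle_id: str
    position: tuple                     # 3D handle position from the scan


@dataclass
class ArticulationModel:
    # joint_id -> estimated properties: joint type ("revolute"/"prismatic"),
    # axis, limits, and current joint value.
    joints: Dict[str, dict] = field(default_factory=dict)


class KinScenePipeline:
    def map_scene(self) -> List[Handle]:
        """Mapping stage: 3D-scan the scene and detect candidate handles."""
        return []                       # placeholder for detected handles

    def discover_articulation(self, handles: List[Handle]) -> ArticulationModel:
        """Articulation discovery stage: navigate to each handle, interact,
        record the resulting part motion, and fit per-part joint properties."""
        model = ArticulationModel()
        for h in handles:
            # Placeholder estimate; the real system infers type/axis/limits
            # from observations collected during the interaction.
            model.joints[h.handle_id] = {"type": "revolute", "value": 0.0}
        return model

    def manipulate(self, model: ArticulationModel,
                   desired: Dict[str, float]) -> None:
        """Scene-level manipulation stage: plan a feasible ordering of
        single-joint interactions with the articulation planner and execute
        them with the mobile manipulator (execution omitted here)."""
        for joint_id, target in desired.items():
            model.joints[joint_id]["value"] = target   # stand-in for execution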




Real-World Experiments

We evaluate our approach in a real-world kitchen that features diverse everyday articulated objects of varying sizes and positions. The experiments demonstrate that our robot can accurately infer articulation properties through exploration. Leveraging the scene-level articulation model, our robot achieves a higher success rate in planning and manipulating articulated objects by reasoning at the scene scale.

Articulation Discovery Stage

Scene-level Manipulation