Reason3D: Searching and Reasoning 3D Segmentation via Large Language Model

¹University of California, Merced   ²Skywork AI

Reason3D is a novel LLM-based framework for dense 3D point cloud searching and reasoning that outputs dense segmentation masks from textual information. Reason3D handles tasks involving 1) 3D reasoning segmentation, 2) 3D hierarchical searching, 3) 3D express referring, and 4) 3D question answering with corresponding dense segmentation masks.

Abstract

Recent advancements in multimodal large language models (LLMs) have shown their potential in various domains, especially concept reasoning. Despite these developments, their application to understanding 3D environments remains limited: existing methods primarily offer textual or numerical outputs without the capability to generate dense, informative segmentation masks.

This paper introduces Reason3D, a novel LLM designed for comprehensive 3D understanding. Reason3D takes point cloud data and text prompts as input to produce textual responses and segmentation masks, facilitating advanced tasks like 3D reasoning segmentation, hierarchical searching, express referring, and question answering with detailed mask outputs.

Specifically, we propose a hierarchical mask decoder to locate small objects within expansive scenes. The decoder first generates a coarse estimate of the location that likely covers the object's general area. This coarse estimate then anchors a coarse-to-fine segmentation strategy that significantly enhances the precision of object identification and segmentation.
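To make the coarse-to-fine idea concrete, below is a minimal PyTorch sketch under our own assumptions (the element-wise gating, the single-linear heads, and all tensor shapes are illustrative, not the paper's exact formulation): a coarse head conditioned on a location embedding first scores each superpoint, and a fine head then predicts mask logits over the gated features.

import torch
import torch.nn as nn

class CoarseToFineDecoder(nn.Module):
    """Illustrative coarse-to-fine mask decoder: a coarse head scores how
    likely each superpoint lies in the object's general area, and a fine
    head predicts the final mask over the coarsely located region.
    (Module design is an assumption, not the released implementation.)"""

    def __init__(self, dim: int):
        super().__init__()
        self.coarse_head = nn.Linear(dim, 1)  # per-superpoint location score
        self.fine_head = nn.Linear(dim, 1)    # per-superpoint mask logit

    def forward(self, sp_feats, loc_emb, seg_emb):
        # sp_feats: (N, D) superpoint features; loc_emb, seg_emb: (D,) token embeddings
        coarse = torch.sigmoid(self.coarse_head(sp_feats * loc_emb)).squeeze(-1)  # (N,)
        gated = sp_feats * coarse.unsqueeze(-1)  # down-weight features outside the coarse region
        mask_logits = self.fine_head(gated * seg_emb).squeeze(-1)                 # (N,)
        return coarse, mask_logits

# toy usage: 50 superpoints with 64-dim features
decoder = CoarseToFineDecoder(64)
coarse, mask_logits = decoder(torch.randn(50, 64), torch.randn(64), torch.randn(64))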

Reason3D Data


Examples of 3D reasoning segmentation, a task that requires in-depth world knowledge and reasoning ability.

Method: Reason3D


Initially, we utilize a point encoder to extract dense features from the input scene, which are simplified by a superpoint pooling layer to reduce complexity. An interactor then merges the superpoint features with a learnable query, and the result is fed into a frozen LLM together with the instruction to generate an output containing two special tokens, [LOC] and [SEG]. A hierarchical decoder uses the [LOC] embedding to estimate a coarse location that likely covers the object. Finally, this estimated location is integrated with the [SEG] embedding to predict the final segmentation masks.
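As a rough illustration of two pieces of this pipeline, here is a self-contained PyTorch sketch of superpoint average pooling and of pulling the [LOC]/[SEG] embeddings out of the LLM's hidden states. All shapes, token ids, and the choice of mean pooling are assumptions for demonstration, not the released implementation.

import torch

def superpoint_pool(point_feats, superpoint_ids):
    """Average dense point features within each superpoint to reduce complexity.
    point_feats: (P, D) float features; superpoint_ids: (P,) int64 labels in [0, N)."""
    num_sp = int(superpoint_ids.max()) + 1
    pooled = torch.zeros(num_sp, point_feats.size(1))
    pooled.index_add_(0, superpoint_ids, point_feats)
    counts = torch.bincount(superpoint_ids, minlength=num_sp).clamp(min=1)
    return pooled / counts.unsqueeze(1)  # (N, D)

def extract_token_embeddings(output_ids, hidden_states, loc_id, seg_id):
    """Collect the hidden states at the [LOC] and [SEG] token positions.
    output_ids: (T,) generated token ids; hidden_states: (T, D)."""
    loc_emb = hidden_states[output_ids == loc_id].mean(dim=0)  # (D,)
    seg_emb = hidden_states[output_ids == seg_id].mean(dim=0)  # (D,)
    return loc_emb, seg_emb

# toy usage with made-up sizes and token ids
point_feats = torch.randn(1000, 64)             # from the point encoder
superpoint_ids = torch.randint(0, 50, (1000,))  # precomputed superpoint assignment
sp_feats = superpoint_pool(point_feats, superpoint_ids)  # (50, 64)

output_ids = torch.tensor([5, 9, 32000, 7, 32001])  # pretend 32000=[LOC], 32001=[SEG]
hidden_states = torch.randn(5, 64)
loc_emb, seg_emb = extract_token_embeddings(output_ids, hidden_states, 32000, 32001)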

Visualization


BibTeX

@article{reason3d,
  title={Reason3D: Searching and Reasoning 3D Segmentation via Large Language Model},
  author={Kuan-Chih Huang and Xiangtai Li and Lu Qi and Shuicheng Yan and Ming-Hsuan Yang},
  journal={arXiv},
  year={2024}
}