CLIP for point cloud understanding
Abstract
Contrastive Language-Image Pre-training (CLIP) has opened a new direction in point cloud classification research. In this thesis, we propose two novel methods for CLIP-based point cloud classification. First, we propose a Pretrained Point Cloud to Image Translation Network (PPCITNet) that produces generalized colored images with additional salient visual cues from point cloud depth maps for CLIP-based point cloud classification. We also propose a novel viewpoint adapter that combines the features processed from each viewpoint with the global intertwined knowledge shared across the multi-view features. Second, we propose a novel meta-episodic learning framework for CLIP-based point cloud classification, and we introduce dynamic task sampling within each episode based on performance memory. Experimental results demonstrate the superior performance of the proposed models over existing state-of-the-art CLIP-based models on the ModelNet10, ModelNet40, and ScanObjectNN datasets.
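The pipeline sketched in the abstract renders a point cloud into multi-view depth maps and then fuses per-view features with a shared global feature. The following is a minimal illustrative sketch of those two steps, not the thesis's actual PPCITNet or viewpoint adapter: the projection, grid resolution, and mean-based fusion here are simplifying assumptions, and `depth_map`/`viewpoint_adapter` are hypothetical names.

```python
import numpy as np

def depth_map(points, view_rot, res=64):
    """Toy orthographic depth rendering of a point cloud from one viewpoint.

    points: (N, 3) array; view_rot: (3, 3) rotation into the camera frame.
    Returns a (res, res) depth image (larger value = closer surface point).
    """
    p = points @ view_rot.T
    # Normalize x, y into pixel coordinates on a res x res grid.
    xy = (p[:, :2] - p[:, :2].min(0)) / (np.ptp(p[:, :2], axis=0) + 1e-8)
    xy = (xy * (res - 1)).astype(int)
    z_min = p[:, 2].min()
    depth = np.zeros((res, res))
    for (x, y), z in zip(xy, p[:, 2]):
        # Keep the nearest (largest shifted-depth) point per pixel.
        depth[y, x] = max(depth[y, x], z - z_min)
    return depth

def viewpoint_adapter(view_feats):
    """Toy fusion of per-view features with a global multi-view feature.

    view_feats: (V, D) array of features, one row per rendered viewpoint.
    """
    global_feat = view_feats.mean(0)   # knowledge shared across all views
    fused = view_feats + global_feat   # combine per-view and global cues
    return fused.mean(0)               # pooled multi-view descriptor
```

In the actual model, each depth map would be colorized by PPCITNet and encoded by CLIP's image encoder before the adapter fuses the view features; here mean pooling stands in for those learned components.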