CLIP for point cloud understanding

Date
2023-08-01
Authors
Ghose, Shuvozit
Abstract

Contrastive Language-Image Pre-training (CLIP) based point cloud classification models have opened a new direction in point cloud classification research. In this thesis, we propose two novel methods for CLIP-based point cloud classification. First, we propose a Pretrained Point Cloud to Image Translation Network (PPCITNet) that translates point cloud depth maps into generalized colored images with additional salient visual cues for CLIP-based point cloud classification. In addition, we propose a novel viewpoint adapter that combines the view feature produced for each viewpoint with the global intertwined knowledge shared across the multi-view features. Next, we propose a novel meta-episodic learning framework for CLIP-based point cloud classification, and we introduce dynamic task sampling within each episode based on a performance memory. Experimental results demonstrate the superior performance of the proposed models over existing state-of-the-art CLIP-based models on the ModelNet10, ModelNet40, and ScanObjectNN datasets.
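The dynamic task sampling idea in the abstract can be sketched in miniature: keep a per-class performance memory and, when building an episode, bias sampling toward classes the model currently handles poorly. The class names, weighting scheme, and API below are illustrative assumptions, not the thesis's exact formulation.

```python
import random

class PerformanceMemory:
    """Hypothetical running record of per-class accuracy across episodes."""
    def __init__(self, num_classes):
        self.correct = [0] * num_classes
        self.seen = [0] * num_classes

    def update(self, cls, is_correct):
        self.seen[cls] += 1
        self.correct[cls] += int(is_correct)

    def accuracy(self, cls):
        # Unseen classes default to 0 accuracy so they are sampled early.
        return self.correct[cls] / self.seen[cls] if self.seen[cls] else 0.0


def sample_task(memory, n_way, rng=random):
    """Sample an n-way episode, weighting classes by (1 - accuracy).

    Harder (low-accuracy) classes receive proportionally more weight;
    sampling is without replacement so the task contains distinct classes.
    """
    pool = list(range(len(memory.seen)))
    chosen = []
    for _ in range(n_way):
        weights = [1.0 - memory.accuracy(c) + 1e-6 for c in pool]
        pick = rng.choices(pool, weights=weights, k=1)[0]
        chosen.append(pick)
        pool.remove(pick)
    return chosen
```

As a usage example, a class with perfect recorded accuracy receives near-zero weight and is almost never drawn into new episodes, while unseen or struggling classes dominate the sampling distribution.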

Keywords
CLIP, Point Cloud Understanding, Meta Learning, Few-shot Classification, Contrastive Language-Image Pre-training, Point Cloud to Image Translation