CLIP for point cloud understanding

Date
2023-08-01
Authors
Ghose, Shuvozit
Abstract

Contrastive Language-Image Pre-training (CLIP) based point cloud classification models have opened a new direction in point cloud classification research. In this thesis, we propose two novel methods for CLIP-based point cloud classification. First, we propose a Pretrained Point Cloud to Image Translation Network (PPCITNet) that translates point cloud depth maps into generalized colored images with additional salient visual cues for CLIP-based point cloud classification. In addition, we propose a novel viewpoint adapter that combines the view feature produced for each viewpoint with the global intertwined knowledge shared across the multi-view features. Next, we propose a novel meta-episodic learning framework for CLIP-based point cloud classification, and we introduce dynamic task sampling within each episode based on a performance memory. Experimental results demonstrate the superior performance of the proposed models over existing state-of-the-art CLIP-based models on the ModelNet10, ModelNet40, and ScanObjectNN datasets.
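The dynamic task sampling idea in the abstract can be sketched in miniature: keep a per-class performance memory and, when building an episode, bias sampling toward classes the model currently handles poorly. The class names, weighting scheme, and API below are illustrative assumptions, not the thesis's exact formulation.

```python
import random

class PerformanceMemory:
    """Hypothetical running record of per-class accuracy across episodes."""
    def __init__(self, num_classes):
        self.correct = [0] * num_classes
        self.seen = [0] * num_classes

    def update(self, cls, is_correct):
        self.seen[cls] += 1
        self.correct[cls] += int(is_correct)

    def accuracy(self, cls):
        # Unseen classes default to 0 accuracy so they are sampled early.
        return self.correct[cls] / self.seen[cls] if self.seen[cls] else 0.0


def sample_task(memory, n_way, rng=random):
    """Sample an n-way episode, weighting classes by (1 - accuracy).

    Harder (low-accuracy) classes receive proportionally more weight;
    sampling is without replacement so the task contains distinct classes.
    """
    pool = list(range(len(memory.seen)))
    chosen = []
    for _ in range(n_way):
        weights = [1.0 - memory.accuracy(c) + 1e-6 for c in pool]
        pick = rng.choices(pool, weights=weights, k=1)[0]
        chosen.append(pick)
        pool.remove(pick)
    return chosen
```

As a usage example, a class with perfect recorded accuracy receives near-zero weight and is almost never drawn into new episodes, while unseen or struggling classes dominate the sampling distribution.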

Keywords
CLIP, Point Cloud Understanding, Meta Learning, Few-shot Classification, Contrastive Language-Image Pre-training, Point Cloud to Image Translation