# 3D Vision with Transformers: A Survey

Jean Lahoud, Jiale Cao, Fahad Shahbaz Khan, Hisham Cholakkal, Rao Muhammad Anwer, Salman Khan, Ming-Hsuan Yang
This repo collects the Transformer-based 3D computer vision papers presented in our survey, and we aim to update it frequently with the latest relevant papers.
- Object Classification
- 3D Object Detection
- 3D Segmentation
- 3D Point Cloud Completion
- 3D Pose Estimation
- Other Tasks
## Object Classification
- Point Transformer
- Point Transformer
- Attentional ShapeContextNet for Point Cloud Recognition
- Modeling Point Clouds with Self-Attention and Gumbel Subset Sampling
- PCT: Point Cloud Transformer
- PVT: Point-Voxel Transformer for Point Cloud Learning
- Sewer Defect Detection from 3D Point Clouds Using a Transformer-Based Deep Learning Model
- Adaptive Wavelet Transformer Network for 3D Shape Representation Learning
- 3DCTN: 3D Convolution-Transformer Network for Point Cloud Classification
- Dual Transformer for Point Cloud Analysis
- CpT: Convolutional Point Transformer for 3D Point Cloud Processing
- LFT-Net: Local Feature Transformer Network for Point Clouds Analysis
- Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling
- Group-in-Group Relation-Based Transformer for 3D Point Cloud Learning
- Masked Autoencoders for Point Cloud Self-Supervised Learning
- PatchFormer: A Versatile 3D Transformer Based on Patch Attention
- 3CROSSNet: Cross-Level Cross-Scale Cross-Attention Network for Point Cloud Representation
- 3D Medical Point Transformer: Introducing Convolution to Attention Networks for Medical Point Cloud Analysis
- Centroid Transformers: Learning to Abstract with Attention
- Point Cloud Learning with Transformer
## 3D Object Detection
- 3D Object Detection with Pointformer
- Voxel Transformer for 3D Object Detection
- Improving 3D Object Detection with Channel-wise Transformer
- Group-Free 3D Object Detection via Transformers
- DETR3D: 3D Object Detection from Multi-View Images via 3D-to-2D Queries
- An End-to-End Transformer Model for 3D Object Detection
- SA-Det3D: Self-Attention Based Context-Aware 3D Object Detection
- M3DETR: Multi-Representation, Multi-Scale, Mutual-Relation 3D Object Detection with Transformers
- Embracing Single Stride 3D Object Detector with Sparse Transformer
- Fast Point Transformer
- Voxel Set Transformer: A Set-to-Set Approach to 3D Object Detection from Point Clouds
- ARM3D: Attention-Based Relation Module for Indoor 3D Object Detection
- Temporal-Channel Transformer for 3D LiDAR-Based Video Object Detection in Autonomous Driving
- Attention-Based Proposals Refinement for 3D Object Detection
- MLCVNet: Multi-Level Context VoteNet for 3D Object Detection
- LiDAR-Based Online 3D Video Object Detection with Graph-Based Message Passing and Spatiotemporal Transformer Attention
- SCANet: Spatial-Channel Attention Network for 3D Object Detection
- MonoDETR: Depth-Aware Transformer for Monocular 3D Object Detection
- TransFusion: Robust LiDAR-Camera Fusion for 3D Object Detection with Transformers
- CAT-Det: Contrastively Augmented Transformer for Multi-Modal 3D Object Detection
- PETR: Position Embedding Transformation for Multi-View 3D Object Detection
- BoxeR: Box-Attention for 2D and 3D Transformers
- Bridged Transformer for Vision and Point Cloud 3D Object Detection
- VISTA: Boosting 3D Object Detection via Dual Cross-VIew SpaTial Attention
- Point Density-Aware Voxels for LiDAR 3D Object Detection
## 3D Segmentation
For part segmentation, see the Object Classification section.
- Fast Point Transformer
- Spatial Transformer Point Convolution
- Stratified Transformer for 3D Point Cloud Segmentation
- Segment-Fusion: Hierarchical Context Fusion for Robust 3D Semantic Segmentation
- Sparse Cross-Scale Attention Network for Efficient LiDAR Panoptic Segmentation
- Point 4D Transformer Networks for Spatio-Temporal Modeling in Point Cloud Videos
- Spatial-Temporal Transformer for 3D Point Cloud Sequences
- UNETR: Transformers for 3D Medical Image Segmentation
- D-Former: A U-Shaped Dilated Transformer for 3D Medical Image Segmentation
- CoTr: Efficiently Bridging CNN and Transformer for 3D Medical Image Segmentation
- T-AutoML: Automated Machine Learning for Lesion Segmentation Using Transformers in 3D Medical Imaging
- TransFuse: Fusing Transformers and CNNs for Medical Image Segmentation
- Convolution-Free Medical Image Segmentation Using Transformers
- SpecTr: Spectral Transformer for Hyperspectral Pathology Image Segmentation
- TransBTS: Multimodal Brain Tumor Segmentation Using Transformer
- Medical Image Segmentation Using Squeeze-and-Expansion Transformers
- nnFormer: Interleaved Transformer for Volumetric Segmentation
- BiTr-Unet: A CNN-Transformer Combined Network for MRI Brain Tumor Segmentation
- AFTer-UNet: Axial Fusion Transformer UNet for Medical Image Segmentation
- A Volumetric Transformer for Accurate 3D Tumor Segmentation
- Swin UNETR: Swin Transformers for Semantic Segmentation of Brain Tumors in MRI Images
## 3D Point Cloud Completion
- PoinTr: Diverse Point Cloud Completion with Geometry-Aware Transformers
- Learning Local Displacements for Point Cloud Completion
- PointAttN: You Only Need Attention for Point Cloud Completion
- SnowflakeNet: Point Cloud Completion by Snowflake Point Deconvolution with Skip-Transformer
- Point Cloud Completion on Structured Feature Map with Feedback Network
- PCTMA-Net: Point Cloud Transformer with Morphing Atlas-Based Point Generation Network for Dense Point Cloud Completion
- AutoSDF: Shape Priors for 3D Completion, Reconstruction and Generation
- ShapeFormer: Transformer-Based Shape Completion via Sparse Representation
- A Conditional Point Diffusion-Refinement Paradigm for 3D Point Cloud Completion
- MFM-Net: Unpaired Shape Completion Network with Multi-Stage Feature Matching
## 3D Pose Estimation
- 3D Human Pose Estimation with Spatial and Temporal Transformers
- CrossFormer: Cross Spatio-Temporal Transformer for 3D Human Pose Estimation
- Exploiting Temporal Contexts with Strided Transformer for 3D Human Pose Estimation
- P-STMO: Pre-Trained Spatial Temporal Many-to-One Model for 3D Human Pose Estimation
- MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose Estimation in Video
- Epipolar Transformer for Multi-View Human Pose Estimation
- RayTran: 3D Pose Estimation and Shape Reconstruction of Multiple Objects from Videos with Ray-Traced Transformers
- Hand-Transformer: Non-Autoregressive Structured Modeling for 3D Hand Pose Estimation
- Efficient Virtual View Selection for 3D Hand Pose Estimation
- End-to-End Human Pose and Mesh Reconstruction with Transformers
- HOT-Net: Non-Autoregressive Transformer for 3D Hand-Object Pose Estimation
- Keypoint Transformer: Solving Joint Identification in Challenging Hands and Object Interactions for Accurate 3D Pose Estimation
- PI-Net: Pose Interacting Network for Multi-Person Monocular 3D Pose Estimation
- Permutation-Invariant Relational Network for Multi-Person 3D Pose Estimation
- 6D-ViT: Category-Level 6D Object Pose Estimation via Transformer-Based Instance Representation Learning
- Learning-Based Point Cloud Registration for 6D Object Pose Estimation in the Real World
- Zero-Shot Category-Level Object Pose Estimation
## Other Tasks
- 3D Object Tracking with Transformer
- PTTR: Relational 3D Point Cloud Object Tracking with Transformer
- History Repeats Itself: Human Motion Prediction via Motion Attention
- Learning Progressive Joint Propagation for Human Motion Prediction
- A Spatio-Temporal Transformer for 3D Human Motion Prediction
- HR-STAN: High-Resolution Spatio-Temporal Attention Network for 3D Human Motion Prediction
- Pose Transformers (POTR): Human Motion Prediction with Non-Autoregressive Transformer
- GIMO: Gaze-Informed Human Motion Prediction in Context
- Multi-View 3D Reconstruction with Transformer
- THUNDR: Transformer-Based 3D Human Reconstruction with Marker
- VPFusion: Joint 3D Volume and Pixel-Aligned Feature Fusion for Single and Multi-View 3D Reconstruction
- Deep Closest Point: Learning Representations for Point Cloud Registration
- Robust Point Cloud Registration Framework Based on Deep Graph Matching
- REGTR: End-to-End Point Cloud Correspondences with Transformer
## Citation
If you find the listing or the survey useful for your work, please cite our paper:
```bibtex
@misc{lahoud20223d,
  title={3D Vision with Transformers: A Survey},
  author={Lahoud, Jean and Cao, Jiale and Khan, Fahad Shahbaz and Cholakkal, Hisham and Anwer, Rao Muhammad and Khan, Salman and Yang, Ming-Hsuan},
  year={2022},
  eprint={2208.04309},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```