# 3D Vision with Transformers: A Survey

Jean Lahoud, Jiale Cao, Fahad Shahbaz Khan, Hisham Cholakkal, Rao Muhammad Anwer, Salman Khan, Ming-Hsuan Yang
This repo collects the Transformer-based 3D computer vision papers presented in our survey, and we aim to update it frequently with the latest relevant papers.
- Object Classification
- 3D Object Detection
- 3D Segmentation
- 3D Point Cloud Completion
- 3D Pose Estimation
- Other Tasks
## Object Classification
- Point Transformer
- Point Transformer
- Attentional ShapeContextNet for Point Cloud Recognition
- Modeling Point Clouds with Self-Attention and Gumbel Subset Sampling
- PCT: Point Cloud Transformer
- PVT: Point-Voxel Transformer for Point Cloud Learning
- Sewer Defect Detection from 3D Point Clouds Using a Transformer-Based Deep Learning Model
- Adaptive Wavelet Transformer Network for 3D Shape Representation Learning
- 3DCTN: 3D Convolution-Transformer Network for Point Cloud Classification
- Dual Transformer for Point Cloud Analysis
- CpT: Convolutional Point Transformer for 3D Point Cloud Processing
- LFT-Net: Local Feature Transformer Network for Point Clouds Analysis
- Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling
- Group-in-Group Relation-Based Transformer for 3D Point Cloud Learning
- Masked Autoencoders for Point Cloud Self-Supervised Learning
- PatchFormer: A Versatile 3D Transformer Based on Patch Attention
- 3CROSSNet: Cross-Level Cross-Scale Cross-Attention Network for Point Cloud Representation
- 3D Medical Point Transformer: Introducing Convolution to Attention Networks for Medical Point Cloud Analysis
- Centroid Transformers: Learning to Abstract with Attention
- Point Cloud Learning with Transformer
## 3D Object Detection
- 3D Object Detection with Pointformer
- Voxel Transformer for 3D Object Detection
- Improving 3D Object Detection with Channel-wise Transformer
- Group-Free 3D Object Detection via Transformers
- DETR3D: 3D Object Detection from Multi-View Images via 3D-to-2D Queries
- An End-to-End Transformer Model for 3D Object Detection
- SA-Det3D: Self-Attention Based Context-Aware 3D Object Detection
- M3DETR: Multi-Representation, Multi-Scale, Mutual-Relation 3D Object Detection with Transformers
- Embracing Single Stride 3D Object Detector with Sparse Transformer
- Fast Point Transformer
- Voxel Set Transformer: A Set-to-Set Approach to 3D Object Detection from Point Clouds
- ARM3D: Attention-Based Relation Module for Indoor 3D Object Detection
- Temporal-Channel Transformer for 3D LiDAR-Based Video Object Detection in Autonomous Driving
- Attention-Based Proposals Refinement for 3D Object Detection
- MLCVNet: Multi-Level Context VoteNet for 3D Object Detection
- LiDAR-Based Online 3D Video Object Detection with Graph-Based Message Passing and Spatiotemporal Transformer Attention
- SCANet: Spatial-Channel Attention Network for 3D Object Detection
- MonoDETR: Depth-Aware Transformer for Monocular 3D Object Detection
- TransFusion: Robust LiDAR-Camera Fusion for 3D Object Detection with Transformers
- CAT-Det: Contrastively Augmented Transformer for Multi-Modal 3D Object Detection
- PETR: Position Embedding Transformation for Multi-View 3D Object Detection
- BoxeR: Box-Attention for 2D and 3D Transformers
- Bridged Transformer for Vision and Point Cloud 3D Object Detection
- VISTA: Boosting 3D Object Detection via Dual Cross-VIew SpaTial Attention
- Point Density-Aware Voxels for LiDAR 3D Object Detection
## 3D Segmentation
For part segmentation, see the Object Classification section.
- Fast Point Transformer
- Spatial Transformer Point Convolution
- Stratified Transformer for 3D Point Cloud Segmentation
- Segment-Fusion: Hierarchical Context Fusion for Robust 3D Semantic Segmentation
- Sparse Cross-Scale Attention Network for Efficient LiDAR Panoptic Segmentation
- Point 4D Transformer Networks for Spatio-Temporal Modeling in Point Cloud Videos
- Spatial-Temporal Transformer for 3D Point Cloud Sequences
- UNETR: Transformers for 3D Medical Image Segmentation
- D-Former: A U-Shaped Dilated Transformer for 3D Medical Image Segmentation
- CoTr: Efficiently Bridging CNN and Transformer for 3D Medical Image Segmentation
- T-AutoML: Automated Machine Learning for Lesion Segmentation Using Transformers in 3D Medical Imaging
- TransFuse: Fusing Transformers and CNNs for Medical Image Segmentation
- Convolution-Free Medical Image Segmentation Using Transformers
- SpecTr: Spectral Transformer for Hyperspectral Pathology Image Segmentation
- TransBTS: Multimodal Brain Tumor Segmentation Using Transformer
- Medical Image Segmentation Using Squeeze-and-Expansion Transformers
- nnFormer: Interleaved Transformer for Volumetric Segmentation
- BiTr-Unet: A CNN-Transformer Combined Network for MRI Brain Tumor Segmentation
- AFTer-UNet: Axial Fusion Transformer UNet for Medical Image Segmentation
- A Volumetric Transformer for Accurate 3D Tumor Segmentation
- Swin UNETR: Swin Transformers for Semantic Segmentation of Brain Tumors in MRI Images
## 3D Point Cloud Completion
- PoinTr: Diverse Point Cloud Completion with Geometry-Aware Transformers
- Learning Local Displacements for Point Cloud Completion
- PointAttN: You Only Need Attention for Point Cloud Completion
- SnowflakeNet: Point Cloud Completion by Snowflake Point Deconvolution with Skip-Transformer
- Point Cloud Completion on Structured Feature Map with Feedback Network
- PCTMA-Net: Point Cloud Transformer with Morphing Atlas-Based Point Generation Network for Dense Point Cloud Completion
- AutoSDF: Shape Priors for 3D Completion, Reconstruction and Generation
- ShapeFormer: Transformer-Based Shape Completion via Sparse Representation
- A Conditional Point Diffusion-Refinement Paradigm for 3D Point Cloud Completion
- MFM-Net: Unpaired Shape Completion Network with Multi-Stage Feature Matching
## 3D Pose Estimation
- 3D Human Pose Estimation with Spatial and Temporal Transformers
- CrossFormer: Cross Spatio-Temporal Transformer for 3D Human Pose Estimation
- Exploiting Temporal Contexts with Strided Transformer for 3D Human Pose Estimation
- P-STMO: Pre-Trained Spatial Temporal Many-to-One Model for 3D Human Pose Estimation
- MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose Estimation in Video
- Epipolar Transformer for Multi-View Human Pose Estimation
- RayTran: 3D Pose Estimation and Shape Reconstruction of Multiple Objects from Videos with Ray-Traced Transformers
- Hand-Transformer: Non-Autoregressive Structured Modeling for 3D Hand Pose Estimation
- Efficient Virtual View Selection for 3D Hand Pose Estimation
- End-to-End Human Pose and Mesh Reconstruction with Transformers
- HOT-Net: Non-Autoregressive Transformer for 3D Hand-Object Pose Estimation
- Keypoint Transformer: Solving Joint Identification in Challenging Hands and Object Interactions for Accurate 3D Pose Estimation
- PI-Net: Pose Interacting Network for Multi-Person Monocular 3D Pose Estimation
- Permutation-Invariant Relational Network for Multi-Person 3D Pose Estimation
- 6D-ViT: Category-Level 6D Object Pose Estimation via Transformer-Based Instance Representation Learning
- Learning-Based Point Cloud Registration for 6D Object Pose Estimation in the Real World
- Zero-Shot Category-Level Object Pose Estimation
## Other Tasks
- 3D Object Tracking with Transformer
- PTTR: Relational 3D Point Cloud Object Tracking with Transformer
- History Repeats Itself: Human Motion Prediction via Motion Attention
- Learning Progressive Joint Propagation for Human Motion Prediction
- A Spatio-Temporal Transformer for 3D Human Motion Prediction
- HR-STAN: High-Resolution Spatio-Temporal Attention Network for 3D Human Motion Prediction
- Pose Transformers (POTR): Human Motion Prediction with Non-Autoregressive Transformer
- GIMO: Gaze-Informed Human Motion Prediction in Context
- Multi-View 3D Reconstruction with Transformer
- THUNDR: Transformer-Based 3D Human Reconstruction with Marker
- VPFusion: Joint 3D Volume and Pixel-Aligned Feature Fusion for Single and Multi-View 3D Reconstruction
- Deep Closest Point: Learning Representations for Point Cloud Registration
- Robust Point Cloud Registration Framework Based on Deep Graph Matching
- REGTR: End-to-End Point Cloud Correspondences with Transformer
## Citation
If you find the listing or the survey useful for your work, please cite our paper:
```bibtex
@misc{lahoud20223d,
  title={3D Vision with Transformers: A Survey},
  author={Lahoud, Jean and Cao, Jiale and Khan, Fahad Shahbaz and Cholakkal, Hisham and Anwer, Rao Muhammad and Khan, Salman and Yang, Ming-Hsuan},
  year={2022},
  eprint={2208.04309},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```