lahoud/3d-vision-transformers

This repo supplements our 3D Vision with Transformers Survey

Jean Lahoud, Jiale Cao, Fahad Shahbaz Khan, Hisham Cholakkal, Rao Muhammad Anwer, Salman Khan, Ming-Hsuan Yang

This repo lists all the Transformer-based 3D computer vision papers presented in our survey, and we aim to update it frequently with the latest relevant papers.

Content

Object Classification

Group-in-Group Relation-Based Transformer for 3D Point Cloud Learning [PDF]

Masked Autoencoders for Point Cloud Self-supervised Learning [PDF][Code]

3DCTN: 3D Convolution-Transformer Network for Point Cloud Classification [PDF]

LFT-Net: Local Feature Transformer Network for Point Clouds Analysis [Paper]

Sewer defect detection from 3D point clouds using a transformer-based deep learning model [PDF]

3d medical point transformer: Introducing convolution to attention networks for medical point cloud analysis [PDF][Code]

Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling [PDF][Code]

CpT: Convolutional Point Transformer for 3D Point Cloud Processing [PDF]

Patchformer: A versatile 3d transformer based on patch attention [PDF]

PVT: Point-Voxel Transformer for Point Cloud Learning [PDF][Code]

Adaptive Wavelet Transformer Network for 3D Shape Representation Learning [PDF]

Point cloud learning with transformer [PDF]

3crossnet: Cross-level cross-scale cross-attention network for point cloud representation [PDF]

Dual Transformer for Point Cloud Analysis [PDF]

Centroid transformers: Learning to abstract with attention [PDF]

PCT: Point cloud transformer [PDF][Code]

Point Transformer [PDF][Code]

Point Transformer [PDF][Code]

Modeling point clouds with self-attention and gumbel subset sampling [PDF]

Attentional shapecontextnet for point cloud recognition [PDF][Code]

3D Object Detection

Bridged Transformer for Vision and Point Cloud 3D Object Detection [PDF]

Multimodal Token Fusion for Vision Transformers [PDF] [Code]

CAT-Det: Contrastively Augmented Transformer for Multi-modal 3D Object Detection [PDF]

Focused Decoding Enables 3D Anatomical Detection by Transformers [PDF][Code]

MonoDETR: Depth-aware Transformer for Monocular 3D Object Detection [PDF][Code]

TransFusion: Robust LiDAR-Camera Fusion for 3D Object Detection with Transformers [PDF][Code]

Voxel Set Transformer: A Set-to-Set Approach to 3D Object Detection from Point Clouds [PDF][Code]

VISTA: Boosting 3D Object Detection via Dual Cross-VIew SpaTial Attention [PDF][Code]

Point Density-Aware Voxels for LiDAR 3D Object Detection [PDF][Code]

PETR: Position Embedding Transformation for Multi-View 3D Object Detection [PDF][Code]

ARM3D: Attention-based relation module for indoor 3D object detection [PDF][Code]

MonoDTR: Monocular 3D Object Detection with Depth-Aware Transformer [PDF][Code]

Attention-based Proposals Refinement for 3D Object Detection [PDF][Code]

Embracing Single Stride 3D Object Detector with Sparse Transformer [PDF][Code]

Fast Point Transformer [PDF][Code]

BoxeR: Box-Attention for 2D and 3D Transformers [PDF][Code]

DETR3D: 3D Object Detection from Multi-view Images via 3D-to-2D Queries [PDF][Code]

An End-to-End Transformer Model for 3D Object Detection [PDF][Code]

Voxel Transformer for 3D Object Detection [PDF][Code]

Improving 3D Object Detection with Channel-wise Transformer [PDF][Code]

M3DETR: Multi-representation, Multi-scale, Mutual-relation 3D Object Detection with Transformers [PDF][Code]

Group-Free 3D Object Detection via Transformers [PDF][Code]

SA-Det3D: Self-Attention Based Context-Aware 3D Object Detection [PDF][Code]

3D object detection with pointformer [PDF][Code]

Temporal-Channel Transformer for 3D Lidar-Based Video Object Detection in Autonomous Driving [PDF]

MLCVNet: Multi-Level Context VoteNet for 3D Object Detection [PDF][Code]

LiDAR-based online 3d video object detection with graph-based message passing and spatiotemporal transformer attention [PDF][Code]

SCANet: Spatial-channel attention network for 3d object detection [Paper][Code]

3D Segmentation

For part segmentation, see Object Classification.

Complete Scenes Segmentation

Stratified Transformer for 3D Point Cloud Segmentation [PDF][Code]

Multimodal Token Fusion for Vision Transformers [PDF] [Code]

Sparse Cross-scale Attention Network for Efficient LiDAR Panoptic Segmentation [PDF]

Fast Point Transformer [PDF][Code]

Segment-Fusion: Hierarchical Context Fusion for Robust 3D Semantic Segmentation [PDF]

Point Cloud Video Segmentation

Spatial-Temporal Transformer for 3D Point Cloud Sequences [PDF]

Point 4D transformer networks for spatio-temporal modeling in point cloud videos [PDF][Code]

Medical Imaging Segmentation

Swin UNETR: Swin Transformers for Semantic Segmentation of Brain Tumors in MRI Images [PDF][Code]

D-Former: A U-shaped Dilated Transformer for 3D Medical Image Segmentation [PDF]

A volumetric transformer for accurate 3d tumor segmentation [PDF][Code]

T-AutoML: Automated Machine Learning for Lesion Segmentation using Transformers in 3D Medical Imaging [PDF]

After-unet: Axial fusion transformer unet for medical image segmentation [PDF]

Bitr-unet: a cnn-transformer combined network for mri brain tumor segmentation [PDF]

nnformer: Interleaved transformer for volumetric segmentation [PDF][Code]

UTNet: A Hybrid Transformer Architecture for Medical Image Segmentation [PDF][Code]

Medical image segmentation using squeeze-and-expansion transformers [PDF][Code]

Unetr: Transformers for 3d medical image segmentation [PDF][Code]

Transbts: Multimodal brain tumor segmentation using transformer [PDF][Code]

Spectr: Spectral transformer for hyperspectral pathology image segmentation [PDF][Code]

Cotr: Efficiently bridging cnn and transformer for 3d medical image segmentation [PDF][Code]

Convolution-free medical image segmentation using transformers [PDF]

Transfuse: Fusing transformers and cnns for medical image segmentation [PDF][Code]

3D Point Cloud Completion

Learning Local Displacements for Point Cloud Completion [PDF][Code]

AutoSDF: Shape Priors for 3D Completion, Reconstruction and Generation [PDF][Code]

PointAttN: You Only Need Attention for Point Cloud Completion [PDF][Code]

Point cloud completion on structured feature map with feedback network [PDF]

ShapeFormer: Transformer-based Shape Completion via Sparse Representation [PDF][Code]

A Conditional Point Diffusion-Refinement Paradigm for 3D Point Cloud Completion [PDF][Code]

MFM-Net: Unpaired Shape Completion Network with Multi-stage Feature Matching [PDF]

PCTMA-Net: Point Cloud Transformer with Morphing Atlas-based Point Generation Network for Dense Point Cloud Completion [PDF][Code]

PoinTr: Diverse Point Cloud Completion with Geometry-Aware Transformers [PDF][Code]

SnowflakeNet: Point Cloud Completion by Snowflake Point Deconvolution with Skip-Transformer [PDF][Code]

3D Pose Estimation

Permutation-Invariant Relational Network for Multi-person 3D Pose Estimation [PDF]

Zero-Shot Category-Level Object Pose Estimation [PDF][Code]

Efficient Virtual View Selection for 3D Hand Pose Estimation [PDF][Code]

Learning-based Point Cloud Registration for 6D Object Pose Estimation in the Real World [PDF][Code]

CrossFormer: Cross Spatio-Temporal Transformer for 3D Human Pose Estimation [PDF][Code]

RayTran: 3D pose estimation and shape reconstruction of multiple objects from videos with ray-traced transformers [PDF]

P-STMO: Pre-Trained Spatial Temporal Many-to-One Model for 3D Human Pose Estimation [PDF][Code]

MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose Estimation in Video [PDF][Code]

6D-ViT: Category-Level 6D Object Pose Estimation via Transformer-based Instance Representation Learning [PDF]

Keypoint Transformer: Solving Joint Identification in Challenging Hands and Object Interactions for Accurate 3D Pose Estimation [PDF][Code]

Exploiting Temporal Contexts with Strided Transformer for 3D Human Pose Estimation [PDF][Code]

3D Human Pose Estimation with Spatial and Temporal Transformers [PDF][Code]

End-to-End Human Pose and Mesh Reconstruction with Transformers [PDF][Code]

PI-Net: Pose Interacting Network for Multi-Person Monocular 3D Pose Estimation [PDF][Code]

HOT-Net: Non-Autoregressive Transformer for 3D Hand-Object Pose Estimation [PDF]

Hand-Transformer: Non-Autoregressive Structured Modeling for 3D Hand Pose Estimation [PDF]

Epipolar Transformer for Multi-view Human Pose Estimation [PDF][Code]

Other Tasks

3D Tracking

Pttr: Relational 3d point cloud object tracking with transformer [PDF][Code]

3d object tracking with transformer [PDF]

3D Motion Prediction

Hr-stan: High-resolution spatio-temporal attention network for 3d human motion prediction [PDF]

Gimo: Gaze-informed human motion prediction in context [PDF][Code]

Pose transformers (potr): Human motion prediction with non-autoregressive transformer [PDF][Code]

Learning progressive joint propagation for human motion prediction [PDF]

History repeats itself: Human motion prediction via motion attention [PDF][Code]

A spatio-temporal transformer for 3d human motion prediction [PDF][Code]

3D Reconstruction

Vpfusion: Joint 3d volume and pixel-aligned feature fusion for single and multi-view 3d reconstruction [PDF]

Thundr: Transformer-based 3d human reconstruction with markers [PDF]

Multi-view 3d reconstruction with transformer [PDF]

Point Cloud Registration

Regtr: End-to-end point cloud correspondences with transformer [PDF][Code]

LegoFormer: Transformers for Block-by-Block Multi-view 3D Reconstruction [PDF][Code]

Robust point cloud registration framework based on deep graph matching [PDF][Code]

Deep closest point: Learning representations for point cloud registration [PDF][Code]

Citation

If you find the listing or the survey useful for your work, please cite our paper:

@misc{lahoud20223d,
      title={3D Vision with Transformers: A Survey}, 
      author={Lahoud, Jean and Cao, Jiale and Khan, Fahad Shahbaz and Cholakkal, Hisham and Anwer, Rao Muhammad and Khan, Salman and Yang, Ming-Hsuan},
      year={2022},
      eprint={2208.04309},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
