whwtraffic12306/CVPR2023-Papers-with-Code

CVPR 2023 Papers and Open-Source Projects Collection (Papers with Code)

25.78% = 2360 / 9155

CVPR 2023 decisions are now available on OpenReview! This year, we received a record number of 9155 submissions (a 12% increase over CVPR 2022), and accepted 2360 papers, for a 25.78% acceptance rate.

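Note 1: Everyone is welcome to open an issue to share CVPR 2023 papers and open-source projects!

Note 2: For top CV conference papers from previous years, other high-quality CV papers, and roundups, see: https://github.com/amusi/daily-paper-computer-vision

If you want to keep up with the latest and best CV papers, open-source projects, and learning materials, scan the QR code to join the [CVer Academic Exchange Group]! Learn from each other and improve together.

[CVPR 2023 Open-Source Papers Directory]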

Backbone

Integrally Pre-Trained Transformer Pyramid Networks

Stitchable Neural Networks

Run, Don't Walk: Chasing Higher FLOPS for Faster Neural Networks

BiFormer: Vision Transformer with Bi-Level Routing Attention

DeepMAD: Mathematical Architecture Design for Deep Convolutional Neural Network

Vision Transformer with Super Token Sampling

Hard Patches Mining for Masked Image Modeling

  • Paper: None
  • Code: None

CLIP

GALIP: Generative Adversarial CLIPs for Text-to-Image Synthesis

DeltaEdit: Exploring Text-free Training for Text-driven Image Manipulation

MAE

Learning 3D Representations from 2D Pre-trained Models via Image-to-Point Masked Autoencoders

Generic-to-Specific Distillation of Masked Autoencoders

GAN

DeltaEdit: Exploring Text-free Training for Text-driven Image Manipulation

NeRF

NoPe-NeRF: Optimising Neural Radiance Field with No Pose Prior

Latent-NeRF for Shape-Guided Generation of 3D Shapes and Textures

NeRF in the Palm of Your Hand: Corrective Augmentation for Robotics via Novel-View Synthesis

Panoptic Lifting for 3D Scene Understanding with Neural Fields

NeRFLiX: High-Quality Neural View Synthesis by Learning a Degradation-Driven Inter-viewpoint MiXer

DETR

DETRs with Hybrid Matching

NAS

PA&DA: Jointly Sampling PAth and DAta for Consistent NAS

Avatars

Structured 3D Features for Reconstructing Relightable and Animatable Avatars

ReID (Re-Identification)

Clothing-Change Feature Augmentation for Person Re-Identification

  • Paper: None
  • Code: None

MSINet: Twins Contrastive Search of Multi-Scale Interaction for Object ReID

Diffusion Models

Video Probabilistic Diffusion Models in Projected Latent Space

Solving 3D Inverse Problems using Pre-trained 2D Diffusion Models

Imagic: Text-Based Real Image Editing with Diffusion Models

Parallel Diffusion Models of Operator and Image for Blind Inverse Problems

DiffRF: Rendering-guided 3D Radiance Field Diffusion

MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation

HouseDiffusion: Vector Floorplan Generation via a Diffusion Model with Discrete and Continuous Denoising

TrojDiff: Trojan Attacks on Diffusion Models with Diverse Targets

Back to the Source: Diffusion-Driven Adaptation to Test-Time Corruption

DR2: Diffusion-based Robust Degradation Remover for Blind Face Restoration

Trace and Pace: Controllable Pedestrian Animation via Guided Trajectory Diffusion

Vision Transformer

Integrally Pre-Trained Transformer Pyramid Networks

Mask3D: Pre-training 2D Vision Transformers by Learning Masked 3D Priors

Learning Trajectory-Aware Transformer for Video Super-Resolution

Vision Transformers are Parameter-Efficient Audio-Visual Learners

Where We Are and What We're Looking At: Query Based Worldwide Image Geo-localization Using Hierarchies and Scenes

DSVT: Dynamic Sparse Voxel Transformer with Rotated Sets

DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting

BiFormer: Vision Transformer with Bi-Level Routing Attention

Vision Transformer with Super Token Sampling

BEVFormer v2: Adapting Modern Image Backbones to Bird's-Eye-View Recognition via Perspective Supervision

BAEFormer: Bi-directional and Early Interaction Transformers for Bird’s Eye View Semantic Segmentation

  • Paper: None
  • Code: None

Vision-Language

GIVL: Improving Geographical Inclusivity of Vision-Language Models with Pre-Training Methods

Teaching Structured Vision&Language Concepts to Vision&Language Models

Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks

Towards Generalisable Video Moment Retrieval: Visual-Dynamic Injection to Image-Text Pre-Training

CapDet: Unifying Dense Captioning and Open-World Detection Pretraining

FAME-ViL: Multi-Tasking Vision-Language Model for Heterogeneous Fashion Tasks

Meta-Explore: Exploratory Hierarchical Vision-and-Language Navigation Using Scene Object Spectrum Grounding

All in One: Exploring Unified Video-Language Pre-training

Position-guided Text Prompt for Vision Language Pre-training

EDA: Explicit Text-Decoupling and Dense Alignment for 3D Visual Grounding

Align and Attend: Multimodal Summarization with Dual Contrastive Losses

Multi-Modal Representation Learning with Text-Driven Soft Masks

Learning to Name Classes for Vision and Language Models

Object Detection

YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors

DETRs with Hybrid Matching

Enhanced Training of Query-Based Object Detection via Selective Query Recollection

Object-Aware Distillation Pyramid for Open-Vocabulary Object Detection

Object Tracking

Simple Cues Lead to a Strong Multi-Object Tracker

Semantic Segmentation

Efficient Semantic Segmentation by Altering Resolutions for Compressed Videos

Medical Image Segmentation

Label-Free Liver Tumor Segmentation

Video Object Segmentation

Two-shot Video Object Segmentation

Referring Image Segmentation

PolyFormer: Referring Image Segmentation as Sequential Polygon Generation

3D Point Cloud

Physical-World Optical Adversarial Attacks on 3D Face Recognition

3D Object Detection

DSVT: Dynamic Sparse Voxel Transformer with Rotated Sets

FrustumFormer: Adaptive Instance-aware Resampling for Multi-view 3D Detection

3D Video Object Detection with Learnable Object-Centric Global Optimization

  • Paper: None
  • Code: None

3D Semantic Segmentation

Less is More: Reducing Task and Model Complexity for 3D Point Cloud Semantic Segmentation

3D Semantic Scene Completion

Low-level Vision

Causal-IR: Learning Distortion Invariant Representation for Image Restoration from A Causality Perspective

Burstormer: Burst Image Restoration and Enhancement Transformer

Super-Resolution

Super-Resolution Neural Operator

Video Super-Resolution

Learning Trajectory-Aware Transformer for Video Super-Resolution

Image Generation

GALIP: Generative Adversarial CLIPs for Text-to-Image Synthesis

MAGE: MAsked Generative Encoder to Unify Representation Learning and Image Synthesis

Toward Verifiable and Reproducible Human Evaluation for Text-to-Image Generation

Video Generation

MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation

Video Understanding

Learning Transferable Spatiotemporal Representations from Natural Script Knowledge

Frame Flexible Network

Action Detection

TriDet: Temporal Action Detection with Relative Boundary Modeling

Text Detection

DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting

Knowledge Distillation

Learning to Retain while Acquiring: Combating Distribution-Shift in Adversarial Data-Free Knowledge Distillation

Generic-to-Specific Distillation of Masked Autoencoders

Model Pruning

DepGraph: Towards Any Structural Pruning

Image Compression

Context-Based Trit-Plane Coding for Progressive Image Compression

Anomaly Detection

Deep Feature In-painting for Unsupervised Anomaly Detection in X-ray Images

3D Reconstruction

OReX: Object Reconstruction from Planar Cross-sections Using Neural Fields

SparsePose: Sparse-View Camera Pose Regression and Refinement

NeuDA: Neural Deformable Anchor for High-Fidelity Implicit Surface Reconstruction

Vid2Avatar: 3D Avatar Reconstruction from Videos in the Wild via Self-supervised Scene Decomposition

To fit or not to fit: Model-based Face Reconstruction and Occlusion Segmentation from Weak Supervision

Structural Multiplane Image: Bridging Neural View Synthesis and 3D Reconstruction

3D Cinemagraphy from a Single Image

Revisiting Rotation Averaging: Uncertainties and Robust Losses

FFHQ-UV: Normalized Facial UV-Texture Dataset for 3D Face Reconstruction

Depth Estimation

Lite-Mono: A Lightweight CNN and Transformer Architecture for Self-Supervised Monocular Depth Estimation

Trajectory Prediction

IPCC-TP: Utilizing Incremental Pearson Correlation Coefficient for Joint Multi-Agent Trajectory Prediction

Image Captioning

ConZIC: Controllable Zero-shot Image Captioning by Sampling-Based Polishing

Cross-Domain Image Captioning with Discriminative Finetuning

Visual Question Answering

MixPHM: Redundancy-Aware Parameter-Efficient Tuning for Low-Resource Visual Question Answering

Sign Language Recognition

Continuous Sign Language Recognition with Correlation Network

  • Paper: https://arxiv.org/abs/2303.03202
  • Code: https://github.com/hulianyuyy/CorrNet

Video Prediction

MOSO: Decomposing MOtion, Scene and Object for Video Prediction

Novel View Synthesis

3D Video Loops from Asynchronous Input

Zero-Shot Learning

Bi-directional Distribution Alignment for Transductive Zero-Shot Learning

Semantic Prompt for Few-Shot Learning

  • Paper: None
  • Code: None

Stereo Matching

Iterative Geometry Encoding Volume for Stereo Matching

Learning the Distribution of Errors in Stereo Matching for Joint Disparity and Uncertainty Estimation

Scene Graph Generation

Prototype-based Embedding Network for Scene Graph Generation

Implicit Neural Representations

Polynomial Implicit Neural Representations For Large Diverse Datasets

Image Quality Assessment

Re-IQA: Unsupervised Learning for Image Quality Assessment in the Wild

Datasets

Human-Art: A Versatile Human-Centric Dataset Bridging Natural and Artificial Scenes

Align and Attend: Multimodal Summarization with Dual Contrastive Losses

GeoNet: Benchmarking Unsupervised Adaptation across Geographies

CelebV-Text: A Large-Scale Facial Text-Video Dataset

Others

Interactive Segmentation as Gaussian Process Classification

Backdoor Attacks Against Deep Image Compression via Adaptive Frequency Trigger

SplineCam: Exact Visualization and Characterization of Deep Network Geometry and Decision Boundaries

SCOTCH and SODA: A Transformer Video Shadow Detection Framework

DeepMapping2: Self-Supervised Large-Scale LiDAR Map Optimization

RelightableHands: Efficient Neural Relighting of Articulated Hand Models

Token Turing Machines

Single Image Backdoor Inversion via Robust Smoothed Classifiers

To fit or not to fit: Model-based Face Reconstruction and Occlusion Segmentation from Weak Supervision

HOOD: Hierarchical Graphs for Generalized Modelling of Clothing Dynamics

A Whac-A-Mole Dilemma: Shortcuts Come in Multiples Where Mitigating One Amplifies Others

Neuro-Modulated Hebbian Learning for Fully Test-Time Adaptation

Demystifying Causal Features on Adversarial Examples and Causal Inoculation for Robust Network by Adversarial Instrumental Variable Regression

UniDexGrasp: Universal Robotic Dexterous Grasping via Learning Diverse Proposal Generation and Goal-Conditioned Policy

Disentangling Orthogonal Planes for Indoor Panoramic Room Layout Estimation with Cross-Scale Distortion Awareness

Learning Neural Parametric Head Models

A Meta-Learning Approach to Predicting Performance and Data Requirements

MACARONS: Mapping And Coverage Anticipation with RGB Online Self-Supervision

Masked Images Are Counterfactual Samples for Robust Fine-tuning

HairStep: Transfer Synthetic to Real Using Strand and Depth Maps for Single-View 3D Hair Modeling

Decompose, Adjust, Compose: Effective Normalization by Playing with Frequency for Domain Generalization

Gradient Norm Aware Minimization Seeks First-Order Flatness and Improves Generalization

Unlearnable Clusters: Towards Label-agnostic Unlearnable Examples

Where We Are and What We're Looking At: Query Based Worldwide Image Geo-localization Using Hierarchies and Scenes

UniHCP: A Unified Model for Human-Centric Perceptions

CUDA: Convolution-based Unlearnable Datasets

AdaptiveMix: Robust Feature Representation via Shrinking Feature Space

Physical-World Optical Adversarial Attacks on 3D Face Recognition

DPE: Disentanglement of Pose and Expression for General Video Portrait Editing

SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation

Intrinsic Physical Concepts Discovery with Object-Centric Predictive Models

  • Paper: None
  • Code: None

Sharpness-Aware Gradient Matching for Domain Generalization

Mind the Label-shift for Augmentation-based Graph Out-of-distribution Generalization

  • Paper: None
  • Code: None

Blind Video Deflickering by Neural Filtering with a Flawed Atlas

RiDDLE: Reversible and Diversified De-identification with Latent Encryptor

PoseExaminer: Automated Testing of Out-of-Distribution Robustness in Human Pose and Shape Estimation

Upcycling Models under Domain and Category Shift

Modality-Agnostic Debiasing for Single Domain Generalization

Progressive Open Space Expansion for Open-Set Model Attribution

Dynamic Neural Network for Multi-Task Learning Searching across Diverse Network Topologies

GFPose: Learning 3D Human Pose Prior with Gradient Fields

PRISE: Demystifying Deep Lucas-Kanade with Strongly Star-Convex Constraints for Multimodel Image Alignment

Sketch2Saliency: Learning to Detect Salient Objects from Human Drawings

Boundary Unlearning

ImageNet-E: Benchmarking Neural Network Robustness via Attribute Editing

Zero-shot Model Diagnosis

GeoNet: Benchmarking Unsupervised Adaptation across Geographies

Quantum Multi-Model Fitting

DivClust: Controlling Diversity in Deep Clustering

Neural Volumetric Memory for Visual Locomotion Control

MonoHuman: Animatable Human Neural Field from Monocular Video

Trace and Pace: Controllable Pedestrian Animation via Guided Trajectory Diffusion

Bridging the Gap between Model Explanations in Partially Annotated Multi-label Classification

HyperCUT: Video Sequence from a Single Blurry Image using Unsupervised Ordering

On the Stability-Plasticity Dilemma of Class-Incremental Learning
