[1112.6209] Building high-level features using large scale unsupervised learning
[1207.0580] Improving neural networks by preventing co-adaptation of feature detectors
[1312.5602] Playing Atari with Deep Reinforcement Learning
[1312.6114] Auto-Encoding Variational Bayes
[1401.4082] Stochastic Backpropagation and Approximate Inference in Deep Generative Models
[1403.6652] DeepWalk: Online Learning of Social Representations
[1404.7828] Deep Learning in Neural Networks: An Overview
[1406.2661] Generative Adversarial Networks
[1406.5298] Semi-Supervised Learning with Deep Generative Models
[1406.6247] Recurrent Models of Visual Attention
[1411.1784] Conditional Generative Adversarial Nets
[1411.1792] How transferable are features in deep neural networks?
[1412.3555] Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling
[1412.6572] Explaining and Harnessing Adversarial Examples
[1412.6980] Adam: A Method for Stochastic Optimization
[1502.05477] Trust Region Policy Optimization
[1503.02531] Distilling the Knowledge in a Neural Network
[1503.03578] LINE: Large-scale Information Network Embedding
[1504.00702] End-to-End Training of Deep Visuomotor Policies
[1505.05424] Weight Uncertainty in Neural Networks
[1505.05770] Variational Inference with Normalizing Flows
[1505.07818] Domain-Adversarial Training of Neural Networks
[1506.02142] Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning
[1506.02438] High-Dimensional Continuous Control Using Generalized Advantage Estimation
[1507.06228] Training Very Deep Networks
[1509.02971] Continuous control with deep reinforcement learning
[1509.06461] Deep Reinforcement Learning with Double Q-learning
[1511.02274] Stacked Attention Networks for Image Question Answering
[1511.05493] Gated Graph Sequence Neural Networks
[1511.05952] Prioritized Experience Replay
[1511.06581] Dueling Network Architectures for Deep Reinforcement Learning
[1602.01783] Asynchronous Methods for Deep Reinforcement Learning
[1602.04938] "Why Should I Trust You?": Explaining the Predictions of Any Classifier
[1603.00748] Continuous Deep Q-Learning with Model-based Acceleration
[1605.06676] Learning to Communicate with Deep Multi-Agent Reinforcement Learning
[1606.01868] Unifying Count-Based Exploration and Intrinsic Motivation
[1606.02647] Safe and Efficient Off-Policy Reinforcement Learning
[1606.03498] Improved Techniques for Training GANs
[1606.04080] Matching Networks for One Shot Learning
[1606.04474] Learning to learn by gradient descent by gradient descent
[1606.04671] Progressive Neural Networks
[1606.05908] Tutorial on Variational Autoencoders
[1606.07792] Wide & Deep Learning for Recommender Systems
[1606.09375] Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering
[1607.00653] node2vec: Scalable Feature Learning for Networks
[1607.06450] Layer Normalization
[1607.08022] Instance Normalization: The Missing Ingredient for Fast Stylization
[1608.03983] SGDR: Stochastic Gradient Descent with Warm Restarts
[1609.02907] Semi-Supervised Classification with Graph Convolutional Networks
[1610.09585] Conditional Image Synthesis With Auxiliary Classifier GANs
[1611.00712] The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables
[1611.01578] Neural Architecture Search with Reinforcement Learning
[1611.02167] Designing Neural Network Architectures using Reinforcement Learning
[1611.03530] Understanding deep learning requires rethinking generalization
[1611.05397] Reinforcement Learning with Unsupervised Auxiliary Tasks
[1611.07308] Variational Graph Auto-Encoders
[1612.00796] Overcoming catastrophic forgetting in neural networks
[1612.01474] Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles
[1701.06538] Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
[1701.06548] Regularizing Neural Networks by Penalizing Confident Output Distributions
[1703.03400] Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks
[1703.05175] Prototypical Networks for Few-shot Learning
[1703.06103] Modeling Relational Data with Graph Convolutional Networks
[1704.01212] Neural Message Passing for Quantum Chemistry
[1704.03732] Deep Q-learning from Demonstrations
[1705.07204] Ensemble Adversarial Training: Attacks and Defenses
[1705.07874] A Unified Approach to Interpreting Model Predictions
[1706.02216] Inductive Representation Learning on Large Graphs
[1706.02275] Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments
[1706.06083] Towards Deep Learning Models Resistant to Adversarial Attacks
[1706.08500] GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium
[1706.10295] Noisy Networks for Exploration
[1707.01926] Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting
[1707.02286] Emergence of Locomotion Behaviours in Rich Environments
[1707.03141] A Simple Neural Attentive Meta-Learner
[1707.06347] Proximal Policy Optimization Algorithms
[1707.06887] A Distributional Perspective on Reinforcement Learning
[1710.02298] Rainbow: Combining Improvements in Deep Reinforcement Learning
[1710.10196] Progressive Growing of GANs for Improved Quality, Stability, and Variation
[1710.10903] Graph Attention Networks
[1711.00937] Neural Discrete Representation Learning
[1711.03938] CARLA: An Open Urban Driving Simulator
[1711.04043] Few-Shot Learning with Graph Neural Networks
[1711.10907] Deep Reinforcement Learning for De-Novo Drug Design
[1712.01815] Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm
[1801.04406] Which Training Methods for GANs do actually Converge?
[1801.10247] FastGCN: Fast Learning with Graph Convolutional Networks via Importance Sampling
[1802.01548] Regularized Evolution for Image Classifier Architecture Search
[1802.03268] Efficient Neural Architecture Search via Parameter Sharing
[1802.05957] Spectral Normalization for Generative Adversarial Networks
[1802.09477] Addressing Function Approximation Error in Actor-Critic Methods
[1803.08494] Group Normalization
[1805.07722] Task-Agnostic Meta-Learning for Few-shot Learning
[1805.08318] Self-Attention Generative Adversarial Networks
[1805.11604] How Does Batch Normalization Help Optimization?
[1806.01261] Relational inductive biases, deep learning, and graph networks
[1806.01973] Graph Convolutional Neural Networks for Web-Scale Recommender Systems
[1806.02473] Graph Convolutional Policy Network for Goal-Directed Molecular Graph Generation
[1806.07366] Neural Ordinary Differential Equations
[1806.09055] DARTS: Differentiable Architecture Search
[1807.00734] The relativistic discriminator: a key element missing from standard GAN
[1807.03039] Glow: Generative Flow with Invertible 1x1 Convolutions
[1807.03748] Representation Learning with Contrastive Predictive Coding
[1807.05960] Meta-Learning with Latent Embedding Optimization
[1808.06670] Learning deep representations by mutual information estimation and maximization
[1809.01999] Recurrent World Models Facilitate Policy Evolution
[1809.11096] Large Scale GAN Training for High Fidelity Natural Image Synthesis
[1810.07218] Incremental Few-Shot Learning with Attention Attractor Networks
[1810.09502] How to train your MAML
[1811.03962] A Convergence Theory for Deep Learning via Over-Parameterization
[1812.00332] ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware
[1812.04948] A Style-Based Generator Architecture for Generative Adversarial Networks
[1812.09926] SNAS: Stochastic Neural Architecture Search
[1902.06720] Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent
[1902.10197] RotatE: Knowledge Graph Embedding by Relational Rotation in Complex Space
[1903.00374] Model-Based Reinforcement Learning for Atari
[1903.07293] Heterogeneous Graph Attention Network
[1904.08082] Self-Attention Graph Pooling
[1904.09237] On the Convergence of Adam and Beyond
[1904.12848] Unsupervised Data Augmentation for Consistency Training
[1905.00414] Similarity of Neural Network Representations Revisited
[1905.05301] Hierarchically Structured Meta-learning
[1905.06549] TapNet: Neural Network Augmented with Task-Adaptive Projection for Few-Shot Learning
[1905.12265] Strategies for Pre-training Graph Neural Networks
[1906.00446] Generating Diverse High-Fidelity Images with VQ-VAE-2
[1906.00910] Learning Representations by Maximizing Mutual Information Across Views
[1906.02629] When Does Label Smoothing Help?
[1907.04931] GraphSAINT: Graph Sampling Based Inductive Learning Method
[1907.08610] Lookahead Optimizer: k steps forward, 1 step back
[1908.03265] On the Variance of the Adaptive Learning Rate and Beyond
[1909.00025] Meta-Learning with Warped Gradient Descent
[1909.09157] Rapid Learning or Feature Reuse? Towards Understanding the Effectiveness of MAML
[1911.06455] Graph Transformer Networks
[1911.08265] Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model
[1912.01603] Dream to Control: Learning Behaviors by Latent Imagination
[1912.01703] PyTorch: An Imperative Style, High-Performance Deep Learning Library
[1912.02762] Normalizing Flows for Probabilistic Modeling and Inference
[1912.02781] AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty
[1912.03820] Meta-Learning without Memorization
[2002.01680] MAGNN: Metapath Aggregated Graph Neural Network for Heterogeneous Graph Embedding
[2002.09405] Learning to Simulate Complex Physics with Graph Networks
[2003.01332] Heterogeneous Graph Transformer
[2003.10580] Meta Pseudo Labels
[2006.05582] Contrastive Multi-View Representation Learning on Graphs
[2006.07733] Bootstrap your own latent: A new approach to self-supervised Learning
[2006.09661] Implicit Neural Representations with Periodic Activation Functions
[2006.09963] GCC: Graph Contrastive Coding for Graph Neural Network Pre-Training
[2006.10029] Big Self-Supervised Models are Strong Semi-Supervised Learners
[2111.09266] GFlowNet Foundations
[1310.1531] DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition
[1311.2524] Rich feature hierarchies for accurate object detection and semantic segmentation
[1311.2901] Visualizing and Understanding Convolutional Networks
[1312.4400] Network In Network
[1312.6199] Intriguing properties of neural networks
[1403.6382] CNN Features off-the-shelf: an Astounding Baseline for Recognition
[1404.7584] High-Speed Tracking with Kernelized Correlation Filters
[1405.0312] Microsoft COCO: Common Objects in Context
[1405.3531] Return of the Devil in the Details: Delving Deep into Convolutional Nets
[1406.2199] Two-Stream Convolutional Networks for Action Recognition in Videos
[1406.4729] Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition
[1409.0575] ImageNet Large Scale Visual Recognition Challenge
[1409.1556] Very Deep Convolutional Networks for Large-Scale Image Recognition
[1409.4842] Going Deeper with Convolutions
[1411.4038] Fully Convolutional Networks for Semantic Segmentation
[1411.4389] Long-term Recurrent Convolutional Networks for Visual Recognition and Description
[1411.4555] Show and Tell: A Neural Image Caption Generator
[1412.0767] Learning Spatiotemporal Features with 3D Convolutional Networks
[1412.2306] Deep Visual-Semantic Alignments for Generating Image Descriptions
[1412.7062] Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs
[1501.00092] Image Super-Resolution Using Deep Convolutional Networks
[1502.03044] Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
[1502.03240] Conditional Random Fields as Recurrent Neural Networks
[1502.04623] DRAW: A Recurrent Neural Network For Image Generation
[1503.03832] FaceNet: A Unified Embedding for Face Recognition and Clustering
[1505.04597] U-Net: Convolutional Networks for Biomedical Image Segmentation
[1506.01497] Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
[1506.02025] Spatial Transformer Networks
[1506.02640] You Only Look Once: Unified, Real-Time Object Detection
[1506.05751] Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks
[1508.06576] A Neural Algorithm of Artistic Style
[1511.00561] SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation
[1511.04587] Accurate Image Super-Resolution Using Very Deep Convolutional Networks
[1511.05298] Structural-RNN: Deep Learning on Spatio-Temporal Graphs
[1511.07122] Multi-Scale Context Aggregation by Dilated Convolutions
[1512.00567] Rethinking the Inception Architecture for Computer Vision
[1512.02325] SSD: Single Shot MultiBox Detector
[1512.03385] Deep Residual Learning for Image Recognition
[1512.04150] Learning Deep Features for Discriminative Localization
[1601.06759] Pixel Recurrent Neural Networks
[1602.01528] EIE: Efficient Inference Engine on Compressed Deep Neural Network
[1602.07261] Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning
[1602.07360] SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size
[1603.05027] Identity Mappings in Deep Residual Networks
[1603.05279] XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks
[1603.06937] Stacked Hourglass Networks for Human Pose Estimation
[1603.08155] Perceptual Losses for Real-Time Style Transfer and Super-Resolution
[1603.08511] Colorful Image Colorization
[1604.01685] The Cityscapes Dataset for Semantic Urban Scene Understanding
[1604.03540] Training Region-based Object Detectors with Online Hard Example Mining
[1604.06573] Convolutional Two-Stream Network Fusion for Video Action Recognition
[1604.07379] Context Encoders: Feature Learning by Inpainting
[1605.05396] Generative Adversarial Text to Image Synthesis
[1605.06211] Fully Convolutional Networks for Semantic Segmentation
[1605.06409] R-FCN: Object Detection via Region-based Fully Convolutional Networks
[1605.07146] Wide Residual Networks
[1606.05328] Conditional Image Generation with PixelCNN Decoders
[1606.07536] Coupled Generative Adversarial Networks
[1606.09549] Fully-Convolutional Siamese Networks for Object Tracking
[1607.02533] Adversarial examples in the physical world
[1608.00859] Temporal Segment Networks: Towards Good Practices for Deep Action Recognition
[1608.06993] Densely Connected Convolutional Networks
[1608.08710] Pruning Filters for Efficient ConvNets
[1609.03552] Generative Visual Manipulation on the Natural Image Manifold
[1609.04802] Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network
[1609.05143] Target-driven Visual Navigation in Indoor Scenes using Deep Reinforcement Learning
[1610.02357] Xception: Deep Learning with Depthwise Separable Convolutions
[1610.02391] Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization
[1611.04076] Least Squares Generative Adversarial Networks
[1611.05431] Aggregated Residual Transformations for Deep Neural Networks
[1611.07004] Image-to-Image Translation with Conditional Adversarial Networks
[1611.08050] Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields
[1611.10012] Speed/accuracy trade-offs for modern convolutional object detectors
[1612.00593] PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation
[1612.01105] Pyramid Scene Parsing Network
[1612.03144] Feature Pyramid Networks for Object Detection
[1612.08242] YOLO9000: Better, Faster, Stronger
[1702.05464] Adversarial Discriminative Domain Adaptation
[1703.06211] Deformable Convolutional Networks
[1703.06868] Arbitrary Style Transfer in Real-time with Adaptive Instance Normalization
[1703.10593] Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks
[1704.00028] Improved Training of Wasserstein GANs
[1704.02901] Dynamic Edge-Conditioned Filters in Convolutional Neural Networks on Graphs
[1704.04861] MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
[1704.06904] Residual Attention Network for Image Classification
[1705.07750] Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset
[1706.02413] PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space
[1706.05587] Rethinking Atrous Convolution for Semantic Image Segmentation
[1707.01083] ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices
[1707.02921] Enhanced Deep Residual Networks for Single Image Super-Resolution
[1707.07012] Learning Transferable Architectures for Scalable Image Recognition
[1707.07998] Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering
[1708.02002] Focal Loss for Dense Object Detection
[1709.01507] Squeeze-and-Excitation Networks
[1710.09829] Dynamic Routing Between Capsules
[1711.03213] CyCADA: Cycle-Consistent Adversarial Domain Adaptation
[1711.04340] Data Augmentation Generative Adversarial Networks
[1711.06025] Learning to Compare: Relation Network for Few-Shot Learning
[1711.06897] Single-Shot Refinement Neural Network for Object Detection
[1711.07767] Receptive Field Block Net for Accurate and Fast Object Detection
[1711.07971] Non-local Neural Networks
[1711.11248] A Closer Look at Spatiotemporal Convolutions for Action Recognition
[1711.11585] High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs
[1712.00559] Progressive Neural Architecture Search
[1712.00726] Cascade R-CNN: Delving into High Quality Object Detection
[1712.04621] The Effectiveness of Data Augmentation in Image Classification using Deep Learning
[1801.03924] The Unreasonable Effectiveness of Deep Features as a Perceptual Metric
[1801.04381] MobileNetV2: Inverted Residuals and Linear Bottlenecks
[1801.07455] Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition
[1801.07698] ArcFace: Additive Angular Margin Loss for Deep Face Recognition
[1801.07829] Dynamic Graph CNN for Learning on Point Clouds
[1802.02611] Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation
[1802.08797] Residual Dense Network for Image Super-Resolution
[1803.07728] Unsupervised Representation Learning by Predicting Image Rotations
[1804.02767] YOLOv3: An Incremental Improvement
[1804.04732] Multimodal Unsupervised Image-to-Image Translation
[1804.08328] Taskonomy: Disentangling Task Transfer Learning
[1804.09458] Dynamic Few-Shot Visual Learning without Forgetting
[1805.09501] AutoAugment: Learning Augmentation Policies from Data
[1805.11724] Rethinking Knowledge Graph Propagation for Zero-Shot Learning
[1807.05520] Deep Clustering for Unsupervised Learning of Visual Features
[1807.06521] CBAM: Convolutional Block Attention Module
[1807.11164] ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design
[1807.11626] MnasNet: Platform-Aware Neural Architecture Search for Mobile
[1808.00897] BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation
[1809.00219] ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks
[1809.02983] Dual Attention Network for Scene Segmentation
[1811.08883] Rethinking ImageNet Pre-training
[1812.01187] Bag of Tricks for Image Classification with Convolutional Neural Networks
[1812.02391] Meta-Transfer Learning for Few-Shot Learning
[1812.05050] Fast Online Object Tracking and Segmentation: A Unifying Approach
[1812.08928] Slimmable Neural Networks
[1812.11703] SiamRPN++: Evolution of Siamese Visual Tracking with Very Deep Networks
[1901.02446] Panoptic Feature Pyramid Networks
[1901.05103] DeepSDF: Learning Continuous Signed Distance Functions for Shape Representation
[1902.09212] Deep High-Resolution Representation Learning for Human Pose Estimation
[1903.00241] Mask Scoring R-CNN
[1903.07291] Semantic Image Synthesis with Spatially-Adaptive Normalization
[1903.12355] Local Aggregation for Unsupervised Learning of Visual Embeddings
[1904.01355] FCOS: Fully Convolutional One-Stage Object Detection
[1904.07392] NAS-FPN: Learning Scalable Feature Pyramid Architecture for Object Detection
[1904.07850] Objects as Points
[1905.02244] Searching for MobileNetV3
[1905.08233] Few-Shot Adversarial Learning of Realistic Neural Talking Head Models
[1905.09272] Data-Efficient Image Recognition with Contrastive Predictive Coding
[1905.11946] EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
[1906.05849] Contrastive Multiview Coding
[1906.06818] Stacked Capsule Autoencoders
[1907.02544] Large Scale Adversarial Representation Learning
[1907.05740] Gated-SCNN: Gated Shape CNNs for Semantic Segmentation
[1907.10786] Interpreting the Latent Space of GANs for Semantic Face Editing
[1907.11922] MaskGAN: Towards Diverse and Interactive Facial Image Manipulation
[1909.06161] Brain-Like Object Recognition with High-Performing Shallow Recurrent ANNs
[1911.04252] Self-training with Noisy Student improves ImageNet classification
[1911.05722] Momentum Contrast for Unsupervised Visual Representation Learning
[1911.09070] EfficientDet: Scalable and Efficient Object Detection
[1912.01991] Self-Supervised Learning of Pretext-Invariant Representations
[1912.04958] Analyzing and Improving the Image Quality of StyleGAN
[2001.00326] NAS-Bench-201: Extending the Scope of Reproducible Neural Architecture Search
[2001.08735] Cross-Domain Few-Shot Classification via Learned Feature-Wise Transformation
[2002.05709] A Simple Framework for Contrastive Learning of Visual Representations
[2003.04297] Improved Baselines with Momentum Contrastive Learning
[2003.04668] Incremental Few-Shot Object Detection
[2003.13678] Designing Network Design Spaces
[2004.08955] ResNeSt: Split-Attention Networks
[2004.10934] YOLOv4: Optimal Speed and Accuracy of Object Detection
[2005.05535] DeepFaceLab: A simple, flexible and extensible face swapping framework
[2005.10243] What Makes for Good Views for Contrastive Learning?
[2005.12872] End-to-End Object Detection with Transformers
[2006.09882] Unsupervised Learning of Visual Features by Contrasting Cluster Assignments
[2010.11929] An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
[2011.10566] Exploring Simple Siamese Representation Learning
[2012.12877] Training data-efficient image transformers & distillation through attention
[2103.00020] Learning Transferable Visual Models From Natural Language Supervision
[2103.03230] Barlow Twins: Self-Supervised Learning via Redundancy Reduction
[2103.14030] Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
[2103.15459] Capsule Network is Not More Robust than Convolutional Network
[2104.07658] Self-supervised Video Object Segmentation by Motion Grouping
[2105.01601] MLP-Mixer: An all-MLP Architecture for Vision
[2105.04906] VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning
[2107.08430] YOLOX: Exceeding YOLO Series in 2021
[2109.10852] Pix2seq: A Language Modeling Framework for Object Detection
[2110.07641] Non-deep Networks
[2111.06377] Masked Autoencoders Are Scalable Vision Learners
[1301.3781] Efficient Estimation of Word Representations in Vector Space
[1303.5778] Speech Recognition with Deep Recurrent Neural Networks
[1308.0850] Generating Sequences With Recurrent Neural Networks
[1310.4546] Distributed Representations of Words and Phrases and their Compositionality
[1404.2188] A Convolutional Neural Network for Modelling Sentences
[1405.4053] Distributed Representations of Sentences and Documents
[1408.5882] Convolutional Neural Networks for Sentence Classification
[1409.0473] Neural Machine Translation by Jointly Learning to Align and Translate
[1409.1259] On the Properties of Neural Machine Translation: Encoder-Decoder Approaches
[1409.3215] Sequence to Sequence Learning with Neural Networks
[1410.5401] Neural Turing Machines
[1412.5567] Deep Speech: Scaling up end-to-end speech recognition
[1503.00075] Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks
[1505.00468] VQA: Visual Question Answering
[1506.03340] Teaching Machines to Read and Comprehend
[1506.05869] A Neural Conversational Model
[1506.07285] Ask Me Anything: Dynamic Memory Networks for Natural Language Processing
[1506.07503] Attention-Based Models for Speech Recognition
[1508.04025] Effective Approaches to Attention-based Neural Machine Translation
[1508.05326] A large annotated corpus for learning natural language inference
[1508.07909] Neural Machine Translation of Rare Words with Subword Units
[1509.00685] A Neural Attention Model for Abstractive Sentence Summarization
[1509.01626] Character-level Convolutional Networks for Text Classification
[1511.06709] Improving Neural Machine Translation Models with Monolingual Data
[1511.08308] Named Entity Recognition with Bidirectional LSTM-CNNs
[1512.02595] Deep Speech 2: End-to-End Speech Recognition in English and Mandarin
[1601.06733] Long Short-Term Memory-Networks for Machine Reading
[1602.02410] Exploring the Limits of Language Modeling
[1603.01354] End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF
[1603.01360] Neural Architectures for Named Entity Recognition
[1606.05250] SQuAD: 100,000+ Questions for Machine Comprehension of Text
[1607.01759] Bag of Tricks for Efficient Text Classification
[1607.04606] Enriching Word Vectors with Subword Information
[1609.03499] WaveNet: A Generative Model for Raw Audio
[1611.01603] Bidirectional Attention Flow for Machine Comprehension
[1612.08083] Language Modeling with Gated Convolutional Networks
[1701.02810] OpenNMT: Open-Source Toolkit for Neural Machine Translation
[1701.06547] Adversarial Learning for Neural Dialogue Generation
[1703.03130] A Structured Self-attentive Sentence Embedding
[1703.10135] Tacotron: Towards End-to-End Speech Synthesis
[1704.00051] Reading Wikipedia to Answer Open-Domain Questions
[1704.04368] Get To The Point: Summarization with Pointer-Generator Networks
[1705.03122] Convolutional Sequence to Sequence Learning
[1706.01427] A simple neural network module for relational reasoning
[1706.03762] Attention Is All You Need
[1707.07328] Adversarial Examples for Evaluating Reading Comprehension Systems
[1708.00107] Learned in Translation: Contextualized Word Vectors
[1712.05884] Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions
[1801.06146] Universal Language Model Fine-tuning for Text Classification
[1802.05365] Deep contextualized word representations
[1802.08435] Efficient Neural Audio Synthesis
[1806.03822] Know What You Don't Know: Unanswerable Questions for SQuAD
[1807.03819] Universal Transformers
[1809.05679] Graph Convolutional Networks for Text Classification
[1809.07454] Conv-TasNet: Surpassing Ideal Time-Frequency Magnitude Masking for Speech Separation
[1809.08267] Neural Approaches to Conversational AI
[1810.04805] BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
[1811.00002] WaveGlow: A Flow-based Generative Network for Speech Synthesis
[1901.02860] Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
[1901.07291] Cross-lingual Language Model Pretraining
[1901.11117] The Evolved Transformer
[1901.11504] Multi-Task Deep Neural Networks for Natural Language Understanding
[1904.09223] ERNIE: Enhanced Representation through Knowledge Integration
[1904.09751] The Curious Case of Neural Text Degeneration
[1904.10509] Generating Long Sequences with Sparse Transformers
[1905.02450] MASS: Masked Sequence to Sequence Pre-training for Language Generation
[1905.03197] Unified Language Model Pre-training for Natural Language Understanding and Generation
[1905.07129] ERNIE: Enhanced Language Representation with Informative Entities
[1906.01502] How multilingual is Multilingual BERT?
[1906.04341] What Does BERT Look At? An Analysis of BERT's Attention
[1906.08237] XLNet: Generalized Autoregressive Pretraining for Language Understanding
[1907.11692] RoBERTa: A Robustly Optimized BERT Pretraining Approach
[1907.12412] ERNIE 2.0: A Continual Pre-training Framework for Language Understanding
[1908.08530] VL-BERT: Pre-training of Generic Visual-Linguistic Representations
[1908.10084] Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
[1909.10351] TinyBERT: Distilling BERT for Natural Language Understanding
[1909.11942] ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
[1910.01108] DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
[1910.10683] Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
[1910.11856] On the Cross-lingual Transferability of Monolingual Representations
[1911.00536] DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation
[1911.02116] Unsupervised Cross-lingual Representation Learning at Scale
[1911.03894] CamemBERT: a Tasty French Language Model
[2001.04451] Reformer: The Efficient Transformer
[2003.10555] ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
[2004.05150] Longformer: The Long-Document Transformer
[2005.14165] Language Models are Few-Shot Learners
[1206.5538] Representation Learning: A Review and New Perspectives
[1603.07285] A guide to convolution arithmetic for deep learning
[1705.02801] Graph Embedding Techniques, Applications, and Performance: A Survey
[1709.07604] A Comprehensive Survey of Graph Embedding: Problems, Techniques and Applications
[1801.00553] Threat of Adversarial Attacks on Deep Learning in Computer Vision: A Survey
[1802.00614] Visual Interpretability for Deep Learning: a Survey
[1802.03601] Deep Visual Domain Adaptation: A Survey
[1804.06655] Deep Face Recognition: A Survey
[1808.05377] Neural Architecture Search: A Survey
[1812.08434] Graph Neural Networks: A Review of Methods and Applications
[1901.00596] A Comprehensive Survey on Graph Neural Networks
[1901.04407] Self-Driving Cars: A Survey
[1901.06032] A Survey of the Recent Architectures of Deep Convolutional Neural Networks
[1902.06068] Deep Learning for Image Super-resolution: A Survey
[1904.05046] Generalizing from a Few Examples: A Survey on Few-Shot Learning
[1904.08067] Text Classification Algorithms: A Survey
[1905.05055] Object Detection in 20 Years: A Survey
[1906.01529] Generative Adversarial Networks in Computer Vision: A Survey and Taxonomy
[1907.09408] A Survey of Deep Learning-based Object Detection
[1907.12740] Deep Learning in Video Multi-Object Tracking: A Survey
[1912.00535] Deep Learning for Visual Tracking: A Comprehensive Survey
[2001.01582] A Survey on Machine Reading Comprehension Systems
[2001.05566] Image Segmentation Using Deep Learning: A Survey
[2002.00388] A Survey on Knowledge Graphs: Representation, Acquisition and Applications
[2003.08271] Pre-trained Models for Natural Language Processing: A Survey
[2004.05439] Meta-Learning in Neural Networks: A Survey
[2004.11149] A Comprehensive Overview and Survey of Recent Advances in Meta-Learning
[2006.01423] Monocular Human Pose Estimation: A Survey of Deep Learning-based Methods
[2006.08218] Self-supervised Learning: Generative or Contrastive
[2006.14799] Evaluation of Text Generation: A Survey
[2009.06732] Efficient Transformers: A Survey
[2102.10757] Self-Supervised Learning of Graph Neural Networks: A Unified Review
[2105.04387] Recent Advances in Deep Learning Based Dialogue Systems: A Systematic Survey
[2109.06668] Exploration in Deep Reinforcement Learning: A Comprehensive Survey
[2111.07624] Attention Mechanisms in Computer Vision: A Survey
[1604.06778] Benchmarking Deep Reinforcement Learning for Continuous Control
[1804.07461] GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
[1810.00826] How Powerful are Graph Neural Networks?
[1904.04232] A Closer Look at Few-shot Classification
[1905.00537] SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems
[1909.02729] A Baseline for Few-Shot Image Classification
[2003.00982] Benchmarking Graph Neural Networks
[2003.04390] Meta-Baseline: Exploring Simple Meta-Learning for Few-Shot Learning
[2005.00687] Open Graph Benchmark: Datasets for Machine Learning on Graphs