Requirements: TensorFlow 1.14, TensorBoard 1.14
The code has not been rigorously tested; if you find a bug, PRs are welcome ^_^ ~
- Word2Vec: Sogou news data (a minimal skip-gram training sketch follows this list)
- Fasttext: Quora Kaggle classification data
- Doc2Vec [PV-DBOW/PV-DM]: Sogou news data
- skip-thought: BookCorpus crawled data
- quick-thought: BookCorpus crawled data
- CNN-LSTM: BookCorpus crawled data
- transformer: WMT English-to-Chinese translation task
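
As a quick orientation to the first item above, here is a minimal, self-contained sketch of the skip-gram Word2Vec objective with negative sampling (NCE loss) in TensorFlow 1.x, matching the TF 1.14 requirement. The vocabulary size, embedding dimension, sample count, and toy batch are illustrative assumptions, not values taken from this repo's implementation.

```python
import numpy as np
import tensorflow as tf

# Illustrative hyperparameters, not the repo's actual settings.
VOCAB_SIZE, EMB_SIZE, NUM_SAMPLED = 10000, 128, 64

center_ids = tf.placeholder(tf.int32, shape=[None])      # center words
context_ids = tf.placeholder(tf.int32, shape=[None, 1])  # context words (labels)

# Input embedding matrix: the word vectors we ultimately keep.
embeddings = tf.get_variable(
    "embeddings", [VOCAB_SIZE, EMB_SIZE],
    initializer=tf.random_uniform_initializer(-1.0, 1.0))
embed = tf.nn.embedding_lookup(embeddings, center_ids)

# Output-side weights/biases for the sampled (negative-sampling) objective.
nce_weights = tf.get_variable(
    "nce_weights", [VOCAB_SIZE, EMB_SIZE],
    initializer=tf.truncated_normal_initializer(stddev=EMB_SIZE ** -0.5))
nce_biases = tf.get_variable(
    "nce_biases", [VOCAB_SIZE], initializer=tf.zeros_initializer())

# NCE draws NUM_SAMPLED negative words per batch instead of a full softmax.
loss = tf.reduce_mean(tf.nn.nce_loss(
    weights=nce_weights, biases=nce_biases,
    labels=context_ids, inputs=embed,
    num_sampled=NUM_SAMPLED, num_classes=VOCAB_SIZE))
train_op = tf.train.GradientDescentOptimizer(1.0).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # Toy (center, context) pairs; real training streams pairs from the corpus.
    feed = {center_ids: np.array([1, 2, 3], dtype=np.int32),
            context_ids: np.array([[2], [3], [4]], dtype=np.int32)}
    _, batch_loss = sess.run([train_op, loss], feed)
```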
- [Word2Vec] Distributed Representations of Words and Phrases and their Compositionality (Google 2013)
- [Word2Vec] Efficient Estimation of Word Representations in Vector Space (Google 2013)
- [Word2Vec] word2vec Explained: Deriving Mikolov et al.’s Negative-Sampling Word-Embedding Method (2014)
- [Word2Vec] word2vec Parameter Learning Explained (2016)
- [Fasttext] Enriching Word Vectors with Subword Information (Facebook 2017)
- [Fasttext] Bag of Tricks for Efficient Text Classification (Facebook 2016)
- [Glove] Global Vectors for Word Representation (2014)
- [ELMo] Deep contextualized word representations (2018)
- [Doc2vec] Distributed Representations of Sentences and Documents (Google 2014)
- [Doc2vec] A Simple but Tough-to-Beat Baseline for Sentence Embeddings (2017)
- [Encoder-Decoder: Skip-Thought] Skip-Thought Vectors (2015)
- [Encoder-Decoder: Skip-Thought] Rethinking Skip-thought: A Neighborhood based Approach (2017)
- [Encoder-Decoder: CNN-LSTM] Learning Generic Sentence Representations Using Convolutional Neural Networks (2017)
- [Encoder-Decoder: Quick-Thought] An Efficient Framework for Learning Sentence Representations (Google 2018)
- [Transformer] Attention is all you need (2017)
- [FastSent|DVAE] Learning Distributed Representations of Sentences from Unlabelled Data (2016)
- [Siamese] Learning Text Similarity with Siamese Recurrent Networks (2016)
- [InferSent] Supervised Learning of Universal Sentence Representations from Natural Language Inference Data (2018)
- [GenSen] Learning General Purpose Distributed Sentence Representations via Large Scale Multitask Learning (2018)
- [USE] Universal Sentence Encoder (Google 2018)
- [ULMFit] Universal Language Model Fine-tuning for Text Classification (fastai 2018)
- [GPT] Improving Language Understanding by Generative Pre-Training (openai 2018)
- [Bert] BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (Google 2019)
- [Sentence-BERT] Sentence Embeddings using Siamese BERT-Networks (2019)
- [Bert-flow] On the Sentence Embeddings from Pre-trained Language Models (2020)
- [Representation] Fine-Grained Analysis of Sentence Embedding Using Auxiliary Prediction Tasks (2017)
- [Representation] What You Can Cram into a Single Vector: Probing Sentence Embeddings for Linguistic Properties (2018)
- [Representation] Assessing Composition in Sentence Vector Representations (2018)
- The Omnipotent Embedding 1 - Word2vec model explained & code implementation
- The Omnipotent Embedding 2 - FastText word vectors & text classification
- The Omnipotent Embedding 3 - From word2vec to Doc2vec [PV-DM/PV-DBOW]
- The Omnipotent Embedding 4 - skip-thought & tf-Seq2Seq source-code walkthrough
- The Omnipotent Embedding 5 - skip-thought's siblings [Trim/CNN-LSTM/quick-thought]
- The Omnipotent Embedding 6 - Entering the Transformer era ~ model explained & code implementation
- The Omnipotent Embedding 7 - Exploring universal text representations [FastSent/InferSent/GenSen/USE]