Latest commit: bd814d9 by klb3713, Oct 30, 2013

Name	Last commit message	Last commit date
LICENSE	add c++ version	Oct 29, 2013
README.txt	add c++ version	Oct 29, 2013
compute-accuracy.c	add c++ version	Oct 29, 2013
demo-analogy.sh	add c++ version	Oct 29, 2013
demo-classes.sh	add c++ version	Oct 29, 2013
demo-phrase-accuracy.sh	add c++ version	Oct 29, 2013
demo-phrases.sh	add c++ version	Oct 29, 2013
demo-word-accuracy.sh	add c++ version	Oct 29, 2013
demo-word.sh	add c++ version	Oct 29, 2013
distance.c	add c++ version	Oct 29, 2013
makefile	add c++ version	Oct 29, 2013
questions-phrases.txt	add python version	Oct 30, 2013
questions-words.txt	add python version	Oct 30, 2013
word-analogy.c	add c++ version	Oct 29, 2013
word2phrase.c	add c++ version	Oct 29, 2013
word2vec.c	add c++ version	Oct 29, 2013

README.txt

Tools for computing distributed representations of words
--------------------------------------------------------

We provide an implementation of the Continuous Bag-of-Words (CBOW) and Skip-gram (SG) models, as well as several demo scripts.

Given a text corpus, the word2vec tool learns a vector for every word in the vocabulary using the Continuous
Bag-of-Words or the Skip-gram neural network architecture. The user should specify the following:
 - desired vector dimensionality
 - the size of the context window for either the Skip-gram or the Continuous Bag-of-Words model
 - training algorithm: hierarchical softmax and/or negative sampling
 - threshold for downsampling frequent words
 - number of threads to use
 - the format of the output word vector file (text or binary)

Usually, the other hyper-parameters, such as the learning rate, do not need to be tuned for different training sets.

The script demo-word.sh downloads a small (100MB) text corpus from the web and trains a small word vector model. After training
finishes, the user can interactively explore the similarity of the words.

More information about the scripts is provided at https://code.google.com/p/word2vec/
