
English vectors for spaCy v3.4

Pre-release
adrianeboyd released this 14 Jul 11:07

English vectors trained for spaCy v3.4.0 using floret.

The en_vectors_fasttext vectors were trained with floret in fasttext mode and are the same vectors as in en_core_web_lg v3.4.0.

The floret vectors were trained in floret mode on the same data, in two sizes: 50K entries (md) and 200K entries (lg).

Note that the .bin files are only compatible with floret, not fastText. Load them with the floret command-line tool or the Python module:

```python
import floret
model = floret.load_model("en_vectors_floret_md.bin")
model.get_subwords("covid")
# (['<covid>', '<covi', 'covid', 'ovid>'], array([517646, 541731, 558180, 540981, 527325, 538060, 559280, 538021]))
model.get_nearest_neighbors("covid")
# [(0.70456463098526, 'Covid'), (0.6891582012176514, 'COVID'), (0.6806262135505676, 'covid-19'), (0.607974648475647, 'Covid-19'), (0.5875810384750366, 'COVID-19'), (0.5560713410377502, 'covid19'), (0.5450572371482849, 'coronavirus'), (0.5238808393478394, 'Covid19'), (0.5168178081512451, 'pandemic'), (0.5062406659126282, 'Coronavirus')]
```
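The subword list returned by `get_subwords("covid")` is the word wrapped in `<` and `>` boundary markers, followed by its character n-grams. The sketch below reproduces that list in plain Python. It is a simplified, fastText-style illustration, not floret's own code, and it assumes `minn=maxn=5`, which is inferred from the example output rather than stated in these notes. The function name `char_ngrams` is hypothetical. The index array in the output has 8 entries for 4 subwords, which is presumably because floret hashes each subword into more than one row of the vector table; this sketch does not model the hashing.

```python
from typing import List


def char_ngrams(word: str, minn: int = 5, maxn: int = 5) -> List[str]:
    """Return the wrapped word plus its character n-grams (hypothetical helper).

    minn=maxn=5 is an assumption inferred from the get_subwords("covid")
    example output, not a documented training setting for these vectors.
    """
    wrapped = f"<{word}>"      # add boundary markers around the word
    subwords = [wrapped]       # the whole wrapped word is listed first
    for start in range(len(wrapped)):
        for n in range(minn, maxn + 1):
            if start + n > len(wrapped):
                break          # no room left for an n-gram of this length
            subwords.append(wrapped[start:start + n])
    return subwords


print(char_ngrams("covid"))
# ['<covid>', '<covi', 'covid', 'ovid>']
```

Because the word is wrapped in boundary markers, prefixes such as `<covi` and suffixes such as `ovid>` get their own n-grams, distinct from the same letters appearing mid-word.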