English vectors for spaCy v3.4
Pre-release
English vectors trained for spaCy v3.4.0 using floret.
The en_vectors_fasttext vectors were trained with floret in fasttext mode and are the same vectors as in en_core_web_lg v3.4.0.
The floret vectors are trained in floret mode on the same data with 50K entries (md) and 200K entries (lg).
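As a rough illustration of how a table with a fixed number of entries can still cover arbitrary words, the sketch below hashes a subword into a few rows of a fixed-size table. It is not floret's implementation: the `rows_for` helper, the use of `hashlib.blake2b` as the hash function, and the default of two hashes per subword are all assumptions made for illustration.

```python
import hashlib
from typing import List


def rows_for(subword: str, table_size: int, num_hashes: int = 2) -> List[int]:
    """Map a subword to `num_hashes` row indices in a table with `table_size` rows.

    Hypothetical helper: blake2b is only a stand-in for whatever hash the
    real library uses.
    """
    rows = []
    for seed in range(num_hashes):
        # A different salt per seed acts as a separate hash function.
        digest = hashlib.blake2b(
            subword.encode("utf-8"),
            digest_size=8,
            salt=seed.to_bytes(8, "little"),
        ).digest()
        # Reduce the 64-bit hash to a valid row index.
        rows.append(int.from_bytes(digest, "little") % table_size)
    return rows


# 50_000 corresponds to the md table size; 200_000 would correspond to lg.
print(rows_for("covid", table_size=50_000))
```

Because every string maps to some rows, a lookup never fails for an unseen word. The trade-off is that unrelated subwords can share a row, and a smaller table makes such collisions more likely.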
Note that the .bin files are only compatible with floret, not fasttext. Load them with the floret command-line tool or the Python module:
import floret
model = floret.load_model("en_vectors_floret_md.bin")
model.get_subwords("covid")
# (['<covid>', '<covi', 'covid', 'ovid>'], array([517646, 541731, 558180, 540981, 527325, 538060, 559280, 538021]))
model.get_nearest_neighbors("covid")
# [(0.70456463098526, 'Covid'), (0.6891582012176514, 'COVID'), (0.6806262135505676, 'covid-19'), (0.607974648475647, 'Covid-19'), (0.5875810384750366, 'COVID-19'), (0.5560713410377502, 'covid19'), (0.5450572371482849, 'coronavirus'), (0.5238808393478394, 'Covid19'), (0.5168178081512451, 'pandemic'), (0.5062406659126282, 'Coronavirus')]
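The subword list shown for "covid" above is consistent with the whole word wrapped in `<` and `>` plus its character 5-grams. The sketch below reproduces that list in plain Python. The `char_ngrams` helper is hypothetical, and the n-gram length of 5 is inferred from the output above rather than read from the model settings.

```python
from typing import List


def char_ngrams(word: str, minn: int = 5, maxn: int = 5) -> List[str]:
    """Return the wrapped word followed by its character n-grams.

    Hypothetical helper; minn/maxn of 5 are assumed from the sample output.
    """
    wrapped = f"<{word}>"  # mark the word boundaries
    ngrams = [wrapped]     # the full wrapped word comes first
    for n in range(minn, maxn + 1):
        for start in range(len(wrapped) - n + 1):
            gram = wrapped[start:start + n]
            if gram != wrapped:  # avoid listing the full word twice
                ngrams.append(gram)
    return ngrams


print(char_ngrams("covid"))
# ['<covid>', '<covi', 'covid', 'ovid>']
```

Variants such as "Covid" and "COVID" produce different subwords because the n-grams are case-sensitive, which fits with them appearing as separate neighbors in the output above.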