Basically what I would want is to run something like this:
import os
import random

from annoy import AnnoyIndex

num_rows = 10000
num_trees = 10
num_dims = 512

try:
    os.remove("annoy_idx.ann")
except FileNotFoundError:
    pass

annoy_idx = AnnoyIndex(num_dims, "angular")
annoy_idx.on_disk_build("annoy_idx.ann")
for idx in range(num_rows):
    vector = [random.gauss(0, 1) for _ in range(num_dims)]
    annoy_idx.add_item(idx, vector)
annoy_idx.serialize("annoy_idx.state")  # XXX - This is the magic I'm looking for
and then (after that program is done and has exited) I would like to continue appending data with something like this (this adds 10,000 new rows with indices 10,000, ..., 19,999):
import os
import random

from annoy import AnnoyIndex

num_rows = 10000
num_trees = 10
num_dims = 512

try:
    os.remove("annoy_idx.ann")
except FileNotFoundError:
    pass

annoy_idx = AnnoyIndex(num_dims, "angular")
annoy_idx.deserialize("annoy_idx.state")  # XXX - This is the magic I'm looking for
for idx in range(num_rows):
    vector = [random.gauss(0, 1) for _ in range(num_dims)]
    annoy_idx.add_item(idx + num_rows, vector)  # XXX - Note the increase in idx variable
So basically what I want is for there to be a serialize/deserialize ability so that I can continue the flow. It seems to me like the protected data here would need to be serialized:
https://github.com/spotify/annoy/blob/master/src/annoylib.h#L847-L885
In my case it seems to basically come down to serializing the node here:
https://github.com/spotify/annoy/blob/master/src/annoylib.h#L442-L463
So my question is the following:
How realistic is this? More specifically, assuming that I am able to successfully serialize/deserialize the state, does it seem like this would play well with the mmap in the on_disk_build() step? This is maybe too general a question, but basically my point is: is this totally crazy? Are there obvious flaws with my thinking if I decided to go this route?

Thanks for any help!
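For comparison, the only fallback I can think of that avoids touching Annoy's internals is to keep the raw vectors in a side file and rebuild the whole index after every append. Below is a minimal sketch of that idea, not a proposal for the library. The helper names (`append_vectors`, `iter_vectors`, `rebuild_index`) and the flat float32 file layout are my own assumptions; the only Annoy calls used are `AnnoyIndex`, `on_disk_build`, `add_item` and `build`.

```python
import os
import struct


def append_vectors(path, vectors, num_dims):
    """Append vectors to a flat little-endian float32 file (hypothetical side store)."""
    fmt = f"<{num_dims}f"
    with open(path, "ab") as fh:
        for vector in vectors:
            if len(vector) != num_dims:
                raise ValueError("vector has the wrong dimensionality")
            fh.write(struct.pack(fmt, *vector))


def iter_vectors(path, num_dims):
    """Yield (row_index, vector) pairs in the order the rows were appended."""
    fmt = f"<{num_dims}f"
    row_size = struct.calcsize(fmt)
    with open(path, "rb") as fh:
        idx = 0
        while True:
            chunk = fh.read(row_size)
            if len(chunk) < row_size:
                break  # end of file (or a truncated trailing row)
            yield idx, list(struct.unpack(fmt, chunk))
            idx += 1


def rebuild_index(store_path, index_path, num_dims, num_trees):
    """Rebuild the Annoy index from scratch using every row in the side store."""
    from annoy import AnnoyIndex  # imported lazily; only this function needs it

    if os.path.exists(index_path):
        os.remove(index_path)
    annoy_idx = AnnoyIndex(num_dims, "angular")
    annoy_idx.on_disk_build(index_path)
    for idx, vector in iter_vectors(store_path, num_dims):
        annoy_idx.add_item(idx, vector)
    annoy_idx.build(num_trees)  # with on_disk_build, no separate save() is needed
    return annoy_idx
```

Each run would call `append_vectors(...)` with its new rows and then `rebuild_index(...)`. The row index is implied by the position in the file, so the second run's rows land at 10,000, ..., 19,999 without tracking an offset. The obvious cost is that every append re-reads all vectors and rebuilds every tree, which is exactly what I am hoping the serialize/deserialize route could avoid.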