Skip to content
This repository was archived by the owner on Jun 2, 2020. It is now read-only.

iterator works slow #12

Closed
gabrieldodan opened this issue Aug 28, 2014 · 9 comments
Closed

iterator works slow #12

gabrieldodan opened this issue Aug 28, 2014 · 9 comments

Comments

@gabrieldodan
Copy link

Hi, I have made some performance tests and I noticed that the iterator works very slow. I think is because of callback for each iteration. On each iteration it call the callback function, that's expensive. I think a Sync version for next() function would be very useful, e.g. iterator.nextSync() that should return the current key/value pair. What you think ?

Besides that, the library is really fast !

@stephenmathieson
Copy link
Owner

hmm yeah a sync version may be nice. I haven't hit performance issues with #next(), but we do jump from JS to c++ a lot, so I believe it may be slow.

the next addition I make to this lib should be node 0.11 support, but I may find time for sync iterarors. if you opened a pr, I'd happily merge, btw ;)

@gabrieldodan
Copy link
Author

Thanks :) Most probably is slow because of callbacks. I have also compared the performance of Async get/set vs Sync get/set. Async versions are much slower.

Also, a new suggestion. Would be very useful to have batch operations, for example to read multiple keys in one shot , a functionality somehow similar to Redis MULTI. e.g. getMulti(['k1', 'k2', 'k3'], cb) where cb is cb(err, values) In this way you can avoid so much jumps from JS to C++, you call only one callback ;)

@stephenmathieson
Copy link
Owner

yeah, the async stuff has to:

  1. cross from JS to c++
  2. be added to the libuv pool
  3. wait its turn for execution
  4. actually run
  5. callback js

whereas the sync versions:

  1. cross from JS to c++
  2. run

quite a few less things to do ;)

a mutli-get makes sense too. it lands out of scope of the iterator, unless we'd do something like:

var iterator = db.iterator()
iterator.next(5 /* get 5 the next keys */, function (err, values) {
  if (err) handleError(err)
  // values = [ ... ]
})

also, if you run $ make benchmarks in the root of this repo, you'll be able to see iterator performance vs LevelDB. i think you'll be impressed ;)

edit:

                      Sophist#iterator
              70 op/s » iterate 1000 keys

                      LevelDOWN#iterator
              46 op/s » iterate 1000 keys

@gabrieldodan
Copy link
Author

I am already impressed by this library :) seems it is perfect for my needs. It is really fast except Async iterators, according to sophia benchmarks http://sphia.org/benchmarks.html reading data using iterators should by much much faster than using get(key)

iterator.next(5, cb) would be useful also but not as critical as Sync iterator. I have made many tests with thousands of databases and hundreds of keys per database, it reads data very fast, I am curious how fast it reads using a Sync iterator. Any idea how would take you to implement the Sync iterator ? If you are very busy I could take a look at C++ source maybe I can implement myself Sync iterator.

@stephenmathieson
Copy link
Owner

a sync #next() shouldn't be hard. the async method is here.

for sync, maybe something like (this has not been tested!):

NAN_METHOD(Iterator::NextSync) {
  NanScope();
  Iterator *self = ObjectWrap::Unwrap<Iterator>(args.This());
  sophia::IteratorResult *result = self->it->Next();
  if (result) {
    v8::Local<v8::Object> obj = NanNew<v8::Object>();
    obj->Set(NanNew("key"), NanNew(result->key));
    obj->Set(NanNew("value"), NanNew(result->value));
    NanReturnValue(obj);
  } else {
    NanReturnUndefined();
  }
}

stephenmathieson pushed a commit that referenced this issue Sep 2, 2014
@stephenmathieson
Copy link
Owner

@gabrieldodan I've been working on a new release (1.0.0). I've added Iterator#nextSync() as well as Iterator#endSync(). If you've got time, checkout the wip-node-11 branch ;)

@gabrieldodan
Copy link
Author

Thanks, will do some tests tomorrow :) Also if you could implement muti-read that will be great! The idea is to read batch of data in only one call, to avoid too many jumps from JS to C++. An iterator like this:

iterate({start: 'k1', end:'k100', reverse: false}, function (err, values) {
});

iterate({start: 'k1', limit:20, reverse: false}, function (err, values) { // read only 20 records starting from k1
});

It doesn't make sense to return object by object, do the iteration on C++ and give me only the results! Will be also useful an option to return only the keys or only the values, or both. Thanks.

@stephenmathieson
Copy link
Owner

hmm.. iterating like that over a large repository would cause significant performance issues when dealing with any sort of concurrency. assuming we'd only push one task into uv's queue (read everything), everything else would be blocked (http stuff, other iterators, etc.).

now, if we queued a job for every read in uv's pool, we'd avoid blocking the main thread, but the iterator would be slow (again). all we'd do is reduce the number of times we go from js <-> c++.

also: opened separate thread: #13.

@stephenmathieson
Copy link
Owner

version 1.0.0 now has iterator#nextSync() implemented.

Will address iterator#next(N, fn) later on.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants