Retrieval speed up #638
-
30 minutes seems right for Robust04InstructionRetrieval with a smaller model: it's 52k * 2 embeddings for that dataset, and you can cut that in half by passing in the flag. For NeuCLIR, implementing a cache (as in #381 and #354) would also cut the time in half. For the full-scale retrieval tasks like NeuCLIR or the to-be-implemented MIRACL datasets (#198), we're talking about millions of passages to embed. I assume this is comparable to the MS MARCO dataset in the standard MTEB benchmark, but I don't have those numbers offhand. There was some discussion in #381 about ways we could speed it up. The most effective would be subsampling the collection, although that would make our scores incompatible with the current benchmark scores on the task and lead to some confusion. I don't think there's an easy solution that solves this and keeps the benchmark comparable, other than adding more GPUs... However, we can probably shave off some percentage (~20%) with some of the items mentioned in that thread about dataloaders and such.
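For reference, a minimal sketch of what the embedding cache idea could look like (the `EmbeddingCache` class and `encode_fn` hook are illustrative, not mteb's actual API): passages are keyed by a hash of their text, so repeated passages — within a corpus or across tasks — only get embedded once.

```python
import hashlib

import numpy as np


class EmbeddingCache:
    """Hypothetical sketch of a text -> embedding cache (not mteb's real API)."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def _key(text: str) -> str:
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    def encode(self, texts, encode_fn):
        # Deduplicate and embed only the texts we haven't seen before.
        missing = list(
            {self._key(t): t for t in texts if self._key(t) not in self._store}.values()
        )
        if missing:
            for t, emb in zip(missing, encode_fn(missing)):
                self._store[self._key(t)] = emb
        # Assemble results in the original order, all served from the cache.
        return np.stack([self._store[self._key(t)] for t in texts])


# Dummy encoder standing in for a real model.encode() call.
def toy_encode(texts):
    return [np.full(4, float(len(t))) for t in texts]


cache = EmbeddingCache()
first = cache.encode(["a", "bb", "a"], toy_encode)
second = cache.encode(["bb", "ccc"], toy_encode)  # "bb" is served from the cache
```

On a second task that shares the corpus, `encode_fn` would only be invoked for genuinely new passages, which is where the rough "cut the time in half" estimate comes from.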
-
I experimented a bit with making the embedding process faster. A couple of things I tried to speed things up:
Things I haven't yet tried:
These could in theory speed it up somewhat, but I'm not sure it would be worth the effort.
-
Some of the longest eval times for retrieval datasets are `Robust04InstructionRetrieval` (1943 s), `NeuCLIR2022Retrieval` (37k s — 10+ hours!) and `NeuCLIR2023Retrieval` (similar). Open to suggestions about ways to speed up retrieval tasks. If the bottleneck is in the encoding process (e.g. for large corpora), maybe we can leverage multiprocessing from sentence transformers.
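On the multiprocessing idea: sentence-transformers does ship a multi-process API (`start_multi_process_pool()` / `encode_multi_process()`) that shards a corpus across one worker per device. A toy sketch of the underlying pattern — shard the corpus, encode chunks in parallel, reassemble in order — using a thread pool and a stub encoder so it runs without a GPU or the library installed (all names below are illustrative):

```python
from multiprocessing.pool import ThreadPool

import numpy as np


# Stub standing in for model.encode(); the real sentence-transformers
# equivalent is encode_multi_process(), with one worker process per GPU.
def encode_chunk(texts):
    return np.array([[float(len(t))] for t in texts])


def encode_corpus(texts, n_workers=2, chunk_size=2):
    # Shard the corpus into chunks and encode them in parallel.
    chunks = [texts[i:i + chunk_size] for i in range(0, len(texts), chunk_size)]
    with ThreadPool(n_workers) as pool:
        parts = pool.map(encode_chunk, chunks)  # map() preserves chunk order
    # Reassemble into a single (n_texts, dim) matrix.
    return np.vstack(parts)


corpus = ["a", "bb", "ccc", "dddd", "eeeee"]
embs = encode_corpus(corpus)
```

This only helps when encoding is compute-bound and there are multiple devices to shard across; on a single GPU the gains would mostly come from larger batch sizes and dataloader improvements instead.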
CC @KennethEnevoldsen @imenelydiaker @Muennighoff @x-tabdeveloping @orionw and anyone who'd be interested.