Random keys, the birthday paradox, and performance #1076

Open
LourensVeen opened this issue Oct 8, 2024 · 0 comments

By default, AMUSE generates particle keys as 64-bit (pseudo)random numbers. By the birthday paradox, you need about 5 billion particles to have a 50% chance of at least one key collision. For 100 million particles (which is more realistic in AMUSE), the probability of a collision is about 1 in 3700 per run, and for a billion particles it's about 1 in 37. Particle ids are 32-bit signed integers, which allows about 2.1 billion particles, so if you have enough memory it's possible to get into trouble.
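As a quick check of these numbers, here is a minimal sketch using the standard birthday approximation p ≈ 1 - exp(-n(n-1)/(2N)) with N = 2^64. The function names are made up for illustration and are not part of AMUSE.

```python
import math

KEY_BITS = 64  # width of the random keys discussed above


def collision_probability(n, bits=KEY_BITS):
    """Approximate chance of at least one collision among n random keys."""
    space = 2.0 ** bits
    # Birthday approximation: p ~= 1 - exp(-n (n - 1) / (2 N)).
    # expm1 keeps precision when the exponent is tiny.
    return -math.expm1(-n * (n - 1) / (2.0 * space))


def particles_for_probability(p, bits=KEY_BITS):
    """Approximate number of keys needed to reach collision probability p."""
    space = 2.0 ** bits
    return math.sqrt(2.0 * space * math.log(1.0 / (1.0 - p)))


if __name__ == "__main__":
    for n in (10**8, 10**9):
        p = collision_probability(n)
        print(f"{n:>13,d} particles: about 1 in {1.0 / p:,.0f} runs")
    n_half = particles_for_probability(0.5)
    print(f"50% collision chance at about {n_half:.2e} particles")
```

The approximation assumes keys are drawn independently and uniformly, which is only as good as the underlying random number generator.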

It seems to me that it should be possible to just have a single key generator object (a singleton) that generates keys in batches and hands out consecutive ranges of keys. There's actually already a class that does that, BasicUniqueKeyGenerator, it's just not used by default. If particles are made in batches, and codes give them consecutive ids (which they probably do, since the id is probably the index into the data array), and keys are generated in the same batches and are also consecutive, then the mapping between keys and ids becomes a piecewise linear function with slope 1 that can be described as a list of boundaries and offsets.

That would save a ton of memory compared to the full index lists currently in amuse.datamodel, and would therefore probably speed things up.
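To make the idea concrete, here is a rough sketch of such a piecewise linear mapping. Both classes are hypothetical and written only for this issue: `SequentialKeyGenerator` is a stand-in for a generator that hands out consecutive key ranges (it is not the actual BasicUniqueKeyGenerator API), and `KeyIndexMap` assumes batches are registered in increasing key order.

```python
import bisect


class SequentialKeyGenerator:
    """Hypothetical generator that hands out consecutive key ranges."""

    def __init__(self, first_key=1):
        self._next_key = first_key

    def next_range(self, count):
        """Reserve `count` consecutive keys and return the first one."""
        start = self._next_key
        self._next_key += count
        return start


class KeyIndexMap:
    """Slope-1 piecewise linear map from keys to indices.

    Stores one (boundary, length, offset) triple per batch instead of
    one entry per particle.
    """

    def __init__(self):
        self._starts = []   # first key of each batch, ascending
        self._counts = []   # number of keys in each batch
        self._offsets = []  # index minus key, constant within a batch

    def add_batch(self, first_key, first_index, count):
        # Assumption: batches arrive in increasing, non-overlapping key order.
        if self._starts and first_key < self._starts[-1] + self._counts[-1]:
            raise ValueError("batches must be added in increasing key order")
        self._starts.append(first_key)
        self._counts.append(count)
        self._offsets.append(first_index - first_key)

    def index_of(self, key):
        # Binary search for the batch whose key range contains `key`.
        pos = bisect.bisect_right(self._starts, key) - 1
        if pos < 0 or key >= self._starts[pos] + self._counts[pos]:
            raise KeyError(key)
        return key + self._offsets[pos]


if __name__ == "__main__":
    keys = SequentialKeyGenerator(first_key=1000)
    mapping = KeyIndexMap()
    mapping.add_batch(keys.next_range(5), first_index=0, count=5)
    mapping.add_batch(keys.next_range(3), first_index=10, count=3)
    print(mapping.index_of(1004), mapping.index_of(1006))
```

A lookup costs a binary search over the number of batches rather than a lookup in a per-particle table. Removing particles from the middle of a batch would split ranges, which this sketch does not handle.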

Do note that the code that deals with this is complex, and I don't understand it all, so there may be something keeping this from working. Also, I don't know if this key-to-index conversion is actually a bottleneck. Still, it seemed worth recording, so here's an issue.

@LourensVeen LourensVeen changed the title Random keys and the birthday paradox Random keys, the birthday paradox, and performance Oct 8, 2024