You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
By default, AMUSE generates particle keys as 64-bit (pseudo)random numbers. By the birthday paradox, you need about 5 billion particles to have a 50% chance of a collision. For 100 million particles (which is more realistic in AMUSE), the probability is about 1 in 3700 runs, for a billion particles it's about 1 in 37. Particle ids are 32-bit signed integers, so if you have enough memory it's possible to get in trouble.
It seems to me that it should be possible to just have a single key generator object (a singleton) that is used to generate keys in batches, and which hands out subsequent ranges of keys. There's already a class that does that actually, BasicUniqueKeyGenerator, it's just not used by default. If particles are made in batches, and codes give them subsequent ids (which they probably do, since it's probably the index into the data array), and keys are generated in the same batches, and are also given subsequent ids, then the mapping between them becomes a piecewise linear function with slope 1 that can be described as a list of boundaries and offsets.
That would save a ton of memory compared to the full index lists currently in amuse.datamodel, and would therefore probably speed things up.
Do note that the code that deals with this is complex, and I don't understand it all, so there may be something keeping this from working. Also, I don't know if this keys to indices conversion is actually a bottleneck. Still, it seemed worth recording, so here's an issue.
The text was updated successfully, but these errors were encountered:
LourensVeen
changed the title
Random keys and the birthday paradox
Random keys, the birthday paradox, and performance
Oct 8, 2024
By default, AMUSE generates particle keys as 64-bit (pseudo)random numbers. By the birthday paradox, you need about 5 billion particles to have a 50% chance of a collision. For 100 million particles (which is more realistic in AMUSE), the probability is about 1 in 3700 runs, for a billion particles it's about 1 in 37. Particle ids are 32-bit signed integers, so if you have enough memory it's possible to get in trouble.
It seems to me that it should be possible to just have a single key generator object (a singleton) that is used to generate keys in batches, and which hands out subsequent ranges of keys. There's already a class that does that actually,
BasicUniqueKeyGenerator
, it's just not used by default. If particles are made in batches, and codes give them subsequent ids (which they probably do, since it's probably the index into the data array), and keys are generated in the same batches, and are also given subsequent ids, then the mapping between them becomes a piecewise linear function with slope 1 that can be described as a list of boundaries and offsets.That would save a ton of memory compared to the full index lists currently in
amuse.datamodel
, and would therefore probably speed things up.Do note that the code that deals with this is complex, and I don't understand it all, so there may be something keeping this from working. Also, I don't know if this keys to indices conversion is actually a bottleneck. Still, it seemed worth recording, so here's an issue.
The text was updated successfully, but these errors were encountered: