Simpler and clearer grouping configuration for AttractorsViaFeaturizing #84
Sure, but do you compare each feature with each other feature? That means N^2 comparisons. Or do you just compare each feature with the previous feature? Your code has only one loop, which makes me think you are doing the latter, but N^2 comparisons would be the correct thing to do...? Also, this is somewhat related to another issue. In any case, yes, I think this should become a new grouping config. Thankfully the API for adding a new grouping configuration is well declared, so it shouldn't be hard to put together a PR! https://juliadynamics.github.io/Attractors.jl/dev/attractors/#Grouping-types
BTW, why are you making these functions in the first place instead of just going ahead and using the existing interface? I think this is a major advantage of using interfaces: you don't need to hack things together but can instead follow the interface. It would probably save you lots of hacking and editing of the package source code to get end results.
Also, how does this compare with grouping via a histogram? (Though for a histogram you need to know its end points in advance.)
Compare each feature with the already-found clusters, so it's ~ N * number_of_attractors comparisons. Since I'm assuming that features belonging to the same cluster are very close to each other, there's no reason to compare all of them pairwise. Or am I missing something?
But there you need to already know the attractors.
Yup, you're right. I implemented it like this because it was quicker, but I'll work on the API.
Oh yeah, good point, I'd forgotten about that.
Yes, but it currently has the same limitation as CartesianIndices, because it uses cartesian indices to access the histogram. So, same as in #76. A problem to solve before this cartesian indexing issue is JuliaDynamics/StateSpaceSets.jl#18.
The standard way to group features in the AttractorsViaFeaturizing method is to cluster them using DBSCAN. As we've known for some time, the algorithm is more or less a black box, so it can be quite hard to debug. Non-optimal parameters can lead to (i) finding the same attractor more than once (it might separate two trajectories on the same attractor that have slightly different feature values), or (ii) not finding one or more attractors that are there (grouping two or more attractors together). Sometimes both occur for the same parametrization. In particular for the projects I've been working on, it has been very frustrating to figure out an optimal parametrization, as you really need to look deep into all the details of the computations.

As I understand it, DBSCAN is great if you have noisy data; then it works well for identifying the clouds (clusters) of higher density. But in a wide range of applications I have in mind, this doesn't seem necessary.
As a consequence, the feature space is composed of very small, dense clouds with a wide separation between them. DBSCAN is thus not really needed; we can use something much simpler and more intuitive. I've been having success with a simple comparison: computing the distance between features and separating them into different clusters if that distance is higher than a threshold.
So far I'm naming this grouping config GroupingViaComparing. It has been working much better than DBSCAN for me: I know exactly what it's doing all the time, so it's easier to figure out a good parameter (there is just one: the maximum distance between in-cluster features), and there are no unexpected behaviors.
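The idea can be sketched in a few lines. This is a hypothetical standalone illustration, not the actual Attractors.jl implementation: the function name `group_features` and the use of a per-cluster representative feature are assumptions for the sketch. Each feature is compared against one representative per already-found cluster, giving ~ N * n_clusters distance evaluations rather than N^2.

```julia
# Hypothetical sketch of threshold-based grouping (not the Attractors.jl API).
# Assign each feature to the first existing cluster whose representative
# lies within `threshold`; otherwise it starts a new cluster.
using LinearAlgebra: norm

function group_features(features::Vector{<:AbstractVector}, threshold::Real)
    representatives = Vector{eltype(features)}()  # one stored feature per cluster
    labels = zeros(Int, length(features))
    for (i, f) in enumerate(features)
        idx = findfirst(r -> norm(f - r) < threshold, representatives)
        if idx === nothing
            push!(representatives, f)        # new cluster found
            labels[i] = length(representatives)
        else
            labels[i] = idx                  # belongs to an existing cluster
        end
    end
    return labels
end

features = [[0.0, 0.0], [0.01, 0.0], [5.0, 5.0], [0.0, 0.02]]
group_features(features, 0.1)  # → [1, 1, 2, 1]
```

Note that the result depends on the order in which features are visited only when clouds are not well separated; under the assumption above (tiny, widely separated clouds), any visiting order gives the same grouping.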
Note also that it has a nice advantage: `features` could very well just be the attractors themselves; then the distance metric could be `Hausdorff()`, for instance, meaning we would distinguish attractors directly by their distances in state space, skipping the need for features entirely. Of course this would be slower, but it can be parallelized, and maybe the distances don't need to be computed for lots of time points.

It needs more work and definitely more testing, but so far I'm quite happy. What do you think?
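To make the Hausdorff idea above concrete, here is a minimal standalone sketch of the Hausdorff distance between two point clouds, using only Base and LinearAlgebra (Attractors.jl ships its own `Hausdorff` metric; this version is just for illustration, and the function name `hausdorff` is an assumption):

```julia
# Hypothetical sketch: Hausdorff distance between two sampled "attractors"
# represented as vectors of state-space points.
using LinearAlgebra: norm

function hausdorff(A::Vector{<:AbstractVector}, B::Vector{<:AbstractVector})
    d(a, S) = minimum(norm(a - s) for s in S)  # point-to-set distance
    # largest distance from any point of one set to the other set
    return max(maximum(d(a, B) for a in A), maximum(d(b, A) for b in B))
end

A = [[0.0, 0.0], [1.0, 0.0]]   # attractor sampled as a point cloud
B = [[0.0, 1.0], [1.0, 1.0]]   # the same cloud shifted by 1 in y
hausdorff(A, A)  # → 0.0  (identical sets)
hausdorff(A, B)  # → 1.0  (equal to the shift)
```

The same threshold comparison as for features then applies: two trajectories belong to the same attractor if the Hausdorff distance between their sampled sets is below the threshold. Since this distance is O(|A|*|B|) per pair, subsampling the trajectories (as suggested above) is what would keep it affordable.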