Cluster analysis #674

ivanstepanovftw · 2024-12-17T13:55:37Z

How can I get a cluster label for each feature vector, like the clusters shown on the README page?

I'm currently using this function to implement a clustering algorithm, but it is not fast enough:

import numpy as np
from annoy import AnnoyIndex

def annoy_clustering(data, num_trees=10, num_neighbors=10):
    n_samples, n_features = data.shape

    # Step 1: Build the Annoy index
    annoy_index = AnnoyIndex(n_features, metric='euclidean')
    for i in range(n_samples):
        annoy_index.add_item(i, data[i])
    annoy_index.build(num_trees)

    # Step 2: Assign clusters based on nearest neighbors
    labels = np.full(n_samples, -1)  # Initialize all labels as -1
    cluster_id = 0

    for i in range(n_samples):
        if labels[i] == -1:  # If the point is not yet labeled
            # Get nearest neighbors
            neighbors = annoy_index.get_nns_by_item(i, num_neighbors)
            # Assign the same cluster ID to the point and its neighbors
            labels[neighbors] = cluster_id
            cluster_id += 1

    return labels
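In the loop above, `labels[neighbors] = cluster_id` writes to every neighbor, including points that an earlier iteration already put into another cluster, so earlier clusters can be partially overwritten. If consistent labels matter, one swapped-in alternative (not something Annoy provides) is to take the connected components of the k-nearest-neighbor graph with a union-find structure. The sketch below assumes the neighbor lists have already been collected, e.g. one `get_nns_by_item(i, num_neighbors)` call per item; the function name `knn_graph_components` is hypothetical. It still needs one query per point, so it addresses label consistency, not speed.

```python
def knn_graph_components(neighbor_lists):
    """Label the connected components of a k-nearest-neighbor graph.

    neighbor_lists[i] holds the neighbor ids of item i (for example the
    result of annoy_index.get_nns_by_item(i, num_neighbors)). Every item
    gets exactly one label; items joined by any neighbor edge share it.
    """
    parent = list(range(len(neighbor_lists)))

    def find(x):
        # Walk up to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge each item with each of its neighbors.
    for i, neighbors in enumerate(neighbor_lists):
        for j in neighbors:
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[rj] = ri

    # Renumber the roots as consecutive cluster ids 0, 1, 2, ...
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(len(neighbor_lists))]
```

For example, `knn_graph_components([[0, 1], [1, 0], [2, 3], [3, 2], [4]])` groups items 0 and 1, items 2 and 3, and leaves item 4 on its own.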

Is it even possible with the Annoy algorithm to get clusters directly, without calling get_nns_by_item, which inflates the computational cost?
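One way to get cluster labels without any `get_nns_by_item` calls is to reproduce the kind of partition the README illustrates: recursively split the points with a hyperplane until each node holds at most `leaf_size` points, and treat each leaf as one cluster. As far as I know, Annoy's Python API does not expose which leaf an item lands in, so this is a minimal NumPy sketch rather than a call into Annoy. The function name `random_projection_leaves` and its parameters are hypothetical, and the split rule here (perpendicular bisector of two randomly chosen points) is a simplification of what Annoy does internally.

```python
import numpy as np


def random_projection_leaves(data, leaf_size=10, seed=0):
    """Assign each row of `data` to a leaf of one random-hyperplane tree.

    Each split picks two distinct points a, b and sends every point to the
    side of the hyperplane equidistant from a and b (i.e. to whichever of
    a, b it is closer to). Nodes with at most `leaf_size` points become
    leaves, and each leaf gets its own integer label.
    """
    if leaf_size < 1:
        raise ValueError("leaf_size must be at least 1")
    data = np.asarray(data, dtype=float)
    rng = np.random.default_rng(seed)
    labels = np.full(len(data), -1, dtype=int)
    stack = [np.arange(len(data))] if len(data) else []  # nodes still to split
    next_label = 0

    while stack:
        idx = stack.pop()
        if len(idx) <= leaf_size:
            labels[idx] = next_label
            next_label += 1
            continue
        a, b = data[rng.choice(idx, size=2, replace=False)]
        normal = a - b
        offset = normal @ (a + b) / 2.0
        closer_to_a = data[idx] @ normal > offset
        left, right = idx[closer_to_a], idx[~closer_to_a]
        if len(left) == 0 or len(right) == 0:
            # Degenerate split (e.g. duplicate points): halve the node instead.
            half = len(idx) // 2
            left, right = idx[:half], idx[half:]
        stack.append(left)
        stack.append(right)

    return labels
```

Usage would look like `labels = random_projection_leaves(data, leaf_size=10)`. Each level of the recursion touches every point once, so with reasonably balanced splits the cost is roughly O(n_samples · n_features · log(n_samples / leaf_size)), with no per-point neighbor queries. The trade-off is that a single tree's leaf boundaries are somewhat arbitrary: two close points can end up in different leaves if a hyperplane happens to pass between them.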
