dataset_1.txt noise examples appear to be labeled as '-1' #1

@FelSiq

Hello,

I know the example provided in the package README is a synthetic one intended to showcase a basic program execution. However, I believe the score presented may be a bit misleading: in the dataset used, "dataset_1.txt", the noise instances appear to be labeled as "-1", not "0" as assumed by the package implementation. As far as I understand, this means they are treated as a real cluster during the DBCV computation, which substantially changes the estimated score (reported estimate = 0.6149; estimate with labels fixed = 0.8576).

The same issue presumably applies to all other example datasets.
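To make the relabeling concrete, here is a minimal sketch, assuming the labels are available as a plain list of integers. The helper name `remap_noise_label` is hypothetical and not part of the package, and the sample labels are made up for illustration rather than taken from "dataset_1.txt".

```python
def remap_noise_label(labels, noise_in=-1, noise_out=0):
    """Return a copy of `labels` with the noise marker changed.

    Hypothetical helper: `noise_in` is the marker used in the data file
    (assumed to be -1) and `noise_out` is the marker the scoring code
    expects (assumed to be 0).
    """
    # Refuse to remap if the target marker is already used by a real
    # cluster, since that would silently merge that cluster into noise.
    if noise_in != noise_out and noise_out in labels:
        raise ValueError(
            f"label {noise_out} is already present; remapping would merge it with noise"
        )
    return [noise_out if lab == noise_in else lab for lab in labels]


# Illustrative labels only (not the contents of dataset_1.txt).
labels = [-1, 1, 1, 2, -1, 2]
fixed = remap_noise_label(labels)
print(fixed)  # [0, 1, 1, 2, 0, 2]
```

The collision check matters if a dataset also uses "0" for a genuine cluster; in that case the cluster labels would need to be shifted before noise is mapped to "0".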

Also, am I correct in assuming that the distance metric used during the DBCV computation is the squared Euclidean distance? I understand this is a legitimate choice; I just want to confirm that my understanding is correct.
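For reference, the distinction being asked about can be sketched as follows. This is only an illustration of the two metrics with hypothetical function names; it makes no claim about which one the package actually uses.

```python
import math


def squared_euclidean(a, b):
    """Sum of squared coordinate differences (no square root)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def euclidean(a, b):
    """Ordinary Euclidean distance: square root of the squared distance."""
    return math.sqrt(squared_euclidean(a, b))


p, q = (0.0, 0.0), (3.0, 4.0)
print(squared_euclidean(p, q))  # 25.0
print(euclidean(p, q))          # 5.0
```

Both metrics rank neighbors in the same order, but the magnitudes differ, so any quantity computed from the actual distance values can change depending on which one is used.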
