Commit bd05ffd

Updated to v2.1
1 parent 49755f1 commit bd05ffd

1 file changed: +29 -16 lines changed

README

Lines changed: 29 additions & 16 deletions
@@ -1,7 +1,7 @@
------------------------
-HUB TOOLBOX VERSION 2
-November 5, 2013
------------------------
+-------------------------
+HUB TOOLBOX VERSION 2.1
+October 16, 2015
+-------------------------
 
 This is the HUB TOOLBOX for Matlab/Octave
 (c) 2013, Dominik Schnitzer <[email protected]>
@@ -65,36 +65,49 @@ selection challenge.
 http://archive.ics.uci.edu/ml/datasets/Dexter
 
 
+>> hubness_analysis
+
+NO PARAMETERS GIVEN! Loading & evaluating DEXTER data set.
+
+DEXTER is a text classification problem in a bag-of-word
+representation. This is a two-class classification problem
+with sparse continuous input variables.
+This dataset is one of five datasets of the NIPS 2003 feature
+selection challenge.
+
+http://archive.ics.uci.edu/ml/datasets/Dexter
+
+
 Hubness Analysis
 
 ORIGINAL DATA:
 data set hubness (S^n=5) : 4.22
 % of anti-hubs at k=5 : 26.67%
 % of k=5-NN lists the largest hub occurs: 23.67%
-k=5-NN classification accurracy : 56.67%
+k=5-NN classification accuracy : 80.33%
 Goodman-Kruskal index (higher=better) : 0.104
-original dimensionality : 300
+original dimensionality : 20000
 intrinsic dimensionality estimate : 161
 
 MUTUAL PROXIMITY (Empiric/Slow):
-data set hubness (S^n=5) : 0.58
+data set hubness (S^n=5) : 0.64
 % of anti-hubs at k=5 : 3.33%
-% of k=5-NN lists the largest hub occurs: 5.67%
-k=5-NN classification accurracy : 67.00%
-Goodman-Kruskal index (higher=better) : 0.136
+% of k=5-NN lists the largest hub occurs: 6.00%
+k=5-NN classification accuracy : 90.00%
+Goodman-Kruskal index (higher=better) : 0.132
 
 LOCAL SCALING (Original, k=10):
 data set hubness (S^n=5) : 1.42
 % of anti-hubs at k=5 : 5.33%
 % of k=5-NN lists the largest hub occurs: 7.67%
-k=5-NN classification accurracy : 66.00%
+k=5-NN classification accuracy : 86.00%
 Goodman-Kruskal index (higher=better) : 0.156
 
 SHARED NEAREST NEIGHBORS (k=10):
-data set hubness (S^n=5) : 1.55
-% of anti-hubs at k=5 : 7.00%
-% of k=5-NN lists the largest hub occurs: 7.33%
-k=5-NN classification accurracy : 60.67%
-Goodman-Kruskal index (higher=better) : 0.369
+data set hubness (S^n=5) : 1.77
+% of anti-hubs at k=5 : 5.67%
+% of k=5-NN lists the largest hub occurs: 8.67%
+k=5-NN classification accuracy : 73.33%
+Goodman-Kruskal index (higher=better) : 0.152
 
 >>
