1 | | ------------------------ |
2 | | - HUB TOOLBOX VERSION 2 |
3 | | - November 5, 2013 |
4 | | ------------------------ |
| 1 | +------------------------- |
| 2 | + HUB TOOLBOX VERSION 2.1 |
| 3 | + October 16, 2015 |
| 4 | +------------------------- |
5 | 5 |
6 | 6 | This is the HUB TOOLBOX for Matlab/Octave |
7 | 7 | (c) 2013, Dominik Schnitzer < [email protected]> |
@@ -65,36 +65,49 @@ selection challenge. |
65 | 65 | http://archive.ics.uci.edu/ml/datasets/Dexter |
66 | 66 |
67 | 67 |
| 68 | +>> hubness_analysis |
| 69 | + |
| 70 | +NO PARAMETERS GIVEN! Loading & evaluating DEXTER data set. |
| 71 | + |
| 72 | +DEXTER is a text classification problem in a bag-of-word |
| 73 | +representation. This is a two-class classification problem |
| 74 | +with sparse continuous input variables. |
| 75 | +This dataset is one of five datasets of the NIPS 2003 feature |
| 76 | +selection challenge. |
| 77 | + |
| 78 | +http://archive.ics.uci.edu/ml/datasets/Dexter |
| 79 | + |
| 80 | + |
68 | 81 | Hubness Analysis |
69 | 82 |
70 | 83 | ORIGINAL DATA: |
71 | 84 | data set hubness (S^n=5) : 4.22 |
72 | 85 | % of anti-hubs at k=5 : 26.67% |
73 | 86 | % of k=5-NN lists the largest hub occurs: 23.67% |
74 | | -k=5-NN classification accurracy : 56.67% |
| 87 | +k=5-NN classification accuracy : 80.33% |
75 | 88 | Goodman-Kruskal index (higher=better) : 0.104 |
76 | | -original dimensionality : 300 |
| 89 | +original dimensionality : 20000 |
77 | 90 | intrinsic dimensionality estimate : 161 |
78 | 91 |
79 | 92 | MUTUAL PROXIMITY (Empiric/Slow): |
80 | | -data set hubness (S^n=5) : 0.58 |
| 93 | +data set hubness (S^n=5) : 0.64 |
81 | 94 | % of anti-hubs at k=5 : 3.33% |
82 | | -% of k=5-NN lists the largest hub occurs: 5.67% |
83 | | -k=5-NN classification accurracy : 67.00% |
84 | | -Goodman-Kruskal index (higher=better) : 0.136 |
| 95 | +% of k=5-NN lists the largest hub occurs: 6.00% |
| 96 | +k=5-NN classification accuracy : 90.00% |
| 97 | +Goodman-Kruskal index (higher=better) : 0.132 |
85 | 98 |
86 | 99 | LOCAL SCALING (Original, k=10): |
87 | 100 | data set hubness (S^n=5) : 1.42 |
88 | 101 | % of anti-hubs at k=5 : 5.33% |
89 | 102 | % of k=5-NN lists the largest hub occurs: 7.67% |
90 | | -k=5-NN classification accurracy : 66.00% |
| 103 | +k=5-NN classification accuracy : 86.00% |
91 | 104 | Goodman-Kruskal index (higher=better) : 0.156 |
92 | 105 |
93 | 106 | SHARED NEAREST NEIGHBORS (k=10): |
94 | | -data set hubness (S^n=5) : 1.55 |
95 | | -% of anti-hubs at k=5 : 7.00% |
96 | | -% of k=5-NN lists the largest hub occurs: 7.33% |
97 | | -k=5-NN classification accurracy : 60.67% |
98 | | -Goodman-Kruskal index (higher=better) : 0.369 |
| 107 | +data set hubness (S^n=5) : 1.77 |
| 108 | +% of anti-hubs at k=5 : 5.67% |
| 109 | +% of k=5-NN lists the largest hub occurs: 8.67% |
| 110 | +k=5-NN classification accuracy : 73.33% |
| 111 | +Goodman-Kruskal index (higher=better) : 0.152 |
99 | 112 |
100 | 113 | >> |