same sign for all instance-level prediction #16

Open
vista-analytics opened this issue Jul 17, 2018 · 5 comments

@vista-analytics

I generated some synthetic data from the 20 Newsgroups corpus to run experiments on mi-SVM and MI-SVM. I noticed that if I predict labels at the instance level (and actually at the bag level as well), all predictions share the same sign (either all positive or all negative, depending on the data). The AUC looks good, so the ranking is correct. I suspect this might be caused by a library version issue. Has anyone come across the same problem? Or, which library versions should we use? Thanks!
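
For reference, this is roughly how I am checking the signs and the AUC (a minimal sketch; classifier, test_bags, and test_labels are placeholder names from my own train/test split):

    import numpy as np
    from sklearn.metrics import roc_auc_score

    # classifier is a fitted misvm estimator; predict() returns real-valued decision values
    predictions = classifier.predict(test_bags)
    auc = roc_auc_score(test_labels, predictions)           # ranking quality: looks fine
    acc = np.average(test_labels == np.sign(predictions))   # sign agreement: at chance
    print('AUC: %.3f, sign accuracy: %.3f' % (auc, acc))
    print('Prediction signs present: %s' % np.unique(np.sign(predictions)))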

@garydoranjr
Owner

I have not observed this behavior before; in the past I have seen bag-level accuracies above chance level, which would not be possible if all bags were assigned the same sign. Maybe it has something to do with how the synthetic data is generated. Are the classes highly imbalanced at the bag level?
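
A quick way to check (just a sketch; it assumes the bag labels are a NumPy array of +1/-1 values, as in example.py, and bag_labels is a placeholder name):

    import numpy as np

    bag_labels = np.asarray(labels)   # +1/-1 bag-level labels
    pos = np.sum(bag_labels == 1)
    neg = np.sum(bag_labels == -1)
    print('Positive bags: %d, negative bags: %d (%.1f%% positive)'
          % (pos, neg, 100.0 * pos / (pos + neg)))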

@vista-analytics
Author

Thank you, Gary, for the comment. I don't think it's the data, since I ran into the same problem when running the example.py code on the musk1 dataset. In that script, I added a new classifier, miSVM(kernel='linear', C=1.0, max_iters=10). It terminates after iteration 1 and reports Class Changes = 0. When I added a print(svm._predictions) statement in the miSVM source code, it appears that all predictions are positive and the values are very close together, which explains why there are no class changes and the code stops after the first iteration. The same thing happens on my synthetic data, so I suspect it might be a library version issue. Your insights are highly appreciated!
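
Roughly, the change I made looks like this (a sketch; I'm assuming example.py collects its classifiers in a dict, and the debug print goes inside the fit loop of the miSVM class):

    import misvm

    # added alongside the existing entries in example.py
    classifiers['miSVM'] = misvm.miSVM(kernel='linear', C=1.0, max_iters=10)

    # and, inside misvm's miSVM fit loop, a temporary debug statement:
    #     print(svm._predictions)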

@garydoranjr
Owner

I tried to replicate the results you are getting. I added miSVM to the example.py script, and it produces the following output:

$ ./example.py
Non-random start...

Iteration 1...
Training SVM...
     pcost       dcost       gap    pres   dres
 0: -4.7135e+01 -1.9465e+00  3e+03  5e+01  7e-09
 1: -6.9802e-01 -1.9425e+00  3e+01  6e-01  7e-09
 2: -2.2093e-01 -1.6692e+00  5e+00  6e-02  8e-10
 3: -1.4628e-01 -1.1486e+00  2e+00  3e-02  3e-10
 4: -9.0601e-02 -6.0664e-01  9e-01  9e-03  1e-10
 5: -4.3735e-02 -3.0183e-01  4e-01  3e-03  4e-11
 6: -2.4291e-02 -1.3206e-01  2e-01  1e-03  2e-11
 7: -1.5611e-02 -4.3816e-02  4e-02  2e-04  1e-11
 8: -1.6996e-02 -2.3812e-02  8e-03  3e-05  8e-12
 9: -1.7919e-02 -1.9351e-02  2e-03  4e-06  8e-12
10: -1.8219e-02 -1.8410e-02  2e-04  4e-07  8e-12
11: -1.8269e-02 -1.8278e-02  9e-06  2e-08  8e-12
12: -1.8272e-02 -1.8272e-02  4e-07  6e-10  9e-12
13: -1.8272e-02 -1.8272e-02  3e-08  4e-11  8e-12
Optimal solution found.
Recomputing classes...
Class Changes: 0
Test labels: [ 1. -1. -1. -1.  1.  1. -1.  1.  1.  1.]
Predictions: [ 1. -1. -1. -1.  1.  1.  1.  1.  1.  1.]

miSVM Accuracy: 90.0%

So it also finishes after one iteration, but I get predictions on the test set that are not all of the same sign. I added the following lines to print those out:

         predictions = classifier.predict(test_bags)
+        print('Test labels: %s' % str(test_labels))
+        print('Predictions: %s' % (np.sign(predictions)))
         accuracies[algorithm] = np.average(test_labels == np.sign(predictions))
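
If you want to dig further on your end, you could also print the raw decision values before taking the sign, e.g. with one more line in the same place (just a suggestion, not part of example.py):

    print('Raw decision values: %s' % str(predictions))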

Here is the list of packages I have installed, along with their versions:

alabaster==0.7.6
altgraph==0.15
appdirs==1.4.3
attrs==17.4.0
Babel==2.5.3
backports==1.0
backports-abc==0.5
backports.functools-lru-cache==1.2.1
backports.ssl-match-hostname==3.5.0.1
basemap==1.0.7
Beaker==1.8.1
beautifulsoup4==4.6.0
certifi==2018.1.18
cffi==1.11.4
chardet==3.0.4
Cheetah3==3.0.0
CherryPy==5.0.1
colored==1.3.5
cuttime==0.1
cvxopt==1.1.8
cycler==0.10.0
Cython==0.27.3
decorator==4.2.1
docutils==0.14
emd==1.0
flux-emd==1.0
flux-kernel==1.0
flux-migraph==1.0
funcsigs==1.0.2
functools32==3.2.3.post2
gdbm==2.7.14
gps==3.17
h5py==2.7.0
httplib2==0.9.2
idna==2.6
imagesize==0.7.1
Jinja2==2.10
libxml2-python==2.9.7
lxml==4.1.1
macholib==1.9
Mako==1.0.7
MarkupSafe==0.23
matplotlib==2.1.1
misvm==1.0
modulegraph==0.16
monotonic==1.4
mpmath==0.19
netCDF4==1.2.9
nose==1.3.7
numpy==1.14.0
oauth2==1.9.0.post1
olefile==0.44
passmash===master
pdfminer==20140328
Pillow==5.0.0
pkgconfig==1.1.0
pluggy==0.6.0
Polygon2==2.0.8
progressbar==2.3
py==1.5.2
py2app==0.14
pycairo==1.15.4
pycparser==2.18
Pygments==2.2.0
pygobject==3.26.1
pyobjc-core==3.0.4
pyobjc-framework-Cocoa==3.0.4
pyopencl==2017.2.2
PyOpenGL==3.1.0
PyOpenGL-accelerate==3.1.0
pyparsing==2.2.0
pytest==3.3.2
python-dateutil==2.6.1
pytools==2017.6
pytz==2017.3
PyYAML==3.12
pyzmq==16.0.4
rdc==1.0
requests==2.18.4
roman==2.0.0
scikit-learn==0.17.1
scipy==1.0.0
simplejson==3.6.5
singledispatch==3.4.0.3
six==1.11.0
snowballstemmer==1.2.0
Sphinx==1.6.6
sphinx-rtd-theme==0.2.4
sphinxcontrib-websupport==1.0.1
StereoVision==1.0.0
subprocess32==3.2.7
sympy==1.0
termcolor==1.1.0
Tkinter==0.0.0
tornado==4.5.2
tsne==0.1.1
typing==3.6.2
Unidecode==1.0.22
urllib3==1.22
virtualenv==15.1.0
wordcloud==1.2.1
wxPython==3.0.2.0
wxPython-common==3.0.2.0
yelp==1.0.2

@vista-analytics
Author

Thank you, Gary. I will re-run the experiment using your library versions.

@ventouris

Have you been able to find the problem? I have the same issue with my data as well: all predictions are from one class, and the values are too close to each other.
