same sign for all instance-level prediction #16

Open
vista-analytics opened this issue Jul 17, 2018 · 5 comments

@vista-analytics

I generated some synthetic data from the 20 Newsgroups corpus to run experiments on mi-SVM and MI-SVM. I noticed that if I predict labels at the instance level (and actually at the bag level as well), all predictions share the same sign (either all positive or all negative, depending on the data). The AUC looks good, so the ranking is correct. I suspect this might be caused by a library version issue. Has anyone come across the same problem? Or, which library versions should we use? Thanks!
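
For reference, this is roughly how I am checking the signs and the AUC (a minimal sketch; classifier, test_bags, and test_labels are placeholder names from my own train/test split):

    import numpy as np
    from sklearn.metrics import roc_auc_score

    # classifier is a fitted misvm estimator; predict() returns real-valued decision values
    predictions = classifier.predict(test_bags)
    auc = roc_auc_score(test_labels, predictions)           # ranking quality: looks fine
    acc = np.average(test_labels == np.sign(predictions))   # sign agreement: at chance
    print('AUC: %.3f, sign accuracy: %.3f' % (auc, acc))
    print('Prediction signs present: %s' % np.unique(np.sign(predictions)))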

@garydoranjr
Owner

I have not observed this behavior before; in the past I have seen bag-level accuracies above chance level, which would not be possible if all bags were assigned the same sign. Maybe it has something to do with how the synthetic data is generated. Are the classes highly imbalanced at the bag level?
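
A quick way to check (just a sketch; it assumes the bag labels are a NumPy array of +1/-1 values, as in example.py, and bag_labels is a placeholder name):

    import numpy as np

    bag_labels = np.asarray(labels)   # +1/-1 bag-level labels
    pos = np.sum(bag_labels == 1)
    neg = np.sum(bag_labels == -1)
    print('Positive bags: %d, negative bags: %d (%.1f%% positive)'
          % (pos, neg, 100.0 * pos / (pos + neg)))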

@vista-analytics
Author

Thank you, Gary, for the comment. I don't think it's the data, since I ran into the same problem when running the example.py code on the musk1 dataset. In that script, I added a new classifier, miSVM(kernel='linear', C=1.0, max_iters=10). It terminates after iteration 1 and reports Class Changes = 0. When I added a print(svm._predictions) statement in the miSVM source code, it appears that all predictions are positive and the values are very close together, which explains why there are no class changes and the code stops after the first iteration. The same thing happens on my synthetic data, so I suspect it might be a library version issue. Your insights are highly appreciated!
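
Roughly, the change I made looks like this (a sketch; I'm assuming example.py collects its classifiers in a dict, and the debug print goes inside the fit loop of the miSVM class):

    import misvm

    # added alongside the existing entries in example.py
    classifiers['miSVM'] = misvm.miSVM(kernel='linear', C=1.0, max_iters=10)

    # and, inside misvm's miSVM fit loop, a temporary debug statement:
    #     print(svm._predictions)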

@garydoranjr
Owner

I tried to replicate the results you are getting. I added miSVM to the example.py script, and it produces the following output:

$ ./example.py
Non-random start...

Iteration 1...
Training SVM...
     pcost       dcost       gap    pres   dres
 0: -4.7135e+01 -1.9465e+00  3e+03  5e+01  7e-09
 1: -6.9802e-01 -1.9425e+00  3e+01  6e-01  7e-09
 2: -2.2093e-01 -1.6692e+00  5e+00  6e-02  8e-10
 3: -1.4628e-01 -1.1486e+00  2e+00  3e-02  3e-10
 4: -9.0601e-02 -6.0664e-01  9e-01  9e-03  1e-10
 5: -4.3735e-02 -3.0183e-01  4e-01  3e-03  4e-11
 6: -2.4291e-02 -1.3206e-01  2e-01  1e-03  2e-11
 7: -1.5611e-02 -4.3816e-02  4e-02  2e-04  1e-11
 8: -1.6996e-02 -2.3812e-02  8e-03  3e-05  8e-12
 9: -1.7919e-02 -1.9351e-02  2e-03  4e-06  8e-12
10: -1.8219e-02 -1.8410e-02  2e-04  4e-07  8e-12
11: -1.8269e-02 -1.8278e-02  9e-06  2e-08  8e-12
12: -1.8272e-02 -1.8272e-02  4e-07  6e-10  9e-12
13: -1.8272e-02 -1.8272e-02  3e-08  4e-11  8e-12
Optimal solution found.
Recomputing classes...
Class Changes: 0
Test labels: [ 1. -1. -1. -1.  1.  1. -1.  1.  1.  1.]
Predictions: [ 1. -1. -1. -1.  1.  1.  1.  1.  1.  1.]

miSVM Accuracy: 90.0%

So it also finishes after one iteration, but I get predictions on the test set that are not all of the same sign. I added the following lines to print those out:

         predictions = classifier.predict(test_bags)
+        print('Test labels: %s' % str(test_labels))
+        print('Predictions: %s' % (np.sign(predictions)))
         accuracies[algorithm] = np.average(test_labels == np.sign(predictions))
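
If you want to dig further on your end, you could also print the raw decision values before taking the sign, e.g. with one more line in the same place (just a suggestion, not part of example.py):

    print('Raw decision values: %s' % str(predictions))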

Here is the list of packages I have installed, along with their versions:

alabaster==0.7.6
altgraph==0.15
appdirs==1.4.3
attrs==17.4.0
Babel==2.5.3
backports==1.0
backports-abc==0.5
backports.functools-lru-cache==1.2.1
backports.ssl-match-hostname==3.5.0.1
basemap==1.0.7
Beaker==1.8.1
beautifulsoup4==4.6.0
certifi==2018.1.18
cffi==1.11.4
chardet==3.0.4
Cheetah3==3.0.0
CherryPy==5.0.1
colored==1.3.5
cuttime==0.1
cvxopt==1.1.8
cycler==0.10.0
Cython==0.27.3
decorator==4.2.1
docutils==0.14
emd==1.0
flux-emd==1.0
flux-kernel==1.0
flux-migraph==1.0
funcsigs==1.0.2
functools32==3.2.3.post2
gdbm==2.7.14
gps==3.17
h5py==2.7.0
httplib2==0.9.2
idna==2.6
imagesize==0.7.1
Jinja2==2.10
libxml2-python==2.9.7
lxml==4.1.1
macholib==1.9
Mako==1.0.7
MarkupSafe==0.23
matplotlib==2.1.1
misvm==1.0
modulegraph==0.16
monotonic==1.4
mpmath==0.19
netCDF4==1.2.9
nose==1.3.7
numpy==1.14.0
oauth2==1.9.0.post1
olefile==0.44
passmash===master
pdfminer==20140328
Pillow==5.0.0
pkgconfig==1.1.0
pluggy==0.6.0
Polygon2==2.0.8
progressbar==2.3
py==1.5.2
py2app==0.14
pycairo==1.15.4
pycparser==2.18
Pygments==2.2.0
pygobject==3.26.1
pyobjc-core==3.0.4
pyobjc-framework-Cocoa==3.0.4
pyopencl==2017.2.2
PyOpenGL==3.1.0
PyOpenGL-accelerate==3.1.0
pyparsing==2.2.0
pytest==3.3.2
python-dateutil==2.6.1
pytools==2017.6
pytz==2017.3
PyYAML==3.12
pyzmq==16.0.4
rdc==1.0
requests==2.18.4
roman==2.0.0
scikit-learn==0.17.1
scipy==1.0.0
simplejson==3.6.5
singledispatch==3.4.0.3
six==1.11.0
snowballstemmer==1.2.0
Sphinx==1.6.6
sphinx-rtd-theme==0.2.4
sphinxcontrib-websupport==1.0.1
StereoVision==1.0.0
subprocess32==3.2.7
sympy==1.0
termcolor==1.1.0
Tkinter==0.0.0
tornado==4.5.2
tsne==0.1.1
typing==3.6.2
Unidecode==1.0.22
urllib3==1.22
virtualenv==15.1.0
wordcloud==1.2.1
wxPython==3.0.2.0
wxPython-common==3.0.2.0
yelp==1.0.2

@vista-analytics
Author

Thank you, Gary. I will re-run the experiment using your library versions.

@ventouris

Have you been able to find the problem? I have the same issue with my data as well: all predictions are from one class, and the values are too close to each other.
