Python unit test is broken? #37

Open
XiaoConstantine opened this issue Aug 22, 2022 · 5 comments · May be fixed by #38


@XiaoConstantine

OS: MacOS Monterey 12.5 (Intel chip)
Python: 3.10.5

❯ pytest tests
============================================================================== test session starts ===============================================================================
platform darwin -- Python 3.10.5, pytest-7.1.2, pluggy-1.0.0
rootdir: /Users/xiao/development/github.com/XiaoConstantine/bolt-1
collected 4 items

tests/test_encoder.py ..F.                                                                                                                                                 [100%]

==================================================================================== FAILURES ====================================================================================
________________________________________________________________________________ test_unquantize _________________________________________________________________________________

    def test_unquantize():
        X, Q = _load_digits_X_Q(nqueries=20)
>       enc = bolt.Encoder('dot', accuracy='high').fit(X)

tests/test_encoder.py:151:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../../dblalock/bolt/venv/lib/python3.10/site-packages/pybolt-0.1.4-py3.10-macosx-11-x86_64.egg/bolt/bolt_api.py:466: in fit
    centroids = _learn_centroids(X, ncentroids=ncentroids,
../../dblalock/bolt/venv/lib/python3.10/site-packages/pybolt-0.1.4-py3.10-macosx-11-x86_64.egg/bolt/bolt_api.py:142: in _learn_centroids
    centroids, labels = kmeans(X_in, ncentroids)
../../dblalock/bolt/venv/lib/python3.10/site-packages/pybolt-0.1.4-py3.10-macosx-11-x86_64.egg/bolt/bolt_api.py:106: in kmeans
    seeds = kmc2.kmc2(X, k).astype(np.float32)
kmc2.pyx:97: in kmc2.kmc2
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   ValueError: probabilities contain NaN

mtrand.pyx:935: ValueError
@clark-hive

clark-hive commented Sep 9, 2022

I've opened a PR against the kmc2 code and added a workaround to the bolt API in this PR: #38

The issue is that each row is padded with 0's. Since there are 16 rows but we only get 15 values per codebook from Python, I zeroed out the last row in every column here: #29 (comment).
When we pass in the columns one at a time to get the centroids for each column, the first column is all 0's. The kmc2 code errors when its input has only one unique row: it updates points using the normalized distances of every row from each other. The result is NaN if all the rows are the same, since the sum of the distances is 0 and the normalization divides 0 by 0.
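The failure mode can be reproduced with NumPy alone. The following is a minimal sketch, not the actual kmc2 code: the helper names `sampling_probabilities` and `safe_sampling_probabilities` are hypothetical, and the uniform fallback is only one possible guard, not necessarily what the linked PR does.

```python
import numpy as np


def sampling_probabilities(X):
    """Squared distance of every row to the first row, normalized to sum to 1.

    Hypothetical helper: a simplified stand-in for distance-based seeding.
    """
    d = np.sum((X - X[0]) ** 2, axis=1)
    with np.errstate(invalid="ignore", divide="ignore"):
        return d / d.sum()  # 0 / 0 -> NaN when all rows are identical


def safe_sampling_probabilities(X):
    """Same as above, but falls back to uniform sampling (an assumed guard)."""
    d = np.sum((X - X[0]) ** 2, axis=1)
    total = d.sum()
    if total == 0:
        return np.full(len(X), 1.0 / len(X))
    return d / total


# A single all-zero column, like the padded first column described above.
X = np.zeros((16, 1), dtype=np.float32)

p = sampling_probabilities(X)
raised = False
try:
    np.random.choice(len(X), p=p)
except ValueError:
    raised = True  # NumPy rejects a probability vector containing NaN

p_safe = safe_sampling_probabilities(X)
idx = np.random.choice(len(X), p=p_safe)  # sampling now succeeds
```

With the guard in place, degenerate input (every row identical) is sampled uniformly instead of raising.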

This is mentioned in the thread where the external KMC2 package is included: #4 (comment).

@XiaoConstantine
Author

Makes sense to me 👍 I'll wait for @dblalock to take a look when he gets time.

@algebravic

I'm using Python 3.10.0 on my Intel Mac. I couldn't pip install kmc2 because the Cython interface has changed. I cloned the kmc2 repository and ran cython kmc2, which then built. However, I still got the NaN error reported above.

@clark-hive

Did you run python setup.py install inside the bolt repo after checking out the branch with the updated python/bolt/bolt_api.py?

Following the steps here: #4 (comment).

I just tested this on Python 3.7 and ran python setup.py install in both repos; I've not tried with cython.

@clark-hive

clark-hive commented Oct 13, 2022

Here are the commands that pass the pytests on macOS 12.5:

# Install bolt and its requirements
git clone https://github.com/dblalock/bolt.git
cd bolt/
pip install -r requirements.txt
python setup.py install
cd ..
# Install the patched kmc2 branch over the released package
git clone git@github.com:clark-hive/kmc2.git
cd kmc2/
git checkout clark/allow_duplicated_inputs
python setup.py install
# Run the bolt tests
cd ../bolt/
pytest tests/
