Commit 1647095

Merge pull request #64 from linqiaozhi/master
Support variable degree of freedom, better documentation, added license
2 parents 3e78adf + e960b3c commit 1647095

11 files changed, with 1591 additions and 241 deletions

LICENSE.txt

+135
@@ -0,0 +1,135 @@
1+
Different attribution requirements and conditions apply to different files in this repository:
2+
3+
========================================================================================
4+
The following license applies to the following files in the src directory: tsne.cpp, tsne.h, sptree.cpp, sptree.h, vptree.h
5+
6+
Copyright (c) 2014, Laurens van der Maaten (Delft University of Technology)
7+
All rights reserved.
8+
9+
Redistribution and use in source and binary forms, with or without
10+
modification, are permitted provided that the following conditions are met:
11+
1. Redistributions of source code must retain the above copyright
12+
notice, this list of conditions and the following disclaimer.
13+
2. Redistributions in binary form must reproduce the above copyright
14+
notice, this list of conditions and the following disclaimer in the
15+
documentation and/or other materials provided with the distribution.
16+
3. All advertising materials mentioning features or use of this software
17+
must display the following acknowledgement:
18+
This product includes software developed by the Delft University of Technology.
19+
4. Neither the name of the Delft University of Technology nor the names of
20+
its contributors may be used to endorse or promote products derived from
21+
this software without specific prior written permission.
22+
23+
THIS SOFTWARE IS PROVIDED BY LAURENS VAN DER MAATEN ''AS IS'' AND ANY EXPRESS
24+
OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
25+
OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO
26+
EVENT SHALL LAURENS VAN DER MAATEN BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
27+
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
28+
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
29+
BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
30+
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
31+
IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY
32+
OF SUCH DAMAGE.
33+
34+
35+
========================================================================================
36+
The following license applies to the following files in the src directory: nbodyfft.h, nbodyfft.cpp, parallel_for.h, time_code.h
37+
38+
(The MIT License)
39+
40+
Copyright (c) [2019] [George Linderman]
41+
42+
Permission is hereby granted, free of charge, to any person obtaining a copy
43+
of this software and associated documentation files (the "Software"), to deal
44+
in the Software without restriction, including without limitation the rights
45+
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
46+
copies of the Software, and to permit persons to whom the Software is
47+
furnished to do so, subject to the following conditions:
48+
49+
The above copyright notice and this permission notice shall be included in all
50+
copies or substantial portions of the Software.
51+
52+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
53+
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
54+
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
55+
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
56+
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
57+
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
58+
SOFTWARE.
59+
60+
========================================================================================
61+
The following license applies to following files in the progress_bar directory: ProgressBar.hpp
62+
63+
(The MIT License)
64+
65+
Copyright (c) 2016 Prakhar Srivastav <[email protected]>
66+
67+
Permission is hereby granted, free of charge, to any person obtaining
68+
a copy of this software and associated documentation files (the
69+
'Software'), to deal in the Software without restriction, including
70+
without limitation the rights to use, copy, modify, merge, publish,
71+
distribute, sublicense, and/or sell copies of the Software, and to
72+
permit persons to whom the Software is furnished to do so, subject to
73+
the following conditions:
74+
75+
The above copyright notice and this permission notice shall be
76+
included in all copies or substantial portions of the Software.
77+
78+
THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND,
79+
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
80+
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
81+
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
82+
CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
83+
TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
84+
SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
85+
86+
========================================================================================
87+
The following license applies to following files in the src directory: annoylib.h
88+
89+
90+
Copyright (c) 2013 Spotify AB
91+
92+
Licensed under the Apache License, Version 2.0 (the "License"); you may not
93+
use this file except in compliance with the License. You may obtain a copy of
94+
the License at
95+
96+
http://www.apache.org/licenses/LICENSE-2.0
97+
98+
Unless required by applicable law or agreed to in writing, software
99+
distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
100+
WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
101+
License for the specific language governing permissions and limitations under
102+
the License.
103+
104+
========================================================================================
105+
The following license applies to all files in the src/winlibs/fftw directory
106+
107+
FFTW is Copyright © 2003, 2007-11 Matteo Frigo, Copyright © 2003, 2007-11
108+
Massachusetts Institute of Technology.
109+
110+
FFTW is free software; you can redistribute it and/or modify it under the terms
111+
of the GNU General Public License as published by the Free Software Foundation;
112+
either version 2 of the License, or (at your option) any later version.
113+
114+
This program is distributed in the hope that it will be useful, but WITHOUT ANY
115+
WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A
116+
PARTICULAR PURPOSE. See the GNU General Public License for more details.
117+
118+
You should have received a copy of the GNU General Public License along with
119+
this program; if not, write to the Free Software Foundation, Inc., 51 Franklin
120+
Street, Fifth Floor, Boston, MA 02110-1301 USA You can also find the GPL on the
121+
GNU web site.
122+
123+
In addition, we kindly ask you to acknowledge FFTW and its authors in any
124+
program or publication in which you use FFTW. (You are not required to do so;
125+
it is up to your common sense to decide whether you want to comply with this
126+
request or not.) For general publications, we suggest referencing: Matteo Frigo
127+
and Steven G. Johnson, “The design and implementation of FFTW3,” Proc. IEEE 93
128+
(2), 216–231 (2005).
129+
130+
Non-free versions of FFTW are available under terms different from those of the
131+
General Public License. (e.g. they do not require you to accompany any object
132+
code using FFTW with the corresponding source code.) For these alternative
133+
terms you must purchase a license from MIT’s Technology Licensing Office. Users
134+
interested in such a license should contact us ([email protected]) for more
135+
information.

README.md

+19-11
@@ -1,17 +1,25 @@
11
# FFT-accelerated Interpolation-based t-SNE (FIt-SNE)
22
## Introduction
3-
t-Stochastic Neighborhood Embedding ([t-SNE](https://lvdmaaten.github.io/tsne/)) is a highly successful method for dimensionality reduction and visualization of high dimensional datasets. A popular [implementation](https://github.com/lvdmaaten/bhtsne) of t-SNE uses the Barnes-Hut algorithm to approximate the gradient at each iteration of gradient descent. We modified this implementation as follows:
3+
t-Stochastic Neighborhood Embedding ([t-SNE](https://lvdmaaten.github.io/tsne/)) is a highly successful method for dimensionality reduction and visualization of high dimensional datasets. A popular [implementation](https://github.com/lvdmaaten/bhtsne) of t-SNE uses the Barnes-Hut algorithm to approximate the gradient at each iteration of gradient descent. We accelerated this implementation as follows:
44

55
* Computation of the N-body Simulation: Instead of approximating the N-body simulation using Barnes-Hut, we interpolate onto an equispaced grid and use FFT to perform the convolution, dramatically reducing the time to compute the gradient at each iteration of gradient descent. See [this](http://gauss.math.yale.edu/~gcl22/blog/numerics/low-rank/t-sne/2018/01/11/low-rank-kernels.html) post for some intuition on how it works.
6-
* Computation of Input Similiarities: Instead of computing nearest neighbors using vantage-point trees, we approximate nearest neighbors using the [Annoy](https://github.com/spotify/annoy) library. The neighbor lookups are multithreaded to take advantage of machines with multiple cores. Using "near" neighbors as opposed to strictly "nearest" neighbors is faster, but also has a smoothing effect, which can be useful for embedding some datasets (see [Linderman et al. (2017)](https://arxiv.org/abs/1711.04712)). If subtle detail is required (e.g. in identifying small clusters), then use vantage-point trees (which is also multithreaded in this implementation).
7-
* Early exaggeration: In [Linderman and Steinerberger (2017)](https://arxiv.org/abs/1706.02582), we showed that appropriately choosing the early exaggeration coefficient can lead to improved embedding of swissrolls and other synthetic datasets.
8-
* Late exaggeration: Increasing the exaggeration coefficient late in the optimization process (e.g. after 800 of 1000 iterations) can improve separation of the clusters.
6+
* Computation of Input Similarities: Instead of computing nearest neighbors using vantage-point trees, we approximate nearest neighbors using the [Annoy](https://github.com/spotify/annoy) library. The neighbor lookups are multithreaded to take advantage of machines with multiple cores. Using "near" neighbors as opposed to strictly "nearest" neighbors is faster, but also has a smoothing effect, which can be useful for embedding some datasets (see [Linderman et al. (2017)](https://arxiv.org/abs/1711.04712)). If subtle detail is required (e.g. in identifying small clusters), then use vantage-point trees (which is also multithreaded in this implementation).
7+
98

109
Check out our [preprint](https://arxiv.org/abs/1712.09005) for more details and some benchmarks.
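
To give a feel for the FFT step described in the first bullet above, here is a minimal 1-D sketch. It is not the FIt-SNE implementation: it skips the interpolation step entirely and assumes the "charges" already sit on an equispaced grid. The function names `grid_kernel_sums_fft` and `grid_kernel_sums_direct` are hypothetical, and the kernel `1 / (1 + d^2)` is the standard t-SNE Student-t kernel with one degree of freedom. On an equispaced grid the kernel matrix is Toeplitz, so it can be embedded in a circulant matrix and applied with FFTs.

```python
import numpy as np

def grid_kernel_sums_fft(charges, spacing):
    """Compute u_i = sum_j q_j / (1 + (x_i - x_j)^2) on the grid x_j = j * spacing."""
    n = charges.shape[0]
    # Kernel values for offsets 0, 1, ..., n-1 grid steps.
    k = 1.0 / (1.0 + (np.arange(n) * spacing) ** 2)
    # First column of the 2n x 2n circulant matrix that embeds the Toeplitz kernel matrix.
    c = np.concatenate([k, [0.0], k[:0:-1]])
    # Zero-pad the charges to the circulant size.
    q = np.concatenate([charges, np.zeros(n)])
    # Circular convolution via FFT; the first n entries are the Toeplitz product.
    full = np.fft.irfft(np.fft.rfft(c) * np.fft.rfft(q), n=2 * n)
    return full[:n]

def grid_kernel_sums_direct(charges, spacing):
    """Reference O(n^2) evaluation of the same sums."""
    pos = np.arange(charges.shape[0]) * spacing
    diff = pos[:, None] - pos[None, :]
    return (1.0 / (1.0 + diff ** 2)) @ charges

rng = np.random.default_rng(0)
q = rng.normal(size=64)
fast = grid_kernel_sums_fft(q, spacing=0.1)
slow = grid_kernel_sums_direct(q, spacing=0.1)
```

The two results agree to floating-point precision, while the FFT version costs O(n log n) rather than O(n^2) in the number of grid points.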
1110

12-
R, Matlab, and Python wrappers are `fast_tsne.R`, `fast_tsne.m`, and `fast_tsne.py` respectively. [Gioele La Manno](https://twitter.com/GioeleLaManno) implemented a Python (Cython) wrapper, which is available on PyPI [here](https://pypi.python.org/pypi/fitsne).
11+
## Features
12+
Additionally, this implementation includes the following features:
13+
* Early exaggeration: In [Linderman and Steinerberger (2017)](https://arxiv.org/abs/1706.02582), we showed that appropriately choosing the early exaggeration coefficient can lead to improved embedding of swissrolls and other synthetic datasets. Early exaggeration is built into all t-SNE implementations; here we highlight its importance as a parameter.
14+
* Late exaggeration: Increasing the exaggeration coefficient late in the optimization process can improve separation of the clusters. [Kobak and Berens (2018)](https://www.biorxiv.org/content/10.1101/453449v1) suggest starting late exaggeration immediately following early exaggeration.
15+
* Initialization: Custom initialization can be provided from Python/Matlab/R. As suggested by [Kobak and Berens (2018)](https://www.biorxiv.org/content/10.1101/453449v1), initializing t-SNE with the first two principal components (scaled to have standard deviation 0.0001) results in an embedding which often preserves the global structure more effectively than the default random initialization. See there for other initialisation tricks.
16+
* Variable degrees of freedom: [Kobak et al. (2019)]() show that decreasing the degrees of freedom (df) of the t-distribution (resulting in heavier tails) reveals fine structure that is not visible in standard t-SNE.
17+
* Perplexity combination: The perplexity parameter determines the width of the Gaussian kernel, such that small perplexity values uncover local structure while larger values reveal global structure. [Kobak and Berens (2018)](https://www.biorxiv.org/content/10.1101/453449v1) show that using a combination of several perplexity values, resulting in a multi-scale embedding, can be useful.
18+
* All optimisation parameters can be controlled from Python/Matlab/R. For example, [Belkina et al. (2018)](https://www.biorxiv.org/content/10.1101/451690v2) highlight the importance of increasing the learning rate when embedding large data sets.
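
As an illustration of the PCA initialization mentioned in the list above, the following is a minimal sketch using only NumPy. The helper name `pca_initialization` and its arguments are hypothetical, and the choice to rescale so that the first component has standard deviation 0.0001 is an assumption about how the scaling is meant; the wrappers may do this differently.

```python
import numpy as np

def pca_initialization(X, n_components=2, target_std=1e-4):
    """Project X onto its leading principal components and rescale the result.

    Assumption: the whole embedding is scaled by one factor so that the
    first component has standard deviation `target_std`.
    """
    X_centered = X - X.mean(axis=0)
    # Thin SVD: principal component scores are the columns of U scaled by S.
    U, S, _ = np.linalg.svd(X_centered, full_matrices=False)
    Y = U[:, :n_components] * S[:n_components]
    return Y / np.std(Y[:, 0]) * target_std

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 10))
Y_init = pca_initialization(X_demo)
```

The resulting `Y_init` array has one row per data point and could then be supplied through whichever custom-initialization argument the wrapper in use exposes.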
19+
1320

1421
## Installation
22+
R, Matlab, and Python wrappers are `fast_tsne.R`, `fast_tsne.m`, and `fast_tsne.py` respectively. Each of these wrappers can be used after installing FFTW and compiling the C++ code, as below. [Gioele La Manno](https://twitter.com/GioeleLaManno) implemented a Python (Cython) wrapper, which is available on PyPI [here](https://pypi.python.org/pypi/fitsne).
1523

1624
**Note:** If you update to a new version of FIt-SNE using `git pull`, be sure to recompile.
1725

@@ -34,18 +42,18 @@ If you would like to compile it yourself see below. The code has been currently
3442
2. Copy the binary file (e.g. `x64/Debug/FItSNE.exe`) generated by the build process to the `bin/` folder
3543
3. For Windows, we have added all dependencies, including the [FFTW library](http://www.fftw.org/), which is distributed under the GNU General Public License. For the binary to find the FFTW DLLs, you need to either add `src/winlibs/fftw/` to your PATH, or to copy the DLLs into the `bin/` directory.
3644

37-
As of this commit, only the R wrapper properly calls the Windows executable.
38-
The Python and Matlab wrappers can be trivially changed to call it (just
39-
changing `bin/fast_tsne` to `bin/FItSNE.exe` in the code), and will be changed
40-
in future commits.
45+
As of this commit, only the R wrapper properly calls the Windows executable. The Python and Matlab wrappers can be trivially changed to call it (just changing `bin/fast_tsne` to `bin/FItSNE.exe` in the code), and will be changed in future commits.
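
One way the path change described above could be made platform-aware is sketched below. The helper `fitsne_binary_path` is hypothetical and is not part of `fast_tsne.py`; only the two relative paths come from the text above.

```python
import platform

def fitsne_binary_path(system=None):
    """Return the relative path of the compiled binary for the given OS name.

    `system` defaults to platform.system(), which is "Windows" on Windows.
    """
    system = platform.system() if system is None else system
    return "bin/FItSNE.exe" if system == "Windows" else "bin/fast_tsne"

binary = fitsne_binary_path()
```

A wrapper could call such a helper in place of a hard-coded `bin/fast_tsne` string, so the same script runs on both Windows and Unix-like systems.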
4146

4247
Many thanks to [Josef Spidlen](https://github.com/jspidlen) for this Windows implementation!
4348

44-
## References
45-
If you use our software, please cite:
49+
## Acknowledgements and References
50+
We are grateful to members of the community who have [contributed](https://github.com/KlugerLab/FIt-SNE/graphs/contributors) to improving FIt-SNE, especially [Dmitry Kobak](https://github.com/dkobak), [Pavlin Poličar](https://github.com/pavlin-policar), and [Josef Spidlen](https://github.com/jspidlen).
51+
52+
If you use FIt-SNE, please cite:
4653

4754
George C. Linderman, Manas Rachh, Jeremy G. Hoskins, Stefan Steinerberger, Yuval Kluger. (2017). Efficient Algorithms for t-distributed Stochastic Neighborhood Embedding. *arXiv:1712.09005* ([link](https://arxiv.org/abs/1712.09005))
4855

4956
Our implementation is derived from the Barnes-Hut implementation:
5057

5158
Laurens van der Maaten (2014). Accelerating t-SNE using tree-based algorithms. Journal of Machine Learning Research, 15(1):3221–3245. ([link](https://dl.acm.org/citation.cfm?id=2627435.2697068))
59+

examples/test.ipynb

+957-91
Large diffs are not rendered by default.
