
Commit 7f97812

Update citation reference
1 parent f69fe12 commit 7f97812


2 files changed: 33 additions & 14 deletions


README.md

Lines changed: 19 additions & 4 deletions
@@ -1,5 +1,6 @@
 [![Latest release](https://img.shields.io/github/release/fakenmc/generateData.svg)](https://github.com/fakenmc/generateData/releases)
 [![MIT Licence](https://img.shields.io/badge/license-MIT-yellowgreen.svg)](https://opensource.org/licenses/MIT/)
+[![View Generate Data for Clustering on File Exchange](https://www.mathworks.com/matlabcentral/images/matlab-file-exchange.svg)](https://www.mathworks.com/matlabcentral/fileexchange/37435-generate-data-for-clustering)

 # generateData

@@ -100,14 +101,28 @@ rand("state", 123);
 randn("state", 123);
 ```

+## Previous behaviors and reproducibility of results
+
+Before [v2.0.0](https://github.com/fakenmc/generateData/tree/v2.0.0), lines
+supporting clusters were parameterized with slopes instead of angles. We found
+this caused difficulties when choosing line orientation, thus the change to
+angles, which are much easier to work with.
+Version [v1.3.0](https://github.com/fakenmc/generateData/tree/v1.3.0) still
+uses slopes, for those who prefer this behavior.
+
+For reproducing results in studies published before May 2020, use version
+[v1.2.0](https://github.com/fakenmc/generateData/tree/v1.2.0) instead.
+Subsequent versions were optimized in a way that changed the order in which
+the required random values are generated, thus producing slightly different
+results.
+
 ## Reference

 If you use this function in your work, please cite the following reference:

-- Fachada, N., Figueiredo, M.A.T., Lopes, V.V., Martins, R.C., Rosa,
-A.C., [Spectrometric differentiation of yeast strains using minimum volume
-increase and minimum direction change clustering criteria](http://www.sciencedirect.com/science/article/pii/S0167865514000889),
-Pattern Recognition Letters, vol. 45, pp. 55-61 (2014), doi: http://dx.doi.org/10.1016/j.patrec.2014.03.008
+- Fachada, N., & Rosa, A. C. (2020).
+[generateData—A 2D data generator](https://doi.org/10.1016/j.simpa.2020.100017).
+Software Impacts, 4:100017. doi: [10.1016/j.simpa.2020.100017](https://doi.org/10.1016/j.simpa.2020.100017)

 ## License

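The README note on the switch from slopes (before v2.0.0) to angles can be illustrated with a minimal MATLAB/Octave sketch using only built-in functions. It assumes the usual relation between a line's slope m and its angle, angle = atan(m); the helper `slopeToAngle` is hypothetical and not part of generateData. It shows one difficulty with slopes: steep lines need very large slopes, and a vertical line has no finite slope at all, while its angle is simply pi/2.

```matlab
% Hypothetical helper (not part of generateData): convert a line slope,
% as used before v2.0.0, into the corresponding angle in radians.
slopeToAngle = @(m) atan(m);

% A slope of 1 is a 45-degree line, i.e. an angle of pi/4
a1 = slopeToAngle(1);

% tan() maps an angle back to a slope, so the conversion round-trips
m = tan(slopeToAngle(-2));

% Nearly vertical lines need huge slopes, while the angle stays bounded:
% a slope of 1e6 is already within about 1e-6 rad of pi/2 (vertical)
aSteep = slopeToAngle(1e6);

fprintf('%.4f %.4f %.6f\n', a1, m, aSteep);
```

Because atan returns values in (-pi/2, pi/2), any finite slope maps to a bounded angle, which is what makes angle means and standard deviations simpler to pick.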
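The README's reproducibility note (later versions draw the required random values in a different order) can be illustrated with a small MATLAB/Octave sketch. It seeds the generator with `rand('state', 123)` as in the README's context lines, then draws values for two hypothetical quantities (`lengthsA`, `anglesA`, and so on are illustrative names) in two different orders. It does not reproduce generateData's actual sampling order; it only shows that the same seed with a different draw order yields different values.

```matlab
% Seed the uniform generator, then draw values for two hypothetical
% quantities: three cluster lengths, then two cluster angles
rand('state', 123);
lengthsA = rand(1, 3);
anglesA = rand(1, 2);

% Same seed, same quantities, but drawn in the opposite order
rand('state', 123);
anglesB = rand(1, 2);
lengthsB = rand(1, 3);

% Reseeding and repeating the original order reproduces it exactly
rand('state', 123);
lengthsC = rand(1, 3);

fprintf('same order reproduced: %d\n', isequal(lengthsA, lengthsC));
fprintf('reordered draws match: %d\n', isequal(lengthsA, lengthsB));
```

The random stream itself is unchanged; only which variable receives which value differs, which is why results from older studies require the older version.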
generateData.m

Lines changed: 14 additions & 10 deletions
@@ -11,11 +11,11 @@
 totalPoints, ...
 varargin ...
 )
-% GENERATEDATA Generates 2D data for clustering. Data is created along
+% GENERATEDATA Generates 2D data for clustering. Data is created along
 % straight lines, which can be more or less parallel
 % depending on the angleStd parameter.
 %
-% [data clustPoints idx centers angles lengths] =
+% [data clustPoints idx centers angles lengths] =
 % GENERATEDATA(angleMean, angleStd, numClusts, xClustAvgSep, ...
 % yClustAvgSep, lengthMean, lengthStd, lateralStd, ...
 % totalPoints, ...)
@@ -31,7 +31,7 @@
 % Line lengths are drawn from the folded normal
 % distribution.
 % lengthStd - Standard deviation of line lengths.
-% lateralStd - Cluster "fatness", i.e., the standard deviation of the
+% lateralStd - Cluster "fatness", i.e., the standard deviation of the
 % distance from each point to its projection on the
 % line. The way this distance is obtained is controlled by
 % the optional 'pointOffset' parameter.
@@ -64,18 +64,18 @@
 % of each point.
 % centers - Matrix (numClusts x 2) containing centers from where
 % clusters were generated.
-% angles - Vector (numClusts x 1) containing the effective angles
+% angles - Vector (numClusts x 1) containing the effective angles
 % of the lines used to generate clusters.
-% lengths - Vector (numClusts x 1) containing the effective lengths
+% lengths - Vector (numClusts x 1) containing the effective lengths
 % of the lines used to generate clusters.
 %
 % ----------------------------------------------------------
 % Usage example:
 %
 % [data cp idx] = GENERATEDATA(pi / 2, pi / 8, 5, 15, 15, 5, 1, 2, 200);
 %
-% This creates 5 clusters with a total of 200 points, with a mean angle
-% of pi/2 (std=pi/8), separated in average by 15 units in both x and y
+% This creates 5 clusters with a total of 200 points, with a mean angle
+% of pi/2 (std=pi/8), separated in average by 15 units in both x and y
 % directions, with mean length of 5 units (std=1) and a "fatness" or
 % spread of 2 units.
 %
@@ -84,8 +84,12 @@
 % scatter(data(:, 1), data(:, 2), 8, idx);

 % Copyright (c) 2012-2020 Nuno Fachada
-% Distributed under the MIT License (See accompanying file LICENSE or copy
+% Distributed under the MIT License (See accompanying file LICENSE or copy
 % at http://opensource.org/licenses/MIT)
+%
+% Reference:
+% Fachada, N., & Rosa, A. C. (2020). generateData—A 2D data generator.
+% Software Impacts, 4:100017. doi: 10.1016/j.simpa.2020.100017

 % Known distributions for sampling points along lines
 pointDists = {'unif', 'norm'};
@@ -225,7 +229,7 @@
 % each point
 perpAngles = angles(i) + sign(points_dist) * pi / 2;
 perpVecs = [cos(perpAngles) sin(perpAngles)];
-
+
 % Set vector magnitudes
 perpVecs = abs(points_dist) .* perpVecs;

@@ -253,4 +257,4 @@

 % Update idx
 idx(cumSumPoints(i) + 1 : cumSumPoints(i + 1)) = i;
-end;
+end;

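The context lines of the `@@ -225,7 +229,7 @@` hunk show how generateData offsets each point from its supporting line: the line angle is rotated by +pi/2 or -pi/2 depending on the sign of the offset, converted to unit vectors, and scaled by the offset magnitude. The standalone MATLAB/Octave sketch below repeats that step with a hypothetical angle and hypothetical offsets (`lineAngle` and `offsets` stand in for `angles(i)` and `points_dist`). It checks that each offset vector is perpendicular to the line, has length |offset|, and matches the equivalent closed form offset * [-sin(a) cos(a)].

```matlab
% Hypothetical inputs: one line angle and signed lateral offsets
lineAngle = pi / 6;
offsets = [-1.5; 0; 2];    % column vector, one offset per point

% As in the diff context: rotate the line angle by +pi/2 or -pi/2
% depending on the offset sign, then build unit vectors (N x 2)
perpAngles = lineAngle + sign(offsets) * pi / 2;
perpVecs = [cos(perpAngles) sin(perpAngles)];

% Scale each unit vector by the offset magnitude (implicit expansion)
perpVecs = abs(offsets) .* perpVecs;

% Dot products with the line direction should be numerically zero
lineDir = [cos(lineAngle) sin(lineAngle)];
dots = perpVecs * lineDir';

% Vector lengths should equal the absolute offsets
lens = sqrt(sum(perpVecs .^ 2, 2));

% Equivalent closed form: signed offset times the left-hand unit normal
closedForm = offsets * [-sin(lineAngle) cos(lineAngle)];

fprintf('max |dot| = %g, max diff = %g\n', max(abs(dots)), ...
    max(abs(perpVecs(:) - closedForm(:))));
```

A zero offset gives `sign(0) = 0`, so the point stays on the line, which is consistent with the closed form.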