Commit bcc9b08

Update documentation according to latest changes
1 parent 08e9dad commit bcc9b08

2 files changed: +58 -29 lines changed

README.md

Lines changed: 40 additions & 14 deletions
@@ -14,23 +14,32 @@ depending on the selected input parameters.
 ```MATLAB
 [data, clustPoints, idx, centers, slopes, lengths] = ...
 generateData(slope, slopeStd, numClusts, xClustAvgSep, yClustAvgSep, ...
-lengthMean, lengthStd, lateralStd, totalPoints)
+lengthMean, lengthStd, lateralStd, totalPoints, ...)
 ```

 ## Input parameters

-Parameter | Description
--------------- | ------------------------------------------------------------------------------------------------------
-`slopeMean` | Mean slope of the lines on which clusters are based. Line slopes are drawn from the normal distribution.
-`slopeStd` | Standard deviation of line slopes.
-`numClusts` | Number of clusters (and therefore of lines) to generate.
-`xClustAvgSep` | Average separation of line centers along the X axis.
-`yClustAvgSep` | Average separation of line centers along the Y axis.
-`lengthMean` | Mean length of the lines on which clusters are based. Line lengths are drawn from the folded normal distribution.
-`lengthStd` | Standard deviation of line lengths.
-`lateralStd` | Cluster "fatness", i.e., the standard deviation of the distance from each point to the respective line, in both *x* and *y* directions. This distance is obtained from the normal distribution with zero mean.
-`totalPoints` | Total points in generated data. These will be randomly divided between clusters using the half-normal distribution with unit standard deviation.
-`linePtsDist` | Optional parameter which specifies the distribution of points along lines. Possible values are `'unif'` (default) and `'norm'`. The former will distribute points uniformly along lines, while the latter will use a normal distribution (mean equal to the line center, standard deviation equal to one sixth of the line length). In the latter case, the line includes three standard deviations of the normal distribution, meaning that there is a small chance that some points are projected outside line limits.
+### Required parameters
+
+Parameter | Description
+-------------- | -----------
+`slopeMean` | Mean slope of the lines on which clusters are based. Line slopes are drawn from the normal distribution.
+`slopeStd` | Standard deviation of line slopes.
+`numClusts` | Number of clusters (and therefore of lines) to generate.
+`xClustAvgSep` | Average separation of line centers along the X axis.
+`yClustAvgSep` | Average separation of line centers along the Y axis.
+`lengthMean` | Mean length of the lines on which clusters are based. Line lengths are drawn from the folded normal distribution.
+`lengthStd` | Standard deviation of line lengths.
+`lateralStd` | Cluster "fatness", i.e., the standard deviation of the distance from each point to its projection on the line. The way this distance is obtained is controlled by the optional `'pointOffset'` parameter.
+`totalPoints` | Total points in generated data. These will be randomly divided between clusters using the half-normal distribution with unit standard deviation.
+
+### Optional named parameters
+
+Parameter name | Parameter values | Default value | Description
+-------------- | ---------------------------------- | ------------- | -----------
+`allowEmpty` | `true`, `false` | `false` | Allow empty clusters?
+`pointDist` | `'unif'`, `'norm'` | `'unif'` | Specifies the distribution of points along lines, with two possible values: 1) `'unif'` distributes points uniformly along lines; or, 2) `'norm'` distributes points along lines using a normal distribution (line center is the mean and the line length is equal to 3 standard deviations).
+`pointOffset` | `'1D'`, `'2D'` | `'2D'` | Controls how points are created from their projections on the lines, with two possible values: 1) `'1D'` places points on a second line perpendicular to the cluster line using a normal distribution centered at their intersection; or, 2) `'2D'` places points using a bivariate normal distribution centered at the point projection.

 ## Return values

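Reviewer sketch (not part of the commit): the `totalPoints` and `allowEmpty` rows describe splitting points between clusters with half-normal weights. The snippet below is one way this could work, not the code in `generateData.m`; the flooring and redistribution strategy and the names `weights`, `clustSizes`, `order`, and `remainder` are assumptions made for illustration.

```MATLAB
% Hypothetical sketch: split totalPoints among numClusts clusters using
% half-normal weights. Not taken from generateData.m.
totalPoints = 200;
numClusts = 5;
allowEmpty = false;

% Half-normal weights with unit standard deviation, normalized to sum to 1.
weights = abs(randn(numClusts, 1));
weights = weights / sum(weights);

% Provisional per-cluster sizes (flooring may drop a few points).
clustSizes = floor(weights * totalPoints);

% Hand the dropped points, one each, to the clusters with largest weights.
[~, order] = sort(weights, 'descend');
remainder = totalPoints - sum(clustSizes);
clustSizes(order(1:remainder)) = clustSizes(order(1:remainder)) + 1;

% If empty clusters are not allowed, move one point from the currently
% largest cluster to each empty one (assumes totalPoints >= numClusts).
if ~allowEmpty
    for k = find(clustSizes == 0)'
        [~, big] = max(clustSizes);
        clustSizes(big) = clustSizes(big) - 1;
        clustSizes(k) = 1;
    end
end
```

With `allowEmpty` set to `true` the final loop is skipped, so a cluster whose weight is very small may end up with zero points.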
@@ -43,7 +52,9 @@ depending on the selected input parameters.
 `slopes` | Vector (`numClusts` x *1*) containing the effective slopes of the lines used to generate clusters.
 `lengths` | Vector (`numClusts` x *1*) containing the effective lengths of the lines used to generate clusters.

-## Usage example
+## Usage examples
+
+### Basic usage

 ```MATLAB
 [data cp idx] = generateData(1, 0.5, 5, 15, 15, 5, 1, 2, 200);
@@ -60,6 +71,21 @@ The following command plots the generated clusters:
 scatter(data(:, 1), data(:, 2), 8, idx);
 ```

+### Using optional parameters
+
+The following command generates 7 clusters with a total of 100 000 points.
+Optional parameters are used to override the defaults.
+
+```MATLAB
+[data cp idx] = generateData(0, 0.1, 7, 25, 25, 25, 5, 1, 100000, ...
+'pointDist', 'norm', 'pointOffset', '1D', 'allowEmpty', true);
+```
+
+The generated clusters can be visualized with the same `scatter` command used
+in the previous example.
+
+### Reproducible cluster generation
+
 To make cluster generation reproducible, set the random number generator seed
 to a specific value (e.g. 123) before generating the data:

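Reviewer sketch (not part of the commit): the README's own seeding snippet lies outside this hunk, so the lines below only illustrate the general pattern. They assume a MATLAB release that provides `rng` (R2011a or later); other environments may seed their generators differently. The `abs(randn(...))` draw is a stand-in for the random draws made inside `generateData`.

```MATLAB
% Seed the global generator, draw, reseed with the same value, draw again.
rng(123);
a = abs(randn(5, 1));   % stand-in for the draws made inside generateData
rng(123);
b = abs(randn(5, 1));

% Both draws come from the same generator state, so they match exactly.
sameDraws = isequal(a, b);

% The same pattern with the function itself (needs generateData.m on the path):
% rng(123);
% [data, cp, idx] = generateData(1, 0.5, 5, 15, 15, 5, 1, 2, 200);
```

This only yields identical data sets if `generateData` draws all its random numbers from the global stream, which is an assumption here.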
generateData.m

Lines changed: 18 additions & 15 deletions
@@ -31,27 +31,30 @@
 % Line lengths are drawn from the folded normal
 % distribution.
 % lengthStd - Standard deviation of line lengths.
-% lateralStd - "Cluster fatness", i.e., the standard deviation of the
-% distance from each point to the respective line, in both
-% x and y directions. This distance is obtained from the
-% normal distribution with zero mean.
+% lateralStd - Cluster "fatness", i.e., the standard deviation of the
+% distance from each point to its projection on the
+% line. The way this distance is obtained is controlled by
+% the optional 'pointOffset' parameter.
 % totalPoints - Total points in generated data. These will be randomly
 % divided between clusters using the half-normal
 % distribution with unit standard deviation.
 %
 % Optional named input parameters:
 % allowEmpty - Allow empty clusters? This value is false by default.
-% pointDist - Specifies the distribution of points along lines.
-% Possible values are 'unif' (default) and 'norm'.
-% The former will distribute points uniformly
-% along lines, while the latter will use a normal
-% distribution (mean equal to the line center, standard
-% deviation equal to 1/6 of the line length). In the
-% latter case, the line includes three standard deviations
-% of the normal distribution, meaning that there is a small
-% chance that some points are projected outside line
-% limits.
-% pointOffset - 1D or 2D.
+% pointDist - Specifies the distribution of points along lines, with
+% two possible values:
+% - 'unif' (default) distributes points uniformly along
+% lines.
+% - 'norm' distributes points along lines using a normal
+% distribution (line center is the mean and the line
+% length is equal to 3 standard deviations).
+% pointOffset - Controls how points are created from their projections
+% on the lines, with two possible values:
+% - '1D' places points on a second line perpendicular to
+% the cluster line using a normal distribution centered
+% at their intersection.
+% - '2D' (default) places points using a bivariate normal
+% distribution centered at the point projection.
 %
 % Outputs:
 % data - Matrix (totalPoints x 2) with the generated data.
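Reviewer sketch (not part of the commit): the `pointDist` and `pointOffset` descriptions above can be illustrated for a single cluster line as follows. This is not the body of `generateData.m`; all variable names are hypothetical, and the `'norm'` case assumes a standard deviation of one sixth of the line length, as stated in the removed documentation text.

```MATLAB
% Hypothetical single-cluster sketch; names are illustrative only.
n = 500;              % points in this cluster
center = [0 0];       % line center
slope = 1;            % line slope
len = 10;             % line length
lateralStd = 0.5;     % cluster "fatness"
pointDist = 'norm';   % 'unif' or 'norm'
pointOffset = '1D';   % '1D' or '2D'

% Unit vector along the line and a unit vector perpendicular to it.
dirVec = [1 slope] / norm([1 slope]);
perpVec = [-dirVec(2) dirVec(1)];

% Signed distance of each projection from the line center.
switch pointDist
    case 'unif'
        t = (rand(n, 1) - 0.5) * len;
    case 'norm'
        % Assumption: std of len/6, so the line covers three standard
        % deviations on each side of its center.
        t = randn(n, 1) * len / 6;
end
proj = repmat(center, n, 1) + t * dirVec;   % n x 2 projections on the line

% Final points, created from their projections.
switch pointOffset
    case '1D'
        % Offset along the perpendicular direction only.
        d = randn(n, 1) * lateralStd;
        pts = proj + d * perpVec;
    case '2D'
        % Independent normal offsets in x and y around each projection.
        pts = proj + randn(n, 2) * lateralStd;
end
```

Plotting `pts` with `scatter(pts(:, 1), pts(:, 2), 8)` for each combination of the two options shows the difference: `'1D'` keeps each point on the perpendicular through its projection, while `'2D'` also spreads points along the line direction.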
