Commit 08a463a

make release-tag: Merge branch 'main' into stable
2 parents d3d3b7c + 43803a6 commit 08a463a

15 files changed: +242 -342 lines changed


CONTRIBUTING.rst

Lines changed: 3 additions & 3 deletions
@@ -174,17 +174,17 @@ Release Workflow
 The process of releasing a new version involves several steps combining both ``git`` and
 ``bumpversion`` which, briefly:

-1. Merge what is in ``master`` branch into ``stable`` branch.
+1. Merge what is in ``main`` branch into ``stable`` branch.
 2. Update the version in ``setup.cfg``, ``ctgan/__init__.py`` and
    ``HISTORY.md`` files.
 3. Create a new git tag pointing at the corresponding commit in ``stable`` branch.
-4. Merge the new commit from ``stable`` into ``master``.
+4. Merge the new commit from ``stable`` into ``main``.
 5. Update the version in ``setup.cfg`` and ``ctgan/__init__.py``
    to open the next development iteration.

 .. note:: Before starting the process, make sure that ``HISTORY.md`` has been updated with a new
           entry that explains the changes that will be included in the new version.
-          Normally this is just a list of the Pull Requests that have been merged to master
+          Normally this is just a list of the Pull Requests that have been merged to main
           since the last release.

 Once this is done, run one of the following commands:

HISTORY.md

Lines changed: 14 additions & 0 deletions
@@ -1,5 +1,19 @@
 # History

+## v0.7.5 - 2023-10-05
+
+This release adds a progress bar that will show when setting the `verbose` parameter to True when initializing `CTGAN`. It also removes a warning that was showing.
+
+### Maintenance
+
+* Remove model_missing_values from ClusterBasedNormalizer call - PR [#310](https://github.com/sdv-dev/CTGAN/pull/310) by @fealho
+* Switch default branch from master to main - Issue [#311](https://github.com/sdv-dev/CTGAN/issues/311) by @amontanez24
+* Remove or implement CTGAN tests - Issue [#312](https://github.com/sdv-dev/CTGAN/issues/312) by @fealho
+
+### New Features
+
+* Add progress bar for CTGAN fitting (+ save the loss values) - Issue [#298](https://github.com/sdv-dev/CTGAN/issues/298) by @frances-h
+
 ## v0.7.4 - 2023-07-25

 This release adds support for Python 3.11 and drops support for Python 3.7.

Makefile

Lines changed: 14 additions & 14 deletions
@@ -158,22 +158,22 @@ publish: dist publish-confirm ## package and upload a release
 	twine upload dist/*

 .PHONY: bumpversion-release
-bumpversion-release: ## Merge master to stable and bumpversion release
+bumpversion-release: ## Merge main to stable and bumpversion release
 	git checkout stable || git checkout -b stable
-	git merge --no-ff master -m"make release-tag: Merge branch 'master' into stable"
+	git merge --no-ff main -m"make release-tag: Merge branch 'main' into stable"
 	bumpversion release
 	git push --tags origin stable

 .PHONY: bumpversion-release-test
-bumpversion-release-test: ## Merge master to stable and bumpversion release
+bumpversion-release-test: ## Merge main to stable and bumpversion release
 	git checkout stable || git checkout -b stable
-	git merge --no-ff master -m"make release-tag: Merge branch 'master' into stable"
+	git merge --no-ff main -m"make release-tag: Merge branch 'main' into stable"
 	bumpversion release --no-tag
 	@echo git push --tags origin stable

 .PHONY: bumpversion-patch
-bumpversion-patch: ## Merge stable to master and bumpversion patch
-	git checkout master
+bumpversion-patch: ## Merge stable to main and bumpversion patch
+	git checkout main
 	git merge stable
 	bumpversion --no-tag patch
 	git push
@@ -192,7 +192,7 @@ bumpversion-major: ## Bump the version the next major skipping the release

 .PHONY: bumpversion-revert
 bumpversion-revert: ## Undo a previous bumpversion-release
-	git checkout master
+	git checkout main
 	git branch -D stable

 CLEAN_DIR := $(shell git status --short | grep -v ??)
@@ -205,10 +205,10 @@ ifneq ($(CLEAN_DIR),)
 	$(error There are uncommitted changes)
 endif

-.PHONY: check-master
-check-master: ## Check if we are in master branch
-ifneq ($(CURRENT_BRANCH),master)
-	$(error Please make the release from master branch\n)
+.PHONY: check-main
+check-main: ## Check if we are in main branch
+ifneq ($(CURRENT_BRANCH),main)
+	$(error Please make the release from main branch\n)
 endif

 .PHONY: check-history
@@ -218,7 +218,7 @@ ifeq ($(CHANGELOG_LINES),0)
 endif

 .PHONY: check-release
-check-release: check-clean check-master check-history ## Check if the release can be made
+check-release: check-clean check-main check-history ## Check if the release can be made
 	@echo "A new release can be made"

 .PHONY: release
@@ -228,10 +228,10 @@ release: check-release bumpversion-release publish bumpversion-patch
 release-test: check-release bumpversion-release-test publish-test bumpversion-revert

 .PHONY: release-candidate
-release-candidate: check-master publish bumpversion-candidate
+release-candidate: check-main publish bumpversion-candidate

 .PHONY: release-candidate-test
-release-candidate-test: check-clean check-master publish-test
+release-candidate-test: check-clean check-main publish-test

 .PHONY: release-minor
 release-minor: check-release bumpversion-minor release
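
The `check-clean` and `check-main` guards in this Makefile can be read as two simple predicates. The following is a minimal Python sketch of that logic, not code from the repository: `release_blockers` and its arguments are hypothetical names, and the `check-history` guard is left out because its rule is not fully visible in this diff.

```python
def release_blockers(current_branch, status_short):
    """Return the reasons why a release should not start (empty list if none).

    Hypothetical helper: it imitates the Makefile's check-clean and
    check-main guards and is not part of the repository.
    """
    blockers = []

    # check-clean: `git status --short | grep -v ??` keeps every status line
    # that does not contain '??', so untracked files alone do not block.
    tracked_changes = [
        line for line in status_short.splitlines()
        if line.strip() and '??' not in line
    ]
    if tracked_changes:
        blockers.append('There are uncommitted changes')

    # check-main: after this commit the release must start from `main`.
    if current_branch != 'main':
        blockers.append('Please make the release from main branch')

    return blockers


print(release_blockers('main', '?? notes.txt\n'))
print(release_blockers('master', ' M Makefile\n'))
```

The first call models a clean checkout of `main` with one untracked file; the second models the pre-rename branch name with a modified tracked file, which trips both guards.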

README.md

Lines changed: 5 additions & 5 deletions
@@ -8,13 +8,13 @@
 [![PyPI Shield](https://img.shields.io/pypi/v/ctgan.svg)](https://pypi.python.org/pypi/ctgan)
 [![Unit Tests](https://github.com/sdv-dev/CTGAN/actions/workflows/unit.yml/badge.svg)](https://github.com/sdv-dev/CTGAN/actions/workflows/unit.yml)
 [![Downloads](https://pepy.tech/badge/ctgan)](https://pepy.tech/project/ctgan)
-[![Coverage Status](https://codecov.io/gh/sdv-dev/CTGAN/branch/master/graph/badge.svg)](https://codecov.io/gh/sdv-dev/CTGAN)
+[![Coverage Status](https://codecov.io/gh/sdv-dev/CTGAN/branch/main/graph/badge.svg)](https://codecov.io/gh/sdv-dev/CTGAN)

 <div align="left">
 <br/>
 <p align="center">
 <a href="https://github.com/sdv-dev/CTGAN">
-<img align="center" width=40% src="https://github.com/sdv-dev/SDV/blob/master/docs/images/CTGAN-DataCebo.png"></img>
+<img align="center" width=40% src="https://github.com/sdv-dev/SDV/blob/stable/docs/images/CTGAN-DataCebo.png"></img>
 </a>
 </p>
 </div>
@@ -38,9 +38,9 @@ CTGAN is a collection of Deep Learning based synthetic data generators for s
 [Blog]: https://datacebo.com/blog
 [Documentation]: https://bit.ly/sdv-docs
 [Repository]: https://github.com/sdv-dev/CTGAN
-[License]: https://github.com/sdv-dev/CTGAN/blob/master/LICENSE
+[License]: https://github.com/sdv-dev/CTGAN/blob/main/LICENSE
 [Development Status]: https://pypi.org/search/?c=Development+Status+%3A%3A+2+-+Pre-Alpha
-[Slack Logo]: https://github.com/sdv-dev/SDV/blob/master/docs/images/slack.png
+[Slack Logo]: https://github.com/sdv-dev/SDV/blob/stable/docs/images/slack.png
 [Community]: https://bit.ly/sdv-slack-invite

 Currently, this library implements the **CTGAN** and **TVAE** models described in the [Modeling Tabular data using Conditional GAN](https://arxiv.org/abs/1907.00503) paper, presented at the 2019 NeurIPS conference.
@@ -141,7 +141,7 @@ More details can be found in the corresponding repository: https://github.com/ka


 <div align="center">
-<a href="https://datacebo.com"><img align="center" width=40% src="https://github.com/sdv-dev/SDV/blob/master/docs/images/DataCebo.png"></img></a>
+<a href="https://datacebo.com"><img align="center" width=40% src="https://github.com/sdv-dev/SDV/blob/stable/docs/images/DataCebo.png"></img></a>
 </div>
 <br/>
 <br/>

ctgan/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@

 __author__ = 'DataCebo, Inc.'
 __email__ = '[email protected]'
-__version__ = '0.7.4'
+__version__ = '0.7.5.dev1'

 from ctgan.demo import load_demo
 from ctgan.synthesizers.ctgan import CTGAN

ctgan/data_transformer.py

Lines changed: 7 additions & 3 deletions
@@ -18,8 +18,8 @@
 class DataTransformer(object):
     """Data Transformer.

-    Model continuous columns with a BayesianGMM and normalized to a scalar [0, 1] and a vector.
-    Discrete columns are encoded using a scikit-learn OneHotEncoder.
+    Model continuous columns with a BayesianGMM and normalize them to a scalar between [-1, 1]
+    and a vector. Discrete columns are encoded using a OneHotEncoder.
     """

     def __init__(self, max_clusters=10, weight_threshold=0.005):
@@ -46,7 +46,11 @@ def _fit_continuous(self, data):
                 A ``ColumnTransformInfo`` object.
         """
         column_name = data.columns[0]
-        gm = ClusterBasedNormalizer(model_missing_values=True, max_clusters=min(len(data), 10))
+        gm = ClusterBasedNormalizer(
+            missing_value_generation='from_column',
+            max_clusters=min(len(data), self._max_clusters),
+            weight_threshold=self._weight_threshold
+        )
         gm.fit(data, column_name)
         num_components = sum(gm.valid_component_indicator)
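
Besides swapping `model_missing_values` for `missing_value_generation`, the second hunk changes how the cluster cap is computed: the old call hard-coded `10`, so a `max_clusters` value passed to `DataTransformer` had no effect on this call. A minimal sketch of the difference follows; both function names are hypothetical and exist only for illustration.

```python
def effective_max_clusters(n_rows, max_clusters=10):
    """Cap used after this commit: min(len(data), self._max_clusters).

    Hypothetical helper; it only mirrors the expression in the diff.
    """
    return min(n_rows, max_clusters)


def old_effective_max_clusters(n_rows, max_clusters=10):
    """Cap used before this commit: the configured value was ignored."""
    return min(n_rows, 10)


# With 500 rows and max_clusters=25, only the new expression honours 25.
print(effective_max_clusters(500, 25), old_effective_max_clusters(500, 25))
# With very small data, both are bounded by the number of rows.
print(effective_max_clusters(3, 25), old_effective_max_clusters(3, 25))
```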

ctgan/synthesizers/ctgan.py

Lines changed: 29 additions & 4 deletions
@@ -7,6 +7,7 @@
 import torch
 from torch import optim
 from torch.nn import BatchNorm1d, Dropout, LeakyReLU, Linear, Module, ReLU, Sequential, functional
+from tqdm import tqdm

 from ctgan.data_sampler import DataSampler
 from ctgan.data_transformer import DataTransformer
@@ -175,6 +176,8 @@ def __init__(self, embedding_dim=128, generator_dim=(256, 256), discriminator_di
         self._data_sampler = None
         self._generator = None

+        self.loss_values = pd.DataFrame(columns=['Epoch', 'Generator Loss', 'Discriminator Loss'])
+
     @staticmethod
     def _gumbel_softmax(logits, tau=1, hard=False, eps=1e-10, dim=-1):
         """Deals with the instability of the gumbel_softmax for older versions of torch.
@@ -335,8 +338,15 @@ def fit(self, train_data, discrete_columns=(), epochs=None):
         mean = torch.zeros(self._batch_size, self._embedding_dim, device=self._device)
         std = mean + 1

+        self.loss_values = pd.DataFrame(columns=['Epoch', 'Generator Loss', 'Discriminator Loss'])
+
+        epoch_iterator = tqdm(range(epochs), disable=(not self._verbose))
+        if self._verbose:
+            description = 'Gen. ({gen:.2f}) | Discrim. ({dis:.2f})'
+            epoch_iterator.set_description(description.format(gen=0, dis=0))
+
         steps_per_epoch = max(len(train_data) // self._batch_size, 1)
-        for i in range(epochs):
+        for i in epoch_iterator:
             for id_ in range(steps_per_epoch):

                 for n in range(self._discriminator_steps):
@@ -412,10 +422,25 @@ def fit(self, train_data, discrete_columns=(), epochs=None):
                 loss_g.backward()
                 optimizerG.step()

+            generator_loss = loss_g.detach().cpu()
+            discriminator_loss = loss_d.detach().cpu()
+
+            epoch_loss_df = pd.DataFrame({
+                'Epoch': [i],
+                'Generator Loss': [generator_loss],
+                'Discriminator Loss': [discriminator_loss]
+            })
+            if not self.loss_values.empty:
+                self.loss_values = pd.concat(
+                    [self.loss_values, epoch_loss_df]
+                ).reset_index(drop=True)
+            else:
+                self.loss_values = epoch_loss_df
+
             if self._verbose:
-                print(f'Epoch {i+1}, Loss G: {loss_g.detach().cpu(): .4f},' # noqa: T001
-                      f'Loss D: {loss_d.detach().cpu(): .4f}',
-                      flush=True)
+                epoch_iterator.set_description(
+                    description.format(gen=generator_loss, dis=discriminator_loss)
+                )

     @random_state
     def sample(self, n, condition_column=None, condition_value=None):
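
The bookkeeping these hunks add to `fit` is one record per epoch (epoch index starting at 0, last generator loss, last discriminator loss) plus a progress-bar description refreshed after every epoch. Below is a dependency-free sketch of that pattern. It deliberately uses plain lists instead of `pandas` and `tqdm`, and `record_losses` and its input are hypothetical stand-ins for the training loop; only the column names and the description format string are taken from the diff.

```python
DESCRIPTION = 'Gen. ({gen:.2f}) | Discrim. ({dis:.2f})'
COLUMNS = ['Epoch', 'Generator Loss', 'Discriminator Loss']


def record_losses(epoch_losses):
    """Collect one row per epoch and the description text shown for it.

    ``epoch_losses`` is a hypothetical stand-in for training: an iterable of
    (generator_loss, discriminator_loss) pairs, one pair per epoch.
    """
    rows = []
    # The bar starts with zeros, matching description.format(gen=0, dis=0).
    descriptions = [DESCRIPTION.format(gen=0, dis=0)]
    for epoch, (gen_loss, dis_loss) in enumerate(epoch_losses):
        rows.append(dict(zip(COLUMNS, [epoch, gen_loss, dis_loss])))
        descriptions.append(DESCRIPTION.format(gen=gen_loss, dis=dis_loss))
    return rows, descriptions


rows, descriptions = record_losses([(1.25, -0.5), (0.75, -0.25)])
print(rows[-1])
print(descriptions[-1])
```

In the commit itself the rows live in the `loss_values` DataFrame, which `fit` resets at the start of every call, so it only ever describes the most recent training run.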

setup.cfg

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 [bumpversion]
-current_version = 0.7.4
+current_version = 0.7.5.dev1
 commit = True
 tag = True
 parse = (?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)(\.(?P<release>[a-z]+)(?P<candidate>\d+))?
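
To see how the `parse` pattern splits the two version strings in this hunk, the regular expression can be exercised directly with Python's `re` module. This is only an illustration: `parse_version` is a hypothetical helper, and the use of `fullmatch` is an assumption made here for strictness, not a statement about how `bumpversion` applies the pattern internally.

```python
import re

# Same pattern as the `parse` option above, split across two lines.
PARSE = re.compile(
    r'(?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)'
    r'(\.(?P<release>[a-z]+)(?P<candidate>\d+))?'
)


def parse_version(version):
    """Split a version string into the named groups of the pattern."""
    match = PARSE.fullmatch(version)
    if match is None:
        raise ValueError(f'Unrecognised version: {version!r}')
    return match.groupdict()


print(parse_version('0.7.5.dev1'))
print(parse_version('0.7.4'))
```

For `0.7.5.dev1` the optional suffix yields `release='dev'` and `candidate='1'`; for `0.7.4` both of those groups are `None`.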

setup.py

Lines changed: 3 additions & 2 deletions
@@ -21,7 +21,8 @@
     "torch>=1.8.0;python_version<'3.10'",
     "torch>=1.11.0;python_version>='3.10' and python_version<'3.11'",
     "torch>=2.0.0;python_version>='3.11'",
-    'rdt>=1.3.0,<2.0',
+    'tqdm>=4.15,<5',
+    'rdt>=1.6.1,<2.0',
 ]

 setup_requires = [
@@ -118,6 +119,6 @@
     test_suite='tests',
     tests_require=tests_require,
     url='https://github.com/sdv-dev/CTGAN',
-    version='0.7.4',
+    version='0.7.5.dev1',
     zip_safe=False,
 )
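
The three `torch` requirements visible in the first hunk use environment markers so that exactly one lower bound applies per Python version. A small sketch of that selection is shown below, assuming the interpreter version is given as a `(major, minor)` tuple; `minimum_torch` is a hypothetical helper and does not evaluate real marker strings.

```python
def minimum_torch(python_version):
    """Return the torch lower bound the markers select for (major, minor)."""
    if python_version < (3, 10):
        return '1.8.0'   # torch>=1.8.0;python_version<'3.10'
    if python_version < (3, 11):
        return '1.11.0'  # python_version>='3.10' and python_version<'3.11'
    return '2.0.0'       # torch>=2.0.0;python_version>='3.11'


for version in [(3, 8), (3, 10), (3, 11)]:
    print(version, minimum_torch(version))
```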

tests/integration/synthesizer/test_ctgan.py

Lines changed: 16 additions & 50 deletions
@@ -32,6 +32,8 @@ def test_ctgan_no_categoricals():
     assert sampled.shape == (100, 1)
     assert isinstance(sampled, pd.DataFrame)
     assert set(sampled.columns) == {'continuous'}
+    assert len(ctgan.loss_values) == 1
+    assert list(ctgan.loss_values.columns) == ['Epoch', 'Generator Loss', 'Discriminator Loss']


 def test_ctgan_dataframe():
@@ -51,6 +53,8 @@ def test_ctgan_dataframe():
     assert isinstance(sampled, pd.DataFrame)
     assert set(sampled.columns) == {'continuous', 'discrete'}
     assert set(sampled['discrete'].unique()) == {'a', 'b', 'c'}
+    assert len(ctgan.loss_values) == 1
+    assert list(ctgan.loss_values.columns) == ['Epoch', 'Generator Loss', 'Discriminator Loss']


 def test_ctgan_numpy():
@@ -69,6 +73,8 @@ def test_ctgan_numpy():
     assert sampled.shape == (100, 2)
     assert isinstance(sampled, np.ndarray)
     assert set(np.unique(sampled[:, 1])) == {'a', 'b', 'c'}
+    assert len(ctgan.loss_values) == 1
+    assert list(ctgan.loss_values.columns) == ['Epoch', 'Generator Loss', 'Discriminator Loss']


 def test_log_frequency():
@@ -83,13 +89,23 @@ def test_log_frequency():
     ctgan = CTGAN(epochs=100)
     ctgan.fit(data, discrete_columns)

+    assert len(ctgan.loss_values) == 100
+    assert list(ctgan.loss_values.columns) == ['Epoch', 'Generator Loss', 'Discriminator Loss']
+    pd.testing.assert_series_equal(ctgan.loss_values['Epoch'],
+                                   pd.Series(range(100), name='Epoch'))
+
     sampled = ctgan.sample(10000)
     counts = sampled['discrete'].value_counts()
     assert counts['a'] < 6500

     ctgan = CTGAN(log_frequency=False, epochs=100)
     ctgan.fit(data, discrete_columns)

+    assert len(ctgan.loss_values) == 100
+    assert list(ctgan.loss_values.columns) == ['Epoch', 'Generator Loss', 'Discriminator Loss']
+    pd.testing.assert_series_equal(ctgan.loss_values['Epoch'],
+                                   pd.Series(range(100), name='Epoch'))
+
     sampled = ctgan.sample(10000)
     counts = sampled['discrete'].value_counts()
     assert counts['a'] > 9000
@@ -231,56 +247,6 @@ def test_fixed_random_seed():
     np.testing.assert_array_equal(sampled_0_1, sampled_1_1)


-# Below are CTGAN tests that should be implemented in the future
-def test_continuous():
-    """Test training the CTGAN synthesizer on a continuous dataset."""
-    # assert the distribution of the samples is close to the distribution of the data
-    # using kstest:
-    # - uniform (assert p-value > 0.05)
-    # - gaussian (assert p-value > 0.05)
-    # - inversely correlated (assert correlation < 0)
-    pass
-
-
-def test_categorical():
-    """Test training the CTGAN synthesizer on a categorical dataset."""
-    # assert the distribution of the samples is close to the distribution of the data
-    # using cstest:
-    # - uniform (assert p-value > 0.05)
-    # - very skewed / biased? (assert p-value > 0.05)
-    # - inversely correlated (assert correlation < 0)
-    pass
-
-
-def test_categorical_log_frequency():
-    """Test training the CTGAN synthesizer on a small categorical dataset."""
-    # assert the distribution of the samples is close to the distribution of the data
-    # using cstest:
-    # - uniform (assert p-value > 0.05)
-    # - very skewed / biased? (assert p-value > 0.05)
-    # - inversely correlated (assert correlation < 0)
-    pass
-
-
-def test_mixed():
-    """Test training the CTGAN synthesizer on a small mixed-type dataset."""
-    # assert the distribution of the samples is close to the distribution of the data
-    # using a kstest for continuous + a cstest for categorical.
-    pass
-
-
-def test_conditional():
-    """Test training the CTGAN synthesizer and sampling conditioned on a categorical."""
-    # verify that conditioning increases the likelihood of getting a sample with the specified
-    # categorical value
-    pass
-
-
-def test_batch_size_pack_size():
-    """Test that if batch size is not a multiple of pack size, it raises a sane error."""
-    pass
-
-
 def test_ctgan_save_and_load(tmpdir):
     """Test that the ``CTGAN`` model can be saved and loaded."""
     # Setup
