fix: improvements to the doppelganger model
ricardodcpereira committed Sep 7, 2023
1 parent ce75895 commit 73001e8
Showing 10 changed files with 33,956 additions and 132 deletions.
33,601 changes: 33,601 additions & 0 deletions data/fcc_mba.csv

Large diffs are not rendered by default.

4 changes: 2 additions & 2 deletions docs/examples/doppelganger_example.md
@@ -10,11 +10,11 @@ DoppelGANger is a model that uses a Generative Adversarial Network (GAN) framewo

- 📑 **Paper:** [Using GANs for Sharing Networked Time Series Data: Challenges, Initial Promise, and Open Questions](https://dl.acm.org/doi/pdf/10.1145/3419394.3423643)

-Here’s an example of how to synthetize time-series data with DoppelGANger using the [Yahoo Stock Price](https://www.kaggle.com/datasets/arashnic/time-series-forecasting-with-yahoo-stock-price) dataset:
+Here’s an example of how to synthetize time-series data with DoppelGANger using the [Measuring Broadband America](https://www.fcc.gov/reports-research/reports/measuring-broadband-america/raw-data-measuring-broadband-america-seventh) dataset:


```python
---8<-- "examples/timeseries/stock_doppelganger.py"
+--8<-- "examples/timeseries/mba_doppelganger.py"
```


35 changes: 21 additions & 14 deletions docs/getting-started/quickstart.md
@@ -35,23 +35,30 @@ The following example showcases how to synthesize the [Yahoo Stock Price](https:
```python
# Import the necessary modules
import pandas as pd
-from ydata_synthetic.synthesizers import ModelParameters
-from ydata_synthetic.synthesizers.timeseries import TimeGAN
-from ydata_synthetic.preprocessing.timeseries.utils import real_data_loading
+from ydata_synthetic.synthesizers.timeseries import TimeSeriesSynthesizer
+from ydata_synthetic.synthesizers import ModelParameters, TrainParameters

-# Load and preprocess data
-stock_data_df = pd.read_csv("stock_data.csv")
-processed_data = real_data_loading(stock_data_df.values, seq_len=24)
-# Define model and training parameters
-gan_args = ModelParameters(batch_size=128, lr=5e-4, noise_dim=128, layers_dim=128)
-synth = TimeGAN(model_parameters=gan_args, hidden_dim=24, seq_len=24, n_seq=6, gamma=1)
+# Define model parameters
+gan_args = ModelParameters(batch_size=128,
+                           lr=5e-4,
+                           noise_dim=32,
+                           layers_dim=128,
+                           latent_dim=24,
+                           gamma=1)

-# Train the generator model
-synth.train(data=processed_data, train_steps=50000)
+train_args = TrainParameters(epochs=50000,
+                             sequence_length=24,
+                             number_sequences=6)

+# Read the data
+stock_data = pd.read_csv("stock_data.csv")

+# Training the TimeGAN synthesizer
+synth = TimeSeriesSynthesizer(modelname='timegan', model_parameters=gan_args)
+synth.fit(stock_data, train_args, num_cols=list(stock_data.columns))

-# Generate new synthetic data
-synth_data = synth.sample(len(stock_data_df))
+# Generating new synthetic samples
+synth_data = synth.sample(n_samples=500)
```
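
The old quickstart windowed the data explicitly with `seq_len=24`, while the new one passes `sequence_length=24` to `TrainParameters` and leaves the windowing to the synthesizer. As a rough illustration of what a window length of 24 means, here is a minimal pure-Python sketch. It is not the library's implementation: `sliding_windows` and `toy_rows` are hypothetical names, and the real preprocessing may also scale or shuffle the data.

```python
from typing import List, Sequence


def sliding_windows(rows: Sequence[Sequence[float]], seq_len: int) -> List[List[Sequence[float]]]:
    """Return every run of `seq_len` consecutive rows, stepping one row at a time."""
    if seq_len <= 0:
        raise ValueError("seq_len must be positive")
    # An input shorter than seq_len yields an empty list.
    return [list(rows[i:i + seq_len]) for i in range(len(rows) - seq_len + 1)]


# Toy data (an assumption for illustration): 30 time steps with 6 features each,
# mirroring number_sequences=6 in the quickstart.
toy_rows = [[float(t)] * 6 for t in range(30)]
windows = sliding_windows(toy_rows, seq_len=24)

print(len(windows))        # number of windows
print(len(windows[0]))     # rows per window
print(len(windows[0][0]))  # features per row
```

Each window is one training sequence, so a longer `sequence_length` gives the model more temporal context per example but fewer distinct windows from the same file.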

## Running the Streamlit App
63 changes: 63 additions & 0 deletions examples/timeseries/mba_doppelganger.py
@@ -0,0 +1,63 @@
"""
DoppelGANger architecture example file
"""

# Importing necessary libraries
import pandas as pd
from os import path
import matplotlib.pyplot as plt
from ydata_synthetic.synthesizers.timeseries import TimeSeriesSynthesizer
from ydata_synthetic.synthesizers import ModelParameters, TrainParameters

# Read the data
mba_data = pd.read_csv("../../data/fcc_mba.csv")
numerical_cols = ["traffic_byte_counter", "ping_loss_rate"]
categorical_cols = [col for col in mba_data.columns if col not in numerical_cols]

# Define model parameters
model_args = ModelParameters(batch_size=100,
                             lr=0.001,
                             betas=(0.2, 0.9),
                             latent_dim=20,
                             gp_lambda=2,
                             pac=1)

train_args = TrainParameters(epochs=400, sequence_length=56,
                             sample_length=8, rounds=1,
                             measurement_cols=["traffic_byte_counter", "ping_loss_rate"])

# Training the DoppelGANger synthesizer
if path.exists('doppelganger_mba'):
    model_dop_gan = TimeSeriesSynthesizer.load('doppelganger_mba')
else:
    model_dop_gan = TimeSeriesSynthesizer(modelname='doppelganger', model_parameters=model_args)
    model_dop_gan.fit(mba_data, train_args, num_cols=numerical_cols, cat_cols=categorical_cols)
    model_dop_gan.save('doppelganger_mba')

# Generate synthetic data
synth_data = model_dop_gan.sample(n_samples=600)
synth_df = pd.concat(synth_data, axis=0)

# Create a plot for each measurement column
plt.figure(figsize=(10, 6))

plt.subplot(2, 1, 1)
plt.plot(mba_data['traffic_byte_counter'].reset_index(drop=True), label='Real Traffic')
plt.plot(synth_df['traffic_byte_counter'].reset_index(drop=True), label='Synthetic Traffic', alpha=0.7)
plt.xlabel('Index')
plt.ylabel('Value')
plt.title('Traffic Comparison')
plt.legend()
plt.grid(True)

plt.subplot(2, 1, 2)
plt.plot(mba_data['ping_loss_rate'].reset_index(drop=True), label='Real Ping')
plt.plot(synth_df['ping_loss_rate'].reset_index(drop=True), label='Synthetic Ping', alpha=0.7)
plt.xlabel('Index')
plt.ylabel('Value')
plt.title('Ping Comparison')
plt.legend()
plt.grid(True)

plt.tight_layout()
plt.show()
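
The numbers in the example above appear to fit together, and a quick arithmetic check makes that visible. The sketch below is only a sanity check under stated assumptions: that the 33,601 added CSV lines are one header plus 33,600 data rows, and that `sequence_length` must be a multiple of `sample_length`, as in the DoppelGANger design, where several time steps are generated per RNN step. `doppelganger_shapes` is a hypothetical helper, not part of `ydata_synthetic`.

```python
def doppelganger_shapes(n_rows: int, sequence_length: int, sample_length: int) -> dict:
    """Derive how many sequences and RNN steps a dataset of n_rows implies."""
    if sequence_length % sample_length != 0:
        # Assumption: the model generates sample_length steps per RNN step.
        raise ValueError("sequence_length must be a multiple of sample_length")
    n_sequences, leftover_rows = divmod(n_rows, sequence_length)
    return {
        "n_sequences": n_sequences,                       # full sequences in the data
        "leftover_rows": leftover_rows,                   # rows that do not fill a sequence
        "rnn_steps": sequence_length // sample_length,    # generator steps per sequence
    }


# Assumed row count: 33,601 CSV lines minus one header line.
shapes = doppelganger_shapes(n_rows=33_600, sequence_length=56, sample_length=8)
print(shapes)
```

Under these assumptions the data holds 600 full sequences of 56 steps, which would explain `n_samples=600`: the concatenated synthetic frame then has the same number of rows as the real one, so the two curves in each subplot cover the same index range.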
35 changes: 0 additions & 35 deletions examples/timeseries/stock_doppelganger.py

This file was deleted.
