BEIS National Energy Efficiency Data-Framework (NEED) - Data exploration #21

Open
Yiannis20 opened this issue Nov 13, 2018 · 1 comment
@Yiannis20
Contributor

Ioannis and Joshi had a discussion with BEIS regarding the dataset based on the National Energy Efficiency Data-Framework (NEED).

The NEED team provided a sample of this dataset, and the Synthetic data team decided to explore it to investigate whether it is suitable as a test dataset for the Synthetic data generation platform developed at the Data Science Campus.

@Yiannis20 Yiannis20 self-assigned this Nov 13, 2018
@Yiannis20
Contributor Author

The dataset contains both categorical and numerical variables. The vast majority of the variables are numerical, and the categorical variables could be encoded with methods such as label encoding or one-hot encoding.
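As a rough illustration of the two encodings mentioned above, here is a minimal sketch in plain Python. The column name `property_type` and its values are hypothetical and not taken from the NEED sample, and the helper functions `label_encode` and `one_hot_encode` are written for this example only; they are not part of the Synthetic data generation software.

```python
def label_encode(values):
    """Map each distinct category to an integer code (sorted for determinism)."""
    categories = sorted(set(values))
    mapping = {cat: code for code, cat in enumerate(categories)}
    return [mapping[v] for v in values], mapping


def one_hot_encode(values):
    """Turn each value into a 0/1 vector with one position per category."""
    categories = sorted(set(values))
    index = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1  # exactly one position is set per row
        rows.append(row)
    return rows, categories


# Hypothetical categorical column; not taken from the NEED sample.
property_type = ["flat", "detached", "flat", "terraced"]

codes, mapping = label_encode(property_type)
vectors, categories = one_hot_encode(property_type)
```

Label encoding keeps a single column but implies an ordering between categories, whereas one-hot encoding avoids that ordering at the cost of one extra column per category, so the choice probably depends on how many distinct values each categorical variable has.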

Additionally, the dataset is a reasonable size, with about 50K samples and 46 variables.

Therefore, the synthetic data team decided to process the dataset with the Synthetic data generation software developed at the Data Science Campus.
