This repository implements various data mining algorithms, written while following the course Foundations of Data Mining at Eindhoven University of Technology (TU/e). It also experiments with hyperparameter tuning techniques. The algorithms are as follows.

File | Algorithm |
---|---|
MPNN.py | Multiprocessing nearest-neighbor based on cosine similarity. |
MTNN.py | Multithreaded nearest-neighbor based on cosine similarity. |
NN.py | Nearest-neighbor based on cosine similarity. |
SGDDataset.py | Provides a `next_batch` method, useful for mini-batch training of neural networks. |
ada_learning_rate_nn.py | Neural network with one hidden layer and one softmax output layer, based on TensorFlow. |
challenge.py | Used to tackle task 14951 on OpenML. |
dataloader_1b.py | Loads the files in data1b/. Provided by the course professor. |
decision_tree.py | Experiments with CART and randomized trees under a set of hyperparameter settings. |
ensembles.py | Experiments with random forests. |
evaluate_NN.py | Evaluates nearest-neighbor algorithms with different distance functions, using a confusion matrix. |
k_means.py | k-means with different initialization methods, incl. first k points, uniformly sampled k points, k-means++ and the Gonzalez algorithm. |
k_medians.py | k-median clustering. |
kernel_selection.py | Support Vector Machines (SVM) with different kernels, incl. linear, RBF and polynomial kernels. |
kernel_selection2.py | Experiments with the parameters of an SVM with an RBF kernel, namely gamma and C. |
kernel_selection3.py | Grid search for an SVM with an RBF kernel, using AUC as the metric. |
landscape_analysis.py | Grid search for an SVM with an RBF kernel. Plots the heat map of AUC = f(gamma, C). |
max_margin_classifier.py | A simple example explaining support vectors and the maximal-margin linear classifier. |
mnist_dataloader.py | Loads the MNIST dataset (data1a/). Provided by the course professor. |
model_selection.py | Computes bias and variance using bootstrapping for k-nearest-neighbor (different values of k) or decision trees (different max_depth or max_leaf_nodes). |
nn_mnist.py | Neural network implemented with sklearn. |
nn_with_alpha.py | Neural network with different alphas (the L2-norm penalty), implemented with TensorFlow. |
nn_with_learning_rate.py | Neural network with different learning rates, implemented with TensorFlow. |
nn_with_momentum.py | Neural network with different momentum values, implemented with TensorFlow. |
nn_with_nodes.py | Neural network with a hidden layer and a softmax output layer, implemented with TensorFlow. |
optimization.py | Experiments with different hyperparameter tuning techniques, incl. random search and grid search (with cross-validation). |
random_forests.py | Demonstrates how random forests reduce variance without increasing bias (much), thereby reducing the classification error. |
random_projection.py | Implements random projection for dimensionality reduction. The result is compared with MPNN.py. |
roc_curves.py | Demonstrates the convex hull of many classifiers in a ROC diagram. |
tensor_flow_softmax_mnist.py | Softmax regression implemented in TensorFlow, used to practice with TensorFlow. |
unit_circles.py | Demonstrates the unit circles of different norms, incl. l1, l2, l10 and l-infinity. |
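
To illustrate the idea behind the cosine-similarity nearest-neighbor files listed above (NN.py and its parallel variants), here is a minimal pure-Python sketch. It is not the code from this repository: the function names `cosine_similarity` and `nearest_neighbor_predict`, the toy data, and the convention of returning 0 for zero vectors are all assumptions made for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumed convention: a zero vector is similar to nothing.
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_neighbor_predict(train_x, train_y, query):
    """Return the label of the training point most similar to `query`."""
    best_index = max(
        range(len(train_x)),
        key=lambda i: cosine_similarity(train_x[i], query),
    )
    return train_y[best_index]


# Hypothetical toy data: three 2-D points with one label each.
train_x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
train_y = ["a", "b", "c"]
prediction = nearest_neighbor_predict(train_x, train_y, [0.9, 0.1])
print(prediction)
```

Because cosine similarity ignores vector length, the query `[0.9, 0.1]` is matched to the training point that points in the most similar direction. A parallel variant could split the queries across processes or threads and apply the same function to each chunk.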