FS using wrapper approaches #331

Open
Mohammed-Ryiad-Eiadeh opened this issue Apr 4, 2023 · 7 comments
Labels
question General question

Comments

@Mohammed-Ryiad-Eiadeh

Greetings,

I asked this question before. I have some concerns about selecting features using approximation algorithms such as Cuckoo Search. I implemented this from scratch in a past project, but reading the data from a CSV file into a 2D array and saving each new feature subset to a new CSV file for evaluation (training and testing) is time-consuming and not professional at all. So my question is: can you remind me which classes and interfaces I need to use to integrate this with my work?

@Mohammed-Ryiad-Eiadeh Mohammed-Ryiad-Eiadeh added the question General question label Apr 4, 2023
@Craigacp
Member

Craigacp commented Apr 4, 2023

I'm not sure I understand the question. At the moment Tribuo doesn't have any implementations of feature selection wrappers. To add one you need to implement org.tribuo.FeatureSelector with the desired algorithm. The SelectedFeatureSet produced by a run of the algorithm can be saved out, and you can produce a dataset containing only the selected features by constructing a SelectedFeatureDataset.
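
To illustrate the idea of keeping only the selected features in memory rather than writing a reduced CSV for every candidate subset, here is a minimal plain-Java sketch. It does not use Tribuo: the class `FeatureMask` and its `project` method are hypothetical names introduced only for this example, and a boolean mask over columns is an assumed representation of a feature subset.

```java
import java.util.Arrays;

/** Hypothetical helper in plain Java; not part of Tribuo. */
final class FeatureMask {

    /** Returns a copy of data that keeps only the columns whose mask entry is true. */
    static double[][] project(double[][] data, boolean[] mask) {
        int kept = 0;
        for (boolean m : mask) {
            if (m) {
                kept++;
            }
        }
        double[][] out = new double[data.length][kept];
        for (int row = 0; row < data.length; row++) {
            if (data[row].length != mask.length) {
                throw new IllegalArgumentException("Row " + row + " does not match the mask length");
            }
            int col = 0;
            for (int j = 0; j < mask.length; j++) {
                if (mask[j]) {
                    out[row][col++] = data[row][j];
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] data = {{1.0, 2.0, 3.0}, {4.0, 5.0, 6.0}};
        boolean[] mask = {true, false, true};
        // prints [[1.0, 3.0], [4.0, 6.0]]
        System.out.println(Arrays.deepToString(project(data, mask)));
    }
}
```

Each candidate subset then costs one array copy instead of a file write and a re-read.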

@Mohammed-Ryiad-Eiadeh
Author

That's all I need to know for now. If I have further concerns I may reopen this issue.

@Mohammed-Ryiad-Eiadeh
Author

Dear Adam,

I implemented a wrapper FS algorithm based on Cuckoo Search, but I would like your opinion on this:

    // CuckooSearchOptimizer and TransferFunction are my own classes, not part of Tribuo.
    var data = new CSVLoader<>(new LabelFactory()).loadDataSource(Paths.get("C:\\Users\\20187\\Desktop\\o.csv"), "Class");

    // 50/50 train/test split with Tribuo's default seed.
    var dataSplitter = new TrainTestSplitter<Label>(data, 0.5, Trainer.DEFAULT_SEED);
    var trainingPart = new MutableDataset<Label>(dataSplitter.getTrain());
    var testingPart = new MutableDataset<Label>(dataSplitter.getTest());

    // The test part is passed to the constructor so each candidate solution can be scored on it.
    var opt = new CuckooSearchOptimizer(testingPart,
            TransferFunction.TransferFunction_V2,
            50,
            2,
            2,
            0.1,
            1.5,
            10);

    var sfs = opt.select(trainingPart);

This is what the algorithm looks like. My concern is about passing the test part to the constructor: I think the code could be cleaner, but a wrapper FS has to train and test each solution in the population, so it needs both the train and test portions. My suggestion is to pass the data source to the FS algorithm instead, such as:

    var data = new CSVLoader<>(new LabelFactory()).loadDataSource(Paths.get("C:\\Users\\20187\\Desktop\\o.csv"), "Class");

    // The whole data source is handed to the optimizer, which would split it itself.
    var opt = new CuckooSearchOptimizer(data,
            TransferFunction.TransferFunction_V2,
            50,
            2,
            2,
            0.1,
            1.5,
            10);

    var sfs = opt.getSelectedFeature();

With some other methods to get all the needed information.

Please tell me if there is a more appropriate solution for this.

@Craigacp
Member

I would pass the feature selection algorithm a dataset and have it split that internally, controlled by a parameter. DataSources should only be converted into Datasets; nothing should really be processing them in DataSource form.
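
A minimal sketch of that design, in plain Java rather than Tribuo code: the selector receives one dataset and a constructor parameter decides how it is split internally. The class name `HoldoutWrapper`, the `trainFraction` and `seed` parameters, and the use of row indices to stand in for examples are all assumptions made for this illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical skeleton in plain Java; the class and its parameters are not Tribuo API. */
final class HoldoutWrapper {
    private final double trainFraction; // share of the passed-in data used to fit each candidate
    private final long seed;            // fixes the shuffle so runs are repeatable

    HoldoutWrapper(double trainFraction, long seed) {
        if (trainFraction <= 0.0 || trainFraction >= 1.0) {
            throw new IllegalArgumentException("trainFraction must be in (0, 1)");
        }
        this.trainFraction = trainFraction;
        this.seed = seed;
    }

    /** Returns [trainIndices, testIndices] for a dataset with numExamples rows. */
    List<List<Integer>> split(int numExamples) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < numExamples; i++) {
            indices.add(i);
        }
        Collections.shuffle(indices, new Random(seed));
        int cut = (int) Math.round(numExamples * trainFraction);
        return List.of(List.copyOf(indices.subList(0, cut)),
                       List.copyOf(indices.subList(cut, numExamples)));
    }
}
```

With this shape the caller never passes a separate test set to the constructor; it only chooses the fraction and the seed.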

@Mohammed-Ryiad-Eiadeh
Author

Mohammed-Ryiad-Eiadeh commented Apr 20, 2023

Okay. Inside the algorithm I need to train a trainer such as KNN (a lazy algorithm) to evaluate each solution in the population, so I need both the train and test parts inside the algorithm, and I can't get that by passing only the training part. I would like to know your suggestion.

@Craigacp
Member

You should keep the test set used by the wrapper completely separate from the test set used to evaluate the final classifier. That means splitting your data into at least three chunks: a train set for the wrapper, a test set for the wrapper, and a final test set. You can also train the final classifier on the wrapper's train and test sets combined if you want, but that's not necessary. You can also do cross validation inside the wrapper, or randomly split the data each time for each feature set. Essentially all three of those options operate on whatever data you pass into the wrapper, which should be separate from your final test set.
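
The three-chunk split can be sketched as follows. This is plain Java over row indices, not Tribuo code; the class name `ThreeWaySplit` and the two fraction parameters are hypothetical and chosen only for the example.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical helper in plain Java; names and fractions are illustrative, not Tribuo API. */
final class ThreeWaySplit {
    final int[] wrapperTrain; // used to fit a model for each candidate feature subset
    final int[] wrapperTest;  // used to score each candidate feature subset
    final int[] finalTest;    // never shown to the wrapper

    ThreeWaySplit(int numExamples, double finalTestFraction, double wrapperTestFraction, long seed) {
        int[] idx = new int[numExamples];
        for (int i = 0; i < numExamples; i++) {
            idx[i] = i;
        }
        // Fisher-Yates shuffle with a fixed seed so the split is repeatable.
        Random rng = new Random(seed);
        for (int i = numExamples - 1; i > 0; i--) {
            int j = rng.nextInt(i + 1);
            int tmp = idx[i];
            idx[i] = idx[j];
            idx[j] = tmp;
        }
        // Carve off the final test set first, then split the remainder for the wrapper.
        int finalCount = (int) Math.round(numExamples * finalTestFraction);
        int wrapperCount = numExamples - finalCount;
        int wrapperTestCount = (int) Math.round(wrapperCount * wrapperTestFraction);
        finalTest = Arrays.copyOfRange(idx, 0, finalCount);
        wrapperTest = Arrays.copyOfRange(idx, finalCount, finalCount + wrapperTestCount);
        wrapperTrain = Arrays.copyOfRange(idx, finalCount + wrapperTestCount, numExamples);
    }
}
```

For example, `new ThreeWaySplit(100, 0.2, 0.25, 1L)` yields 60 wrapper-train, 20 wrapper-test and 20 final-test indices; only the first two groups would be handed to the feature selector.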

@Mohammed-Ryiad-Eiadeh
Author

Mohammed-Ryiad-Eiadeh commented Apr 20, 2023

I think 10-fold cross validation is suitable for such a task, and it solved the issue I was asking about. Now I want to add some other constructors and write some comments too. Thanks for your help. I will request that the model be added to Tribuo, and I may add more wrapper approaches for FS in the near future. The code looks like this:

    var data = new CSVLoader<>(new LabelFactory()).loadDataSource(Paths.get("C:\\Users\\20187\\Desktop\\o.csv"), "Class");

    var dataSplitter = new TrainTestSplitter<Label>(data, 0.5, Trainer.DEFAULT_SEED);
    var trainingPart = new MutableDataset<Label>(dataSplitter.getTrain());
    var testingPart = new MutableDataset<Label>(dataSplitter.getTest());

    // The optimizer no longer takes a test set; it cross-validates on the data passed to select().
    var opt = new CuckooSearchOptimizer(TransferFunction.TransferFunction_V2,
            50,
            2,
            2,
            0.1,
            1.5,
            20);

    var sfs = opt.select(trainingPart);
    System.out.println(sfs.featureNames().size());
    // Dataset view containing only the selected features.
    var sfds = new SelectedFeatureDataset<>(trainingPart, sfs);
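
For readers following along, the cross-validated scoring of one candidate feature subset can be sketched in plain Java. This is not the CuckooSearchOptimizer from this thread and not Tribuo code: the class `CrossValidatedFitness`, the boolean-mask representation, and the choice of a 1-nearest-neighbour classifier with squared Euclidean distance are assumptions made for the illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical wrapper fitness function in plain Java; not Tribuo API. */
final class CrossValidatedFitness {

    /**
     * Pooled k-fold accuracy of a 1-nearest-neighbour classifier that only
     * looks at the features whose mask entry is true.
     */
    static double accuracy(double[][] x, int[] y, boolean[] mask, int folds, long seed) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < x.length; i++) {
            order.add(i);
        }
        Collections.shuffle(order, new Random(seed));

        int correct = 0;
        for (int p = 0; p < order.size(); p++) {
            int fold = p % folds; // shuffled position p is held out in this fold
            int testIdx = order.get(p);
            int nearest = -1;
            double nearestDist = Double.POSITIVE_INFINITY;
            for (int q = 0; q < order.size(); q++) {
                if (q % folds == fold) {
                    continue; // same fold as the test example, so not available for training
                }
                int trainIdx = order.get(q);
                double d = squaredDistance(x[testIdx], x[trainIdx], mask);
                if (d < nearestDist) {
                    nearestDist = d;
                    nearest = trainIdx;
                }
            }
            if (nearest >= 0 && y[nearest] == y[testIdx]) {
                correct++;
            }
        }
        return (double) correct / x.length;
    }

    /** Squared Euclidean distance restricted to the masked features. */
    private static double squaredDistance(double[] a, double[] b, boolean[] mask) {
        double sum = 0.0;
        for (int j = 0; j < mask.length; j++) {
            if (mask[j]) {
                double diff = a[j] - b[j];
                sum += diff * diff;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        // Feature 0 separates the classes; feature 1 is misleading.
        double[][] x = {{0.0, 0}, {0.1, 10}, {0.2, 20}, {0.3, 30},
                        {5.0, 1}, {5.1, 9}, {5.2, 21}, {5.3, 29}};
        int[] y = {0, 0, 0, 0, 1, 1, 1, 1};
        System.out.println(accuracy(x, y, new boolean[]{true, false}, 4, 1L));
        System.out.println(accuracy(x, y, new boolean[]{false, true}, 4, 1L));
    }
}
```

A search algorithm would call `accuracy` once per candidate mask and keep the masks that score best. All of this runs on the data given to the wrapper, so a separate final test set stays untouched.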
