
Conversation

@toncho11 (Collaborator) commented Nov 10, 2025

A rewritten _inc_exc_datasets() that fixes several issues and adds much-needed checks. Previously, the input could fail to be recognized correctly as a string or a dataset object, and was then processed incorrectly. Fixes: #654. It also removes some of the confusion caused by the old version.
Might help with: #659
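For illustration, the kind of validation described above can be sketched as follows. This is a hypothetical standalone sketch, not MOABB's actual implementation: the function name `check_inc_exc` and its parameters are made up here, and it only mirrors the checks this PR discusses (rejecting simultaneous include/exclude, mixed string/object lists, and unknown dataset codes).

```python
def check_inc_exc(include_datasets=None, exclude_datasets=None, valid_codes=()):
    """Validate benchmark dataset filters (illustrative sketch only).

    Rejects: both arguments given at once, lists mixing strings and
    dataset objects, and string codes not present in valid_codes.
    """
    if include_datasets is not None and exclude_datasets is not None:
        raise ValueError("Cannot specify both include_datasets and exclude_datasets.")
    datasets = include_datasets if include_datasets is not None else exclude_datasets
    if datasets is None:
        return []  # nothing to filter; the benchmark uses all datasets

    all_str = all(isinstance(d, str) for d in datasets)
    all_obj = all(not isinstance(d, str) for d in datasets)
    if not (all_str or all_obj):
        # Mixing "BNCI2014-001" with Zhou2016() in one list is ambiguous
        raise ValueError("Do not mix dataset code strings and dataset objects.")

    if all_str:
        bad = [d for d in datasets if d not in valid_codes]
        if bad:
            raise ValueError(f"Invalid dataset codes: {bad}")
    return list(datasets)
```

With a list of valid codes, valid string input passes through, while mixed or unknown input raises early instead of silently filtering nothing.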

Below is the code I used for testing:

import os

from moabb import benchmark, set_download_dir, set_log_level
from moabb.pipelines.classification import SSVEP_CCA  # used by the commented-out pipeline below
from pyriemann.classification import MDM
from pyriemann.estimation import Covariances
from sklearn.pipeline import make_pipeline

# P300 databases
from moabb.datasets import (
    BI2013a,
    BNCI2014_008,
    BNCI2014_009,
    BNCI2015_003,
    EPFLP300,
    Lee2019_ERP,
    BI2014a,
    BI2014b,
    BI2015a,
    BI2015b,
)

# Motor imagery databases
from moabb.datasets import (
    BNCI2014_001,
    Zhou2016,
    BNCI2015_001,
    BNCI2014_002,
    BNCI2014_004,
    # BNCI2015_004,  # not tested
    AlexMI,
    Weibo2014,
    Cho2017,
    GrosseWentrup2009,
    PhysionetMI,
    Shin2017A,
    Lee2019_MI,  # new
    Schirrmeister2017,  # new
)

pipelines = [
    # {
    #     "name": "SSVEP_CCA",
    #     "pipeline": SSVEP_CCA(
    #         n_harmonics=3,
    #         interval=[1, 3],
    #         freqs={"13": 0, "17": 1},
    #     ),
    #     "paradigms": ["SSVEP"],
    # },
    {
        "name": "MDM",
        "pipeline": make_pipeline(
            Covariances(estimator="oas"),   # Estimate covariance matrices
            MDM(metric="riemann")           # Riemannian Minimum Distance to Mean classifier
        ),
        "paradigms": ["LeftRightImagery", "MotorImagery", "P300"],
        #"paradigms": ["P300"],  
        #"paradigms": ["SSVEP"],
    }
]

results = benchmark(
    pipelines=pipelines,
    evaluations=["WithinSession"],
    # include_datasets=["Kalunga2016", "Nakanishi2015"],  # should fail
    # include_datasets=["Nakanishi2015"],  # should fail
    # include_datasets=["fsdfsdfsdfs"],
    # exclude_datasets=[EPFLP300()],  # should be OK with 2 warnings
    # include_datasets=[Lee2019_ERP(), BI2015b()],  # should be OK, with 2 warnings
    # exclude_datasets=["Stieger2021", "fsdfsdfs"],  # gives warning
    # exclude_datasets=["Stieger2021", "Liu2024"],  # must be OK
    # include_datasets=["BNCI2014-001"],  # must be OK
    # exclude_datasets=[Zhou2016(), Weibo2014()],  # must be OK
    # include_datasets=[Zhou2016(), Weibo2014()],  # must be OK
    # include_datasets=[PhysionetMI(), Shin2017A(), Lee2019_MI()],  # must be OK
    # include_datasets=[PhysionetMI(), Shin2017A(), "BNCI2014-001"],  # should fail
    # exclude_datasets=[PhysionetMI(), Shin2017A(), "BNCI2014-001"],  # should fail
    # include_datasets=[Lee2019_ERP(), "fsdfsdfdwwww"],  # should fail
    # include_datasets=["fsdfsdfdwwww", "dasdasd"],  # should fail
    # exclude_datasets=None, include_datasets=None,  # should be OK

    # include_datasets=["Kalunga2016"],
    results="./results/",
    overwrite=True,
    plot=True,
    n_jobs=1,  # keep low to limit memory use; 4 is a good value if enough RAM is available
    output="./benchmark/",
)

print("Results:")
print(results.to_string())

print("Averaging the session performance:")
print(results.groupby("pipeline").mean("score")[["score", "time"]])

# save results
save_path = os.path.join(
    os.path.dirname(os.path.realpath(__file__)), "results_dataframe_test_SSVEP.csv"
)
results.to_csv(save_path, index=True)

print(results.groupby(["dataset", "pipeline"]).mean("score")[["score", "time"]].to_string())

@bruAristimunha (Collaborator)

Hey @toncho11,

Can you fix the tests, please:

FAILED moabb/tests/test_benchmark.py::TestBenchmark::test_benchmark_strdataset - ValueError: Invalid dataset codes in include_datasets: ['FakeDataset-p300-10-2--60-60--120-120--target-nontarget--c3-cz-c4', 'FakeDataset-ssvep-10-2--60-60--120-120--13-15--c3-cz-c4', 'FakeDataset-cvep-10-2--60-60--120-120--10-00--c3-cz-c4']
FAILED moabb/tests/test_benchmark.py::TestBenchmark::test_benchmark_objdataset - ValueError: Some datasets in include_datasets are not part of available datasets for the paradigms you requested in benchmark(): ['FakeDataset-p300-10-2--60-60--120-120--target-nontarget--c3-cz-c4', 'FakeDataset-ssvep-10-2--60-60--120-120--13-15--c3-cz-c4', 'FakeDataset-cvep-10-2--60-60--120-120--10-00--c3-cz-c4']
FAILED moabb/tests/test_benchmark.py::TestBenchmark::test_include_exclude - ValueError: Cannot specify both include_datasets and exclude_datasets.
===== 3 failed, 301 passed, 90 skipped, 207 war

@toncho11 toncho11 marked this pull request as draft November 13, 2025 10:33
@toncho11 (Collaborator, Author)
I need some more time on the code.

@gcattan (Collaborator) left a comment
Thanks @toncho11!
I think the issue is that you cannot rely on real datasets in CI/CD.

Your code remains unchanged. Let me know if you are ok with the edits.

@bruAristimunha (Collaborator)

Exactly, we don't have a compute instance to send jobs to, the way MNE and scikit-learn do with an Azure cluster. It's still too expensive for us.

@toncho11 toncho11 marked this pull request as ready for review November 18, 2025 16:27
@toncho11 (Collaborator, Author) commented Nov 18, 2025

Ready for merge. I added a few adjustments and improvements.
With this code there are many checks, so many future problems should be avoided.
With this PR and the latest MOABB code, #659 seems to be fixed.

@gcattan gcattan merged commit 9cabb89 into NeuroTechX:develop Nov 18, 2025
18 checks passed
@gcattan (Collaborator) commented Nov 18, 2025

Thank you for this nice contribution @toncho11 !



Successfully merging this pull request may close these issues.

Problem with exclude datasets in benchmark
