[Bug]: forced pandas conversion of features array in preprocessing #1378

Open · Lopa10ko opened this issue Mar 14, 2025 · 0 comments · May be fixed by #1343
Current Behavior

The type of the data features array is inconsistent during obligatory preprocessing: when memory-reduction operations are performed, the features are forcibly converted from np.ndarray to pd.DataFrame.

https://github.com/aimclub/FEDOT/blame/e15c0bfef449d100872028a0d24fd43e43e5b67d/fedot/preprocessing/preprocessing.py#L563-L579
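
A minimal sketch of the symptom (self-contained illustration; reduce_mem_usage_sketch is a hypothetical stand-in for FEDOT's reduce_mem_usage, which builds and returns a pd.DataFrame regardless of the input container):

import numpy as np
import pandas as pd

def reduce_mem_usage_sketch(features, col_type_ids=None):
    # hypothetical stand-in: downcast float columns, but always hand back a DataFrame
    df = pd.DataFrame(features)
    for col in df.columns:
        if pd.api.types.is_float_dtype(df[col]):
            df[col] = df[col].astype(np.float32)
    return df

features = np.random.rand(10, 3)             # features enter as np.ndarray
features = reduce_mem_usage_sketch(features)
print(type(features))                        # <class 'pandas.core.frame.DataFrame'>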

Possible Solution

@copy_doc(BasePreprocessor.reduce_memory_size)
def reduce_memory_size(self, data: InputData) -> InputData:
    if isinstance(data, InputData):
        if data.task.task_type == TaskTypesEnum.ts_forecasting:
            # TODO: TS data has col_type_ids['features'] = None.
            #  Support for that case is needed before memory can be reduced for TS data as well.
            pass
        elif data.data_type == DataTypesEnum.table:
            self.log.debug('-- Reduce memory in features')
            # Remember the original container type of the features array...
            was_features_in_numpy = isinstance(data.features, np.ndarray)
            data.features = reduce_mem_usage(data.features, data.supplementary_data.col_type_ids['features'])
            # ...and restore it, since reduce_mem_usage returns a pd.DataFrame
            data.features = data.features.to_numpy() if was_features_in_numpy else data.features

            if data.target is not None:
                self.log.debug('-- Reduce memory in target')
                data.target = reduce_mem_usage(data.target, data.supplementary_data.col_type_ids['target'])
                data.target = data.target.to_numpy()

    return data
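
For illustration, the remember-and-restore pattern in isolation (a self-contained sketch; downcast is a hypothetical stand-in for reduce_mem_usage):

import numpy as np
import pandas as pd

def downcast(features):
    # hypothetical stand-in for reduce_mem_usage: always returns a DataFrame
    return pd.DataFrame(features).astype(np.float32)

def reduce_preserving_type(features):
    # remember whether the caller passed a numpy array...
    was_numpy = isinstance(features, np.ndarray)
    reduced = downcast(features)
    # ...and restore that container type afterwards
    return reduced.to_numpy() if was_numpy else reduced

assert isinstance(reduce_preserving_type(np.zeros((2, 2))), np.ndarray)
assert isinstance(reduce_preserving_type(pd.DataFrame(np.zeros((2, 2)))), pd.DataFrame)

This keeps the change a minimal round-trip guard rather than a redesign: callers that rely on InputData.features staying an np.ndarray are unaffected, while pandas inputs keep their DataFrame form.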

Steps to Reproduce

Reproduced via the Fedot.Industrial test suite:
from fedot.core.pipelines.pipeline_builder import PipelineBuilder
from fedot.core.pipelines.tuning.tuner_builder import TunerBuilder
from golem.core.tuning.sequential import SequentialTuner

from fedot_ind.api.utils.checkers_collections import ApiConfigCheck
from fedot_ind.core.operation.dummy.dummy_operation import init_input_data
from fedot_ind.core.repository.config_repository import DEFAULT_CLF_API_CONFIG
from fedot_ind.core.repository.initializer_industrial_models import IndustrialModels
from fedot_ind.tools.loader import DataLoader

def initialize_uni_data():
    train_data, test_data = DataLoader('Lightning7').load_data()
    train_input_data = init_input_data(train_data[0], train_data[1])
    test_input_data = init_input_data(test_data[0], test_data[1])
    return train_input_data, test_input_data

def test_tuner_industrial_uni_series():
    IndustrialModels().setup_repository()
    train_data, test_data = initialize_uni_data()
    # search_space = SearchSpace(get_industrial_search_space(1))
    pipeline_builder = PipelineBuilder()
    pipeline_builder.add_node('eigen_basis')
    pipeline_builder.add_node('quantile_extractor')
    pipeline_builder.add_node('rf')

    pipeline = pipeline_builder.build()

    pipeline_tuner = TunerBuilder(train_data.task) \
        .with_tuner(SequentialTuner) \
        .with_timeout(2) \
        .with_iterations(2) \
        .build(train_data)

    pipeline = pipeline_tuner.tune(pipeline)

    pipeline.fit(train_data)
    pipeline.predict(test_data)
Lopa10ko added the bug label on Mar 14, 2025
Lopa10ko self-assigned this on Mar 14, 2025