[Bug]: forced pandas conversion of features array in preprocessing #1378

Open · Lopa10ko opened this issue Mar 14, 2025 · 0 comments · May be fixed by #1343
Current Behavior

The type of the data features array is inconsistent during obligatory preprocessing: when memory-reduction operations are performed, the features are forcibly converted from np.ndarray to pd.DataFrame.

https://github.com/aimclub/FEDOT/blame/e15c0bfef449d100872028a0d24fd43e43e5b67d/fedot/preprocessing/preprocessing.py#L563-L579
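
A minimal sketch of the symptom (self-contained illustration; reduce_mem_usage_sketch is a hypothetical stand-in for FEDOT's reduce_mem_usage, which builds and returns a pd.DataFrame regardless of the input container):

import numpy as np
import pandas as pd

def reduce_mem_usage_sketch(features, col_type_ids=None):
    # hypothetical stand-in: downcast float columns, but always hand back a DataFrame
    df = pd.DataFrame(features)
    for col in df.columns:
        if pd.api.types.is_float_dtype(df[col]):
            df[col] = df[col].astype(np.float32)
    return df

features = np.random.rand(10, 3)             # features enter as np.ndarray
features = reduce_mem_usage_sketch(features)
print(type(features))                        # <class 'pandas.core.frame.DataFrame'>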

Possible Solution

@copy_doc(BasePreprocessor.reduce_memory_size)
def reduce_memory_size(self, data: InputData) -> InputData:
    if isinstance(data, InputData):
        if data.task.task_type == TaskTypesEnum.ts_forecasting:
            # TODO: TS data has col_type_ids['features'] = None.
            #  Support for that case is needed before memory can be reduced for TS data as well.
            pass
        elif data.data_type == DataTypesEnum.table:
            self.log.debug('-- Reduce memory in features')
            # Remember the original container type of the features array...
            was_features_in_numpy = isinstance(data.features, np.ndarray)
            data.features = reduce_mem_usage(data.features, data.supplementary_data.col_type_ids['features'])
            # ...and restore it, since reduce_mem_usage returns a pd.DataFrame
            data.features = data.features.to_numpy() if was_features_in_numpy else data.features

            if data.target is not None:
                self.log.debug('-- Reduce memory in target')
                data.target = reduce_mem_usage(data.target, data.supplementary_data.col_type_ids['target'])
                data.target = data.target.to_numpy()

    return data
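
For illustration, the remember-and-restore pattern in isolation (a self-contained sketch; downcast is a hypothetical stand-in for reduce_mem_usage):

import numpy as np
import pandas as pd

def downcast(features):
    # hypothetical stand-in for reduce_mem_usage: always returns a DataFrame
    return pd.DataFrame(features).astype(np.float32)

def reduce_preserving_type(features):
    # remember whether the caller passed a numpy array...
    was_numpy = isinstance(features, np.ndarray)
    reduced = downcast(features)
    # ...and restore that container type afterwards
    return reduced.to_numpy() if was_numpy else reduced

assert isinstance(reduce_preserving_type(np.zeros((2, 2))), np.ndarray)
assert isinstance(reduce_preserving_type(pd.DataFrame(np.zeros((2, 2)))), pd.DataFrame)

This keeps the change a minimal round-trip guard rather than a redesign: callers that rely on InputData.features staying an np.ndarray are unaffected, while pandas inputs keep their DataFrame form.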

Steps to Reproduce

Reproduced via the Fedot.Industrial test suite:
from fedot.core.pipelines.pipeline_builder import PipelineBuilder
from fedot.core.pipelines.tuning.tuner_builder import TunerBuilder
from golem.core.tuning.sequential import SequentialTuner

from fedot_ind.api.utils.checkers_collections import ApiConfigCheck
from fedot_ind.core.operation.dummy.dummy_operation import init_input_data
from fedot_ind.core.repository.config_repository import DEFAULT_CLF_API_CONFIG
from fedot_ind.core.repository.initializer_industrial_models import IndustrialModels
from fedot_ind.tools.loader import DataLoader

def initialize_uni_data():
    train_data, test_data = DataLoader('Lightning7').load_data()
    train_input_data = init_input_data(train_data[0], train_data[1])
    test_input_data = init_input_data(test_data[0], test_data[1])
    return train_input_data, test_input_data

def test_tuner_industrial_uni_series():
    IndustrialModels().setup_repository()
    train_data, test_data = initialize_uni_data()
    # search_space = SearchSpace(get_industrial_search_space(1))
    pipeline_builder = PipelineBuilder()
    pipeline_builder.add_node('eigen_basis')
    pipeline_builder.add_node('quantile_extractor')
    pipeline_builder.add_node('rf')

    pipeline = pipeline_builder.build()

    pipeline_tuner = TunerBuilder(train_data.task) \
        .with_tuner(SequentialTuner) \
        .with_timeout(2) \
        .with_iterations(2) \
        .build(train_data)

    pipeline = pipeline_tuner.tune(pipeline)

    pipeline.fit(train_data)
    pipeline.predict(test_data)
Lopa10ko added the bug label on Mar 14, 2025
Lopa10ko self-assigned this on Mar 14, 2025