Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Missing sorting of features? #284

Open
rcheu-quora opened this issue Jul 2, 2020 · 1 comment
Open

Missing sorting of features? #284

rcheu-quora opened this issue Jul 2, 2020 · 1 comment

Comments

@rcheu-quora
Copy link

I was looking at reagent/preprocessing/preprocessor.py, it seems like the Preprocessor expects that the input has already been sorted according to the normalization parameters, but I believe that's not actually the case. Instead, the input is just in increasing feature idx.

One of the first lines of the forward pass of the Preprocessor is:

split_input = torch.split(input, self.split_sections, dim=1)

Which appears to expect that the input tensor has been sorted as in sorted_features.

The input to the preprocessor is generated by reagent/workflow/data_fetcher.py. Inside that file, the order is generated by:

def infer_states_names(df, multi_steps: Optional[int]):
    """ Infer possible state names from states and next state features. """
    state_keys = get_distinct_keys(df, "state_features")
    next_states_is_col_arr_map = not (multi_steps is None)
    next_state_keys = get_distinct_keys(
        df, "next_state_features", is_col_arr_map=next_states_is_col_arr_map
    )
    return sorted(set(state_keys) | set(next_state_keys))

This later is passed to make_sparse2dense(df, col_name: str, possible_keys: List) as possible_keys and used to generate the dense feature array input.

I believe either the preprocessor needs to first re-arrange the input to match the sorted feature ordering, or the sorted ordering needs to be used when generating the datasets as the possible_keys variable.

@rcheu-quora rcheu-quora changed the title Possible missing sorting of features? Missing sorting of features? Jul 2, 2020
@kaiwenw
Copy link
Contributor

kaiwenw commented Jul 10, 2020

Hi @rcheu-quora, thanks for the detailed analysis! I think you're right. Either query_data or DataLoader can handle the sorting. I'll be sure to fix this. Right now the examples work because they're generated in the same order.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants