I was looking at reagent/preprocessing/preprocessor.py. It seems like the Preprocessor expects its input to have already been sorted according to the normalization parameters, but I believe that's not actually the case. Instead, the input columns are just in order of increasing feature idx.
One of the first lines of the forward pass of the Preprocessor is:
    split_input = torch.split(input, self.split_sections, dim=1)
This appears to expect that the columns of the input tensor have already been ordered as in sorted_features.
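To make the concern concrete, here is a minimal sketch in plain Python of how contiguous splitting assigns columns to the wrong group when the columns arrive in feature-id order. The feature ids, the type names, and the `split_columns` helper are all hypothetical and only mimic the contiguous chunking of `torch.split` with a list of section sizes; this is not the ReAgent code.

```python
from typing import Dict, List, Sequence


def split_columns(columns: Sequence[int], sections: Sequence[int]) -> List[List[int]]:
    """Chunk columns into contiguous groups of the given sizes,
    similar in spirit to torch.split(x, sections, dim=1)."""
    chunks: List[List[int]] = []
    start = 0
    for size in sections:
        chunks.append(list(columns[start:start + size]))
        start += size
    return chunks


# Hypothetical normalization info: feature id -> feature type.
feature_types: Dict[int, str] = {1: "CONTINUOUS", 2: "BINARY", 3: "CONTINUOUS"}
# Assumed grouping order of types inside the preprocessor.
type_order = ["BINARY", "CONTINUOUS"]

# Features grouped by type, and the size of each group.
sorted_features = [
    f for t in type_order for f in sorted(feature_types) if feature_types[f] == t
]
split_sections = [
    sum(1 for t2 in feature_types.values() if t2 == t) for t in type_order
]

# Columns as the data fetcher would emit them: increasing feature id.
by_id = sorted(feature_types)

chunks_by_id = split_columns(by_id, split_sections)
chunks_sorted = split_columns(sorted_features, split_sections)

for t, wrong, right in zip(type_order, chunks_by_id, chunks_sorted):
    print(f"{t}: got features {wrong}, expected {right}")
```

With these made-up features, the first chunk is treated as BINARY but actually holds feature 1, which is CONTINUOUS. The split only lines up when the id order and the grouped order happen to coincide.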
The input to the preprocessor is generated by reagent/workflow/data_fetcher.py. Inside that file, the order is generated by:
def infer_states_names(df, multi_steps: Optional[int]):
    """ Infer possible state names from states and next state features. """
    state_keys = get_distinct_keys(df, "state_features")
    next_states_is_col_arr_map = not (multi_steps is None)
    next_state_keys = get_distinct_keys(
        df, "next_state_features", is_col_arr_map=next_states_is_col_arr_map
    )
    return sorted(set(state_keys) | set(next_state_keys))
This is later passed to make_sparse2dense(df, col_name: str, possible_keys: List) as possible_keys and used to generate the dense feature array input.
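As a rough illustration of why the order of possible_keys matters, the sketch below densifies sparse feature maps so that column i holds the value for possible_keys[i]. The `sparse_to_dense` helper, its `missing` default, and the sample rows are assumptions for illustration and are not the actual make_sparse2dense implementation.

```python
from typing import Dict, List, Sequence


def sparse_to_dense(
    rows: Sequence[Dict[int, float]],
    possible_keys: Sequence[int],
    missing: float = 0.0,  # hypothetical fill value for absent features
) -> List[List[float]]:
    """Build one dense row per sparse map; column i corresponds to possible_keys[i]."""
    return [[row.get(key, missing) for key in possible_keys] for row in rows]


# Hypothetical sparse state features keyed by feature id.
sparse_rows = [{3: 0.5, 1: 2.0}, {2: 1.0}]

# Keys sorted by id, as infer_states_names returns them.
dense_by_id = sparse_to_dense(sparse_rows, possible_keys=[1, 2, 3])
# The same data laid out in a different (assumed) preprocessor order.
dense_grouped = sparse_to_dense(sparse_rows, possible_keys=[2, 1, 3])

print(dense_by_id)
print(dense_grouped)
```

The values are identical in both layouts; only the column positions change, which is exactly what a positional split downstream is sensitive to.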
I believe either the preprocessor needs to first re-arrange the input to match the sorted feature ordering, or the sorted ordering needs to be used when generating the datasets as the possible_keys variable.
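The first option could look roughly like the sketch below: compute, once, the column permutation that maps the possible_keys layout onto the sorted feature ordering, then apply it to each row before splitting. `reorder_indices` and `reorder_row` are hypothetical helper names, and the feature ids are made up.

```python
from typing import List, Sequence


def reorder_indices(possible_keys: Sequence[int], sorted_features: Sequence[int]) -> List[int]:
    """Return, for each feature in sorted_features, its column index in possible_keys."""
    position = {key: i for i, key in enumerate(possible_keys)}
    missing = [f for f in sorted_features if f not in position]
    if missing:
        raise KeyError(f"features absent from the input columns: {missing}")
    return [position[f] for f in sorted_features]


def reorder_row(row: Sequence[float], indices: Sequence[int]) -> List[float]:
    """Rearrange one dense row so its columns follow sorted_features."""
    return [row[i] for i in indices]


possible_keys = [1, 2, 3]    # increasing feature id, as the data fetcher emits
sorted_features = [2, 1, 3]  # assumed order expected by the preprocessor

indices = reorder_indices(possible_keys, sorted_features)
reordered = reorder_row([10.0, 20.0, 30.0], indices)
print(indices, reordered)
```

On a 2-D tensor the same permutation could presumably be applied with column indexing such as `input[:, indices]`. The second option avoids the permutation entirely by passing the sorted feature ordering as possible_keys when the dataset is generated.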
rcheu-quora changed the title from "Possible missing sorting of features?" to "Missing sorting of features?" on Jul 2, 2020
Hi @rcheu-quora, thanks for the detailed analysis! I think you're right. Either query_data or DataLoader can handle the sorting. I'll be sure to fix this. Right now the examples work because they're generated in the same order.