Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FDataGrid.copy argument sample _names default behaviour #588

Open
ego-thales opened this issue Oct 24, 2023 · 2 comments
Open

FDataGrid.copy argument sample _names default behaviour #588

ego-thales opened this issue Oct 24, 2023 · 2 comments

Comments

@ego-thales
Copy link
Contributor

Hello,

I sometimes use a "skeleton" for a FDataGrid instance, containing no samples but lots of information (grid points, domain_range, interpolation, etc...).

I would like, given a new data_matrix (generated from another process), to create an instance following the skeleton, with:

fd = skeleton.copy(data_matrix=data_matrix)

but the problem with that, is that copy isn't happy with the fact that I don't give sample names.

I can easily fix this with

fd = skeleton.copy(data_matrix=data_matrix,
                   sample_names=skeleton.sample_names + (None,) * len(data_matrix))

but I think the default behaviour of copy might be changed for a smoother use. What do you think?
Another possibility could be to turn to concatenate and allow it to be used with a data_matrix argument and not only FDataGrid instances.

@vnmabus
Copy link
Member

vnmabus commented Oct 24, 2023

I wanted since a long time to rework the classes used to represent functions, in order to make them more general and move them to a separate package. Thus, some of the design choices taken for scikit-fda will be revisited.

In particular, I agree that metadata attributes, such as sample_names complicate a lot of things. On one hand, they are interesting for plotting and exploratory analysis. On the other hand, they work really bad with operations. For example, when taking the mean of a functional dataset, the sample_names need to be removed. This is not a new problem. Other libraries, such as xarray or scipp that deal with arrays including labels or units need to do something about it too. We should check what they do.

As for the copy method itself, I am unsure of the best way to deal with it. Right now it is more of an "utility" method, used for creating identical or almost identical copies of a FData object, that is why by default it tries to preseve everything. However I agree that this is problematic, and can create subtle bugs if one forgets about that behavior. I would love to have your feedback on how to better implement it.

@eliegoudout
Copy link
Contributor

Other libraries, such as xarray or scipp that deal with arrays including labels or units need to do something about it too. We should check what they do.

I personally don't know these packages at all, so I would not be able to give you any insight at the moment.

As for the copy method itself, I am unsure of the best way to deal with it. [...] I would love to have your feedback on how to better implement it.

I don't have any strong opinion, but my main observation would be that it is currently impossible to set any argument to attr=None, because it is then treated as self.attr. I think this should be fixed, while retaining default behaviour.

skeleton.copy(data_matrix=data_matrix)  # Raises error if data_matrix has different number of samples
skeleton.copy(data_matrix=data_matrix, sample_names=None)  # Actively deleting sample names

This way, no sample name is silently deleted, but if the user encounters this error, she/he can fix it more easily.

As for implementation detail, I don't know what the best practise would be. Creating a DEF object to replace the default None values?

def copy(self: T, *, data_matrix=DEF, grid_points=DEF, [...]) -> T

There might be more pythonic ways I don't know of.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants