pandas df.query() #72

Open
andrea-bistacchi opened this issue Jun 12, 2024 · 3 comments
@andrea-bistacchi
Collaborator

Dataframe queries can be implemented as:

query_string = 'something'
df.query(query_string)

This is very effective since it is possible to compare two fields:

query_string = 'field_1 == field_2'
df.query(query_string)

a field and a string:

query_string = 'field_1 == "some string"'
df.query(query_string)

#or

string = "some string"
query_string = f'field_1 == "{string}"'
df.query(query_string)

a field and a value:

query_string = 'field_1 >= 10.3'
df.query(query_string)

#or

value = 10.3
query_string = f'field_1 == {value}'
df.query(query_string)

and to get the whole dataframe it is possible to define a string that will be true for all rows:

query_string = 'index == index'
df.query(query_string)

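(Some sources say 'ilevel_0 in ilevel_0' is more robust. One case where 'index == index' misses rows is an index containing NaN, since NaN == NaN is false.)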
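The query forms above can be put together in a small runnable sketch. The dataframe and its contents are made up for illustration, and the extra "name" column is an assumption added here so that the string comparison has a text field to work on.

```python
import pandas as pd

# Hypothetical data; field_1 and field_2 follow the placeholders used above,
# "name" is an extra column introduced only for this example.
df = pd.DataFrame(
    {
        "field_1": [10.3, 5.0, 12.0],
        "field_2": [10.3, 7.0, 12.0],
        "name": ["some string", "other", "some string"],
    }
)

# Compare two fields.
same = df.query("field_1 == field_2")

# Compare a field and a string inserted with an f-string.
string = "some string"
by_name = df.query(f'name == "{string}"')

# Compare a field and a numeric value inserted with an f-string.
value = 10.3
at_least = df.query(f"field_1 >= {value}")

# A condition that is true for every row returns the whole dataframe.
everything = df.query("index == index")
```

Each call returns a new dataframe and leaves df unchanged.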

@andrea-bistacchi
Collaborator Author

andrea-bistacchi commented Jun 12, 2024

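However, this method is prone to key errors.

(A) The "@" syntax reported in many examples does not always work. "@" refers to a local Python variable used as a value inside the expression; it does not substitute a whole query string. So the following fails, and the syntax above should be used instead: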

query_string = 'field_1 == "some string"'
df.query('@query_string')
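For comparison, here is a minimal sketch of how "@" is normally used, with a made-up one-column dataframe. The "@" prefix goes in front of a variable that holds a value, while a variable that holds the complete expression is passed to df.query() directly.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({"field_1": ["some string", "other"]})

string = "some string"

# "@string" is resolved to the value of the local variable `string`,
# so no manual quoting is needed inside the expression.
matches = df.query("field_1 == @string")

# A complete expression held in a variable is passed as is, without "@".
query_string = 'field_1 == "some string"'
same_matches = df.query(query_string)
```

Both calls select the same single row here.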

(B) Unfortunately, if a field used in the query is not present in the dataframe, pandas raises an exception instead of returning an empty dataframe. To deal with this we have three options:

  1. all columns used for common queries must be present in all collection dataframes, but (i) this can cause some redundancy, e.g. when adding the x_section column to the x section collection, and (ii) this might mean adding a lot of these redundant fields in the future, with problems of backwards compatibility of project files;

  2. handle errors where a field is not present in a dataframe with try: except: return; this is not nice but should work;

  3. find a way to obtain an empty result from a query where a field is not present in a dataframe, instead of a fatal error.
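Options 2 and 3 could be combined roughly as follows. This is only a sketch: safe_query is a hypothetical helper, not existing project code, and the two small dataframes are made up. It relies on the assumption that pandas reports an unknown field with UndefinedVariableError, which is a subclass of NameError.

```python
import pandas as pd


def safe_query(df: pd.DataFrame, query_string: str) -> pd.DataFrame:
    """Hypothetical helper: run df.query() and return an empty dataframe
    with the same columns if the query refers to a missing field."""
    try:
        return df.query(query_string)
    except (NameError, KeyError):
        # An unknown name raises UndefinedVariableError (a NameError
        # subclass); KeyError is caught as well as a precaution.
        return df.iloc[0:0]


# Made-up collections: only the first one has an x_section column.
wells = pd.DataFrame({"name": ["w1", "w2"], "x_section": ["s1", "s2"]})
sections = pd.DataFrame({"name": ["s1", "s2"]})

found = safe_query(wells, 'x_section == "s1"')
missing = safe_query(sections, 'x_section == "s1"')  # empty, no exception
```

One drawback of this approach is that a typo in a field name also produces an empty result silently instead of an error.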

Opinions?

@andrea-bistacchi andrea-bistacchi self-assigned this Jun 12, 2024
@andrea-bistacchi
Collaborator Author

It seems to work for cross sections. See the last commit in the windows_refactoring branch.

It must still be tested on other object types that are not included in the standard test projects.

@gbene
Collaborator

gbene commented Jun 13, 2024

It looks like it works fine for wells too
