Modellen vergelijken (Comparing models) #229

Open
gijsber opened this issue Jan 16, 2025 · 4 comments
gijsber commented Jan 16, 2025

This could live in the Ribasim-NL Model class, but Ribasim Python is perhaps the better place.

model.equals --> True/False
model.diff/compare --> list of tables that are not equal?
model.diff/compare --> objects that are not equal?

See also https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.compare.html
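For context, a minimal illustration of what the linked pandas.DataFrame.compare returns: only the cells that differ, as (column, self/other) pairs. The column names here are invented for the example, not taken from the Ribasim schema.

```python
import pandas as pd

a = pd.DataFrame({"node_id": [1, 2, 3], "flow_rate": [0.1, 0.2, 0.3]})
b = pd.DataFrame({"node_id": [1, 2, 3], "flow_rate": [0.1, 0.25, 0.3]})

# compare() drops equal cells and keeps differing ones as self/other columns
diff = a.compare(b)
print(diff)
#   flow_rate
#        self other
# 1       0.2  0.25
```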

@gijsber gijsber added enhancement New feature or request Parameterisatie labels Jan 16, 2025
@gijsber gijsber added this to the jan 2025 milestone Jan 16, 2025
@github-project-automation github-project-automation bot moved this to 🆕 New in Ribasim NL Jan 16, 2025
@gijsber gijsber moved this from 🆕 New to 📋 Backlog in Ribasim NL Jan 16, 2025
@gijsber gijsber changed the title from "general method to detect differences between models" to "model.equals and model.diff methods to detect differences between models" Jan 16, 2025
@gijsber gijsber modified the milestones: jan 2025, feb 2025 Jan 28, 2025
@visr visr changed the title from "model.equals and model.diff methods to detect differences between models" to "Modellen vergelijken" (Comparing models) Jan 30, 2025
@visr visr assigned evetion and unassigned visr Jan 30, 2025

evetion (Member) commented Feb 3, 2025

@rbruijnshkv @DanielTollenaar Can you comment here on your proposed/expected API?

def equals(modela, modelb) -> bool:  # what should this compare?

We can compare the network topology (same nodes, same links); if the topology is equal, we can print the differences in the tables (if the topology is not equal, I expect a comparison of individual node tables doesn't make sense?). Do spatial differences count, or is it topology only? The result can be quite a long list; is that wanted/expected? Do we want compare to also work on individual node tables?
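The topology-only check mentioned above could look roughly like this. It is a sketch: the Model accessors are omitted, and the column names (node_id, from_node_id, to_node_id) are an assumption based on the Ribasim node/link tables.

```python
import pandas as pd

def same_topology(node_a: pd.DataFrame, node_b: pd.DataFrame,
                  link_a: pd.DataFrame, link_b: pd.DataFrame) -> bool:
    """Compare topology only: same node ids and same (from, to) link pairs.

    Row order is ignored by comparing as sets; geometry and other
    attributes are deliberately left out of this check.
    """
    nodes_equal = set(node_a["node_id"]) == set(node_b["node_id"])
    links_equal = (
        set(zip(link_a["from_node_id"], link_a["to_node_id"]))
        == set(zip(link_b["from_node_id"], link_b["to_node_id"]))
    )
    return nodes_equal and links_equal
```

Only when this returns True would a table-by-table comparison be attempted.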

Huite commented Feb 3, 2025

My two cents:

  • equals should return a bool.
  • A list of differences seems useful; it makes sense to implement it per table.
  • Compare with https://pandas.pydata.org/docs/reference/api/pandas.testing.assert_frame_equal.html#pandas.testing.assert_frame_equal
  • assert_frame_equal has a bunch of options; analogous options are imaginable for Ribasim input.
  • What to return: printing or raising an error isn't ideal (since you're then expected to parse by eye). Returning a dataframe of differences makes sense; at that stage you're free to print it, write it to CSV, etc.
  • For a method name: maybe difference isn't too bad (see set methods). equals can call it and return True if the difference dataframe has length 0.
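A sketch of the difference/equals pairing suggested above, built on pandas.DataFrame.compare. The function names are placeholders, not an agreed API.

```python
import pandas as pd

def table_difference(df_a: pd.DataFrame, df_b: pd.DataFrame) -> pd.DataFrame:
    """Differing cells as a (column, self/other) DataFrame; empty means equal.

    compare() requires identically labelled frames, so align both frames
    first (outer join on index and columns, NaN-filled); compare() treats
    NaNs in the same location as equal.
    """
    df_a, df_b = df_a.align(df_b)
    return df_a.compare(df_b)

def tables_equal(df_a: pd.DataFrame, df_b: pd.DataFrame) -> bool:
    """equals expressed in terms of difference, as proposed above."""
    return table_difference(df_a, df_b).empty
```

The caller can then print the returned frame, write it to CSV, or inspect it programmatically.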

DanielTollenaar (Collaborator) commented
As a model developer I want to be able to efficiently check whether two models are equal; that basically covers everything in database.gpkg (and in theory the toml file, but let's skip that for now). What would be convenient:

Something that could work: result = model.equals(other_model), where result is a list (or tuple) of tables that are not equal, so e.g. ["ManningResistance.Static", "Pump.Static"] means these tables are not the same. If the list is empty I know the models are the same.

Then I can use model.manning_resistance.static.df.compare(other_model.manning_resistance.static.df) to check what has changed in the table, using the native compare of pandas (not working in GeoPandas though!): https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.compare.html

@rbruijnshkv anything to add?
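Daniël's proposal could be sketched as follows. The plain {name: DataFrame} dict representation of a model's tables is an assumption for illustration; the real Model class would expose its tables differently.

```python
import pandas as pd

def unequal_tables(tables_a: dict[str, pd.DataFrame],
                   tables_b: dict[str, pd.DataFrame]) -> list[str]:
    """Names of tables that differ between two models.

    A table counts as different when it is missing on either side or
    when DataFrame.equals is False. An empty result means the models
    (as represented here) are the same.
    """
    names = sorted(set(tables_a) | set(tables_b))
    return [
        name for name in names
        if name not in tables_a or name not in tables_b
        or not tables_a[name].equals(tables_b[name])
    ]
```

Usage would mirror the proposal: an empty list means equal, otherwise e.g. ["ManningResistance.Static", "Pump.Static"] points at the tables to inspect further.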

rbruijnshkv (Collaborator) commented
Great suggestions so far; I agree.

It would be helpful to have an option to ignore the "meta_" columns during difference comparisons. I often add additional information (sometimes temporarily) in these columns, which is later used to fill e.g. the control of connection nodes. Being able to exclude them would make it easier to identify where the model starts to deviate, which would be very helpful for debugging.
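One possible way to support this, sketched as a small helper (drop_meta is a hypothetical name; only the meta_ column prefix comes from the thread):

```python
import pandas as pd

def drop_meta(df: pd.DataFrame) -> pd.DataFrame:
    """Remove user-defined meta_* columns so they don't show up as diffs."""
    return df.drop(columns=[c for c in df.columns if c.startswith("meta_")])

a = pd.DataFrame({"flow_rate": [1.0], "meta_note": ["tmp"]})
b = pd.DataFrame({"flow_rate": [1.0], "meta_note": ["other"]})

# With meta_ columns stripped, these tables compare as equal:
assert drop_meta(a).compare(drop_meta(b)).empty
```

An `ignore_meta=True` style flag on the comparison method could apply this filter before comparing each table.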
