Major Refactor v1.0.0: Redesign for compare logic into modular "Comparators" #397

@fdosani

Had a look into the implementation -- the actual column comparison code (def columns_equal) seems rather inflexible and built specifically for use cases at Capital One. Here are two ideas for how to deal with the NumPy array issue:

A) Add new fixed logic for NumPy arrays: try to detect NumPy array columns by looking at the actual series values. Use .all() for NumPy arrays.

B) Add a new system for custom declaration of "comparators", i.e. give the user more flexibility to configure how columns are compared. We would ship a default configuration that mimics the current behavior, and users would be free to change the configuration to their liking. This could be as simple as giving a list of comparators that are tried in order until one of them "understands" the data, i.e. the user could pass something like:

columns_equal(..., comparators=[
    FloatComparator(rtol=1e-3),
    StringComparator(case_sensitive=False),
    ArrayComparator(aggregate="all")  # calls .all()
])

Or it could be an explicit list of comparators for each column, or something similar.
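
To make the "tried in order" idea concrete, here is a minimal sketch. It uses plain Python lists in place of pandas Series so that it stays self-contained. The `Comparator` base class, the convention that `compare` returns `None` to pass, and the fallback to plain equality are all assumptions made for illustration, not datacompy's API. Note also that `math.isclose` applies a symmetric tolerance, which differs slightly from NumPy's rule.

```python
import math
from abc import ABC, abstractmethod
from typing import Optional, Sequence


class Comparator(ABC):
    """Hypothetical base class: return per-row booleans, or None to pass."""

    @abstractmethod
    def compare(self, left: Sequence, right: Sequence) -> Optional[list]:
        ...


def _all_of(values, types) -> bool:
    # bool is a subclass of int, so exclude it explicitly.
    return all(isinstance(v, types) and not isinstance(v, bool) for v in values)


class FloatComparator(Comparator):
    def __init__(self, rtol: float = 1e-5, atol: float = 1e-8):
        self.rtol, self.atol = rtol, atol

    def compare(self, left, right):
        if not _all_of([*left, *right], (int, float)):
            return None  # not numeric: let the next comparator try
        return [
            math.isclose(a, b, rel_tol=self.rtol, abs_tol=self.atol)
            for a, b in zip(left, right)
        ]


class StringComparator(Comparator):
    def __init__(self, case_sensitive: bool = True):
        self.case_sensitive = case_sensitive

    def compare(self, left, right):
        if not _all_of([*left, *right], str):
            return None
        if self.case_sensitive:
            return [a == b for a, b in zip(left, right)]
        return [a.casefold() == b.casefold() for a, b in zip(left, right)]


class ArrayComparator(Comparator):
    def __init__(self, aggregate: str = "all"):
        self.reduce = {"all": all, "any": any}[aggregate]

    def compare(self, left, right):
        if not _all_of([*left, *right], (list, tuple)):
            return None
        # Reduce the element-wise comparison of each cell to one boolean.
        return [
            len(a) == len(b) and self.reduce(x == y for x, y in zip(a, b))
            for a, b in zip(left, right)
        ]


def columns_equal(left, right, comparators):
    """Use the first comparator that understands the data."""
    for comparator in comparators:
        result = comparator.compare(left, right)
        if result is not None:
            return result
    # Assumed fallback when no comparator claims the column.
    return [a == b for a, b in zip(left, right)]


DEFAULT_COMPARATORS = [
    FloatComparator(rtol=1e-3),
    StringComparator(case_sensitive=False),
    ArrayComparator(aggregate="all"),
]
```

With this shape, the order of the list decides precedence, and a user-defined comparator only has to implement `compare` and return `None` for data it does not handle.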

Originally posted by @jonashaag in #58

This is an old issue from a while back, but I want to revisit it and get some thoughts from folks on whether this is something we should look into and/or pursue. The idea of compartmentalizing "comparators" feels cleaner and nicer from a design perspective. It could also allow people to build their own custom comparators and tweak them to their liking.

Tagging the @capitalone/datacompy-write-team for their thoughts and opinions on this.

  • The initial set could use dispatching to house the logic for all supported data types within, say, FloatComparator.
  • Some builtins could be:
    • Numeric
    • String
    • Date/String
    • Temporal
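
One possible reading of the dispatching idea, sketched with `functools.singledispatchmethod` from the standard library. The issue does not say what would be dispatched on. This sketch assumes it is the type of the column container, and `EagerColumn` and `ChunkedColumn` are hypothetical stand-ins for real backend column types, not datacompy classes.

```python
import math
from dataclasses import dataclass
from functools import singledispatchmethod


@dataclass
class EagerColumn:
    """Hypothetical stand-in for an in-memory column."""
    values: list


@dataclass
class ChunkedColumn:
    """Hypothetical stand-in for a column stored as a list of chunks."""
    chunks: list


class FloatComparator:
    def __init__(self, rtol: float = 1e-5, atol: float = 1e-8):
        self.rtol, self.atol = rtol, atol

    def _close(self, a: float, b: float) -> bool:
        return math.isclose(a, b, rel_tol=self.rtol, abs_tol=self.atol)

    @singledispatchmethod
    def compare(self, left, right):
        # Reached only when no implementation is registered for type(left).
        raise NotImplementedError(f"unsupported column type: {type(left).__name__}")

    @compare.register(EagerColumn)
    def _(self, left, right):
        return [self._close(a, b) for a, b in zip(left.values, right.values)]

    @compare.register(ChunkedColumn)
    def _(self, left, right):
        # Flatten both sides so chunk boundaries do not affect the result.
        flat_left = [v for chunk in left.chunks for v in chunk]
        flat_right = [v for chunk in right.chunks for v in chunk]
        return [self._close(a, b) for a, b in zip(flat_left, flat_right)]
```

Dispatch here is on the type of `left` only. All float logic stays inside one comparator, and adding support for another column type means registering one more method.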

Labels

enhancement (New feature or request), help wanted (Extra attention is needed)
