
Commit 01a8633

committed
Update api-docs.txt
1 parent 85e7eb1 commit 01a8633

1 file changed: +62 -2 lines changed

pointblank/data/api-docs.txt

Lines changed: 62 additions & 2 deletions
@@ -5919,6 +5919,66 @@ tbl_match(self, tbl_compare: 'FrameT | Any', pre: 'Callable | None' = None, thre
 Aside from reporting failure conditions, thresholds can be used to determine the actions to
 take for each level of failure (using the `actions=` parameter).

+Cross-Backend Validation
+------------------------
+The `tbl_match()` method supports **automatic backend coercion** when comparing tables from
+different backends (e.g., comparing a Polars DataFrame against a Pandas DataFrame, or
+comparing database tables from DuckDB/SQLite against in-memory DataFrames). When tables with
+different backends are detected, the comparison table is automatically converted to match the
+data table's backend before validation proceeds.
+
+**Certified Backend Combinations:**
+
+All combinations of the following backends have been tested and certified to work (in both
+directions):
+
+- Pandas DataFrame
+- Polars DataFrame
+- DuckDB (native)
+- DuckDB (as Ibis table)
+- SQLite (via Ibis)
+
+Note that database backends (DuckDB, SQLite, PostgreSQL, MySQL, Snowflake, BigQuery) are
+automatically materialized during validation:
+
+- if comparing **against Polars**: materialized to Polars
+- if comparing **against Pandas**: materialized to Pandas
+- if **both tables are database backends**: both materialized to Polars
+
+Materializing both tables into a single backend keeps the comparison fast and type-consistent.
+
+**Data Types That Work Best in Cross-Backend Validation:**
+
+- numeric types: int, float columns (including proper NaN handling)
+- string types: text columns with consistent encodings
+- boolean types: True/False values
+- null values: `None` and `NaN` are treated as equivalent across backends
+- list columns: nested list structures (with basic types)
+
+**Known Limitations:**
+
+While many data types work well in cross-backend validation, there are some known
+limitations to be aware of:
+
+- date/datetime types: When converting between Polars and Pandas, dates may be
+represented differently. For example, a Polars date column converted to Pandas may hold
+`pd.Timestamp` values while the other table holds `datetime.date` objects, leading to false
+mismatches. To work around this, use the same datetime representation in both tables.
+- custom types: User-defined types or complex nested structures may not convert cleanly
+between backends and could cause unexpected comparison failures.
+- categorical types: Categorical/factor columns may have different internal
+representations across backends.
+- timezone-aware datetimes: Timezone handling differs between backends and may cause
+comparison issues.
+
+Here are some ideas for working around these limitations:
+
+- for date/datetime columns, consider using `pre=` preprocessing to normalize representations
+before comparison.
+- when working with custom types, manually convert tables to the same backend before using
+`tbl_match()`.
+- use the same datetime precision (e.g., milliseconds or microseconds) in both tables.
+
 Examples
 --------
 For the examples here, we'll create two simple tables to demonstrate the `tbl_match()`
@@ -5980,8 +6040,8 @@ tbl_match(self, tbl_compare: 'FrameT | Any', pre: 'Callable | None' = None, thre
 validation
 ```

-The validation table shows that the test unit failed because the tables don't match (one
-value is different in column `c`).
+The validation table shows that the single test unit failed because the tables don't match
+(one value is different in column `c`).


 conjointly(self, *exprs: 'Callable', pre: 'Callable | None' = None, thresholds: 'int | float | bool | tuple | dict | Thresholds' = None, actions: 'Actions | None' = None, brief: 'str | bool | None' = None, active: 'bool' = True) -> 'Validate'
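The materialization rules in the Cross-Backend Validation section added by this commit can be read as a small decision table. Below is a minimal sketch of one reading of those rules. It assumes backends are identified by plain strings. `materialization_target()`, `IN_MEMORY`, and `DATABASE` are hypothetical names used only for illustration. They are not part of pointblank's API, and the library's actual logic may differ.

```python
# Hypothetical sketch of the materialization rules described in the docs;
# NOT pointblank's implementation, just one reading of the stated rules.
IN_MEMORY = {"pandas", "polars"}
DATABASE = {"duckdb", "sqlite", "postgresql", "mysql", "snowflake", "bigquery"}


def materialization_target(data_backend: str, compare_backend: str) -> str:
    """Return the in-memory backend both tables end up in before comparison."""
    for backend in (data_backend, compare_backend):
        if backend not in IN_MEMORY | DATABASE:
            raise ValueError(f"unknown backend: {backend!r}")

    # Both tables are database backends: both are materialized to Polars.
    if data_backend in DATABASE and compare_backend in DATABASE:
        return "polars"
    # The data table is in memory: the comparison table is converted to its backend.
    if data_backend in IN_MEMORY:
        return data_backend
    # A database data table compared against an in-memory table is
    # materialized to that in-memory backend (Polars or Pandas).
    return compare_backend


if __name__ == "__main__":
    print(materialization_target("polars", "pandas"))  # polars
    print(materialization_target("pandas", "duckdb"))  # pandas
    print(materialization_target("duckdb", "sqlite"))  # polars
```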
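The same section states that `None` and `NaN` are treated as equivalent across backends. In Python, `NaN == NaN` evaluates to `False`, so a cell-by-cell comparison has to handle missing values explicitly. The sketch below shows that behavior on plain Python lists. `is_null()`, `cells_equal()`, and `columns_match()` are hypothetical helpers written for this illustration; they are not pointblank functions.

```python
import math
from typing import Any, Sequence


def is_null(value: Any) -> bool:
    """Treat both None and float NaN as a missing value."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def cells_equal(a: Any, b: Any) -> bool:
    """Compare two cells, considering None and NaN equivalent."""
    if is_null(a) or is_null(b):
        # Equal only if both sides are missing (in either representation).
        return is_null(a) and is_null(b)
    return a == b


def columns_match(left: Sequence[Any], right: Sequence[Any]) -> bool:
    """Return True if two columns have the same length and pairwise-equal cells."""
    return len(left) == len(right) and all(
        cells_equal(a, b) for a, b in zip(left, right)
    )


if __name__ == "__main__":
    # A NaN-based float column vs. a column that uses None for missing values.
    print(columns_match([1.0, float("nan"), 3.0], [1.0, None, 3.0]))  # True
    print(columns_match([1.0, float("nan")], [1.0, 2.0]))  # False
```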
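For the date/datetime limitation, the suggested workaround is to normalize representations before comparison. The underlying mismatch is easy to see in plain Python: a `datetime.datetime` never compares equal to a `datetime.date`, even at midnight on the same day. The sketch below normalizes a dict-of-lists table with a `normalize_dates()` helper; both the table shape and the helper are hypothetical. With real tables, the same step would be written with the DataFrame library in use, for example inside a `pre=` callable as the docs suggest.

```python
import datetime as dt
from typing import Any, Dict, List

# Hypothetical table representation: column name -> list of values.
Table = Dict[str, List[Any]]


def normalize_dates(table: Table) -> Table:
    """Return a copy where every datetime value is reduced to a plain date.

    Only appropriate for columns meant to hold calendar dates: any
    time-of-day component is dropped, and timezones are not handled.
    """

    def to_date(value: Any) -> Any:
        # datetime is a subclass of date; only datetime values are converted,
        # while plain date objects and non-date values pass through unchanged.
        if isinstance(value, dt.datetime):
            return value.date()
        return value

    return {name: [to_date(v) for v in values] for name, values in table.items()}


if __name__ == "__main__":
    left = {"d": [dt.date(2024, 1, 1), dt.date(2024, 1, 2)]}
    right = {"d": [dt.datetime(2024, 1, 1), dt.datetime(2024, 1, 2)]}

    print(left == right)  # False: a datetime never equals a date
    print(normalize_dates(left) == normalize_dates(right))  # True
```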
