Column lengths are not equal from polars when reading parquet #14

@SteampunkIslande

Using a real-world VCF, I ran into this issue when reading the resulting parquet file with polars:

import pyvcf2parquet as pv
import polars as pl
pv.convert_vcf("realworld_data.vcf.gz","test.parquet")

In [5]: pl.scan_parquet("test.parquet").head().collect()
Out[5]: thread 'ipython' panicked at crates/polars-core/src/fmt.rs:513:13:
The column lengths in the DataFrame are not equal.
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
---------------------------------------------------------------------------
PanicException                            Traceback (most recent call last)
<ipython-input-5-5f2079c139cc> in ?()
----> 1 pl.scan_parquet("test.parquet").head().collect()

/media/charles/SANDISK/TousLesVCF_PPI/touslesppi/venv/lib/python3.10/site-packages/decorator.py in ?(*args, **kw)
    229         def fun(*args, **kw):
    230             if not kwsyntax:
    231                 args, kw = fix(args, kw, sig)
--> 232             return caller(func, *(extras + args), **kw)

/media/charles/SANDISK/TousLesVCF_PPI/touslesppi/venv/lib/python3.10/site-packages/polars/dataframe/frame.py in ?(self)
   1425     def __repr__(self) -> str:
-> 1426         return self.__str__()

/media/charles/SANDISK/TousLesVCF_PPI/touslesppi/venv/lib/python3.10/site-packages/polars/dataframe/frame.py in ?(self)
   1422     def __str__(self) -> str:
-> 1423         return self._df.as_str()

PanicException: The column lengths in the DataFrame are not equal.

However, duckdb was able to read the same parquet file without any problem.
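
For anyone trying to reproduce the comparison, here is a minimal sketch of reading the first rows of the file through duckdb. The exact duckdb call used above is not part of this report, so the helper names `head_query` and `read_head_with_duckdb` are hypothetical, and the file name simply reuses `test.parquet` from the snippet above.

```python
def head_query(path: str, n: int = 5) -> str:
    """Build a SQL query returning the first n rows of a parquet file."""
    # Escape single quotes so the path is a valid SQL string literal.
    escaped = path.replace("'", "''")
    return f"SELECT * FROM read_parquet('{escaped}') LIMIT {int(n)}"


def read_head_with_duckdb(path: str, n: int = 5):
    """Return (column_names, rows) for the first n rows, read through duckdb."""
    # Imported lazily so head_query() stays usable without duckdb installed.
    import duckdb

    rel = duckdb.sql(head_query(path, n))
    return rel.columns, rel.fetchall()
```

Calling `read_head_with_duckdb("test.parquet")` should be roughly the duckdb counterpart of the `pl.scan_parquet("test.parquet").head().collect()` call that panics.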
