
polars.read_json fails when reading empty list from json response #7355

Closed
2 tasks done
liammcknight95 opened this issue Mar 5, 2023 · 3 comments · Fixed by #18827
Labels
A-io (Area: reading and writing data) · bug (Something isn't working) · P-low (Priority: low) · python (Related to Python Polars)

Comments

@liammcknight95

Polars version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of Polars.

Issue description

Noticed this was addressed in a recent fix for read_ndjson but seems to have slipped through on read_json.

polars.read_json fails when the response is an empty list. It seems the parser doesn't know what to do in this case, and it raises the following error: BindingsError:"ArrowError(NotYetImplemented("read an Array from a non-Array data type"))"

This may be a bit of an edge case, but it is an issue when dealing with JSON responses from HTTP requests that return 200 with an empty list ([]) as the body. I currently work around it by passing the response through json.loads (well, orjson, given its improvements over the stdlib json) to the DataFrame constructor, as shown below. I feel read_json should be able to handle this.

I'm getting deeper and deeper into Rust out of personal interest, so I will try to submit a pull request to fix this myself if I get the time, but I realise this may be a trivial fix for someone else.

Reproducible example

import json

import polars as pl

empty_list = b"[]"

pl.read_json(empty_list)  # raises BindingsError on Polars 0.16.11

pl.DataFrame(json.loads(empty_list))  # returns an empty DataFrame

Expected behavior

I would expect this to simply return an empty DataFrame.

Installed versions

---Version info---
Polars: 0.16.11
Index type: UInt32
Platform: Windows-10-10.0.22621-SP0
Python: 3.10.10 (tags/v3.10.10:aad5f6a, Feb  7 2023, 17:20:36) [MSC v.1929 64 bit (AMD64)]
---Optional dependencies---
pyarrow: <not installed>
pandas: <not installed>
numpy: <not installed>
fsspec: <not installed>
connectorx: <not installed>
xlsx2csv: <not installed>
deltalake: <not installed>
matplotlib: <not installed>
liammcknight95 added the bug (Something isn't working) and python (Related to Python Polars) labels on Mar 5, 2023
@cmdlineluser
Contributor

In case it's relevant, I just ran into a similar issue: #6745

I found .str.json_extract(), which seems to handle some cases where read_json fails.

In this particular case it returns null rather than an empty DataFrame.

import polars as pl

empty_list = b"[]"

pl.DataFrame({"json": [empty_list]}).with_columns(
    pl.col("json").cast(pl.Utf8).str.json_extract()
)
shape: (1, 1)
┌──────┐
│ json │
│ ---  │
│ null │
╞══════╡
│ null │
└──────┘

liammcknight95 changed the title from "polars.read_json fails when" to "polars.read_json fails when reading empty list" on Mar 6, 2023
liammcknight95 changed the title from "polars.read_json fails when reading empty list" to "polars.read_json fails when reading empty list from json response" on Mar 6, 2023
@migel

migel commented Jan 5, 2024

I'm not sure if it's related, but there is also a problem with empty lists in Rust code:

//!
//! ```cargo
//! [dependencies]
//! polars = { version = "0.36.2", features = ["json"] }
//! ```

use polars::prelude::*;

fn main() {
    let f = std::io::Cursor::new("[]");
    let df = JsonReader::new(f).finish();
    println!("{:?}", df);
}

output:

thread 'main' panicked at /home/mk/.cargo/registry/src/index.crates.io-6f17d22bba15001f/polars-io-0.36.2/src/json/infer.rs:19:10:
called `Option::unwrap()` on a `None` value
stack backtrace:
   0: rust_begin_unwind
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panicking.rs:645:5
   1: core::panicking::panic_fmt
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/core/src/panicking.rs:72:14
   2: core::panicking::panic
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/core/src/panicking.rs:127:5
   3: core::option::Option<T>::unwrap
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/core/src/option.rs:931:21
   4: polars_io::json::infer::json_values_to_supertype
             at /home/mk/.cargo/registry/src/index.crates.io-6f17d22bba15001f/polars-io-0.36.2/src/json/infer.rs:10:5
   5: <polars_io::json::JsonReader<R> as polars_io::SerReader<R>>::finish
             at /home/mk/.cargo/registry/src/index.crates.io-6f17d22bba15001f/polars-io-0.36.2/src/json/mod.rs:252:25
   6: p_test_d8b0191af191a30a7a7c462b::main
             at ./p-test.rs:12:14
   7: core::ops::function::FnOnce::call_once
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/core/src/ops/function.rs:250:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

stinodego added the needs triage (Awaiting prioritization by a maintainer), A-io (Area: reading and writing data), and P-low (Priority: low) labels and removed the needs triage label on Jan 13, 2024
@Delt4Nin3

I had the same problem in Rust. Checking whether the JSON is empty before calling JsonReader::new(f) worked around the error, and the same check might make sense in Python too.
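Ported to Python, that pre-check might look like the stdlib-only sketch below. It assumes "empty" should cover both a blank HTTP body (which json.loads also rejects) and a top-level empty array, and it uses only the whitespace characters JSON allows (space, tab, LF, CR); is_empty_json_body is a hypothetical helper name, not part of Polars.

```python
import re

# Hypothetical pattern: optional JSON whitespace, an optional empty array "[ ]",
# then optional JSON whitespace. Matches b"", b"  ", b"[]" and b" [ \n ] ".
_EMPTY_BODY = re.compile(rb"[ \t\n\r]*(?:\[[ \t\n\r]*\])?[ \t\n\r]*")


def is_empty_json_body(body: bytes) -> bool:
    """Return True if body is blank or an empty top-level JSON array."""
    # fullmatch ensures nothing else (objects, non-empty arrays) is present.
    return _EMPTY_BODY.fullmatch(body) is not None
```

A caller would return pl.DataFrame() when this returns True and otherwise hand the bytes to pl.read_json. Testing the raw bytes with a regex avoids decoding the whole payload just to detect the empty case.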
