Reading multiple file corrupt values and is also order dependent #534

Closed
Moelf opened this issue Nov 22, 2024 · 5 comments · Fixed by #535

@Moelf
Contributor

Moelf commented Nov 22, 2024

using Arrow

flist1 = filter(contains("physics_TLA"), readdir("/data/jiling/TLA/julia_arrows/"; join=true))
flist2 = filter(contains("mRp20"), readdir("/data/jiling/TLA/julia_arrows/"; join=true))

Arrow.Table(flist1).proc |> unique
# 1-element Vector{Bool}:
# 0

Arrow.Table(flist2).proc |> unique
# 1-element Vector{Bool}:
# 1

length(Arrow.Table([flist1;]).proc), length(Arrow.Table([flist2;]).proc)
# (3000521, 10077)

length(Arrow.Table([flist1; flist2]).proc)
# 3010598

Arrow.Table([flist1; flist2]).proc |> unique
# 1-element Vector{Bool}:
# 1

Arrow.Table([flist2; flist1]).proc |> unique
# 1-element Vector{Bool}:
# 0

@quinnj
Member

quinnj commented Nov 22, 2024

Can you describe the issue in more detail, or provide a simpler, clearer way to reproduce it? It seems to me that something goes wrong when you try to read just a single file as an "array of files". Is that right?

@Moelf
Contributor Author

Moelf commented Nov 22, 2024

Yes, it looks like the later file overrides the Boolean values from earlier files. I suspect some kind of sentinel-value merge is responsible.

I will provide two sample files today.

@Moelf
Contributor Author

Moelf commented Nov 22, 2024

Here's the file to reproduce: https://drive.proton.me/urls/1BP55Z7XDR#GSBCZ9hlUijm

julia> Arrow.Table(["d2.feather", "d1.feather"]).proc |> unique
1-element Vector{Bool}:
 0

julia> Arrow.Table(["d2.feather", "d1.feather"]).proc |> unique
2-element Vector{Bool}:
 1
 0

The behavior is nondeterministic: the same call yields the wrong result from time to time.
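
For anyone without access to the sample files, a self-contained check along the same lines might look like the sketch below. It is only an approximation of the setup above: the file names, row counts, and the helper `write_const_bool` are made up for illustration, and it is not confirmed that synthetic files like these trigger the bug. It writes one file whose `proc` column is all `false` and one where it is all `true`, reads both in either order, and repeats the read several times because the failure is intermittent.

```julia
using Arrow

# Hypothetical helper (not part of Arrow.jl): write a one-column table
# whose Bool column `proc` holds the same value in every row.
function write_const_bool(path, value::Bool, n::Integer)
    Arrow.write(path, (proc = fill(value, n),))
    return path
end

dir = mktempdir()
n_false, n_true = 3_000, 100   # arbitrary sizes, chosen for illustration
f_false = write_const_bool(joinpath(dir, "all_false.arrow"), false, n_false)
f_true  = write_const_bool(joinpath(dir, "all_true.arrow"), true, n_true)

# Read the files in both orders; the count of `true` rows should not
# depend on the order the files are listed in.
orders = [[f_false, f_true], [f_true, f_false]]
results = map(orders) do files
    proc = Arrow.Table(files).proc
    (length(proc), count(proc))
end

# Repeat the read a few times, since the failure is intermittent.
stable = all(1:20) do _
    count(Arrow.Table(orders[1]).proc) == n_true
end

println("(rows, true rows) per order: ", results)
println("stable across repeated reads: ", stable)
```

If both orders report the same number of `true` rows and the repeated reads stay stable, the values are being read correctly for these inputs.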

@Moelf
Contributor Author

Moelf commented Nov 26, 2024

bump

@quinnj
Member

quinnj commented Nov 26, 2024

OK, I believe #535 should fix this.
