apache/arrow#13535

The schema is replaced by t.schema:

import numpy as np
import pyarrow as pa

arr = pa.array(np.arange(8))
# Kept from the original example; no longer passed to the writer.
schema = pa.schema([
    pa.field('nums', arr.type)
])
n_legs = pa.array([2, 4, 5, 100])
t = pa.Table.from_arrays([n_legs], names=["nums"], metadata={"loc": "san diego"})
with pa.OSFile('arraydata.arrow', 'wb') as sink:
    # Use the table's own schema so its metadata is written to the file.
    with pa.ipc.new_file(sink, schema=t.schema) as writer:
        writer.write_table(t)

Printing out the metadata:

import pyarrow as pa

# f = '/tmp/fib.arrow'
f = 'arraydata.arrow'
with pa.ipc.open_file(f) as reader:
    t = reader.read_all()
    # print(t.to_string(show_metadata=True, preview_cols=2))
    print(t.schema.metadata)
apache/arrow#13535
I posted this to the Apache Arrow community, and David Liu answered that in Arrow, table metadata lives inside the schema. When a writer is created with a given schema, every table it writes must follow that schema. In my original example, the metadata of the "schema" variable is None, while the metadata of t is {"loc": "san diego"}. Because the writer was initialized with "schema", whatever metadata t carries is replaced by None in the written file. I also found that the metadata can be read back through t.schema.metadata. The updated example prints the extracted metadata.
The schema is replaced by t.schema.
pwriter.py