Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for BYTEA/BLOB #511

Merged
merged 5 commits into from
Jan 2, 2025
Merged

Add support for BYTEA/BLOB #511

merged 5 commits into from
Jan 2, 2025

Conversation

ritwizsinha
Copy link
Contributor

@ritwizsinha ritwizsinha commented Dec 22, 2024

Fixes #464

@ritwizsinha
Copy link
Contributor Author

@JelteF currently the output from select * from blob_tbl is an escaped version of inserted rows, is this fine or do we need to change that?

\x
\x5c7831315c7830315c7830325c7830335c7830345c7830355c7830365c7830375c7830385c7830395c7830415c7830425c7830435c7830445c7830455c783046
\x
\x5c783030
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

currently the output from select * from blob_tbl is an escaped version of inserted rows, is this fine or do we need to change that?

Okay, this needs to change. All the non empty bytea results are wrong. When using postgres execution the it instead gives the following rows:

 \x
 \x110102030405060708090a0b0c0d0e0f
 \x
 \x00
 \x07

It seems like you're encoding the string as hex somewhere as an additional time, because 5c783030 is \x00 in ASCII, i.e. including the \ and x characters. So I think when converting between from DuckDB to PG type you're encoding the string representation of the type instead of the raw bytes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found it, converting it to string was trying to convert those bytes to string

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed GetValue to GetValueUnsafe for getting raw bytes in string_t

@JelteF JelteF changed the title Add support for BYTEAOID Add support for BYTEA/BLOB Jan 2, 2025
\x07

(6 rows)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a comparison test too? Something like:

SELECT * FROM blob_tbl WHERE a = '\x00';

@JelteF JelteF merged commit b4878e4 into duckdb:main Jan 2, 2025
5 checks passed
auto str = value.GetValueUnsafe<duckdb::string_t>();
auto blob_len = str.GetSize();
auto blob = str.GetDataUnsafe();
bytea* result = (bytea *)palloc0(blob_len + VARHDRSZ);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: make format will re-format this line as

bytea *result = (bytea *)palloc0(blob_len + VARHDRSZ);

Copy link
Collaborator

@JelteF JelteF Jan 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the report. Fixed by: #518

JelteF added a commit that referenced this pull request Jan 3, 2025
In #496 we started using the newest clang-format to format our files,
but CI was still installing the old version. This meant that we were not
catching some unformatted files correctly. An example of this being:
#511 (comment)

This starts using the correct clang-format version in CI too and formats
any incorrectly formatted files.
@JelteF JelteF mentioned this pull request Jan 3, 2025
Y-- pushed a commit that referenced this pull request Jan 6, 2025
In #496 we started using the newest clang-format to format our files,
but CI was still installing the old version. This meant that we were not
catching some unformatted files correctly. An example of this being:
#511 (comment)

This starts using the correct clang-format version in CI too and formats
any incorrectly formatted files.
ritwizsinha pushed a commit to ritwizsinha/pg_duckdb that referenced this pull request Jan 11, 2025
In duckdb#496 we started using the newest clang-format to format our files,
but CI was still installing the old version. This meant that we were not
catching some unformatted files correctly. An example of this being:
duckdb#511 (comment)

This starts using the correct clang-format version in CI too and formats
any incorrectly formatted files.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Conversion: DuckDB Type BLOB -> postgres bytea failed
3 participants