Skip to content

Comments

perf: Optimize concat()/concat_ws() UDFs#20317

Merged
Jefffrey merged 3 commits intoapache:mainfrom
neilconway:neilc/optimize-concat
Feb 19, 2026
Merged

perf: Optimize concat()/concat_ws() UDFs#20317
Jefffrey merged 3 commits intoapache:mainfrom
neilconway:neilc/optimize-concat

Conversation

@neilconway
Copy link
Contributor

Which issue does this PR close?

Rationale for this change

Faster is better.

What changes are included in this PR?

This commit implements three optimizations:

  • In StringViewArrayBuilder, we recreated block after every call to append_offset. It is cheaper to instead clear and re-use block.

  • In StringViewArrayBuilder::write(), we re-validated that a string array consists of valid UTF8 characters. This was unnecessary work and can be skipped.

  • In the concat() UDF implementation, we miscalculated the initial size of the StringViewArrayBuilder buffer. This didn't lead to incorrect behavior but it resulted in unnecessarily needing to reallocate the buffer.

Are these changes tested?

Yes; no additional test cases warranted.

Are there any user-facing changes?

No.

@github-actions github-actions bot added the functions Changes to functions implementation label Feb 12, 2026
@neilconway
Copy link
Contributor Author

Benchmark results:

$ cargo bench --bench concat -- --baseline concat-vanilla
   Compiling datafusion-functions v52.1.0 (/Users/neilconway/datafusion/datafusion/functions)
    Finished `bench` profile [optimized] target(s) in 46.20s
     Running benches/concat.rs (target/release/deps/concat-4233805a3f40f888)
Gnuplot not found, using plotters backend
concat function/concat/1024
                        time:   [10.485 µs 10.510 µs 10.544 µs]
                        change: [−5.7512% −5.3786% −4.9576%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) low severe
  4 (4.00%) low mild
  1 (1.00%) high mild
  2 (2.00%) high severe

concat function/concat/4096
                        time:   [39.832 µs 39.896 µs 39.957 µs]
                        change: [−6.1783% −5.6993% −5.2265%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

concat function/concat/8192
                        time:   [79.562 µs 79.690 µs 79.831 µs]
                        change: [−5.3363% −5.0808% −4.8152%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) low severe
  1 (1.00%) low mild
  1 (1.00%) high mild
  1 (1.00%) high severe

concat function/concat_view/1024
                        time:   [23.159 µs 23.197 µs 23.236 µs]
                        change: [−82.546% −82.420% −82.307%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) low severe
  3 (3.00%) low mild
  1 (1.00%) high mild

concat function/concat_view/4096
                        time:   [97.326 µs 97.967 µs 98.620 µs]
                        change: [−81.132% −81.027% −80.908%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) low severe
  7 (7.00%) high mild

concat function/concat_view/8192
                        time:   [206.99 µs 207.77 µs 208.67 µs]
                        change: [−80.382% −80.269% −80.158%] (p = 0.00 < 0.05)
                        Performance has improved.

concat function/concat/scalar
                        time:   [1.0204 µs 1.0226 µs 1.0249 µs]
                        change: [−0.7254% −0.4513% −0.2078%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 6 outliers among 100 measurements (6.00%)
  2 (2.00%) low severe
  2 (2.00%) low mild
  2 (2.00%) high mild

This commit implements three optimizations:

* In `StringViewArrayBuilder`, we re-allocated `block` after every call
to `append_offset`. It is cheaper to instead clear and re-use `block`.

* In `StringViewArrayBuilder::write()`, we re-validated that a string
array consists of valid UTF8 characters. This was unnecessary work and
can be skipped.

* In the concat() UDF implementation, we miscalculated the initial size
of the StringViewArrayBuilder buffer. This didn't lead to incorrect
behavior but it resulted in unnecessarily needing to reallocate the
buffer.
@neilconway neilconway force-pushed the neilc/optimize-concat branch from 0eac046 to 992b569 Compare February 13, 2026 22:36
@neilconway
Copy link
Contributor Author

@Jefffrey Is this okay to land in main, do you think? Lmk if you have other feedback or concerns.

@martin-g martin-g self-assigned this Feb 18, 2026
@martin-g martin-g changed the title perf: Optimize concat() UDF perf: Optimize concat()/concat_ws() UDFs Feb 18, 2026
let string_array = as_string_view_array(array)?;

data_size += string_array.len();
data_size += string_array.total_buffer_bytes_used();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not include the inlined strings (shorter than 12 bytes). An array of only short strings will report 0.
Is this a problem ?!

If YES then let's use something like:

self.views().iter().sum().min(usize::MAX) as usize

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point.

On your suggested fix: summing the entire 128 bit view seems wrong (we don't want to add the buffer indexes and offsets, etc.). I think it would work to just use the low 32 bits of each view, but I'm leery of depending on an Arrow implementation detail like that.

We could iterate over the strings and sum their length, but that seems like overkill for a buffer sizing hint.

How about:

  1. Use total_buffer_bytes_used() for now; this is just a capacity hint, so an estimate is okay
  2. Add a comment for the short-strings case
  3. Open an issue in Arrow to have them add a helper for this use-case, as it seems reasonably useful in general

Copy link
Member

@martin-g martin-g Feb 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My suggestion is similar to the impl at https://github.com/apache/arrow-rs/blob/2f40f78e4feae3aee261d9608cede9535e1429e0/arrow-array/src/array/byte_view_array.rs#L685. It just does not filter out the small strings and uses u128 for the sum before the final cast to usize.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code in Arrow casts *v to u32, which takes the low 32 bits. Summing all 128 bit values and then taking the min of that value and usize::MAX does not seem to do the right thing, unless I'm misunderstanding completely.

In any case, I'd prefer to not depend on Arrow implementation details, if we can avoid it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I filed apache/arrow-rs#9435 for this

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right!

@Jefffrey
Copy link
Contributor

I think this should be good to merge once the conflict is resolved; it would be nice indeed if we had a unified way of estimating (whether accurate or not) view size, whether upstream or in a common function in DataFusion but that can be a follow up

@neilconway
Copy link
Contributor Author

@Jefffrey Great! I resolved the merge conflict.

@Jefffrey Jefffrey added this pull request to the merge queue Feb 19, 2026
Merged via the queue into apache:main with commit 4f4e814 Feb 19, 2026
28 checks passed
@Jefffrey
Copy link
Contributor

Thanks @neilconway, @alamb & @martin-g

@neilconway neilconway deleted the neilc/optimize-concat branch February 19, 2026 12:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

functions Changes to functions implementation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Optimize concat()

4 participants