Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make InsertTupleIntoChunk more efficient #416

Conversation

Reminiscent
Copy link
Contributor

Fix #402

  1. Cache the detoast variables if they are both part of a filter and one of the output variables.
  2. Only contain the values that we actually need to output in the value.

@Reminiscent
Copy link
Contributor Author

I have two questions:

  1. Are there any easy way to add a bench test to show the improvement? Or we just add some complex test cases in the regression test and see the result.
  2. Now we still use vector for the values array, we can replace it to the map. But such a change may not be necessary, and it's unclear how much of an improvement it would bring.

Copy link
Collaborator

@JelteF JelteF left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay in reviewing this. I left a few comments. Maybe don't spend time on implementing them yet though, because we're currently working on a big change to the reading and filtering. So it's a possibility that we actually end up removing this code.

@@ -34,6 +34,8 @@ class PostgresScanGlobalState {
std::vector<duckdb::TableFilter *> m_column_filters;
/* Duckdb output vector idx with information about postgres column id */
duckdb::vector<duckdb::pair<duckdb::idx_t, AttrNumber>> m_output_columns;
/* Store the column ID which needs to output in the set for quick lookup */
duckdb::set<AttrNumber> attr_to_output_set;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mkaruza converted all the maps that we had to vectors in #366. Because the hashing of the keys had significant overhead. Since AttrNumber is bound to be small, I think it's better to use a vector of booleans (or a bitmap or something) to avoid the hashing.

Datum value;
bool should_free;
};
duckdb::map<idx_t, DetoastedValue> detoasted_values;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't want to allocate any datastructures in this function, because it's extremely performance sensitive. We should pre-allocate this one like we do for the others. We instead want to allocate them once. Also as described in my other we don't want to use map at all, so let's use a vector for this.

@JelteF
Copy link
Collaborator

JelteF commented Dec 9, 2024

Okay, so chances are very high that we'll remove this code completely in 0.3.0. So I don't think it makes sense to continue with this approach. Thanks anyway for your efforts.

@JelteF JelteF closed this Dec 9, 2024
@JelteF
Copy link
Collaborator

JelteF commented Dec 9, 2024

For reference #477 is the PR that removes this code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make InsertTupleIntoChunk more efficient
2 participants