[Enhancement] Support some compress functions #47307

Open
wants to merge 3 commits into master

Conversation

lzyy2024

What problem does this PR solve?

Adds `compress` and `uncompress` functions similar to MySQL's.

Issue Number: close #45530

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

Contributor

@zclllyybb zclllyybb left a comment

Also, remember to format your files.


Status execute_impl(FunctionContext* context, Block& block, const ColumnNumbers& arguments,
uint32_t result, size_t input_rows_count) const override {
// LOG(INFO) << "Executing FunctionCompress with " << input_rows_count
Contributor

Remove these commented-out lines.

col_data[idx] = '0', col_data[idx + 1] = 'x';
for (int i = 0; i < 4; i++) {
unsigned char byte = (value >> (i * 8)) & 0xFF;
col_data[idx + 2 + i * 2] = "0123456789ABCDEF"[byte >> 4]; // 高4位 (high 4 bits)
Contributor

Don't use Chinese in comments.

Contributor

Also, replace the magic values with named constants.
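A minimal sketch of what the two comments on this hunk could lead to, assuming the loop writes "0x" followed by the four bytes of `value` as hex digits. Every name here (`kHexPrefix`, `kHexDigits`, `kHeaderBytes`, `append_hex_header`, and so on) is hypothetical, and the low-nibble line is an assumption because the quoted hunk only shows the high-nibble write.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>

// All names below are hypothetical; the thread does not show the PR's final identifiers.
constexpr std::string_view kHexPrefix = "0x";
constexpr char kHexDigits[] = "0123456789ABCDEF";
constexpr std::size_t kHeaderBytes = 4;  // bytes of `value` written after the prefix
constexpr unsigned kBitsPerByte = 8;
constexpr unsigned kBitsPerNibble = 4;
constexpr std::uint32_t kByteMask = 0xFF;
constexpr unsigned kLowNibbleMask = 0x0F;

// Appends "0x" and then the four bytes of `value`, least significant byte
// first, each as two uppercase hex digits (the same order as the quoted loop).
inline void append_hex_header(std::string& out, std::uint32_t value) {
    const std::size_t start = out.size();
    out += kHexPrefix;
    for (std::size_t i = 0; i < kHeaderBytes; ++i) {
        const auto byte =
                static_cast<unsigned char>((value >> (i * kBitsPerByte)) & kByteMask);
        out += kHexDigits[byte >> kBitsPerNibble];  // high 4 bits
        out += kHexDigits[byte & kLowNibbleMask];   // low 4 bits (assumed; not in the hunk)
    }
    // The header is always prefix + two hex digits per byte.
    assert(out.size() == start + kHexPrefix.size() + 2 * kHeaderBytes);
}
```

With named constants, the header length becomes something other code can compute from `kHexPrefix.size()` and `kHeaderBytes` instead of repeating a literal.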


auto st = compression_codec->compress(data, &compressed_str);

if (!st.ok()) {
Contributor

Add a comment explaining when this can fail.

Contributor

Add cases like those in regression-test/suites/query_p0/sql_functions/test_template_one_arg.groovy.

Contributor

We don't need to modify this file anymore.

std::string func_name = "compress";
InputTypeSet input_types = {TypeIndex::String};

// 压缩多个不同的字符串 (compress several different strings)
Contributor

Don't use Chinese comments.

std::string uncompressed;
Slice data;
Slice uncompressed_slice;
for (int row = 0; row < input_rows_count; row++) {
Contributor

Use size_t here, not int.

illegal = 1;
} else {
if (data[0] != '0' || data[1] != 'x') {
LOG(INFO) << "illegal: "
Contributor

Don't log at INFO level here.

if (x >= 'A' && x <= 'F') return true;
return false;
};
auto trans = [](char x) {
Contributor

Just use std::from_chars and std::to_chars instead of your hand-written lambdas.
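A rough sketch of what that replacement might look like, assuming the lambdas validate and convert pairs of hex digits. The helper names `hex_to_bytes` and `bytes_to_hex` are hypothetical and not from the PR. One caveat worth knowing: std::to_chars emits lowercase digits and does not zero-pad, so the padding is done by hand here, and matching the uppercase output in the quoted code would need an extra step.

```cpp
#include <cassert>
#include <charconv>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <system_error>

// Hypothetical helper: decodes pairs of hex digits into raw bytes.
// Returns std::nullopt for odd-length input or any non-hex character.
inline std::optional<std::string> hex_to_bytes(std::string_view hex) {
    if (hex.size() % 2 != 0) return std::nullopt;
    std::string out;
    out.reserve(hex.size() / 2);
    for (std::size_t i = 0; i < hex.size(); i += 2) {
        unsigned int value = 0;
        const char* first = hex.data() + i;
        const char* last = first + 2;
        auto [ptr, ec] = std::from_chars(first, last, value, 16);
        // Require that both characters were consumed as hex digits.
        if (ec != std::errc() || ptr != last) return std::nullopt;
        out.push_back(static_cast<char>(value));
    }
    return out;
}

// Hypothetical helper: encodes each byte as two lowercase hex digits.
inline std::string bytes_to_hex(std::string_view bytes) {
    std::string out;
    out.reserve(bytes.size() * 2);
    for (unsigned char byte : bytes) {
        char buf[2];
        auto [ptr, ec] = std::to_chars(buf, buf + 2, static_cast<unsigned int>(byte), 16);
        assert(ec == std::errc());  // a value below 256 always fits in two hex digits
        const auto written = static_cast<std::size_t>(ptr - buf);
        if (written == 1) out.push_back('0');  // to_chars does not zero-pad
        out.append(buf, written);
    }
    return out;
}
```

Because std::from_chars accepts neither a sign nor a "0x" prefix for unsigned targets, the validity check and the conversion collapse into one call per byte.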

// Print the compressed string (after compression)
// LOG(INFO) << "Compressed string at row " << row << ": "
// << std::string(reinterpret_cast<const char*>(col_data.data()));
col_offset[row] = col_offset[row - 1] + 10 + compressed_str.size() * 2;
Contributor

What's this value for?

Author

The first ten characters of the compressed value are "0x" followed by eight hex digits; after that, each byte of the compressed data is written as two hexadecimal digits.
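The arithmetic in that answer can be sketched as follows, assuming the layout is exactly a 10-character header plus two hex characters per compressed byte. The names `kHeaderChars`, `encoded_size`, and `end_offsets` are hypothetical; the PR itself writes into `col_offset` directly.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical constants describing the layout explained in the reply above.
constexpr std::size_t kPrefixChars = 2;      // "0x"
constexpr std::size_t kHeaderHexDigits = 8;  // four bytes, two hex digits each
constexpr std::size_t kHeaderChars = kPrefixChars + kHeaderHexDigits;  // 10
constexpr std::size_t kHexCharsPerByte = 2;

// Number of characters one encoded row occupies.
constexpr std::size_t encoded_size(std::size_t compressed_bytes) {
    return kHeaderChars + compressed_bytes * kHexCharsPerByte;
}

static_assert(encoded_size(0) == 10, "an empty payload still carries the header");

// Cumulative end offsets, mirroring
// col_offset[row] = col_offset[row - 1] + 10 + compressed_str.size() * 2.
inline std::vector<std::size_t> end_offsets(const std::vector<std::size_t>& compressed_sizes) {
    std::vector<std::size_t> offsets;
    offsets.reserve(compressed_sizes.size());
    std::size_t end = 0;
    for (std::size_t n : compressed_sizes) {
        end += encoded_size(n);
        offsets.push_back(end);
    }
    assert(offsets.size() == compressed_sizes.size());
    return offsets;
}
```

Naming the pieces this way would also answer the reviewer's question in the code itself, since the literal 10 no longer appears unexplained.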

Development

Successfully merging this pull request may close these issues.

[Enhancement](good-first-issue) Support some compress functions
3 participants