-
Notifications
You must be signed in to change notification settings - Fork 13.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FLINK-37192] [pyflink] Replace deprecated avro-python3 with avro #26008
[FLINK-37192] [pyflink] Replace deprecated avro-python3 with avro #26008
Conversation
7c11f44
to
8c67bb7
Compare
@mina-asham Thanks for the PR! LGTM overall. Could you create a JIRA ticket and also rebase the PR to fix the code conflict of file setup.py? |
8d2543b
to
84398d2
Compare
@dianfu thanks for reviewing!
✅ Done |
84398d2
to
6239582
Compare
@flinkbot run azure |
@mina-asham Thanks for the update! These are test failures, could you take a look? |
- avro-python3 was deprecated and replaced by avro, the first hasn't had a release since March 17, 2021 while the second has had multiple fixes and updated, latest is August 5, 2024 - Both libraries are the exact same (i.e. `avro-python3` was just renamed to `avro` and had multiple updates since), but the problem is that they overlap in package name, so using PyFlink with any updated library that relies on `avro` fails starting the pipeline even if the pipeline doesn't actually do any avro encoding/decoding - This updates the library to the latest one, and fixes a few imports, other than that the library's functionality is exactly the same
6239582
to
f93d04a
Compare
@flinkbot run azure |
@dianfu fixed, thanks for catching this |
@@ -318,7 +318,7 @@ def extracted_output_files(base_dir, file_path, output_directory): | |||
|
|||
install_requires = ['py4j==0.10.9.7', 'python-dateutil>=2.8.0,<3', | |||
'apache-beam>=2.54.0,<=2.61.0', | |||
'cloudpickle>=2.2.0', 'avro-python3>=1.8.1,!=1.9.2', | |||
'cloudpickle>=2.2.0', 'avro>=1.12.0', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There seems to be slight differences between the old avro-python3 and the new preferred avro packages. I assume the other changes in this pr relate to the slight changes. The original code seems to be tolerating a range of versions, do we know why? Or is the latest fine?
I am curious whether fastavro needs to be updated to be compatible with avro 1.12.0 in any way. fastavro 1.1 is 5 years old - should we push this to the latest as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There seems to be slight differences between the old avro-python3 and the new preferred avro packages. I assume the other changes in this pr relate to the slight changes. The original code seems to be tolerating a range of versions, do we know why? Or is the latest fine?
I think we are fine, basically avro-python3
was deprecated in favour of avro
, it's the same package just renamed, but they also did some breaking changes in these updates (changing package names, slightly changing some names, etc...), that's what the code changes here account for. We still account for a range of versions here though, which afaik is a Python best practice so that you don't lock up consumers to a certain library.
I am curious whether fastavro needs to be updated to be compatible with avro 1.12.0 in any way. fastavro 1.1 is 5 years old - should we push this to the latest as well.
fastavro
is completely unrelated (beats me with Flink is using two separate Avro libraries tbh), avro-python3/avro
is a pure Python avro implementation, fastavro
is written in C to offer faster execution. Might be worth using a single Python package for avro or upgrading avro, but I think that's a bigger effort and should be out of scope of this smaller change here.
Reviewed by Chi on 23/01/2025 Go back to the submitter with review comments. |
@davidradl thanks for reviewing, responded and re-requested review. |
@mina-asham Thanks for the update. LGTM. |
Will wait to see if @davidradl has other comments before merging the PR. |
Merged to master via aef8c86 |
- avro-python3 was deprecated and replaced by avro, the first hasn't had a release since March 17, 2021 while the second has had multiple fixes and updated, latest is August 5, 2024 - Both libraries are the exact same (i.e. `avro-python3` was just renamed to `avro` and had multiple updates since), but the problem is that they overlap in package name, so using PyFlink with any updated library that relies on `avro` fails starting the pipeline even if the pipeline doesn't actually do any avro encoding/decoding - This updates the library to the latest one, and fixes a few imports, other than that the library's functionality is exactly the same This closes #26008.
What is the purpose of the change
Replace deprecated avro-python3 with avro
avro-python3
was just renamed toavro
and had multiple updates since), but the problem is that they overlap in package name, so using PyFlink with any updated library that relies onavro
fails starting the pipeline even if the pipeline doesn't actually do any avro encoding/decodingBrief change log
avro-python3
(deprecated since 2021) withavro
in PyFlinkVerifying this change
This change is already covered by existing tests, such as all the existing avro tests
Does this pull request potentially affect one of the following parts:
@Public(Evolving)
: noDocumentation