Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PrimitiveGenericRecord interface #45

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

FelixGV
Copy link
Collaborator

@FelixGV FelixGV commented Apr 13, 2020

No description provided.

* @return true if the field has a value defined, or false if it is null
* @throws AvroRuntimeException
*/
boolean isDefined(String key) throws AvroRuntimeException;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit - "isDefined" sounds (to me?) more like if the schema contains a field. maybe hasValue()? of nonNull() ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I wasn't sure how to name this one. Also thought of isSet(name)... what's your favorite wording?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if its set to null? :-)

@@ -0,0 +1,103 @@
package com.linkedin.avro.primitive;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit - if we're gonna go extending avro ... "com.linkedin.avro.extensions" ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, or maybe com.linkedin.avro.api?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

works for me

Copy link
Contributor

@radai-rosenblatt radai-rosenblatt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think a benchmark is in order before we embark on this?

@volauvent
Copy link
Collaborator

Nice! I think it definitely can improve deser performance further like PrimitiveFloatList.
@radai-rosenblatt Here is the performance improvement brought by PrimitiveFloatList benchmarked by a Float Array with 800 elements -

  • vanilla avro 1.4
    Deser latency (ns) : 24603.5
    Memory alloc (KB/exe) : 15.9
  • avro-fastserde with PrimitiveFloatList
    Deser latency (ns) : 3926.4
    Memory alloc (KB/exe) : 3.2
  • avro-fastserde with GenericArray<Float>
    Deser latency (ns) : 6904.2
    Memory alloc (KB/exe) : 15.8

N.B.: This is still a work in progress. There are some build issues
that I need to debug. And the code overall is untested.
@gaojieliu
Copy link
Collaborator

@volauvent
For primitive list, the big benefit is totally reasonable.
But for the primitive type for each field, we may need to do another round of benchmark to understand the benefit accurately, and it should be related to how many primitive types the users are planning to store inside the record.
Can we try different number of primitive fields and evaluate the benefit?

@FelixGV
Also do we recommend the usage to everyone or just the ones who are heavily using primitive fields in the record types? Since I could see another coding overhead while using this new type, I would like to understand the proper use case.

@volauvent
Copy link
Collaborator

@gaojieliu since we only have primitiveFloatList now, we can vary float array size and compare the performance diff between Primitive float and default box one. The above metric is for 800 FloatArray, are you saying reduce the array size to a small number to understand the real performance improvement brought by primitive float type, such as to 5?

FelixGV added a commit to FelixGV/avro-util that referenced this pull request May 28, 2020
The PrimitiveFloatList is an API which users should expect to rely on,
so it is wrong to degrade to the more constrained List<Float> API while
Fast-Avro is still cold. This commit introduces several changes to make
the extended API reliably present whenever using Fast-Avro, regardless
of being cold or warm.

- Changed PrimitiveFloatList to an interface, in a new package called:
  com.linkedin.avro.api; since the package name migration makes this an
  incompatible change, it would be desirable for the next release to
  not increment only the patch version. Having a proper package name
  for API extension should make things cleaner in the future as we add
  other optimized APIs (e.g. PR linkedin#45).
- Renamed the old class to ByteBufferBackedPrimitiveFloatList, and made
  it implement the new interface.
- Added new several new classes to ensure that the PrimitiveFloatList
  is always returned even when Fast-Avro falls back to vanilla Avro:
  - ColdPrimitiveFloatList which is a naive implementation that simply
    implements the new API by delegating to the regular Avro functions.
    This does not provide any GC benefits, but at least maintains the
    API.
  - ColdGenericDatumReader and ColdSpecificDatumReader which extend the
    GenericDatumReader and SpecificDatumReader classes, respectively,
    from vanilla Avro.
  - ColdDatumReaderMixIn which provides a utility function to minimize
    repeated code between the two DatumReader functions.
- Significantly refactored the FastGenericDeserializerGeneratorTest so
  that it tests three permutations: vanilla, cold fast and warm fast.
  As part of doing this, several test short-comings were discovered and
  fixed. In particular, the decodeRecordSlow function had some flipped
  parameters which led to test failures on vanilla Avro, and those test
  failures were hidden by the fact that some tests ignored he provided
  permutation param and systematically tested only Fast-Avro.
FelixGV added a commit to FelixGV/avro-util that referenced this pull request May 28, 2020
The PrimitiveFloatList is an API which users should expect to rely on,
so it is wrong to degrade to the more constrained List<Float> API while
Fast-Avro is still cold. This commit introduces several changes to make
the extended API reliably present whenever using Fast-Avro, regardless
of being cold or warm.

- Changed PrimitiveFloatList to an interface, in a new package called:
  com.linkedin.avro.api; since the package name migration makes this an
  incompatible change, it would be desirable for the next release to
  not increment only the patch version. Having a proper package name
  for API extension should make things cleaner in the future as we add
  other optimized APIs (e.g. PR linkedin#45).
- Renamed the old class to ByteBufferBackedPrimitiveFloatList, and made
  it implement the new interface.
- Added new several new classes to ensure that the PrimitiveFloatList
  is always returned even when Fast-Avro falls back to vanilla Avro:
  - ColdPrimitiveFloatList which is a naive implementation that simply
    implements the new API by delegating to the regular Avro functions.
    This does not provide any GC benefits, but at least maintains the
    API.
  - ColdGenericDatumReader and ColdSpecificDatumReader which extend the
    GenericDatumReader and SpecificDatumReader classes, respectively,
    from vanilla Avro.
  - ColdDatumReaderMixIn which provides a utility function to minimize
    repeated code between the two DatumReader functions.
- Significantly refactored the FastGenericDeserializerGeneratorTest so
  that it tests three permutations: vanilla, cold fast and warm fast.
  As part of doing this, several test short-comings were discovered and
  fixed. In particular, the decodeRecordSlow function had some flipped
  parameters which led to test failures on vanilla Avro, and those test
  failures were hidden by the fact that some tests ignored he provided
  permutation param and systematically tested only Fast-Avro.
FelixGV added a commit to FelixGV/avro-util that referenced this pull request May 29, 2020
The PrimitiveFloatList is an API which users should expect to rely on,
so it is wrong to degrade to the more constrained List<Float> API while
Fast-Avro is still cold. This commit introduces several changes to make
the extended API reliably present whenever using Fast-Avro, regardless
of being cold or warm.

- Changed PrimitiveFloatList to an interface, in a new package called:
  com.linkedin.avro.api; since the package name migration makes this an
  incompatible change, it would be desirable for the next release to
  not increment only the patch version. Having a proper package name
  for API extension should make things cleaner in the future as we add
  other optimized APIs (e.g. PR linkedin#45).
- Renamed the old class to ByteBufferBackedPrimitiveFloatList, and made
  it implement the new interface.
- Added new several new classes to ensure that the PrimitiveFloatList
  is always returned even when Fast-Avro falls back to vanilla Avro:
  - ColdPrimitiveFloatList which is a naive implementation that simply
    implements the new API by delegating to the regular Avro functions.
    This does not provide any GC benefits, but at least maintains the
    API.
  - ColdGenericDatumReader and ColdSpecificDatumReader which extend the
    GenericDatumReader and SpecificDatumReader classes, respectively,
    from vanilla Avro.
  - ColdDatumReaderMixIn which provides a utility function to minimize
    repeated code between the two DatumReader functions.
- Significantly refactored the FastGenericDeserializerGeneratorTest so
  that it tests three permutations: vanilla, cold fast and warm fast.
  As part of doing this, several test short-comings were discovered and
  fixed. In particular, the decodeRecordSlow function had some flipped
  parameters which led to test failures on vanilla Avro, and those test
  failures were hidden by the fact that some tests ignored the provided
  permutation param and systematically tested only Fast-Avro.
FelixGV added a commit to FelixGV/avro-util that referenced this pull request May 29, 2020
The PrimitiveFloatList is an API which users should expect to rely on,
so it is wrong to degrade to the more constrained List<Float> API while
Fast-Avro is still cold. This commit introduces several changes to make
the extended API reliably present whenever using Fast-Avro, regardless
of being cold or warm.

- Changed PrimitiveFloatList to an interface, in a new package called:
  com.linkedin.avro.api; since the package name migration makes this an
  incompatible change, it would be desirable for the next release to
  not increment only the patch version. Having a proper package name
  for API extension should make things cleaner in the future as we add
  other optimized APIs (e.g. PR linkedin#45).
- Renamed the old class to ByteBufferBackedPrimitiveFloatList, and made
  it implement the new interface.
- Added new several new classes to ensure that the PrimitiveFloatList
  is always returned even when Fast-Avro falls back to vanilla Avro:
  - ColdPrimitiveFloatList which is a naive implementation that simply
    implements the new API by delegating to the regular Avro functions.
    This does not provide any GC benefits, but at least maintains the
    API.
  - ColdGenericDatumReader and ColdSpecificDatumReader which extend the
    GenericDatumReader and SpecificDatumReader classes, respectively,
    from vanilla Avro.
  - ColdDatumReaderMixIn which provides a utility function to minimize
    repeated code between the two DatumReader functions.
- Significantly refactored the FastGenericDeserializerGeneratorTest so
  that it tests three permutations: vanilla, cold fast and warm fast.
  As part of doing this, several test short-comings were discovered and
  fixed. In particular, the decodeRecordSlow function had some flipped
  parameters which led to test failures on vanilla Avro, and those test
  failures were hidden by the fact that some tests ignored the provided
  permutation param and systematically tested only Fast-Avro.
FelixGV added a commit that referenced this pull request May 29, 2020
The PrimitiveFloatList is an API which users should expect to rely on,
so it is wrong to degrade to the more constrained List<Float> API while
Fast-Avro is still cold. This commit introduces several changes to make
the extended API reliably present whenever using Fast-Avro, regardless
of being cold or warm.

- Changed PrimitiveFloatList to an interface, in a new package called:
  com.linkedin.avro.api; since the package name migration makes this an
  incompatible change, it would be desirable for the next release to
  not increment only the patch version. Having a proper package name
  for API extension should make things cleaner in the future as we add
  other optimized APIs (e.g. PR #45).
- Renamed the old class to ByteBufferBackedPrimitiveFloatList, and made
  it implement the new interface.
- Added new several new classes to ensure that the PrimitiveFloatList
  is always returned even when Fast-Avro falls back to vanilla Avro:
  - ColdPrimitiveFloatList which is a naive implementation that simply
    implements the new API by delegating to the regular Avro functions.
    This does not provide any GC benefits, but at least maintains the
    API.
  - ColdGenericDatumReader and ColdSpecificDatumReader which extend the
    GenericDatumReader and SpecificDatumReader classes, respectively,
    from vanilla Avro.
  - ColdDatumReaderMixIn which provides a utility function to minimize
    repeated code between the two DatumReader functions.
- Significantly refactored the FastGenericDeserializerGeneratorTest so
  that it tests three permutations: vanilla, cold fast and warm fast.
  As part of doing this, several test short-comings were discovered and
  fixed. In particular, the decodeRecordSlow function had some flipped
  parameters which led to test failures on vanilla Avro, and those test
  failures were hidden by the fact that some tests ignored the provided
  permutation param and systematically tested only Fast-Avro.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants