Added a config to limit the code-gen and class loading #83

volauvent · 2020-08-05T18:19:02Z

The config sets the maximum number of fast SerDes classes
generated and loaded. It helps to limit the metaspace and codecache
usage brought by fast-avro runtime code-gen and class loading.

The limit config can be set via FastSerdeCache constructors.
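
To make the intent concrete, here is a minimal sketch of a cache that stops generating new classes once a constructor-supplied limit is reached. `LimitedSerdeCache`, `getSerde` and the "vanilla" fallback are hypothetical stand-ins for illustration, not the real `FastSerdeCache` API; only the idea of passing the limit through a constructor comes from this PR.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for illustration only; NOT the real FastSerdeCache.
class LimitedSerdeCache {
  private final int generatedFastSerDesLimit;   // limit supplied by the caller
  private final Map<String, String> cache = new HashMap<>();
  private int generatedSerDesNum;               // classes "generated" so far

  LimitedSerdeCache(int limit) {
    this.generatedFastSerDesLimit = limit;
  }

  // Returns a cached or newly "generated" fast serde while under the limit;
  // once the limit is reached, unseen schemas get a non-generated fallback.
  synchronized String getSerde(String schemaName) {
    String cached = cache.get(schemaName);
    if (cached != null) {
      return cached;
    }
    if (generatedSerDesNum >= generatedFastSerDesLimit) {
      return "vanilla:" + schemaName;           // assumed fallback, nothing generated
    }
    generatedSerDesNum++;
    String fast = "fast:" + schemaName;
    cache.put(schemaName, fast);
    return fast;
  }

  synchronized int generatedCount() {
    return generatedSerDesNum;
  }
}
```

With a limit of 2, the first two distinct schemas get a generated serde and any later schema gets the fallback, so the number of generated classes stays bounded.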

codecov-commenter · 2020-08-05T18:34:44Z

Codecov Report

Merging #83 into master will not change coverage.
The diff coverage is n/a.

@@            Coverage Diff            @@
##             master      #83   +/-   ##
=========================================
  Coverage     53.42%   53.42%           
  Complexity      275      275           
=========================================
  Files            39       39           
  Lines          1662     1662           
  Branches        206      206           
=========================================
  Hits            888      888           
  Misses          692      692           
  Partials         82       82

gaojieliu · 2020-08-14T04:30:39Z

avro-fastserde/src/main/java/com/linkedin/avro/fastserde/FastSerdeBase.java

+    } else {
+      LOGGER.warn("Loaded fast serdes classes number {}, with limit set to {}", loadClassNum, loadClassLimit);
+    }
+    return null;


I think it will be clearer to just throw the exception instead of returning null.

Sure, will fix it.
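
The suggestion in this thread, throwing rather than returning null once the limit is hit, could look roughly like the sketch below. The class name, the `load` method and the choice of `IllegalStateException` are assumptions for illustration; the PR does not say which exception type was eventually used.

```java
// Hypothetical loader sketch; names and exception type are assumptions,
// not the actual FastSerdeBase code.
class LimitedClassLoaderSketch {
  private final int loadClassLimit;
  private int loadClassNum;

  LimitedClassLoaderSketch(int loadClassLimit) {
    this.loadClassLimit = loadClassLimit;
  }

  // Throws when the limit is reached, so a caller cannot mistake
  // "limit reached" for a successfully loaded class.
  synchronized String load(String className) {
    if (loadClassNum >= loadClassLimit) {
      throw new IllegalStateException(
          "Loaded fast serdes classes number " + loadClassNum
              + ", with limit set to " + loadClassLimit);
    }
    loadClassNum++;
    return className;   // placeholder for the loaded class
  }
}
```

Compared with returning null, the failure surfaces at the call site with a message, instead of as a later `NullPointerException` far from the cause.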

gaojieliu · 2020-08-14T04:34:53Z

avro-fastserde/src/main/java/com/linkedin/avro/fastserde/FastSerdeCache.java

+   * @param limit
+   *            custom number {@link #generatedFastSerDesLimit}
+   */
+  public FastSerdeCache(int limit) {


Do we need to add the other two constructors? Maybe this one is good enough.

No harm in adding these two constructors :) For example, I am using FastSerdeCache(Executor executorService, int limit) in the test.

gaojieliu · 2020-08-14T04:48:50Z

avro-fastserde/src/main/java/com/linkedin/avro/fastserde/FastSerdeCache.java

+    T result = null;
+    if (this.generatedSerDesNum.get() < this.generatedFastSerDesLimit) {
+      result = supplier.get();
+    } else if (this.generatedSerDesNum.get() == this.generatedFastSerDesLimit) {


I see two counters that are maintained independently in FastSerdeCache and FastSerdeBase, and I think we should combine them and use just one.
What I have in mind:

Maintaining the counter and limit in FastSerdeCache.

Passing the limit enforcement function to FastSerdeBase via all the Generator classes.

The limit enforcement function can be this way:

    private int generatedSerDeNum;
    private int generatedFastSerDeLimit;
    Predicate<Boolean> limitPredicate = (whetherIncrementCounter) -> {
      synchronized (this) {
        if (generatedSerDeNum >= generatedFastSerDeLimit) {
          return false;
        }
        if (whetherIncrementCounter) {
          ++generatedSerDeNum;
        }
        return true;
      }
    };

In FastSerdeCache and all kinds of generators, before the fast class generation, we should call limitPredicate.test(false), and fail fast when this predicate returns false.

In FastSerdeBase, before loading the newly generated class, we should call limitPredicate.test(true), and fail fast if the predicate returns false.

Essentially, the idea is to keep the counting logic in a single place to make it consistent.
Invoking this function in FastSerdeCache and various Generators is to try to avoid useless work.

Let me know if you want to have a sync up about this.
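
For reference, the single-counter proposal above can be sketched as a small self-contained class. `SerdeLimitGuard`, `checkLimit` and `asPredicate` are hypothetical names; the logic mirrors the predicate in the comment, written as a synchronized method so the limit field can be final and set in the constructor.

```java
import java.util.function.Predicate;

// Sketch of the single-counter proposal; class and method names are hypothetical.
class SerdeLimitGuard {
  private final int generatedFastSerDeLimit;
  private int generatedSerDeNum;

  SerdeLimitGuard(int generatedFastSerDeLimit) {
    this.generatedFastSerDeLimit = generatedFastSerDeLimit;
  }

  // Returns false once the limit is reached. When whetherIncrementCounter is
  // true and the limit is not yet reached, it also claims one slot.
  synchronized boolean checkLimit(boolean whetherIncrementCounter) {
    if (generatedSerDeNum >= generatedFastSerDeLimit) {
      return false;
    }
    if (whetherIncrementCounter) {
      ++generatedSerDeNum;
    }
    return true;
  }

  // The same check exposed as a Predicate, so it can be handed to generators.
  Predicate<Boolean> asPredicate() {
    return this::checkLimit;
  }
}
```

Under this sketch, generators would call `test(false)` as a cheap pre-check before doing any work, and the class-loading step would call `test(true)` to actually consume a slot.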

Thanks for the detailed explanation!

This approach does reduce the two counters to one. However, it also leads to redundant code-gen and compilation, which are CPU-intensive tasks. IIUC, nothing limits code-gen and compilation until we have actually loaded generatedFastSerDeLimit fast classes. So in the corner case, fast-avro may generate and compile a large number of fast classes and then throw them away.

With the current implementation, we generate at most N - 1 extra fast classes, where N is the number of threads in the FastSerdeCache Executor. So I think the current implementation is better. What do you think?
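
The N - 1 bound can be illustrated with a small simulation. This is a sketch under the assumption that each executor thread checks the counter before generating and increments it only after generation finishes; `OvershootDemo` and `run` are hypothetical names, and a `CyclicBarrier` forces the worst-case interleaving where every thread passes the check before any of them increments.

```java
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical simulation of check-then-generate with N executor threads.
class OvershootDemo {
  // Starts with one slot left (limit - 1 classes already generated) and
  // returns the total number of classes "generated" afterwards.
  static int run(int limit, int threads) {
    AtomicInteger generatedSerDesNum = new AtomicInteger(limit - 1);
    CyclicBarrier allChecked = new CyclicBarrier(threads);
    ExecutorService executor = Executors.newFixedThreadPool(threads);
    for (int i = 0; i < threads; i++) {
      executor.submit(() -> {
        if (generatedSerDesNum.get() < limit) {   // check
          allChecked.await();                     // wait until every thread has checked
          generatedSerDesNum.incrementAndGet();   // "generate" one class
        }
        return null;                              // Callable, so await() may throw
      });
    }
    executor.shutdown();
    try {
      executor.awaitTermination(10, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new RuntimeException(e);
    }
    return generatedSerDesNum.get();
  }
}
```

With a limit of 5 and 4 threads, all 4 threads see 4 < 5 and each generates a class, so the total ends at 8, which is 3 (N - 1) above the limit.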

volauvent requested review from radai-rosenblatt, FelixGV and gaojieliu August 5, 2020 18:19

volauvent force-pushed the bingfeng-code-gen-limit branch from 2510845 to 5a79fa9 Compare August 5, 2020 18:28

gaojieliu requested changes Aug 14, 2020
