Cache record names to avoid hitting class loader #219

marcospassos · 2020-09-11T18:20:49Z

We use Jackson in our stream processing pipeline, and we noticed that serialization was consuming most of the CPU time. After some profiling, we tracked the issue down to the method that resolves the record's name. In the current implementation, for every record visited, the class loader checks whether a class with the record's name exists, causing an IO bottleneck under high load. By caching the result, we were able to boost our performance by 7000%.
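To make the cost concrete, here is a minimal sketch of the pattern described. RecordNameResolver, resolveUncached and resolveCached are hypothetical names, not Jackson's code. The uncached version probes the class loader for a nested class named namespace$name on every call. The cached version memoizes the answer with ConcurrentHashMap.computeIfAbsent. The probe counter exists only to show that the cached path hits the class loader once per distinct name. The map in this sketch is unbounded.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; names are illustrative, not Jackson's implementation.
final class RecordNameResolver {
    // Counts class-loader probes so the cost of the uncached path is visible.
    static final AtomicInteger CLASS_PROBES = new AtomicInteger();

    // Uncached: every call asks the class loader whether "namespace$name" exists.
    // A miss throws ClassNotFoundException, which is the expensive part.
    static String resolveUncached(String namespace, String name) {
        if (namespace == null || namespace.isEmpty()) {
            return name;
        }
        String nested = namespace + "$" + name;
        CLASS_PROBES.incrementAndGet();
        try {
            Class.forName(nested);
            return nested;
        } catch (ClassNotFoundException e) {
            return namespace + "." + name;
        }
    }

    // Unbounded memoization, for illustration only.
    private static final Map<String, String> CACHE = new ConcurrentHashMap<>();

    // Cached: the probe runs at most once per distinct (namespace, name) pair.
    static String resolveCached(String namespace, String name) {
        // '\u0000' separates the parts so a null namespace cannot mimic a real one.
        String key = (namespace == null ? "" : namespace) + '\u0000' + name;
        return CACHE.computeIfAbsent(key, k -> resolveUncached(namespace, name));
    }
}
```

For example, resolveUncached("java.util.Map", "Entry") finds the nested JDK class java.util.Map$Entry. A name with no matching class falls back to the dotted form.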

cowtowncoder · 2020-09-11T18:51:03Z

avro/src/main/java/com/fasterxml/jackson/dataformat/avro/schema/AvroSchemaHelper.java

@@ -24,6 +24,8 @@

 public abstract class AvroSchemaHelper
 {
+    private static final Map<String, String> SCHEMA_NAME_CACHE = new HashMap<>();


I have some concerns about this.

Since it is static it is per-ClassLoader, and ideally nothing in Jackson would use that. If at all possible this should be tied to something else, probably AvroFactory (or AvroMapper if practical).

Second, all caches should be bounded (not of unlimited size). This is easy to solve by just using

com.fasterxml.jackson.databind.util.LRUMap

with whatever size settings (default, max size) makes sense.

Another significant problem is that access is not synchronized: since this will be called from multiple threads, and it is not read-only, access must be made thread-safe. LRUMap uses ConcurrentHashMap, which solves that issue.
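Here is a self-contained sketch of a bounded, thread-safe cache in the spirit of the LRUMap suggestion. As we understand the 2.x-era LRUMap, it is not a true LRU. It wraps a ConcurrentHashMap and flushes all entries once the maximum size is reached. BoundedCache is a hypothetical stand-in written so the example runs without jackson-databind on the classpath.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in modeled on our understanding of Jackson 2.x LRUMap:
// not a true LRU; when full, all entries are dropped at once.
final class BoundedCache<K, V> {
    private final int maxEntries;
    private final ConcurrentHashMap<K, V> map;

    BoundedCache(int initialEntries, int maxEntries) {
        this.map = new ConcurrentHashMap<>(initialEntries);
        this.maxEntries = maxEntries;
    }

    V get(K key) {
        return map.get(key);
    }

    V putIfAbsent(K key, V value) {
        // Double-checked under a lock so concurrent writers clear only once.
        // The size is approximately bounded: racing puts may briefly exceed it.
        if (map.size() >= maxEntries) {
            synchronized (this) {
                if (map.size() >= maxEntries) {
                    map.clear();
                }
            }
        }
        return map.putIfAbsent(key, value);
    }

    int size() {
        return map.size();
    }
}
```

A caller would first try get(). On a miss, it would resolve the name and call putIfAbsent(). Because both key and value are Strings, a flushed cache only costs a few repeated lookups, never correctness.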

Question from a Java non-expert: will the call Class.forName(nestedClassName) (called in the method that resolves the name) use the default class loader? If so, no matter how we cache the names, it will still be broken.

Yes, that would use the default class loader of... well. It's probably the context class loader (alternative being class loader that loaded given class; usually these are the same but not always).
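For reference, the Javadoc of the single-argument Class.forName(String) describes it as equivalent to Class.forName(className, true, currentLoader), where currentLoader is the defining class loader of the calling class. The thread context class loader is used only when passed explicitly. The sketch below contrasts the two. LoaderLookup and its methods are hypothetical names.

```java
// Contrasts the caller's defining loader with an explicit context-loader lookup.
final class LoaderLookup {
    // Uses the defining class loader of LoaderLookup, and initializes the class if found.
    static boolean existsViaCallerLoader(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    // Uses the thread context class loader; initialize=false only checks existence.
    // A null loader here means the bootstrap class loader.
    static boolean existsViaContextLoader(String className) {
        ClassLoader ctx = Thread.currentThread().getContextClassLoader();
        try {
            Class.forName(className, false, ctx);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }
}
```

In a plain application both usually agree. In containers or plugin systems, the two loaders can differ, which is why the question of "which loader" matters for any cache keyed only by name.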

cowtowncoder · 2020-09-11T18:55:43Z

First of all, thank you for contributing this! I did not realize that this path could be used during actual serialization, with results not cached (had I known, I would have raised the concern).

I have some concerns about implementation, added notes. I think those should be resolvable.

I was also wondering where to add this (2.11 vs 2.12). I suspect there is value in getting it in earlier (2.11.3 -- although I really hope to get 2.12.0.rc1 out within a month now), and the risk of issues is probably low.
So we can go with the 2.11 branch as the target.

marcospassos · 2020-09-11T21:43:54Z

Not sure how to tie the cache to the AvroFactory or AvroMapper. This method is transitively used across several classes, including value-objects not related to any of these instances. An example is AvroSchema, instantiated directly in most cases.

I've tracked down all classes using these methods, and the good news is that only two classes use it directly: AvroReaderFactory and AvroWriteContext.

As this is a severe issue for us, I'm going to work on it. I just need some guidance, as it does not seem to fit the current design.

marcospassos · 2020-09-15T12:27:45Z

Hi @cowtowncoder, any comments?

What about using a static WeakHashMap where the class loader and the name cache are the keys and values respectively?
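A minimal sketch of that suggestion follows, under stated assumptions. PerLoaderNameCache is a hypothetical name. WeakHashMap is not thread-safe on its own, so it is wrapped with Collections.synchronizedMap. Each loader's inner map is a ConcurrentHashMap. The inner maps are still unbounded, and the weak-reference caveats raised in the reply still apply.

```java
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: one name cache per ClassLoader, held weakly so that a
// loader that becomes unreachable (and its cache) can be garbage-collected.
final class PerLoaderNameCache {
    // synchronizedMap makes compound operations like computeIfAbsent atomic.
    private static final Map<ClassLoader, Map<String, String>> CACHES =
            Collections.synchronizedMap(new WeakHashMap<>());

    static Map<String, String> forLoader(ClassLoader loader) {
        // Values must not strongly reference their key loader, or it is never collected.
        // Maps of JDK Strings satisfy that; WeakHashMap also accepts a null (bootstrap) key.
        return CACHES.computeIfAbsent(loader, l -> new ConcurrentHashMap<>());
    }
}
```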

cowtowncoder · 2020-09-19T01:41:03Z

@marcospassos If I remember correctly, JDK offerings for weak/soft reference handling are unfortunately weak.
Let me see if I can find a proper place for the cache. This is the problem with static utilities and other things without context: they cannot really have state, including caching.

cowtowncoder · 2020-09-19T02:25:25Z

Ok, yes, this is very tricky. For a moment I thought much/most of it could be moved into AvroSchema, but unfortunately the cursed Avro library type Schema is used instead, so we cannot really stick it in there -- fundamentally all we need is a one-time decision of whether to use one full name or another, so if we could wrap Schema in something consistently (and hold on to references to that wrapper), no shared caching would be needed.
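The wrapper idea can be sketched as follows. ResolvedSchema is a hypothetical class that holds a namespace and name rather than a real Avro Schema, so the example stays self-contained. It records the one-time full-name decision lazily. No static cache is involved: as long as callers keep reusing the same wrapper, the resolver (for example, the class-existence probe) runs once.

```java
import java.util.Objects;
import java.util.function.BiFunction;

// Hypothetical wrapper: resolves the full name once per instance and keeps it.
final class ResolvedSchema {
    private final String namespace; // may be null
    private final String name;
    private final BiFunction<String, String, String> resolver;
    private volatile String fullName; // lazily computed

    ResolvedSchema(String namespace, String name, BiFunction<String, String, String> resolver) {
        this.namespace = namespace;
        this.name = Objects.requireNonNull(name);
        this.resolver = Objects.requireNonNull(resolver);
    }

    String fullName() {
        String result = fullName;
        if (result == null) {
            // Benign race: two threads may both resolve, but they compute the same value.
            result = resolver.apply(namespace, name);
            fullName = result;
        }
        return result;
    }
}
```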

I hope to look more into this, but I concur: this is badly implemented at this point & tricky to uproot. I wish I had realized exactly how nasty this is -- now I understand why the performance consequences are disastrous: failing to find a class can be VERY VERY expensive, esp. on bigger classpaths.

One thing that would be useful would be to see the specific call paths, if the profiler can provide that information. There are quite a few call paths to getTypeId(Schema) and getFullName(Schema), but I suspect they are not equally problematic.

cowtowncoder · 2020-09-19T02:59:55Z

One other thought: I am beginning to think that the "solution" in #195 is wrong -- anything that has to do class existence lookups seems like a Wrong Solution to a problem. So I am not ruling out the possibility of actually reverting that change.
That would not resolve the issue wrt 2.11, so perhaps static caching is the lesser evil, as long as the size is bounded and both key and value are Strings. It needs a data structure that works in a multi-threaded environment, and com.fasterxml.jackson.databind.util.LRUMap works well.

marcospassos · 2020-09-19T20:51:16Z

@cowtowncoder Thank you again for the detailed review and support. I've pushed a commit that uses LRUMap instead of HashMap. I didn't include tests because I don't see how we could test it.

Please let me know if you need anything else before merging this. We are looking forward to seeing it included as part of the next release.

cowtowncoder · 2020-09-19T21:32:14Z

@marcospassos thanks! I think I'll take this for 2.11, hoping things can be cleared up in future. :)

…, done now

cowtowncoder · 2020-09-19T22:16:48Z

@marcospassos I made some changes, mostly just to isolate the cache bit further and use a separate cache key type (if optimization is needed, going all in :-D ). I realize that testing this part is difficult, but it would probably be good if you could at least check that the performance improvement is as expected (my main concern is that I did something silly that keeps the caching from working).
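What a "separate cache key type" can look like is sketched below. NameKey is purely illustrative, not the committed code. It keeps namespace and name as separate fields instead of concatenating them. It treats a null namespace safely in equals and hashCode. It precomputes the hash, since keys are looked up far more often than they are created.

```java
import java.util.Objects;

// Hypothetical composite cache key; null-safe and with a precomputed hash.
final class NameKey {
    private final String namespace; // may be null
    private final String name;
    private final int hash;

    NameKey(String namespace, String name) {
        this.namespace = namespace;
        this.name = Objects.requireNonNull(name);
        this.hash = Objects.hash(namespace, name);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof NameKey)) {
            return false;
        }
        NameKey other = (NameKey) o;
        return Objects.equals(namespace, other.namespace) && name.equals(other.name);
    }

    @Override
    public int hashCode() {
        return hash;
    }
}
```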

marcospassos · 2020-09-20T13:17:43Z

LGTM. We're going to monitor the IO metrics after the next release. I'll keep you posted.

baharclerode · 2020-09-20T15:35:09Z

anything that has to do class existence lookups seems like a Wrong Solution to a problem.

While I am in 100% agreement with this statement, Avro defined this behavior for matching nested classes in their reference implementation. :(

https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/specific/SpecificData.java#L245-L262

It's a choice between not being compatible with nested classes when using an Apache 1.9 schema, or doing class existence lookups.

... and it looks like Apache also handled this problem by just caching the Schema -> resolved full name as well :/

baharclerode · 2020-09-20T15:47:33Z

The "Right" Solution would be to emit a java-class (AvroSchemaHelper.AVRO_SCHEMA_PROP_CLASS) schema property on record schemas, and check/use that before doing a class lookup (Or only emitting that for nested classes, and assuming if it doesn't exist and the namespace doesn't exist, then it's a normal name, avoiding a class lookup in all cases); If you're in an ecosystem where you're using Jackson to generate the schemas and deserialize the records, that would avoid the class lookups altogether.
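A minimal sketch of that lookup order follows. ClassNameHint is a hypothetical name. Schema properties are modeled as a plain Map for self-containment; with a real Avro Schema, this would presumably go through schema.getProp(...). The property name "java-class" is the one given for AvroSchemaHelper.AVRO_SCHEMA_PROP_CLASS. When the hint is present, the class loader is never touched. Otherwise the existing (cached) lookup is the fallback.

```java
import java.util.Map;
import java.util.function.BiFunction;

// Hypothetical sketch: prefer an explicit "java-class" hint, else fall back.
final class ClassNameHint {
    static final String JAVA_CLASS_PROP = "java-class";

    static String resolve(Map<String, String> schemaProps, String namespace, String name,
                          BiFunction<String, String, String> fallbackLookup) {
        String hinted = schemaProps.get(JAVA_CLASS_PROP);
        if (hinted != null) {
            return hinted; // no class existence check needed
        }
        // Schemas without the hint (e.g. from other producers) still need the lookup.
        return fallbackLookup.apply(namespace, name);
    }
}
```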

baharclerode · 2020-09-20T15:49:33Z

avro/src/main/java/com/fasterxml/jackson/dataformat/avro/schema/AvroSchemaHelper.java

+            case FIXED:
+                String namespace = schema.getNamespace();
+                String name = schema.getName();
+                String key = namespace + "." + name;


namespace might be null here (not when the schema is generated from a Java class, but it can be with hand-crafted schemas or schemas generated from avdl files)

Just double checked, schema.getFullName() should be suitable as a cache key here without any additional checks.
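Here is why the null case matters for the key. Java string concatenation does not throw on null. It renders null as the text "null", so namespace + "." + name silently yields "null.Foo". The namespace-aware version returns just the simple name when there is no namespace. As we understand it, that matches what Avro's Schema.getFullName() returns for named schemas. FullNames is a hypothetical helper.

```java
// Hypothetical helper contrasting a naive key with a namespace-aware full name.
final class FullNames {
    // Naive: a null namespace becomes the literal text "null".
    static String naiveKey(String namespace, String name) {
        return namespace + "." + name;
    }

    // Namespace-aware: no namespace means the full name is the simple name.
    static String fullName(String namespace, String name) {
        return (namespace == null || namespace.isEmpty()) ? name : namespace + "." + name;
    }
}
```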

Hi there @baharclerode! Thank you for the additional comments -- and yes, I realize that Avro's choice of "interesting" ways of dealing with stuff sort of forces our hand here.
I would be happy to merge a PR that removes the helper class as the key and uses schema.getFullName() instead, if that works?

One other related question: if we had a setting to select between "1.8-and-earlier" vs "1.9-and-later" modes, would it be possible to use different logic, avoiding Class lookup or at least alleviate it?
Or does the problem exist regardless of this, with 1.9 and above?
(I realize that right now lack of context makes it impossible to have such a setting, but I still hope we can make AvroSchemaHelper become non-static and/or take some context object we control which could pass configuration, and ideally contain per-mapper (or per-schema or whatever) cache).

marcospassos · 2020-09-20T15:51:32Z

The "Right" Solution would be to emit a java-class (AvroSchemaHelper.AVRO_SCHEMA_PROP_CLASS) schema property on record schemas, and check/use that before doing a class lookup (Or only emitting that for nested classes, and assuming if it doesn't exist and the namespace doesn't exist, then it's a normal name, avoiding a class lookup in all cases); If you're in an ecosystem where you're using Jackson to generate the schemas and deserialize the records, that would avoid the class lookups altogether.

But it would introduce a non-standard attribute that could eventually break if the Avro parser stops recognizing it, or am I missing something?

baharclerode · 2020-09-20T16:05:53Z

non-standard attribute

It's a standard attribute used everywhere except record/fixed/enum schemas (maps, arrays, large numbers, etc.), for some reason. The Apache parser wouldn't recognize it at all on records, so it'd just ignore it and fall back to class lookups if you're using Apache to deserialize avro records.

https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/specific/SpecificData.java#L73-L75

If Jackson emitted the property on every record/enum/fixed schema, you could maintain compatibility with deserializing all Apache 1.8 and 1.9 schemas, while optimizing the use case where Jackson is used for both schema generation and deserialization, at the cost of bloating the schema size a bit. Every other combination would require fallbacks to class lookups (so we'd still need the cache).

The benefit is that it creates a potential path forward (if we could get Apache to adopt the same behavior) back to a world where the general case doesn't require class existence checks (they would become a special case when deserializing from an Apache 1.9 schema).

marcospassos · 2020-09-20T16:10:25Z

In our case, we don't use schema generation, but I understand that it should work in the same way provided that we include the java-class attribute in the Avro schema, right? If so, indeed, it does seem to be the right path.

baharclerode · 2020-09-20T16:20:02Z

That would be the idea in such a hypothetical scenario. You could retroactively generate the property for any 1.8 schemas (or 1.9+ schemas if you're 100% sure there are no nested classes in use) and update the stored schemas for your data to add the property to bypass class existence checks, without breaking compatibility with anything. Which is still a big mess, but it's a workable mess for someone that has stored historical data they can't throw out.

baharclerode · 2020-09-20T16:22:44Z

The other option is to look into adding some compatibility feature flags to the AvroMapper whereby you could disable 1.9+ support to avoid the class existence checks if you're not using 1.9+ schemas.

cowtowncoder · 2020-09-21T22:06:41Z

If I understand the idea correctly, I like it: additional metadata to avoid lookups in best case; but fallbacks for other cases.

Cache record names to avoid hitting class loader

887f2db

cowtowncoder reviewed Sep 11, 2020


cowtowncoder mentioned this pull request Sep 19, 2020

[Avro] Remove dependencies upon Jackson 1.X and Avro's JacksonUtils #195

Merged

Use LRUCache instead of HashMap

db1b79d

cowtowncoder merged commit 80b8a19 into FasterXML:2.11 Sep 19, 2020

marcospassos deleted the record-name-caching branch September 19, 2020 21:33

cowtowncoder added this to the 2.11.3 milestone Sep 19, 2020

cowtowncoder added a commit that referenced this pull request Sep 19, 2020

Add release notes wrt #219, minor reorg

abe5d1d

cowtowncoder added a commit that referenced this pull request Sep 19, 2020

rewrote parts of #219 (use separate cache key; isolate cache further)…

51fa201

…, done now

baharclerode reviewed Sep 20, 2020
