Intern the String value #726

Open
1zg12 opened this issue Nov 17, 2021 · 2 comments
Labels
pr-needed Feature request for which PR likely needed (no active development but idea is workable)

Comments

@1zg12

1zg12 commented Nov 17, 2021

Version

2.12.4

Feature request

Existing scenario

I think the INTERN_FIELD_NAMES flag, which is turned on by default, is a great thing to have. It greatly reduces the memory footprint when the message volume is huge: imagine ~3.2 million position messages, all with the same keys such as 'portfolio' and 'book'.
Instead of holding ~3.2 million copies of the string portfolio on the heap while parsing each batch of messages, INTERN_FIELD_NAMES results in a single portfolio instance in the string pool, regardless of whether there are ~3.2 million messages or even more.

Changes proposed

A similar feature flag could be provided for parsed String values, possibly turned on by default as well.
Going back to the ~3.2 million record example: instead of having ~3.2 million portfolio-name strings on the heap, such a flag would leave only ~200 distinct portfolio names (like Jason, Jackson) in the string pool.
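
The effect described above can be illustrated with a minimal sketch in plain Java, without Jackson. The class and method names (`ValueInternDemo`, `simulateParse`, `distinctInstances`) are made up for this illustration, and the record count is scaled down from ~3.2 million to keep it quick to run. It only assumes that a parser materializes each value as a fresh String from a character buffer.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical demo class; not part of jackson-core.
final class ValueInternDemo {

    // Counts distinct String instances by identity (==), not by equals().
    static int distinctInstances(String[] values) {
        Map<String, Boolean> seen = new IdentityHashMap<>();
        for (String v : values) {
            seen.put(v, Boolean.TRUE);
        }
        return seen.size();
    }

    // Simulates parsing `records` messages whose portfolio value cycles
    // through `portfolios` distinct names.
    static String[] simulateParse(int records, int portfolios, boolean intern) {
        String[] out = new String[records];
        for (int i = 0; i < records; i++) {
            // new String(char[]) mimics a parser copying a value out of its buffer.
            String s = new String(("PM_" + (i % portfolios)).toCharArray());
            out[i] = intern ? s.intern() : s;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(distinctInstances(simulateParse(10_000, 200, false)));
        System.out.println(distinctInstances(simulateParse(10_000, 200, true)));
    }
}
```

With 10,000 records and 200 names, the first line printed is 10000 (one instance per record) and the second is 200 (one instance per distinct name).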

Possible changes

The method below could take the feature flag into account and intern the value when the flag is on:
https://github.com/FasterXML/jackson-core/blob/2.14/src/main/java/com/fasterxml/jackson/core/util/TextBuffer.java#L797

    public String setCurrentAndReturn(int len) {
        _currentSize = len;
        // We can simplify handling here compared to full `contentsAsString()`:
        if (_segmentSize > 0) { // longer text; call main method
            return contentsAsString();
        }
        // more common case: single segment
        int currLen = _currentSize;
        String str = (currLen == 0) ? "" : new String(_currentSegment, 0, currLen);
        // Proposed addition: INTERN_FIELD_VALUES and _flags do not exist yet
        if (JsonFactory.Feature.INTERN_FIELD_VALUES.enabledIn(_flags)) {
            str = InternCache.instance.intern(str);
        }
        _resultString = str;
        return str;
    }
@1zg12
Author

1zg12 commented Nov 17, 2021

To add some context, here is a peek at the memory addresses of the String key and value with the existing implementation:

18:15:58.102 [main] INFO  c.l.zg.TestConsumerString - portfolio:PM_JASON@: 0x6c37d99a8:: 0x6c9fe73e0
18:15:58.102 [main] INFO  c.l.zg.TestConsumerString - portfolio:PM_JASON@: 0x6c37d99a8:: 0x6c9fef010
18:15:58.102 [main] INFO  c.l.zg.TestConsumerString - portfolio:PM_JASON@: 0x6c37d99a8:: 0x6c9ffbe00
18:15:58.103 [main] INFO  c.l.zg.TestConsumerString - portfolio:PM_JASON@: 0x6c37d99a8:: 0x6ca003ad0
18:15:58.103 [main] INFO  c.l.zg.TestConsumerString - portfolio:PM_JASON@: 0x6c37d99a8:: 0x6ca010fc0
18:15:58.103 [main] INFO  c.l.zg.TestConsumerString - portfolio:PM_JASON@: 0x6c37d99a8:: 0x6ca018d90
18:15:58.103 [main] INFO  c.l.zg.TestConsumerString - portfolio:PM_JASON@: 0x6c37d99a8:: 0x6ca025910
18:15:58.103 [main] INFO  c.l.zg.TestConsumerString - portfolio:PM_JASON@: 0x6c37d99a8:: 0x6ca02d4f0
18:15:58.103 [main] INFO  c.l.zg.TestConsumerString - portfolio:PM_JASON@: 0x6c37d99a8:: 0x6ca037e88
18:15:58.103 [main] INFO  c.l.zg.TestConsumerString - portfolio:PM_JASON@: 0x6c37d99a8:: 0x6ca03f800
18:15:58.103 [main] INFO  c.l.zg.TestConsumerString - portfolio:PM_JASON@: 0x6c37d99a8:: 0x6ca04a250
18:15:58.104 [main] INFO  c.l.zg.TestConsumerString - portfolio:PM_JASON@: 0x6c37d99a8:: 0x6ca051ce8
18:15:58.104 [main] INFO  c.l.zg.TestConsumerString - portfolio:PM_JASON@: 0x6c37d99a8:: 0x6ca05d0d8
18:15:58.104 [main] INFO  c.l.zg.TestConsumerString - portfolio:PM_JASON@: 0x6c37d99a8:: 0x6ca0649b8

The keys all point to the same string in the pool, while the values, even though they are equal, are created anew on the heap for each record encountered.

@cowtowncoder
Member

I am open to this idea, but it probably requires some sort of handler to allow customizing which String values are intern()ed and/or how canonicalization is handled (possibly using other mechanisms). Most likely there would be a limit on the length of Strings to intern().
I am not sure what kind of interface should be used; PRs welcome.
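
One possible shape for such a handler is sketched below. It is only an illustration: `ValueCanonicalizer` and `BoundedCanonicalizer` are hypothetical names, nothing like them exists in jackson-core, and the sketch swaps `String.intern()` for a `ConcurrentHashMap` as one example of an "other mechanism". It applies a length limit and a soft cap on the number of cached entries.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical hook deciding which String values get canonicalized.
interface ValueCanonicalizer {
    String canonicalize(String value);
}

// Hypothetical implementation: skips long values, caches at most ~maxEntries.
final class BoundedCanonicalizer implements ValueCanonicalizer {
    private final int maxLength;
    private final int maxEntries;
    private final ConcurrentHashMap<String, String> cache = new ConcurrentHashMap<>();

    BoundedCanonicalizer(int maxLength, int maxEntries) {
        this.maxLength = maxLength;
        this.maxEntries = maxEntries;
    }

    @Override
    public String canonicalize(String value) {
        // Long values are unlikely to repeat; return them untouched.
        if (value == null || value.length() > maxLength) {
            return value;
        }
        String existing = cache.get(value);
        if (existing != null) {
            return existing;
        }
        // Soft bound: once full, new values pass through uncached.
        if (cache.size() >= maxEntries) {
            return value;
        }
        String previous = cache.putIfAbsent(value, value);
        return previous != null ? previous : value;
    }
}
```

A parser could call `canonicalize` at the point where the proposed flag check sits, for example `new BoundedCanonicalizer(32, 1000).canonicalize(str)`. Unlike `String.intern()`, the cache lives only as long as the canonicalizer instance, so its lifetime and size stay under the caller's control.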

@cowtowncoder cowtowncoder added the pr-needed Feature request for which PR likely needed (no active development but idea is workable) label Jun 20, 2022