Referencing of unbound nodes may be incorrect. #94
What an interesting coincidence! For the last couple of days I've been thinking about possible ways of not re-creating a node object on every occasion. It may not be identical to this case, but it is related. Consider the following. I'm trying to develop a framework for dealing with the ALTO standard. A partial top-level layout for it looks like: <alto>
<Styles>
<TextStyle>...</TextStyle>
</Styles>
<Layout>...</Layout>
</alto> Elements within Having the But this solution is partial because the other side of the coin would be Anyway, solving this would be very beneficial to the memory footprint of any LibXML-based code, and very likely to its performance too, as less object construction and less memory fragmentation would be involved. What do you think? |
I've backed out adding a The problem is the document struct can be updated via the DOM. Nodes can even be moved between documents. Still considering this. |
What happens to |
No, that doesn't change, and it is globally unique. It's simply the hex string of the raw node's address. |
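As a rough illustration of an address-derived key, here is a minimal Python sketch. It is not the binding's code: `RawNode` and `unique_key` are hypothetical names, and it relies on the CPython detail that `id()` returns an object's memory address.

```python
class RawNode:
    """Hypothetical stand-in for a raw node struct."""


def unique_key(node):
    # In CPython, id() is the object's memory address, so this mirrors
    # "the hex string of the raw node's address".
    return format(id(node), "x")


a = RawNode()
b = RawNode()
key_a = unique_key(a)
key_b = unique_key(b)
```

The key stays the same for as long as the object is alive, and two objects that are alive at the same time never share a key. An address may be reused once an object has been freed, which is the weak point of any address-based key.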
At least while a reference is being kept the object |
It is apparent that DOM mutation makes things problematic, to say the least. But for the read-only case there is a chance. My all-time wish was to have the result of boxing somehow bound to the underlying node structure so that any subsequent There is another case where something like Just to be clear, I'm not asking for a solution. I'm simply sharing my thoughts, so that they might result in something helpful later. Unfortunately, the second case is currently out of the question anyway because pulling the |
Working with ALTO turned out to be very slow. Being curious, I decided to intercept The file size is ~48K. I don't know how many actual tags it contains, but judging by the count of unique keys, about 930-950 (some are likely never pulled in by the code). A sample section of the file looks like this: <TextBlock ID="p1_b8" HPOS="157.889" VPOS="77.1160" HEIGHT="8.3520" WIDTH="216.612">
<TextLine WIDTH="216.612" HEIGHT="8.3520" ID="p1_t10" HPOS="157.889" VPOS="77.1160">
<String ID="p1_w18" CONTENT="(Please" HPOS="157.889" VPOS="77.1160" WIDTH="30.5010"
HEIGHT="8.3520" STYLEREFS="font4"/>
<SP WIDTH="2.5151" VPOS="77.1160" HPOS="188.390"/>
<String ID="p1_w19" CONTENT="reference" HPOS="190.905" VPOS="77.1160" WIDTH="38.0070"
HEIGHT="8.3520" STYLEREFS="font4"/>
<SP WIDTH="2.5150" VPOS="77.1160" HPOS="228.912"/>
<String ID="p1_w20" CONTENT="#1234567" HPOS="231.427" VPOS="77.1160" WIDTH="45.0360"
HEIGHT="8.3520" STYLEREFS="font1"/>
<SP WIDTH="2.5128" VPOS="77.1160" HPOS="276.463"/>
<String ID="p1_w21" CONTENT="when" HPOS="278.976" VPOS="77.1160" WIDTH="21.5100" HEIGHT="8.3520"
STYLEREFS="font4"/>
<SP WIDTH="2.5061" VPOS="77.1160" HPOS="300.486"/>
<String ID="p1_w22" CONTENT="making" HPOS="302.992" VPOS="77.1160" WIDTH="29.0070"
HEIGHT="8.3520" STYLEREFS="font4"/>
<SP WIDTH="2.5062" VPOS="77.1160" HPOS="331.999"/>
<String ID="p1_w23" CONTENT="payment.)" HPOS="334.506" VPOS="77.1160" WIDTH="39.9960"
HEIGHT="8.3520" STYLEREFS="font4"/>
</TextLine>
</TextBlock> Considering that many allocations are followed by a couple of new ones, since values from XML attributes are pulled into my classes, it's no wonder it all takes ages to complete a single run. My next step is to try backing the boxing with a cache of some kind. That shouldn't be a problem since everything is done in read-only mode, with a few hand-made elements which are not planted back into the original DOM.
|
That ties in with what I've seen when benchmarking. LibXML is faster than XML for parsing and XPath queries. But XML is faster for repeated DOM access, because LibXML recreates the objects every time. Caching and a read-only mode sound like a good idea. |
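The caching idea can be sketched in a language-neutral way. The Python example below uses the standard library's `xml.etree.ElementTree` rather than LibXML, and `Box`, `box()` and the trimmed ALTO fragment are hypothetical names and data introduced only for illustration. It assumes read-only access, so a cached wrapper never goes stale.

```python
import weakref
import xml.etree.ElementTree as ET

# Trimmed fragment modelled on the ALTO sample in this thread.
ALTO = """<TextLine ID="p1_t10">
  <String ID="p1_w18" CONTENT="(Please" WIDTH="30.5010"/>
  <SP WIDTH="2.5151"/>
  <String ID="p1_w19" CONTENT="reference" WIDTH="38.0070"/>
</TextLine>"""


class Box:
    """Hypothetical wrapper ("boxed" object) around a raw element."""

    created = 0  # counts constructions so the effect of caching is visible

    def __init__(self, raw):
        self.raw = raw  # strong reference keeps the raw element alive
        self.width = float(raw.get("WIDTH", "0"))
        Box.created += 1


# Maps id(raw element) -> Box. An entry disappears once nothing else
# references its Box, so the cache itself never keeps wrappers alive.
_cache = weakref.WeakValueDictionary()


def box(raw):
    """Return the cached wrapper for `raw`, creating it only on first use."""
    key = id(raw)
    obj = _cache.get(key)
    if obj is None or obj.raw is not raw:  # guard against a reused id
        obj = Box(raw)
        _cache[key] = obj
    return obj


line = ET.fromstring(ALTO)
strings = line.findall("String")
first = box(strings[0])
again = box(strings[0])  # served from the cache, no new Box is built
```

Using weak references for the cache values means a wrapper lives only as long as user code holds it, which matches the "while a reference is being kept" condition raised earlier in the thread.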
I'm trying to implement caching locally with, well, mixed results so far. But after thinking it over for a while I came upon an idea. I'm not sure it is feasible to implement, but it is worth voicing. What is the biggest problem of I just suppose that it couldn't be done without changing the underlying C |
There's already a Yes, this could hold a sequential id that's incremented globally, say a 64-bit unsigned integer, and could be fetched by the bindings. This would lead to a better implementation of |
This shouldn't collide, except for a very, very long running process. Also provide uid method to get the raw value as a 64 bit unsigned integer.
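To put "a very, very long running process" into numbers, here is a small worked calculation. The allocation rate of one billion ids per second is an assumption chosen as a generous upper bound, not a measured figure.

```python
# How long until a 64-bit unsigned counter wraps around?
IDS_AVAILABLE = 2 ** 64               # distinct values of a 64-bit unsigned integer
IDS_PER_SECOND = 1_000_000_000        # assumed (very high) allocation rate
SECONDS_PER_YEAR = 365.25 * 24 * 3600

years_until_wrap = IDS_AVAILABLE / IDS_PER_SECOND / SECONDS_PER_YEAR
print(f"{years_until_wrap:.0f} years")
```

Even at that rate the counter lasts for several centuries, so wrap-around is not a practical concern.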
See ea5c163 which reimplements |
I'm currently investigating a caching weirdness where object counter installed via A quick glance into the commit tells me that C code is not thread safe, counter increment isn't atomic. Either a mutex is needed or |
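The mutex option can be illustrated with a short Python sketch. This is not the C code from the commit; `UidSource` and `worker` are hypothetical names, and the 64-bit mask only mimics the wrap-around of an unsigned C integer.

```python
import threading

MASK_64 = 0xFFFFFFFFFFFFFFFF


class UidSource:
    """Hypothetical global id source guarded by a mutex."""

    def __init__(self):
        self._lock = threading.Lock()
        self._next = 1

    def take(self):
        # Read and increment happen under the lock, so two threads can
        # never be handed the same id.
        with self._lock:
            uid = self._next
            self._next = (self._next + 1) & MASK_64
            return uid


source = UidSource()
results = [[] for _ in range(8)]


def worker(bucket):
    for _ in range(1000):
        bucket.append(source.take())


threads = [threading.Thread(target=worker, args=(b,)) for b in results]
for t in threads:
    t.start()
for t in threads:
    t.join()

all_ids = [uid for bucket in results for uid in bucket]
```

Every id handed out is distinct, no matter how the threads interleave. In C the same guarantee could come from a mutex around the increment or from an atomic fetch-and-add.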
The increment is in code protected by a Just realised another problem with that last commit: the This can happen if Raku references to a node are destroyed, then the xml6_ref struct is destroyed and I might revert the last commit and do any further work on a branch. |
I don't plan to rely on this new change anyway. In the meantime, I hope you can help me to better understand the relations between Can you help me to clear this up? The key problem is that I cannot return a cached object from |
I'm not sure, except I'd also expect For example, I'd assume the call to |
I like the idea. In this case, whatever goes wrong during object creation wouldn't result in memory leaks due to a wrong reference count. There is one more reason why implementing the cache via BTW, unfortunately, whatever I currently do I do in a quick and hacky way applicable to my local project only, because I need it, like, yesterday. Hopefully, I'll be able to share whatever useful bits I learn along the way. For example, I have just realized that most calls to |
Any config supplied by upstream was previously ignored and class mapping was expected to be done by `LibXML::Node`. This resulted in limited abilities of user code in controlling how boxing is done. See libxml-raku#94. Note that `LibXML::Node` still does its own class mapping because this is the point of final decision and in many cases it is invoked directly.
With PR #97 the total number of allocations fell from 22k to 10k for my code. The next problem I've just encountered is iterators. Hopefully I can quickly make them do about the same because, for example, |
Move the call to $.raw.set-flags to a TWEAK submethod as it needs to be called after $.raw.Reference has been called.
I suspect unbound nodes can have their owner document destroyed, e.g.
The problem (I think) is that there's nothing currently keeping the document alive if it's not explicitly referenced.
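The suspected failure mode can be modelled with a small Python sketch. It is an analogy only and does not reproduce LibXML's internals; `Document`, `WeakNode`, `StrongNode` and `make_unbound` are hypothetical names. One node type refers to its owner document weakly (nothing keeps the document alive), the other holds a real reference to it.

```python
import gc
import weakref


class Document:
    """Hypothetical owner document."""

    def __init__(self, name):
        self.name = name


class WeakNode:
    """Node that does not keep its owner document alive."""

    def __init__(self, doc):
        self._doc = weakref.ref(doc)

    @property
    def owner(self):
        return self._doc()  # None once the document has been destroyed


class StrongNode:
    """Node that holds a reference to its owner document."""

    def __init__(self, doc):
        self.owner = doc


def make_unbound(node_cls):
    # The document is referenced only by this local variable, so it goes
    # out of scope as soon as the function returns.
    doc = Document("scratch")
    return node_cls(doc)


weak = make_unbound(WeakNode)
strong = make_unbound(StrongNode)
gc.collect()  # make sure any unreachable document has been reclaimed
```

After the call, `weak.owner` is gone while `strong.owner` is still usable, which suggests that a node should take a reference on its owner document for as long as the node itself is referenced.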