memory leak #114

Open
bo-tato opened this issue Nov 16, 2024 · 6 comments

Comments

@bo-tato

bo-tato commented Nov 16, 2024

With the following short script to parse a html page and do an xpath search, memory usage will continually climb until it's killed for out of memory: https://gist.github.com/bo-tato/fe8194f53be5061a43af5264a8ec3f66

@bo-tato
Author

bo-tato commented Nov 17, 2024

seems it's in findnodes, just parsing the document doesn't make memory climb so fast. i'm trying to track it down with valgrind

@dwarring
Contributor

dwarring commented Nov 18, 2024

After building/compiling with: raku Build.pm6; make clean debug and adding to gist:

use LibXML::Raw;
my $total-objects = LibXML::Raw::ref-total() || 1;
my $lost-objects = LibXML::Raw::ref-current();
my $lost-percent = 100 * $lost-objects / $total-objects;
say "total objects: $total-objects, lost: $lost-objects ({$lost-percent.fmt: '%.2f'}%)";

Output:

done in: 478.401732663 seconds
total objects: 195000, lost: 90006 (46.16%)

So libxml2 objects are not getting released, which is supposed to happen when Raku objects get destroyed (e.g. the LibXML::Element / LibXML::Node DESTROY method)

@dwarring
Contributor

dwarring commented Nov 18, 2024

I added a count of calls to LibXML::Node TWEAK and DESTROY methods.

david@pc:~/git/LibXML-raku$ git diff lib/LibXML/Node.rakumod
diff --git a/lib/LibXML/Node.rakumod b/lib/LibXML/Node.rakumod
index 2abbb83..a0ab892 100644
--- a/lib/LibXML/Node.rakumod
+++ b/lib/LibXML/Node.rakumod
@@ -159,11 +159,15 @@ method native is DEPRECATED<raw> { self.raw }
 method domFailure { $.raw.domFailure.Str }
 method string-value { $.raw.string-value.Str }
 
+our $count = 0;
+
 submethod TWEAK {
+    $count++;
     $!raw.Reference;
 }
 
 submethod DESTROY {
+    $count--;
     $!raw.Unreference;
 }

And added this to the script: say "Undestroyed Raku objects: " ~ $LibXML::Node::count;
Output from a run of ^15000: Undestroyed Raku objects: 25688

So the issue seems to be on the Raku side. DESTROY isn't being called for these objects, presumably because the Raku objects aren't being garbage collected.
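
For comparison, the same TWEAK/DESTROY counting can be tried on a plain Raku class with no native code involved. This is only a sketch: the `Tracked` class and `$live` counter are hypothetical names, and `VM.request-garbage-collection` is a Rakudo-specific hint (assumed to be Rakudo 2020.05 or later), not a guarantee that every DESTROY has run by the time the count is printed.

```raku
# Hypothetical stand-in for LibXML::Node: counts constructed vs destroyed objects.
class Tracked {
    our $live = 0;                 # package-scoped counter, visible as $Tracked::live
    submethod TWEAK   { $live++ }  # runs once per constructed object
    submethod DESTROY { $live-- }  # runs only when the GC reclaims the object
}

# Create many short-lived objects that nothing holds on to.
for ^10_000 { Tracked.new }

# Ask the VM to collect; DESTROY timing is still up to the runtime.
VM.request-garbage-collection;

say "Undestroyed Raku objects: " ~ $Tracked::live;
```

If the count here drops towards zero while the LibXML::Node count keeps climbing, that would suggest something is still holding references to the node objects rather than the GC simply not having run yet.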

@dwarring
Contributor

I think this is similar to #85. Both are parsing a document, then performing an XPath query.

@dwarring
Contributor

dwarring commented Nov 19, 2024

I'm eyeballing 92e3792 again, which somehow triggered the earlier leak.

Edit: one thing in particular to be careful of is accidental closures in native callbacks. These cause memory leaks.

@dwarring
Contributor

Even a simple first() leaks:

use HTTP::UserAgent;
use LibXML::Document;

my HTTP::UserAgent $ua .= new;

my $html = $ua.get('https://nokogiri.org/tutorials/installing_nokogiri.html').content;

my $start-time = now;
for ^15000 {
    my LibXML::Document $doc .=  parse: :html, :recover, :suppress-errors, :string($html);
    $doc.first('//article//h2');
}

$*ERR.say: "done in: {now - $start-time} seconds";
