Memory usage while validating #17

Open
fabianem opened this issue Dec 20, 2023 · 5 comments

Comments

@fabianem

I am facing an issue with huge memory usage while validating large XML files (about 90 MB).
I might be doing something wrong, but there seems to be a memory leak: the used memory grows after every file and is never released until I restart the server.

My setup is an HTTP server with a handler that validates an uploaded XML file against a provided XSD.

My main func looks something like this:

func main() {
  // init xsdvalidate
  // xsdvalidate.InitWithGc(2 * time.Minute) // didn't help
  err := xsdvalidate.Init()
  if err != nil {
    log.WithError(err).Fatalf("could not init xsdvalidate")
  }
  // register Cleanup only after Init has succeeded
  defer xsdvalidate.Cleanup()
  ...
}

and inside my handler I am doing something like this:

func (it *service) handler(file io.ReadCloser, filesize int64) error {
  defer file.Close()
  
  xsdHandler, err := xsdvalidate.NewXsdHandlerMem(it.meteringObjectsXSD, xsdvalidate.ParsErrDefault) // it.meteringObjectsXSD is a small XSD file (3KB) read into a byte slice
  if err != nil {
    return err
  }
  // defer Free only once we know the handler was created
  defer xsdHandler.Free()

  fileContent := make([]byte, filesize)
  _, err = io.ReadFull(file, fileContent)
  if err != nil {
    return err
  }

  err = xsdHandler.ValidateMem(fileContent, xsdvalidate.ParsErrDefault)
  if err != nil {
    return err
  }
  ...
}

(I also tried creating the XSD handler only once inside main and injecting it into my service but that didn't help either)

When uploading a 90MB XML file and validating it with the XSD the memory usage looks like this:
image

After uploading it again the memory usage grows to almost twice the size:
image

Unfortunately, the memory never frees up.

Now when using the same service again but commenting out the validation part: err = xsdHandler.ValidateMem(fileContent, xsdvalidate.ParsErrDefault) the memory usage looks like this

After the first time uploading the 90MB XML file:
image

After the second time uploading the file:
image

The file was still read into memory but after the GC kicks in it will be all freed up.
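
The Go side of that observation can be reproduced in isolation. The following is a minimal sketch, not code from the service: the names `sink`, `heapAlloc` and `allocAndRelease` are made up for illustration. It allocates a buffer on the Go heap, drops the reference, forces a GC cycle, and compares `HeapAlloc` before and after. It says nothing about memory allocated by C code behind cgo, which the Go garbage collector does not manage.

```go
package main

import (
	"fmt"
	"runtime"
)

// sink keeps the buffer reachable so the allocation really lands on the Go heap.
var sink []byte

// heapAlloc returns the bytes of heap objects the Go runtime currently accounts for.
func heapAlloc() uint64 {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return m.HeapAlloc
}

// allocAndRelease allocates size bytes, records HeapAlloc while the buffer is
// still referenced, then drops the reference, forces a GC cycle and records
// HeapAlloc again.
func allocAndRelease(size int) (live, afterGC uint64) {
	sink = make([]byte, size)
	live = heapAlloc()
	sink = nil
	runtime.GC()
	afterGC = heapAlloc()
	return live, afterGC
}

func main() {
	// 90 MiB is roughly the size of the uploaded file in this report.
	live, afterGC := allocAndRelease(90 << 20)
	fmt.Printf("HeapAlloc with buffer: %d MiB, after GC: %d MiB\n", live>>20, afterGC>>20)
}
```

If the second number drops back while the process size reported by the OS stays high, the remaining memory is either Go heap not yet returned to the OS or memory held outside the Go heap.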

Any idea why the memory usage is so high and why it's only growing and never being freed up?

@terminalstatic
Owner

Unfortunately I'm a little rusty when it comes to systems programming; I've been drifting toward full stack lately, so YMMV.
As far as my experience goes, the output of ps is usually somewhat misleading.
Here is an excerpt from Stack Overflow on this, which pretty much reflects what one usually sees:

With ps or similar tools you will only get the amount of memory pages allocated by that process. This number is correct, but:
does not reflect the actual amount of memory used by the application, only the amount of memory reserved for it
can be misleading if pages are shared, for example by several threads or by using dynamically linked libraries

If you really want to know what amount of memory your application actually uses, you need to run it within a profiler. For example, Valgrind can give you insights about the amount of memory used, and, more importantly, about possible memory leaks in your program. The heap profiler tool of Valgrind is called 'massif':

So to see what's actually going on, you can use valgrind (which I only use to profile the C part, because with Go's GC the output is not accurate); to profile the Go application itself, I'd recommend Go's pprof.
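
As a rough sketch of the pprof route, assuming the standard library's `runtime/pprof` package is used directly: the helper name `heapProfile` and the output file name `heap.pprof` are made up for illustration and are not part of this library. Note that a Go heap profile only covers allocations made by the Go runtime; memory allocated by libxml2 through cgo will not appear in it.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"runtime"
	"runtime/pprof"
)

// heapProfile returns a heap profile of the current process in pprof format.
func heapProfile() ([]byte, error) {
	// Run a GC first so the profile reflects up-to-date statistics.
	runtime.GC()
	var buf bytes.Buffer
	if err := pprof.WriteHeapProfile(&buf); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	data, err := heapProfile()
	if err != nil {
		fmt.Fprintln(os.Stderr, "could not create heap profile:", err)
		os.Exit(1)
	}
	if err := os.WriteFile("heap.pprof", data, 0o644); err != nil {
		fmt.Fprintln(os.Stderr, "could not write heap profile:", err)
		os.Exit(1)
	}
	fmt.Println("wrote heap.pprof; inspect it with: go tool pprof heap.pprof")
}
```

In a long-running server, `heapProfile` could be called after each validated upload so that successive profiles can be compared.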

@fabianem
Author

fabianem commented Dec 21, 2023

So I tried analyzing it with valgrind --tool=massif and also heaptrack, but neither shows anything significant in memory usage.
I have been using Go's pprof, but the Go part also seemed to be fine.
But there is definitely something going on because the service (with a memory limit of 4GB) is getting OOMKilled after 3 file uploads.
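
One way to narrow this down is to put the process's resident set size next to what the Go runtime reports. The following is a minimal, Linux-only sketch that assumes `/proc/self/status` is available; the helper name `parseVmRSSKiB` is made up for illustration. If VmRSS is far larger than the runtime's `Sys` figure, the extra memory is being held outside the Go heap, for example by C code called through cgo.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"runtime"
	"strconv"
	"strings"
)

// parseVmRSSKiB extracts the VmRSS value (in KiB) from the text of
// /proc/<pid>/status. It returns false if the field is missing or malformed.
func parseVmRSSKiB(status string) (int64, bool) {
	sc := bufio.NewScanner(strings.NewReader(status))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		// Expected shape: "VmRSS:    123456 kB"
		if len(fields) >= 2 && fields[0] == "VmRSS:" {
			n, err := strconv.ParseInt(fields[1], 10, 64)
			return n, err == nil
		}
	}
	return 0, false
}

func main() {
	raw, err := os.ReadFile("/proc/self/status")
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot read /proc/self/status (Linux only):", err)
		return
	}
	rssKiB, ok := parseVmRSSKiB(string(raw))
	if !ok {
		fmt.Fprintln(os.Stderr, "VmRSS not found")
		return
	}

	var m runtime.MemStats
	runtime.ReadMemStats(&m)

	fmt.Printf("process VmRSS:       %d KiB\n", rssKiB)
	fmt.Printf("Go runtime Sys:      %d KiB\n", m.Sys/1024)
	fmt.Printf("Go heap in use:      %d KiB\n", m.HeapInuse/1024)
	fmt.Printf("Go heap released:    %d KiB\n", m.HeapReleased/1024)
}
```

Logging these numbers after every upload would show whether the growth tracks the Go heap or stays invisible to the Go runtime.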

Locally I am now investigating with smem to get more detailed metrics and here the process shows again huge memory usage - even for the USS statistic which should be more accurate and represent the actual cost of the process:
image

Everything changes when I am commenting out the validation part err = xsdHandler.ValidateMem(fileContent, xsdvalidate.ParsErrDefault).

Do you have any idea what could be causing this, or whether I should use your lib differently?

EDIT: What version of libxml2 did you use?

@terminalstatic
Owner

What version of libxml2 did you use?

I really can't remember ... I needed this about 6 years ago, and ever since then it has been chugging along quite unattended. Also, I'm not using the Go wrapper but the Elixir one I also wrote, because Go's concurrency model needed far more attention to keep the whole thing performing compared with Erlang's preemptive scheduler (I needed quite an extensive amount of concurrency). And I'm not validating XML files of that size.

I'm currently testing this a little because I'm curious. Personally, I don't think it's a leak; I think it has something to do with memory allocation/deallocation. When I come up with something I'll keep you posted.

@fabianem
Author

I tested it further with different versions of libxml2 and unfortunately it didn't help - but maybe I missed the right one.
I also tried it with smaller files and what I observed is that the memory usage of the process still grows with every new file but seems to stop going higher at a certain point.
Unfortunately it doesn't get freed up.

Have you maybe come up with something?

@terminalstatic
Owner

Not much ... I've been quite busy despite the holidays. The only thing I found is that, while starting to prepare a pure C version to check whether it would make a difference, valgrind reported still-reachable memory blocks with libxml2 version 2.9.4. Upgrading to 2.9.12 fixed that. There are no memory leaks in the wrapper code.
