Reduce microservices memory footprint #12200

Open
mapellidario opened this issue Dec 10, 2024 · 4 comments

Comments

@mapellidario (Member)

Impact of the new feature

MicroServices

Is your feature request related to a problem? Please describe.

We realized that the microservices memory footprint depends on their backlog. For example, ms-rulecleaner at every polling cycle runs the function _execute() only once [1], on every workflow with a certain status [2].

Describe the solution you'd like

Taking ms-rulecleaner as an example, we could change getRequestRecords into a generator that yields only a few workflows every time it is called. We would need to add a for loop in execute() around the call to _execute(). Not a huge effort, achievable without substantial refactoring.
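
A rough sketch of what this could look like (illustrative only: the chunk size, the shape of the reqmgr2 result and the slicing helper are assumptions, not the current WMCore API):

def getRequestRecords(self, reqStatus, chunkSize=100):
    # fetch the requests for one status and yield them in small chunks,
    # so that execute() only hands `chunkSize` workflows at a time to _execute()
    result = self.reqmgr2.getRequestByStatus([reqStatus], detail=True)
    records = []
    for reqDict in result:
        records.extend(reqDict.items())
    for i in range(0, len(records), chunkSize):
        yield dict(records[i:i + chunkSize])

def execute(self, reqStatus):
    ...
    for requestRecords in self.getRequestRecords(reqStatus):
        # same processing as today, just on a bounded slice of the backlog
        counters = self._execute(requestRecords)
        ...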

Describe alternatives you've considered

The alternative would be to process one workflow at a time, possibly moving our model to a pub/sub one, but this would require some major refactoring.
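
For illustration only, a toy version of that pub/sub direction using an in-process queue (a real implementation would sit on an external broker; everything below is hypothetical):

import queue
import threading

workQueue = queue.Queue(maxsize=10)  # a bounded queue keeps the backlog off the heap

def publisher(workflows):
    # publish one workflow at a time; blocks when the queue is full
    for wflow in workflows:
        workQueue.put(wflow)
    workQueue.put(None)  # sentinel: no more work

def subscriber(process):
    # consume and process a single workflow at a time
    while True:
        wflow = workQueue.get()
        if wflow is None:
            break
        process(wflow)

consumer = threading.Thread(target=subscriber, args=(print,))
consumer.start()
publisher({"RequestName": "req_%d" % i} for i in range(5))
consumer.join()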

Additional context

Follow-up to #12042 .


[1]

totalNumRequests, cleanNumRequests, normalArchivedNumRequests, forceArchivedNumRequests = self._execute(requestRecords)

[2]

result = self.reqmgr2.getRequestByStatus([reqStatus], detail=True)

@mapellidario mapellidario changed the title reduce microservice memory footprint reduce microservices memory footprint Dec 10, 2024
@vkuznet (Contributor)

vkuznet commented Dec 10, 2024

@mapellidario , yesterday I posted on the MM chat to Alan and Andrea my observations, which align with this ticket. Here is my posting (for completeness on this issue):

Here is a proof of the memory spike in the MSRuleCleanerWflow call, which happens at https://github.com/dmwm/WMCore/blob/master/src/python/WMCore/MicroService/MSRuleCleaner/MSRuleCleaner.py#L263

I took the test/python/WMCore_t/MicroService_t/MSRuleCleaner_t/MSRuleCleanerWflow_t.py code and added memory profiling to one of the unit tests as follows:

import tracemalloc

    # inside the unit test class
    def setUp(self):
        ...
        tracemalloc.start()

    def tearDown(self):
        # stop tracing and print memory usage details
        current, peak = tracemalloc.get_traced_memory()
        print(f"Current memory usage: {current / 1024:.2f} KB")
        print(f"Peak memory usage: {peak / 1024:.2f} KB")
        tracemalloc.stop()
    ...
    def testIncludeParents(self):
        ...
        for idx in range(10000):
            req = self.includeParentsReq
            # slightly modify every string value so each request is unique
            for key, val in req.items():
                if isinstance(val, (str, bytes)):
                    req[key] += "%s" % idx
            MSRuleCleanerWflow(req)

Basically, I run over 10K requests, each modified slightly, and call MSRuleCleanerWflow for each of them in a similar manner to what the MSRuleCleaner code does.

Here is the outcome:

  • without my loop I observe on average a 10 KB memory footprint
python test/python/WMCore_t/MicroService_t/MSRuleCleaner_t/MSRuleCleanerWflow_t.py
Current memory usage: 8.56 KB
Peak memory usage: 11.52 KB
.Current memory usage: 7.54 KB
Peak memory usage: 10.52 KB
.Current memory usage: 7.87 KB
Peak memory usage: 10.86 KB
.Current memory usage: 5.82 KB
Peak memory usage: 7.70 KB
.

and when I enable my for loop, I see the following:

Current memory usage: 1232.53 KB
Peak memory usage: 1276.86 KB
.Current memory usage: 7.54 KB
Peak memory usage: 10.30 KB
.Current memory usage: 7.87 KB
Peak memory usage: 10.64 KB
.Current memory usage: 5.82 KB
Peak memory usage: 7.49 KB
.

As you can see, the first reported set of numbers, which corresponds to the test I modified, spiked from about 11 KB to about 1232 KB.

Therefore, if we take the MSRuleCleaner for loop at https://github.com/dmwm/WMCore/blob/master/src/python/WMCore/MicroService/MSRuleCleaner/MSRuleCleaner.py#L262 and pass 10K requests, you will see a memory spike of roughly 100x due to the memory allocation in the MSRuleCleanerWflow call (which by itself makes a couple of deepcopy calls over a nested Python dictionary).

Here is the modified version I used: MSRuleCleanerWflow_t.py
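
To make the effect easier to reproduce outside WMCore, here is a small standalone snippet (illustrative only, the document size is made up) comparing the tracemalloc peak when deep-copied nested dictionaries are kept all at once versus processed one at a time:

import copy
import tracemalloc

# a nested dictionary roughly mimicking a request document
doc = {"Request": {"Task%d" % i: {"param": "x" * 100} for i in range(50)}}

def peakKB(func):
    tracemalloc.start()
    func()
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak / 1024

def wholeBacklog():
    # keep every deep copy alive at once, as processing a full backlog does
    return [copy.deepcopy(doc) for _ in range(1000)]

def oneAtATime():
    # deep-copy, process and discard each document before taking the next one
    for _ in range(1000):
        wflow = copy.deepcopy(doc)
        del wflow

print("whole backlog peak: %.1f KB" % peakKB(wholeBacklog))
print("one at a time peak: %.1f KB" % peakKB(oneAtATime))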

To fix the problem, a few steps should be performed:

  • _execute should process a single workflow or request, instead of taking a list of requests and loading a corresponding number of workflow objects.
  • The for loop over reqRecords should be moved out of this method into the calling code, so that only one workflow is processed at a time, which keeps the memory footprint equal to that of a single workflow.
  • wfCounters should be moved outside of this code as well and converted to plain integers, instead of being kept in a nested dict.
  • the execute code should be refactored into something like this:
def execute(self, reqStatus):
    ...
    for status in reqStatus:
        # in this loop we only allocate a single wflow object at a time,
        # process it and collect metrics; therefore the memory allocation
        # stays flat regardless of the number of records
        for rec in self.getRequestRecords(status):
            metrics = self._execute(rec)  # metrics is a tuple of integers
            totalNum += metrics[0]        # first metric counter
            ...
            self.updateReportDict(summary, "total_num_requests", totalNum)
    ...

def _execute(self, record):
    ...
    wflow = MSRuleCleanerWflow(record)
    ...
    # process the pipelines and obtain the necessary metrics
    metrics = (totalNum, cleanNum, normalArchivedNum, forceArchivedNum)
    return metrics

@anpicci (Contributor)

anpicci commented Dec 29, 2024

Thanks @mapellidario @vkuznet .
This issue has some overlap with #12061.
As a result, I wonder whether we should adopt an abstract implementation, potentially shared by every MS, or whether we need a specific implementation for each service.
FYI @amaltaro

@vkuznet (Contributor)

vkuznet commented Jan 6, 2025

@anpicci , I doubt that there is a generic abstract implementation, since it depends on the data structures of the HTTP requests between services. But what probably should be done is the following:

  • since CouchDB does not provide an NDJSON data format (required for data streaming and therefore for reducing the memory footprint), we can write a custom JSON parser which will NOT load the entire JSON but rather process it line by line, e.g. read [, then read each {dictionary}, etc., and turn the result into a generator. Such a custom parser can be used in the places where we DO load the entire JSON from CouchDB.
  • but to benefit from such a custom JSON parser, which will keep memory low, i.e. down to the size of a single row dictionary, the code which uses standard JSON, e.g. where it loads documents from the CouchDB call, must be refactored to use the new parser and process results one by one. For instance (pseudo code; a minimal parser sketch follows below):
# make an HTTP request to CouchDB
data = http_request_to_couch()

# process the received data via the custom JSON parser, which returns a generator
gen = customJSON(data)

# pass the generator around to the downstream call
downstream_API(gen)

# modify downstream APIs to consume results from the generator
def downstream_API(gen):
    for rec in gen:
        # process a single record
        ...
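
A minimal sketch of such a parser (illustrative only; it assumes the usual CouchDB framing of one row object per line inside the "rows" array, which would need to be verified against the actual endpoints):

import json

def customJSON(lines):
    # consume an iterable of response lines from a CouchDB view/_all_docs
    # call and yield one row dictionary at a time, without ever holding
    # the full JSON document in memory
    for line in lines:
        line = line.strip().rstrip(",")
        if not (line.startswith("{") and line.endswith("}")):
            continue  # skip the header/footer framing lines
        yield json.loads(line)

# example with a canned response; in real code `lines` would come from a
# streamed HTTP response read line by line
response = [
    '{"total_rows": 2, "offset": 0, "rows": [',
    '{"id": "wf1", "key": "wf1", "value": {"status": "announced"}},',
    '{"id": "wf2", "key": "wf2", "value": {"status": "announced"}}',
    ']}',
]
for row in customJSON(response):
    print(row["id"], row["value"]["status"])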

Is it feasible? I think so, but it will certainly require lots of code refactoring; therefore a proper management decision should be made on whether we will spend time on such an activity.

@vkuznet (Contributor)

vkuznet commented Jan 6, 2025

Another approach would be to put a proxy server in front of CouchDB which would provide both JSON and NDJSON streams, so that clients can start migrating from JSON requests to NDJSON ones. Then we would need to modify each service to talk to the proxy server instead of CouchDB and start using NDJSON. The proxy server is easier to write and maintain, as it can be developed separately, written in a different language, and be fully transparent to the MS services. Such a proxy would provide an intermediate layer between the database (CouchDB) and the clients (MS services). The downsides of this approach are an extra layer in the networking stack and the need to cover the different APIs. In its functionality such a proxy server is similar to APS, as it provides a proxy between the client and the database layer.
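
On the client side, consuming NDJSON from such a proxy would be straightforward; here is a sketch using the requests library (the proxy URL and endpoint are hypothetical):

import json
import requests

def streamWorkflows(url):
    # read an NDJSON stream record by record, so the client never holds
    # more than one document in memory
    with requests.get(url, stream=True) as resp:
        resp.raise_for_status()
        for line in resp.iter_lines():
            if line:  # skip keep-alive blank lines
                yield json.loads(line)

# hypothetical proxy endpoint sitting in front of CouchDB
for wflow in streamWorkflows("http://couch-proxy.example.org/workflows?status=announced"):
    print(wflow)  # placeholder for the real per-workflow processing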

@anpicci anpicci changed the title reduce microservices memory footprint Reduce microservices memory footprint Jan 19, 2025