Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve delete properties performance by replace DOMDocument with xml_parse #432

Conversation

alexander-schranz
Copy link
Contributor

@alexander-schranz alexander-schranz commented Nov 23, 2023

See also: #423


Prototype Repository: https://github.com/alexander-schranz/jackalope-xml-delete-properties

Jackalope Doctrine DBAL Analyse deletion of properties

The benchmark requires 2 files:

  • var/props.xml - the xml of the node where we want to remove properties (SELECT props FROM phpcr_nodes WHERE identifier = ?)
  • var/props.csv - the props names which we want remove from the given xml (per line one property name)

Different commands:

php src/remover.php legacy
php src/remover.php single_dom_document
php src/remover.php single_dom_query
php src/remover.php single_dom_query_chunk
php src/remover.php xml_parse

Results

~70000 properties (~12.5MB) remove ~1700 props

Run on a MacBook Pro (16", 2021) Apple M1 Pro 32 GB:

Command Time Memory
legacy 6m 29s 48 MB
single_dom_document 2m 25s 35 MB
single_dom_query 1m 30s 37 MB
single_dom_query_chunk 1m 29s 35 MB
xml_parse ~1s 35 MB

legacy: is the 1.9.0 version:

$this->deleteProperty($op->srcPath);

single_dom_document: should represent the state of 2.0.0-beta2 version after: https://github.com/jackalope/jackalope-doctrine-dbal/pull/423/files

Blackfire

I did not use Blackfire for benchmarking as it did show in past benchmark where xml_parse
can not be good profiled as having a lot of callback method being called via xml_set_element_handler and xml_set_character_data_handler.
So profiling takes more time as processing things as Blackfire need to log every method call.
Instead I depend on classic benchmarking via time() and memory_get_peak_usage(true) measures.

Required changes for improvements

A: Group Properties

The most important thing is that we remove all properties at once instead of calling saveXML after each property removal.

For this we mostly would require first group all deleteProperties by its node:

function groupByNode($deletePropertyPaths): array {
    $grouped = [];
    foreach ($deletePropertyPaths as $path) {
        $nodePath = PathHelper::getParentPath($path);
        $propertyName = PathHelper::getNodeName($path);

        $grouped[$nodePath][] = $propertyName;
    }

    return $grouped;
}

Then we load the single node remove all the properties and save the xml once via saveXML.

B: Grouped Reference delete queries

The queries to remove references should also be grouped and best a single query be send to delete the references
instead of one query per reference.

The queries are currently ignored in the benchmark as it is focused on XML manipulation.

C: Replace DOMDocument with xml_parse

DOMDocument is bad for performance and should be avoided.
The xml_parse as it allows us to streamed reading the xml and skip the properties which we want to remove.
The XmlPropsRemover is an example how this could be done.

D: TODO

  • xml_parse escape attributes name correctly
  • xml_parse escape attributes value correctly
  • xml_parse data content correctly
  • xml_parse add queries to remove references
  • xml_parse make sure the output is the same as for DOMDocument
    • self closing tags
    • escape data

currently there is a little difference in the 2 printed xmls:

-rw-r--r--   1 staff  staff  11940901 22 Nov 23:38 removed_legacy.xml
-rw-r--r--   1 staff  staff  12002548 22 Nov 23:39 removed_xml_parse.xml

Update xml_parse variant now has the same output as the previous DOMDocument version:

-rw-r--r--   1 staff  staff  11940901 22 Nov 23:33 removed_legacy.xml
-rw-r--r--   1 staff  staff  11940901 23 Nov 00:18 removed_xml_parse.xml
-rw-r--r--   1 staff  staff  12009559 22 Nov 23:45 removed_legacy_pretty.xml
-rw-r--r--   1 staff  staff  12009559 23 Nov 00:18 removed_xml_parse_pretty.xml

TODO:

@alexander-schranz alexander-schranz changed the base branch from 2.x to 1.x November 23, 2023 19:09
@alexander-schranz alexander-schranz force-pushed the feature/improve-delete-properties-performance branch from ffbaf84 to c4c30e0 Compare November 23, 2023 19:09
@alexander-schranz alexander-schranz force-pushed the feature/improve-delete-properties-performance branch from c4c30e0 to cb8a853 Compare November 23, 2023 19:10
@dbu
Copy link
Member

dbu commented Nov 24, 2023

that is some really impressive improvement, thanks for the analysis and the implementation.

@alexander-schranz
Copy link
Contributor Author

alexander-schranz commented Nov 24, 2023

Thx for the reviews.

@dbu we will next week do some more profiling on some projects and then I will continue here with some fixes. Pushed this already so we can test it on various projects via:

    "repositories": [
        {
            "type": "vcs",
            "url": "[email protected]:alexander-schranz/jackalope-doctrine-dbal.git"
        }
    ],
    "require": {
    
        "jackalope/jackalope-doctrine-dbal": "dev-feature/improve-delete-properties-performance as 1.10.0",
        
    }

@dbu
Copy link
Member

dbu commented Dec 2, 2023

i am about to tag the next beta for jackalope-doctrine-dbal 2.0

i think there is nothing preventing us from doing 2.1 very quickly afterwards, once this is ready. but that way we at least should soon have a stable release that installs with symfony 7 - and this change has no BC breaks, it is an optimization that can come as minor version. (unless you see something that you would want to BC break to make things easier - if we need that tell me by monday so we can look into it before releasing 2.0.0 stable.)

@alexander-schranz
Copy link
Contributor Author

@dbu I will do again a test run against sulu jackolope 2 tomorrow. To release this as new minor for 1 and/or 2 I also see no problem as it is just changing internal logic.

@alexander-schranz alexander-schranz marked this pull request as ready for review December 9, 2023 11:41
@alexander-schranz
Copy link
Contributor Author

I did do some adjustements so escaping the property name and value behave the same as before and extend the test case so we are 100% sure the XML special chars are escaped the same way as before.

alexander-schranz added a commit to alexander-schranz/sulu that referenced this pull request Dec 11, 2023
@alexander-schranz alexander-schranz force-pushed the feature/improve-delete-properties-performance branch from 5be5837 to 3ce6153 Compare December 11, 2023 08:41
@alexander-schranz alexander-schranz force-pushed the feature/improve-delete-properties-performance branch from 3ce6153 to 919dcdb Compare December 11, 2023 08:41
alexander-schranz added a commit to alexander-schranz/sulu that referenced this pull request Dec 11, 2023
@alexander-schranz
Copy link
Contributor Author

alexander-schranz commented Dec 11, 2023

Applied the suggestions to the pull request, so it is ready now :) Also did again more testing with special chars and the result is 100% the same for old and new implementations:

Bildschirmfoto 2023-12-11 um 09 53 37

Copy link
Member

@dbu dbu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks a lot! the tests you added increase my confidence that we don't have regressions.

one last nitpick, and can you please add a changelog entry explaining what the change means for the user?

src/Jackalope/Transport/DoctrineDBAL/Client.php Outdated Show resolved Hide resolved
@dbu dbu merged commit 1b5d994 into jackalope:1.x Jan 6, 2024
18 checks passed
@dbu
Copy link
Member

dbu commented Jan 6, 2024

released in 1.11.0

@dbu dbu mentioned this pull request Jan 6, 2024
@alexander-schranz alexander-schranz deleted the feature/improve-delete-properties-performance branch January 6, 2024 13:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants