a new mode like :hash_no_attrs but with included attributes #347
Comments
Have you considered using the SAX parser? Post the actual XML too for further discussion. |
Thank you for such a prompt reply 🙂
I have been thinking of testing it, but was somewhat afraid that it could be slower than
Below is the XML:
<nodes xmlns:ns4="http://Model/Status/Protocol/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="ns4:ServiceProtocol">
<services>
<serviceId>
<id>100</id>
<statusId>400</statusId>
</serviceId>
<updateId>500</updateId>
</services>
<services>
<serviceId>
<id>200</id>
<statusId>400</statusId>
</serviceId>
<updateId>500</updateId>
</services>
<services>
<serviceId>
<id>300</id>
<statusId>400</statusId>
</serviceId>
<updateId>500</updateId>
</services>
<id>82383838383838</id>
<nodes>
<id>8888888</id>
</nodes>
<quantities>
<size>122</size>
<id>
<code>900</code>
<node>5</node>
</id>
</quantities>
<quantities>
<size>103</size>
<id>
<code>900</code>
<node>10</node>
</id>
</quantities>
<quantities>
<size>92</size>
<id>
<code>900</code>
<node>20</node>
</id>
</quantities>
<time>2023-10-20T05:05:00.000+01:00</time>
<type>
<id>9000</id>
<mode>
<id>2828288</id>
<protocol>7000</protocol>
</mode>
</type>
<informations>2</informations>
<informations>17</informations>
<informations>64</informations>
<informations>1157</informations>
<informations>1604</informations>
<informations>100008</informations>
</nodes>
<nodes xmlns:ns4="http://Model/Status/Protocol/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="ns4:ServiceShop">
<services>
<serviceId>
<id>400</id>
<statusId>500</statusId>
</serviceId>
<updateId>600</updateId>
</services>
<services>
<serviceId>
<id>500</id>
<statusId>500</statusId>
</serviceId>
<updateId>700</updateId>
</services>
<services>
<serviceId>
<id>600</id>
<statusId>500</statusId>
</serviceId>
<updateId>700</updateId>
</services>
<id>92829292992</id>
</nodes>
|
The nice thing about the SAX parser is that you can ignore stuff you don't need. I don't know if that applies in your case though. Thanks for the XML. It looks like the only attributes are in the nodes element. |
Oh, I already learnt that ignoring any elements is not an option, because certain parts of the original XML have to be reused and so must not be altered in any way 🙁 This is actually the main issue I am dealing with right now: how to efficiently parse a very large XML document, with all its elements and attributes, into a hash format for easy mapping with Ruby.
Yes, and when I send back this XML (and many other ones) without attributes to an API, I get a validation error. |
So, if I can summarize: you are looking for the hash format, but using a map instead of a list and then merging the elements. That would lose the information about the order. If that is not important, it might be possible. Can I ask you to try the SAX parser first? Then I'll see how alternate formats might be supported. |
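To make the "map instead of a list, then merge" idea concrete, here is a minimal pure-Ruby sketch. It assumes the :hash mode output is a nested structure where an element becomes an Array of Hashes (attributes first, then one single-key Hash per child) and a leaf is a String; that shape, the sample data, and the helper names `merge_node` and `add_value` are illustrative assumptions, not Ox's documented behaviour.

```ruby
# Assumption: an element is an Array of Hashes (attributes hash first, then
# one single-key Hash per child element); a leaf element is a plain String.
def add_value(hash, key, value)
  if hash.key?(key)
    # Repeated keys are collected into an Array.
    hash[key] = [hash[key]] unless hash[key].is_a?(Array)
    hash[key] << value
  else
    hash[key] = value
  end
end

def merge_node(node)
  case node
  when Array
    node.each_with_object({}) do |entry, merged|
      if entry.is_a?(Hash)
        entry.each { |key, value| add_value(merged, key, merge_node(value)) }
      else
        add_value(merged, :text, entry) # hypothetical key for stray text
      end
    end
  when Hash
    node.each_with_object({}) do |(key, value), merged|
      add_value(merged, key, merge_node(value))
    end
  else
    node
  end
end

# Hand-written sample in the assumed shape (not real Ox output).
parsed = {
  nodes: [
    { "xsi:type": "ns4:ServiceShop" },
    { services: [{ serviceId: [{ id: "400" }, { statusId: "500" }] }, { updateId: "600" }] },
    { services: [{ serviceId: [{ id: "500" }, { statusId: "500" }] }, { updateId: "700" }] },
    { id: "92829292992" }
  ]
}

result = merge_node(parsed)
```

Attributes and children end up side by side in one hash, so a lookup like `result[:nodes][:services][0][:serviceId][:id]` works by key. As noted above, the interleaving order of different sibling elements is lost; only the relative order of repeated elements with the same name survives.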
Sounds correct, and if I understand correctly the
I guess when dealing with a hash format, the order is not that important.
Sure, I will give it a try and will get back with my findings 👍🏻 |
Hello again, so I have just released a tiny wrapper for the SAX parser called OXML, which I have already tested on multiple applications. It successfully solves the issue of missing attributes, it is at least |
Super! The wrapper looks good. Nice that the performance is that much better as well. |
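For readers wondering what an event-driven hash builder with attributes can look like, below is a rough sketch. It is not OXML's implementation and does not use Ox: to stay dependency-free it swaps in REXML's stream listener, which delivers similar start/text/end events. The class name `HashBuilder` and the "#text" key are hypothetical choices made for this illustration.

```ruby
require "rexml/document"
require "rexml/streamlistener"

# Builds a nested Hash from stream events, storing attributes as ordinary keys.
class HashBuilder
  include REXML::StreamListener

  def initialize
    @stack = [{}] # bottom entry collects the root element
    @text = +""
  end

  # attrs arrives as a Hash of attribute name => value.
  def tag_start(_name, attrs)
    @stack.push(attrs.dup)
    @text = +""
  end

  def text(value)
    @text << value
  end

  def tag_end(name)
    node = @stack.pop
    text = @text.strip
    @text = +""
    value =
      if node.empty?
        text # plain leaf element
      else
        node["#text"] = text unless text.empty? # hypothetical key
        node
      end
    store(@stack.last, name, value)
  end

  def result
    @stack.first
  end

  private

  # Repeated element names are collected into an Array.
  def store(hash, key, value)
    if hash.key?(key)
      hash[key] = [hash[key]] unless hash[key].is_a?(Array)
      hash[key] << value
    else
      hash[key] = value
    end
  end
end

xml = <<~XML
  <nodes xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="ns4:ServiceShop">
    <services><updateId>600</updateId></services>
    <services><updateId>700</updateId></services>
    <id>92829292992</id>
  </nodes>
XML

builder = HashBuilder.new
REXML::Document.parse_stream(xml, builder)
tree = builder.result
```

Because the hash is built while the events arrive, no second pass over an intermediate array is needed. Mixed content (text interleaved with child elements) is not handled faithfully in this sketch.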
I also needed that feature on my side and I took a different approach (not sure if it's the best, but I'll share it anyway). I use libxslt and its command line tool xsltproc to transform the XML, turning each attribute into a child element with this stylesheet:
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="xml" />
<xsl:template match="*">
<xsl:element name="{name()}">
<xsl:for-each select="@*">
<xsl:element name="{name()}">
<xsl:value-of select="." />
</xsl:element>
</xsl:for-each>
<xsl:apply-templates
select="*|text()" />
</xsl:element>
</xsl:template>
</xsl:stylesheet>
I then feed the result to Ox.load like this to get the intended result:
Ox.load(`xsltproc #{xsl_path} #{xml_path}`, mode: :hash)
xsltproc is quite fast, even though it would probably be faster to generate the correct result directly with an additional mode in the C codebase. If I need it to be faster later, I'll try to implement such a mode. |
Hello Peter, thank you so much for such efficient Ox and Oj gems!

I am trying to replace Savon (which uses Nokogiri for XML parsing) with Ox in multiple heavily loaded microservices for performance reasons. Below are samples with tiny fractions of XMLs, parsed with both :hash_no_attrs and :hash modes:

The :hash_no_attrs mode gives the most desirable output to work with (it is a hash), but unfortunately can't be used because attributes are missing. The :hash mode includes the missing attributes, but its output structure is significantly different from the :hash_no_attrs one: it is an array instead of a hash.

Mapping an API response to internal models is much simpler when accessing a hash by known keys than when iterating over an array and looking for matching elements, especially when dealing with thousands of lines, when every millisecond is important. An array could be transformed to a hash after initial parsing, but that would negate the performance gains from using Ox.

I realize that you already mentioned in other issues that the two modes, :hash_no_attrs and :hash, are enough for most cases, but I would really appreciate it if you could consider adding another mode, identical to :hash_no_attrs in terms of its output structure, but with attributes included as hash elements (instead of an extra hash with attributes like in the :hash mode)? (please see an example below):

Thank you 🙇🏻