Include XML information in PDF metadata #7

cmcproject · 2023-05-04T06:57:35Z

Update XMP schema handling
- replace previous schema implementation
- use latest schema version for factur-x
Import metadata from XML and attach it to the PDF
- import metadata to PDF Properties (_prepare_pdf_metadata_txt)
Detect ZUGFeRD profile automatically based on the XML data
- There is no need to provide the profile to the attach_xml method
Update tests
Add docstring

Upgrade pypdf lib and fix generation issues (pretix#6)

codecov · 2023-05-04T07:00:40Z

Codecov Report

Patch coverage: 92.00% and project coverage change: +0.88 🎉

Comparison is base (6375c60) 90.55% compared to head (8f7cea9) 91.43%.

Additional details and impacted files

@@            Coverage Diff             @@
##           master       #7      +/-   ##
==========================================
+ Coverage   90.55%   91.43%   +0.88%     
==========================================
  Files          17       18       +1     
  Lines        1398     1355      -43     
==========================================
- Hits         1266     1239      -27     
+ Misses        132      116      -16

Impacted Files	Coverage Δ
drafthorse/pdf.py	`96.61% <91.83%> (+4.14%)`	⬆️
drafthorse/xmp_schema.py	`100.00% <100.00%> (ø)`

... and 2 files with indirect coverage changes


drafthorse/__init__.py

raphaelm

Hey, thank you very much for the work and sorry for the slow turnaround on the review. I only need to deal with this stuff every few months, so it requires me to find some time to dive into the topic again and properly think about it.

I think most of this PR looks great. (It was a bit hard to review due to the mixing of actual functional changes and stylistic changes.) But I have some concerns about backwards compatibility, see comments inline.

raphaelm · 2023-06-15T20:09:32Z

drafthorse/xmp_schema.py

+      <fx:DocumentType>{documenttype}</fx:DocumentType>
+      <fx:DocumentFileName>{xml_filename}</fx:DocumentFileName>


Shouldn't these be in the same order as above, since it's a "Seq"?

Suggested change:

-      <fx:DocumentType>{documenttype}</fx:DocumentType>
-      <fx:DocumentFileName>{xml_filename}</fx:DocumentFileName>
+      <fx:DocumentFileName>{xml_filename}</fx:DocumentFileName>
+      <fx:DocumentType>{documenttype}</fx:DocumentType>
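For illustration, a minimal sketch of a `str.format` template in the style of `XMP_SCHEMA.format(...)`, with the `fx:` properties emitted in the order of the suggested change. `FX_FRAGMENT` and `render_fx_fragment` are hypothetical names, the fragment is heavily abbreviated (the real template also carries the PDF/A extension schema description), and the `fx:` namespace URI is assumed to be the Factur-X 1.0 one. Values are XML-escaped before substitution so a stray `&` in a file name cannot break the XMP.

```python
from xml.etree import ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical, abbreviated fragment; the fx namespace URI is an assumption.
FX_FRAGMENT = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about=""
      xmlns:fx="urn:factur-x:pdfa:CrossIndustryDocument:invoice:1p0#">
    <fx:DocumentFileName>{xml_filename}</fx:DocumentFileName>
    <fx:DocumentType>{documenttype}</fx:DocumentType>
    <fx:Version>{version}</fx:Version>
    <fx:ConformanceLevel>{level}</fx:ConformanceLevel>
  </rdf:Description>
</rdf:RDF>"""


def render_fx_fragment(xml_filename, documenttype, version, level):
    """Fill the template placeholders and make sure the result is well-formed."""
    xml_str = FX_FRAGMENT.format(
        xml_filename=escape(xml_filename),
        documenttype=escape(documenttype),
        version=escape(version),
        level=escape(level),
    )
    ET.fromstring(xml_str)  # raises ET.ParseError if the output is broken
    return xml_str
```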

raphaelm · 2023-06-15T20:11:27Z

drafthorse/xmp_schema.py

@@ -0,0 +1,86 @@
+"""
+FACTUR-X XMP with the required PDF/A extension schema description


Now that we have this, we should be able to delete schemas/ZUGFeRD2p2_extension_schema.xmp

raphaelm · 2023-06-15T20:14:46Z

drafthorse/pdf.py

+        "subject": "{} {} dated {} issued by {}".format(
+            doc_type_name, number, date_str, seller
+        ),


This would be the first thing in this library to hardcode English language; I'm not sure it's worth it to go down that path. If we just used the invoice number, similarly to the title, we'd also not need to extract the date, saving on complexity.
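A minimal sketch of the language-neutral alternative: read only the invoice number from the CII XML and reuse it for both title and subject. It uses the standard-library `ElementTree` instead of the `lxml` XPath calls in the PR, and the helper names and the returned dict keys are hypothetical; the namespace URIs are those of the UN/CEFACT Cross Industry Invoice used by ZUGFeRD/Factur-X.

```python
import xml.etree.ElementTree as ET

# Namespaces of the UN/CEFACT Cross Industry Invoice (CII).
NS = {
    "rsm": "urn:un:unece:uncefact:data:standard:CrossIndustryInvoice:100",
    "ram": "urn:un:unece:uncefact:data:standard:ReusableAggregateBusinessInformationEntity:100",
}


def invoice_number(xml_data: bytes) -> str:
    """Return rsm:ExchangedDocument/ram:ID (hypothetical helper)."""
    root = ET.fromstring(xml_data)
    node = root.find("rsm:ExchangedDocument/ram:ID", NS)
    if node is None or not (node.text or "").strip():
        raise ValueError("XML has no ExchangedDocument/ID")
    return node.text.strip()


def language_neutral_metadata(xml_data: bytes) -> dict:
    number = invoice_number(xml_data)
    # No English sentence and no date parsing: the number alone is used.
    return {"title": number, "subject": number}
```

This keeps the subject free of any hardcoded wording and drops the date extraction entirely.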

raphaelm · 2023-06-15T20:16:52Z

drafthorse/pdf.py

+        raise Exception(
+            "Invalid doc type! XML value for TypeCode shall be 380 for an invoice."
+        )


This is a breaking change. We don't know if there are users of the library out there who e.g. use 381 or 384. Is it specified that it would be wrong to use 381 or 384? (Where?)

Please try to use more specific exceptions such as ValueError instead of just Exception
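A minimal sketch of both points: the document type code is read without restricting it to 380, so credit notes (381) or corrected invoices (384) pass through unchanged, and a missing value raises `ValueError` rather than a bare `Exception`. The helper name is hypothetical and `ElementTree` stands in for the `lxml` calls used in the PR.

```python
import xml.etree.ElementTree as ET

NS = {
    "rsm": "urn:un:unece:uncefact:data:standard:CrossIndustryInvoice:100",
    "ram": "urn:un:unece:uncefact:data:standard:ReusableAggregateBusinessInformationEntity:100",
}


def document_type_code(xml_data: bytes) -> str:
    """Return ExchangedDocument/TypeCode as-is (hypothetical helper)."""
    root = ET.fromstring(xml_data)
    node = root.find("rsm:ExchangedDocument/ram:TypeCode", NS)
    if node is None or not (node.text or "").strip():
        # A specific built-in exception lets callers catch exactly this case.
        raise ValueError("XML has no ExchangedDocument/TypeCode")
    return node.text.strip()
```

Callers can then write `except ValueError:` without accidentally swallowing unrelated errors.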

raphaelm · 2023-06-15T20:17:39Z

drafthorse/pdf.py

+    format_map = {
+        "102": "%Y%m%d",
+        "203": "%Y%m%d%H%M",
+    }
+    date_dt = datetime.strptime(date, format_map.get(date_format, format_map["102"]))
+    number_xpath = xml_etree.xpath(


Suggested change:

-    format_map = {
-        "102": "%Y%m%d",
-        "203": "%Y%m%d%H%M",
-    }
-    date_dt = datetime.strptime(date, format_map.get(date_format, format_map["102"]))
-    number_xpath = xml_etree.xpath(
+    date_dt = datetime.strptime(date, format_map.get(date_format, "%Y%m%d"))
+    number_xpath = xml_etree.xpath(
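For reference, a standalone sketch of the date handling discussed here. The format qualifiers come from the PR (`102` is a plain date, `203` adds hours and minutes); the function name is hypothetical. Unknown qualifiers fall back to the literal `"%Y%m%d"` pattern, as in the suggestion.

```python
from datetime import datetime

# Date/time format qualifiers handled in the PR:
# "102" -> CCYYMMDD, "203" -> CCYYMMDDHHMM.
FORMAT_MAP = {
    "102": "%Y%m%d",
    "203": "%Y%m%d%H%M",
}


def parse_cii_date(value: str, date_format: str = "102") -> datetime:
    """Parse a CII DateTimeString, defaulting to the plain-date pattern."""
    return datetime.strptime(value.strip(), FORMAT_MAP.get(date_format, "%Y%m%d"))
```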

raphaelm · 2023-06-15T20:18:11Z

drafthorse/pdf.py

+    :param pdf_metadata: PDF metadata
+    :return: metadata XML
+    """
+    xml_str = XMP_SCHEMA.format(


Great job, this really makes it a lot more readable

raphaelm · 2023-06-15T20:19:13Z

drafthorse/pdf.py

+        raise Exception("Invalid XML profile!")
+
+    profile = profile.upper()
+    logger.info(f"Invoide profile dectected from XML: {profile}")


Suggested change:

-    logger.info(f"Invoide profile dectected from XML: {profile}")
+    logger.info(f"Invoice profile dectected from XML: {profile}")
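The squash commit that eventually landed also mentions switching to lazy `%` formatting in logging calls (pylint's `logging-fstring-interpolation`). A minimal sketch of that style; the logger name and helper are hypothetical.

```python
import logging

logger = logging.getLogger("drafthorse.example")  # hypothetical logger name


def log_detected_profile(profile: str) -> None:
    # The argument is passed separately; the message string is only built
    # when a handler actually formats the record.
    logger.info("Invoice profile detected from XML: %s", profile)
```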

raphaelm · 2023-06-15T20:21:40Z

drafthorse/pdf.py

+        profile = doc_id.split(":")[-2]
+        profile = profile[:2] + " " + profile[2:]
+    else:
+        raise Exception("Invalid XML profile!")


This is at least missing the "urn:cen.eu:en16931:2017#compliant#urn:xoev-de:kosit:standard:xrechnung_1.2" ID which should be mapped to the "XRECHNUNG" profile.

There might be more profiles out there. I believe we need to keep a way to manually set it instead of only automatically detecting it (but auto-detecting it as a default is nice).
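A minimal sketch of a table-driven mapping that would also cover XRechnung. Only the XRechnung 1.2 ID is quoted in this review; the other guideline IDs are assumed Factur-X 1.0 / ZUGFeRD 2.x identifiers and should be checked against the specification. The names `KNOWN_PROFILES` and `detect_profile` are hypothetical.

```python
# Assumed guideline IDs (except XRechnung 1.2, which is quoted in the review).
KNOWN_PROFILES = {
    "urn:factur-x.eu:1p0:minimum": "MINIMUM",
    "urn:factur-x.eu:1p0:basicwl": "BASIC WL",
    "urn:cen.eu:en16931:2017#compliant#urn:factur-x.eu:1p0:basic": "BASIC",
    "urn:cen.eu:en16931:2017": "EN 16931",
    "urn:cen.eu:en16931:2017#conformant#urn:factur-x.eu:1p0:extended": "EXTENDED",
}


def detect_profile(guideline_id: str) -> str:
    """Map a GuidelineSpecifiedDocumentContextParameter ID to a profile name."""
    guideline_id = guideline_id.strip()
    if guideline_id in KNOWN_PROFILES:
        return KNOWN_PROFILES[guideline_id]
    # XRechnung IDs carry a version suffix (xrechnung_1.2, later 2.x / 3.x).
    if "urn:xoev-de:kosit:standard:xrechnung" in guideline_id:
        return "XRECHNUNG"
    raise ValueError(f"Unknown guideline ID: {guideline_id!r}")
```

An explicit lookup table makes unknown IDs fail loudly instead of being split into a misleading profile name.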

raphaelm · 2023-06-15T20:23:39Z

drafthorse/pdf.py



-def attach_xml(original_pdf, xml_data, level="BASIC"):
+def attach_xml(original_pdf, xml_data):


I think we should keep the level parameter and just change the default to None and then perform autodetection (see below for one reason).
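A minimal sketch of the `level=None` idea: an explicitly passed level always wins, and autodetection from the XML only runs when it is omitted. `guideline_id` and `resolve_level` are hypothetical helpers, and the detection here is deliberately reduced to two cases; a real implementation would use a fuller mapping.

```python
import xml.etree.ElementTree as ET

NS = {
    "rsm": "urn:un:unece:uncefact:data:standard:CrossIndustryInvoice:100",
    "ram": "urn:un:unece:uncefact:data:standard:ReusableAggregateBusinessInformationEntity:100",
}


def guideline_id(xml_data: bytes) -> str:
    """Return the GuidelineSpecifiedDocumentContextParameter ID."""
    root = ET.fromstring(xml_data)
    node = root.find(
        "rsm:ExchangedDocumentContext/"
        "ram:GuidelineSpecifiedDocumentContextParameter/ram:ID",
        NS,
    )
    if node is None or not (node.text or "").strip():
        raise ValueError("XML has no guideline ID")
    return node.text.strip()


def resolve_level(xml_data: bytes, level=None) -> str:
    """Honour an explicit level; otherwise try to auto-detect it."""
    if level is not None:
        return level.upper()
    gid = guideline_id(xml_data)
    if "xrechnung" in gid:
        return "XRECHNUNG"
    if gid == "urn:cen.eu:en16931:2017":
        return "EN 16931"
    raise ValueError(f"Cannot detect profile from {gid!r}; pass level explicitly")
```

With this shape, existing calls such as `attach_xml(pdf, xml, "BASIC")` keep working, while new callers can omit the level.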

raphaelm · 2024-03-10T15:45:37Z

I think this is also resolved by #15

* Bump to 2.3.0
* PR #7 Include XML information in PDF metadata (cmcproject)
* update mustang validator
* fixes fx:DocumentFileName / fx:DocumentType order
* remove `schemas/ZUGFeRD2p2_extension_schema.xmp` (replaced by `xmp_schema.py` generator)
* removes date and seller from pdf metadata subject to simplify the code and remove hard-coded English language
* removes doc type, date and seller from pdf metadata subject (incl. the restriction to documents of type 380) to simplify the code and remove hard-coded English language
* remove (now) unused constant INVOICE_TYPE_CODE and avoid the use of bare `except`
* removes failing "Invalid doc type! XML value for TypeCode shall be 380 for an invoice." test
* allows supplying an explicit profile level and extends profile auto-detection to cover XRECHNUNG
* minor code style improvements like lazy % formatting in logging functions (logging-fstring-interpolation)
* fixes style (black)
* tests for auto-detecting XRechnung v2 and v3 profiles
* blacking again
* tests for en16931 auto profile recognition and auto profile recognition failure
* black again
* typo
* allow users to set custom pdf metadata and the PDF language identifier used by PDF readers for blind people
* black
* spelling
* Update drafthorse/pdf.py
* Run black

---------

Co-authored-by: Raphael Michel <[email protected]>
Co-authored-by: Raphael Michel <[email protected]>

cmcproject and others added 2 commits May 2, 2023 12:21

Merge pull request #1 from pretix/master

291ec9e

Upgrade pypdf lib and fix generation issues (pretix#6)

Update metadata handling

460b444

Test invalid invoice XML

d4cce36

cmcproject force-pushed the Add_XML_info_to_PDF_metadata branch from 89a232e to d4cce36 May 9, 2023 06:30

Increase code coverage

f6a03e6

cmcproject force-pushed the Add_XML_info_to_PDF_metadata branch from 4248e76 to adab1bb May 9, 2023 07:51

Minor updates

0fe0d7d

cmcproject force-pushed the Add_XML_info_to_PDF_metadata branch from adab1bb to 0fe0d7d May 9, 2023 13:25

raphaelm reviewed Jun 15, 2023

drafthorse/__init__.py (outdated, resolved)

Update drafthorse/__init__.py

8f7cea9

raphaelm requested changes Jun 15, 2023


MAKOMO added a commit to MAKOMO/python-drafthorse that referenced this pull request Jan 6, 2024

PR pretix#7 Include XML information in PDF metadata (cmcproject)

4fa89a5

raphaelm pushed a commit to MAKOMO/python-drafthorse that referenced this pull request Mar 10, 2024

PR pretix#7 Include XML information in PDF metadata (cmcproject)

a512671

raphaelm closed this Mar 10, 2024
