Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Include XML information in PDF metadata #7

Closed

Conversation

cmcproject
Copy link
Contributor

  • Update XMP schema handling
    • replace previous schema implemenation
    • use latest schema version for factur-x
  • Import metadata from XML and attach it to the PDF
    • import metadata to PDF Properties (_prepare_pdf_metadata_txt)
  • Detect ZUGFeRD profile automatically based on the XML data
    • There is no need to provide the profile to attach_xml method
  • Update tests
  • Add docstring

cmcproject and others added 2 commits May 2, 2023 12:21
@codecov
Copy link

codecov bot commented May 4, 2023

Codecov Report

Patch coverage: 92.00% and project coverage change: +0.88 🎉

Comparison is base (6375c60) 90.55% compared to head (8f7cea9) 91.43%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master       #7      +/-   ##
==========================================
+ Coverage   90.55%   91.43%   +0.88%     
==========================================
  Files          17       18       +1     
  Lines        1398     1355      -43     
==========================================
- Hits         1266     1239      -27     
+ Misses        132      116      -16     
Impacted Files Coverage Δ
drafthorse/pdf.py 96.61% <91.83%> (+4.14%) ⬆️
drafthorse/xmp_schema.py 100.00% <100.00%> (ø)

... and 2 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@cmcproject cmcproject force-pushed the Add_XML_info_to_PDF_metadata branch from 89a232e to d4cce36 Compare May 9, 2023 06:30
@cmcproject cmcproject force-pushed the Add_XML_info_to_PDF_metadata branch from 4248e76 to adab1bb Compare May 9, 2023 07:51
@cmcproject cmcproject force-pushed the Add_XML_info_to_PDF_metadata branch from adab1bb to 0fe0d7d Compare May 9, 2023 13:25
drafthorse/__init__.py Outdated Show resolved Hide resolved
Copy link
Member

@raphaelm raphaelm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, thank you very much for the work and sorry for the slow turnaround on the review. I only need to deal with this stuff every few months, so it requires me to find some time to dive into the topic again and properly think about it.

I think most of this PR looks great. (It was a bit hard to review due to the mixing of actual functional changes and stylistic changes.) But I have some concerns about backwards compatibility, see comments inline.

Comment on lines +78 to +79
<fx:DocumentType>{documenttype}</fx:DocumentType>
<fx:DocumentFileName>{xml_filename}</fx:DocumentFileName>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't these be in the same order as above, since it's a "Seq"?

Suggested change
<fx:DocumentType>{documenttype}</fx:DocumentType>
<fx:DocumentFileName>{xml_filename}</fx:DocumentFileName>
<fx:DocumentFileName>{xml_filename}</fx:DocumentFileName>
<fx:DocumentType>{documenttype}</fx:DocumentType>

@@ -0,0 +1,86 @@
"""
FACTUR-X XMP with the required PDF/A extension schema description
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that we have this, we should be able to delete schemas/ZUGFeRD2p2_extension_schema.xmp

Comment on lines +311 to +313
"subject": "{} {} dated {} issued by {}".format(
doc_type_name, number, date_str, seller
),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be the first thing in this library to hardcode English language, not sure it's worth it to go down that part? If we just used the invoice number similarly to the title, we'd also not need to extract the date, saving on complexity.

Comment on lines +301 to +303
raise Exception(
"Invalid doc type! XML value for TypeCode shall be 380 for an invoice."
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • This is a breaking change. We don't know if there are users of the library out there who e.g. use 381 or 384. Is it specified that it would be wrong to use 381 or 384? (Where?)
  • Please try to use more specific exceptions such as ValueError instead of just Exception

Comment on lines +281 to +286
format_map = {
"102": "%Y%m%d",
"203": "%Y%m%d%H%M",
}
date_dt = datetime.strptime(date, format_map.get(date_format, format_map["102"]))
number_xpath = xml_etree.xpath(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
format_map = {
"102": "%Y%m%d",
"203": "%Y%m%d%H%M",
}
date_dt = datetime.strptime(date, format_map.get(date_format, format_map["102"]))
number_xpath = xml_etree.xpath(
date_dt = datetime.strptime(date, format_map.get(date_format, "%Y%m%d"))
number_xpath = xml_etree.xpath(

:param pdf_metadata: PDF metadata
:return: metadata XML
"""
xml_str = XMP_SCHEMA.format(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great job, this really makes it a lot more readable

raise Exception("Invalid XML profile!")

profile = profile.upper()
logger.info(f"Invoide profile dectected from XML: {profile}")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
logger.info(f"Invoide profile dectected from XML: {profile}")
logger.info(f"Invoice profile dectected from XML: {profile}")

profile = doc_id.split(":")[-2]
profile = profile[:2] + " " + profile[2:]
else:
raise Exception("Invalid XML profile!")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is at least missing the "urn:cen.eu:en16931:2017#compliant#urn:xoev-de:kosit:standard:xrechnung_1.2" ID which should be mapped to the "XRECHNUNG" profile.

There might be more profiles out there. I believe we need to keep a way to manually set it instead of only automatically detecting it (but auto-detecting it as a default is nice).



def attach_xml(original_pdf, xml_data, level="BASIC"):
def attach_xml(original_pdf, xml_data):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should keep the level parameter and just change the default to None and then perform autodetection (see below for one reason).

MAKOMO added a commit to MAKOMO/python-drafthorse that referenced this pull request Jan 6, 2024
raphaelm pushed a commit to MAKOMO/python-drafthorse that referenced this pull request Mar 10, 2024
@raphaelm
Copy link
Member

I think this is also resolved by #15

@raphaelm raphaelm closed this Mar 10, 2024
raphaelm added a commit that referenced this pull request Mar 10, 2024
* Bump to 2.3.0

* PR #7 Include XML information in PDF metadata (cmcproject)

* update mustang validator

* fixes fx:DocumentFileName / fx:DocumentType order

* remove `schemas/ZUGFeRD2p2_extension_schema.xmp` (replaced by `xmp_schema.py` generator)

* removes date and seller from pdf metadata subject to simplify the code and remove hard coded English language

* removes doc type date and seller from pdf metadata subject (incl. the restriction to documents of type 380) to simplify the code and remove hard coded English language

* remove unused (now) unused constant INVOICE_TYPE_CODE and avoid the use of use bare `except`

* removes failing "Invalid doc type! XML value for TypeCode shall be 380 for an invoice." test

* allows to supply explicit profile level and extends profile auto detection to cover XRECHNUNG

* minor code style improvements like lazy % formatting in logging functions (logging-fstring-interpolation)

* fixes style (black)

* tests of auto detecting a XRechnung v2 and v3 profiles

* blacking again

* tests for en16931 auto profile recognition and auto profile recognition failure

* black again

* typo

* allow users to set custom pdf metadata and the PDF language identifier used by PDF readers for blind people

* black

* spelling

* Update drafthorse/pdf.py

* Run black

---------

Co-authored-by: Raphael Michel <[email protected]>
Co-authored-by: Raphael Michel <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants