-
Notifications
You must be signed in to change notification settings - Fork 21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Include XML information in PDF metadata #7
Conversation
cmcproject
commented
May 4, 2023
- Update XMP schema handling
- replace previous schema implemenation
- use latest schema version for factur-x
- Import metadata from XML and attach it to the PDF
- import metadata to PDF Properties (_prepare_pdf_metadata_txt)
- Detect ZUGFeRD profile automatically based on the XML data
- There is no need to provide the profile to attach_xml method
- Update tests
- Add docstring
Upgrade pypdf lib and fix generation issues (pretix#6)
Codecov ReportPatch coverage:
Additional details and impacted files@@ Coverage Diff @@
## master #7 +/- ##
==========================================
+ Coverage 90.55% 91.43% +0.88%
==========================================
Files 17 18 +1
Lines 1398 1355 -43
==========================================
- Hits 1266 1239 -27
+ Misses 132 116 -16
☔ View full report in Codecov by Sentry. |
89a232e
to
d4cce36
Compare
4248e76
to
adab1bb
Compare
adab1bb
to
0fe0d7d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey, thank you very much for the work and sorry for the slow turnaround on the review. I only need to deal with this stuff every few months, so it requires me to find some time to dive into the topic again and properly think about it.
I think most of this PR looks great. (It was a bit hard to review due to the mixing of actual functional changes and stylistic changes.) But I have some concerns about backwards compatibility, see comments inline.
<fx:DocumentType>{documenttype}</fx:DocumentType> | ||
<fx:DocumentFileName>{xml_filename}</fx:DocumentFileName> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't these be in the same order as above, since it's a "Seq"?
<fx:DocumentType>{documenttype}</fx:DocumentType> | |
<fx:DocumentFileName>{xml_filename}</fx:DocumentFileName> | |
<fx:DocumentFileName>{xml_filename}</fx:DocumentFileName> | |
<fx:DocumentType>{documenttype}</fx:DocumentType> |
@@ -0,0 +1,86 @@ | |||
""" | |||
FACTUR-X XMP with the required PDF/A extension schema description |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that we have this, we should be able to delete schemas/ZUGFeRD2p2_extension_schema.xmp
"subject": "{} {} dated {} issued by {}".format( | ||
doc_type_name, number, date_str, seller | ||
), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would be the first thing in this library to hardcode English language, not sure it's worth it to go down that part? If we just used the invoice number similarly to the title, we'd also not need to extract the date, saving on complexity.
raise Exception( | ||
"Invalid doc type! XML value for TypeCode shall be 380 for an invoice." | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- This is a breaking change. We don't know if there are users of the library out there who e.g. use 381 or 384. Is it specified that it would be wrong to use 381 or 384? (Where?)
- Please try to use more specific exceptions such as ValueError instead of just Exception
format_map = { | ||
"102": "%Y%m%d", | ||
"203": "%Y%m%d%H%M", | ||
} | ||
date_dt = datetime.strptime(date, format_map.get(date_format, format_map["102"])) | ||
number_xpath = xml_etree.xpath( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
format_map = { | |
"102": "%Y%m%d", | |
"203": "%Y%m%d%H%M", | |
} | |
date_dt = datetime.strptime(date, format_map.get(date_format, format_map["102"])) | |
number_xpath = xml_etree.xpath( | |
date_dt = datetime.strptime(date, format_map.get(date_format, "%Y%m%d")) | |
number_xpath = xml_etree.xpath( |
:param pdf_metadata: PDF metadata | ||
:return: metadata XML | ||
""" | ||
xml_str = XMP_SCHEMA.format( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great job, this really makes it a lot more readable
raise Exception("Invalid XML profile!") | ||
|
||
profile = profile.upper() | ||
logger.info(f"Invoide profile dectected from XML: {profile}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logger.info(f"Invoide profile dectected from XML: {profile}") | |
logger.info(f"Invoice profile dectected from XML: {profile}") |
profile = doc_id.split(":")[-2] | ||
profile = profile[:2] + " " + profile[2:] | ||
else: | ||
raise Exception("Invalid XML profile!") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is at least missing the "urn:cen.eu:en16931:2017#compliant#urn:xoev-de:kosit:standard:xrechnung_1.2" ID which should be mapped to the "XRECHNUNG" profile.
There might be more profiles out there. I believe we need to keep a way to manually set it instead of only automatically detecting it (but auto-detecting it as a default is nice).
|
||
|
||
def attach_xml(original_pdf, xml_data, level="BASIC"): | ||
def attach_xml(original_pdf, xml_data): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should keep the level
parameter and just change the default to None
and then perform autodetection (see below for one reason).
I think this is also resolved by #15 |
* Bump to 2.3.0 * PR #7 Include XML information in PDF metadata (cmcproject) * update mustang validator * fixes fx:DocumentFileName / fx:DocumentType order * remove `schemas/ZUGFeRD2p2_extension_schema.xmp` (replaced by `xmp_schema.py` generator) * removes date and seller from pdf metadata subject to simplify the code and remove hard coded English language * removes doc type date and seller from pdf metadata subject (incl. the restriction to documents of type 380) to simplify the code and remove hard coded English language * remove unused (now) unused constant INVOICE_TYPE_CODE and avoid the use of use bare `except` * removes failing "Invalid doc type! XML value for TypeCode shall be 380 for an invoice." test * allows to supply explicit profile level and extends profile auto detection to cover XRECHNUNG * minor code style improvements like lazy % formatting in logging functions (logging-fstring-interpolation) * fixes style (black) * tests of auto detecting a XRechnung v2 and v3 profiles * blacking again * tests for en16931 auto profile recognition and auto profile recognition failure * black again * typo * allow users to set custom pdf metadata and the PDF language identifier used by PDF readers for blind people * black * spelling * Update drafthorse/pdf.py * Run black --------- Co-authored-by: Raphael Michel <[email protected]> Co-authored-by: Raphael Michel <[email protected]>