Background
fpdf2's processing model, which is pretty much "write any user input to the output stream immediately", makes it difficult and in many situations nearly impossible to adapt dynamically to the characteristics (e.g. size) of the input data.
We currently have one formal substitution mechanism, which allows using "{nb}" to insert the total number of pages before that number is known. This approach is inherently problematic: first, it may conflict with an intention to render that same character sequence on the page, and second, it conflicts with text shaping.
There's another possible use case for late value substitution: For #678 and #1154, a solution might be to wrap the page content in a transformation (move or rotation), where the actual parameters of that transformation only become known once the page is complete.
Other use cases for similar substitutions may come up over time.
Solution
The robustness could most easily be improved by replacing the explicit string "{nb}" in the output stream with a sequence containing noncharacters. These are special Unicode code points reserved for strictly internal use (U+FDD0..U+FDEF, plus the last two code points of every plane, such as U+FFFE and U+FFFF), which means they should never be shared or transferred between different software packages. This makes them safe to use as conflict-free substitution markers.
Note that, by the Unicode standard, we should not accept such markers from client software, as noncharacters are strictly for internal use. So for more generic user interaction we need to define a hierarchy of token classes that let the user specify the type and size of the values to substitute.
When used for rendering as text, those tokens get converted into a special type of Fragment() (they could even be derived from Fragment() themselves). Their important properties are a unique, automatically generated key, which distinguishes between the intended substitution targets, and a width (likely in pica), which _render_styled_text_line() will use to determine where to continue with the following text.
Those substitution Fragment()s will be ignored by text shaping. This will make some substitution results slightly less pretty, and tokens can't be substituted by text in a complex script, but that should be an acceptable limitation.
Most likely the substitution Fragment()s should also be considered "unbreakable". Since we don't know their actual content yet when doing the line wrapping, there's really no other way to handle them at that point.
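As a standalone illustration (plain Python, not fpdf2 code; both function names are made up), noncharacters can be recognized purely from their code point values:

```python
def is_noncharacter(code_point: int) -> bool:
    """Return True if code_point is one of the 66 Unicode noncharacters."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {code_point:#x}")
    # U+FDD0..U+FDEF is a contiguous block of 32 noncharacters.
    if 0xFDD0 <= code_point <= 0xFDEF:
        return True
    # The last two code points of each of the 17 planes (U+xxFFFE, U+xxFFFF).
    return (code_point & 0xFFFE) == 0xFFFE


def contains_noncharacter(text: str) -> bool:
    """Return True if any character of text is a noncharacter."""
    return any(is_noncharacter(ord(ch)) for ch in text)
```

A check like `contains_noncharacter()` could also serve to reject client text that already contains these code points.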
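A minimal sketch of such a hierarchy might look like the following. `IntSubToken` and `FloatSubToken` are the names used in this proposal; the base class `SubToken`, the `_check()` hook and the `is_set`/`value` properties are assumptions made for illustration only.

```python
from typing import Any, Optional


class SubToken:
    """Base class for a value that is supplied after it has been used.

    Hypothetical sketch: subclasses define which values set_value() accepts,
    and `width` reserves space when the token is rendered as text.
    """

    def __init__(self, width: Optional[str] = None) -> None:
        self.width = width
        self._value: Any = None

    def _check(self, value: Any) -> Any:
        return value  # the base class accepts anything

    def set_value(self, value: Any) -> None:
        self._value = self._check(value)

    @property
    def is_set(self) -> bool:
        return self._value is not None

    @property
    def value(self) -> Any:
        if self._value is None:
            raise ValueError(f"{type(self).__name__} was used but never set")
        return self._value


class IntSubToken(SubToken):
    def _check(self, value: Any) -> int:
        # bool is a subclass of int, but True/False make no sense here
        if isinstance(value, bool) or not isinstance(value, int):
            raise TypeError(f"IntSubToken expects an int, got {value!r}")
        return value


class FloatSubToken(SubToken):
    def _check(self, value: Any) -> float:
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError(f"FloatSubToken expects a number, got {value!r}")
        return float(value)  # ints are accepted but stored as float
```

Keeping the type check in the token itself means a wrong value is reported when `set_value()` is called, not much later when the output is written.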
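The two essential properties, a unique key and a reserved width, can be sketched with plain dataclasses. `TextFragment`, `SubstitutionFragment` and `fragment_x_positions()` are hypothetical stand-ins, not fpdf2's actual `Fragment` class or `_render_styled_text_line()`; widths are plain floats in whatever unit the layout uses.

```python
import itertools
from dataclasses import dataclass, field
from typing import List, Union

# Process-wide counter that hands out unique substitution keys.
_next_key = itertools.count(1)


@dataclass
class TextFragment:
    text: str
    width: float  # measured width of the text


@dataclass
class SubstitutionFragment:
    width: float  # reserved width; the content is filled in later
    key: int = field(default_factory=lambda: next(_next_key))


def fragment_x_positions(
    fragments: List[Union[TextFragment, SubstitutionFragment]], x_start: float
) -> List[float]:
    """Return the x coordinate at which each fragment starts on the line."""
    positions = []
    x = x_start
    for frag in fragments:
        positions.append(x)
        x += frag.width  # the next fragment continues after the reserved width
    return positions
```

The renderer never needs to know what a substitution fragment will eventually contain; it only advances by the reserved width.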
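Treating the placeholders as unbreakable fits naturally into a greedy line breaker: a placeholder behaves like a word of fixed, reserved width that may move to the next line but is never split. The `Word`/`Placeholder` types and `wrap()` are a simplified, hypothetical model, not fpdf2's line-wrapping code; in this sketch ordinary words are kept whole as well.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Word:
    text: str
    width: float


@dataclass
class Placeholder:
    key: int
    width: float  # reserved width; the real content is not known yet


Item = Union[Word, Placeholder]


def wrap(items: List[Item], max_width: float, space_width: float) -> List[List[Item]]:
    """Greedy line wrapping that treats every item as an atomic unit.

    A Placeholder is never split, even if it is wider than max_width;
    it then simply ends up on a line of its own.
    """
    lines: List[List[Item]] = [[]]
    used = 0.0
    for item in items:
        current = lines[-1]
        needed = item.width + (space_width if current else 0.0)
        if current and used + needed > max_width:
            lines.append([item])  # start a new line with this item
            used = item.width
        else:
            current.append(item)
            used += needed
    return lines
```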
The substitution Fragment()s get written into the output stream in a form like this:

```python
marker_pattern = f"\uFDD0{key:d}\uFDD1"
```
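Because noncharacters can't occur in legitimate text, the final pass can locate markers with a simple regular expression. This sketch assumes the content stream is still available as a Python `str` at that point and that the replacement values are already formatted as text; `make_marker()` and `substitute_markers()` are hypothetical helper names.

```python
import re

# Matches one marker and captures its decimal key.
MARKER_RE = re.compile(r"\uFDD0(\d+)\uFDD1")


def make_marker(key: int) -> str:
    """Encode a substitution key as a marker string."""
    return f"\uFDD0{key:d}\uFDD1"


def substitute_markers(stream: str, values: dict) -> str:
    """Replace every marker in `stream` with the text registered for its key."""

    def replace(match):
        key = int(match.group(1))
        if key not in values:
            raise KeyError(f"no value set for substitution key {key}")
        return values[key]

    return MARKER_RE.sub(replace, stream)
```

Usage: `substitute_markers(f"(page {make_marker(1)}) Tj", {1: "3"})` yields `"(page 3) Tj"`.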
FPDF.write() and the text regions could be complemented by special methods such as .insert_total_pages(width="3em").
For backwards compatibility with the (then deprecated) use of "{nb}", the text parsing routines could replace that string with an appropriate subtype of Fragment(), to be written as a marker as shown.
In all text input methods, we could perhaps allow users to put their own "{format}" keys in the text, with our methods accepting a dict of key/token pairs, which they then convert internally into the appropriate marker strings.
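In the text parsing routines, that could amount to splitting the input at every literal "{nb}" and emitting a placeholder fragment in its place, together with a deprecation warning. `TotalPagesFragment`, its default width and `split_nb_alias()` are assumptions for illustration.

```python
import warnings
from dataclasses import dataclass
from typing import List, Union

NB_ALIAS = "{nb}"


@dataclass(frozen=True)
class TotalPagesFragment:
    """Hypothetical Fragment subtype that stands for the total page count."""

    width: str = "3em"  # reserved space; this default is an assumption


def split_nb_alias(text: str) -> List[Union[str, TotalPagesFragment]]:
    """Split text at each legacy "{nb}" alias, dropping empty text pieces."""
    pieces = text.split(NB_ALIAS)
    if len(pieces) > 1:
        warnings.warn(
            '"{nb}" is deprecated, use a total-pages token instead',
            DeprecationWarning,
            stacklevel=2,
        )
    result: List[Union[str, TotalPagesFragment]] = []
    for i, piece in enumerate(pieces):
        if i > 0:
            result.append(TotalPagesFragment())  # one placeholder per alias
        if piece:
            result.append(piece)
    return result
```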
```python
my_text = "page {current_pageno} of {my_total_pages}"
pageno_token = IntSubToken(width="3em")
pdf.cell(text=my_text, substitute=dict(current_pageno=pageno_token, my_total_pages=TotalPagesToken))
# add some other stuff
pageno_token.set_value(42)
# TotalPagesToken may get automatically updated
pdf.output("substitution_demo.pdf")
```
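Internally, a `substitute` dict like the one in this example could be turned into marker strings with Python's own `str.format_map()`, so the user-facing `{name}` syntax behaves like `str.format()`. `SubToken` and `to_marker_text()` are hypothetical names, and the sketch assumes every dict value is a token instance.

```python
import itertools
from typing import Dict

_next_key = itertools.count(1)


class SubToken:
    """Hypothetical minimal token: all this step needs is a unique key."""

    def __init__(self) -> None:
        self.key = next(_next_key)


def make_marker(key: int) -> str:
    return f"\uFDD0{key:d}\uFDD1"


def to_marker_text(text: str, substitute: Dict[str, SubToken]) -> str:
    """Replace the user's "{name}" fields with internal marker strings.

    Uses str.format_map(), so "{{" and "}}" still produce literal braces
    and an unknown field name raises KeyError.
    """
    markers = {name: make_marker(token.key) for name, token in substitute.items()}
    return text.format_map(markers)
```

One caveat: a format spec such as `{name:>5}` would be applied to the marker string itself, so an implementation may want to reject format specs.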
Many other FPDF methods may accept substitution tokens in place of explicit values. E.g. a transformation may accept an instance of a float-type token in place of an actual float. Before writing the file, the user must then update their copy of the token with the correct value, which will cause it to be replaced in the output. Forgetting to set the value of a used token is an error.
```python
y_move_token = FloatSubToken()
with pdf.move(x=0, y=y_move_token):  # analog to skew(), rotation(), etc.
    # create content
    remaining_y = pdf.eph - pdf.y
    y_move_token.set_value(remaining_y)
pdf.pages[pdf.page].set_dimensions(pdf.w_pt, (pdf.y + pdf.t_margin) * pdf.k)
pdf.output("substitution_demo.pdf")
```
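At the PDF level, such a deferred move could be a translation matrix (the `cm` operator inside a `q`/`Q` pair) whose offset is written as a marker and filled in at output time. The check for markers without a value enforces the rule that a used but unset token is an error. This is a sketch on a plain `str` content stream with hypothetical helper names; it also ignores that PDF's y axis points up while fpdf2 measures y from the top, so an actual implementation of the proposed `move()` would have to convert the value.

```python
import re
from typing import Dict

MARKER_RE = re.compile(r"\uFDD0(\d+)\uFDD1")


def marker(key: int) -> str:
    return f"\uFDD0{key:d}\uFDD1"


def format_number(value: float) -> str:
    """Format a number compactly for a PDF content stream (2 decimals max)."""
    text = f"{value:.2f}".rstrip("0").rstrip(".")
    return "0" if text == "-0" else text


def wrap_in_deferred_move(content: str, y_key: int) -> str:
    """Wrap page content in a translation whose y offset is not yet known.

    "1 0 0 1 tx ty cm" concatenates a pure translation to the CTM;
    q/Q save and restore the graphics state around the content.
    """
    return f"q 1 0 0 1 0 {marker(y_key)} cm\n{content}\nQ"


def fill_numeric_markers(stream: str, values: Dict[int, float]) -> str:
    """Replace markers with numbers; any marker without a value is an error."""
    missing = sorted({int(k) for k in MARKER_RE.findall(stream)} - set(values))
    if missing:
        raise ValueError(f"substitution tokens used but never set: {missing}")
    return MARKER_RE.sub(lambda m: format_number(values[int(m.group(1))]), stream)
```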
Sorry I couldn't come up with a simpler solution, but there are many different constraints in the different phases of processing, all of which need to be addressed.
Any better ideas? 💡
Any takers?
I had already started working on refactoring the alias code to fix #1090, and I just submitted the PR. I believe it is one step closer to your vision for the substitution mechanism.