Proposal: Incremental saving #112

@packdat packdat commented May 4, 2024

I was recently tasked with evaluating whether existing documents can be "stamped".
A "stamp" is literally an image of an actual stamp that should be added to specific pages of a document.
The problem is that the document may be signed, so the stamp has to be added in a non-destructive manner to keep the signature intact.

I started to hack around and was able to come up with something that seems to work.

The idea was to just track changes to arrays (PdfArray) and dictionaries (PdfDictionary).
All other objects are basically immutable so this approach should work in theory.
Also, a new PdfDocumentOpenMode was added, namely Append.
When a document is opened in this mode, it starts to track changes to arrays and dictionaries.
When saving the document, only changed/added objects are saved; the changes are appended to the existing document.
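
To illustrate the change-tracking idea described above, here is a minimal, library-independent sketch. `TrackedDictionary` and `IncrementalSave` are hypothetical names made up for this example; they are not the PDFsharp types and not the code in this PR. The sketch only shows the principle: every write sets a flag, and saving collects the flagged objects.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a PDF dictionary (NOT PDFsharp's PdfDictionary).
public sealed class TrackedDictionary
{
    private readonly Dictionary<string, string> _items = new Dictionary<string, string>();

    // True as soon as any entry was written or removed since load or last save.
    public bool IsModified { get; private set; }

    public void Set(string key, string value)
    {
        _items[key] = value;
        IsModified = true;
    }

    public bool Remove(string key)
    {
        bool removed = _items.Remove(key);
        if (removed) IsModified = true;
        return removed;
    }

    // Reading never touches the flag.
    public bool Contains(string key) => _items.ContainsKey(key);

    public void MarkSaved() => IsModified = false;
}

// Hypothetical helper: decides which objects an incremental save would write.
public static class IncrementalSave
{
    // Returns the sorted object numbers whose dictionaries changed.
    public static List<int> CollectModified(IReadOnlyDictionary<int, TrackedDictionary> objects)
    {
        var result = new List<int>();
        foreach (var pair in objects)
        {
            if (pair.Value.IsModified) result.Add(pair.Key);
        }
        result.Sort();
        return result;
    }
}
```

Only the objects returned by `CollectModified` would be written after the end of the existing file; everything else stays byte-for-byte untouched, which is what keeps an existing signature valid.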

Basic code (taken from the included test-case):

            // The stream must be opened with ReadWrite access so the changes can be appended to it.
            using var fs = File.Open(targetFile, FileMode.Open, FileAccess.ReadWrite);
            // Append is the new open mode added by this PR; it enables change tracking.
            var doc = PdfReader.Open(fs, PdfDocumentOpenMode.Append);
            
            // Modify the document, e.g. add content to the first page.
            var page = doc.Pages[0];
            using var gfx = XGraphics.FromPdfPage(page);
            gfx.DrawString("I was added", new XFont("Arial", 16), new XSolidBrush(XColors.Red), 40, 40);

            // Append only the changed/added objects to the existing document.
            doc.Save(fs, true);

More may be needed for this to work consistently (e.g. I haven't tested with encrypted documents yet, as I was told the documents I have to work with will not be encrypted).
This change also does not handle the case where objects were deleted from a document.
These objects would need to be tracked separately, as they need special entries in the new XREF table.
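
For context on those "special entries": in a classic cross-reference table, every entry is exactly 20 bytes long, and a deleted object is recorded as a free entry (`f`) instead of an in-use entry (`n`), normally with its generation number incremented. The sketch below only formats such entries; `XrefEntry` is a hypothetical helper for illustration, not part of PDFsharp or of this PR, and it leaves the free-list linkage (the "next free object" value) to the caller.

```csharp
using System;
using System.Globalization;

// Hypothetical helper that formats classic (non-stream) xref table entries.
public static class XrefEntry
{
    // In-use entry: 10-digit byte offset, 5-digit generation, keyword 'n'.
    public static string InUse(long byteOffset, int generation) =>
        string.Format(CultureInfo.InvariantCulture,
            "{0:D10} {1:D5} n \n", byteOffset, generation);

    // Free entry: 10-digit number of the next free object, 5-digit generation
    // to use if the object number is ever reused, keyword 'f'.
    public static string Free(int nextFreeObject, int generation) =>
        string.Format(CultureInfo.InvariantCulture,
            "{0:D10} {1:D5} f \n", nextFreeObject, generation);
}
```

For example, `XrefEntry.Free(0, 1)` yields `0000000000 00001 f ` followed by a line feed, which would mark an object as deleted, assuming it is the last link in the free list.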

One potential issue I spotted is that the library modifies the document when certain properties are merely read, thus accidentally marking those objects as "modified" although you haven't changed anything.
One example is the *Box properties of PdfPage (e.g. TrimBox, CropBox, ...).
If you read these properties and the document does not already contain values for them, a new value is added to the underlying dictionary.

I haven't looked too deeply, but I expect there are more cases like that.
I have changed the *Box properties to just return PdfRectangle.Empty when there is no value, instead of adding a new value.
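
The difference between the two getter styles can be sketched in isolation. `Rect` and `PageSketch` are hypothetical types invented for this example (they are not PdfRectangle or PdfPage); `WriteCount` stands in for whatever would mark the object as modified.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical rectangle type; Empty plays the role of PdfRectangle.Empty.
public readonly struct Rect
{
    public Rect(double x1, double y1, double x2, double y2)
    {
        X1 = x1; Y1 = y1; X2 = x2; Y2 = y2;
    }

    public double X1 { get; }
    public double Y1 { get; }
    public double X2 { get; }
    public double Y2 { get; }

    public static readonly Rect Empty = new Rect(0, 0, 0, 0);
    public bool IsEmpty => X1 == 0 && Y1 == 0 && X2 == 0 && Y2 == 0;
}

// Hypothetical page with a dictionary of boxes and a write counter.
public sealed class PageSketch
{
    private readonly Dictionary<string, Rect> _elements = new Dictionary<string, Rect>();

    // Counts writes to the underlying dictionary.
    public int WriteCount { get; private set; }

    // Problematic style: a read creates a missing entry as a side effect.
    public Rect GetBoxCreating(string key)
    {
        if (!_elements.TryGetValue(key, out var rect))
        {
            rect = Rect.Empty;
            _elements[key] = rect;
            WriteCount++;
        }
        return rect;
    }

    // Side-effect-free style: a missing entry just yields Empty.
    public Rect GetBoxReadOnly(string key) =>
        _elements.TryGetValue(key, out var rect) ? rect : Rect.Empty;
}
```

Both getters return the same value for a missing box, but only the read-only variant leaves the dictionary unchanged.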

There is also the case of type transformations (e.g. replacing a PdfDictionary with a more specific type like PdfPage).
These transformations happen "under the hood" and would normally also cause objects to be marked as modified.
I tried to prevent that by temporarily ignoring changes while the type transformation runs, using the new method:

     PdfCrossReferenceTable.IgnoreModify(Action action)

This is quite hackish; maybe you have better ideas on how to tackle this?
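
For readers unfamiliar with the pattern, a suppression scope like this can be sketched as follows. `ChangeTracker` is a hypothetical class written for this example; it is not the PdfCrossReferenceTable implementation from this PR. The sketch uses a depth counter so that nested calls work, and a `finally` block so that tracking resumes even if the action throws.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical change tracker with a re-entrant "ignore modifications" scope.
public sealed class ChangeTracker
{
    private readonly HashSet<int> _modified = new HashSet<int>();
    private int _suppressDepth;

    // Object numbers recorded as modified so far.
    public IReadOnlyCollection<int> Modified => _modified;

    // Records a modification unless a suppression scope is active.
    public void MarkModified(int objectNumber)
    {
        if (_suppressDepth == 0) _modified.Add(objectNumber);
    }

    // Runs the action with tracking suppressed; always restores the depth.
    public void IgnoreModify(Action action)
    {
        _suppressDepth++;
        try
        {
            action();
        }
        finally
        {
            _suppressDepth--;
        }
    }
}
```

With this sketch, `tracker.IgnoreModify(() => tracker.MarkModified(5))` records nothing, while a later `tracker.MarkModified(7)` outside the scope is recorded as usual.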

packdat commented Jul 31, 2024

Deleted objects are now handled when saving incrementally.

Note:
Still untested with encrypted documents due to a lack of time.

@packdat packdat marked this pull request as ready for review July 31, 2024 17:23

TM-Atharva commented Oct 2, 2024

Hey @packdat - I have used your code and successfully added signatures in multiple places and on multiple pages.
Do you know how to set permissions on the final signed document?

Havunen commented Oct 22, 2024

PDFsharp 6.2 is adding support for digital signatures; incremental saving would be a great addition to that, as it would allow multiple signatures that each cover the full document.

EU validator: https://ec.europa.eu/digital-building-blocks/DSS/webapp-demo/validation
extra info: https://eideasy.com/pdf-pades-external-digital-signatures-using-etsi-cades-detached/
lopdf (a Rust library) has support for incremental documents: https://github.com/J-F-Liu/lopdf/blob/7f24a1c3ebc42470a37b4315b843331e4f81cdcd/src/incremental_document.rs#L4

Havunen commented Oct 22, 2024

@TM-Atharva

> Hey @packdat - I have used your code and i have successfully achieved sign on multiple places and page.
> Do you know how to set permissions on finally signed document?

Did you resolve that issue? If so, how?

TM-Atharva commented

@Havunen - Setting permissions on the signed document is still unresolved. We are looking for help too.
