Proposal: Incremental saving #112

packdat · 2024-05-04T10:24:30Z

I was recently tasked with evaluating whether existing documents can be "stamped".
A "stamp" is literally an image of an actual stamp that should be added to specific pages of a document.
The problem is that the document may be signed, so the stamp has to be added in a non-destructive manner to keep the signature intact.

I started to hack around and was able to come up with something that seems to work.

The idea was to just track changes to arrays (PdfArray) and dictionaries (PdfDictionary).
All other objects are basically immutable so this approach should work in theory.
Also, a new PdfDocumentOpenMode was added, namely Append.
When a document is opened in this mode, it starts to track changes to arrays and dictionaries.
When saving the document, only changed/added objects are saved; the changes are appended to the existing document.
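A minimal sketch of the tracking idea, independent of PDFsharp. The types `TrackedDictionary` and `TrackedDocument` and the `TrackChanges` flag are hypothetical stand-ins, not the classes changed in this PR: writes report the owning object to the document, reads do not, and tracking is only switched on once loading has finished.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a mutable PDF object (dictionary or array).
public sealed class TrackedDictionary
{
    private readonly Dictionary<string, object> _items = new Dictionary<string, object>();
    private readonly Action<TrackedDictionary> _onModified;

    public TrackedDictionary(int objectNumber, Action<TrackedDictionary> onModified)
    {
        ObjectNumber = objectNumber;
        _onModified = onModified;
    }

    public int ObjectNumber { get; }

    public object this[string key]
    {
        get => _items[key];        // reading never marks the object
        set
        {
            _items[key] = value;
            _onModified(this);     // every write reports the owning object
        }
    }
}

// Hypothetical document that collects the numbers of modified objects.
public sealed class TrackedDocument
{
    private readonly HashSet<int> _modified = new HashSet<int>();

    // Off while loading, on once the document is open for appending.
    public bool TrackChanges { get; set; }

    public TrackedDictionary CreateDictionary(int objectNumber) =>
        new TrackedDictionary(objectNumber, d =>
        {
            if (TrackChanges) _modified.Add(d.ObjectNumber);
        });

    // Only these objects would be written to the appended section.
    public IReadOnlyCollection<int> ModifiedObjects => _modified;
}

public static class TrackingDemo
{
    public static IReadOnlyCollection<int> Run()
    {
        var doc = new TrackedDocument();
        var catalog = doc.CreateDictionary(1);
        var page = doc.CreateDictionary(3);
        catalog["/Type"] = "/Catalog";   // loading: not tracked
        page["/Type"] = "/Page";         // loading: not tracked

        doc.TrackChanges = true;         // comparable to opening in Append mode
        page["/Rotate"] = 90;            // tracked
        _ = catalog["/Type"];            // a read, not tracked
        return doc.ModifiedObjects;
    }
}
```

In this sketch `TrackingDemo.Run()` reports only object 3, because object 1 was populated before tracking was enabled and was only read afterwards.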

Basic code (taken from the included test-case):

            // The stream must be opened with ReadWrite access.
            using var fs = File.Open(targetFile, FileMode.Open, FileAccess.ReadWrite);
            var doc = PdfReader.Open(fs, PdfDocumentOpenMode.Append);
            
            // modify the document, e.g. add content
            var page = doc.Pages[0];
            using var gfx = XGraphics.FromPdfPage(page);
            gfx.DrawString("I was added", new XFont("Arial", 16), new XSolidBrush(XColors.Red), 40, 40);

            // append changes to the document
            doc.Save(fs, true);

More may be needed for this to work consistently (e.g. I haven't tested with encrypted documents yet, as I was told the documents I have to work with will not be encrypted).
This change also does not handle the case where objects were deleted from a document.
These objects would need to be tracked separately, as they need special entries in the new XREF table.
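For context, in a classic (non-stream) cross-reference table each entry is exactly 20 bytes, and a deleted object is written as a free (`f`) entry that links to the next free object, with object 0 heading that list. The sketch below only formats such entries; `XrefSketch` and its methods are hypothetical names, and it assumes the original file had no free objects other than object 0. A real appended section would also list the modified in-use objects and end with a trailer whose `/Prev` entry points at the previous cross-reference section.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text;

public static class XrefSketch
{
    // In-use entry: 10-digit byte offset, 5-digit generation, 'n', 2-byte end of line.
    public static string InUse(long offset, int generation) =>
        string.Format(CultureInfo.InvariantCulture, "{0:D10} {1:D5} n \n", offset, generation);

    // Free entry: 10-digit number of the next free object, 5-digit generation, 'f'.
    public static string Free(int nextFree, int generation) =>
        string.Format(CultureInfo.InvariantCulture, "{0:D10} {1:D5} f \n", nextFree, generation);

    // Builds the xref subsections that mark the given objects as deleted.
    // Assumption: no other free objects exist, so the list ends at object 0.
    public static string DeletedSection(IEnumerable<(int Number, int Generation)> deleted)
    {
        var sorted = deleted.OrderBy(d => d.Number).ToList();
        var sb = new StringBuilder("xref\n");

        // Object 0 heads the free list and always has generation 65535.
        sb.Append("0 1\n");
        sb.Append(Free(sorted.Count > 0 ? sorted[0].Number : 0, 65535));

        for (int i = 0; i < sorted.Count; i++)
        {
            int next = i + 1 < sorted.Count ? sorted[i + 1].Number : 0;
            // One subsection per object: "<first object number> <entry count>".
            sb.Append(sorted[i].Number.ToString(CultureInfo.InvariantCulture)).Append(" 1\n");
            // The generation is incremented so a reused number gets a new generation.
            sb.Append(Free(next, sorted[i].Generation + 1));
        }
        return sb.ToString();
    }
}
```

Documents that use cross-reference streams instead of tables encode the same information differently, so this sketch does not apply to them as written.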

One potential issue I spotted is that the library modifies the document when you merely read certain properties, thus accidentally marking those objects as "modified" although you haven't changed anything.
One example is the *Box properties of PdfPage (e.g. TrimBox, CropBox, ...).
If you read these properties and the document does not already contain values for them, a new value is added to the underlying dictionary.

I haven't looked too deeply, but I expect there are more cases like that.
I have changed the *Box properties to just return PdfRectangle.Empty when there is no value instead of adding a new value.

There is also the case of type transformations (e.g. exchanging a PdfDictionary for a more specific type like PdfPage).
These transformations happen "under the hood" and would normally also cause objects to be marked as modified.
I tried to prevent that by temporarily ignoring changes during the type transformations, using the new method:

     PdfCrossReferenceTable.IgnoreModify(Action action)

This is quite "hack-ish"; maybe you have better ideas on how to tackle this?
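One possible alternative, offered only as a sketch: a counted, disposable suppression scope instead of a callback. `ChangeTracker`, `Suppress` and `ReportModified` are hypothetical names and not part of PDFsharp or of this PR. The counter lets scopes nest, and `using` restores tracking even if the transformation throws.

```csharp
using System;

// Hypothetical tracker; not the PdfCrossReferenceTable from this PR.
public sealed class ChangeTracker
{
    private int _suppressCount;

    public bool IsSuppressed => _suppressCount > 0;

    // Number of modifications that were actually recorded.
    public int RecordedChanges { get; private set; }

    // Modifications reported while the returned scope is alive are ignored.
    public IDisposable Suppress()
    {
        _suppressCount++;
        return new Scope(this);
    }

    public void ReportModified()
    {
        if (!IsSuppressed) RecordedChanges++;
    }

    private sealed class Scope : IDisposable
    {
        private readonly ChangeTracker _owner;
        private bool _disposed;

        public Scope(ChangeTracker owner) => _owner = owner;

        public void Dispose()
        {
            if (_disposed) return;     // guard against double dispose
            _disposed = true;
            _owner._suppressCount--;
        }
    }
}
```

A type transformation would then be wrapped as `using (tracker.Suppress()) { ... }`, and any `ReportModified()` call made inside the block is ignored while calls made after it are recorded again.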

packdat · 2024-07-31T17:22:52Z

Deleted objects are now handled when saving incrementally.

Note:
Still untested with encrypted documents due to a lack of time.

TM-Atharva · 2024-10-02T14:40:02Z

Hey @packdat - I have used your code and successfully signed in multiple places and on multiple pages.
Do you know how to set permissions on the final signed document?

Havunen · 2024-10-22T08:16:19Z

PDFsharp 6.2 is adding support for digital signatures. Incremental saving would be a great addition to that, as it would allow multiple signatures that each fully cover the document.

EU validator: https://ec.europa.eu/digital-building-blocks/DSS/webapp-demo/validation
extra info: https://eideasy.com/pdf-pades-external-digital-signatures-using-etsi-cades-detached/
lopdf (Rust) has support for incremental documents: https://github.com/J-F-Liu/lopdf/blob/7f24a1c3ebc42470a37b4315b843331e4f81cdcd/src/incremental_document.rs#L4

Havunen · 2024-10-22T12:39:13Z

@TM-Atharva

> Hey @packdat - I have used your code and successfully signed in multiple places and on multiple pages.
> Do you know how to set permissions on the final signed document?

Did you resolve that issue? If so, how?

TM-Atharva · 2024-10-22T12:49:47Z

@Havunen - Setting permissions on the signed document is still not done. We are looking for help too.

packdat added 3 commits May 4, 2024 11:11

First draft for incremental saving

0e29203

Merge branch 'master' of https://github.com/empira/PDFsharp into IncrementalSave

63006a2

POC: Document signing based on empira#48 with support for multiple signatures

acd78bf

packdat mentioned this pull request Jun 15, 2024

Support of Digital Signatures (PKCS#7) + TSA timestamp #48

Open

julienrffr mentioned this pull request Jun 17, 2024

Support of incremental updates? #111

Open

packdat mentioned this pull request Jul 23, 2024

Deleted page not "really" deleted #141

Closed

Handle deleted objects when saving incrementally

a11a5ea

packdat marked this pull request as ready for review July 31, 2024 17:23
