ENH: Extract the /FontFile and store it in the new FontDescriptor class #3602

michelcrypt4d4mus · 2026-01-16T11:08:18Z

Apologies for not opening an issue first, but this is a small change with theoretically no impact.

I appreciated the new Font / FontDescriptor refactor in 6.6.0, but when I went to use it in pdfalyzer I noticed that the /FontFile is not extracted into the new FontDescriptor class, so I made a small change to do that.

Let me know if this isn't up to snuff. I'm also happy to add unit tests if people can point me in the direction of sample files that might be hairy around this kind of thing.
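
For readers unfamiliar with the keys involved, the lookup described above can be sketched roughly as follows. This is only an illustration, not the code from this PR: a plain dict stands in for a /FontDescriptor dictionary, `find_font_file` is a hypothetical helper name, and `ValueError` stands in for whatever exception the library actually raises. The three key names and the error wording are taken from later in this thread.

```python
from typing import Any, List, Mapping, Optional

# The three keys a PDF font descriptor may use for an embedded font program.
FONT_FILE_KEYS = ("/FontFile", "/FontFile2", "/FontFile3")


def find_font_file(descriptor: Mapping[str, Any]) -> Optional[Any]:
    """Return the embedded font stream from a descriptor-like mapping, if any.

    `descriptor` is a plain mapping standing in for a /FontDescriptor
    dictionary. This helper is hypothetical and not part of pypdf.
    """
    present: List[str] = [key for key in FONT_FILE_KEYS if key in descriptor]
    if len(present) > 1:
        # ValueError is an assumption; the real library may use its own error type.
        raise ValueError(f"More than one /FontFile found in {present}")
    return descriptor[present[0]] if present else None


# A descriptor with one embedded font program, and one with none.
print(find_font_file({"/FontName": "/ABCDEF+Demo", "/FontFile2": b"\x00\x01"}))
print(find_font_file({"/FontName": "/Helvetica"}))
```

Treating more than one key as an error, rather than silently picking one, keeps a malformed descriptor visible to the caller.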

stefan6419846

I would be okay with including these changes, on the condition that they come with corresponding tests.

> also happy to add unit tests if people can point me in the direction of sample files that might be hairy around this kind of thing.

Given a local development setup, you should be able to do some basic grepping:

grep -r --text '/FontFile' resources/ sample-files/

From only a quick look at the results, resources/ should already have examples for all three keys (/FontFile, /FontFile2, /FontFile3).
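
If grep is not available, roughly the same search can be sketched in Python. This is a minimal stand-in under stated assumptions: `files_containing` is a hypothetical helper, and it does a raw byte search just like the grep call, so it matches /FontFile2 and /FontFile3 as well and would miss keys that only appear inside compressed object streams.

```python
from pathlib import Path
from typing import Iterable, List


def files_containing(roots: Iterable[str], needle: bytes = b"/FontFile") -> List[Path]:
    """Return files under the given directories whose raw bytes contain `needle`."""
    matches: List[Path] = []
    for root in roots:
        root_path = Path(root)
        if not root_path.is_dir():
            # Skip directories that do not exist in this checkout.
            continue
        for path in sorted(root_path.rglob("*")):
            if path.is_file() and needle in path.read_bytes():
                matches.append(path)
    return matches


# Directory names taken from the grep command above.
for path in files_containing(["resources", "sample-files"]):
    print(path)
```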

> I appreciated the new Font / FontDescriptor refactor in 6.6.0, but when I went to use it in pdfalyzer I noticed that the /FontFile is not extracted into the new FontDescriptor class, so I made a small change to do that.

Just as a note: these classes are not part of the public API at the moment, so they might change or break at any time without a deprecation period.

michelcrypt4d4mus · 2026-01-16T11:49:32Z

Cool. Addressed the comments and marked this PR as [WIP] until I get a chance to poke around and set up a few useful tests.

michelcrypt4d4mus · 2026-01-16T12:58:18Z

I think this is good to go; let me know if you want any other changes.

codecov · 2026-01-16T13:00:40Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 97.35%. Comparing base (4740225) to head (9804af2).

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #3602   +/-   ##
=======================================
  Coverage   97.35%   97.35%           
=======================================
  Files          55       55           
  Lines        9816     9827   +11     
  Branches     1792     1795    +3     
=======================================
+ Hits         9556     9567   +11     
  Misses        153      153           
  Partials      107      107
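
The unchanged percentage can be checked by hand. The sketch below assumes coverage is computed as hits divided by hits + misses + partials, which is consistent with the Lines row of the table; `coverage_percent` is a hypothetical helper, not part of Codecov.

```python
def coverage_percent(hits: int, misses: int, partials: int) -> float:
    """Percentage of lines fully hit; partial lines count as not covered here."""
    total = hits + misses + partials
    return round(100 * hits / total, 2)


# Figures from the base and head columns of the report above.
base = coverage_percent(hits=9556, misses=153, partials=107)  # 9816 lines
head = coverage_percent(hits=9567, misses=153, partials=107)  # 9827 lines
print(base, head)  # 97.35 97.35
```

All 11 added lines are hits, so the ratio moves only in the third decimal place and rounds to the same value.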

stefan6419846 · 2026-01-19T12:40:15Z

pypdf/_font.py

            bbox_tuple = tuple(map(float, font_kwargs["bbox"]))
            assert len(bbox_tuple) == 4, bbox_tuple
            font_kwargs["bbox"] = bbox_tuple
+        # Find the binary stream for this font if there is one


Suggested change

# Find the binary stream for this font if there is one

# Find the binary stream for this font if there is one

stefan6419846 · 2026-01-19T12:40:51Z

tests/test_cmap.py

    reader.pages[0].extract_text()  # no error
    font = Font.from_font_resource(reader.pages[0]["/Resources"]["/Font"]["/F"])
    assert font.character_map["\x01"] == "Ü"
+    assert type(font.font_descriptor.font_file) is EncodedStreamObject


Suggested change

-    assert type(font.font_descriptor.font_file) is EncodedStreamObject
+    assert isinstance(font.font_descriptor.font_file, EncodedStreamObject)
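
The practical difference between the two assertions shows up with subclasses. A minimal sketch, using hypothetical stand-in classes rather than the real pypdf types:

```python
class StreamLike:
    """Hypothetical base class standing in for a generic stream type."""


class EncodedStreamLike(StreamLike):
    """Hypothetical stand-in for an encoded stream type."""


class CustomEncodedStream(EncodedStreamLike):
    """A subclass that a future refactor might introduce."""


obj = CustomEncodedStream()

# Exact type comparison rejects subclasses.
exact_match = type(obj) is EncodedStreamLike
# isinstance accepts the class itself and any subclass.
subclass_aware = isinstance(obj, EncodedStreamLike)

print(exact_match, subclass_aware)  # False True
```

With `isinstance`, the test keeps passing if the returned object later becomes a subclass of the expected type.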

stefan6419846 · 2026-01-19T12:44:05Z

tests/test_font.py

+    assert type(font.font_descriptor.font_file) is EncodedStreamObject
+    assert len(font.font_descriptor.font_file.get_data()) == 28464
+
+    with pytest.raises(PdfReadError) as exception:


Suggested change

-    with pytest.raises(PdfReadError) as exception:
+    with pytest.raises(PdfReadError, match=r"^More than one /FontFile found in .+$"):
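
The `match` argument of `pytest.raises` is applied with `re.search` to the string form of the raised exception, so the pattern in the suggestion can be tried out with the standard library alone. The sample messages below are made up for illustration; only the pattern comes from the review.

```python
import re

# Pattern copied from the suggested change above.
PATTERN = r"^More than one /FontFile found in .+$"


def message_matches(message: str) -> bool:
    """Mimic how pytest.raises(match=...) checks an exception message."""
    return re.search(PATTERN, message) is not None


# Hypothetical messages; the real text after "found in" depends on the library.
print(message_matches("More than one /FontFile found in {'/Type': '/FontDescriptor'}"))  # True
print(message_matches("No /FontFile found"))  # False
```

Compared with capturing the exception and inspecting it afterwards, the pattern keeps the expected message next to the `raises` call.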

Extract the /FontFile and store it in the new FileDescriptor object

b32ee5f

michelcrypt4d4mus changed the title ~~Extract the /FontFile and store it in the new FileDescriptor object~~ Extract the /FontFile and store it in the new FontDescriptor class Jan 16, 2026

michelcrypt4d4mus changed the title ~~Extract the /FontFile and store it in the new FontDescriptor class~~ ENH: Extract the /FontFile and store it in the new FontDescriptor class Jan 16, 2026

ashariyar added 3 commits January 16, 2026 06:11

alphabetize imports

88af6c6

use Union for python 3.9

4416a36

cast types

4a41aa5

stefan6419846 reviewed Jan 16, 2026

address comments, fix test

7783907

michelcrypt4d4mus changed the title ~~ENH: Extract the /FontFile and store it in the new FontDescriptor class~~ ENH: Extract the /FontFile and store it in the new FontDescriptor class [WIP] Jan 16, 2026

ashariyar added 6 commits January 16, 2026 07:17

add tests for /FontFile, /FontFile2, /FontFile3

e9a7d2d

Fix failing test

4f087af

Fix failing test

17a7bc7

fix ruff test

c2c61c7

fix style

a743058

unused import

73a7b43

michelcrypt4d4mus changed the title ~~ENH: Extract the /FontFile and store it in the new FontDescriptor class [WIP]~~ ENH: Extract the /FontFile and store it in the new FontDescriptor class Jan 16, 2026

ashariyar added 4 commits January 16, 2026 08:00

remove unnecessary @pytest.mark.enable_socket (all test files are local

acc25ce

unused import

ddd41be

test coverage for error case

89397d6

ruff

9804af2

stefan6419846 reviewed Jan 19, 2026

stefan6419846 mentioned this pull request Jan 21, 2026

Feature idea: make get_fonts also return the font objects #3607

Open
