@michelcrypt4d4mus

Apologies for not opening an issue first but this is a small change w/theoretically no impact.

I appreciated the new Font / FontDescriptor refactor in 6.6.0, but when I went to use it in pdfalyzer I noticed that the /FontFile is not extracted into the new FontDescriptor class, so I made a small change to do that.

lmk if this isn't up to snuff. also happy to add unit tests if people can point me in the direction of sample files that might be hairy around this kind of thing.

@michelcrypt4d4mus michelcrypt4d4mus changed the title Extract the /FontFile and store it in the new FileDescriptor object Extract the /FontFile and store it in the new FontDescriptor class Jan 16, 2026
@michelcrypt4d4mus michelcrypt4d4mus changed the title Extract the /FontFile and store it in the new FontDescriptor class ENH: Extract the /FontFile and store it in the new FontDescriptor class Jan 16, 2026
Collaborator

@stefan6419846 stefan6419846 left a comment

I would be okay with including these changes, under the condition that there are tests accordingly.

> also happy to add unit tests if people can point me in the direction of sample files that might be hairy around this kind of thing.

Given a local development setup, you should be able to do some basic grepping:

grep -r --text '/FontFile' resources/ sample-files/

Having only a quick look at the results, resources/ should already have examples for all three keys.
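
For setups without grep, a rough Python equivalent of the search above could look like the sketch below. It only lists files whose raw bytes contain the needle, so, like the grep call, it will miss keys that sit inside compressed object streams. The helper name `files_mentioning` is made up for illustration; the directory names are taken from the command above.

```python
from pathlib import Path


def files_mentioning(needle: bytes, *roots: str) -> list[Path]:
    """Return files under the given roots whose raw bytes contain `needle`.

    Hypothetical helper, not part of pypdf. Missing directories are skipped.
    """
    hits = []
    for root in roots:
        base = Path(root)
        if not base.is_dir():
            continue
        for path in sorted(base.rglob("*")):
            if path.is_file() and needle in path.read_bytes():
                hits.append(path)
    return hits


if __name__ == "__main__":
    # b"/FontFile" is also a prefix of /FontFile2 and /FontFile3,
    # so one search covers all three keys.
    for path in files_mentioning(b"/FontFile", "resources", "sample-files"):
        print(path)
```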

> I appreciated the new Font / FontDescriptor refactor in 6.6.0, but when I went to use it in pdfalyzer I noticed that the /FontFile is not extracted into the new FontDescriptor class, so I made a small change to do that.

Just as a note: These classes are not part of the public API at the moment, thus they might change/break at any point in time without a deprecation period.

@michelcrypt4d4mus michelcrypt4d4mus changed the title ENH: Extract the /FontFile and store it in the new FontDescriptor class ENH: Extract the /FontFile and store it in the new FontDescriptor class [WIP] Jan 16, 2026
@michelcrypt4d4mus
Author

michelcrypt4d4mus commented Jan 16, 2026

Cool. Addressed comments and marked this PR as [WIP] until I get a chance to poke around and set up a few useful tests.

@michelcrypt4d4mus michelcrypt4d4mus changed the title ENH: Extract the /FontFile and store it in the new FontDescriptor class [WIP] ENH: Extract the /FontFile and store it in the new FontDescriptor class Jan 16, 2026
@michelcrypt4d4mus
Author

think this is good to go; let me know if you want any other changes

@codecov

codecov bot commented Jan 16, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 97.35%. Comparing base (4740225) to head (9804af2).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #3602   +/-   ##
=======================================
  Coverage   97.35%   97.35%           
=======================================
  Files          55       55           
  Lines        9816     9827   +11     
  Branches     1792     1795    +3     
=======================================
+ Hits         9556     9567   +11     
  Misses        153      153           
  Partials      107      107           
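
The percentages in the table can be reproduced from the raw counts, assuming coverage is computed as hits divided by the sum of hits, misses, and partials (an assumption about how the report is derived, although the sums do match the reported line totals of 9816 and 9827). A minimal check:

```python
def coverage_percent(hits: int, misses: int, partials: int) -> float:
    """Coverage as a percentage, assuming hits / (hits + misses + partials)."""
    total = hits + misses + partials
    return round(100 * hits / total, 2)


# Counts copied from the coverage diff above.
base = coverage_percent(hits=9556, misses=153, partials=107)
head = coverage_percent(hits=9567, misses=153, partials=107)
print(base, head)
```

Under that assumption the 11 added lines are all hits, which is why the rounded project coverage does not move.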

bbox_tuple = tuple(map(float, font_kwargs["bbox"]))
assert len(bbox_tuple) == 4, bbox_tuple
font_kwargs["bbox"] = bbox_tuple
# Find the binary stream for this font if there is one
Collaborator

Suggested change
# Find the binary stream for this font if there is one
# Find the binary stream for this font if there is one
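
As a rough illustration of what "find the binary stream for this font" might involve: a font descriptor can reference its embedded font program under /FontFile, /FontFile2, or /FontFile3. The sketch below works on a plain dict and is not the code from this PR; `find_font_file` and the `PdfReadError` stand-in are hypothetical, and the error text only mirrors the message prefix checked in the test further down.

```python
from typing import Any, Optional

# The three descriptor keys that can reference an embedded font program.
FONT_FILE_KEYS = ("/FontFile", "/FontFile2", "/FontFile3")


class PdfReadError(Exception):
    """Stand-in for the library's error class, defined here for illustration."""


def find_font_file(descriptor: dict) -> Optional[Any]:
    """Return the embedded font stream referenced by `descriptor`, if any.

    Hypothetical helper: raises if more than one of the keys is present.
    """
    present = [key for key in FONT_FILE_KEYS if key in descriptor]
    if len(present) > 1:
        raise PdfReadError(f"More than one /FontFile found in {descriptor!r}")
    return descriptor[present[0]] if present else None
```

Returning `None` when no key is present keeps the lookup optional, which fits fonts that are not embedded.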

reader.pages[0].extract_text() # no error
font = Font.from_font_resource(reader.pages[0]["/Resources"]["/Font"]["/F"])
assert font.character_map["\x01"] == "Ü"
assert type(font.font_descriptor.font_file) is EncodedStreamObject
Collaborator

Suggested change
- assert type(font.font_descriptor.font_file) is EncodedStreamObject
+ assert isinstance(font.font_descriptor.font_file, EncodedStreamObject)
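
The practical difference between the two forms shows up with subclasses: `type(x) is C` only passes for exact instances of `C`, while `isinstance(x, C)` also accepts subclasses. A small sketch with stand-in classes (these are illustrative and not imported from the library):

```python
class StreamObject:
    """Stand-in base class for illustration."""


class EncodedStreamObject(StreamObject):
    """Stand-in for the class used in the assertion."""


class CustomEncodedStream(EncodedStreamObject):
    """Hypothetical subclass that a later refactor might introduce."""


obj = CustomEncodedStream()

exact_match = type(obj) is EncodedStreamObject       # False: exact type differs
compatible = isinstance(obj, EncodedStreamObject)    # True: subclass instance
print(exact_match, compatible)
```

With `isinstance`, the test keeps passing if the returned object later becomes a subclass of the expected type.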

assert type(font.font_descriptor.font_file) is EncodedStreamObject
assert len(font.font_descriptor.font_file.get_data()) == 28464

with pytest.raises(PdfReadError) as exception:
Collaborator

@stefan6419846 stefan6419846 Jan 19, 2026

Suggested change
- with pytest.raises(PdfReadError) as exception:
+ with pytest.raises(PdfReadError, match=r"^More than one /FontFile found in .+$"):
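
The `match` argument of `pytest.raises` is checked with `re.search` against the string form of the raised exception, so the anchors `^` and `$` pin the whole message. The stdlib-only sketch below emulates that check to show what the suggested pattern accepts. The `PdfReadError` class here is a stand-in, and the text after "found in" is invented for the example.

```python
import re

PATTERN = r"^More than one /FontFile found in .+$"


class PdfReadError(Exception):
    """Stand-in for the library's error class, defined here for illustration."""


def message_matches(exc: Exception, pattern: str) -> bool:
    """Emulate the match check: re.search against str(exc)."""
    return re.search(pattern, str(exc)) is not None


# Hypothetical messages; only the prefix comes from the suggested pattern.
expected = PdfReadError("More than one /FontFile found in {'/FontFile': 1, '/FontFile2': 2}")
unrelated = PdfReadError("No /FontFile found")

print(message_matches(expected, PATTERN), message_matches(unrelated, PATTERN))
```

Folding the message check into `match` removes the need to capture the exception and assert on it separately.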
