Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

struct.pack() - the "p" case needs a revisit #124220

Open
rehierl opened this issue Sep 18, 2024 · 0 comments
Open

struct.pack() - the "p" case needs a revisit #124220

rehierl opened this issue Sep 18, 2024 · 0 comments
Labels
type-bug An unexpected behavior, bug, or error

Comments

@rehierl
Copy link

rehierl commented Sep 18, 2024

Bug report

Bug description:

There are several issues with the "p" format case of the struct.pack() function. The documentation is unclear about the meaning of the $count value. Furthermore, the function may even end up writing more than 255 characters, which is in conflict with the documentation, and even with the purpose of the "p" case (short pascal strings) - i.e. an actual ERROR/BUG.

  • platform.platform() - 'Linux-5.15.0-119-generic-x86_64-with-glibc2.35'
  • sys.version - '3.10.12 (main, Jul 29 2024, 16:56:48) [GCC 11.4.0]'
  • sys.byteorder - 'little'

The documentation on section (8): the "p" format case is quite unclear in regards to the $count value. The contents of that section could be rephrased as (to reflect my current understanding):

  1. the 1st byte stored is the length of the string, or 255, whichever is smaller. the bytes of the string follow.
  2. if the string is too long (longer than $count-1), only the leading ($count-1) bytes are stored.
  3. if the string is shorter than ($count-1), it is padded with null bytes so that exactly $count bytes in all are used.
  4. note that for unpack(), the "p" format character consumes $count bytes, but the string returned can never contain more than 255 bytes.

NOTE that the documentation does not clearly state that, with pack(), $count needs to also include the length byte. Only (4) provides some hint. In short: If you intend to write $num character bytes, then $count needs to have a value of ($num+1).

#- the output buffer is empty ...
buf = struct.pack("<0p", b'a')
print(buf) #- b''

Even though this is not a strict error, the response to a format string "0p" should not be an empty buffer. Python should either raise an error, or write a length value of 0. That is because writing no output at all will most likely break the programmer's intended output.

NOTE that (1) states that the $length value (not the $count) value is defined to be within the range (0,254). Apart from the $length value, the pack() function will therefore only write up to 254 character bytes. The returned buffer is therefore restricted to a maximum of 255 bytes. That aspect seems to be related to (4).

  • There should be a clear warning that the "p" case will not write more than 254 character bytes.
  • The mention of the numeric value 255 (\xff) is misleading. If the pack() function only writes strings of 0 to 254 characters, then the length value should only ever hold a numeric value within the range (0,254).
  • Either that, or it needs to be explained what the meaning of that value is.
#- an input string in excess of 254 character bytes
input = b'12345678901234567890123456789012345678901234567890' \
  b'12345678901234567890123456789012345678901234567890' \
  b'12345678901234567890123456789012345678901234567890' \
  b'12345678901234567890123456789012345678901234567890' \
  b'12345678901234567890123456789012345678901234567890' \
  b'1234567890'
num = len(input) #- 260

#- print(buf) => b'\xff
#  12345678901234567890123456789012345678901234567890
#  12345678901234567890123456789012345678901234567890
#  12345678901234567890123456789012345678901234567890
#  12345678901234567890123456789012345678901234567890
#  12345678901234567890123456789012345678901234567890
#  1234567890\x00'
buf = struct.pack("<262p", input)
print(buf)

#- ERROR - this assert() will trigger (!)
num = len(buf) #- 262 (!)
assert (num <= 255)

If pack() is instructed to write 261 characters, then 261 character bytes plus one length byte with the numeric value 255 (\xff) will be written. Obviously, a reader process has no means to determine just how many character bytes follow the length value. Because of that, this output can only be considered broken.

Since section (9) states that the $count value in the "s" case has a different meaning, one could assume that the $count value for the "p" case could be "redefined" such that it refers to the number of character bytes one intends to write. However, that is a matter for discussion.

Since freepascal.org does not mention that a length value of 255 needs to be understood to introduce an "extended" encoding (e.g. introduces a subsequent short-encoded length value), I can only assume that the restriction to 254 character bytes is due to the underlying implementation. The question is, if that is a hard limit that cannot be changed?

Either way, the pack() function in the "p" case should only write its output such that the $length value accurately reflects the amount of character bytes that follow.

CPython versions tested on:

3.10

Operating systems tested on:

Linux

@rehierl rehierl added the type-bug An unexpected behavior, bug, or error label Sep 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

No branches or pull requests

1 participant