Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Defragment incoming TLS handshake messages #9872

Merged

Conversation

rojer
Copy link
Contributor

@rojer rojer commented Dec 25, 2024

Description

Defragment incoming TLS handshake messages.

Fixes #1840

PR checklist

@Neustradamus
Copy link

To follow this PR.

@Harry-Ramsey Harry-Ramsey added enhancement needs-review Every commit must be reviewed by at least two team members, needs-reviewer This PR needs someone to pick it up for review labels Dec 27, 2024
@mpg mpg requested review from mpg and waleed-elmelegy-arm January 2, 2025 09:31
@waleed-elmelegy-arm
Copy link
Contributor

Thanks @rojer for the PR. do you currently have the capacity to add automated tests?

@rojer
Copy link
Contributor Author

rojer commented Jan 9, 2025

Thanks @rojer for the PR. do you currently have the capacity to add automated tests?

not in the near future, unfortunately. holidays are over, and so is my free time.

@minosgalanakis minosgalanakis self-requested a review January 13, 2025 10:39
@mpg mpg removed their request for review January 13, 2025 10:39
@mpg mpg removed the needs-reviewer This PR needs someone to pick it up for review label Jan 13, 2025
Co-authored-by: minosgalanakis <[email protected]>
Signed-off-by: Deomid Ryabkov <[email protected]>
@minosgalanakis minosgalanakis self-requested a review January 16, 2025 15:27
Copy link
Contributor

@minosgalanakis minosgalanakis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall, some minor comments.

Signed-off-by: Deomid rojer Ryabkov <[email protected]>
@waleed-elmelegy-arm
Copy link
Contributor

This check here

if (ssl->in_msglen < mbedtls_ssl_hs_hdr_len(ssl)) {
        MBEDTLS_SSL_DEBUG_MSG(1, ("handshake message too short: %" MBEDTLS_PRINTF_SIZET,
                                  ssl->in_msglen));
        return MBEDTLS_ERR_SSL_INVALID_RECORD;
    }

causes some failures if part of the message is less than the minimum. I added a fix in #9928 that checks if hslen is equal to zero but this will fail if the fragment length is 1, also we need to add check if the final message less the minimum so not sure if this is the best way.

Also we might want to add a config to enable or disable handshake fragmentation?

@rojer
Copy link
Contributor Author

rojer commented Jan 26, 2025

if (ssl->in_msglen < mbedtls_ssl_hs_hdr_len(ssl)) {

ok, i see what you mean - good catch: we require that fragment be at least 4 bytes, which is somewhat of an arbitrary requirement. your change relaxes it to only the first fragment being at least 4 bytes.
i adopted your check with a slight change: i am testing ssl->in_hslen, which i think is the same for the purpose of checking if it is the first fragment or not. in parctice, of course, for a fragmented handshake we will have in_hslen == in_hsfraglen == 0 on the first fragment, then in_hslen set to some value X from the header, and in_hsfraglen set to some Y < X.
we could test either of them but i find that in_hslen == 0 is easier to understand.

in theory we should be able to reassemble a message chopped into 1 byte fragments, but i think it's unlikely to occur in practice (whereas tail fragments of 1 byte are are more likely). supporting fragmented handshake message header would require significant change to the code, so perhaps leave it for the future?

Also we might want to add a config to enable or disable handshake fragmentation?

@mpg suggested that defragmentation (which is what we're doing here) should always happen, and fragmentation on the way out is controlled by negotiated max fragment length / record size (or rather will be in the future).

rojer added 2 commits January 26, 2025 11:12
Signed-off-by: Deomid rojer Ryabkov <[email protected]>
Except the first

Signed-off-by: Deomid rojer Ryabkov <[email protected]>
@mpg
Copy link
Contributor

mpg commented Jan 27, 2025

in theory we should be able to reassemble a message chopped into 1 byte fragments, but i think it's unlikely to occur in practice (whereas tail fragments of 1 byte are are more likely). supporting fragmented handshake message header would require significant change to the code, so perhaps leave it for the future?

Agreed, we're not trying to achieve maximal coverage of all cases that could theoretically happen, just the ones that are likely to happen in practice.

@mpg suggested that defragmentation (which is what we're doing here) should always happen, and fragmentation on the way out is controlled by negotiated max fragment length / record size (or rather will be in the future).

Indeed, and that's still my opinion. There's a big difference between defragmentation (reassembly) and fragmentation: we control what we send, not what we receive. The peer is free to send fragmented messages at any time for any reason, and this doesn't need to be negotiated, that's part of the core TLS standard and now it's something that popular servers happen to do in practice. So I'm inclined to think we should always be prepared for it: if we're not, we'll fail handshakes for no good reason (the current situation before this PR).

Signed-off-by: Deomid rojer Ryabkov <[email protected]>
@minosgalanakis
Copy link
Contributor

This PR has been tested alongside #9928 in #9948 . We may use the latter as a way of intergrating and merging both PR's in.

Also as a side note commit afa11db needed the change introduced by 31f2d82 but I may address that #9948 .

It looks good so far, thank you for updating the contributions promtly.

Copy link
Contributor

@mpg mpg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As Waleed is on holiday, I'm taking over as a reviewer. I didn't re-review everything, just the last commits that Waleed hadn't approved yet, which look good to me - but I couldn't help look at the ChangeLog entry though.

Also, as Minos already mentioned, can you cherry-pick the commit that fixes the "unused variable" warning/error in builds without DEBUG_C? CI is unlikely to pass without that.

@@ -0,0 +1,2 @@
Changes
* Defragment incoming TLS handshake messages.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can eve make classify that as a bugfix: (1) there is nothing in the spec that says support for receiving fragemented messages is optional (quite the opposite: it's the first point in the implementation pitfalls section), and (2) it was causing interop failures in practice.

Also, I think we should expand a bit so that users who are not aware of TLS internals may make more sense of the entry, perhaps:

Bugfix
   * Support re-assembly of fragmented handshake messages in TLS, as mandated
     by the spec. Lack of support was causing handshake failures with some
     servers, especially with TLS 1.3 in practice (though both protocol
     version could be affected in principle, and both are fixed now).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -1808,6 +1808,8 @@ struct mbedtls_ssl_context {

size_t MBEDTLS_PRIVATE(in_hslen); /*!< current handshake message length,
including the handshake header */
unsigned char *MBEDTLS_PRIVATE(in_hshdr); /*!< original handshake header start */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more comment that was posted on the tests's pr 9928

@gilles-peskine-arm commented:

Introducing a new buffer pointer creates two risks: buffer overflow, and use-after-free.

I would prefer to have more clarity as to the size of the (sub-)buffer accessible from in_hshdr (it isn't in_hsfraglen), and also the impact on the in_hdr field (which has become less straightforward). I don't have any bad feeling about the current code here, but I'd prefer to have better documentation to help future maintainers (I do have a bad feeling about someone later fixing a bug that's unrelated to fragmentation, and missing some interaction between their bug and fragmentation).

I am a little worried about a use-after-free. handle_buffer_resizing takes care of this pointer, and it's the only risky place I can think of, but I am not very confident. Please reset in_hshdr in mbedtls_ssl_update_in_pointers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in_hshdr points to the first handshake fragment and may legitimately stay behind the current/last message and thus should not be reset in mbedtls_ssl_update_in_pointers.

the overview of the approach to defragmentation is as follows:

  1. we start with an empty buffer, in_hslen == 0 and in_hshdr == NULL
  2. when the first record of the handshake message is received (which has to be at least 4 bytes) we parse the expected length into in_hslen, set in_hshdr = in_hdr and in_hsfraglen = in_msglen.
  3. as more fragments arrive, we accumulate them in the buffer and keep track of the total length so far in_hsfraglen until it covers the entire expected handshake message. during this time in_hdr advances forward while in_hshdr stays behind.
  4. once we have enough fragments buffered, we merge them, starting from in_hshdr, removing the record headers and obtaining a complete handshake message, which we then process and start over.

i hope this makes it clear. any suggestions wrt comments or documentation are welcome.

as a more general note, i think that an extensive comment or perhaps even a readme explaining the ways in/out buffers and pointers are managed would be helpful. it took me a while to wrap my head around it when i started.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation, that's very useful!

Would you mind adding comments? Both on the context fields, and also a comment in mbedtls_ssl_update_in_pointers to explain why this in-pointer should not be updated.

i think that an extensive comment or perhaps even a readme explaining the ways in/out buffers and pointers are managed would be helpful. it took me a while to wrap my head around it when i started.

Absolutely, yes! The more we add features like defragmentation, the more complicated it is, and it doesn't help that there's so little documentation. Whenever we identify new wisdom about how things work, we should write them down inside the code, even if it's incomplete.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually, upon reflection i realized that we don't need to store this pointer: it will always be the first message in the buffer. so, instead in dd14c0a i'm removing it.

i tested manually with facebook as well as #9928 with this change - defrag tests are passing.

i also expanded in_hsfraglen comment a bit to clarify that it goes from 0 up to in_hslen.

rojer and others added 2 commits February 5, 2025 13:09
Signed-off-by: Deomid rojer Ryabkov <[email protected]>
Signed-off-by: Waleed Elmelegy <[email protected]>
Signed-off-by: Deomid rojer Ryabkov <[email protected]>
The first fragment of a fragmented handshake message always starts at the beginning of the buffer so there's no need to store it.

Signed-off-by: Deomid rojer Ryabkov <[email protected]>
@mpg mpg changed the base branch from development to features/tls-defragmentation/development February 17, 2025 12:02
@mpg mpg requested review from mpg and minosgalanakis February 17, 2025 12:02
Copy link
Contributor

@mpg mpg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (again, incremental review, did not review things that were already approved by Waleed).

@mpg
Copy link
Contributor

mpg commented Feb 19, 2025

@rojer For your information, our new plan (since Monday) is to merge this in a temporary feature branch as soon as this is approved (and the 3.6 backport), so that (1) this doesn't have to wait for the various testing PRs and (2) the various testing PRs have a more stable basis in the meantime. Once all the testing PRs (and possibly additional PRs fixing any issues found by the tests) are merged in the feature branch, we'll merge the feature branch into the main branches (development and 3.6).

Copy link
Contributor

@minosgalanakis minosgalanakis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@mpg mpg added approved Design and code approved - may be waiting for CI or backports needs-backports Backports are missing or are pending review and approval. and removed needs-review Every commit must be reviewed by at least two team members, needs-backports Backports are missing or are pending review and approval. labels Feb 19, 2025
@mpg mpg added this pull request to the merge queue Feb 21, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to no response for status checks Feb 21, 2025
@mpg mpg merged commit 28f8e20 into Mbed-TLS:features/tls-defragmentation/development Feb 24, 2025
3 of 4 checks passed
Comment on lines +3311 to +3313
ssl->in_hdr = ssl->in_msg + ssl->in_msglen;
ssl->in_msglen = 0;
mbedtls_ssl_update_in_pointers(ssl);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Testing in #9989 reveals a bug: defragmentation of encrypted records does not work correctly in TLS 1.2 when the symmetric encryption is CBC (EtM or not), CCM or GCM. It works with any TLS 1.3 cipher suite, and with ChachaPoly and null encryption in TLS 1.2. The problematic cases are exactly the ones where the record includes an explicit IV (always 8 bytes).

In the initial handshake, only the Finished message is likely to be affected, and it's only 16 bytes, so this happens when the fragment size is less than 16 bytes. This can happen with larger fragment sizes during renegotiation.

In problematic scenarios, the fragment reassembly loop looks for the second fragment 8 bytes too early in the buffer. There is an 8-byte gap between the fragments, but the reassembly expects them to be consecutive.

It's not 100% clear to me yet, but I think the offset update here isn't correct. As far as I can tell from looking at an example in a debugger, at the end of the first incomplete fragment, ssl->in_msg and ssl->in_hdr end up pointing 8 bytes past the end of the end of the fragment.

unsigned char *in_first_hdr = ssl->in_buf + MBEDTLS_SSL_SEQUENCE_NUMBER_LEN;
unsigned char *p = in_first_hdr, *q = NULL;
size_t merged_rec_len = 0;
do {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I was trying to analyze and fix https://github.com/Mbed-TLS/mbedtls/pull/9872/files#r1967669302, I tried to make sense of what the various pointers and offsets should be at any given point (ssl->in_msg, ssl->in_hdr, ssl->in_hsfraglen, etc., as well as the various record fields while processing records), and I have a hard time figuring out whether the value I'm seeing is the value that should be there at any given point during parsing. I'm not sure if the reassembly loop should be changed, or if it's getting unexpected data.

One of the difficulties in understanding the code is that fragment accumulation is completely disconnected from fragment reassembly, so reassembly has to re-parse data. I would find it easier to understand if the structure of the code was: when we have finished parsing a fragment, if it wasn't the initial fragment, then merge it with the initial fragment. With this structure, there'd be fewer moving parts. Fragment reassembly would have access to the offsets and record data from the latest fragment, and wouldn't need to do any parsing, so it would be easier to figure out offsets. There would also be fewer opportunities for parsing errors or integer/buffer overflows. In addition, the input buffer would fill out less quickly — the current structure adds a 5- or 13-byte overhead per fragment. @rojer Did you try doing fragment reassembly incrementally? Are there any difficulties in doing it that way?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, i don't see any reason why it couldn't be done that way. this just happened to be the way i went at it at the time.

@hrushikesh430
Copy link

Hi, will this fix be backported to version 3.6.2

@davidhorstmann-arm
Copy link
Contributor

Hi, will this fix be backported to version 3.6.2

It will be backported to the 3.6 LTS, so it will appear in 3.6.3 which can be easily upgraded-to from 3.6.2 (see #9981). We don't backport to specific point releases, only to LTS versions, but the upgrade between them should be almost entirely painless.

@hrushikesh430
Copy link

Hi, will this fix be backported to version 3.6.2

It will be backported to the 3.6 LTS, so it will appear in 3.6.3 which can be easily upgraded-to from 3.6.2 (see #9981). We don't backport to specific point releases, only to LTS versions, but the upgrade between them should be almost entirely painless.

Thank you, we are only concerned about 3.6 LTS.

@mpg
Copy link
Contributor

mpg commented Mar 5, 2025

so it will appear in 3.6.3

Sorry, but it will appear in 3.6.3 if it is ready in time otherwise it will have to wait for 3.6.4. We are working very hard to include this in 3.6.3, but it won't ship if we don't have a sufficient level of testing to be confident it's not introducing security issues (we already found at least one while extending testing) and we're not there yet. Again, we're hard at work on this, because we'd really like to have a resolution in 3.6.3.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Design and code approved - may be waiting for CI or backports component-tls enhancement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

TLS Handshake record layer fragmentation not working
9 participants