-
Notifications
You must be signed in to change notification settings - Fork 856
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ubsan fixes for various tcpdump printers #1012
base: master
Are you sure you want to change the base?
Conversation
Thank you for preparing these changes. Do you still have the UBSan messages? Does |
I will recreate the UBSan messages, by removing the patches and rebuilding and using the test pcaps. I didn't keep records when I did this work, only pcaps. I'll put the corresponding messages in with each commit that fixes it? The out files were generated with 4.99.1, so if there are changes in master that could explain the mismatch. I will regenerate them. |
P.S. I will also squash the "Address integer overflows" commit into the "ISO:" commit once my discussion with fxlb is complete. |
Placing the messages into respective commits would be perfect, thank you! |
6d0b77d
to
0445332
Compare
The integer overflow commit is still separate, because I want to validate it against a separate implementation of this checksum implementation that I have access to. I updated the .out files to pass with master in this commit, but I see illumos has already failed so I will investigate the check failure(s) and update. |
0445332
to
4b675b2
Compare
Rebased on the current master and added a fixup, the expectation is that this pull request passes all CI checks. |
4b675b2
to
8fe29f5
Compare
I didn't think it was worth pushing my fix for the illumos build, but I have done so now to avoid getting confused by
What's pending here is extracting the proprietary iso checksum algorithm into a form that I can test with. Once I have validated that our checksum algorithm and tcpdump's have the same result, I will squash the checksum fixes into a single commit, push that and at that point I think it will be ready to merge. |
Alright, thank you for the clarifications. |
8fe29f5
to
6c9adf5
Compare
I apologize for the delay. In order to double-check that the new checksum algorithm gets the right value, I have processed all of the pcap files that @fxlb provided using this main.c, and both the (proprietary) gated and the updated tcpdump algorithm got the same value for each packet. #include <pcap/pcap.h>
void callback(u_char *user, const struct pcap_pkthdr *h,
const u_char *bytes) {
int gated_cksum, tcpdump_cksum;
gated_cksum = gated_iso_cksum( bytes, h->caplen, bytes );
tcpdump_cksum = create_osi_cksum( bytes, 0, h->caplen );
if ( gated_cksum != tcpdump_cksum ) {
fprintf( stderr, "*** NOT MATCHING *** " );
}
fprintf( stderr, "gated %04x tcpdump %04x\n", gated_cksum, tcpdump_cksum );
}
int main(int argc, const char **argv) {
char errbuf[PCAP_ERRBUF_SIZE];
pcap_t *p;
p = pcap_open_offline( argv[ 1 ], errbuf );
pcap_loop( p, -1, callback, NULL );
} |
6c9adf5
to
0d233a9
Compare
0d233a9
to
e1c85da
Compare
e1c85da
to
a8dd632
Compare
When converting an integer from ASN.1, use an unsigned value for the partial result and assign it to the integer part of the union at the end, to avoid shifting a negative number left. print-snmp.c:545:19: runtime error: left shift of negative value -1 SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior print-snmp.c:545:19 OID elements are unsigned; a large-enough oid value could result in the undefined behavior of shifting a signed integer left through the sign bit, so simply store them as unsigned. print-snmp.c:751:11: runtime error: left shift of 268435455 by 7 places cannot be represented in type 'int' SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior print-snmp.c:751:11 [Part of the PR the-tcpdump-group#1012]
I have commited the fix for SNMP. |
It's likely that some of these are specific to 32-bit builds (e.g., i386_el7). |
You are rigth. I need clang, sanitize undefined + address and 32-bit mode. |
I'm sorry, I did this work over a year ago and did not expect to have to provide detailed notes. I committed all of my work to the Arista repository, and this is the only lwres pcap that I have in the Arista repository, so my best guess is that it was. |
I understand. Can you run the program again with the lwres pcap to check if it displays one or two error messages? |
I tried again (with a tcpdump-4.99.2 base) and only get the one:
|
Check for truncation before doing pointer arithmetic to point to the end of the packet. print-lwres.c:294:10: runtime error: addition of unsigned offset to 0xf3b032be overflowed to 0x9652d560 SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior print-lwres.c:294:10 [Part of the PR the-tcpdump-group#1012]
Same for tcpdump-4.99.3, tcpdump-4.99.4, master. Thank you! |
print-ospf6.c
Outdated
if (ls_length < sizeof(struct lsa_hdr)) { | ||
ND_PRINT("\n\t Bogus length %u < header (%zu)", ls_length, | ||
sizeof(struct lsa_hdr)); | ||
goto trunc; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's perhaps more an "invalid" length case than a "truncated" case ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's perhaps more an "invalid" length case than a "truncated" case ?
My thinking is "this TLV doesn't have enough data to contain the header of the TLV, so, the TLV itself is truncated".
All 3 callers of ospf6_print_lshdr()
use if (ospf6_print_lshdr(...)) return(1);
. From a skim, I don't see a reason for any of them to distinguish between "truncated" and "invalid in this specific way".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The notion of truncation may have evolved over the years, but currently a truncation occurs when we try to get len bytes in the packet buffer and we can't.
(checks via GET_*, ND_TTEST_*, ND_TCHECK_*, etc. macros). For ND_TCHECK_* macros, if ND_LONGJMP_FROM_TCHECK is defined for a printer, a longjmp
occurs, no more goto trunc
(Printers "Modernized" for the 5.0.0 release, remain 34 to do.)
The macros for testing invalid length cases are the ND_ICHECK*, with a goto invalid
.
If we add in line 396 the following command, we get the value 470, with the test file ospf-signed-integer-ubsan.pcap
:
ND_PRINT("[[%u]]", ND_BYTES_AVAILABLE_AFTER(lshp->ls_length));
Thus there is no truncation here.
Tshark says also the packet is malformed
(like our invalid
.), not [Packet size limited during capture: XXX truncated]
:
[...]
Link State ID: 145.245.255.254
Advertising Router: 204.122.189.255
Sequence Number: 0x02000000
Checksum: 0x0000
Length: 0
[Expert Info (Warning/Protocol): Unknown LSA Type 224]
[Unknown LSA Type 224]
[Severity level: Warning]
[Group: Protocol]
[Malformed Packet: OSPF]
[Expert Info (Error/Malformed): Malformed Packet (Exception occurred)]
[Malformed Packet (Exception occurred)]
[Severity level: Error]
[Group: Malformed]
Perhaps this change should be updated after a modernization of print-ospf6.c.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps. I'm not prepared to make a major rework as a prerequisite for my contribution, so if it's better to have undefined behavior than to print an incorrect truncation message, great, feel free to discard my changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the proposed change makes the code safer, perhaps we could address any cosmetic issues after merging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My first idea was to hold the change waiting the modernization of print-ospf6.c, but as Denis says we can do the fix as a temporary change. I will commit this part. Thanks for your work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My first idea was to hold the change waiting the modernization of print-ospf6.c, but as Denis says we can do the fix as a temporary change. I will commit this part. Thanks for your work.
Thank you. Undefined behavior does not necessarily lead to incorrect behavior, but, if you enable undefined behavior detection in your fuzzing environment, having undefined behavior can cause so much noise that it can prevent you from discovering real problems via fuzzing.
Handle ls_length shorter than sizeof(lsa_hdr) in the same way as OSPF. Use a u_int32 to hold a loop variable initialized with GET_BE_U_4. print-ospf6.c:815:46: runtime error: signed integer overflow: -2147483648 - 1 cannot be represented in type 'int' SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior print-ospf6.c:817:46 [Part of the PR the-tcpdump-group#1012]
It's not enough for the *packet* to be able to contain the RD; the route data also has to be long enough; otherwise, we will try to shift a negative length left in order to pass it to bgp_vpn_ip_print() print-bgp.c:1848:9: runtime error: left shift of negative value -8 SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior print-bgp.c:1848:9 [Part of the PR the-tcpdump-group#1012]
When converting an integer from ASN.1, use an unsigned value for the partial result and assign it to the integer part of the union at the end, to avoid shifting a negative number left. print-snmp.c:545:19: runtime error: left shift of negative value -1 SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior print-snmp.c:545:19 OID elements are unsigned; a large-enough oid value could result in the undefined behavior of shifting a signed integer left through the sign bit, so simply store them as unsigned. print-snmp.c:751:11: runtime error: left shift of 268435455 by 7 places cannot be represented in type 'int' SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior print-snmp.c:751:11 [Part of the PR the-tcpdump-group#1012] (cherry picked from commit df5cba8)
Check for truncation before doing pointer arithmetic to point to the end of the packet. print-lwres.c:294:10: runtime error: addition of unsigned offset to 0xf3b032be overflowed to 0x9652d560 SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior print-lwres.c:294:10 [Part of the PR the-tcpdump-group#1012] (cherry picked from commit b016347)
Handle ls_length shorter than sizeof(lsa_hdr) in the same way as OSPF. Use a u_int32 to hold a loop variable initialized with GET_BE_U_4. print-ospf6.c:815:46: runtime error: signed integer overflow: -2147483648 - 1 cannot be represented in type 'int' SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior print-ospf6.c:817:46 [Part of the PR the-tcpdump-group#1012] (cherry picked from commit 7f3c090)
This 1) makes sure that GET_ macros are used to extract data from the structure (which they already were) and 2) avoids undefined behavior if the structure isn't aligned on the appropriate memory boundary. Fixes the-tcpdump-group#1054. (The SNMP issues are fixed by changes for the-tcpdump-group#1012.) (cherry picked from commit a0b7859)
It's not enough for the *packet* to be able to contain the RD; the route data also has to be long enough; otherwise, we will try to shift a negative length left in order to pass it to bgp_vpn_ip_print() print-bgp.c:1848:9: runtime error: left shift of negative value -8 SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior print-bgp.c:1848:9 [Part of the PR the-tcpdump-group#1012] (cherry picked from commit 5a5646b)
a8dd632
to
aa83645
Compare
…ecksum calculation The fletcher checksum calculation would sometimes left-shift a negative number, which is an undefined operation. Rework the code to avoid this. checksum.c:186:20: runtime error: left shift of negative value -36 SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior checksum.c:186:20 in Unlike some checksum routines that use the defined semantics of 2's-complement unsigned overflow to their advantage, this one gets the wrong value if it is allowed to overflow, due to the use of mod-255. Convert c1 to uint64_t to avoid overflow checksum.c:163:16: runtime error: unsigned integer overflow: NNN + NNN cannot be represented in type 'unsigned int' Use integers during subtraction to avoid implicit conversion to unsigned when calculating both x and y checksum.c:172:18: runtime error: unsigned integer overflow: NNN - NNN cannot be represented in type 'unsigned int' checksum.c:172:9: runtime error: implicit conversion from type 'unsigned int' of value NNN (32-bit, unsigned) to type 'int' changed the value to -NNN (32-bit, signed) checksum.c:173:12: runtime error: unsigned integer overflow: NNN - NNN cannot be represented in type 'unsigned int' checksum.c:173:9: runtime error: implicit conversion from type 'unsigned int' of value NNN (32-bit, unsigned) to type 'int' changed the value to -NNN (32-bit, signed)
aa83645
to
0c8ce2b
Compare
Rebase on main branch. 4/5 commits have already been merged. |
There is a warning with |
I ran some fuzzing with ubsan enabled. These are my fixes to the problems that this fuzzing found.
(This was kind of an accident; I didn't realize that ubsan was enabled and I got a bunch of surprise fuzz failures, but I figure these fixes are probably helpful anyway)