CSRF assumes the token to be valid latin-1 #3763

hynek · 2024-07-09T09:22:05Z

Bug Report

Describe the bug
Today Sentry reported a crash that I don't think I can do anything about (except adding custom checks), where someone (presumably an attacker) submitted a CSRF token that cannot be encoded as latin-1.

To Reproduce
I'm currently traveling so can't come up with anything simple, but given that the check looks like this:

pyramid/src/pyramid/csrf.py

Lines 43 to 48 in ef0f686

    
    def check_csrf_token(self, request, supplied_token):
        """Returns ``True`` if the ``supplied_token`` is valid."""
        expected_token = self.get_csrf_token(request)
        return not strings_differ(
            bytes_(expected_token), bytes_(supplied_token)
        )

and bytes_ looks like this:

pyramid/src/pyramid/util.py

Lines 38 to 43 in ef0f686

    
    def bytes_(s, encoding='latin-1', errors='strict'):
        """If ``s`` is an instance of ``str``, return
        ``s.encode(encoding, errors)``, otherwise return ``s``"""
        if isinstance(s, str):
            return s.encode(encoding, errors)
        return s

It makes sense that if someone manages to sneak in a token that's not latin-1-encodable, it will crash with a UnicodeEncodeError.
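A minimal reproduction sketch that does not need Pyramid: it copies `bytes_` from the snippet above and passes it a string containing a character outside latin-1. The sample token is made up for illustration (the real payload was scrubbed by Sentry), and `crashed` is just a flag for this example.

```python
def bytes_(s, encoding='latin-1', errors='strict'):
    """Copy of the helper quoted above."""
    if isinstance(s, str):
        return s.encode(encoding, errors)
    return s


# Hypothetical token: U+FFFD has no latin-1 representation.
bad_token = "1\ufffd%2527%2522"

try:
    bytes_(bad_token)
except UnicodeEncodeError as exc:
    # With errors='strict', encoding fails instead of returning bytes.
    crashed = True
    print(f"crash reproduced: {exc.reason}")
else:
    crashed = False
```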

I guess wrapping the strings_differ call in a try/except UnicodeError would fix it?
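A rough sketch of that idea, not a patch against Pyramid. `tokens_match` is a hypothetical helper name, and `hmac.compare_digest` from the standard library stands in for `strings_differ` so the example runs on its own.

```python
import hmac


def bytes_(s, encoding='latin-1', errors='strict'):
    """Copy of the helper quoted above."""
    if isinstance(s, str):
        return s.encode(encoding, errors)
    return s


def tokens_match(expected_token, supplied_token):
    """Hypothetical helper: an unencodable supplied token is a mismatch."""
    try:
        supplied = bytes_(supplied_token)
    except UnicodeError:
        # A token that cannot be encoded can never equal a valid one,
        # so report "not equal" instead of letting the error propagate.
        return False
    # Constant-time comparison, standing in for ``not strings_differ(...)``.
    return hmac.compare_digest(bytes_(expected_token), supplied)
```

Only the supplied token is guarded here, on the assumption that the expected token is generated server-side and is always encodable.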

For completeness, the tokens in question were:

"1��%2527%2522\\\'\\""
"10fc8c867a0c4552831a44a16f193a77��%2527%2522\\\'\\"".

Unfortunately it's a bit difficult to trace what exactly happens, because Sentry removes everything with token in the name.

Expected behavior
No crash.

Additional context

I'm pretty sure I'm not doing anything wrong; my app doesn't appear in the traceback except for tweens that don't touch the headers at all.

It's Pyramid 2.0.2 running in Gunicorn 22.0.0 and

config.set_default_csrf_options(require_csrf=True)
config.set_csrf_storage_policy(CookieCSRFStoragePolicy())


mmerickel · 2024-07-10T18:22:19Z

I feel like this is a valid concern and it's natural to say that it should just be counted as not-equal versus raising an exception for an unusable value in the supplied csrf value. We should fix this.

ztane · 2024-10-09T11:06:25Z

Another approach would be to use errors='backslashreplace' for the supplied_token, then all non-representable characters would turn into Unicode escapes:

>>> '😀'.encode('latin-1', errors='backslashreplace')
b'\\U0001f600'
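A sketch of how that variant might look in a comparison helper. As before, `tokens_match` is a hypothetical name and `hmac.compare_digest` stands in for `strings_differ`; both tokens are assumed to be `str`.

```python
import hmac


def tokens_match(expected_token, supplied_token):
    """Hypothetical helper: escape unencodable input instead of raising."""
    expected = expected_token.encode('latin-1')
    # Characters outside latin-1 become ASCII backslash escapes,
    # so encoding never raises for user-supplied input.
    supplied = supplied_token.encode('latin-1', errors='backslashreplace')
    return hmac.compare_digest(expected, supplied)
```

One thing to keep in mind with this approach: the escaped form is plain ASCII, so it could in principle equal an expected token that literally contains such an escape sequence. For hex tokens like the 32-character value quoted in the report, that cannot happen, because they contain no backslashes.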

mmerickel added the bugs label Jul 10, 2024
