When a user uploads a CSV for a batch send, the file needs to be structured a certain way and use the correct character encoding (ideally plain UTF-8!). However, we don't have total control over what people upload, so we need to make sure we have the proper safeguards in place.
Some of this we've already accounted for, and there are specific checks in place that look for things like BOMs (byte order marks). We recently encountered another edge case that these checks didn't handle, though: the system retried parsing a broken CSV file an order of magnitude more times than there are lines in the file, which also generated millions of log entries in quick succession.
We need to rethink parts of our approach to CSV processing and come up with a better way of handling user input of this nature. We also need to make sure that the system handles failures and exceptions appropriately during file processing. Finally, we need clear, plain-language instructions on the site that tell users what to do and how to properly upload these CSVs (spreadsheets).
First Step
The first step with this epic is to write up an ADR that evaluates our current architecture and approach to CSV processing and proposes one or more alternatives that account for the following:
Is able to sanitize and structure the input file to a known, valid structure that we want to operate against, regardless of what the original source looked like.
If it can't do the sanitization or restructuring, then we error out immediately and provide useful error messages and feedback to the user so they know what to do and how to fix the problem.
With the sanitized input file, parse the batch and hand things off; if at any point there's a failure that we can't recover from, we need to make sure the application handles that gracefully, terminates the job immediately and fully, and provides useful error feedback and information to the user. We do not want to retry the job again and log thousands or millions of errors!
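To make the three requirements above concrete for the ADR discussion, here is a minimal sketch in Python using only the standard library. It is not our current implementation: `CsvValidationError`, `sanitize_csv`, `run_batch`, the `send_one` callback, and the `recipient` column name in the usage note are all hypothetical names chosen for illustration. It assumes the whole upload fits in memory and that UTF-8 (with or without a BOM) is the only encoding we accept.

```python
import csv
import io


class CsvValidationError(Exception):
    """Hypothetical error type: the upload cannot be made into a valid structure."""


def sanitize_csv(raw: bytes, required_columns: set[str]) -> list[dict[str, str]]:
    """Decode and validate an uploaded CSV, failing fast with a user-facing message."""
    try:
        # "utf-8-sig" decodes UTF-8 and drops a leading BOM if one is present.
        text = raw.decode("utf-8-sig")
    except UnicodeDecodeError as exc:
        raise CsvValidationError(
            "The file is not UTF-8 encoded. Re-save it as UTF-8 CSV and upload it again."
        ) from exc

    # newline="" leaves line endings untouched so the csv module can handle them.
    reader = csv.DictReader(io.StringIO(text, newline=""))
    headers = {name.strip() for name in (reader.fieldnames or [])}
    missing = required_columns - headers
    if missing:
        raise CsvValidationError(
            "Missing required column(s): " + ", ".join(sorted(missing))
        )

    rows = []
    try:
        for row in reader:
            # DictReader stores surplus cells under the key None.
            if None in row:
                raise CsvValidationError(
                    f"Line {reader.line_num} has more values than column headers."
                )
            # Short rows are padded with None, so normalize those to "".
            rows.append({k.strip(): (v or "").strip() for k, v in row.items()})
    except csv.Error as exc:
        raise CsvValidationError(
            f"Line {reader.line_num} could not be parsed: {exc}"
        ) from exc
    return rows


def run_batch(raw: bytes, required_columns: set[str], send_one) -> dict:
    """Process each row once; stop at the first unrecoverable failure, never retry."""
    try:
        rows = sanitize_csv(raw, required_columns)
    except CsvValidationError as exc:
        # Reject before any work is handed off; one message for the user.
        return {"status": "rejected", "sent": 0, "error": str(exc)}

    for index, row in enumerate(rows, start=1):
        try:
            send_one(row)
        except Exception as exc:
            # Terminate the whole job with a single summary error.
            return {"status": "failed", "sent": index - 1, "error": f"Row {index}: {exc}"}
    return {"status": "finished", "sent": len(rows), "error": None}
```

For example, `run_batch(upload_bytes, {"recipient"}, send_one)` returns a single status dictionary, so a caller logs at most one error per job, however many rows the file contains. Whether a job should stop at the first failed row or skip it and continue is a decision for the ADR; the sketch stops to match the "terminates the job immediately and fully" requirement.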
Let's also make sure we're really thinking through this: there's the Python CSV documentation itself, the Python Cookbook and Fluent Python O'Reilly books, articles on how to work with CSV files on Geeks for Geeks and Real Python, and any number of other good authoritative sources of info on how to approach this! 🙂
Next Step
Once we have an ADR written, we'll discuss it as a team and figure out what makes sense to do moving forward, then update this epic with links to new user stories for the actual implementation and testing of the work.