Remove pymongo dependency from this project #771
Merged
The original authors of this project chose to use pymongo's bson utils to handle JSON serialization and deserialization into various VARCHAR types in MySQL.
We have an unresolved security issue related to pymongo.
I've had questions in the past about why we are using pymongo here; the dependency suggests that sync-engine uses MongoDB, which it does not. I think it's time to break that dependency because it only causes confusion, and pymongo will keep needing frequent security patches because it contains C code, which makes it prone to security bugs.
MongoDB's extended JSON syntax can serialize more types than stdlib's json module; see the MongoDB Extended JSON docs.
Pymongo's implementation sits in here https://github.com/mongodb/mongo-python-driver/blob/18328a909545ece6e1cd7e172e28271a59e367d5/bson/json_util.py. As you can see, it handles many extra types. I analyzed all the call sites and the data in those columns for \$\w references and concluded that we are only using the "field": {"$date": <timestamp>} syntax in sync-engine.
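For illustration, a check like the one described above could be sketched as follows. This is not the script that was actually run; the helper name extended_json_keys and the sample field name are hypothetical, and the sketch assumes each column value is a JSON document stored as a string.

```python
import json
import re
from collections import Counter

# Same pattern as in the description: a dollar sign followed by a word character.
EXTENDED_KEY = re.compile(r"\$\w")


def extended_json_keys(raw):
    """Count the $-prefixed object keys found anywhere in a JSON document.

    Hypothetical helper, not part of sync-engine.
    """
    found = Counter()

    def walk(node):
        if isinstance(node, dict):
            for key, value in node.items():
                if EXTENDED_KEY.match(key):
                    found[key] += 1
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(json.loads(raw))
    return found


# Made-up sample value; the field name is an assumption for illustration only.
sample = '{"sync_start_time": {"$date": 1700000000000}, "state": "running"}'
print(extended_json_keys(sample))
```

Summing these counters over every row of a column would show which extended JSON keys are really in use; anything other than $date would mean the stripped-down code is not sufficient.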
I've vendored the code and then stripped it to only include the parts that we are using.
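A minimal sketch of what a stripped-down version supporting only $date might look like is below. It is an approximation built on stdlib json, not the vendored code from this PR; it assumes timestamps are integer milliseconds since the Unix epoch and that decoded values should be naive UTC datetimes.

```python
import calendar
import datetime
import json

EPOCH_NAIVE = datetime.datetime(1970, 1, 1)


def default(obj):
    """Encode datetimes as {"$date": <milliseconds since the epoch>}."""
    if isinstance(obj, datetime.datetime):
        if obj.utcoffset() is not None:
            # Shift aware datetimes so their wall-clock fields are in UTC.
            obj = obj - obj.utcoffset()
        millis = calendar.timegm(obj.timetuple()) * 1000 + obj.microsecond // 1000
        return {"$date": millis}
    raise TypeError(f"{obj!r} is not JSON serializable")


def object_hook(dct):
    """Decode {"$date": <int>} back into a naive UTC datetime (assumption)."""
    if set(dct) == {"$date"} and isinstance(dct["$date"], int):
        return EPOCH_NAIVE + datetime.timedelta(milliseconds=dct["$date"])
    return dct


def dumps(obj):
    return json.dumps(obj, default=default)


def loads(raw):
    return json.loads(raw, object_hook=object_hook)
```

With these helpers, loads(dumps({"field": some_datetime})) returns the original naive datetime truncated to millisecond precision, and every other value passes through stdlib json unchanged.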
Separately I think it makes sense to completely get rid of this $date syntax as I don't find it particularly readable.
I've already tested these changes, first on my account in production and then on an entire cluster (nylas-e), and did not see any problems.
My internal monologue about columns using JSON types
actionlog.extra_args
sync-engine/inbox/models/action_log.py
Line 67 in 64e70ec
account.sync_status
sync-engine/inbox/models/account.py
Line 182 in c064bf8
imapuid.extra_flags
sync-engine/inbox/models/backends/imap.py
Line 125 in 1b7f90d
imapuid.g_labels
sync-engine/inbox/models/backends/imap.py
Line 128 in 1b7f90d
message.from_addr
message.sender_addr
message.reply_to
message.to_addr
message.cc_addr
message.bcc_addr
message.in_reply_to
sync-engine/inbox/models/message.py
Lines 133 to 139 in 1b7f90d
Looks like json encoded:
from_addr list[tuple[str, str]]
sender_addr list[tuple[str, str]]
reply_to list[tuple[str, str]]
to_addr list[tuple[str, str]]
cc_addr list[tuple[str, str]]
bcc_addr list[tuple[str, str]]
in_reply_to str
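As a quick illustration that these address columns need nothing beyond stdlib json, here is a round trip of a list[tuple[str, str]] value. The sample addresses and the (name, email) ordering are assumptions made for the example. Note that JSON has no tuple type, so the pairs come back as lists unless they are converted.

```python
import json

# Made-up data; the (display name, email) ordering is an assumption.
addresses = [("Alice Example", "alice@example.com"), ("", "bob@example.com")]

stored = json.dumps(addresses)   # what would be written to the column
loaded = json.loads(stored)      # pairs are now two-element lists
restored = [tuple(pair) for pair in loaded]

print(stored)
print(restored == addresses)
```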
metadata.value
sync-engine/inbox/models/metadata.py
Lines 50 to 51 in 05bc8c9
This table is empty
event.participants
sync-engine/inbox/models/event.py
Line 176 in 1b7f90d
list[{"status": enum, "notes": str, "guests": list[?], "name": str, "email": str}]