various race conditions in applying old edits to updated documents #19

wrongu · 2023-06-08T14:50:37Z

current trigger/edit lifecycle is

git pull
create AI triggers
git pull (to "debounce" / check for recent human edits and potentially cancel the triggers)
run AI
push changes

but the extra pull in step 3 can invalidate the triggers in step 2

should be able to just remove the extra pull

j6k4m8 · 2023-06-08T15:04:32Z

need to pull RIGHT BEFORE we push to minimize merge errors (esp with overleaf since it rewrites git history all the time, but also generally true of all git)...

wrongu · 2023-06-08T15:11:24Z

I understand wanting to minimize merge errors, but that's also a race condition!

Imagine:

pull
trigger an edit
check for debouncing --> ok! no edits in the last few moments
simultaneously (LLM starts working) and (user deletes a line earlier in the paper)
pull again and edit lines

now the lines being edited in step 5 != the same lines that were triggered in step 2

basically we're rediscovering the old problem that "simultaneous editing is hard" and this is why OT was invented. And one of the core ideas of OT is that every edit needs to be relative to some timestamped/hashed version of the document. If you try to change the hash/timestamp of an edit before/after the edit is done, you're gonna have a bad time.

Right now we're using git merge as a hacky version of OT. git already does timestamping/versioning for us using commits. We just need to lean into that for now, and follow the rule that 1 edit = 1 commit

j6k4m8 · 2023-06-08T15:19:09Z

yep — failure mode here is that if you don't do the race-conditiony thing, your push fails and you throw away all your work anyway. if you do the second pull, you have a last-ditch effort for it to be valid (and will still fail if the doc has been updated during the AI round trip in a way that invalidates the edit). I'm cool with either answer here provided it actually runs; the reason I have the duplicate pull right now is because it invariably failed without it due to overleaf's weird treatment of histories

…

On Thu, Jun 8, 2023 at 11:11 AM wrongu ***@***.***> wrote: I understand wanting to minimize merge errors, but that's also a race condition! Imagine: 1. pull 2. trigger an edit 3. check for debouncing --> ok! no edits in the last few moments 4. simultaneously (LLM starts working) and (user deletes a line earlier in the paper) 5. pull again and edit lines now the lines being edited in step 5 != the same lines that were triggered in step 2 basically we're rediscovering the old problem that "simultaneous editing is hard" and this is why OT was invented. And one of the core ideas of OT is that every edit needs to be relative to some timestamped/hashed version of the document. If you try to change the hash/timestamp of an edit before/after the edit is done, you're gonna have a bad time. Right now we're using git merge as a hacky version of OT. git already does timestamping/versioning for us using commits. We just need to lean into that for now, and follow the rule that 1 edit = 1 commit — Reply to this email directly, view it on GitHub <#19 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAFJKB7T5HNL6BYWGID72B3XKHTSPANCNFSM6AAAAAAY7NGIFU> . You are receiving this because you commented.Message ID: ***@***.***>

wrongu · 2023-06-08T15:28:54Z

In that case we really need our own mini-OT system. This would be a function with a signature like

def apply_ot_using_git(commit1, edit1, commit2) -> edit2:
    ....

And the simplest version of this function would be just to update the line numbers in the edit to account for any changes that happened between commit1 and commit2 OR raise an error if commit1->commit2 involved changes to the same lines

wrongu · 2023-06-08T16:07:46Z

...follow up thought: as long as we're happy to lean heavily on git, and we're happy to just error-out when it fails, the ot function here can be a git rebase. We would apply edit1 to commit1, commit it as commit3, and then rebase commit3 onto commit2

wrongu · 2023-06-09T17:54:49Z

...follow up Part II: refactoring the trigger lifecycle further, we want to allow 1 Trigger to give rise to any number of Edits. So I think we may want PaperRemote to support perform_batch_of_edits() not just perform_edit(), which is another reason where we want our own OT, but this one might be ugly if done using git 😢

j6k4m8 · 2023-06-09T18:19:18Z

sigh —

EditResponse should probably have some signature like,

class EditResponse:

    def as_diff(self, repo):
        ...
        
    def as_ot(self, original_text):
        ...
        
    def as_totally_new_string(self, original_text):
        ...

so that we can apply an edit to any doc type....

or maybe EditResponse is super simple in structure but each paperremote needs to know how to apply it themselves.... have NO idea what that API would look like.

wrongu · 2023-06-09T18:27:05Z

See #21 . I made EditResult a data class to parallel EditTrigger. I figure we want the Trigger/Result semantics to be independent of a particular remote, but we can hold the remotes to a high-ish standard where they must provide some basic OT

* add test fixture for OverleafGitPaperRemote * testing that git can resolve non-conflicting edits using merge (partially addressing #19) * testing that we recover gracefully from merge conflicts and other kinds of bad edits * better git diff handling to test for recent human edits * handling edits' revision_id in OverleafGitPaperRemote. Snazzy `with paper.rewind(commit_id)` syntax

wrongu added bug Something isn't working good first issue Good for newcomers git labels Jun 8, 2023

wrongu changed the title ~~race condition in debouncing~~ various race conditions in applying old edits to updated documents Jun 12, 2023

wrongu mentioned this issue Jun 20, 2023

Improvements to OverleafGitPaperRemote #25

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

various race conditions in applying old edits to updated documents #19

various race conditions in applying old edits to updated documents #19

wrongu commented Jun 8, 2023

j6k4m8 commented Jun 8, 2023

wrongu commented Jun 8, 2023

j6k4m8 commented Jun 8, 2023 via email

wrongu commented Jun 8, 2023 •

edited

Loading

wrongu commented Jun 8, 2023

wrongu commented Jun 9, 2023

j6k4m8 commented Jun 9, 2023

wrongu commented Jun 9, 2023

various race conditions in applying old edits to updated documents #19

various race conditions in applying old edits to updated documents #19

Comments

wrongu commented Jun 8, 2023

j6k4m8 commented Jun 8, 2023

wrongu commented Jun 8, 2023

j6k4m8 commented Jun 8, 2023 via email

wrongu commented Jun 8, 2023 • edited Loading

wrongu commented Jun 8, 2023

wrongu commented Jun 9, 2023

j6k4m8 commented Jun 9, 2023

wrongu commented Jun 9, 2023

wrongu commented Jun 8, 2023 •

edited

Loading