-
Notifications
You must be signed in to change notification settings - Fork 88
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
IntegrityError at /api/v2/search when searching remote handles #642
Comments
What version are you running? I suspect this was already fixed as part of #634, it's just not in a release yet. |
I'm on main. The build is from yesterday |
The error does not occur when I reduce gunicorn workers to 1 ( |
OK - I'll see if I can investigate it soon then. It might be a few weeks. Looks like some sort of error in the new fetching code, maybe - have you tried the current stable release to see if it still breaks there? |
No I haven't and unfortunately I can't downgrade due to database issues. I'll try to check this with a test instance. |
Ah right, we've had migrations on main since the release (which is incidentally why I have not done a patch release yet). Let me know what you find - if this is caused by our fix for the other error breaking search, it would be good to know. |
I did some research and at least for mastodon, only follows of the local instance are shown in the mobile app. And for posts, only future posts are downloaded to the local instance and counted. It looks like this is desired behavior.
The same error also occurs on 0.9.0. Error log for version 0.9.0 below.
|
Thanks, appreciate you checking! I can't easily replicate it on my own server, but the traceback you provided should hopefully be enough to put a potential fix in place anyway. |
Hmm, taking at a look at the source code, we already have something in place to stop this happening - it's meant to check if the user exists right before it makes it. Are you able to go into the Django Admin for me and get a screenshot of what the |
Like this? Without looking at the code I assume this is a concurrency issue because it only occurs the first time a search for a user. I guess there are two processes that check for the existence of the db entry, both get the result that it doesn't exist and both try to create it. The creation fails the second time because the first one already created it. |
Ah, I missed that it only happens on the first search for a user, sorry - you're probably right. The search endpoint should only be called once, but my guess is that the client you're using calls it twice for some reason. |
OK, the commit I just landed above should fix it - I made that whole process happen in a transaction. Let me know if it doesn't and we can reopen. |
Ah nvm the error still occurs. Same as before :/ |
I did some testing and added a few print statements to see whats wrong. Diffdiff --git a/users/models/identity.py b/users/models/identity.py
index bebaa31..46f5b14 100644
--- a/users/models/identity.py
+++ b/users/models/identity.py
@@ -2,6 +2,9 @@ import ssl
from functools import cached_property, partial
from typing import Literal, Optional
from urllib.parse import urlparse
+import traceback
+from datetime import datetime
+from random import randint
import httpx
import urlman
@@ -396,14 +399,20 @@ class Identity(StatorModel):
domain = domain.lower()
with transaction.atomic():
+ debug_nonce = randint(0, 2**16 - 1)
+ tprint = lambda x: print(str(debug_nonce) + " " + str(datetime.now().strftime("%d.%m.%Y %H:%M:%S.%f")) + " " + x)
+ #traceback.print_stack()
+ tprint("Entering transaction atomic")
try:
if local:
+ tprint("local-before-return")
return cls.objects.get(
username__iexact=username,
domain_id=domain,
local=True,
)
else:
+ tprint("else-before-return")
return cls.objects.get(
username__iexact=username,
domain_id=domain,
@@ -412,9 +421,11 @@ class Identity(StatorModel):
if fetch and not local:
actor_uri, handle = cls.fetch_webfinger(f"{username}@{domain}")
if handle is None:
+ tprint("handle-before-return")
return None
# See if this actually does match an existing actor
try:
+ tprint("try-before-return")
return cls.objects.get(actor_uri=actor_uri)
except cls.DoesNotExist:
pass
@@ -422,13 +433,16 @@ class Identity(StatorModel):
username, domain = handle.split("@")
if not domain_instance:
domain_instance = Domain.get_remote_domain(domain)
+ tprint("domain-before-return")
return cls.objects.create(
actor_uri=actor_uri,
username=username,
domain_id=domain_instance,
local=False,
)
+ tprint("final-before-return")
return None
+ tprint("no-return hit")
@classmethod
def by_actor_uri(cls, uri, create=False, transient=False) -> "Identity": And I got the following output (random number per process, time, message), sorted by time:
So there are two processes that run concurrently. But even with atomic transaction, the second process starts executing the code before the first one is finished. Maybe also relevant: I see no error in the docker log, it's only sent via mail due to the |
Atomic transactions don't guarantee that the code isn't run concurrently, just that the view of the database should be linear - though that's obviously not quite what's happening here. I have some other ideas about how to fix this, so I'll try one of those in the next commit. |
Did that next commit happen? |
I'm using Tusky as client app for my own takahe server. In the takahe configuration I enabled
ERROR_EMAILS
.Whenever I search for a (remote) handle the first time e.g.
@[email protected]
using the Tusky search, I get two mails informing me about a server error.(full message see below)
This happens independently of the remote server software. I noticed this for mastodon, takahe, kbin and also some others. This only happens if the user is unknown to the server, so the very first time I do the search. Afterwards, the error does no longer occur but only the pinned posts are shown in the Tusky app and follows are also not displayed correctly (see screenshot).
Log in error mail (The log leaks some usernames and passwords, I redacted them manually):
Screenshot from the Tusky app:
The text was updated successfully, but these errors were encountered: