Activate share_pos on the room-list sliding sync instance #4035
Conversation
(force-pushed from 3e42b53 to d17e83f)
Codecov Report: All modified and coverable lines are covered by tests ✅

@@            Coverage Diff             @@
##             main    #4035      +/-   ##
==========================================
- Coverage   84.90%   84.86%   -0.04%
==========================================
  Files         271      271
  Lines       29070    29071       +1
==========================================
- Hits        24681    24671      -10
- Misses       4389     4400      +11

☔ View full report in Codecov by Sentry.
(force-pushed from d17e83f to 07310ee)
Ahhh, the crypto dragons have been triggered :/. Time to set up complement-crypto :)
(force-pushed from 07310ee to 11d63f2)
Hum.
(force-pushed from 11d63f2 to 4140bbd)
The failed test was a timeout, so I think it's a false alarm.
It failed again; it may not be a flake, especially since it's restart-related?
The log files show:
…which imply that
A successful local run shows:
Right, found the cause. When it starts up, 2 sliding sync connections are made:
This causes 2x HTTP requests to /sync. On passing tests, both finish quickly. On failing tests, one does not return at all for the duration of the test. Looking at the MITM dump for the failed run, it's clear that the one that doesn't return is:

{
"conn_id": "room-list",
"extensions": {
"account_data": {
"enabled": true
},
"receipts": {
"enabled": true,
"rooms": [
"*"
]
},
"typing": {
"enabled": true
}
},
"lists": {
"all_rooms": {
"filters": {
"not_room_types": [
"m.space"
]
},
"include_heroes": true,
"ranges": [
[
0,
0
]
],
"required_state": [
[
"m.room.encryption",
""
],
[
"m.room.member",
"$LAZY"
],
[
"m.room.member",
"$ME"
],
[
"m.room.name",
""
],
[
"m.room.canonical_alias",
""
],
[
"m.room.power_levels",
""
]
],
"timeline_limit": 1
}
},
"txn_id": "86a22ed3a03948dc985c7c32be1b5327"
}

I suspect the sync code assumes that room-list requests return immediately on startup (which has been true up until now). By sharing the `pos`, that assumption no longer holds. So this is a real bug, unfortunately.

The reason why it is flaky is likely the slow speed of GHA boxes. I suspect the test itself runs much more slowly there, giving enough time for the new data to come down before the restart, meaning that when it starts up again there is no new data to fetch, hence it blocks waiting for more data.

The end-user impact of this will presumably be a spinner showing for 30s upon restart (as it's also waiting on the same listener the tests are). You likely didn't see this because your account is way more active, meaning there is a high chance something will come down before the timeout, but in a test scenario this is not the case.
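To make the suspected failure mode concrete, here is a minimal, self-contained sketch (all names, and the 30s long-poll duration, are illustrative assumptions rather than the SDK's actual code) of why resuming from a shared `pos` can stall startup:

```rust
use std::time::Duration;
use tokio::time::{sleep, timeout};

/// Stand-in for the sliding sync endpoint: an initial request (no `pos`)
/// returns immediately with a full snapshot, while a request resuming from
/// a known `pos` long-polls until there is *new* data to send.
async fn fake_sync(pos: Option<u64>) -> u64 {
    match pos {
        // Initial sync: the server always has something to return.
        None => 1,
        // Resumed sync: nothing new happened, so the server holds the
        // request open for the long-poll timeout (30s here).
        Some(p) => {
            sleep(Duration::from_secs(30)).await;
            p + 1
        }
    }
}

#[tokio::main]
async fn main() {
    // Fresh start: no stored `pos`, so the first response arrives at once
    // and the room list can report "loaded" immediately.
    let first = fake_sync(None).await;
    println!("fresh start: loaded at pos {first}");

    // Restart with a shared/persisted `pos`: if nothing happened while the
    // client was down, the first request long-polls, and anything waiting
    // on the first response (the loading state, the user's spinner) stalls.
    match timeout(Duration::from_secs(1), fake_sync(Some(first))).await {
        Ok(pos) => println!("restart: loaded at pos {pos}"),
        Err(_) => println!("restart: still waiting, first sync is long-polling"),
    }
}
```

Under these assumptions the fresh start prints immediately, while the restart path sits in the long poll, matching the spinner-for-30s behaviour described above.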
@kegsay thanks a lot for the detailed investigation 🙏 The fix should be easy-ish with all that info; I'll have a look tomorrow.
FWIW, the fix is to just use ?timeout=0 on the first sync (before the room list state is loaded). This matches sync v2 behaviour in other clients.
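A rough sketch of that trick (the helper name is hypothetical; the real change would live wherever the SDK builds its sliding sync requests), choosing the timeout from whether the room list state has been loaded once:

```rust
/// Pick the long-poll timeout for the next sliding sync request.
/// `room_list_loaded` would be derived from the room list loading state.
fn next_sync_timeout_ms(room_list_loaded: bool) -> u64 {
    if room_list_loaded {
        30_000 // normal long-poll once an initial room list exists
    } else {
        0 // first request after startup: return immediately, never block
    }
}
```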
I don't think that will be useful: the sync v2 spec says that not specifying a timeout is the same as timeout=0, and the Synapse code seems to agree, for both sync v2 and SSS.
Never mind, the timeout of the first request is 60s in your logs.
It seems we don't really need to use this trick here, cf. 492166a.
Requesting changes for the comments on testing; I'll let @Hywan finish the review and have the final word. Thanks!
@@ -156,6 +156,9 @@ impl RoomListService {
            .with_to_device_extension(
                assign!(http::request::ToDevice::default(), { enabled: Some(true) }),
            );
        } else {
            // TODO: This is racy with encryption, needs cross-process lock
Please update the comment to specify that this is racy if encryption is used within the room list.
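One possible wording along those lines (purely illustrative; not necessarily the text that landed in the PR):

```rust
} else {
    // TODO: Enabling `share_pos()` here is racy if encryption is used
    // within this (room-list) sliding sync instance; that configuration
    // needs the cross-process lock first.
}
```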
pub fn loading_state(&self) -> Subscriber<RoomListLoadingState> {
-    self.loading_state.subscribe()
+    self.loading_state.subscribe_reset()
Why do you need this?
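A guess at the motivation, illustrated with `tokio::sync::watch` rather than the actual `eyeball` API: a plain subscription only wakes for values published after it was created, so a caller subscribing after the state already became `Loaded` would wait forever, whereas a reset-style subscription also observes the current value:

```rust
use tokio::sync::watch;

#[derive(Debug, PartialEq)]
enum LoadingState {
    NotLoaded,
    Loaded,
}

#[tokio::main]
async fn main() {
    let (tx, _rx0) = watch::channel(LoadingState::NotLoaded);

    // The state transitions *before* anyone subscribes, e.g. because the
    // first sync finished quickly.
    tx.send(LoadingState::Loaded).unwrap();

    // A subscriber created now has already "seen" the current value, so
    // `changed()` alone would block until some future update arrives.
    let mut late = tx.subscribe();

    // Marking the current value as unseen makes it observable right away,
    // which is the effect `subscribe_reset()` presumably provides.
    late.mark_changed();
    late.changed().await.unwrap();
    assert_eq!(*late.borrow(), LoadingState::Loaded);
    println!("observed current state without missing the transition");
}
```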
We are very close to seeing a merge here. As soon as you address the last comments, we are good.
Fixes #3936.
Signed-off-by: Mathieu Velten [email protected]