Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix a rare issue with incorrect message order when sending messages offline #3316

Merged
merged 9 commits into from
Jul 29, 2024

Conversation

laevandus
Copy link
Contributor

@laevandus laevandus commented Jul 18, 2024

🔗 Issue Links

Resolves PBE-4319

🎯 Goal

Fix the issue of incorrect order of messages when multiple messages are in the pending send state and connection comes back.

📝 Summary

  • Forward queued messages to OfflineRequestsRepository for retrying on the next web-socket connect event (otherwise we have a race between it and MessageSender). Message statuses are set to .sendingFailed.
  • When MessageSender is waiting for connection, it sends all the requests to the offline repository until it is notified connection status change (ensures that MessageSender and OfflineRequestsRepository are not sending messages for the came channel at the same time).
  • OfflineRequestsRepository removes duplicate send message requests (in case or manual retries while offline)

🛠 Implementation

APIClient sends failed API requests to SyncRepository here. This will then be passed to OfflineRequestsRepository which uses the shouldBeQueuedOffline for deciding if it should retry the failed request.
When this happens, we have MessageSender and OfflineRequestsRepository both trying to send messages on connect. This PR eliminates this race and makes OfflineRequestsRepository to be solely responsible of sending messages when connection status changes. When we have connected, MessageSender is allowed to send messages again if user does so. But all the previous ones, are handled by OfflineRequestsRepository.

🎨 Showcase

Recorded from the point where the first request has failed with a connection issue.

Before After
img img

🧪 Manual Testing Notes

Test case 1:

  1. Log in and open a channel
  2. Network link conditioner to 100% loss
  3. Send 5 messages
  4. Wait until the failure happens (takes a while, but all 5 will get failure badge)
  5. Manually trigger retries for failed messages
  6. Turn off network link conditioner in settings (I set it back to Wi-Fi, reconnection takes a bit of time as well)
  7. Observe the message list > messages are sent one by one and keep the order and there is no duplicate message error in the console
  8. Send some new messages > messages are sent

Test case 2

  1. Log in and open a channel
  2. Network link conditioner to 100% loss
  3. Send 5 messages
  4. Wait until the failure happens (takes a while, but all 5 will get failure badge)
  5. Turn off network link conditioner in settings (I set it back to Wi-Fi, reconnection takes a bit of time as well)
  6. Immediately try to send some more messages

Messages are sent one by one. Currently order changes because server gives updated timestamps. Can be improved if we would like to, but I did not want to include more changes to this PR. Solution would be to update local timestamps after each message send.

☑️ Contributor Checklist

  • I have signed the Stream CLA (required)
  • This change should be manually QAed
  • Changelog is updated with client-facing changes
  • Changelog is updated with new localization keys
  • New code is covered by unit tests
  • Comparison screenshots added for visual changes
  • Affected documentation updated (docusaurus, tutorial, CMS)

@laevandus laevandus added 🐞 Bug An issue or PR related to a bug 🌐 SDK: StreamChat (LLC) Tasks related to the StreamChat LLC SDK 🤞 Ready For QA A PR that is Ready for QA labels Jul 18, 2024
@laevandus laevandus requested a review from a team as a code owner July 18, 2024 13:28
Copy link

github-actions bot commented Jul 18, 2024

1 Warning
⚠️ The changes should be manually QAed before the Pull Request will be merged

Generated by 🚫 Danger

@laevandus laevandus force-pushed the fix/incorrect-message-order-while-offline branch from 85cee3b to a05ad79 Compare July 19, 2024 07:11
Copy link
Contributor

@martinmitrevski martinmitrevski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good 👍 Added few questions, and also e2e tests need to be checked before merging.

CHANGELOG.md Outdated Show resolved Hide resolved
@laevandus laevandus force-pushed the fix/incorrect-message-order-while-offline branch from a05ad79 to f058c43 Compare July 22, 2024 07:25
@laevandus laevandus marked this pull request as draft July 23, 2024 08:27
@laevandus laevandus force-pushed the fix/incorrect-message-order-while-offline branch from f058c43 to ed70b02 Compare July 23, 2024 10:47
@GetStream GetStream deleted a comment from Stream-SDK-Bot Jul 23, 2024
@laevandus
Copy link
Contributor Author

After a longer discussion we'll change the direction a bit.

@laevandus laevandus changed the title Fix a rare issue with incorrect message order when sending messages offline [Refactoring to a different approach] Fix a rare issue with incorrect message order when sending messages offline Jul 23, 2024
@laevandus laevandus force-pushed the fix/incorrect-message-order-while-offline branch from ed70b02 to 45f360d Compare July 24, 2024 08:06
@laevandus laevandus changed the title [Refactoring to a different approach] Fix a rare issue with incorrect message order when sending messages offline Fix a rare issue with incorrect message order when sending messages offline Jul 24, 2024
@laevandus laevandus marked this pull request as ready for review July 24, 2024 08:20
@Stream-SDK-Bot
Copy link
Collaborator

StreamChat XCMetrics

target metric benchmark branch performance status
MessageList Hitches total duration 10 ms 11.68 ms -16.8% 🔽 🔴
Duration 2.6 s 2.56 s 1.54% 🔼 🟢
Hitch time ratio 4 ms per s 4.56 ms per s -14.0% 🔽 🔴
Frame rate 75 fps 78.14 fps 4.19% 🔼 🟢
Number of hitches 1 1.0 0.0% 🟰 🟢
ChannelList Hitches total duration 12.5 ms 15.02 ms -20.16% 🔽 🔴
Duration 2.6 s 2.55 s 1.92% 🔼 🟢
Hitch time ratio 5 ms per s 5.88 ms per s -17.6% 🔽 🔴
Frame rate 72 fps 74.17 fps 3.01% 🔼 🟢
Number of hitches 1.2 1.4 -16.67% 🔽 🔴

Copy link
Member

@nuno-vieira nuno-vieira left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code wise for now it looks good, but I'm going to request changes, since while doing QA it is introducing some regressions 👍

@@ -88,6 +88,47 @@ class MessageRepository {
})
}
}

/// Marks the message's local status to failed and adds it to the offline retry which sends the message when connection comes back.
func scheduleOfflineRetry(for messageId: MessageId, completion: @escaping (Result<ChatMessage, MessageRepositoryError>) -> Void) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MessageSender uses this to send queued messages directly to offline repo when it sees that there is connection issue.

Comment on lines +135 to +139
// Run offline actions requests as the first thing
if config.isLocalStorageEnabled {
operations.append(ExecutePendingOfflineActions(offlineRequestsRepository: offlineRequestsRepository))
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved it to the first item on the syncing list so that we retry more quickly rather than waiting after other stuff.

Copy link

sonarcloud bot commented Jul 26, 2024

Copy link
Member

@nuno-vieira nuno-vieira left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! ✅

@nuno-vieira nuno-vieira added the ❗️ High Priority An issue or PR that is marked as a high priority label Jul 29, 2024
@testableapple testableapple added 🟢 QAed A PR that was QAed and removed 🤞 Ready For QA A PR that is Ready for QA labels Jul 29, 2024
Copy link
Contributor

@testableapple testableapple left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

@laevandus laevandus merged commit ead44a7 into develop Jul 29, 2024
14 checks passed
@laevandus laevandus deleted the fix/incorrect-message-order-while-offline branch July 29, 2024 13:58
@Stream-SDK-Bot Stream-SDK-Bot mentioned this pull request Jul 30, 2024
nuno-vieira added a commit that referenced this pull request Jul 31, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🐞 Bug An issue or PR related to a bug ❗️ High Priority An issue or PR that is marked as a high priority 🟢 QAed A PR that was QAed 🌐 SDK: StreamChat (LLC) Tasks related to the StreamChat LLC SDK
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants