Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix for crash caused by persistent logging #321

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

KennyHuRadar
Copy link
Contributor

@KennyHuRadar KennyHuRadar commented Mar 29, 2024

Context

There were some crash logs coming in that suggests a potential issue introduced by our implementation of persistent logging.
This is the reason for termination:

"termination" : {"code":2343432205,"flags":6,"namespace":"FRONTBOARD","reasons":["<RBSTerminateContext| domain:10 code:0x8BADF00D explanation:scene-update watchdog transgression: app<com.arbys.ios.arbysapp(953B57DB-6DA7-4815-87D5-E63C81A9B801)>:44562 exhausted real (wall clock) time allowance of 10.00 seconds","ProcessVisibility: Background","ProcessState: Running","WatchdogEvent: scene-update","WatchdogVisibility: Background","WatchdogCPUStatistics: (",""Elapsed total CPU time (seconds): 19.060 (user 15.870, system 3.190), 32% CPU",",""Elapsed application CPU time (seconds): 9.625, 16% CPU"",") reportType:CrashLog maxTerminationResistance:Interactive>"]},

This error is caused when a process is blocking the main thread for some time. https://developer.apple.com/documentation/xcode/addressing-watchdog-terminations

The frames of the logs contain these notable snippets:

{"imageOffset":9896,"symbol":"_dispatch_call_block_and_release","symbolLocation":32,"imageIndex":10},{"imageOffset":17152,"symbol":"_dispatch_client_callout","symbolLocation":20,"imageIndex":10},{"imageOffset":76184,"symbol":"_dispatch_main_queue_drain","symbolLocation":984,"imageIndex":10},{"imageOffset":75184,"symbol":"_dispatch_main_queue_callback_4CF","symbolLocation":44,"imageIndex":10},{"imageOffset":225180,"symbol":"CFRUNLOOP_IS_SERVICING_THE_MAIN_DISPATCH_QUEUE","symbolLocation":16,"imageIndex":9},{"imageOffset":212136,"symbol":"__CFRunLoopRun","symbolLocation":1996,"imageIndex":9},

This suggests that this chunk of code was dispatched to the main queue.

{"imageOffset":339008,"symbol":"__CFSimpleMergeSort","symbolLocation":380,"imageIndex":9},{"imageOffset":338876,"symbol":"__CFSimpleMergeSort","symbolLocation":248,"imageIndex":9},{"imageOffset":338876,"symbol":"__CFSimpleMergeSort","symbolLocation":248,"imageIndex":9},{"imageOffset":338876,"symbol":"__CFSimpleMergeSort","symbolLocation":248,"imageIndex":9},{"imageOffset":338876,"symbol":"__CFSimpleMergeSort","symbolLocation":248,"imageIndex":9},{"imageOffset":338904,"symbol":"__CFSimpleMergeSort","symbolLocation":276,"imageIndex":9},{"imageOffset":338904,"symbol":"__CFSimpleMergeSort","symbolLocation":276,"imageIndex":9},{"imageOffset":338904,"symbol":"__CFSimpleMergeSort","symbolLocation":276,"imageIndex":9},{"imageOffset":338500,"symbol":"CFSortIndexes","symbolLocation":472,"imageIndex":9},{"imageOffset":626744,"symbol":"-[NSArray sortedArrayFromRange:options:usingComparator:]","symbolLocation":724,"imageIndex":9},{"imageOffset":625972,"symbol":"-[NSMutableArray sortedArrayFromRange:options:usingComparator:]","symbolLocation":60,"imageIndex":9},

This suggest that the sortedArrayFromRange:options:usingComparator within RadarFileStorage was ran.

"imageOffset":12692,"symbol":"CFStringCreateMutableCopy","symbolLocation":312,"imageIndex":8},{"imageOffset":345112,"symbol":"-[NSString stringByReplacingOccurrencesOfString:withString:options:range:]","symbolLocation":72,"imageIndex":9},{"imageOffset":6440552,"imageIndex":7},

{"imageOffset":473940,"symbol":"_NSScanLongLongFromString","symbolLocation":1332,"imageIndex":9},{"imageOffset":472552,"symbol":"-[NSString longLongValue]","symbolLocation":76,"imageIndex":9},

Another notable part is that before each crash, the app seemed to progress to a different point of the sort.

Discussion

The first suspected culprit was definitely the sort. But a closer look at the implementation of the sort casts some doubt on that hypothesis. The sort is being carried out on an immutable array of strings with the comparator casting the string into NSIntergers. If the comparator fails to parse the strings, the strings will simply be unsorted. It is unlikely that such a sort would itself cause terrible lag. The fact that the app seemed to be progressing through different points of the sort when it got shut down seems to suggest that there is no one part of the sort that is deterministically slowing everything down.

The second culprit is the thread in which the crashes were occurring in the main thread. The main thread is responsible for updating the UI and is generally considered "less performant" and more "crowded". Close inspection of the code path revealed 2 instances where logging-related operations were dispatched to the main thread. Considering how persistent logging involves synchronized reads and writes to storage, this implementation is not appropriate and will impact UI performance.

Furthermore, a ramped-up tracking state creates a lot of logs while also attempting to sync logs with the server frequently. This may cause a "backup" of synchronized calls waiting to be executed. Hence, a persistent logging operation that was dispatched to the main queue may be blocked by preceding persistent logging operations, increasing the likelihood of the watchdog shutting the operation down.

Suggested Remediations

  • Move persistent logging operations off the main queue
  • Fix a possible data casting issue used in the sorting

Limitations

  • I was unable to reproduce such crashes in Waypoint. To the best of my knowledge this issue has not occurred in other apps that use this SDK. There may be other unexpected interactions within the Arby's app that causes this issue to manifest.

@KennyHuRadar KennyHuRadar marked this pull request as ready for review April 1, 2024 17:48
@KennyHuRadar KennyHuRadar changed the title [WIP] Fix for crash caused by persistent logging Fix for crash caused by persistent logging Apr 1, 2024
RadarSDK/Radar.m Outdated
onComplete(status);
}];
}
onComplete(status);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we know what thread is being used to call this completion handler (could also be the main thread)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not easy to reason through which thread will call this chunk at runtime. It won't t hurt to be explicit in this case. I should just send it to a another thread.

@@ -70,7 +70,9 @@ - (void)logWithLevel:(RadarLogLevel)level type:(RadarLogType)type message:(NSStr

os_log(OS_LOG_DEFAULT, "%@", log);

[[RadarDelegateHolder sharedInstance] didLogMessage:log];
dispatch_async(dispatch_get_main_queue(), ^{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this need to use the main thread?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this is the typical pattern we use throughout the codebase callbacks

Copy link
Contributor Author

@KennyHuRadar KennyHuRadar Apr 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we are calling the dispatcher, we should use the main thread as the lister on the other end may decide to use it to update the UI. e.g. have a super user log console that print out logs as they occur. I think our current pattern is an oversimplification. I think we should only call "end user supplied" callbacks and delegate dispatches on the main thread as those may be used to cause UI updates. Other callbacks within the internal working of our SDK should not needlessly crowd the main thread baring threading/synchronization concerns.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking back at this, I'm thinking we shouldn't have this on the main thread. Unlikely a production user is gonna update UI based on the logs, and if there's lots of logging happening this could be contributing to main thread bloat

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds right to me. But this is technically a breaking change so I'm adding a note in migration guide for completeness.

@lmeier lmeier force-pushed the make-resign-active-log-append branch from 756538a to a9a884d Compare April 11, 2024 21:19
ShiCheng-Lu pushed a commit that referenced this pull request Jun 13, 2024
* make notification take same custom icon and background as foreground service

* implemented new radarnotificationsoptions class with usage in creating notifications and tests

* cleanup

* make name singular and formatting changes

* changed comments

* add punctioation to comment

* spacing formatting
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants