jdk.internal.misc.Unsafe.park #281

Open
ravendanot opened this issue Oct 22, 2024 · 10 comments
@ravendanot

ravendanot commented Oct 22, 2024

Describe the bug
Firebase Crashlytics is logging ANRs (Application Not Responding errors). They occur while the application is closing, during these calls:

client.unregisterStatusListener(statusListener)
client.close()

Logs

jdk.internal.misc.Unsafe.park (Native method)
java.util.concurrent.locks.LockSupport.park (LockSupport.java:211)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire (AbstractQueuedSynchronizer.java:715)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly (AbstractQueuedSynchronizer.java:1047)
java.util.concurrent.Semaphore.acquire (Semaphore.java:318)
com.launchdarkly.sdk.internal.events.DefaultEventProcessor$EventProcessorMessage.waitForCompletion (DefaultEventProcessor.java:290)
com.launchdarkly.sdk.internal.events.DefaultEventProcessor.postMessageAndWait (DefaultEventProcessor.java:228)
com.launchdarkly.sdk.internal.events.DefaultEventProcessor.close (DefaultEventProcessor.java:162)
com.launchdarkly.sdk.android.ComponentsImpl$EventProcessorBuilderImpl$DefaultEventProcessorWrapper.close (ComponentsImpl.java:200)
com.launchdarkly.sdk.android.LDClient.closeInternal (LDClient.java:581)
com.launchdarkly.sdk.android.LDClient.closeInstances (LDClient.java:594)
com.launchdarkly.sdk.android.LDClient.close (LDClient.java:570)

SDK version
v5.3.0

Language version, developer tools
Kotlin v1.7.20

OS/platform
Android

@tanderson-ld
Contributor

Thank you for reporting this.

Could you provide the percentage of sessions encountering this as well as the daily active users that are running the affected version of your app?

Are you able to reproduce this locally?

@ravendanot
Author

Hello @tanderson-ld

> Could you provide the percentage of sessions encountering this as well as the daily active users that are running the affected version of your app?

All cases occur on devices running Android 14, and more than 80% of them are on Samsung devices.

> Are you able to reproduce this locally?

I have not been able to reproduce it locally.

@tanderson-ld
Contributor

tanderson-ld commented Oct 23, 2024

Thanks for answering those questions. The Android 14 part is very helpful.

For our first question, we are looking for the percentage of sessions affected. For example, if there are 1,000 sessions in a month and 10 of them hit the issue, that is 1%. This number helps us evaluate whether the problem is related to usage, execution order, or a race condition within the SDK.

@ravendanot
Author

In the last 30 days, this ANR has affected 33% of users and has occurred more than 1,500 times.

@tanderson-ld
Contributor

That is many more than expected.

  1. Is this new with a recent app version release?
  2. Has anything changed with respect to the application shutdown lifecycle?

We will investigate.

@ravendanot
Author

The most recent change was migrating the SDK from v3.1.1 to v5.3.0, but I don't have data from the time of the update.

@tanderson-ld
Contributor

Hi again @ravendanot. I have spent some time investigating this more.

Theory:
Events drive most of the data you see about flag evaluations, experiments, and rollouts in the LaunchDarkly dashboard. The current theory is that, on close, the SDK attempts to clear out the queue of events and for some reason that flush is blocking. My best guess is that internet/routing is available at first but then drops once the event flush task executes. I'm trying to reproduce this and may try to optimize the failure handling.
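To illustrate why the trace ends in `Semaphore.acquire`, here is a simplified model of the "post a message, then wait for the worker to finish it" pattern. It is not the SDK's actual code: the class `BlockingCloseDemo`, its `Message` type, and the simulated flush duration are all hypothetical, and it uses `acquireUninterruptibly()` only to keep the sketch free of checked exceptions.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.Semaphore;

// Simplified, hypothetical model of the pattern seen in the stack trace.
class BlockingCloseDemo {

    // A unit of work the caller can wait on until the worker has handled it.
    static final class Message {
        final Runnable work;
        private final Semaphore done = new Semaphore(0);

        Message(Runnable work) { this.work = work; }

        void complete() { done.release(); }

        // Parks the calling thread until complete() is called.
        void waitForCompletion() { done.acquireUninterruptibly(); }
    }

    private final BlockingQueue<Message> inbox = new LinkedBlockingQueue<>();
    private final Thread worker;

    BlockingCloseDemo() {
        worker = new Thread(() -> {
            try {
                while (true) {
                    Message m = inbox.take();
                    try { m.work.run(); } finally { m.complete(); }
                }
            } catch (InterruptedException e) {
                // Worker was asked to stop.
            }
        });
        worker.setDaemon(true);
        worker.start();
    }

    // Posts a final "flush" message and blocks the caller until it is processed.
    void close(long simulatedFlushMillis) {
        Message m = new Message(() -> sleepQuietly(simulatedFlushMillis));
        inbox.add(m);
        m.waitForCompletion(); // if this runs on the main thread, the UI is frozen here
        worker.interrupt();
    }

    static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Returns how long close() kept the calling thread blocked, in milliseconds.
    static long timeClose(long simulatedFlushMillis) {
        BlockingCloseDemo demo = new BlockingCloseDemo();
        long start = System.nanoTime();
        demo.close(simulatedFlushMillis);
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        System.out.println("close() blocked the caller for about " + timeClose(300) + " ms");
    }
}
```

In this model the caller stays parked for as long as the worker needs to finish the flush, so a slow or failing network request on the worker thread translates directly into time spent blocked in `close()`.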

Questions:

  1. Does this application operate in a firewalled environment where the events endpoint may be blocked? This may be leading to the event flush routine taking longer due to errors/retry logic.
  2. Have you customized the events URL via the LDConfig? Normally this is not done unless you need to route events through a proxy.
  3. You said approximately 33% of users are encountering this issue. Is your application used in an environment that is more likely to have intermittent internet connectivity when compared to most mobile applications?

Workaround options:

  1. This may seem counter-intuitive, but you don't need to call close() if you are OK with not having those final events sent.
  2. Instead of calling close(), call client.flush(). This will result in flush being attempted in a non-blocking manner, but ultimately if the application process is terminated before the thread that sends the events can execute, events may be dropped.
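The trade-off between the two calls can be sketched as follows. `FakeClient` is a hypothetical stand-in, not the SDK's `LDClient`; it only models the behavior described above: `flush()` hands delivery to a background thread and returns, while `close()` waits for delivery to finish. The delivery time is an assumed parameter.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicBoolean;

class FlushInsteadOfCloseDemo {

    // Hypothetical stand-in for an SDK client; it models timing only.
    static final class FakeClient {
        private final ExecutorService sender = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r);
            t.setDaemon(true); // does not keep the process alive, like a dying app
            return t;
        });
        private final long deliveryMillis;
        final AtomicBoolean delivered = new AtomicBoolean(false);

        FakeClient(long deliveryMillis) { this.deliveryMillis = deliveryMillis; }

        private void deliver() {
            try {
                Thread.sleep(deliveryMillis); // simulated network round trip
                delivered.set(true);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }

        // Non-blocking: schedules delivery and returns immediately.
        void flush() {
            Runnable task = this::deliver;
            sender.execute(task);
        }

        // Blocking: schedules delivery and waits for it to finish.
        void close() {
            Runnable task = this::deliver;
            try {
                sender.submit(task).get();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } catch (ExecutionException e) {
                throw new IllegalStateException(e.getCause());
            } finally {
                sender.shutdown();
            }
        }
    }

    static long elapsedMillis(Runnable action) {
        long start = System.nanoTime();
        action.run();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        FakeClient a = new FakeClient(500);
        System.out.println("flush() returned after " + elapsedMillis(a::flush)
                + " ms, delivered=" + a.delivered.get());

        FakeClient b = new FakeClient(500);
        System.out.println("close() returned after " + elapsedMillis(b::close)
                + " ms, delivered=" + b.delivered.get());
    }
}
```

In the sketch, the process can exit right after `flush()` returns, before the daemon sender thread has delivered anything. That mirrors the caveat in option 2: the caller is not blocked, but the final events may be dropped.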

@tanderson-ld
Contributor

Are you able to see the duration of the ANR? I have been able to reproduce something similar for ~1 second related to the retry interval when sending events. Code is here.
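As a rough illustration of how a retry interval can add about a second to a blocking flush, here is a minimal sketch. It assumes a single retry after a fixed delay; the method `sendWithOneRetry`, the 1000 ms value, and the always-failing sender are hypothetical and are not taken from the SDK source.

```java
import java.util.function.BooleanSupplier;

class RetryDelayDemo {

    // Tries to send once; on failure waits retryDelayMillis and tries once more.
    // Assumption: one retry with a fixed delay.
    static boolean sendWithOneRetry(BooleanSupplier send, long retryDelayMillis) {
        if (send.getAsBoolean()) {
            return true;
        }
        try {
            Thread.sleep(retryDelayMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return send.getAsBoolean();
    }

    static long elapsedMillis(Runnable action) {
        long start = System.nanoTime();
        action.run();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        BooleanSupplier offline = () -> false; // simulates a dropped connection
        long ms = elapsedMillis(() -> sendWithOneRetry(offline, 1000));
        System.out.println("failed send with one retry took about " + ms + " ms");
    }
}
```

If a caller waits synchronously for this send to finish, the whole retry delay is spent blocked, even though the second attempt also fails.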

@ravendanot
Author

Hello @tanderson-ld.

We have not been able to see the duration of the ANR. I am going to try the client.flush() workaround and observe the behavior.

@tanderson-ld
Contributor

tanderson-ld commented Nov 7, 2024

Sounds good, please let us know if that ends up resolving the issue. I still think we will try to make a fix, but will wait for your result.
