Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support Exception Groups #37716

Closed
11 of 13 tasks
AbhiPrasad opened this issue Aug 11, 2022 · 21 comments
Closed
11 of 13 tasks

Support Exception Groups #37716

AbhiPrasad opened this issue Aug 11, 2022 · 21 comments
Assignees

Comments

@AbhiPrasad
Copy link
Member

AbhiPrasad commented Aug 11, 2022

Problem Statement

Typically, we are used to exceptions having a linear relationship to one another (an exception causes another, which causes another). We represent this by allowing folks to send in a list of exceptions for error events.

In several languages though, there is this idea of AggregateExceptions. With AggregateExceptions, we have error relationships where an AggregateException has multiple causes (a 1 to N relationship).

This was initially raised by @mortargrind in getsentry/sentry-javascript#5469, borrowing their example below:

Legend:

  • Error1 => Error2 means Error1 caused by Error2 (handled by existing Sentry data model)
  • Error1 [Error2, Error3] means Error1 aggregates Error2 and Error3 (would be handled by the hypothetical AggregatedErrors)

Sample error chain:

HighLevelError => AggregateException [LowLevelError1, LowLevelError2 => LowerLevelError1, LowLevelError3] => LowestLevelError

Typically these AggregateExceptions are produced when you have futures/async await/coroutines, so a set of tasks (and their possible exceptions) getting associated by the callee of the tasks.

There is no way in Sentry to represent the relationship between these errors.

Examples:

UPDATE

This feature is coming soon. Here is the list of items being tracked to complete it.

Sentry Backend & Relay

Preview Give feedback
  1. mattjohnsonpint
  2. Scope: Backend
  3. Scope: Backend

Sentry UI

Preview Give feedback
  1. Scope: Frontend
  2. Scope: Frontend
  3. Scope: Frontend

SDKs

Preview Give feedback
  1. antonpirker
  2. mattjohnsonpint
  3. Status: In Progress Type: Improvement
    lforst

Misc

Preview Give feedback
  1. 1 of 4
    Product Area: Issues Type: Content
@polamjag
Copy link

polamjag commented Oct 9, 2022

I think golang/go#53435 (comment) (multierror support in standard library) should be an example in Golang

@andyljones
Copy link

We've started using ExceptionGroups extensively in our codebase and support for this has become important to us. Sentry has become dramatically less useful to us now that both the UI stacktrace and the raw output are missing the original error.

@sbv-csis
Copy link

We've also recently started using python ExceptionGroups - if even just the raw stack trace contained the nice stringified exceptiongroup stacktrace that would be major improvement

@therealarkin
Copy link

@andyljones : Can you describe why Sentry has become less useful because of less of support? Thanks!

@therealarkin
Copy link

@sbv-csis What do you use ExceptionGroups for?

@andyljones
Copy link

@therealarkin Sentry only displays the top-level error in the tree. This means it hides the origin of any error that rises up as a group.

Here's an example:

import sentry_sdk
import trio

async def f():
    raise ValueError()
   
async def main():
    try:
         async with trio.open_nursery() as nursery:
             nursery.start_soon(f)
    except Exception as e:
        sentry_sdk.capture_exception(e)
        raise

trio.run(main)

This code runs f in a background task, and f raises an error.

The stack trace printed by this shows both the stack for the code that started the background task (main), and the actual background task (f).

  + Exception Group Traceback (most recent call last):
  |   File "/root/.pyenv/versions/3.10.8/lib/python3.10/site-packages/IPython/core/interactiveshell.py", line 3442, in run_code
  |     exec(code_obj, self.user_global_ns, self.user_ns)
  |   File "<ipython-input-10-f738d97c5b2e>", line 15, in <module>
  |     trio.run(main)
  |   File "/root/code/belt/belt/triospy.py", line 498, in run
  |     return _original_trio_run(*args, **kwargs)
  |   File "/root/.pyenv/versions/3.10.8/lib/python3.10/site-packages/trio/_core/_run.py", line 2010, in run
  |     raise runner.main_task_outcome.error
  |   File "<ipython-input-10-f738d97c5b2e>", line 9, in main
  |     async with trio.open_nursery() as nursery:
  |   File "/root/.pyenv/versions/3.10.8/lib/python3.10/site-packages/trio/_core/_run.py", line 850, in __aexit__
  |     raise combined_error_from_nursery
  | trio.NonBaseMultiError: ValueError()
  +-+---------------- 1 ----------------
    | Traceback (most recent call last):
    |   File "<ipython-input-10-f738d97c5b2e>", line 5, in f
    |     raise ValueError()
    | ValueError
    +------------------------------------

Sentry meanwhile only shows where the error came out of the nursery

image

@mattjohnsonpint
Copy link
Contributor

mattjohnsonpint commented Mar 16, 2023

Hi. I'm working on the overall design for how Sentry will handle this, and I would appreciate some feedback on a few things.

A few quick notes first:

  • I'll use the term "exception group" throughout this, but I'm referring to all platforms that implement similar features, as called out in the top of this issue.
  • I lead the Sentry .NET SDKs and I'm most familiar with AggregateException in .NET.
  • We're designing the feature generically. .NET and Python will be implemented first.
  • Existing workarounds in .NET will likely be removed in favor of this new design. For sake of design consistency, we'll pretend Flatten aggregate exception sentry-dotnet#1672 didn't happen.

The first thing to decide on is what to do with a given exception group. Let's look at a Python example (taken from here):

raise ExceptionGroup("nested",
    [
        ValueError(654),
        ExceptionGroup("imports",
            [
                ImportError("no_such_module"),
                ModuleNotFoundError("another_module"),
            ]
        ),
        TypeError("int"),
    ]
)
  + Exception Group Traceback (most recent call last):
  |   ...
  | ExceptionGroup: nested (3 sub-exceptions)
  +-+---------------- 1 ----------------
    | ValueError: 654
    +---------------- 2 ----------------
    | ExceptionGroup: imports (2 sub-exceptions)
    +-+---------------- 1 ----------------
      | ImportError: no_such_module
      +---------------- 2 ----------------
      | ModuleNotFoundError: another_module
      +------------------------------------
    +---------------- 3 ----------------
    | TypeError: int
    +------------------------------------

What might we expect to see in Sentry? My thinking is that we would want four separate issues raised for ValueError, ImportError, ModuleNotFoundError and TypeError.

The exception groups themselves are less important, however they might have stack traces showing where they were fired, and those might need to be visible on each issue.

Now, if say for example that ImportError also had a chained exception of its own that was a an ExceptionGroup, I don't think that one needs to be expanded. We'd just take the first exception from the group and discard the rest. Reason being - the extra exceptions probably don't create any additional insight into troubleshooting the issue.

Does that make sense?

Also, I think that we should be thinking about an upper limit of how many top-level exceptions within an exception group we'll support. A small number (perhaps 10?) makes sense to me. I'm not sure that we would want hundreds to be sent, which could happen in the case of parallel work.

Would appreciate any other insights you may have. Thanks.

@mattjohnsonpint
Copy link
Contributor

Next, assuming that we'd create separate issues as described above, how important would it be for those issues to be linked together in some way? In the example above, is it important that the four issues were raised together? Would you need a way to navigate from the ValueError to the TypeError (for example)?

If this is important, my idea is that we could assign each issue created for an exception group with the same unique identifier in a new field (perhaps group_id). We'd make that field indexed and searchable. Then we could update the UI to provide navigation to the issues list, filtering to all issues with the same group_id.

Or is that less important and we should just focus on getting actionable issues created with the right titles and groupings?

@therealarkin
Copy link

cc: @andyljones, @sbv-csis pls see @mattjohnsonpint comments above and feel free to chime in!

@andyljones
Copy link

Would appreciate any other insights you may have. Thanks.

My immediate instinct is to lean in the opposite direction to you - to just raise one ExceptionGroup error for the whole thing. My reason for this is that while you can be smart about dismantling an error group into components that are more semantically meaningful, it seems precarious - while the dismantling will be useful most of the time, the few times it becomes misleading will cost a lot of effort.

I obviously don't know much of Sentry internals, but my expectation is that the 80/20 MVP is just to have an equivalent of Python's ExceptionGroup stack trace in the Sentry UI. That'd take Sentry-with-exception-groups from 'unusable' to 'functional' for me immediately, and then anything on top of that would be a bonus.

@mattjohnsonpint
Copy link
Contributor

@andyljones - If we were to go that route instead, how would you expect the issue to be titled on the issues list in Sentry?

Keep in mind:

  • We'd want to avoid having multiple unrelated issues all titled the same, such as "ExceptionGroup: nested (3 sub-exceptions)"
  • We'd want Sentry's issue grouping to work in a sensible manner, such that the quantity or order of sub-exceptions doesn't affect the way we present multiple events as a single issue.

@andyljones
Copy link

We'd want to avoid having multiple unrelated issues all titled the same,

I think my aesthetic is that this is fine? My feeling is that this is something to be dealt with by a combination of (a) users raising informative exception types from exception groups (b) maybe (maybe?) some kind of improved grouping system that can match on nested errors.

I'm being lead here by how exception group handling is done in the language itself.

@mattjohnsonpint mattjohnsonpint changed the title Support AggregateExceptions Support Exception Groups Mar 24, 2023
@mattjohnsonpint
Copy link
Contributor

Thanks for the feedback. Hopefully we can come up with a solution that will accurately reflect the way the exception group is handled in the language, and also fit in to the expected workflow for issues in Sentry.

I have a few more questions. We'll start with this one:

In .NET, an AggregateException inherits from Exception. All Exception types have a nullable InnerException property that when populated indicates the cause of the exception. With AggregateException, we also have an InnerExceptions array. In all cases of AggregateException, the InnerException will match InnerExceptions[0].

However in Python, things are slightly different. On all exceptions, there is both a __context__ and a __cause__ attribute. __context__ is set if an exception is raised while handling an exception. __cause__ is set if an exception is raised from another exception.

try:
    try:
        raise Exception("1")
    except Exception as e1:
        e2 = Exception("2")
        raise Exception("3") from e2
except Exception as e:
    print (e.__context__) # 1
    print (e.__cause__)   # 2
    print (e)             # 3

Both can exist on a single exception, though context is suppressed in traceback output if cause has been set.

.NET's InnerException is closest to Python's __cause__. .NET doesn't have the concept of "context".

So far so good. The interesting part is that with a Python ExceptionGroup, none of the provided exceptions are considered to be either the context or cause.

try:
    e1 = Exception("1")
    e2 = Exception("2")
    raise ExceptionGroup("group", [e1, e2])
except ExceptionGroup as eg:
    print (eg.__context__) # None
    print (eg.__cause__)   # None
    print (eg.exceptions)  # (Exception('1'), Exception('2'))

So in theory, one could have completely different context or cause than the exceptions in the group.

e1 = Exception("1")
e2 = Exception("2")
e3 = Exception("3")
e4 = Exception("4")

try:
    try:
        raise e1
    except:
        raise ExceptionGroup("group", [e3, e4]) from e2
except ExceptionGroup as eg:
    print (eg.__context__) # 1
    print (eg.__cause__)   # 2
    print (eg.exceptions)  # (Exception('3'), Exception('4'))

I'm not sure how to rationalize this. Does it mean that exceptions 3 and 4 were also contributing to the cause of the exception group? If we could only pick one exception to show as the next item in the chain, which would it be?

The Sentry Python SDK currently does the following:

if exc_value.__suppress_context__:
    cause = exc_value.__cause__
else:
    cause = exc_value.__context__

Should we simply expand that and say if there's no cause or context, use the first exception of the exception group? (That would align with .NET InnerException = InnerExceptions[0].). If so, which should take precedence?

@antonpirker
Copy link
Member

I think there are two questions here for Python:

  1. What exception in an exception group to pick as "the problem to fix" of the exception group
  2. How to deal with chained exceptions in exception groups.

For 1) I think we just pick the first exception in eg.exeptions and see if the users complain that this is not what they wan.t

For 2) I guess we can ignore the eg.__context__ and eg.__cause because we are just presenting the user with one Exception and not the entire ExceptionGroup. And possible cause of the one Exception should be handled the same way as we deal now with chained exceptions.

This of course just the first step. To get some handling of ExceptionGroup out there and then we can collect feedback and improve the solution over time.

@mattjohnsonpint
Copy link
Contributor

mattjohnsonpint commented Mar 24, 2023

@antonpirker - Are you suggesting that __cause__ always be ignored on an ExceptionGroup, and we'd use exceptions[0] instead? Or should we use exceptions[0] only if __cause__ is None?

We need to pick a path, even if exception groups are located further down the chain. For example:

try:
    raise ValueError
except Exception as e1:
    try:
        raise ExceptionGroup("group", [Exception('A'), Exception('B')]) from e1
    except Exception as e2:
        raise TypeError from e2

Python will give the following output:

Traceback (most recent call last):
  File "test.py", line 2, in <module>
    raise ValueError
ValueError

The above exception was the direct cause of the following exception:

  + Exception Group Traceback (most recent call last):
  |   File "test.py", line 5, in <module>
  |     raise ExceptionGroup("group", [Exception('A'), Exception('B')]) from e1
  | ExceptionGroup: group (2 sub-exceptions)
  +-+---------------- 1 ----------------
    | Exception: A
    +---------------- 2 ----------------
    | Exception: B
    +------------------------------------

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "test.py", line 7, in <module>
    raise TypeError from e2
TypeError

So in that case, the exception group is in the middle of the chain, and there's already a cause.

  • ✅ The issue in Sentry would be for the TypeError, because it's the most recent exception. It would be the first exception shown on the issue (with the default sorting).
  • ✅ The second exception in the chain shown in Sentry would be fore the ExceptionGroup, because it was the direct cause of the TypeError
  • ❓ What is the third exception in the chain to show on the Sentry issue? Is it ValueError because it's the direct cause of the ExceptionGroup? Or is it Exception: A because it's the first item in the exception group? Does our answer change if __cause__ is None?

Answer that will help us understand how to build the exception tree in Python. One we figure that part out, I think I have a working plan for the rest. That would be as follows:

UPDATE: the below has been redacted because we've significantly deviated from this plan. Please see RFC 79 for details on the updated plan.

Phase 1 (what we'll commit to in the near term):

  • Issues in Sentry will be created for the "first path" through the tree of exceptions. In other words, by traversing from the root node to the first child until reaching a leaf. All other nodes of the tree would be discarded.
  • To handle issue titles and grouping, an SDK option will be provided to decide whether to keep the exceptions that are of the exception group type or not. The option name can be language specific (KeepAggregateExceptions in .NET, keep_exception_groups in Python, etc.).
    • The default should be false, meaning to discard the newest exception groups in the chain until reaching an exception that is not an exception group. (Any others further along in the chain would remain.)
    • Setting to true would opt-in to having issues raised for the entire chain, so the title would be similar to "ExceptionGroup: nested (3 sub-exceptions)" or "System.AggregateException: One or more errors occurred. (Test 1) (Test 2)", etc.
  • Elsewhere on the event we will include the entire exception group (as an extra, or message, or attachment, TBD). This will allow two things to happen:
    1. When looking at the "first path" issue, you'll still be able to see some representation of the entire exception group in the Sentry UI.
    2. It provides data we can use for Phase 2 (below).

Phase 2 (for future consideration - not committing to this just yet):

  • Sentry can run a background job when processing issues that will look for the exception group extra data.
  • If it finds any, and there's more than one top-level exception that should be its own issue, then it can split it apart, creating new issues. (It would clone the issue and replace the title and stack traces sections with the appropriate information from that part of the exception group tree.)
  • The net result would be that the SDKs would only send data for the one event, and that could create multiple issues in Sentry depending on what real exceptions were at the top-level in the exception groups.

@mattjohnsonpint
Copy link
Contributor

Some further discussion on the Python specifics here: getsentry/sentry-python#1788 (comment)

@mattjohnsonpint
Copy link
Contributor

mattjohnsonpint commented Mar 26, 2023

Hi all. I have drafted an RFC with the proposed changes. Please review, and feel free to leave feedback on that PR (or discuss here, if preferred).
https://github.com/getsentry/rfcs/blob/main/text/0079-exception-groups.md

Thanks!

@mattjohnsonpint
Copy link
Contributor

mattjohnsonpint commented Apr 1, 2023

Hi everyone. After significant discussion internally, we've revised the plan yet again. The RFC is updated with the new approach. Thanks.

https://github.com/getsentry/rfcs/blob/main/text/0079-exception-groups.md

@mattjohnsonpint
Copy link
Contributor

We're getting there! See the UPDATE section at top of this issue for current status. Thanks!

mattjohnsonpint added a commit that referenced this issue May 22, 2023
Implements issue grouping and titling changes required for processing
exception groups per [Sentry RFC
79](https://github.com/getsentry/rfcs/blob/main/text/0079-exception-groups.md#sentry-issue-grouping).

Part of #37716

A few notes:

- The issue titling requirements were implemented by dropping a
`main_exception_id` on the event data in the exception grouping logic,
and then later using it in the `ErrorEvent.extract_metadata` function.
If present, it uses the corresponding exception for metadata including
the title. This is working well, but just wanted to check if this is
safe, or if there's a better way.

- In the RFC, I had indicated that disparate top-level exceptions would
be grouped under their immediate parent exception, which might not
necessarily be the root exception. This turned out to be difficult to
implement, and didn't add much value. Instead, I always use the root
exception. This seems to work well enough. I included a comment at the
appropriate place in the code, in case it comes up in the future.

- I only modified the `NewStyle` strategy. I'm not sure if `Legacy` or
other strategies should be updated as well?

- I fixed a related bug in `exception.py`, which was that the first
exception was being used to create default issue tags instead of the
last. This should be done regardless of exception groups, as it corrects
the `handled` and `mechanism` event tags such that they are for the
outermost (latest) exception. Tests are updated to match this change as
well.
@mattjohnsonpint
Copy link
Contributor

This feature is now available for SaaS (sentry.io) customers. It will also be included in the next Sentry Self-Hosted release (23.6.0, est. June 2023).

On the client side, it requires a supporting SDK. Currently this is supported in:

JavaScript will be coming soon.

volokluev pushed a commit that referenced this issue May 30, 2023
Implements issue grouping and titling changes required for processing
exception groups per [Sentry RFC
79](https://github.com/getsentry/rfcs/blob/main/text/0079-exception-groups.md#sentry-issue-grouping).

Part of #37716

A few notes:

- The issue titling requirements were implemented by dropping a
`main_exception_id` on the event data in the exception grouping logic,
and then later using it in the `ErrorEvent.extract_metadata` function.
If present, it uses the corresponding exception for metadata including
the title. This is working well, but just wanted to check if this is
safe, or if there's a better way.

- In the RFC, I had indicated that disparate top-level exceptions would
be grouped under their immediate parent exception, which might not
necessarily be the root exception. This turned out to be difficult to
implement, and didn't add much value. Instead, I always use the root
exception. This seems to work well enough. I included a comment at the
appropriate place in the code, in case it comes up in the future.

- I only modified the `NewStyle` strategy. I'm not sure if `Legacy` or
other strategies should be updated as well?

- I fixed a related bug in `exception.py`, which was that the first
exception was being used to create default issue tags instead of the
last. This should be done regardless of exception groups, as it corrects
the `handled` and `mechanism` event tags such that they are for the
outermost (latest) exception. Tests are updated to match this change as
well.
@mattjohnsonpint mattjohnsonpint removed their assignment Jun 20, 2023
@lforst lforst self-assigned this Jul 4, 2023
@AbhiPrasad
Copy link
Member Author

Closing this issue as we've added the necessary JS implementation here: getsentry/sentry-javascript#5469

We now have getsentry/sentry-docs#7501 and getsentry/develop#932 for docs, but that can be tracked separately there.

@github-actions github-actions bot locked and limited conversation to collaborators Aug 10, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

9 participants