
Support for query complexity #2988

Open
patrick91 opened this issue Jul 28, 2023 · 18 comments

Comments

@patrick91
Member

patrick91 commented Jul 28, 2023

Hey folks, a while ago I wrote down some ideas on how we can add support for query complexity. For this we'll need two things:

  1. An extension that calculates the cost of an operation
  2. Support for customising the costs based on fields/arguments

For customising costs on fields and arguments I was thinking about something like this:

from typing import Annotated

import strawberry

@strawberry.type
class Query:
    @strawberry.field(cost=2)
    def expensive_list(
        self, limit: Annotated[int, strawberry.argument(cost_multiplier=1)]
    ) -> list[str]:
        return ["Item"] * limit

Here we are saying that each individual item in the expensive_list field costs 2, and the total is the field cost multiplied by the limit and by the argument's multiplier, so the result would be:

field_cost × limit × 1
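To make that arithmetic concrete, here is a tiny sketch of the proposed formula; field_total_cost and its parameters are hypothetical names for illustration, not Strawberry API:

```python
def field_total_cost(field_cost: int, argument_value: int, multiplier: float = 1) -> float:
    # Proposed formula: field cost × argument value × argument multiplier.
    return field_cost * argument_value * multiplier

# expensive_list (cost=2) queried with limit=10 and multiplier 1
# would total 2 × 10 × 1 = 20.
```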

We can also have defaults like this:

from typing import Dict, Union

class StrawberryConfig:
    # Existing attributes and methods ...

    default_argument_cost_multiplier: Dict[str, Union[int, float]] = {
        "limit": 1,
        "first": 1,  # probably not needed, but worth showing as an example
        "last": 1,
    }

default_argument_cost_multiplier maps argument names to their multipliers, though I don't like the name that much.
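A rough sketch of how an extension might resolve a multiplier, with a per-argument cost_multiplier annotation taking precedence over the config defaults (argument_multiplier and overrides are hypothetical names, not part of Strawberry):

```python
from typing import Dict, Optional, Union

Number = Union[int, float]

# Mirrors the proposed StrawberryConfig.default_argument_cost_multiplier.
DEFAULT_MULTIPLIERS: Dict[str, Number] = {"limit": 1, "first": 1, "last": 1}

def argument_multiplier(
    name: str, overrides: Optional[Dict[str, Number]] = None
) -> Number:
    # An explicit cost_multiplier on the argument wins over the config default;
    # arguments not mentioned anywhere fall back to a neutral multiplier of 1.
    if overrides and name in overrides:
        return overrides[name]
    return DEFAULT_MULTIPLIERS.get(name, 1)
```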

What do you think?

@devkral
Contributor

devkral commented Aug 7, 2023

Nice idea. But maybe a proposal:

why not extend graphene-protector? This extension works for Strawberry as well as Graphene and plain graphql-python.

And it is tested.

@devkral
Contributor

devkral commented Aug 7, 2023

(side note: a project of mine)

@patrick91
Member Author

@devkral oh, I forgot about it!

I think we could use it internally 😊 My reasoning for having this feature built in is so that we have a great API (I quite like the new argument on field/argument) and also so we can make sure it always works, especially if we change the core of Strawberry 😊

@devkral
Contributor

devkral commented Aug 9, 2023

I see your point. Assigning costs to arguments is a cool idea.

But how would an extension like mine retrieve the costs? It would be nice to have a function/property on every field which returns the accumulated cost.

@bradleyoesch
Contributor

@patrick91

I've been looking at this quite a bit as well.

Here are some resources I've found and things I like about them that we could maybe take advantage of

  • https://graphql-ruby.org/queries/complexity_and_depth
    • Some way to log/see the complexity of a request (maybe only in graphiql? or configurable to decide when to allow clients to be able to see it?)
    • Default values for common types, e.g. simple types default to 1, connection types multiply
    • Complexity override functions, not just integers
  • https://github.com/pa-bru/graphql-cost-analysis
    • Map of costs (as you've posted above)
    • Clear documentation on how unions and interfaces are handled
    • I don't love setting the cost as a directive, I feel like complexity limiting is a security thing, not something that clients should really care about, assuming they're good actors
  • https://github.com/slicknode/graphql-query-complexity
    • Estimators that can be defined, extended, and stacked, i.e. ordered list of functions that calculate complexity of a field, returning the first one that "matches"
  • https://shopify.engineering/rate-limiting-graphql-apis-calculating-query-complexity
    • Documents defaults and why
    • Outputs calculated cost in the extensions object at root response level, with varying levels of detail
    • Requested vs actual cost (e.g. asking for 1000 things but only getting 2, estimated cost is much higher than actual cost)

I do like the Annotated with field/argument idea, because it makes it very clear what the cost is. A potential tradeoff is that the cost of fields is scattered around the codebase, rather than being defined explicitly in one place, like the map idea.

I think being able to define your own Estimators and pass those into the Extension could be a great way to provide defaults and to give a lot of flexibility for clients to do things themselves

So my overall thoughts on what I'd like to be able to do/have:

  • Easily be able to enable complexity limiting
  • Set a max complexity on the schema
    • It could be useful to be able to set a max complexity override on specific operations, though exact use cases of this are unclear atm
  • Have some default cost values so if I add the extension and do nothing else, I do get some value out of it
    • e.g. types: 1, scalars: 0, lists/connections: multipliers, metafields: 0 (e.g. cursor, pageInfo)
  • Set my own cost values, potentially with more configurability than just a number
    • I'd like to be able to define costs and/or multipliers on types or on "categories of types"
  • A way to see/log/debug the complexity of a request
  • Would be nice to be able to work in a supergraph setting, where not all fields in the request are resolved by my schema
    • I assume you'd lose some features here, but presumably for federated fields, the schema could add some extra complexity or a multiplier or something

Something like

  • QueryComplexityLimiter extension
  • Has a default max complexity limit
  • Has a default list of estimators that determine how to calculate cost on a field
  • Has a default map of costs/cost functions
  • Clients can extend/override the estimators and map
  • Clients can define costs in the map or at the field/input level
    • Inheritance would be nice here, e.g. a Connection type that has a cost/multiplier built in that would apply to all child implementations
  • Complexity is calculated with cost integers and multipliers
  • Potentially calculate cost at validation time, and actual cost at resolve time
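The stacked-estimator idea borrowed from graphql-query-complexity could look roughly like this in Python; all names here (Estimator, estimate, and the sample estimators) are a hypothetical sketch, not an existing Strawberry API:

```python
from typing import Callable, Dict, Optional, Sequence

# An estimator inspects a field and either returns a cost or None to defer
# to the next estimator in the chain.
Estimator = Callable[[str, Dict], Optional[int]]

def explicit_cost_estimator(field_name: str, info: Dict) -> Optional[int]:
    # Use a cost the schema author declared explicitly, if any.
    return info.get("declared_cost")

def simple_estimator(field_name: str, info: Dict) -> Optional[int]:
    # Fallback: every field costs 1.
    return 1

def estimate(field_name: str, info: Dict, estimators: Sequence[Estimator]) -> int:
    # Return the result of the first estimator that "matches" (returns non-None).
    for estimator in estimators:
        cost = estimator(field_name, info)
        if cost is not None:
            return cost
    return 0
```

Clients could then override the chain by prepending their own estimators, keeping the simple fallback at the end.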

@devkral
Contributor

devkral commented Aug 30, 2023

Some parts I did already; for example, you can exclude some path parts from adding complexity/depth. By default, connections are reduced to one.

Some parts which are missing are:

  • cost annotations by type (isinstance should be sufficient), ... (I would do the mapping via the main config object)
  • cost annotations by direct field annotation (extend the Limit dataclass with a cost attribute)

@zulrang

zulrang commented Dec 12, 2023

@cp2587

cp2587 commented May 6, 2024

Hello,

I think this would be very cool to have as a feature as well. My team and I would be more than happy to help if there is a draft or a minimal working extension.

@devkral
Contributor

devkral commented May 6, 2024

Actually, it is already implemented (graphene-protector), except I don't check the arguments yet (a bug).

What I need is someone who can help me with a GraphQL parser in Rust (see the graphql-core discussion). I need some help integrating it in Python, but I am short on time to learn Rust.

Why? Having some fancy protection algorithms is useless without a fast input parser which can handle the rough edges of GraphQL parsing.

@cp2587

cp2587 commented May 6, 2024

OK, so it's already working, but it's not fast enough for production usage without Rust, if I understand correctly. For now this is OK for us, as I want to use this essentially to monitor the queries constructed by the frontend team.
We don't have any particular skill in Rust, sadly, but we are really interested in boosting the performance of Strawberry, so it could be useful for us to understand this area as well. I will check internally.

@devkral
Contributor

devkral commented May 6, 2024

The issue is in graphql-core (which is the Python port of the reference implementation and probably also in use by Ariadne).

graphql-python/graphql-core#216

Of course, it would be possible to build a stack-free DFS in both languages. But this is double the effort and still not as efficient as machine code. So my proposal is to remove these hacks and implement it properly in Rust, which can be used from both languages.

@serramatutu
Contributor

serramatutu commented Jun 25, 2024

Hey @bradleyoesch and @patrick91, adding to your suggestion:

IMO it would be nice if the implementation of the query complexity calculator was separate from the actual limiter. This way, users can implement the limiting in more creative ways, such as:

  • Limit based on maximum complexity limit (as you suggested)
    • Although simpler, the drawback of this approach is that it still allows a user to send 100000 requests with cost 1, even if it would block 1 query with cost 100000.
  • Limit based on a token bucket (this is probably what GitHub does under the hood from reading their article):
    • Users should be able to pick which bucket to take from. They could use a global bucket, but also assign buckets by client IP, user identity, a combo of these or whatever fits their needs
    • We take x tokens from the bucket(s) for a query of cost x. If the bucket(s) ran out of tokens, we block the request.
    • This token bucket could be locally stored in memory, but also distributed (think redis, or some rate-limiting service like Envoy rate-limiter) to enforce global rate-limiting across instances.
  • Choose not to enforce rate-limiting and add that decision to the context instead. This would allow users to gradually roll out and monitor rate-limiting in prod without risking disruptions. Envoy proxy supports this with their ratelimit.http_filter_enforcing config.
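A minimal in-memory token bucket for the second option could be sketched like this (class and method names are hypothetical; a production version would likely keep the buckets in Redis or an external rate-limiting service):

```python
import time

class TokenBucket:
    """Minimal in-memory token bucket; one instance per scope (global, IP, user)."""

    def __init__(self, capacity: float, refill_rate: float) -> None:
        self.capacity = capacity
        self.refill_rate = refill_rate  # tokens added per second
        self.tokens = float(capacity)
        self.last_refill = time.monotonic()

    def try_consume(self, cost: float) -> bool:
        # Refill proportionally to elapsed time, capped at capacity,
        # then take `cost` tokens if enough are available.
        now = time.monotonic()
        self.tokens = min(
            self.capacity, self.tokens + (now - self.last_refill) * self.refill_rate
        )
        self.last_refill = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

A query of cost x would call try_consume(x) on the chosen bucket(s) and be blocked (or merely flagged, in monitor-only mode) when it returns False.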

I'm willing to implement a first version of this as a Strawberry extension that crawls nodes like determine_depth in QueryDepthLimiter (see implementation). There's probably a better approach by calculating query cost on the fly as we parse, but that would require hooking into graphql-core as it traverses the tree.

Is there anything else I should consider before I get going with this?

@devkral
Contributor

devkral commented Jun 25, 2024

you can access the costs. You aren't required to block. You can even dynamically provide the limit.

@serramatutu
Contributor

serramatutu commented Jun 25, 2024

@devkral do you have any benchmarks or data about the GraphQL parsing performance? I'm wondering if the parser would be such a bottleneck that it's a blocker to this. Since we already parse the query with what exists and use tree traversal in other extensions such as the QueryDepthLimiter, I think it should probably be fine to just make this in Python as well. I get this could be an attack surface if someone sends a ludicrously big query, but then you could just limit by character count, then depth, and it should be fine.

Of course, having a Rust parser would be sweet, I just don't think the parser is a blocker to this issue.

Also, I understand you have your own plugin, but it would be nice to have this as a native Strawberry extension.

@erikwrede
Member

@serramatutu the rust parser is almost ready for beta use 😉
Even then though, we should support pure-python for now.

+1 on separating cost-calculation logic vs. limiting logic; this would enable easy abstraction of both components.

@devkral
Contributor

devkral commented Jun 25, 2024

Someone picked this up and built a Rust parser for embedding?

where do I find the rust parser? How can I test it?

Or did I misunderstand?

@patrick91
Member Author

@devkral
Contributor

devkral commented Jul 10, 2024

nice, thank you
