Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

policy: refactor opaq stack to use client-policy #3306

Draft
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

zaharidichev
Copy link
Member

This work (based on the early sketch seen here) changes the opaq routing stack to use not only profiles routing but to consider client policy responses as well. The way this is done is that we use the opaq:Routes type to express the routing information our stack contains. This type is geared towards the client-policy API. Whenever we need to use the profiles response, we simply translate its contents to a client-policy one.

Notable changes in this PR are:

  • we introduce an OpaqSidecar type that holds opaq::Routes objects which are routes that have been constructed from either profiles or policy response
  • during the construction of the OpaqSidecar we consider profiles responses only if there are extra targets in the response (in order to support TrafficSplit functionality). In any other case we prefer the policy response.

Signed-off-by: Zahari Dichev [email protected]

@@ -45,6 +45,7 @@ linkerd-proxy-client-policy = { path = "../../proxy/client-policy", features = [
] }
linkerd-retry = { path = "../../retry" }
linkerd-tls-route = { path = "../../tls/route" }
linkerd-opaq-route = { path = "../../opaq-route" }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: move above linkerd-proxy-client-policy (sorting)

@@ -0,0 +1,49 @@
//! An TLS route matching library for Linkerd to support the TLSRoute
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
//! An TLS route matching library for Linkerd to support the TLSRoute
//! A TCP route matching library for Linkerd to support the TCPRoute


/// Matches TCP sessions. For now, this is a placeholder
#[derive(Clone, Debug, Default, Hash, PartialEq, Eq)]
pub struct MatchSession(());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect this is coming from the TLSRoute type--do we expect TCP routes to use "session" terminology?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we think of a better name. MatchTCPConnection ? Or try to get rid of this whole machinery here. And yes, this is coming from the TLSRoute type.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My suggestion is that we remove the parts of this that aren't really coherent for opaque routing -- since all of our policy is already hung off of an ip:port, there's nothing else worth matching on; and this means that we can only ever expect to support a single route per policy.

My branch reverts to using an Option and denying policies that return more than one route. We may need to update the control plane to only return the first route in its list.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense indeed. Will merge and update control plane to return only one route always.

linkerd/app/outbound/src/opaq/concrete.rs Outdated Show resolved Hide resolved
Signed-off-by: Zahari Dichev <[email protected]>
Signed-off-by: Zahari Dichev <[email protected]>
Signed-off-by: Zahari Dichev <[email protected]>
@olix0r olix0r self-assigned this Nov 4, 2024
Comment on lines 207 to 209
// TODO(eliza): eventually, can we configure the opaque
// policy to fail conns?
policy: None,
routes: std::sync::Arc::new([]),
Copy link
Member

@olix0r olix0r Nov 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can remove this TODO now, right?

Comment on lines 176 to 179
// TODO(eliza): eventually, can we configure the opaque
// policy to fail conns?
policy: None,
routes: std::sync::Arc::new([]),
},
Copy link
Member

@olix0r olix0r Nov 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly, this TODO is addressed.

Comment on lines 68 to 74
.push_on_service(svc::layer::mk(move |concrete: N| {
svc::stack(concrete.clone())
.push(router::Router::layer())
.push(svc::NewMapErr::layer_from_target::<LogicalError, _>())
.arc_new_clone_tcp()
.into_inner()
}))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can simplify this:

Suggested change
.push_on_service(svc::layer::mk(move |concrete: N| {
svc::stack(concrete.clone())
.push(router::Router::layer())
.push(svc::NewMapErr::layer_from_target::<LogicalError, _>())
.arc_new_clone_tcp()
.into_inner()
}))
.push_on_service(router::Router::layer())
.push_on_service(svc::NewMapErr::layer_from_target::<LogicalError, _>())
// Rebuild the inner router stack every time the watch changes.
.push(svc::NewSpawnWatch::<Routes, _>::layer_into::<
router::Router<T>,
>())
.arc_new_clone_tcp()

Comment on lines +102 to 118
// we are trying to preserve the original metrics format here.
// for that purpose we utilize the logical address that will be
// present only if we have used profiles for routing. This means:
//
// 1. If we are using policy logical and concrete will be the same
// 2. If we are using profiles we are doing so because the profiles
// contains traffic split data. In this case the logical and
// concrete are potentially different based on the backend choice.
let logical = match bal.parent.param() {
Some(profiles::LogicalAddr(addr)) => addr.to_string(),
None => bal.addr.to_string(),
};

self.metrics(&ConcreteLabels {
logical: bal.logical.to_string().into(),
concrete: bal.concrete.to_string().into(),
logical: logical.into(),
concrete: bal.addr.to_string().into(),
})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I started a branch but haven't got it sensible yet, but we also need to consider the case where a balancer is provided by policy -- in that case we want to include parent and backend references (as we do in HTTP). The logical and concrete values cannot be removed due to backwards compatibility, but we should provide reference-oriented labels in the policy case to be consistent.

Copy link
Member Author

@zaharidichev zaharidichev Nov 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ver This makes total sense. I have put together a branch that does that. So in my proposal, for the two cases we balance we get the following results. I do not see a point to special casing profiles. We can still do our best to populate refs data based on the logical addrs that we get from the profiles api

The case when we use the policy response when we target a Service

outbound_tcp_balancer_queue_requests_total{
    parent_group="core"
    parent_kind="Service"
    parent_namespace="mixed-env"
    parent_name="legacy-app-cluster"
    parent_port="80"
    parent_section_name=""
    backend_group="core"
    backend_kind="Service"
    backend_namespace="mixed-env"
    backend_name="legacy-app-cluster"
    backend_port="80"
    backend_section_name=""
    logical="legacy-app-cluster.mixed-env.svc.cluster.local:80"
    concrete="legacy-app-cluster.mixed-env.svc.cluster.local:80"
} 1

The case when we use the profiles response because a TrafficSplit is configured

outbound_tcp_balancer_queue_requests_total{
    parent_group="core"
    parent_kind="Service"
    parent_namespace="trafficsplit-sample"
    parent_name="backend-svc"
    parent_port="8080"
    parent_section_name=""
    backend_group="core"
    backend_kind="Service"
    backend_namespace="trafficsplit-sample"
    backend_name="failing-svc"
    backend_port="8080"
    backend_section_name=""
    logical="backend-svc.trafficsplit-sample.svc.cluster.local:8080"
    concrete="failing-svc.trafficsplit-sample.svc.cluster.local:8080"
} 2

olix0r and others added 2 commits November 5, 2024 09:34
Because the opaq route mocks its selection logic, the first route returned by
the API will always be used. This means that all subsequent routes are
incoherent and cannot be selected.

There's no value in preserving incoherent states in the proxy, so it's
preferable to initially require that the control plane only returns one opaq
route.

This change removes some of the unused matching types and simplifies the opaq
router to only handle one route.
Signed-off-by: Zahari Dichev <[email protected]>
@zaharidichev zaharidichev changed the title policy: refactor opaq stack to use use client-policy policy: refactor opaq stack to use client-policy Nov 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants