-
Notifications
You must be signed in to change notification settings - Fork 270
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
policy: refactor opaq stack to use client-policy #3306
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Zahari Dichev <[email protected]>
linkerd/app/outbound/Cargo.toml
Outdated
@@ -45,6 +45,7 @@ linkerd-proxy-client-policy = { path = "../../proxy/client-policy", features = [ | |||
] } | |||
linkerd-retry = { path = "../../retry" } | |||
linkerd-tls-route = { path = "../../tls/route" } | |||
linkerd-opaq-route = { path = "../../opaq-route" } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: move above linkerd-proxy-client-policy (sorting)
linkerd/opaq-route/src/lib.rs
Outdated
@@ -0,0 +1,49 @@ | |||
//! An TLS route matching library for Linkerd to support the TLSRoute |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
//! An TLS route matching library for Linkerd to support the TLSRoute | |
//! A TCP route matching library for Linkerd to support the TCPRoute |
linkerd/opaq-route/src/match.rs
Outdated
|
||
/// Matches TCP sessions. For now, this is a placeholder | ||
#[derive(Clone, Debug, Default, Hash, PartialEq, Eq)] | ||
pub struct MatchSession(()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect this is coming from the TLSRoute type--do we expect TCP routes to use "session" terminology?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we think of a better name. MatchTCPConnection
? Or try to get rid of this whole machinery here. And yes, this is coming from the TLSRoute type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My suggestion is that we remove the parts of this that aren't really coherent for opaque routing -- since all of our policy is already hung off of an ip:port, there's nothing else worth matching on; and this means that we can only ever expect to support a single route per policy.
My branch reverts to using an Option and denying policies that return more than one route. We may need to update the control plane to only return the first route in its list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes sense indeed. Will merge and update control plane to return only one route always.
Signed-off-by: Zahari Dichev <[email protected]>
Signed-off-by: Zahari Dichev <[email protected]>
Signed-off-by: Zahari Dichev <[email protected]>
// TODO(eliza): eventually, can we configure the opaque | ||
// policy to fail conns? | ||
policy: None, | ||
routes: std::sync::Arc::new([]), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can remove this TODO now, right?
// TODO(eliza): eventually, can we configure the opaque | ||
// policy to fail conns? | ||
policy: None, | ||
routes: std::sync::Arc::new([]), | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly, this TODO is addressed.
.push_on_service(svc::layer::mk(move |concrete: N| { | ||
svc::stack(concrete.clone()) | ||
.push(router::Router::layer()) | ||
.push(svc::NewMapErr::layer_from_target::<LogicalError, _>()) | ||
.arc_new_clone_tcp() | ||
.into_inner() | ||
})) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can simplify this:
.push_on_service(svc::layer::mk(move |concrete: N| { | |
svc::stack(concrete.clone()) | |
.push(router::Router::layer()) | |
.push(svc::NewMapErr::layer_from_target::<LogicalError, _>()) | |
.arc_new_clone_tcp() | |
.into_inner() | |
})) | |
.push_on_service(router::Router::layer()) | |
.push_on_service(svc::NewMapErr::layer_from_target::<LogicalError, _>()) | |
// Rebuild the inner router stack every time the watch changes. | |
.push(svc::NewSpawnWatch::<Routes, _>::layer_into::< | |
router::Router<T>, | |
>()) | |
.arc_new_clone_tcp() |
// we are trying to preserve the original metrics format here. | ||
// for that purpose we utilize the logical address that will be | ||
// present only if we have used profiles for routing. This means: | ||
// | ||
// 1. If we are using policy logical and concrete will be the same | ||
// 2. If we are using profiles we are doing so because the profiles | ||
// contains traffic split data. In this case the logical and | ||
// concrete are potentially different based on the backend choice. | ||
let logical = match bal.parent.param() { | ||
Some(profiles::LogicalAddr(addr)) => addr.to_string(), | ||
None => bal.addr.to_string(), | ||
}; | ||
|
||
self.metrics(&ConcreteLabels { | ||
logical: bal.logical.to_string().into(), | ||
concrete: bal.concrete.to_string().into(), | ||
logical: logical.into(), | ||
concrete: bal.addr.to_string().into(), | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I started a branch but haven't got it sensible yet, but we also need to consider the case where a balancer is provided by policy -- in that case we want to include parent and backend references (as we do in HTTP). The logical and concrete values cannot be removed due to backwards compatibility, but we should provide reference-oriented labels in the policy case to be consistent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ver This makes total sense. I have put together a branch that does that. So in my proposal, for the two cases we balance we get the following results. I do not see a point to special casing profiles. We can still do our best to populate refs data based on the logical addrs that we get from the profiles api
The case when we use the policy response when we target a Service
outbound_tcp_balancer_queue_requests_total{
parent_group="core"
parent_kind="Service"
parent_namespace="mixed-env"
parent_name="legacy-app-cluster"
parent_port="80"
parent_section_name=""
backend_group="core"
backend_kind="Service"
backend_namespace="mixed-env"
backend_name="legacy-app-cluster"
backend_port="80"
backend_section_name=""
logical="legacy-app-cluster.mixed-env.svc.cluster.local:80"
concrete="legacy-app-cluster.mixed-env.svc.cluster.local:80"
} 1
The case when we use the profiles response because a TrafficSplit
is configured
outbound_tcp_balancer_queue_requests_total{
parent_group="core"
parent_kind="Service"
parent_namespace="trafficsplit-sample"
parent_name="backend-svc"
parent_port="8080"
parent_section_name=""
backend_group="core"
backend_kind="Service"
backend_namespace="trafficsplit-sample"
backend_name="failing-svc"
backend_port="8080"
backend_section_name=""
logical="backend-svc.trafficsplit-sample.svc.cluster.local:8080"
concrete="failing-svc.trafficsplit-sample.svc.cluster.local:8080"
} 2
Because the opaq route mocks its selection logic, the first route returned by the API will always be used. This means that all subsequent routes are incoherent and cannot be selected. There's no value in preserving incoherent states in the proxy, so it's preferable to initially require that the control plane only returns one opaq route. This change removes some of the unused matching types and simplifies the opaq router to only handle one route.
Signed-off-by: Zahari Dichev <[email protected]>
This work (based on the early sketch seen here) changes the
opaq
routing stack to use not only profiles routing but to consider client policy responses as well. The way this is done is that we use theopaq:Routes
type to express the routing information our stack contains. This type is geared towards the client-policy API. Whenever we need to use the profiles response, we simply translate its contents to a client-policy one.Notable changes in this PR are:
OpaqSidecar
type that holdsopaq::Routes
objects which are routes that have been constructed from eitherprofiles
orpolicy
responseOpaqSidecar
we consider profiles responses only if there are extra targets in the response (in order to supportTrafficSplit
functionality). In any other case we prefer the policy response.Signed-off-by: Zahari Dichev [email protected]