NewClient functions behaviour is incompatible with secure forward-proxies #7556

Open
puneet-traceable opened this issue Aug 23, 2024 · 18 comments
Assignees
Labels
Area: Client Includes Channel/Subchannel/Streams, Connectivity States, RPC Retries, Dial/Call Options and more. Type: Bug

Comments

@puneet-traceable

puneet-traceable commented Aug 23, 2024

What version of gRPC are you using?

1.64.0 and v1.67.0-dev

What version of Go are you using (go version)?

1.22

What operating system (Linux, Windows, …) and version?

Linux

What did you do?

If possible, provide a recipe for reproducing the error.

  1. Set up a Squid proxy.
  2. Set up the gRPC client (examples/helloworld/greeter_client) and gRPC server (examples/helloworld/greeter_server) to use TLS.
  3. Run the gRPC server.
  4. In the client shell, set the proxy environment variable (export https_proxy="http://<proxy_host>:<port>").
  5. Start tcpdump.

What did you expect to see?

The target should be sent to the proxy as a hostname, and DNS resolution for the target should happen on the proxy.

What did you see instead?

DNS is resolved on the client and only the IP address is sent to the proxy.
Attaching tcpdump screenshots that show the difference.
Screenshot 2024-08-23 at 7 50 34 PM

tcpdump for curl
Screenshot 2024-08-23 at 9 48 22 PM

@dfawley
Member

dfawley commented Aug 23, 2024

If you need a short-term workaround to keep things working as they were before, you should be able to use the passthrough resolver:

client, err := grpc.NewClient("passthrough:///<hostname>:<port>", ...)

That probably should not be required, though.
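
For callers that build target strings in one place, the workaround above could be applied with a small helper. This is a minimal sketch, not part of grpc-go: `passthroughTarget` is a hypothetical name, and it assumes that any target containing `://` already carries an explicit scheme and should be left alone.

```go
package main

import (
	"fmt"
	"strings"
)

// passthroughTarget is a hypothetical helper (not a grpc-go API). It prefixes
// a bare "host:port" target with the passthrough scheme so that the client
// does not resolve the hostname itself.
// Assumption: a target that contains "://" already has an explicit scheme.
func passthroughTarget(target string) string {
	if strings.Contains(target, "://") {
		return target
	}
	return "passthrough:///" + target
}

func main() {
	// The result would be passed as the target argument of grpc.NewClient.
	fmt.Println(passthroughTarget("example.com:50051"))
	fmt.Println(passthroughTarget("dns:///example.com:50051"))
}
```

The helper deliberately leaves targets with an explicit scheme untouched, so a caller that really wants client-side DNS resolution can still ask for it.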

@puneet-traceable
Author

@dfawley I get that this would work in most cases.
But there are more issues with this change.
For example, if DNS lookups are disabled on the app instances and only the proxy can resolve DNS, a gRPC client built with NewClient does not work. curl over HTTP/2 does not have this problem.

@dfawley
Member

dfawley commented Aug 27, 2024

But there are more issues with this change.

Are you saying the workaround of using passthrough:///<hostname>:<port> as the target string isn't working for you in these situations? If so, can you provide some more information and debugging logs of that?

@puneet-traceable
Author

target

No, I didn't mean passthrough won't work. But I think the NewClient API should offer more flexibility to choose between dns and passthrough. Since we use OpenTelemetry, the actual grpc-go integration is far down the stack, and it's hard to configure the URL this way because the same URL is used in multiple places.

@dfawley
Member

dfawley commented Aug 28, 2024

Thanks for confirming. Yes, this should be treated as a bug and the workaround was not suggested to avoid fixing it.

@dfawley
Member

dfawley commented Aug 30, 2024

For reference, this gRFC comes into play here: https://github.com/grpc/proposal/blob/master/A1-http-connect-proxy-support.md

But note that Java did not implement this gRFC, and we may or may not want to do things this way.

@ejona86
Member

ejona86 commented Aug 31, 2024

Java added support for "Use Case 1". But it did not use the gRFC's design. grpc/grpc-java#10022 tracks implementing Use Case 2 in Java.

puneet-traceable added a commit to hypertrace/opentelemetry-collector that referenced this issue Sep 17, 2024
With NewClient API usage, we are facing issues at few customers
who have intermediate proxies between collector and platform.
With NewClient API instead DialContext, DNS resolution happens
on the client side while it should happen on proxy.
Also, with SGProxy client does not get the correct certificate.
can be changed once grpc fixes grpc/grpc-go#7556
and otel collector picks the fix
puneet-traceable added a commit to hypertrace/opentelemetry-collector that referenced this issue Sep 17, 2024
With NewClient API usage, we are facing issues at few customers
who have intermediate proxies between collector and platform.
With NewClient API instead DialContext, DNS resolution happens
on the client side while it should happen on proxy.
Also, with SGProxy client does not get the correct certificate.
Passthrough scheme was the prior default and prevents resolution to
happen beforehand. This change can be removed once grpc fixes grpc/grpc-go#7556
and otel collector picks the fix
puneet-traceable added a commit to hypertrace/opentelemetry-collector that referenced this issue Sep 18, 2024
With NewClient API usage, we are facing issues at few customers
who have intermediate proxies between collector and platform.
With NewClient API instead DialContext, DNS resolution happens
on the client side while it should happen on proxy.
Also, with SGProxy client does not get the correct certificate.
can be changed once grpc fixes grpc/grpc-go#7556
and otel collector picks the fix
puneet-traceable added a commit to hypertrace/opentelemetry-collector that referenced this issue Sep 18, 2024
With NewClient API usage, we are facing issues at
few customers who have intermediate proxies between collector
and platform. With NewClient API instead Dial,
DNS resolution happens on the client side while it should happen on proxy.
Also, with SGProxy client does not get the correct certificate.
This can be changed once grpc fixes grpc/grpc-go#7556 and otel collector picks the fix
@arjan-bal
Contributor

Keeping this issue open to track the fix.

@arjan-bal arjan-bal reopened this Sep 19, 2024
@purnesh42H purnesh42H added the Area: Client Includes Channel/Subchannel/Streams, Connectivity States, RPC Retries, Dial/Call Options and more. label Sep 23, 2024
tim-mwangi pushed a commit to hypertrace/opentelemetry-collector that referenced this issue Oct 3, 2024
With NewClient API usage, we are facing issues at
few customers who have intermediate proxies between collector
and platform. With NewClient API instead Dial,
DNS resolution happens on the client side while it should happen on proxy.
Also, with SGProxy client does not get the correct certificate.
This can be changed once grpc fixes grpc/grpc-go#7556 and otel collector picks the fix
@guyni

guyni commented Oct 17, 2024

We also ran into this issue. It's very easy to reproduce. Just use https://github.com/grpc/grpc-go/tree/master/examples/helloworld/greeter_client and run nc to mock the proxy:

  1. in one terminal, run "nc -l 8888";
  2. in another terminal, run "HTTPS_PROXY=http://localhost:8888 go run main.go -addr=example.com:50051"

If using 1.58.x, you will see the nc outputs:
% nc -l 8888
CONNECT example.com:50051 HTTP/1.1
Host: example.com:50051
User-Agent: grpc-go/1.58.4-dev

If using 1.66.x, you will see:
% nc -l 8888
CONNECT 93.184.215.14:50051 HTTP/1.1
Host: 93.184.215.14:50051
User-Agent: grpc-go/1.66.4-dev
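
The same check can be automated instead of reading the nc output by eye. The sketch below is an assumption-laden stand-in for `nc -l 8888`: it parses one CONNECT request with Go's net/http and reports whether the request-target is a hostname or an IP literal. The names `connectTarget`, `connectTargetFromString`, and `serveOnce` are hypothetical and exist only for this illustration.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"net"
	"net/http"
	"strings"
)

// connectTarget reads one HTTP request from r and returns the raw CONNECT
// request-target, plus whether its host part is an IP literal.
func connectTarget(r io.Reader) (target string, isIP bool, err error) {
	req, err := http.ReadRequest(bufio.NewReader(r))
	if err != nil {
		return "", false, err
	}
	if req.Method != http.MethodConnect {
		return "", false, fmt.Errorf("expected CONNECT, got %s", req.Method)
	}
	// RequestURI holds the request-target exactly as the client sent it.
	host, _, err := net.SplitHostPort(req.RequestURI)
	if err != nil {
		return "", false, err
	}
	return req.RequestURI, net.ParseIP(host) != nil, nil
}

// connectTargetFromString is a convenience wrapper for captured request text.
func connectTargetFromString(raw string) (string, bool, error) {
	return connectTarget(strings.NewReader(raw))
}

// serveOnce accepts a single connection on ln and inspects its first request.
// It never answers the client, just like the nc mock in the steps above.
func serveOnce(ln net.Listener) (string, bool, error) {
	conn, err := ln.Accept()
	if err != nil {
		return "", false, err
	}
	defer conn.Close()
	return connectTarget(conn)
}

func main() {
	// The two request heads reported above, replayed from memory.
	for _, raw := range []string{
		"CONNECT example.com:50051 HTTP/1.1\r\nHost: example.com:50051\r\nUser-Agent: grpc-go/1.58.4-dev\r\n\r\n",
		"CONNECT 93.184.215.14:50051 HTTP/1.1\r\nHost: 93.184.215.14:50051\r\nUser-Agent: grpc-go/1.66.4-dev\r\n\r\n",
	} {
		target, isIP, err := connectTargetFromString(raw)
		fmt.Println(target, isIP, err)
	}
}
```

To use it against a real client, `serveOnce` could be given a listener from `net.Listen("tcp", "localhost:8888")` before starting the greeter client with HTTPS_PROXY set as in step 2.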

@guyni

guyni commented Oct 17, 2024

https://github.com/grpc/grpc-go/blob/master/internal/transport/http2_client.go#L181
Shouldn't it call proxyDial(ctx, addr.ServerName, grpcUA) instead of proxyDial(ctx, address, grpcUA) here?

@eshitachandwani
Member

master/internal/transport/http2_client.go#L181 Shouldn't it call proxyDial(ctx, addr.ServerName, grpcUA) instead of proxyDial(ctx, address, grpcUA) here?

Hey @guyni, I think using address is correct because, as mentioned in gRFC A1 case 2, we need to send the server address (specifically the resolved server address) in the HTTP CONNECT request. The ServerName property is used for TLS certificate validation and might be empty most of the time unless it is specifically set while using a custom CA.

@puneet-traceable
Author

@eshitachandwani If you look at use case 1 here, it says that the external address should be resolved at the proxy. That means the hostname should be supplied in the CONNECT call.

@eshitachandwani
Member

eshitachandwani commented Oct 18, 2024

Hi @puneet-traceable, currently, gRPC-Go supports only use case 2, and I’m actively working on adding support for use case 1. Apologies for not mentioning this in my previous comment.
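
To make the difference between the two use cases concrete, the following sketch builds the CONNECT request head for a given authority using Go's net/http. It is only an illustration under assumptions: `connectRequest` and `requestLine` are hypothetical names, the user agent string is made up, and this is not a copy of grpc-go's proxy code. For use case 1 the authority would be the unresolved hostname; for use case 2 it would be the address the client already resolved.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"strings"
)

// connectRequest serializes an HTTP CONNECT request whose request-target and
// Host header are both set to authority ("host:port").
func connectRequest(authority, userAgent string) (string, error) {
	req := &http.Request{
		Method: http.MethodConnect,
		URL:    &url.URL{Host: authority},
		Header: http.Header{"User-Agent": []string{userAgent}},
	}
	var sb strings.Builder
	if err := req.Write(&sb); err != nil {
		return "", err
	}
	return sb.String(), nil
}

// requestLine returns the first line of a serialized request.
func requestLine(raw string) string {
	line, _, _ := strings.Cut(raw, "\r\n")
	return line
}

func main() {
	// Use case 1: the proxy resolves the name, so the hostname is sent.
	// Use case 2: the client resolved the name, so the IP address is sent.
	for _, authority := range []string{"example.com:50051", "93.184.215.14:50051"} {
		raw, err := connectRequest(authority, "grpc-go/example")
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(requestLine(raw))
	}
}
```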

@eshitachandwani
Member

We also ran into this issue. It's very easy to reproduce. Just use master/examples/helloworld/greeter_client and run nc to mock the proxy:

  1. in one terminal, run "nc -l 8888";
  2. in another terminal, run "HTTPS_PROXY=http://localhost:8888 go run main.go -addr=example.com:50051"

If using 1.58.x, you will see the nc outputs:
% nc -l 8888
CONNECT example.com:50051 HTTP/1.1
Host: example.com:50051
User-Agent: grpc-go/1.58.4-dev

If using 1.66.x, you will see:
% nc -l 8888
CONNECT 93.184.215.14:50051 HTTP/1.1
Host: 93.184.215.14:50051
User-Agent: grpc-go/1.66.4-dev

This happens because in v1.58.x we were using grpc.Dial, which uses passthrough as the default name-resolution scheme. A target without a scheme was therefore not resolved on the client, and the proxy received the unresolved name. We have since changed to grpc.NewClient(), whose default name-resolution scheme is dns, so when no scheme is specified the proxy receives the resolved address by default. As mentioned earlier, we are actively trying to resolve this.
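
The effect of the default scheme can be modeled in a few lines. The sketch below is a deliberately simplified model of the explanation above, not grpc-go's actual target parsing or resolver code: `addrSentToProxy` and `fakeLookup` are hypothetical, only the `scheme:///endpoint` target form is recognized, and the lookup is a stub that returns the address seen in the nc output.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// addrSentToProxy models which address would reach the proxy for a target,
// given the default scheme applied when the target has none.
// Assumption: only "passthrough" and "dns" are modeled.
func addrSentToProxy(target, defaultScheme string, lookup func(host string) (string, error)) (string, error) {
	scheme, endpoint := defaultScheme, target
	if s, rest, ok := strings.Cut(target, ":///"); ok {
		scheme, endpoint = s, rest
	}
	switch scheme {
	case "passthrough":
		// No client-side resolution: the name is handed on unchanged.
		return endpoint, nil
	case "dns":
		// Client-side resolution: the proxy only ever sees an IP address.
		host, port, err := net.SplitHostPort(endpoint)
		if err != nil {
			return "", err
		}
		ip, err := lookup(host)
		if err != nil {
			return "", err
		}
		return net.JoinHostPort(ip, port), nil
	default:
		return "", fmt.Errorf("scheme %q is not modeled in this sketch", scheme)
	}
}

// fakeLookup stands in for DNS so the example needs no network access.
func fakeLookup(host string) (string, error) {
	return "93.184.215.14", nil
}

func main() {
	// Default of grpc.Dial as described above.
	fmt.Println(addrSentToProxy("example.com:50051", "passthrough", fakeLookup))
	// Default of grpc.NewClient as described above.
	fmt.Println(addrSentToProxy("example.com:50051", "dns", fakeLookup))
	// An explicit scheme overrides the default.
	fmt.Println(addrSentToProxy("passthrough:///example.com:50051", "dns", fakeLookup))
}
```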

tim-mwangi pushed a commit to hypertrace/opentelemetry-collector that referenced this issue Oct 21, 2024
With NewClient API usage, we are facing issues at
few customers who have intermediate proxies between collector
and platform. With NewClient API instead Dial,
DNS resolution happens on the client side while it should happen on proxy.
Also, with SGProxy client does not get the correct certificate.
This can be changed once grpc fixes grpc/grpc-go#7556 and otel collector picks the fix
@erezrokah

As mentioned earlier, we are actively trying to resolve this.

Thanks @eshitachandwani for handling this issue. I couldn't find whether this was already mentioned, but this issue impacts consumers of the GCP Go SDK (https://github.com/googleapis/google-cloud-go). Can we make sure any solution applies to those consumers as well (e.g. if the solution adds a new option, that consumers of the GCP Go SDK can enable it)?

@easwars
Contributor

easwars commented Nov 5, 2024

@erezrokah : Could you please elaborate on your last comment?

How does the GCP Go SDK currently use the proxy feature?

Are they currently affected by using grpc.NewClient? Or do you think they will be affected if they make the switch?

Thanks.

@erezrokah

erezrokah commented Nov 6, 2024

Are they currently affected by using grpc.NewClient?

They are currently affected due to this change googleapis/google-cloud-go@be2d56d#diff-215847f913454f2311866edbdbf41d7a3cd3879e0ed0e7aa26f4ad771dd6a1dcR296.

We're experiencing the same issue as this one only when using the Google Cloud Go SDK.

I think https://github.com/grpc/grpc-go/blob/master/Documentation/anti-patterns.md#the-wrong-way-grpcdial should say both Dial and DialContext use passthrough and switching to NewClient as is should be considered a breaking change, WDYT?

Edit

See googleapis/google-cloud-go#11089

@arjan-bal
Contributor

I can think of a workaround which doesn't involve a code change in the googleapis client library:

import (
	"google.golang.org/api/option"
	"google.golang.org/grpc"
	"google.golang.org/grpc/resolver"
)

// A resolver that registers the passthrough resolver under the "dns" scheme
// to indirectly set the default resolver used by NewClient as the passthrough
// resolver.
// Note: This will shadow the dns resolver.
type customPassthroughResolver struct{}

func (r *customPassthroughResolver) Build(target resolver.Target, cc resolver.ClientConn, opts resolver.BuildOptions) (resolver.Resolver, error) {
	passthrough := resolver.Get("passthrough")
	if passthrough == nil {
		panic("Passthrough resolver not registered!")
	}
	return passthrough.Build(target, cc, opts)
}

func (r *customPassthroughResolver) Scheme() string {
	return "dns"
}

// Use a grpc client option to avoid modifying the global grpc resolver registry.
clientOption := option.WithGRPCDialOption(grpc.WithResolvers(&customPassthroughResolver{}))
// Use the client option while creating the googleapis client.

9 participants