GRPC (with TLS) receiving Plain Text response

I'm trying to configure Traefik 2 with gRPC but drawing a blank.

The setup here is that I'm running a Thanos Querier on Cluster A which accepts a store record configured as follows:

--store=dns+thanos.domain.com:443

The querier then makes gRPC requests to that domain. On Cluster B, which serves thanos.domain.com, I have Traefik configured with the following route:

apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: thanos-grpc
  namespace: monitoring
spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`thanos.domain.com`)
      kind: Rule
      services:
        - name: prometheus-k8s
          port: 10901
  tls:
    certResolver: default
    options: {}

Checking the logs for the traefik pod suggests this service is configured properly, for example:

level=debug msg="Configuration received from provider kubernetescrd: {\"http\":{\"routers\":{\"monitoring-thanos-grpc-56cfaafbc01c26ca0ffd\":{\"entryPoints\":[\"websecure\"],\"service\":\"monitoring-thanos-grpc-56cfaafbc01c26ca0ffd\",\"rule\":\"Host(`thanos.domain.com`)\",\"tls\":{\"certResolver\":\"default\"}}},\"services\":{\"monitoring-thanos-grpc-56cfaafbc01c26ca0ffd\":{\"loadBalancer\":{\"servers\":[{\"url\":\"http://10.36.4.39:10901\"},{\"url\":\"http://10.36.5.69:10901\"}],\"passHostHeader\":true}}}},\"tcp\":{},\"tls\":{}}" providerName=kubernetescrd

However, switching back to Cluster A and checking the calling service yields the following error:

initial store client info fetch: rpc error: code = Internal desc = transport: received the unexpected content-type \"text/plain; charset=utf-8\"" address={IP_OF_SUBDOMAIN}:443

Obviously that is an external log, but from that the important part is:

received the unexpected content-type \"text/plain; charset=utf-8\""

Which suggests that the backend is misconfigured, but I can't figure out where. I don't have any error logs that help with this on the containers within the prometheus-k8s service.

Any pointers would be very welcome.

Never really worked with gRPC, can you use a test client or something like that to test that gRPC connection? As it stands it is not clear which part went wrong, so isolating that would be where I'd start.

Here is a thread with sample code - https://github.com/containous/traefik/issues/4421

Here are a few more related issues that were swept under the rug:

I am having the exact same issue as @andrew-waters and my setup is almost the same, the only difference being the TLS cert is referenced by a Secret pre generated by cert-manager. Were you able to get to the bottom of this?

Yep @SayakMukhopadhyay, I got it working in the end. I don't remember the exact change I made given the time frame, but I have a working configuration. Can you share your setup and I'll try to chime in?

So, @andrew-waters, I have 2 Thanos queriers (A and B) set up in my monitoring cluster. A connects to the local stores and to the B querier, whereas the B querier should connect to the external stores, which are the sidecars for now. The reason I am running 2 queriers is to handle mTLS only for external stores.

So, the querier B is configured as such

      containers:
      - args:
        - query
        - --grpc-address=0.0.0.0:10901
        - --http-address=0.0.0.0:9090
        - --query.replica-label=prometheus_replica
        - --query.replica-label=rule_replica
        - --query.timeout=5m
        - --store=dns+prometheus.mydomain.com:443

Then I have Traefik setup in the other cluster and I have created an IngressRoute as such

apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: thanos-sidecar
  namespace: monitoring
spec:
  entryPoints:
  - websecure
  routes:
  - kind: Rule
    match: Host(`prometheus.mydomain.com`)
    services:
    - kind: Service
      name: prometheus-k8s
      port: 10901
  tls:
    secretName: prometheus.mydomain.com-tls

As you will notice, this is very similar to your configuration in the OP, and thus I am getting the same error too.

I have tried exposing the sidecar with a LoadBalancer service and it works but can't make it work via Traefik.

The obvious place I'd start looking is the entrypoint. Websecure will be 443 where you want 10901.

For your external querier, use:

--store=prometheus.mydomain.com:10901

instead of --store=dns+prometheus.mydomain.com:443

Then check the Service and Deployment on the prometheus.mydomain.com server and expose port 10901 on the Service (name = traefik):

spec:
  ports:
    - name: web
      port: 80
      protocol: TCP
    - name: websecure
      port: 443
      protocol: TCP
    - name: thanos
      port: 10901
      protocol: TCP

note - you may have other ports open - merge them instead of replacing. Then in your Deployment (name = traefik) ensure spec.template.spec.containers[0].ports also allows 10901:

            - containerPort: 10901
              name: thanos

And if you're configuring the deployment via the container args add --entrypoints.thanos.Address=:10901 to spec.template.spec.containers[0].args. If you use a different config method, you'll need to use the format for that.
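
Putting the two Deployment pieces together, here is a rough sketch of what the relevant part of the traefik Deployment might look like. Only the thanos port and the thanos entrypoint flag come from this post; the container name, the web/websecure lines and the provider flag are assumptions standing in for whatever your Deployment already has.

```yaml
# Sketch only: merge into your existing Deployment (name = traefik), do not replace it.
spec:
  template:
    spec:
      containers:
        - name: traefik                              # assumed container name
          args:
            - --entrypoints.web.Address=:80          # assumed: existing entrypoint
            - --entrypoints.websecure.Address=:443   # assumed: existing entrypoint
            - --entrypoints.thanos.Address=:10901    # the new entrypoint from this post
            - --providers.kubernetescrd              # assumed: CRD provider already enabled
          ports:
            - containerPort: 80                      # assumed: existing port
              name: web
            - containerPort: 443                     # assumed: existing port
              name: websecure
            - containerPort: 10901                   # the new port from this post
              name: thanos
```

The entrypoint name (thanos) is what an IngressRoute or IngressRouteTCP would then list under spec.entryPoints.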

@andrew-waters so, you are saying that I need to create another entrypoint? I am yet to try this but why can't I use an existing entrypoint?

That's how I configured it, and it has been running in prod for > 1 year without issue.

One thing I missed when looking at your IngressRoute: you should change it to kind: IngressRouteTCP because Thanos is using gRPC. Try that first.

Would it be possible to share the IngressRouteTCP? I think there are some differences from the HTTP router, like using HostSNI as a rule. After enabling the TCP router, I am able to get a valid response from grpcurl, so this is definitely the right direction.

Yes there are. This is what we run:

apiVersion: traefik.containo.us/v1alpha1
kind: IngressRouteTCP
metadata:
  name: thanos-grpc
  namespace: monitoring
spec:
  entryPoints:
    - thanos
    - websecure
  routes:
    - match: HostSNI(`CLUSTER_NAME.our-domain.com`)
      kind: Rule
      services:
        - name: prometheus-k8s
          port: 10901
  tls:
    certResolver: default
    options: {}

I have been trying a few things and this is the result

I have created an IngressRoute

apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: thanos-sidecar
  namespace: monitoring
spec:
  entryPoints:
  - websecure
  routes:
  - kind: Rule
    match: Host(`my.domain1.com`)
    services:
    - kind: Service
      name: prometheus-k8s
      port: 10901
      scheme: h2c
  tls:
    secretName: my.domain1.com-tls

and I have also created an IngressRouteTCP

apiVersion: traefik.containo.us/v1alpha1
kind: IngressRouteTCP
metadata:
  name: thanos-sidecar
  namespace: monitoring
spec:
  entryPoints:
  - thanos
  - websecure
  routes:
  - kind: Rule
    match: HostSNI(`my.domain2.com`)
    services:
    - kind: Service
      name: prometheus-k8s
      port: 10901
  tls:
    secretName: my.domain2.com-tls

From the querier side, I have tried setting the store arguments in many ways

  1. - --store=my.domain2.com:10901
  2. - --store=my.domain2.com:443
  3. - --store=my.domain1.com:10901

All of them are giving the following log message

level=warn ts=2020-10-01T18:46:22.01068219Z caller=storeset.go:487 component=storeset msg="update of store node failed" err="getting metadata: fetching store info   from my.domain2.com:10901: rpc error: code = Unimplemented desc = Not Found: HTTP status code 404; transport: received the unexpected content-type \"text/plain; charset=utf-8\"" address=my.domain2.com:10901

But I tried hitting the endpoint using grpcurl and all 3 are giving me valid responses. This is the grpcurl command I am running:
grpcurl -import-path="${GOGOPROTO_ROOT}" -import-path=. -proto=store/storepb/rpc.proto -v my.domain2.com:10901 thanos.Store/Info

And it returns the Info JSON. So, the IngressRoute and IngressRouteTCP are both working from grpcurl but not from my querier. Are there any flags or anything else I need to turn on on the querier side?

I got it! I just needed to use the --grpc-client-tls-secure flag on the querier. To conclude, one can also use an IngressRoute, but then the target Service entry needs scheme: h2c.
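
For anyone landing here later, a minimal sketch of the querier B args from earlier in the thread with that flag added. The store address is just one of the values I tried above (the one I checked with grpcurl), not a confirmed final setting, so treat it as a placeholder.

```yaml
      containers:
      - args:
        - query
        - --grpc-address=0.0.0.0:10901
        - --http-address=0.0.0.0:9090
        - --query.replica-label=prometheus_replica
        - --query.replica-label=rule_replica
        - --query.timeout=5m
        - --store=my.domain2.com:10901   # placeholder: one of the addresses tried above
        - --grpc-client-tls-secure       # make the querier's gRPC client speak TLS to its stores
```

As far as I understand, without the flag the querier dials its stores in plaintext, while grpcurl uses TLS unless told otherwise, which would explain why grpcurl worked and the querier got a text/plain response. The flag applies to every store this querier connects to, which is why keeping a separate querier for the external stores is convenient.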