Internal Server Error accessing container though fine when connecting through port

I'm trying to upgrade my Traefik config from v1 to v2. I have gotten everything working except GitLab Omnibus. Traefik recognizes the route, but when I connect through Traefik I get an internal server error, while connecting directly to the published Docker port loads the site just fine.

I assume it is related to my TLS config, as all the other, non-HTTPS containers are fine.

Anyone notice what I am missing?

Traefik compose file:

networks:
  default:
    external:
      name: gateway
services:
  traefik:
    image: 'traefik:latest'
    container_name: traefik
    restart: always
    command: 
      --api.insecure=true 
      --providers.docker 
      --entryPoints.http.address=:80 
      --entryPoints.https.address=:443 
      --certificatesresolvers.letsencrypt.acme.tlschallenge=true
      --certificatesresolvers.letsencrypt.acme.email=email@gmail.com
      --certificatesresolvers.letsencrypt.acme.storage=/etc/traefik/acme.json
      --log
      --log.filepath=/etc/traefik/traefik.log
      --log.level="DEBUG"
      --accesslog
      --accesslog.filepath=/etc/traefik/access.log
    ports:
      - '80:80'
      - '443:443'
    volumes:
      - '/var/run/docker.sock:/var/run/docker.sock:ro'
      - '${DWD}/traefik:/etc/traefik/'
    labels:
      traefik.enable: true
      traefik.http.routers.traefik.rule: Host(`traefik.domain`)
      traefik.http.routers.traefik.entrypoints: http
      traefik.http.services.traefik.loadbalancer.server.port: 8080

gitlab compose file:

version: '3.6'
networks:
  default:
    external:
      name: gateway # docker network create gateway
services:
  gitlab:
    image: gitlab/gitlab-ce:latest
    container_name: gitlab
    restart: always
    hostname: git.domain.com # change to access domain
    environment:
      GITLAB_OMNIBUS_CONFIG: |
        external_url 'git.domain'
        gitlab_rails['gitlab_shell_ssh_port'] = 2200
    ports:
      - '2200:22'
      - '8088:443'
    volumes:
      - ${DWD}/gitlab/config:/etc/gitlab
      - ${DWD}/gitlab/logs:/var/log/gitlab
      - ${DWD}/gitlab/data:/var/opt/gitlab
    labels:
      traefik.enable: true
      traefik.http.routers.gitlab.rule: Host(`git.domain`)
      traefik.http.routers.gitlab.entrypoints: https
      traefik.http.routers.gitlab.tls: true
      traefik.http.routers.gitlab.tls.certresolver: letsencrypt
      traefik.http.routers.gitlab.tls.domains[0].main: domain
      traefik.http.services.gitlab.loadbalancer.server.scheme: https
      traefik.http.services.gitlab.loadbalancer.server.port: 443

Why are you mounting /etc/traefik/? If there is a configuration file in ${DWD}/traefik, then you are mixing static configuration sources (a file plus the command-line flags), which you should not do.

There is no config file in the directory. I mapped it strictly to make access to the log files easier. I will map the log files directly, but I don't think it will make a difference since no config file is present.

I changed the volumes and the result did not change:

      - '/var/run/docker.sock:/var/run/docker.sock:ro'
      - '${DWD}/traefik/traefik.log:/etc/traefik/traefik.log'
      - '${DWD}/traefik/access.log:/etc/traefik/access.log'

For some reason, Traefik v2 itself seems to be producing the error when the request goes through it, since I can reach the container directly without issue.

If I set up the GitLab container to use port 80 without TLS, it works fine, so it is definitely related to the SSL cert.

Do you know if the error is produced by Traefik or by the target server? What is in the logs?

Some software does not like being behind a proxy and requires special setup. I understand that you had it working with v1, but apparently something is different in the way the request travels between hops, and that breaks it. We need to identify what it is and fix it.

I'd start by finding out whether the target server receives the request at all, or whether the error is returned by Traefik.

Since you are calling GitLab via HTTPS, maybe it presents a certificate that cannot be verified.

It's definitely coming from Traefik as I can access the application via the container directly.

It also works fine when TLS is not enabled in the labels, so I think it is related to that.

Are my settings set up correctly for TLS? I do see in the logs that the cert cannot be generated.

time="2019-09-26T13:00:50Z" level=debug msg="Domains [\"git.domain\"] need ACME certificates generation for domains \"git.domain\"." providerName=letsencrypt.acme
time="2019-09-26T13:00:50Z" level=debug msg="Loading ACME certificates [git.domain]..." providerName=letsencrypt.acme
time="2019-09-26T13:00:50Z" level=debug msg="legolog: [INFO] nonce error retry: acme: error: 400 :: POST :: https://acme-v02.api.letsencrypt.org/acme/new-order :: urn:ietf:params:acme:error:badNonce :: JWS has an invalid anti-replay nonce: \"0001kGOwh9SC-OrT-uVUYU107cdGw3p5RHn-rfeRHFvum2Y\", url: "
time="2019-09-26T13:00:51Z" level=debug msg="legolog: [INFO] [git.domain] AuthURL: https://acme-v02.api.letsencrypt.org/acme/authz-v3/512137492"
time="2019-09-26T13:01:03Z" level=debug msg="legolog: [INFO] Unable to deactivate the authorization: https://acme-v02.api.letsencrypt.org/acme/authz-v3/512137492"
time="2019-09-26T13:01:03Z" level=error msg="Unable to obtain ACME certificate for domains \"git.domain\" : unable to generate a certificate for the domains [git.domain]: acme: Error -> One or more domains had a problem:\n[git.domain] acme: error: 400 :: urn:ietf:params:acme:error:connection :: Timeout during connect (likely firewall problem), url: \n" providerName=letsencrypt.acme
time="2019-09-26T13:03:24Z" level=debug msg="Provider event received {Status:die ID:e633164836c1648bc91212e8c85391f619536d689cb5e423ce3c96aedbed0c1b From:gitlab/gitlab-ce:latest Type:container Action:die Actor:{ID:e633164836c1648bc91212e8c85391f619536d689cb5e423ce3c96aedbed0c1b Attributes:map[com.docker.compose.config-hash:b6531a6ae6afd878e0441104e29b3fdd5dd2dc01b0935623af341ff3f43452c8 com.docker.compose.container-number:1 com.docker.compose.oneoff:False com.docker.compose.project:gitlab com.docker.compose.service:gitlab com.docker.compose.version:1.22.0 exitCode:137 image:gitlab/

Just for future reference: the fact that you can access the application directly does not have to mean that the error comes from Traefik. A proxy such as Traefik can affect which headers are passed. I've seen cases where the server replied fine without Traefik but returned a server error when the request came through Traefik, because of changes to the request caused by a Traefik misconfiguration.

Your logs show errors in generating the certificate, but it looks like a transient error. Also, that is a very small portion of the log. Is there more?

If you have issues both with ACME and with reaching the service, I'd suggest debugging them separately. Usually ACME errors do not affect the back end much (apart from broken HTTPS on the front), so you can ignore them while you are debugging your service. If you feel distracted by these errors, switch ACME off temporarily and see if you can get Traefik to talk to the service back end without it.
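
A minimal sketch of switching ACME off for this one router, based on the labels posted above: the certresolver and domains labels are dropped while TLS stays enabled, so Traefik should fall back to serving its default self-signed certificate and the browser will show a warning. The label values are quoted here as a precaution; everything else is copied from the original compose file.

```yaml
services:
  gitlab:
    labels:
      traefik.enable: 'true'
      traefik.http.routers.gitlab.rule: Host(`git.domain`)
      traefik.http.routers.gitlab.entrypoints: https
      # TLS stays on, but with no certresolver there is no ACME request for this router
      traefik.http.routers.gitlab.tls: 'true'
      # Back end is still reached over HTTPS on the container's port 443
      traefik.http.services.gitlab.loadbalancer.server.scheme: https
      traefik.http.services.gitlab.loadbalancer.server.port: 443
```

If the internal server error persists with this setup, that points at the hop between Traefik and GitLab rather than at the Let's Encrypt failure.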

As I said in the previous message, one possible reason is that Traefik cannot validate the cert that is supplied by the service. This should also produce an error in the log, but you are not showing that, so I don't know if that's happening. Where does the cert on the service come from, by the way?
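
One way to test that theory, as a sketch only: Traefik v2 has a static option, serversTransport.insecureSkipVerify, that disables verification of back-end certificates. The assumption here (not confirmed anywhere in this thread) is that GitLab Omnibus is serving a self-signed or otherwise untrusted certificate on port 443. The command is written as a YAML list, which is a stylistic choice and not what the original file uses.

```yaml
services:
  traefik:
    image: 'traefik:latest'
    command:
      - --providers.docker
      - --entryPoints.http.address=:80
      - --entryPoints.https.address=:443
      - --log.level=DEBUG
      # Assumption: the back-end certificate cannot be verified by Traefik.
      # This disables verification for ALL back ends, so treat it as a diagnostic.
      - --serversTransport.insecureSkipVerify=true
```

If the error disappears with this flag, the back-end certificate is the culprit, and the cleaner long-term fix is either to have Traefik trust that certificate's CA or to let GitLab serve plain HTTP behind the proxy.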

I have a similar issue. I have a computer external to the cluster (same network, but not a Kubernetes node) that is running GitLab (not in a container; installed the old-fashioned way, directly on the drive).

I have the following set up to create a service that connects to it and makes it available in the cluster and then an ingress that references that service to make it accessible publicly:


apiVersion: v1
kind: Secret
metadata:
  name: gitlab-tls
data:
  # generated with:
  #    sudo cat /etc/gitlab/ssl/gitlab.example.com.crt | base64 | tr -d '\n'
  #    sudo cat /etc/gitlab/ssl/gitlab.example.com.key | base64 | tr -d '\n'
  tls.crt: <...>
  tls.key: <...>
type: kubernetes.io/tls

---
kind: Service
apiVersion: v1
metadata:
  name: gitlab
spec:
  type: ClusterIP
  ports:
    - name: http
      protocol: TCP
      port: 80
      targetPort: 80
    - name: https
      protocol: TCP
      port: 443
      targetPort: 443
---
kind: Endpoints
apiVersion: v1
metadata:
  name: gitlab
subsets:
- addresses:
   # this is the local ip in my network accessible from the other nodes
  - ip: 192.168.1.32
  ports:
  - port: 80
    name: http
  - port: 443
    name: https

---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: traefik
  name: gitlab
spec:
  tls:
  - hosts:
      # mine isn't example.com 
    - gitlab.example.com
    secretName: gitlab-tls
  rules:
  # again mine isn't example.com
  - host: gitlab.example.com
    http:
      paths:
      - path: /
        backend:
         serviceName: gitlab
         servicePort: http
      - path: /
        backend:
          serviceName: gitlab
          servicePort: https

After all this, my browser reports that the certificate it receives is valid, but it still gets an Internal Server Error (500). I can confirm that GitLab is up, though. It's a fresh install, and if I access it directly in my network, or if I point to it via DMZ, it has no issue.

Anyone else see what it could be?

Enable debug logging and look at the logs.

It appears that you are using an Ingress object instead of an IngressRoute. I don't think it supports the TLS options the way you are specifying them.

Also, if your GitLab is on the same network, why are you using Traefik? That is, what are you trying to achieve? The direct connection should work.

If it must be Traefik, I would not use your Kubernetes ingress for that. The purpose of an Ingress is to facilitate traffic from outside into the cluster, and that is not your case. You have the opposite direction, from the cluster to the outside. Set up a separate instance of Traefik and use that if you must.
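
For comparison, a rough sketch of what an IngressRoute for the same host might look like. It assumes the Traefik v2 CRDs are installed and the Kubernetes CRD provider is enabled; the entry point name `websecure` is hypothetical and must match whatever your static configuration defines, and the apiVersion shown is the one used by early v2 releases. The service and secret names are taken from your manifests.

```yaml
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: gitlab
spec:
  entryPoints:
    - websecure            # hypothetical entry point name
  routes:
    - match: Host(`gitlab.example.com`)
      kind: Rule
      services:
        - name: gitlab     # the Service defined above
          port: 443
  tls:
    secretName: gitlab-tls # the Secret defined above
```

Note that this only changes how the route and the front-end certificate are declared. Traefik still has to accept the certificate GitLab presents on port 443.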

Many thanks @zespri for your comment. I'll turn on the logging and look more into Ingress objects vs. IngressRoutes (I had not heard of IngressRoutes).

I have other services I wish to make available in my cluster via subdomains. Instead of dealing with that routing elsewhere, I'm trying to keep it centralized in the Kubernetes ingress (as my layer 7 load balancer), a bit like this:

                         ┌───────k8s cluster───────┐
                L4 LB    │ L7 LB(traefik)          │
                  │      │ │                       │
                  │      │ │                       │
┌─internet─┐      │      │ ├──────▶┌───┐           │
│          │──────▶      │ ├─────▶ │ ┌─┴─┐         │
└──────────┘      │      │ ├───────▶other svcs     │
                  │      │ │         └─┤   │       │
                  ├──────┼─▶           └───┘       │
                  │      │ │                       │
                  │      │ │   ┌─gitlab svc─┐      │
                  │      │ ├──▶│            │      │
                  │      │ │   └────────────┘      │
                  │      │ │          │            │
                         └────────────┼────────────┘
                                      │             
                                      │  ┌─gitlab─┐ 
                                      └─▶│        │ 
                                         └────────┘ 

That way I can access my cluster from outside my local network (remotely when I travel, and so others can collaborate).

I think TLS is being handled correctly, since my browser reports the correct TLS certificate. I could be reading this wrong, though. Possibly the fact that the certificate is being passed through doesn't mean that it can be read properly? I'll report back here when I have more details. Thanks again!

I forgot to mention this is all OnPrem - this is not hosted in GCP, Azure, AWS, etc.