Request duration discrepancy

Hi,

We run Traefik inside Docker Swarm on a single server. Traefik proxies requests to around 30 services. We run Traefik with:

      - --providers.docker=true
      - --providers.docker.swarmMode=true

We have noticed that roughly every 5 seconds one request takes considerably longer than it should: well over 10 s, even for small files.
All services seem to be affected in the same way. For these requests, the backend always reports a much shorter response time than Traefik does.
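One way to confirm the "every 5 seconds" pattern is to pull the slow entries out of Traefik's JSON access log and look at the gaps between them. A minimal sketch, assuming the access log is written as one JSON object per line (like the Traefik entry quoted below) and that `Duration` is in nanoseconds; the 5 s threshold and the function names are hypothetical choices:

```python
import json
from datetime import datetime

# Hypothetical threshold: 5 s, expressed in Traefik's nanosecond units.
SLOW_NS = 5_000_000_000


def parse_start(value: str) -> datetime:
    """Parse Traefik's StartUTC (up to 9 fractional digits) into a naive UTC datetime."""
    stamp, _, frac = value.rstrip("Z").partition(".")
    frac = (frac + "000000")[:6]  # datetime only keeps microseconds
    return datetime.strptime(f"{stamp}.{frac}", "%Y-%m-%dT%H:%M:%S.%f")


def slow_requests(lines, threshold_ns=SLOW_NS):
    """Yield (start, path, seconds) for access-log entries at or above the threshold."""
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON
        if not isinstance(entry, dict):
            continue
        duration = entry.get("Duration")
        if isinstance(duration, int) and duration >= threshold_ns:
            yield parse_start(entry["StartUTC"]), entry.get("RequestPath"), duration / 1e9


def gaps_between(starts):
    """Seconds between consecutive start times, after sorting them."""
    ordered = sorted(starts)
    return [(b - a).total_seconds() for a, b in zip(ordered, ordered[1:])]
```

Feed `slow_requests` the lines of the access-log file and pass the start times to `gaps_between`; gaps clustering around 5 s would confirm the pattern, and the paths show whether it really hits every service.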

{
  "unix": 1706720348,
  "source": "uwsgi-request",
  "pid": "461",
  "addr": "0.0.0.0",
  "ClientHost": "12.34.56.78",
  "CorrelationId": "018d6075-6b37-7a29-8680-ff17e8a011fb",
  "vars": "64",
  "pktsize": "2780",
  "ctime": "Wed Jan 31 16:59:08 2024",
  "method": "GET",
  "RequestPath": "/myaccount/campaigns",
  "rsize": "10615",
  "msecs": "92",
  "proto": "HTTP/1.1",
  "status": "200",
  "headers": "4",
  "hsize": "110",
  "switches": "2",
  "core": "19"
}

{
  "ClientAddr": "12.34.56.78:56884",
  "ClientHost": "12.34.56.78",
  "ClientPort": "56884",
  "ClientUsername": "-",
  "DownstreamContentSize": 10615,
  "DownstreamStatus": 200,
  "Duration": 24257415947,
  "OriginContentSize": 10615,
  "OriginDuration": 24256876820,
  "OriginStatus": 200,
  "Overhead": 539127,
  "RequestAddr": "www.mydomain.co.uk",
  "RequestContentSize": 0,
  "RequestCount": 146988,
  "RequestHost": "www.mydomain.co.uk",
  "RequestMethod": "GET",
  "RequestPath": "/myaccount/campaigns",
  "RequestPort": "-",
  "RequestProtocol": "HTTP/2.0",
  "RequestScheme": "https",
  "RetryAttempts": 0,
  "RouterName": "mydomain-www@docker",
  "ServiceAddr": "10.0.7.33:9000",
  "ServiceName": "mydomain-www@docker",
  "ServiceURL": {
    "Scheme": "http",
    "Opaque": "",
    "User": null,
    "Host": "10.0.7.33:9000",
    "Path": "",
    "RawPath": "",
    "OmitHost": false,
    "ForceQuery": false,
    "RawQuery": "",
    "Fragment": "",
    "RawFragment": ""
  },
  "StartLocal": "2024-01-31T16:58:44.40699531Z",
  "StartUTC": "2024-01-31T16:58:44.40699531Z",
  "TLSCipher": "TLS_AES_128_GCM_SHA256",
  "TLSVersion": "1.3",
  "entryPointName": "web-secured",
  "level": "info",
  "msg": "",
  "request_X-Correlation-Id": "018d6075-6b37-7a29-8680-ff17e8a011fb",
  "time": "2024-01-31T16:59:08Z"
}

Here uWSGI reports that the request took 92 ms, whereas Traefik reports 24 seconds. How is this possible?
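The two records can be lined up with a little arithmetic. Traefik's access log reports `Duration`, `OriginDuration` and `Overhead` in nanoseconds, and here `OriginDuration + Overhead` equals `Duration` exactly (24,256,876,820 + 539,127 = 24,257,415,947), so Traefik itself added only about 0.5 ms; the ~24.26 s was spent waiting on the backend side. The start times are telling too: Traefik started the request at 16:58:44.407 UTC, while uWSGI logged it at 16:59:08 (`unix` 1706720348) and says it took only 92 ms. Assuming both run on the same clock and uWSGI's `msecs` counts from when uWSGI itself started on the request, the request spent roughly 23.6 s somewhere between Traefik and uWSGI before uWSGI began handling it. A sketch of the calculation:

```python
import json
from datetime import datetime, timedelta, timezone

# Values copied from the two log records above (only the fields used here).
UWSGI_LINE = '{"unix": 1706720348, "msecs": "92"}'
TRAEFIK_LINE = (
    '{"Duration": 24257415947, "OriginDuration": 24256876820, '
    '"Overhead": 539127, "StartUTC": "2024-01-31T16:58:44.40699531Z"}'
)
NS_PER_S = 1_000_000_000


def parse_start_utc(value: str) -> datetime:
    # Traefik logs nanoseconds; datetime holds microseconds, so keep 6 digits.
    stamp, _, frac = value.rstrip("Z").partition(".")
    frac = (frac + "000000")[:6]
    return datetime.strptime(f"{stamp}.{frac}", "%Y-%m-%dT%H:%M:%S.%f").replace(
        tzinfo=timezone.utc
    )


def compare(uwsgi_json: str, traefik_json: str) -> dict:
    u, t = json.loads(uwsgi_json), json.loads(traefik_json)
    start = parse_start_utc(t["StartUTC"])
    uwsgi_logged = datetime.fromtimestamp(u["unix"], tz=timezone.utc)
    return {
        "traefik_total_s": t["Duration"] / NS_PER_S,
        "waiting_on_origin_s": t["OriginDuration"] / NS_PER_S,
        "traefik_overhead_ms": t["Overhead"] / 1_000_000,
        # Traefik's total should be origin time plus its own overhead.
        "parts_add_up": t["OriginDuration"] + t["Overhead"] == t["Duration"],
        "traefik_end": (start + timedelta(microseconds=t["Duration"] // 1000)).isoformat(),
        "uwsgi_ms": int(u["msecs"]),
        # Gap between Traefik starting the request and uWSGI's (1 s resolution) timestamp.
        "traefik_start_to_uwsgi_log_s": (uwsgi_logged - start).total_seconds(),
    }


print(compare(UWSGI_LINE, TRAEFIK_LINE))
```

The computed Traefik end time lands at 16:59:08, matching both the Traefik `time` field and the uWSGI timestamp, which is consistent with the request being delayed before uWSGI started on it rather than during processing.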

Looking at Traefik's debug-level output, we also see a comparable number of lines like this one:

{
  "level": "debug",
  "middlewareName": "traefik-internal-recovery",
  "middlewareType": "Recovery",
  "msg": "Request has been aborted [23.24.25.26:50344 - /common.8971645234.css]: net/http: abort Handler",
  "time": "2024-01-31T13:30:28Z"
}
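As far as I know, `net/http: abort Handler` is the text of Go's `http.ErrAbortHandler`, which Go's reverse proxy uses to abandon a response it can no longer write, typically because the client went away; Traefik's Recovery middleware then logs it at debug level. If clients are giving up on the slow requests, these lines should line up in time with them. A sketch for extracting them, assuming the debug log is one JSON object per line; the regex is based only on the single message quoted above, so treat its format as an assumption:

```python
import json
import re
from collections import Counter

# Based on the one message seen above:
# "Request has been aborted [IP:PORT - /path]: net/http: abort Handler"
ABORT_RE = re.compile(r"Request has been aborted \[(?P<client>[^\s\]]+) - (?P<path>[^\]]*)\]")


def aborted_requests(lines):
    """Yield (time, client, path) for each Recovery 'aborted' debug entry."""
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON lines
        if not isinstance(entry, dict):
            continue
        match = ABORT_RE.search(entry.get("msg") or "")
        if match:
            yield entry.get("time"), match["client"], match["path"]


def per_minute(events):
    """Count aborted requests per minute bucket ('YYYY-MM-DDTHH:MM')."""
    return Counter(t[:16] for t, _, _ in events if t)
```

Comparing these per-minute counts (or the individual timestamps and client addresses) with the slow entries in the access log would show whether the aborts are a symptom of the slow requests or something independent.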

Could this be related?

Does anyone have ideas on how to debug this issue and whether those log lines are something to worry about?
Thanks

Is your network stable, with no issues with VLANs or oversized MTUs? Are the target services stable?

Hi,

Thanks for the reply.

Everything is running on one host computer inside one "proxy" VLAN.

I have checked the MTUs of my interfaces and they are all set to 1500, including the Docker ones.
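For a quick MTU check, Linux exposes each interface's MTU in sysfs (`ip link` shows the same values). A small sketch that reads them; each network namespace has its own set of interfaces, so it needs to be run on the host and again in the containers' namespaces to see both sides. The function name is hypothetical:

```python
from pathlib import Path


def interface_mtus(root: str = "/sys/class/net") -> dict:
    """Map interface name -> MTU by reading Linux sysfs (one 'mtu' file per interface)."""
    mtus = {}
    for iface in sorted(Path(root).iterdir()):
        mtu_file = iface / "mtu"
        if mtu_file.is_file():  # skip entries such as 'bonding_masters' that are not interfaces
            mtus[iface.name] = int(mtu_file.read_text().strip())
    return mtus
```

Calling `interface_mtus()` on a Linux host returns something like a mapping from `eth0`, `docker0`, `lo` and so on to their MTUs.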

The services are stable.

It certainly "feels" like a networking issue because of the slow responses, but I do not know how to diagnose such a problem.

Any further help is much appreciated.

If you want to get your hands dirty, you could check [HOWTO] Capture the communication of and inside a Docker container using Wireshark with Edgeshark plugin

That looks amazing, but I can't get it to run. The UI complains about missing overlays in JavaScript land.
I can run tcpdump inside the network, but there is a lot of traffic and I'm not sure what I am looking for!
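To cut down the noise, the capture can be restricted to the single backend from a slow request, using the `ServiceAddr` from its access-log entry (`10.0.7.33:9000` above). A sketch that only assembles a tcpdump command and does not run anything; the `any` interface and the `backend.pcap` file name are hypothetical defaults, and with Swarm overlay networks the capture probably has to run inside the Traefik container's network namespace to see this traffic at all:

```python
import shlex


def tcpdump_command(service_addr: str, interface: str = "any", outfile: str = "backend.pcap") -> list:
    """Build a tcpdump invocation limited to one backend, e.g. ServiceAddr '10.0.7.33:9000'."""
    host, _, port = service_addr.rpartition(":")
    if not host or not port.isdigit():
        raise ValueError(f"expected HOST:PORT, got {service_addr!r}")
    bpf = f"host {host} and tcp port {port}"  # capture filter: only this backend's TCP traffic
    # -nn: no name/port resolution; -w: write raw packets for later analysis in Wireshark.
    return ["tcpdump", "-i", interface, "-nn", "-w", outfile, bpf]


print(shlex.join(tcpdump_command("10.0.7.33:9000")))
```

Opening the resulting file in Wireshark around the `StartUTC` of a slow request should show whether the request leaves Traefik promptly and how long the backend takes to accept and answer it; a display filter such as `tcp.analysis.retransmission` can highlight retransmissions.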