Request duration discrepancy

dunkfordyce · January 31, 2024, 5:30pm

Hi,

We run Traefik inside Docker Swarm on a single server. Traefik proxies requests to around 30 services. We run Traefik with:

      - --providers.docker=true
      - --providers.docker.swarmMode=true

We have noticed that roughly every 5 seconds we get a request that takes considerably longer than it should, well over 10 s even for small files.
All services seem to be affected in the same way. For these requests, the response time reported by the backend is always much shorter than the one reported by Traefik.
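To check how regular the slow requests really are, a small script can pull them out of Traefik's access log and print the gap between consecutive ones. This is a minimal sketch, assuming the access log is written as JSON with one object per line (the entry below is pretty-printed for readability) and relying only on the `Duration` (nanoseconds), `StartUTC` and `RequestPath` fields seen there. The helper names and the 1-second threshold are arbitrary choices for this example.

```python
import json
from datetime import datetime

# Arbitrary threshold for "slow": 1 second, in nanoseconds like Traefik's Duration field.
SLOW_NS = 1_000_000_000


def parse_start(ts: str) -> datetime:
    """Parse Traefik's StartUTC, e.g. "2024-01-31T16:58:44.40699531Z".

    datetime only handles microseconds, so the fractional part is cut to 6 digits.
    """
    ts = ts.rstrip("Z")
    if "." in ts:
        head, frac = ts.split(".", 1)
        ts = f"{head}.{frac[:6].ljust(6, '0')}"
    return datetime.fromisoformat(ts)


def slow_request_gaps(lines, threshold_ns=SLOW_NS):
    """Return (start, path, duration_s, gap_s) for every request at or above threshold_ns.

    gap_s is the time since the previous slow request started (None for the first one).
    Expects one JSON access log entry per line; blank lines are skipped.
    """
    slow = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        if entry.get("Duration", 0) >= threshold_ns:
            slow.append((parse_start(entry["StartUTC"]),
                         entry.get("RequestPath", "?"),
                         entry["Duration"] / 1e9))
    slow.sort()  # order by start time

    result = []
    previous = None
    for start, path, duration_s in slow:
        gap_s = (start - previous).total_seconds() if previous is not None else None
        result.append((start, path, duration_s, gap_s))
        previous = start
    return result
```

For example, `slow_request_gaps(open("access.log"))` (the file name is hypothetical) returns one tuple per slow request; if the gaps cluster around 5 s, that confirms the pattern and gives exact timestamps to look for in other logs.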

{
  "unix": 1706720348,
  "source": "uwsgi-request",
  "pid": "461",
  "addr": "0.0.0.0",
  "ClientHost": "12.34.56.78",
  "CorrelationId": "018d6075-6b37-7a29-8680-ff17e8a011fb",
  "vars": "64",
  "pktsize": "2780",
  "ctime": "Wed Jan 31 16:59:08 2024",
  "method": "GET",
  "RequestPath": "/myaccount/campaigns",
  "rsize": "10615",
  "msecs": "92",
  "proto": "HTTP/1.1",
  "status": "200",
  "headers": "4",
  "hsize": "110",
  "switches": "2",
  "core": "19"
}

{
  "ClientAddr": "12.34.56.78:56884",
  "ClientHost": "12.34.56.78",
  "ClientPort": "56884",
  "ClientUsername": "-",
  "DownstreamContentSize": 10615,
  "DownstreamStatus": 200,
  "Duration": 24257415947,
  "OriginContentSize": 10615,
  "OriginDuration": 24256876820,
  "OriginStatus": 200,
  "Overhead": 539127,
  "RequestAddr": "www.mydomain.co.uk",
  "RequestContentSize": 0,
  "RequestCount": 146988,
  "RequestHost": "www.mydomain.co.uk",
  "RequestMethod": "GET",
  "RequestPath": "/myaccount/campaigns",
  "RequestPort": "-",
  "RequestProtocol": "HTTP/2.0",
  "RequestScheme": "https",
  "RetryAttempts": 0,
  "RouterName": "mydomain-www@docker",
  "ServiceAddr": "10.0.7.33:9000",
  "ServiceName": "mydomain-www@docker",
  "ServiceURL": {
    "Scheme": "http",
    "Opaque": "",
    "User": null,
    "Host": "10.0.7.33:9000",
    "Path": "",
    "RawPath": "",
    "OmitHost": false,
    "ForceQuery": false,
    "RawQuery": "",
    "Fragment": "",
    "RawFragment": ""
  },
  "StartLocal": "2024-01-31T16:58:44.40699531Z",
  "StartUTC": "2024-01-31T16:58:44.40699531Z",
  "TLSCipher": "TLS_AES_128_GCM_SHA256",
  "TLSVersion": "1.3",
  "entryPointName": "web-secured",
  "level": "info",
  "msg": "",
  "request_X-Correlation-Id": "018d6075-6b37-7a29-8680-ff17e8a011fb",
  "time": "2024-01-31T16:59:08Z"
}

Here uwsgi reports that the request took 92 ms, whereas Traefik reports 24 seconds. How is this possible?
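Traefik's access log reports `Duration`, `OriginDuration` and `Overhead` in nanoseconds, while uwsgi's `msecs` is in milliseconds, so the two entries can be put side by side with a little arithmetic. The sketch below uses only the values from the two entries above; the `summarize` helper is hypothetical.

```python
from datetime import datetime, timedelta

NS_PER_MS = 1_000_000

# Values copied from the Traefik access log entry above (all durations in nanoseconds).
traefik_entry = {
    "Duration": 24257415947,
    "OriginDuration": 24256876820,
    "Overhead": 539127,
    "StartUTC": "2024-01-31T16:58:44.40699531Z",
}
# "msecs" from the uwsgi entry above (milliseconds).
uwsgi_msecs = 92


def summarize(entry, backend_ms):
    """Convert Traefik's nanosecond fields to milliseconds and estimate when the request ended."""
    total_ms = entry["Duration"] / NS_PER_MS
    origin_ms = entry["OriginDuration"] / NS_PER_MS
    overhead_ms = entry["Overhead"] / NS_PER_MS
    # Time Traefik spent waiting on the backend that uwsgi's own timer does not cover.
    unaccounted_ms = origin_ms - backend_ms
    # Keep "YYYY-MM-DDTHH:MM:SS.ffffff" (26 chars); datetime has no nanosecond precision.
    start = datetime.fromisoformat(entry["StartUTC"][:26])
    end = start + timedelta(milliseconds=total_ms)
    return total_ms, origin_ms, overhead_ms, unaccounted_ms, end


total_ms, origin_ms, overhead_ms, unaccounted_ms, end = summarize(traefik_entry, uwsgi_msecs)
print(f"Traefik total:    {total_ms:10.1f} ms")
print(f"Traefik overhead: {overhead_ms:10.3f} ms")
print(f"uwsgi (backend):  {uwsgi_msecs:10.1f} ms")
print(f"Unaccounted:      {unaccounted_ms:10.1f} ms")
print(f"Request ended at: {end.isoformat()} UTC")
```

`Duration` is exactly `OriginDuration + Overhead`, so Traefik attributes only about 0.5 ms to itself and the remaining ~24.2 s to waiting on the backend. The computed end time (about 16:59:08.66 UTC) lines up with the 16:59:08 timestamp in the uwsgi entry; if that timestamp marks when uwsgi handled the request, the delay happened before uwsgi started its own 92 ms timer (for example while connecting or queueing) rather than inside the application.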

Looking at Traefik's debug-level output, we also see a similar number of lines like this:

{
  "level": "debug",
  "middlewareName": "traefik-internal-recovery",
  "middlewareType": "Recovery",
  "msg": "Request has been aborted [23.24.25.26:50344 - /common.8971645234.css]: net/http: abort Handler",
  "time": "2024-01-31T13:30:28Z"
}

Could this be related to the slow requests?
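As far as I know, `net/http: abort Handler` is the text of Go's `http.ErrAbortHandler`, which Go's reverse proxy uses to abort a request when copying the response body to the client fails midway (for example because the client disconnected). To see whether these aborts line up in time with the slow requests, they can be counted per minute and per path. A minimal sketch, assuming the debug log is JSON with one object per line and a `msg` in exactly the format shown above; `count_aborts` is a hypothetical helper.

```python
import json
import re
from collections import Counter

# Matches messages like:
# "Request has been aborted [23.24.25.26:50344 - /common.8971645234.css]: net/http: abort Handler"
ABORT_RE = re.compile(
    r"Request has been aborted \[(?P<client>\S+) - (?P<path>[^\]]+)\]: (?P<err>.*)"
)


def count_aborts(lines):
    """Count aborted-request debug entries per minute and per request path.

    Assumes one JSON object per line with "msg" and "time" keys; other lines are ignored.
    """
    per_minute = Counter()
    per_path = Counter()
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue
        match = ABORT_RE.search(entry.get("msg", ""))
        if match is None:
            continue
        per_minute[entry.get("time", "")[:16]] += 1  # e.g. "2024-01-31T13:30"
        per_path[match.group("path")] += 1
    return per_minute, per_path
```

Comparing `per_minute` against the number of slow requests in the same minutes gives a rough first answer; a strong match would suggest the aborts are clients giving up on the slow requests rather than a separate problem.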

Does anyone have ideas on how to debug this issue and whether those log lines are something to worry about?
Thanks

bluepuma77 · January 31, 2024, 8:57pm

Is your network stable, with no issues from VLANs or too-large MTUs? Are the target services stable?

dunkfordyce · February 1, 2024, 9:54am

Hi,

Thanks for the reply.

Everything is running on one host computer inside one "proxy" VLAN.

I have checked the MTUs of my network interfaces, including the Docker ones, and they are all set to 1500.
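For reference, on Linux the MTU of each interface can also be read from sysfs, which makes a mismatch easy to spot. A minimal sketch, assuming a Linux host; it typically only lists the interfaces of the network namespace it runs in, so container interfaces have to be checked from inside the container. The expected value of 1500 and skipping `lo` are choices for this example, and the helper names are hypothetical.

```python
from pathlib import Path


def interface_mtus(sys_net="/sys/class/net"):
    """Return {interface_name: mtu} as reported by Linux sysfs."""
    mtus = {}
    for iface in sorted(Path(sys_net).iterdir()):
        mtu_file = iface / "mtu"
        if mtu_file.is_file():
            mtus[iface.name] = int(mtu_file.read_text().strip())
    return mtus


def mismatched(mtus, expected=1500, ignore=("lo",)):
    """Return the interfaces whose MTU differs from `expected` (loopback ignored by default)."""
    return {name: mtu for name, mtu in mtus.items() if name not in ignore and mtu != expected}
```

Running `print(mismatched(interface_mtus()))` on the host and inside a container prints an empty dict when every checked interface is at 1500.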

The services are stable.

It certainly "feels" like a networking issue because of the slow responses, but I do not know how to diagnose such a problem.

Any further help is much appreciated.

bluepuma77 · February 1, 2024, 10:39am

If you want to get your hands dirty, you could check out "[HOWTO] Capture the communication of and inside a Docker container using Wireshark with Edgeshark plugin".

dunkfordyce · February 1, 2024, 12:17pm

That looks amazing, but I can't get it to run. The UI complains about missing overlays in JavaScript land.
I can run tcpdump inside the network, but it's a lot of traffic and I'm not sure what I am looking for!

thediveo · April 12, 2024, 4:02pm

Edgeshark dev here, sorry for the inconvenience. This bug should be fixed; you may have narrowly missed the fix, which went in on February 1st. Could you please give it another try by redeploying to pull the most recent images, and then report success or failure?

Please note that for captures you'll need to install the Wireshark plugin from the siemens/cshargextcap v0.10.7 release on GitHub.

You can then start captures even directly from Wireshark without going through the web UI.
