I'm running a Mender server in Kubernetes with Traefik as the ingress controller. When I upload an artifact to the Mender server and the upload takes longer than one minute, the request fails with a 499 Client Closed Request error. Smaller artifacts that upload in under a minute work fine; only large artifacts that need more than a minute fail.
I saw in the documentation that overriding the response timeout argument should solve this error. I did that (readTimeout on the websecure entry point, see the args below), but I still get the same 499 error.
I have also checked the Mender server's configuration: there are no size or time restrictions on the server side that would explain this. The default upload timeout is 1 hour and the maximum upload size is 10 GB in the Mender configuration, so I think Traefik is causing the error.
Following are my Traefik deployment and Traefik ingress configuration:
Name:                   traefik
Namespace:              default
CreationTimestamp:      Wed, 23 Oct 2024 10:26:44 +0000
Labels:                 app.kubernetes.io/instance=traefik-default
                        app.kubernetes.io/managed-by=Helm
                        app.kubernetes.io/name=traefik
                        helm.sh/chart=traefik-32.1.1
Annotations:            deployment.kubernetes.io/revision: 3
                        meta.helm.sh/release-name: traefik
                        meta.helm.sh/release-namespace: default
Selector:               app.kubernetes.io/instance=traefik-default,app.kubernetes.io/name=traefik
Replicas:               1 desired | 1 updated | 1 total | 1 available | 0 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  0 max unavailable, 1 max surge
Pod Template:
  Labels:           app.kubernetes.io/instance=traefik-default
                    app.kubernetes.io/managed-by=Helm
                    app.kubernetes.io/name=traefik
                    helm.sh/chart=traefik-32.1.1
  Annotations:      prometheus.io/path: /metrics
                    prometheus.io/port: 9100
                    prometheus.io/scrape: true
  Service Account:  traefik
  Containers:
   traefik:
    Image:       docker.io/traefik:v3.1.6
    Ports:       9100/TCP, 9000/TCP, 8000/TCP, 8443/TCP
    Host Ports:  0/TCP, 0/TCP, 0/TCP, 0/TCP
    Args:
      --global.checknewversion
      --global.sendanonymoususage
      --entryPoints.metrics.address=:9100/tcp
      --entryPoints.traefik.address=:9000/tcp
      --entryPoints.web.address=:8000/tcp
      --entryPoints.websecure.address=:8443/tcp
      --api.dashboard=true
      --ping=true
      --metrics.prometheus=true
      --metrics.prometheus.entrypoint=metrics
      --providers.kubernetescrd
      --providers.kubernetescrd.allowEmptyServices=true
      --providers.kubernetesingress
      --providers.kubernetesingress.allowEmptyServices=true
      --entryPoints.websecure.http.tls=true
      --entryPoints.websecure.transport.respondingTimeouts.readTimeout=300
      --log.level=INFO
    Liveness:   http-get http://:9000/ping delay=2s timeout=2s period=10s #success=1 #failure=3
    Readiness:  http-get http://:9000/ping delay=2s timeout=2s period=10s #success=1 #failure=1
    Environment:
      POD_NAME:        (v1:metadata.name)
      POD_NAMESPACE:   (v1:metadata.namespace)
    Mounts:
      /data from data (rw)
      /tmp from tmp (rw)
  Volumes:
   data:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
   tmp:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  Node-Selectors:  <none>
  Tolerations:     <none>
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Available      True    MinimumReplicasAvailable
  Progressing    True    NewReplicaSetAvailable
OldReplicaSets:  traefik-6c56784b69 (0/0 replicas created), traefik-7858bc95cb (0/0 replicas created)
NewReplicaSet:   traefik-7d9bf4d54 (1/1 replicas created)
Events:          <none>
Name:             mender-ingress
Labels:           <none>
Namespace:        default
Address:
Ingress Class:    traefik
Default backend:  <default>
TLS:
  new-tls-secret terminates mender.scanomat.com
Rules:
  Host                 Path  Backends
  ----                 ----  --------
  mender.scanomat.com
                       /   mender-api-gateway:80 (10.42.0.172:9080)
Annotations:      cert-manager.io/issuer: letsencrypt
Events:           <none>
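As a sanity check on which entry points actually carry the timeout override, here is a minimal Python sketch that scans the entry-point args copied from the deployment above. The helper name read_timeouts is hypothetical. It only inspects the flag strings; it says nothing about how Traefik interprets the value, and it cannot tell whether another component in the request path applies its own timeout.

```python
# Entry-point related args copied from the Traefik deployment above.
ARGS = [
    "--entryPoints.metrics.address=:9100/tcp",
    "--entryPoints.traefik.address=:9000/tcp",
    "--entryPoints.web.address=:8000/tcp",
    "--entryPoints.websecure.address=:8443/tcp",
    "--entryPoints.websecure.http.tls=true",
    "--entryPoints.websecure.transport.respondingTimeouts.readTimeout=300",
]

# Option path (after the entry point name), compared case-insensitively.
READ_TIMEOUT_SUFFIX = "transport.respondingtimeouts.readtimeout"


def read_timeouts(args):
    """Map each entry point name to its readTimeout override, or None if unset.

    Hypothetical helper: it only looks at the flag text, nothing else.
    """
    result = {}
    for arg in args:
        key, _, value = arg.lstrip("-").partition("=")
        parts = key.split(".")
        if len(parts) < 3 or parts[0].lower() != "entrypoints":
            continue
        name = parts[1]
        result.setdefault(name, None)
        if ".".join(parts[2:]).lower() == READ_TIMEOUT_SUFFIX:
            result[name] = value
    return result


for name, value in read_timeouts(ARGS).items():
    print(f"{name}: readTimeout override = {value}")
```

With these args, websecure is the only entry point that has an explicit readTimeout; the web, traefik, and metrics entry points have none set.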
I got the following logs in mender-deployment when calling the Mender API to upload an artifact that takes more than one minute to upload:
time="2024-10-21T16:55:48Z" level=error msg="azblob PutObject: failed to upload object to blob: context canceled" caller="view.(*RESTView).RenderInternalError@view.go:72" request_id=66772fb8-f862-451a-83b7-046743424cc2 user_id=753afdfb-ee20-4fd3-985e-85c74fe4c56e
time="2024-10-21T16:55:48Z" level=info msg="500 59998118μs POST /api/management/v1/deployments/artifacts/generate HTTP/1.1 - Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36" byteswritten=78 caller="accesslog.(*AccessLogMiddleware).MiddlewareFunc.func1@middleware.go:82" method=POST path=/api/management/v1/deployments/artifacts/generate qs= request_id=66772fb8-f862-451a-83b7-046743424cc2 responsetime=59.998118647 status=500 ts="2024-10-21 16:54:48.518920248 +0000 UTC" type=http user_id=753afdfb-ee20-4fd3-985e-85c74fe4c56e
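To confirm that the request is cut off right at the one-minute mark, here is a minimal Python sketch that pulls the responsetime and status fields out of the access-log line above. The line is shortened to the fields used here, and parse_fields is a hypothetical helper written for this log format, not part of Mender.

```python
import re

# Access-log line from above, shortened to the fields used here.
LOG_LINE = (
    'time="2024-10-21T16:55:48Z" level=info '
    'msg="500 59998118μs POST /api/management/v1/deployments/artifacts/generate HTTP/1.1" '
    'responsetime=59.998118647 status=500 '
    'ts="2024-10-21 16:54:48.518920248 +0000 UTC" type=http'
)


def parse_fields(line):
    """Extract key=value and key="quoted value" pairs from a logfmt-style line."""
    pairs = re.findall(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)', line)
    return {key: value.strip('"') for key, value in pairs}


fields = parse_fields(LOG_LINE)
elapsed = float(fields["responsetime"])  # seconds, as logged by the service

print(f"status={fields['status']} elapsed={elapsed:.3f}s")
print("within 0.1 s of 60 s:", abs(elapsed - 60.0) < 0.1)
```

The logged response time is just under 60 seconds, which matches the gap between the ts field (16:54:48) and the log timestamp (16:55:48).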
I tried uploading with mender-cli as well and got the same 499 error, as shown below:
67.22 MiB / 535.74 MiB [------------>] 12.55% 1.11 MiB p/s
VERBOSE response: HTTP/1.1 499 status code 499
Connection: close
Content-Length: 21
Date: Wed, 23 Oct 2024 12:14:13 GMT
Referrer-Policy: no-referrer
Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
Vary: Accept-Encoding
X-Content-Type-Options: nosniff
X-Xss-Protection: 1; mode=block
Client Closed Request
FAILURE: artifact upload to 'mender.scanomat.com' failed with status 499
ERROR: exit status: 1
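The progress bar figures point at the same cut-off. Here is a rough Python calculation, assuming the 1.11 MiB/s shown by mender-cli is the average rate since the upload started (that is an assumption about how the progress bar computes its rate):

```python
# Figures read off the mender-cli progress bar above.
transferred_mib = 67.22
total_mib = 535.74
rate_mib_per_s = 1.11  # assumed to be the average rate since the start

# Time elapsed when the 499 arrived, and time a complete upload would need.
elapsed_s = transferred_mib / rate_mib_per_s
full_upload_s = total_mib / rate_mib_per_s

print(f"elapsed before the 499: ~{elapsed_s:.0f} s")
print(f"full upload at this rate: ~{full_upload_s:.0f} s (~{full_upload_s / 60:.0f} min)")
```

Under that assumption the connection was dropped roughly 60 seconds in, and a full upload at this rate would need about 8 minutes, which is longer than the 300 set for readTimeout if that value is interpreted as seconds.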
When I used curl to upload the artifact, I got the following response:
curl -v -X POST \
  https://mender.scanomat.com/api/management/v1/deployments/artifacts \
  -H "Authorization: Bearer ..." \
  -F "artifact=@boss-imx8mm-var-dart-0.0.0-dev.mender"
Note: Unnecessary use of -X or --request, POST is already inferred.
* Host mender.scanomat.com:443 was resolved.
* IPv6: (none)
* IPv4: 13.69.133.251
*   Trying 13.69.133.251:443...
* Connected to mender.scanomat.com (13.69.133.251) port 443
* ALPN: curl offers h2,http/1.1
* (304) (OUT), TLS handshake, Client hello (1):
*  CAfile: /etc/ssl/cert.pem
*  CApath: none
* (304) (IN), TLS handshake, Server hello (2):
* (304) (IN), TLS handshake, Unknown (8):
* (304) (IN), TLS handshake, Certificate (11):
* (304) (IN), TLS handshake, CERT verify (15):
* (304) (IN), TLS handshake, Finished (20):
* (304) (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / AEAD-CHACHA20-POLY1305-SHA256 / [blank] / UNDEF
* ALPN: server accepted h2
* Server certificate:
*  subject: CN=mender.scanomat.com
*  start date: May 17 06:19:53 2024 GMT
*  expire date: Jun 18 06:19:53 2025 GMT
*  subjectAltName: host "mender.scanomat.com" matched cert's "mender.scanomat.com"
*  issuer: C=US; ST=Arizona; L=Scottsdale; O=GoDaddy.com, Inc.; OU=http://certs.godaddy.com/repository/; CN=Go Daddy Secure Certificate Authority - G2
*  SSL certificate verify ok.
* using HTTP/2
* [HTTP/2] [1] OPENED stream for https://mender.scanomat.com/api/management/v1/deployments/artifacts
* [HTTP/2] [1] [:method: POST]
* [HTTP/2] [1] [:scheme: https]
* [HTTP/2] [1] [:authority: mender.scanomat.com]
* [HTTP/2] [1] [:path: /api/management/v1/deployments/artifacts]
* [HTTP/2] [1] [user-agent: curl/8.7.1]
* [HTTP/2] [1] [accept: */*]
* [HTTP/2] [1] [authorization: Bearer ...]
* [HTTP/2] [1] [content-length: 561762549]
* [HTTP/2] [1] [content-type: multipart/form-data; boundary=------------------------qKlT1gdNLonJbvM5ahJJxT]
> POST /api/management/v1/deployments/artifacts HTTP/2
> Host: mender.scanomat.com
> User-Agent: curl/8.7.1
> Accept: */*
> Authorization: Bearer ...
> Content-Length: 561762549
> Content-Type: multipart/form-data; boundary=------------------------qKlT1gdNLonJbvM5ahJJxT
>
< HTTP/2 499
< date: Wed, 23 Oct 2024 12:54:45 GMT
< referrer-policy: no-referrer
< strict-transport-security: max-age=31536000; includeSubDomains; preload
< vary: Accept-Encoding
< x-content-type-options: nosniff
< x-xss-protection: 1; mode=block
< content-length: 21
<
* HTTP error before end of send, stop sending
* abort upload after having sent 479788000 bytes
* Connection #0 to host mender.scanomat.com left intact
Client Closed Request
I have tried almost every solution I could find online related to Traefik and Mender, but nothing helped. I don't know where this one-minute timeout is coming from, or why the overridden values are not taking effect. I'm completely stuck and would be grateful for any suggestions.