Hello all:
I am trying to run netboot.xyz behind Traefik CE v3.1.2, but TFTP does not work through the proxy.
I believe the Traefik configuration is correct, but I am looking for validation.
My environment runs Podman 5.1.2, Nomad 1.8.3, and Consul 1.19.1.
Here is the Nomad definition for Traefik:
job "traefik" {
  datacenters = ["homelab"]
  type = "service"
  group "traefik" {
    network {
      port "http" {
        static = 80
      }
      port "https" {
        static = 443
      }
      port "tftp" {
        static = 69
      }
    }
    service {
      name = "traefik"
      port = "https"
      tags = [
        "traefik.enable=true",
        "traefik.http.routers.dashboard.rule=Host(`traefik.example.com`)",
        "traefik.http.routers.dashboard.service=api@internal",
        "traefik.http.routers.dashboard.entrypoints=web,websecure",
        "traefik.http.routers.dashboard.tls.certresolver=internal",
        "traefik.http.routers.dashboard.tls=true",
      ]
      check {
        name = "alive"
        type = "tcp"
        port = "http"
        interval = "10s"
        timeout = "2s"
      }
    }
    task "traefik" {
      driver = "podman"
      config {
        image = "docker.io/library/traefik:v3.1.2"
        ports = [
          "http",
          "https",
          "tftp",
        ]
        args = [
          "--api.dashboard=true",
          "--log.level=DEBUG",
          "--accesslog=true",
          # Consul integration
          "--providers.consulcatalog=true",
          "--providers.consulcatalog.exposedByDefault=false",
          "--providers.consulcatalog.prefix=traefik",
          "--providers.consulcatalog.endpoint.address=${NOMAD_IP_http}:8500",
          # Internal ACME/PKI
          "--certificatesresolvers.internal.acme.caserver=https://ca.example.com/acme/acme/directory",
          "--certificatesresolvers.internal.acme.email=${NOMAD_SHORT_ALLOC_ID}@example.com",
          "--certificatesresolvers.internal.acme.storage=/local/internal.acme.json",
          "--certificatesresolvers.internal.acme.dnsChallenge=true",
          "--certificatesresolvers.internal.acme.dnschallenge.provider=rfc2136",
          "--certificatesresolvers.internal.acme.dnschallenge.resolvers=ww.xx.yy.zz:53",
          "--certificatesresolvers.internal.acme.dnschallenge.delayBeforeCheck=60",
          "--certificatesresolvers.internal.acme.certificatesduration=720",
          # HTTP entrypoints
          "--entrypoints.web.address=:${NOMAD_PORT_http}",
          "--entrypoints.websecure.address=:${NOMAD_PORT_https}",
          # Non-HTTP entrypoints
          "--entrypoints.netbootxyz.address=:${NOMAD_PORT_tftp}/udp",
        ]
      }
      artifact {
        source = "https://ca.example.com/roots.pem"
        mode = "file"
      }
      env {
        LEGO_CA_CERTIFICATES = "/local/roots.pem"
        RFC2136_TSIG_KEY = "traefik-dns-challenge"
        RFC2136_TSIG_SECRET = "<REDACTED>"
        RFC2136_TSIG_ALGORITHM = "hmac-sha256"
        RFC2136_NAMESERVER = "ww.xx.yy.zz"
      }
      resources {
        cpu = 100
        memory = 128
      }
    }
  }
}
Here is the Nomad definition for netboot.xyz:
job "netbootxyz" {
  datacenters = ["homelab"]
  type = "service"
  group "netbootxyz" {
    network {
      port "ui" {
        to = 3000
      }
      port "tftp" {
        to = 69
      }
    }
    service {
      name = "netbootxyz-ui"
      port = "ui"
      tags = [
        "traefik.enable=true",
        "traefik.http.routers.netbootxyz-ui.rule=Host(`netbootxyz.example.com`)",
        "traefik.http.routers.netbootxyz-ui.service=netbootxyz-ui",
      ]
    }
    service {
      name = "netbootxyz-tftp"
      port = "tftp"
      tags = [
        "traefik.enable=true",
        "traefik.udp.routers.netbootxyz-tftp.entryPoints=netbootxyz",
        "traefik.udp.routers.netbootxyz-tftp.service=netbootxyz-tftp",
      ]
    }
    reschedule {
      attempts = 3
      delay = "20s"
      delay_function = "exponential"
      interval = "3m"
      unlimited = false
    }
    task "netbootxyz" {
      driver = "podman"
      config {
        image = "netbootxyz/netbootxyz:0.7.3-nbxyz1"
        ports = ["ui", "tftp"]
      }
      resources {
        cpu = 200
        memory = 600
      }
    }
  }
}
Here is the port mapping on the netboot.xyz container:
podman container inspect netbootxyz-f0446b3f-4816-50f8-eb29-c0f52d0d387b | jq '.[].NetworkSettings.Ports'
{
  "3000/tcp": [
    {
      "HostIp": "10.1.18.13",
      "HostPort": "25857"
    }
  ],
  "3000/udp": [
    {
      "HostIp": "10.1.18.13",
      "HostPort": "25857"
    }
  ],
  "69/tcp": [
    {
      "HostIp": "10.1.18.13",
      "HostPort": "28631"
    }
  ],
  "69/udp": [
    {
      "HostIp": "10.1.18.13",
      "HostPort": "28631"
    }
  ],
  "80/tcp": null
}
In Traefik's debug log, I can see the client's UDP packets being handed to the TFTP server:
2024-08-28T11:41:32.368542293-04:00 stdout F 2024-08-28T15:41:32Z DBG github.com/traefik/traefik/v3/pkg/udp/proxy.go:23 > Handling UDP stream from 172.16.100.116:7371 to 10.1.18.13:28631
2024-08-28T11:41:40.318793590-04:00 stdout F 2024-08-28T15:41:40Z DBG github.com/traefik/traefik/v3/pkg/udp/proxy.go:23 > Handling UDP stream from 172.16.100.116:7371 to 10.1.18.13:28631
The client then fails with a timeout error while trying to download the PXE boot file.
If I remove Traefik from the equation, everything works fine.
I would expect the end-to-end port routing to work like this:
client (7371/udp) --> Traefik (69/udp) --> container (28631/udp:69/udp)
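To see whether any reply comes back along that path, a small probe can be run from a machine on the client network. This is only a minimal sketch and not part of my setup: the function names are made up for illustration, and the host, port, and file name passed on the command line are placeholders. It sends a TFTP read request and reports the source address of the first reply. Per RFC 1350, a TFTP server normally answers from a freshly chosen port (its transfer ID) rather than from port 69, so the source port of the reply is worth looking at, although I have not confirmed that this is what breaks here.

```python
import socket
import struct
import sys


def build_rrq(filename: str, mode: str = "octet") -> bytes:
    """Build a TFTP read request (opcode 1) as laid out in RFC 1350."""
    return (
        struct.pack("!H", 1)
        + filename.encode("ascii") + b"\x00"
        + mode.encode("ascii") + b"\x00"
    )


def describe_reply(packet: bytes) -> str:
    """Return the name of the opcode carried in the first two bytes."""
    names = {1: "RRQ", 2: "WRQ", 3: "DATA", 4: "ACK", 5: "ERROR", 6: "OACK"}
    if len(packet) < 2:
        return "short packet"
    (opcode,) = struct.unpack("!H", packet[:2])
    return names.get(opcode, f"unknown opcode {opcode}")


def probe(host: str, port: int, filename: str, timeout: float = 5.0):
    """Send one RRQ and return (source host, source port, reply type), or None on timeout."""
    # The socket is left unconnected on purpose, so a reply arriving from a
    # different source port than the one we sent to is still delivered.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_rrq(filename), (host, port))
        try:
            packet, (src_host, src_port) = sock.recvfrom(2048)
        except socket.timeout:
            return None
        return src_host, src_port, describe_reply(packet)


if __name__ == "__main__":
    if len(sys.argv) != 4:
        sys.exit("usage: probe.py HOST PORT FILENAME")
    result = probe(sys.argv[1], int(sys.argv[2]), sys.argv[3])
    if result is None:
        print("no reply before timeout")
    else:
        print("reply from %s:%d (%s)" % result)
```

Running it once against Traefik on port 69 and once directly against the container's mapped port (28631 in the output above) should show whether the reply gets lost between the container and the client.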
HTTP is working just fine.
From a Traefik perspective, does the configuration above look correct?
Thanks