Use subfolder of one site as root of another - i.e. reverse proxy a remote host subdir?

This isn't really my "domain" of expertise (no pun intended :wink:), but basically, I'd like to serve the contents of another site's subdirectory under a separate domain - i.e. treat that subdirectory as the root directory of a different domain.

I control both domain names, but I don't control the server that's hosting the subdirectory I want to "proxy". It seems like reverse proxying to a non-local target is what I want.

Conceptually, it seems straightforward, and I've configured a service in a dynamic file provider as described in this post and as shown below...

http:
   services:
      myservice:
         loadBalancer:
            servers:
               - url: https://site-to-proxy.com

When I visit site.com, Traefik does send a request to site-to-proxy.com, but it results in a Cloudflare 403 response code. I also tried proxying a couple of other public sites, and I get a 400 or 404 depending on the site.

The request seems to be incomplete or malformed. Here's what appears in the Traefik log...

{
  "Method": "GET",
  "URL": {
    "Scheme": "",
    "Opaque": "",
    "User": null,
    "Host": "",
    "Path": "/",
    "RawPath": "",
    "ForceQuery": false,
    "RawQuery": "",
    "Fragment": "",
    "RawFragment": ""
  },
  "Proto": "HTTP/2.0",
  "ProtoMajor": 2,
  "ProtoMinor": 0,
  "Header": {
    "Accept": [
      "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9"
    ],
    "Accept-Encoding": [
      "gzip, deflate, br"
    ],
    "Accept-Language": [
      "en-US,en;q=0.9"
    ],
    "Cache-Control": [
      "no-cache"
    ],
    "Dnt": [
      "1"
    ],
    "Pragma": [
      "no-cache"
    ],
    "Sec-Ch-Ua": [
      "\" Not;A Brand\";v=\"99\", \"Google Chrome\";v=\"91\", \"Chromium\";v=\"91\""
    ],
    "Sec-Ch-Ua-Mobile": [
      "?0"
    ],
    "Sec-Fetch-Dest": [
      "document"
    ],
    "Sec-Fetch-Mode": [
      "navigate"
    ],
    "Sec-Fetch-Site": [
      "none"
    ],
    "Sec-Fetch-User": [
      "?1"
    ],
    "Upgrade-Insecure-Requests": [
      "1"
    ],
    "User-Agent": [
      "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36"
    ],
    "X-Forwarded-Host": [
      "site.com"
    ],
    "X-Forwarded-Port": [
      "443"
    ],
    "X-Forwarded-Proto": [
      "https"
    ],
    "X-Forwarded-Server": [
      "af3eb2d6f210"
    ],
    "X-Real-Ip": [
      "172.20.0.1"
    ]
  },
  "ContentLength": 0,
  "TransferEncoding": null,
  "Host": "site.com",
  "Form": null,
  "PostForm": null,
  "MultipartForm": null,
  "Trailer": null,
  "RemoteAddr": "172.20.0.1:60938",
  "RequestURI": "/",
  "TLS": null
}

Any ideas? Is what I'm attempting to do even possible? Or perhaps I'm just going about it the wrong way? Any help or insights appreciated!

Hello @shot,

What you intend to do seems like it should be easy; however, there are some aspects that may trip you up.

  1. Domain fronting: This used to be a popular way to avoid censorship, but many CDNs and proxies now block it for security reasons.
  2. Request host modification: When your client requests example.com and you proxy the request to proxy.com, which Host header is proxy.com supposed to receive? External services don't all handle that the same way.
  3. MITM detection: External proxying can be flagged as a man-in-the-middle, which is essentially what it is, but since you control both domains, you should be fine.
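
To put point 2 in Traefik terms: by default, a load balancer forwards the client's Host header (site.com here) to the backend, and a CDN in front of site-to-proxy.com may not recognize it. Below is a minimal sketch, assuming Traefik v2's passHostHeader option and reusing the service from the original post. Setting it to false should make Traefik send the host taken from the server URL instead. Whether the upstream then accepts the request still depends on its own rules.

```yaml
http:
   services:
      myservice:
         loadBalancer:
            # Assumption: do not forward the client's Host (site.com);
            # use the host from the server URL below instead.
            passHostHeader: false
            servers:
               - url: https://site-to-proxy.com
```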

You may want to look at setting up a VPN or some other backend tunnel to your second service to keep backend requests internal. You could then bypass the CDN, and your configuration would be very simple.


Thanks for the info, @daniel.tomcej!

I really appreciate the insights. I guess I thought it would be straightforward because WordPress is commonly used in a similar way. In that case, a subdirectory of the main site is used for the WP site, which is actually hosted elsewhere - e.g. example.com/blog where "blog" is the WP site (as explained here). It's kind of the opposite of what I want to do, but it seems similar in principle.

I guess I'll do some reading and tinkering. Thanks again.

Progress!

I simply added a "host" header as follows...

# Define middleware to add host header
- "traefik.http.middlewares.test_mw.headers.customrequestheaders.host=site-to-proxy.com"
# Assign middleware to router
- "traefik.http.routers.test_rtr.middlewares=test_mw"

...and the site appears. It renders without any errors and without the URL in the address bar changing - i.e. it remains as site.com.

This is encouraging, although I'm unable to fetch a subdirectory. I tried using the AddPrefix and ReplacePath middleware to no avail. I just get a 404.

Any idea where I'm going wrong?

Also, is it possible to set the HTTP/2 pseudo-headers or otherwise "rewrite" the request to include the path?
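
For reference, here is a minimal sketch of how the subdirectory attempt might be wired up with labels, assuming the AddPrefix middleware and a hypothetical upstream path /subdir (the thread never names the real one). The middleware name test_prefix is also made up for illustration. AddPrefix changes the request path before it is forwarded, so the forwarded path (the :path pseudo-header when the upstream connection uses HTTP/2) should follow from it rather than needing to be set directly. A 404 can still occur if the upstream expects a trailing slash or serves its assets from absolute paths outside the subdirectory.

```yaml
# Hypothetical middleware: prepend the upstream subdirectory to every request path
- "traefik.http.middlewares.test_prefix.addprefix.prefix=/subdir"
# Assign both middlewares to the router (this replaces the earlier single-middleware label)
- "traefik.http.routers.test_rtr.middlewares=test_mw,test_prefix"
```

With this in place, a request for site.com/page would be forwarded as /subdir/page, provided the prefix matches the real upstream layout.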