Subdomains, Apex Domains, and Canonical URLs

April 4, 2022 (last updated January 12, 2023)

Introduction

A URL's hostname can be broken down into parts. Let's take www.gatlin.io as an example. .io is the TLD, or top-level domain. gatlin is the second-level domain (SLD). www is the subdomain. gatlin.io, the SLD and TLD together, is the apex domain.
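The naming scheme above can be sketched in a few lines of Python. This is a deliberately naive split that assumes a single-label TLD like io. Real-world code should consult the Public Suffix List instead, since suffixes such as co.uk span two labels. `split_host` and `HostParts` are hypothetical names used only for illustration.

```python
from typing import NamedTuple, Optional


class HostParts(NamedTuple):
    subdomain: Optional[str]  # e.g. "www", or None for a bare apex domain
    sld: str                  # second-level domain, e.g. "gatlin"
    tld: str                  # top-level domain, e.g. "io"

    @property
    def apex(self) -> str:
        # The apex domain is the SLD and TLD together.
        return f"{self.sld}.{self.tld}"


def split_host(host: str) -> HostParts:
    """Naively split a hostname into subdomain, SLD, and TLD.

    Assumes a single-label TLD; multi-label public suffixes
    such as "co.uk" would need a public suffix list.
    """
    labels = host.lower().rstrip(".").split(".")
    if len(labels) < 2:
        raise ValueError(f"not a registrable hostname: {host!r}")
    *sub, sld, tld = labels
    # Everything left of the SLD is the subdomain (possibly several labels).
    return HostParts(".".join(sub) or None, sld, tld)


parts = split_host("www.gatlin.io")
print(parts.subdomain, parts.sld, parts.tld, parts.apex)  # www gatlin io gatlin.io
```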

Subdomains

Subdomains like www are useful for directing traffic to different sets of servers. For example, mail transfer servers could be at "mx.gatlin.io", file transfer servers at "ftp.gatlin.io", and web servers at "www.gatlin.io".

"www" is historical and arbitrary. Network administrators wanted a short subdomain name for their web server(s), and they picked "www" for "World Wide Web".

Canonical URLs

A canonical URL is, in some sense, the URL. There are many URL strings that a user can type into their browser's URL bar, all of which should direct them to your website. You want to be liberal in accepting all these different attempts. For example, I want users to be able to reach my website via any of these URL strings:

gatlin.io
www.gatlin.io
http://www.gatlin.io
https://www.gatlin.io
http://gatlin.io
https://gatlin.io

You can also add a trailing "/" to any of those and they should still work. That's 12 different strings, all of which should lead to a single URL and a single website. But you don't want 12 different URLs all claiming to be the URL of your website. The biggest problem with that is SEO: you want search engines to treat your content as coming from a single place, a single source of truth. This is where a canonical URL comes in handy. You can declare the "true location" of your content within the content itself via a canonical link element in the head of your HTML:

<link rel="canonical" href="http://example.com/">

It also helps to redirect every possible variant to the single canonical URL. This breaks down into two steps: (1) redirecting http to https, and (2) either redirecting the apex domain to the subdomain, or vice versa.
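Both steps boil down to one decision per request: is this URL already canonical, and if not, where should it go? Below is a minimal, framework-agnostic sketch of that decision only, assuming https://www.gatlin.io is the canonical origin (the choice this note eventually lands on). `canonical_redirect` and the constants are hypothetical names. Bare strings like gatlin.io never reach the server as-is, because the browser fills in a scheme before sending the request. In a real app, a non-None result would be returned as a permanent redirect (301 or 308) with that URL in the Location header.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit

# Assumption for this sketch: https + the www subdomain is canonical.
# Swap CANONICAL_HOST and APEX_HOST to make the apex canonical instead.
CANONICAL_SCHEME = "https"
CANONICAL_HOST = "www.gatlin.io"
APEX_HOST = "gatlin.io"


def canonical_redirect(url: str) -> Optional[str]:
    """Return the URL to redirect to, or None if `url` is already canonical."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host not in (CANONICAL_HOST, APEX_HOST):
        return None  # not one of our hosts, so leave it alone
    path = parts.path or "/"  # "https://www.gatlin.io" and ".../" are the same page
    # Force the canonical scheme and host; keep path and query; drop the fragment.
    target = urlunsplit((CANONICAL_SCHEME, CANONICAL_HOST, path, parts.query, ""))
    return None if target == url else target


for url in ["http://gatlin.io", "https://gatlin.io/", "http://www.gatlin.io/", "https://www.gatlin.io/"]:
    print(url, "->", canonical_redirect(url))
```

Every variant except the last maps to https://www.gatlin.io/, and the canonical URL itself returns None, so it never redirects to itself.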

Regarding the first point, I'm under the impression that the best practice is an application-level redirect. Implementing that redirect is framework-specific and, as such, outside the scope of this note.

Regarding the second point, the core question is which redirect to implement: apex domain to subdomain, or subdomain to apex domain?

Apex domain vs subdomain

Voices in the space

There are websites devoted to arguing each side. dropwww and no-www argue for making the apex URL canonical, while yes-www argues for making www canonical. I have read through each of these websites and find their arguments interesting, but on the whole, I don't find the debate persuasive either way.

What do tech companies do?

You can look at the canonical link in the head of a major website's pages to see the choice it made. You can also visit a website via both the www subdomain and the apex domain and see how it redirects.
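Both checks can be scripted with the Python standard library. This is a small sketch: `fetch`, `find_canonical`, and `CanonicalFinder` are hypothetical helper names, `urlopen` follows redirects on its own so the final URL reveals the redirect direction, and some sites may block or alter their response for urllib's default user agent.

```python
from html.parser import HTMLParser
from typing import Optional, Tuple
from urllib.request import urlopen


class CanonicalFinder(HTMLParser):
    """Remembers the href of the first <link rel="canonical"> tag it sees."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr_map = dict(attrs)
        # rel may hold several space-separated tokens, so split before checking.
        rels = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rels:
            self.canonical = attr_map.get("href")


def find_canonical(html: str) -> Optional[str]:
    """Return the canonical href declared in `html`, or None."""
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    return finder.canonical


def fetch(url: str) -> Tuple[str, str]:
    """Return (final URL after redirects, page body). Needs network access."""
    with urlopen(url, timeout=10) as resp:  # redirects are followed automatically
        return resp.geturl(), resp.read().decode("utf-8", errors="replace")
```

Calling `fetch` on both the apex and the www form of a site, then comparing the two final URLs and running `find_canonical` on the body, shows which way that site redirects and which URL it declares canonical.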

I thought of 10 random tech companies and, appropriately enough, they split down the middle. Five redirected from subdomain to apex domain: Stack Overflow, GitHub, Vercel, Tailwind CSS, and DuckDuckGo. Five redirected from apex domain to subdomain: Google, Heroku, Amazon, Netflix, and Mozilla.

I think the big takeaway here is you can do whatever you want and you will have tech giants backing you up.

Heroku Incident Report

This question was essentially asked by a user on Server Fault, and one answer, while open-ended, referenced a Heroku incident report that recommended subdomains for DDoS mitigation (Heroku refers to apex domains as root domains):

Root domains are aesthetically pleasing, but the nature of DNS prevents them from being a robust solution for web apps. Root domains don't allow CNAMEs, which requires hardcoding IP addresses, which in turn prevents flexibility on updates to IPs which may need to change over time to handle new load or divert denial-of-service attacks.

We strongly recommend against using root domains. Use a subdomain that can be CNAME aliased to proxy.heroku.com, and avoid ever manually entering IPs into your DNS configuration. We also recommend a low TTL value, which will allow Heroku network engineers to quickly make changes to DNS mapping when necessary.

A few clarifications are in order to understand this quote. A CNAME record is a DNS record that points from an alias to a canonical name: it maps one domain name to another domain name. Root domains don't allow CNAME records because a CNAME cannot coexist with any other record at the same name, and the root of a zone must always carry its SOA and NS records. The root domain therefore has to point directly at an IP address with an A (or AAAA) record. An IP address is semantically a unique location (though not, strictly speaking, a single machine, as we will see later).
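A toy model of CNAME resolution may help make this concrete. It is a sketch, not real DNS: the names `www.example.com` and `proxy.host.example` are hypothetical, the IPs come from documentation ranges, and each name holds just one record. The resolver restarts its lookup at the CNAME target until it reaches an A record.

```python
# Toy zone data: name -> (record type, value). Hypothetical names and IPs.
# A real zone also holds SOA and NS records at the apex, which is why a
# CNAME (which must stand alone at its name) cannot live there.
RECORDS = {
    "www.example.com": ("CNAME", "proxy.host.example"),
    "proxy.host.example": ("A", "192.0.2.10"),
    "example.com": ("A", "192.0.2.10"),  # apex: an A record with a typed-in IP
}


def resolve(name: str, max_hops: int = 8) -> str:
    """Follow CNAME aliases until an A record yields an IP address."""
    for _ in range(max_hops):
        rtype, value = RECORDS[name]
        if rtype == "A":
            return value
        name = value  # CNAME: restart the lookup at the canonical name
    raise RuntimeError("CNAME chain too long (possible loop)")


print(resolve("www.example.com"))  # 192.0.2.10, reached via proxy.host.example
print(resolve("example.com"))      # 192.0.2.10, hardcoded at the apex
```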

Here's my guess as to what happened, and why their recommendation makes sense. Heroku apparently has a domain, proxy.heroku.com, that sits in front of the servers they host on behalf of their users. The job of their proxy server(s) is to inspect incoming traffic and route it to the appropriate user's server. Heroku lets users point their apex domains at their Heroku-hosted servers, and it supports this by publishing the IP address(es) behind its proxy domain (proxy.heroku.com).

Now, a DDoS attack targeting proxy.heroku.com would hit those IP address(es). Heroku would mitigate it by adding new IP addresses and pointing proxy.heroku.com at them. If you used a CNAME to point to proxy.heroku.com, you were safe, because Heroku effectively moved you to a different public IP. But if you pointed directly at the attacked IPs, Heroku couldn't move you. Instead, Heroku had to use more sophisticated defensive techniques to keep those attacked IPs up and running.
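The guessed scenario can be simulated in a few lines. Everything here is hypothetical (customer names, `proxy.host.example`, documentation-range IPs); it only shows why the CNAME customer follows the provider's move while the hardcoded one stays behind. In real DNS, resolvers cache records for their TTL, so CNAME customers follow only once the cached answer expires, which is why Heroku also recommended a low TTL.

```python
# Provider-controlled A record for its proxy name (hypothetical).
provider_dns = {"proxy.host.example": "192.0.2.10"}

customers = {
    # www subdomain aliased to the provider's proxy name (CNAME)
    "www.cname-customer.example": ("CNAME", "proxy.host.example"),
    # apex domain with the proxy's IP typed in by hand (A record)
    "hardcoded-customer.example": ("A", "192.0.2.10"),
}


def lookup(name: str) -> str:
    """Resolve a customer name, following one CNAME hop if present."""
    rtype, value = customers[name]
    return provider_dns[value] if rtype == "CNAME" else value


def mitigate_attack(new_ip: str) -> None:
    """The provider repoints its proxy name away from the attacked IP."""
    provider_dns["proxy.host.example"] = new_ip


mitigate_attack("198.51.100.7")
for name in customers:
    print(name, "->", lookup(name))
```

After the move, the CNAME customer resolves to 198.51.100.7, while the hardcoded customer still resolves to the attacked 192.0.2.10.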

So, what can we conclude? Well, since most servers these days are hosted, we should use subdomains, right? So our host providers can more effectively mitigate DDoS attacks? Well, yes, but, also, that's not the whole story.

One thing that has not been mentioned: I suspect Heroku's IP addresses used the time-honored unicast transmission method, where one IP address maps to one public-facing server, a one-to-one relationship. The public-facing server in question is likely a load balancer or firewall, an efficient reverse proxy quickly handing packets off to other servers within a subnet. That single point of failure is why, historically, DDoS attacks have succeeded: enough traffic can overwhelm it.

Anycast IP transmission

The anycast transmission technique separates the notion of unique from single. One unique IP address can be announced by multiple servers, and those servers can be distributed all around the world. Routers then direct traffic to the nearest node (in routing terms) announcing that IP address.
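A toy model of anycast routing: three hypothetical nodes announce the same documentation IP, and each client region is routed to whichever announcing node has the lowest route cost. The costs are made-up numbers standing in for real BGP path selection, which weighs AS-path length and operator policy rather than geographic distance.

```python
# Several hypothetical nodes all announce this one (documentation) IP.
ANYCAST_IP = "203.0.113.1"

# Route cost from each client region to each announcing node (made-up numbers).
COSTS = {
    "frankfurt-client": {"fra-node": 1, "iad-node": 4, "sin-node": 6},
    "virginia-client": {"fra-node": 4, "iad-node": 1, "sin-node": 5},
    "tokyo-client": {"fra-node": 6, "iad-node": 5, "sin-node": 2},
}


def route(client: str) -> str:
    """Pick the lowest-cost node among all nodes announcing ANYCAST_IP."""
    costs = COSTS[client]
    return min(costs, key=costs.get)


for client in COSTS:
    print(f"{client} -> {ANYCAST_IP} served by {route(client)}")
```

Every client dials the same address, yet each one lands on a different machine.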

This technique places more responsibility on the server provider. They have to ensure that they are providing the same experience across all those different servers.

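One upside to this approach is DDoS mitigation. Attack traffic coming from many sources is spread across many nodes, since each source is routed to its nearest one. And if a node is flooded anyway, it (or its operator) can withdraw its route announcement via the Border Gateway Protocol (BGP), and routers will then direct traffic to the other anycast nodes announcing the same IP address. This means there is no longer a single point of failure (at least theoretically).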
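The failover step can be sketched the same way. The node names and costs are hypothetical, and `withdraw` stands in for a BGP route withdrawal: once the flooded node stops announcing the shared IP, the next-best announcing node takes over the same address.

```python
# Hypothetical nodes announcing one shared IP, with route costs
# as seen from a single client's network.
announcements = {"fra-node": 1, "ams-node": 2, "iad-node": 5}


def best_node(routes: dict) -> str:
    """Routers send traffic to the lowest-cost node still announcing the IP."""
    if not routes:
        raise RuntimeError("no node announces this IP: unreachable")
    return min(routes, key=routes.get)


def withdraw(routes: dict, node: str) -> None:
    """A flooded node (or its operator) withdraws its BGP announcement."""
    routes.pop(node, None)


print(best_node(announcements))      # fra-node
withdraw(announcements, "fra-node")  # fra-node is overwhelmed
print(best_node(announcements))      # ams-node: same IP, different machine
```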

So, what can we conclude? If we use a sophisticated enough hosting provider, it can use anycast techniques, and we can once again point apex domain traffic at IP addresses. Those IP addresses are now far less susceptible to DDoS attacks. So perhaps we were temporarily in a place where subdomains were recommended, but now we can go back to doing what we want?

Vercel uses anycast and still recommends subdomains

Vercel seems to strongly recommend using subdomains. Why? Apparently for reasons related to its use of an anycast edge network.

The following is from their documentation page on domains:

When you add an apex domain, Vercel will recommend that you add a redirect to a www subdomain. This is because www records allow for better control over your domain. Anything configured on the apex domain (for example, cookies or CAA records), will usually apply to all subdomains, rather than setting it on the www subdomain, which will only apply to your www record. In addition, because Vercel's servers use anycast networking, it can handle CNAME records differently, allowing for quicker DNS resolution and therefore a faster website experience for the end user.

Elsewhere in their docs, they explain more about their recommendation to redirect:

We recommend using the www subdomain as your primary domain, with a redirect from the non-www domain to it. This allows the Vercel Edge Network more control over incoming traffic for improved reliability, speed, and security. The redirect is also cached on visitor's browsers for faster subsequent visits.

The Vercel Edge Network uses an anycast IP range, and Vercel explicitly advertises apex domain support. I suspect that, thanks to this anycast approach, Vercel is not as susceptible to DDoS attacks as Heroku was back then (I couldn't tell whether Heroku has since migrated to anycast as well).

Why does Vercel still recommend subdomains even with anycast? Perhaps they still fear DDoS attacks capable of flooding multiple nodes. I ultimately can't tell, but I can imagine that hosting providers will always recommend that their users point to domain names rather than to IP addresses within their networks. If you point to a hosting provider's proxy domain name, they can always point that proxy elsewhere on your behalf. Maybe they have a fallback anycast IP range, or a fallback unicast IP range with special protections. They can point their proxy domain wherever they want, whereas you would have to change the IP address you manually typed into your DNS provider's configuration portal, and you aren't responding to service outages day and night.

Vercel makes another point in the quotes above: "Anything configured on the apex domain... will usually apply to all subdomains." That wouldn't stop someone from taking the apex domain approach, but it has some persuasive value nonetheless.
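Cookie scoping, one of the examples Vercel mentions, shows the effect concretely. A cookie whose Domain attribute is the apex is sent to every subdomain, while one scoped to www.gatlin.io is not sent to a sibling such as blog.gatlin.io. Below is a simplified sketch of the RFC 6265 domain-matching rule. `domain_match` is a hypothetical name, and real browsers also reject Domain values that are public suffixes, which this skips.

```python
import ipaddress


def domain_match(request_host: str, cookie_domain: str) -> bool:
    """Simplified RFC 6265 (section 5.1.3) domain matching.

    Skips the public suffix checks that real browsers also apply.
    """
    host = request_host.lower()
    domain = cookie_domain.lower().lstrip(".")  # a leading dot is ignored
    if host == domain:
        return True
    try:
        ipaddress.ip_address(host)
        return False  # IP addresses only ever match exactly
    except ValueError:
        pass  # not an IP address, so suffix matching is allowed
    return host.endswith("." + domain)


for host in ["gatlin.io", "www.gatlin.io", "blog.gatlin.io"]:
    print(
        host,
        domain_match(host, "gatlin.io"),      # cookie with Domain=gatlin.io
        domain_match(host, "www.gatlin.io"),  # cookie with Domain=www.gatlin.io
    )
```

The apex-scoped cookie matches all three hosts, while the www-scoped cookie matches only www.gatlin.io.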

Conclusion

The conclusion is, ultimately: do what you want.

But, I think a recommendation in favor of subdomains is apparent for people who use cloud hosting. Personally, I've reached the conclusion that I'm going to use (the historically arbitrary) www subdomain as my canonical URL when I'm using cloud hosting in particular (which is basically always for personal projects).