How a Website Works, Conceptually

November 17, 2021 (last updated September 15, 2022)

Introduction

In this blog post I will attempt a concise introduction to how a website works, conceptually speaking. I will make sweeping generalizations. My target audience is new tech workers, tech-adjacent workers, etc.

I will first talk about the three major parts of a website. Then, I will talk about where code is stored and how code is tested. Last, I will attempt to define some buzzwords (like microservices and tech debt) that you might hear "in the biz", which will hopefully make a bit more sense once you've been introduced to some new concepts. Let's go!

The Three Parts of a Website

A website has three parts: the database, the backend, and the frontend. The database is where the data lives. The backend is "the code that the users don't see" running on the servers. The frontend is "the code that the users see" running in the browser.

The Database

The database is where data is stored. Think of the database as a backpack or a big bag that the backend (which we will get to shortly) puts data in and reads data from. It's the "source of truth" regarding data, and, as such, should be treated with fear and respect.

Literally saving any "chunk" of data is easy. But the question of how best to save a "chunk" of data is incredibly complicated, nuanced, and context-dependent, and getting it wrong could cause problems that reverberate into the future for generations (or, at least, for a while). This means that the question of how to represent/model the data in the database is important, and will stay important for the lifetime of any project.

Broadly speaking, there are two ways to save data: (1) with a "relational model", using SQL (often pronounced "sequel"), and (2) with a "document model", using NoSQL.

SQL is good at capturing relationships between data objects. The prototypical example is social media, i.e., my friend's friend liked my other friend's comment on my post. There are a lot of boxes and arrows between data objects to explain something like that. It's relationally rich.
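To make the "boxes and arrows" a bit more concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names, and sample rows are all made up for illustration; a real social network would model this very differently. The point is only that each "arrow" becomes a JOIN.

```python
import sqlite3

# In-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users    (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts    (id INTEGER PRIMARY KEY,
                           author_id INTEGER REFERENCES users(id));
    CREATE TABLE comments (id INTEGER PRIMARY KEY,
                           post_id INTEGER REFERENCES posts(id),
                           author_id INTEGER REFERENCES users(id));
    CREATE TABLE likes    (user_id INTEGER REFERENCES users(id),
                           comment_id INTEGER REFERENCES comments(id));

    INSERT INTO users VALUES (1, 'me');
    INSERT INTO users VALUES (2, 'other friend');
    INSERT INTO users VALUES (3, 'friend of friend');
    INSERT INTO posts VALUES (10, 1);          -- my post
    INSERT INTO comments VALUES (100, 10, 2);  -- other friend's comment on it
    INSERT INTO likes VALUES (3, 100);         -- friend of friend liked that comment
""")


def who_liked_comments_on_posts_by(author_name):
    """Return (liker, commenter) name pairs for comments on the author's posts."""
    rows = conn.execute(
        """
        SELECT liker.name, commenter.name
        FROM likes
        JOIN comments           ON comments.id  = likes.comment_id
        JOIN posts              ON posts.id     = comments.post_id
        JOIN users AS author    ON author.id    = posts.author_id
        JOIN users AS liker     ON liker.id     = likes.user_id
        JOIN users AS commenter ON commenter.id = comments.author_id
        WHERE author.name = ?
        """,
        (author_name,),
    )
    return rows.fetchall()
```

Calling who_liked_comments_on_posts_by("me") walks likes, comments, posts, and users in one query, which is the kind of question a relational database is built to answer.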

NoSQL is optimized for less relational, or non-relational, data. The prototypical example is a document. A document is defined by what it contains. It's essentially an independent entity. It could live in a different folder, surrounded by different files, without losing its "true essence", which is its contents. Other examples are statistics for a property listing or a physical description of a person: anything that feels self-contained, "independent" of its context. It's relationally poor.
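As a rough sketch of the document idea, here is a property listing stored as one self-contained blob of JSON. The "store" below is just a Python dict standing in for a real document database, and the listing fields are invented for illustration.

```python
import json

# A hypothetical property listing: everything about it lives in one document.
listing = {
    "id": "listing-1",
    "address": "123 Example St",
    "bedrooms": 3,
    "bathrooms": 2,
    "square_feet": 1450,
}

# Toy stand-in for a document database: document id -> JSON text.
store = {}


def save_document(doc):
    """Save the whole document under its id."""
    store[doc["id"]] = json.dumps(doc)


def load_document(doc_id):
    """Load the whole document back; no joins with other data are needed."""
    return json.loads(store[doc_id])


save_document(listing)
```

Notice there are no arrows pointing to other tables. You ask for the document by id and get the whole thing back.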

SQL is the default option. But, how you store the data is just one of many aspects of the broader topic of data modeling. There is also structuring the data you've already modeled, refactoring the data, normalizing the data, migrating the data, transforming the data, etc. A "data model" is, at best, an inferior facsimile of the reality you are addressing, so don't be surprised when developers want to discuss this always difficult issue.

The Backend

The backend is like a ferry, moving data from one shore (the database) to the other shore (the frontend, which we'll discuss shortly), and back again.

Unfortunately, the ferry analogy is way too simple. The way the data is modeled in the database is usually quite different from the way data is best displayed to the user. This means that the data, before it gets to the frontend, needs to be transformed, manipulated, enriched, cleaned, pruned, integrated with 3rd party data, etc. This is the difficult and open-ended job that the backend has.
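Here is a small sketch of what that reshaping can look like. The rows, field names, and the build_profile_payload function are all hypothetical; the idea is just that what comes out of the database and what the frontend wants to display are rarely the same shape.

```python
from datetime import date

# Hypothetical rows, shaped the way a database might store them.
user_row = {
    "id": 7,
    "first_name": "Ada",
    "last_name": "Lovelace",
    "password_hash": "not-for-the-frontend",
    "birth_date": date(1990, 12, 10),
}
order_rows = [
    {"user_id": 7, "total_cents": 1250},
    {"user_id": 7, "total_cents": 399},
]


def build_profile_payload(user, orders, today):
    """Reshape database rows into what a profile page might want to show."""
    birth = user["birth_date"]
    had_birthday_this_year = (today.month, today.day) >= (birth.month, birth.day)
    age = today.year - birth.year - (0 if had_birthday_this_year else 1)
    total_cents = sum(order["total_cents"] for order in orders)
    return {
        "displayName": user["first_name"] + " " + user["last_name"],  # combined
        "age": age,                                                   # derived
        "orderCount": len(orders),                                    # summarized
        "totalSpent": f"${total_cents / 100:.2f}",                    # formatted
        # password_hash is deliberately left out (pruned)
    }
```

The frontend never sees the raw rows, only the payload, which is one reason the backend is more than a ferry.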

Another name for the backend is "the server", because the backend lives on a server in the cloud. (This is the most egregious oversimplification I will make in this blog post. Network architecture is much more complicated, but at the end of the day, what I've said is fair: the backend lives on the server(s).)

Having now used the word "server", I want to take a small detour to elaborate on client/server terminology. Namely, it's just an analogy expressing a relationship between at least two things, like a restaurant waiter serving a patron. What is called a "server" and what is called a "client" can, and will, change based on context. For example, a database serves data to the backend, so in that context, the database is the server and the backend is the client. So, already, we've seen the backend referred to as both a server and a client. Here, then, is the takeaway: (1) The default context is where the backend is "the server" and the frontend is "the client", and (2) The default context often does not apply. So, if devs are talking about clients and servers in a way that is confusing you, don't be afraid of asking for clarification.

The Frontend

The frontend is what the browser shows to the user. If you are reading this blog post, you are reading it on my website's frontend. The frontend needs three files: an HTML file, a CSS file, and a JavaScript file.

The HTML file contains the content you want to show. The CSS file contains instructions for how you want the browser to display/style that particular content (and, as such, assumes the existence of the HTML file). Lastly, the JavaScript file contains functionality that lets you manipulate the content and styles in question (and, as such, assumes the existence of the HTML and CSS files in question).

Of particular note are the possibilities enabled by JavaScript. Using JavaScript, you can add and remove chunks of HTML, CSS, and JavaScript. You can fetch data from your backend, and from anywhere else, as well. You can interact with the hardware device your browser is loaded on, e.g., the device camera, microphone, notification interface, augmented reality toolkit, etc. The possibilities often feel endless. JavaScript is powerful.

Where the Code is Stored

The backend and the frontend are what developers spend most of their time writing. They write it all in a variety of programming languages (like JavaScript), but, at the end of the day, no matter what programming language they are writing in, all they are doing is writing glorified text files. Code is a slightly fancier version of a normal document written in a basic text editor. All these glorified text files, combined together, are the codebase. What makes these text files fancy is the fact that they are read by the computer. The computer will do what they say (unless the developers are frustrated, which means the computer is not listening to their very carefully phrased supplications to do work on their behalf).

Now, since these fancier text files are, indeed, fancier, they need to be stored in a fancier location than Google Drive or Dropbox (though, strictly speaking, you could store code files there and it would be... sufficient). That fancier place is called a repository. Some of the most common repository-as-a-service providers are GitHub, GitLab, and Bitbucket. They enable developers to work on the same files at the same time and keep very close track of the changes made to these files over time, among other things.

How the Code is Tested

Computers are the most pedantic interlocutor you will ever converse with.

The code developers write sits on top of mountains of other code. The complexity of a single computer is mind-boggling. Every keystroke that doesn't immediately explode your machine is a miracle. Sometimes, being a developer feels like you're a hostage negotiator talking to a deranged sociopath (the computer) trying to convince it to free the hostages (the data). It's hard work, and it regularly goes wrong. And tests are the only thing between the user and the crash.

When developers write new code on their local machines, they (should) also write tests to ensure their new code works as expected. This can be hard to do sometimes, and can take a lot of time.
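For a feel of what a test is, here is a tiny sketch in Python. The apply_discount function is a made-up piece of "new code", and the three functions below it are its tests: each one states an expectation and fails loudly if the code does not meet it.

```python
def apply_discount(price_cents, percent):
    """Return the price after taking `percent` off, rounded to whole cents."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price_cents * (100 - percent) / 100)


def test_typical_discount():
    assert apply_discount(2000, 25) == 1500


def test_zero_discount_changes_nothing():
    assert apply_discount(2000, 0) == 2000


def test_rejects_impossible_discount():
    try:
        apply_discount(2000, 150)
    except ValueError:
        return  # this is the behavior we expect
    raise AssertionError("expected ValueError for a 150% discount")
```

A test runner such as pytest can find and run functions named like these automatically (by default it looks in files whose names start with test_), so checking that nothing broke becomes a single command.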

When the code is ready to go, it is combined, in the cloud, with other code that other developers wrote. If production (the live environment that real users see) were the first place all this new code got combined together, that'd be scary, because maybe two bits of new code conflict with each other. So, there is often a pre-production environment called "staging" that is intended to be as close to the production environment as possible. This allows team members to test out new features, and also provides a place for code to be integrated. When all the code is combined together and sent to staging, it passes through what is called a "CICD pipeline" (often written CI/CD), which stands for Continuous Integration and Continuous Deployment.

When devs talk about the importance of CICD and having a strong testing suite, trust them. They are spending all day convincing a pedantic, brittle, sociopathic, datacidal maniac to give them data. They need all the safety they can get.

Quick-Fire Definitions of Buzzwords

  • Microservice Architecture

    The backend ends up doing a lot of things over time. Instead of doing it all in "one place" on one server (or one server group), you break it up into multiple servers (or multiple server groups) each doing a small, micro service which is part of the larger, macro service as a whole.

  • Monolith

    Codebases can grow to be very large. A single repository with a lot of code in it can be referred to as a monolith or, more precisely, a monorepo (short for monolithic repository). Also, the backend server-side code can grow to be very large. This can also be referred to as a monolith. Monoliths and microservices, in some sense, represent opposite ends of a spectrum, and tech circles debate the pros and cons of each, often vehemently.

  • Servers and Scaling Horizontally

    I made the backend sound like a single server, almost like a laptop in the cloud. This is a huge oversimplification. There is a complicated world here of VMs (virtual machines), containers, platforms, networking, protocol layers, etc. This doesn't make it harder or more complicated than the frontend or the database, each of which has a big complicated world inside of it as well. But, for practical purposes, know that there are almost always multiple servers with duplicated versions of your backend code. Writing backend code that can successfully be duplicated across multiple servers is not easy, and doing so allows you to "scale horizontally", where the horizontal axis is the number of servers. There's not actually a graph being referred to, really; it's more a turn of phrase at this point.

  • Tech Debt

    Since the data model is a best-effort representation of some reality you are trying to model, you get better at modeling it as you go. Also, if you want to add a new feature, that might have an impact on the way you model the data. For these reasons and more, it is practically inevitable that your data model will evolve over time. This affects every single layer of the codebase. When business puts undue pressure on developers to deliver new features, developers are forced to shortcut the delicate game of adjusting the data model. As these hacks and shortcuts compound over time, the codebase becomes a brittle mess ready to collapse. Often, the hacks made to the codebase are harmful, and it takes twice the effort to fix them as it would have taken to do things properly in the first place. This is tech debt. This is evil. Don't let evil reign. Give developers time to pay down tech debt.
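To illustrate the "Servers and Scaling Horizontally" entry above, here is a toy sketch of the idea. The server names and the RoundRobinBalancer class are invented for illustration, and real load balancers are far more sophisticated. The sketch assumes the backend code is "stateless", meaning any copy of it can answer any request, which is what makes duplicating it across servers possible.

```python
from itertools import cycle


def handle_request(server_name, path):
    """Stateless handler: the answer depends only on the request itself,
    not on anything this particular server remembers from earlier."""
    return f"{server_name} served {path}"


class RoundRobinBalancer:
    """Hands each incoming request to the next server in the rotation."""

    def __init__(self, server_names):
        self._rotation = cycle(server_names)

    def route(self, path):
        return handle_request(next(self._rotation), path)


# Three identical copies of the backend; scaling out means adding more names.
balancer = RoundRobinBalancer(["server-a", "server-b", "server-c"])
responses = [balancer.route("/home") for _ in range(4)]
```

The fourth request wraps back around to the first server. Adding capacity is then a matter of adding another identical server to the rotation.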

Conclusion

Hopefully this has helped you get a rough idea of how a typical website works, and how to go about building one. Let me know if there's anything I missed, or anything else you'd like me to cover. Thanks for reading! :D