How I Use Obsidian as a CMS for my Website

October 14, 2022 (last updated September 6, 2023)

Introduction

My website, gatlin.io, sources some of its content from my Obsidian vault. It is an exciting setup, in my opinion, because I can create content in Obsidian and then choose whether or not to make each note public.

In this post, I will share my technical setup for anyone who finds it interesting or wants to do the same.

Preliminary Points

  • Most of the workflow is encapsulated in a script I wrote called metamark.
  • My website is built with Nextjs and hosted on Vercel.
  • On my local machine, my website repo and my Obsidian vault are sibling directories.
  • My Obsidian vault has an essentially flat file structure. (See My PKM Workflow#PKM Notes Obsidian for more.)

Abstract

In my website repo, I use metamark to iterate through my vault files. Metamark parses each file's frontmatter for a public boolean. If public is set to true, it adds that file to a "public files set". Next, it iterates through the "public files set", turning each file into structured data. The structured data includes HTML transformations (via unified.js tooling), ToC extracts, and more. Finally, I write that payload to a large contents.json file in my website repo. That file becomes the "source of truth" for my Obsidian content, which Nextjs reads at compile time and Vercel serves.

Obsidian Frontmatter

To begin with, I tag my content with YAML frontmatter, as is typical. Most notably, when a file is ready to be viewed, I mark it public: true. Here's an example with some of the more important metadata items.

---
public: true
tags: [note]
created: ...
updated: ...
published: ...
---

The script I use parses this data with gray-matter. If public both exists and is set to true, then I add the file name to a set of public files. This is all I do on the first pass.
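
In code, that first pass looks something like this (a minimal sketch, not metamark's actual internals; it assumes a flat vault directory, which mine essentially is):

import fs from 'node:fs'
import path from 'node:path'
import matter from 'gray-matter'

// First pass: collect the names of every file marked public: true.
function collectPublicFiles(vaultDir: string): Set<string> {
  const publicFiles = new Set<string>()
  for (const entry of fs.readdirSync(vaultDir)) {
    if (!entry.endsWith('.md')) continue
    const source = fs.readFileSync(path.join(vaultDir, entry), 'utf8')
    const { data } = matter(source) // frontmatter as a plain object
    if (data.public === true) {
      publicFiles.add(entry.replace(/\.md$/, '')) // wiki links use bare names
    }
  }
  return publicFiles
}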

Why a public files set?

Why loop through files twice? Because Obsidian is built on wiki links: one file can reference an arbitrary number of other files within Obsidian. How to handle these wiki links depends on whether those other files are public. If a linked file is not public, the link is removed and becomes plaintext. Otherwise, it needs further processing to be turned into an HTML link.

In short: you need a manifest of public files before you actually start parsing files so you know what to do when you hit an arbitrary wiki link.
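
The per-link decision is simple once the manifest exists. Here is a sketch (the real transformation happens inside a remark plugin, but the branch is the same; toSlug is a hypothetical helper):

// publicFiles is the manifest from the first pass.
// Wiki link targets are bare file names, e.g. [[Some Note]].
function resolveWikiLink(
  target: string,
  publicFiles: Set<string>
): { kind: 'link'; href: string } | { kind: 'text'; value: string } {
  if (publicFiles.has(target)) {
    // Public: becomes an HTML link to the content route.
    return { kind: 'link', href: `/content/${toSlug(target)}` }
  }
  // Not public: the link is stripped down to plaintext.
  return { kind: 'text', value: target }
}

// Hypothetical slug helper: lowercase, dashes for spaces.
function toSlug(name: string): string {
  return name.toLowerCase().replace(/\s+/g, '-')
}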

Parse public files

After I have collected the set of public files, I iterate through that set and parse each file into structured data. The primary type is FileData, which has the following properties.

export interface FileData {
  fileName: string;
  slug: string;
  firstParagraphText: string;
  frontmatter: Record<string, any>;
  html: string;
  toc: TocItem[];
}
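
Producing one of these per file is roughly the following (a sketch, not metamark's actual pipeline; the plugin chain is minimal, the first-paragraph grab is naive, and toSlug is the helper from the earlier sketch):

import matter from 'gray-matter'
import { unified } from 'unified'
import remarkParse from 'remark-parse'
import remarkRehype from 'remark-rehype'
import rehypeStringify from 'rehype-stringify'

// Second pass: turn one public markdown file into structured data.
async function toFileData(fileName: string, source: string): Promise<FileData> {
  const { data: frontmatter, content } = matter(source)
  const html = String(
    await unified()
      .use(remarkParse)     // markdown -> mdast
      .use(remarkRehype)    // mdast -> hast
      .use(rehypeStringify) // hast -> HTML string
      .process(content)
  )
  return {
    fileName,
    slug: toSlug(fileName),
    firstParagraphText: content.trim().split('\n\n')[0] ?? '',
    frontmatter,
    html,
    toc: [], // ToC extraction omitted from this sketch
  }
}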

Finally, I take my list of generated data and write it to a contents.json file. Nextjs will then read that in and it becomes a data source.

Building the contents with metamark

In my website repo, I have a file at bin/buildContents.mjs with the following code:

#! /usr/bin/env node

import m from 'metamark'

// important! run this script only from the root of this repo
const data = m.obsidian.vault.process('../vault')
const jsonString = m.utility.jsonStringify(data)
m.utility.writeToFileSync('./contents.json', jsonString)

Running this script generates a top-level file, contents.json, which I check in to GitHub. It contains only the information I marked public, so there are no secrets in the file. This means your codebase can still be open source with this workflow, if you'd like.

Serving the content in Nextjs

I will keep this section brief since I covered some of it before in a previous blog post, Nextjs Markdown Blog Setup.

  • I serve all my content - blogs, guides, and notes - through a single /content/[slug] route, which is inspired by the path structure of Wikipedia.
  • I use getStaticPaths to read all the content slugs at compile time.
  • Those slugs are passed to getStaticProps, where I use each slug as a "soft key" to find the rest of the content. I then prune the content down to whatever is needed for that particular page, again at compile time, as sketched below.
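
Roughly, the route looks like this (a sketch of pages/content/[slug].tsx; the prop pruning on my site is more involved):

import contents from '../../contents.json'

// Build every /content/[slug] page at compile time.
export function getStaticPaths() {
  return {
    paths: contents.map((file) => ({ params: { slug: file.slug } })),
    fallback: false, // unknown slugs 404
  }
}

// Use the slug as a "soft key" to find the content, then prune it.
export function getStaticProps({ params }: { params: { slug: string } }) {
  const file = contents.find((f) => f.slug === params.slug)!
  return { props: { title: file.fileName, html: file.html } }
}

export default function ContentPage({ title, html }: { title: string; html: string }) {
  return (
    <article>
      <h1>{title}</h1>
      <div dangerouslySetInnerHTML={{ __html: html }} />
    </article>
  )
}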

The Upsides

My content is decoupled from my codebase.

I was initially writing content within my website repo while also maintaining an Obsidian vault. Keeping them in sync was not feasible: I had WIP content in one, experimental content in the other, and copied content from each into the other. It was unpleasant. Having my content all in one place is honestly relieving. It even feels good to type. Whew!

Notes are developed in Obsidian, the website is developed in the repo

I write all my notes, at any stage of development, in my Obsidian vault. I write all my code, at any stage of development, in my website repo.

I used to manage in-progress content and in-progress code changes all within my website repo. One does not write content the way one writes code, so, in brief, it was ghastly. Dirty working trees were the abysmal norm. But now I can just go to my website repo when I want to change the code on the website. Then, when my code is in a good (enough) spot, I run my data-import script to update content.

I also now don't have to manage works in progress in my repo. Everything can be a work in progress in Obsidian until I'm done with it. I can then simply flip public: false to public: true. Now, the next time I run my import script, it will regenerate contents.json and all will be well in the world.

Independent table of contents, as data

In the past, I have disliked using css-fu or other complicated strategies to manipulate a generated HTML ToC within a page. It would inject an h2 at the top of the content somewhere, and I had to be a css wizard to get it looking the way I wanted across all the different screen sizes. I wanted more control over what I did with my ToC.

Turning the ToC into an in-memory object allows me to manipulate the display of it however I want. It can now be an input prop to an arbitrary component in an arbitrary location in an arbitrary page. This is useful already, but could become even more useful if I want to build out more advanced features (like a client-side search function) in the future.
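
For example, given a ToC item shape along these lines (illustrative, not metamark's exact type), rendering it is just an ordinary component:

interface TocItem {
  title: string // heading text
  depth: number // heading level, e.g. 2 for h2
  id: string    // anchor id within the page
}

// Render the ToC as a plain nav list, anywhere on any page.
function TableOfContents({ toc }: { toc: TocItem[] }) {
  return (
    <nav aria-label="Table of contents">
      <ul>
        {toc.map((item) => (
          <li key={item.id} style={{ marginLeft: (item.depth - 2) * 16 }}>
            <a href={`#${item.id}`}>{item.title}</a>
          </li>
        ))}
      </ul>
    </nav>
  )
}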

Arbitrary access to metadata

I read in the entire frontmatter and turn it into JSON, which I can do anything with. This is open-ended empowerment: for example, I currently use it to display page aliases, and the potential for future features is wide open!
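
Displaying aliases, for instance, is just a matter of reading the parsed frontmatter (aliases is Obsidian's conventional key; the component is a sketch):

// frontmatter is the Record<string, any> from FileData.
function PageAliases({ frontmatter }: { frontmatter: Record<string, any> }) {
  const aliases: string[] = frontmatter.aliases ?? []
  if (aliases.length === 0) return null
  return <p>Also known as: {aliases.join(', ')}</p>
}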

The Downsides

Realtime edits are not viewable in dev mode.

In my old setup, where I edited content within my website repo, a simple refresh showed my content "as the user would experience it". This was a fast feedback loop.

Now, I have to run my import script, restart my dev env, and then reload the page.

My old "editor hat" workflow was reading content on the website to get a sense of the reading experience, and making edits, small or large, from that. I now lack fast-feedback access to that "reader experience". I will have to restructure my "editor hat" workflow to accommodate.

That said, the decoupling mentioned above is unquestionably worth it, in my opinion, even years later.

Wiki links are not in the markdown specification

They can only be parsed by specialized tools, like Obsidian itself. This makes handling them non-trivial.

A minor but still annoying point: my writing style is forced into a slightly odd position. Sometimes, the way you write a sentence in anticipation of a link is different from how you write a "normal" sentence. Also, the way you write content for a website is sometimes slightly different from the way you write content for Personal Knowledge Management. For example, I open this blog post talking about how "my website" sources content from my Obsidian vault. But if you are not me, you are already reading this on "my website". So I'm talking about "my website" like it's not what you are currently reading. I do this because I don't write it on "my website". I could change the opening text to say "This website...", but that's odd to write when I'm writing it in Obsidian.

Other Obsidian features are also not "to spec"

For each current and future Obsidian syntactic feature, the processing pipeline has to handle its incorporation or removal. (It would be interesting if Obsidian were open source so I could see how they do it with Obsidian Publish.) The core problem is that I will be forever chasing down Obsidian syntax and making sure my own pipeline handles it appropriately. I fear my pipeline will balloon in complexity over time. This fear has not been realized in years of using my home-grown solution, but it still hangs over the project.

Probably the most important Obsidian feature I have not solved for is the embed syntax: ![[...]]. It might be as simple as an embedded iframe, but I worry that wouldn't be a particularly pleasant reading experience. It's in my backlog.

Closing notes

I wouldn't be able to do this without unified.js

The most important thing enabling all this functionality is the unified.js ecosystem, particularly remark.

A lot of the work I'm doing builds on top of the remark-wiki-link library, so shout-out to that lib for making my life easier.

I am tremendously grateful to the unifiedjs team for taking the time to chat with me about my approach. While it can be difficult to get started with the ecosystem, it is empowering to manipulate your own markdown abstract syntax trees. I encourage people to check it out.

Obsidian Publish

If you are using Obsidian and would say your use case is rather straightforward: "I just want some of my vault notes made public, I don't care how", Obsidian Publish might be a good option for you. My desired outcome was not as straightforward: I wanted my content to co-exist with other content on my personal website, and this is why I have pursued this custom path.

Conclusion

I am going to continue working on metamark, remark-obsidian-link, and my pipeline. I am hopeful that my work could lead to easy-to-use utilities that could help others, especially since it seems more and more people are adopting "plaintext-forward" knowledge management tools.

Thanks for reading! :D