Introduction
My website, gatlin.io, is sourcing some content from my Obsidian vault. It is an exciting setup, in my opinion, because I can create content in Obsidian and then choose if I want to make the note public or not.
It is by no means perfect, but I think it is conceptually sound, and as such I thought I'd share my technical setup for anyone who finds it interesting or wants to do the same.
Preliminary Points
- My website is a Nextjs website hosted on Vercel.
- On my local machine, my website repo and my Obsidian vault are sibling directories.
- My Obsidian vault has a flat structure; essentially all files are siblings.
TL;DR
In my website repo I have a script that iterates through my vault files. On a first pass, I parse each file's frontmatter for a `public` boolean and compile a public-set. On a second pass, I iterate through the public-set, turning the files into structured data via the unifiedjs ecosystem. I then write that data to a large `contents.json` file in my repo that becomes the "source of truth" for that content.
Obsidian Frontmatter
To begin with, I tag my content with typical yaml frontmatter. Most notably, when a file is ready to be viewed, I mark it `public: true`. Here's an example of some of the more important metadata items.
```yaml
---
public: true
tags: [note]
created: ...
updated: ...
published: ...
---
```
I parse this data with gray-matter. If `public: true` both exists and is true, then I add the file name to a set of public files. This is all I do on the first pass.
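For illustration, that first pass can be sketched with gray-matter directly. The vault path and the `.md` filter here are assumptions from my own setup:

```ts
import { readdirSync, readFileSync } from 'fs'
import matter from 'gray-matter'
import path from 'path'

// First pass: read only the frontmatter, collecting the names of public files.
const publicSet = new Set<string>()

for (const entry of readdirSync('../vault', { withFileTypes: true })) {
  if (!entry.isFile() || !entry.name.endsWith('.md')) continue
  const raw = readFileSync(path.join('../vault', entry.name), 'utf8')
  const { data } = matter(raw) // frontmatter as a plain object
  if (data.public === true) {
    publicSet.add(path.parse(entry.name).name) // page name, no extension
  }
}
```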
The reason for a public-set
Why have a public-set? Why loop through files twice? Because Obsidian is based on wiki links, and as such, any one file may reference an arbitrary number of other files. How to handle these wiki links depends on whether those other files are public.
In short: you need to compile a manifest of public files before you actually start parsing files so you know what to do when you hit an arbitrary wiki link.
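Conceptually, the decision made at each wiki link looks something like this sketch (`toSlug` is a hypothetical stand-in for whatever slug function the pipeline uses):

```ts
type LinkDecision =
  | { kind: 'link'; title: string; href: string }
  | { kind: 'text'; title: string }

// Given the public-set, a wiki link either becomes a real hyperlink
// or degrades to plain text.
function resolveWikiLink(page: string, publicSet: Set<string>): LinkDecision {
  return publicSet.has(page)
    ? { kind: 'link', title: page, href: `/content/${toSlug(page)}` }
    : { kind: 'text', title: page }
}

function toSlug(page: string): string {
  return page.toLowerCase().trim().replace(/\s+/g, '-')
}
```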
Parse public files
After I have collected the set of public files, I iterate through that set and parse each file. There's a lot going on here, so I ended up creating a tool called metamark to encapsulate all the various utilities. (It is not necessarily a robust tool at the moment, but I hope to grow it into a general-purpose tool useful for other people who want to "parse markdown, I don't care how". But, until v1, it will mostly have a bunch of too-strong opinions related to my personal setup.)
Metamark generates a variety of things. The TypeScript interface for a `Mark` payload is shown below.
```ts
export interface Mark {
  page: string;
  slug: string;
  toc: { title: string; depth: number; id: string }[];
  firstParagraphText: string;
  frontmatter: any;
  html: string;
}
```
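To make that shape concrete, a single entry might look roughly like this (the values are invented for illustration):

```ts
// An illustrative entry, typed against the Mark interface from above.
const example: Mark = {
  page: 'My Example Note',
  slug: 'my-example-note',
  toc: [{ title: 'Introduction', depth: 2, id: 'introduction' }],
  firstParagraphText: 'This note is about...',
  frontmatter: { public: true, tags: ['note'] },
  html: '<h2 id="introduction">Introduction</h2><p>This note is about...</p>',
}
```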
Finally, I take my list of generated data and write it to a `contents.json` file. Nextjs will then read that in and it becomes a data source.
Here is a snippet of the build-script I'm currently using to generate my `contents.json` file. (The metamark API might change in the future.)
```js
#! /usr/bin/env node
import { readdirSync, readFileSync, writeFileSync } from 'fs'
import { Metamark } from 'metamark'
import path from 'path'

const dirPath = '../vault'
const dirEntries = readdirSync(dirPath, { withFileTypes: true })

// First pass: parse frontmatter only, collecting public page names
// and the paths of the files to process on the second pass.
const pageAllowSet = new Set()
const filePaths = []

dirEntries.forEach((dirEntry) => {
  if (dirEntry.isFile()) {
    const filePath = path.join(dirPath, dirEntry.name)
    const { name: page } = path.parse(filePath)
    const rawMd = readFileSync(filePath, 'utf8')
    const frontmatter = Metamark.getFrontmatter(rawMd)
    if (frontmatter?.public) {
      pageAllowSet.add(page)
      filePaths.push(filePath)
    }
  }
})

// Second pass: turn each public file into structured data, using the
// allow-set to decide how wiki links are handled.
const marks = Metamark.getMarks(filePaths, pageAllowSet)
const jsonContents = JSON.stringify(marks, null, 2)
writeFileSync('./contents.json', jsonContents)
```
Serving the content in Nextjs
I will keep this section brief since I covered some of it before in a previous blog post, Nextjs Markdown Blog Setup.
- I serve all my content - blogs, guides, and notes - through a single `/content/[slug]` route, which is inspired by the path structure of Wikipedia.
- I use `getStaticPaths` to read all the content slugs at compile time.
- Those slugs get passed via `getStaticProps`. I use the slug as a "soft key" to find the rest of the content. I then prune the content down into whatever is needed for that particular page. This is also done at compile time (see the sketch after this list).
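Putting those together, here is a rough sketch of the route file, assuming `contents.json` holds an array of `Mark` objects (this is illustrative, not my exact code):

```tsx
// pages/content/[slug].tsx — a sketch, not my exact implementation
import { GetStaticPaths, GetStaticProps } from 'next'
import contents from '../../contents.json'

export const getStaticPaths: GetStaticPaths = async () => ({
  paths: contents.map((mark) => ({ params: { slug: mark.slug } })),
  fallback: false,
})

export const getStaticProps: GetStaticProps = async ({ params }) => {
  // The slug is the "soft key": find the matching mark, then prune it
  // down to what this particular page actually needs.
  const mark = contents.find((m) => m.slug === params?.slug)
  return { props: { html: mark?.html ?? '', toc: mark?.toc ?? [] } }
}

export default function ContentPage({ html }: { html: string }) {
  return <article dangerouslySetInnerHTML={{ __html: html }} />
}
```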
The Upsides
- My content is decoupled from my codebase.
I was initially writing content within my website repo. I also had an Obsidian vault. Keeping them in sync was not feasible: I had WIP content in one, experimental content in the other, and stale copies scattered between the two. It was unpleasant. Having my content all in one place is honestly relieving. It even feels good to type. Whew!
- I write all my notes, at any stage of development, in my Obsidian vault. I write all my code, at any stage of development, in my website repo.
I used to manage in-progress content and in-progress code changes all within my website repo. One does not write content the way one writes code. So, in brief, it was ghastly. Dirty working trees were the dismal norm. But, now, I can just go to my website repo when I want to change the code on the website. Then, when my code is in a good (enough) spot, I can just run my data-import script when I want to update content.
I don't have to manage works in progress. Everything can be a work in progress until I'm done with it. I can then simply flip `public: false` to `public: true`. Now, the next time I run my import script, it will regenerate `contents.json` and all will be well in the world.
- Independent table of contents, as data
In the past I have disliked using css-fu or other complicated strategies to manipulate a ToC within a page. It was hard to inject a ToC correctly, and it was hard to control where and how the ToC displayed within the page.
In short: I wanted more control over what I did with my ToC. Turning it into an in-memory object allows me to manipulate the display of it however I want.
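For example, with the ToC as plain data, rendering it is just a small component. A sketch (the markup and indentation scheme here are arbitrary choices of mine, which is the whole point):

```tsx
interface TocItem { title: string; depth: number; id: string }

// Render the ToC wherever it belongs on the page, indenting by heading depth.
function Toc({ toc }: { toc: TocItem[] }) {
  return (
    <nav>
      {toc.map((item) => (
        <div key={item.id} style={{ paddingLeft: `${item.depth - 1}rem` }}>
          <a href={`#${item.id}`}>{item.title}</a>
        </div>
      ))}
    </nav>
  )
}
```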
- Arbitrary access to metadata
I read in the entire frontmatter and turn it into JSON, which I can do anything with. In the short term I'd like to display the aliases for pages in a clever way. In the long term... well, I don't know, but it feels powerful to have potential!
The Downsides
- Realtime edits are not viewable in dev.
In my old setup, where I edited content within my website repo, a simple refresh showed my content "as the user would experience it". This was a fast feedback loop.
Now, I have to run my import script, restart my dev env, and then reload the page.
My old "editor hat" workflow was reading it on the website to get a sense of the reading experience, and from that making edits, small or large. I now lack fast-feedback access to that "reader experience". I will have to restructure my "editor hat" workflow to accomodate.
That said, the decoupling mentioned above is unquestionably worth it, in my opinion. It might just be a new problem to solve, perhaps via a watcher that reruns on vault changes.
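If I do build that watcher, it might be as simple as this sketch (I am not running this yet; chokidar and the script path are assumptions):

```ts
import { execSync } from 'child_process'
import { watch } from 'chokidar'

// Hypothetical: rerun the import script whenever a vault file changes.
watch('../vault', { ignoreInitial: true }).on('all', () => {
  execSync('node ./scripts/import-content.mjs', { stdio: 'inherit' })
})
```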
- Wiki links are not "to spec"
Wiki links are not part of the markdown specification; they can only be parsed by specialized tools, like Obsidian itself. This makes handling them non-trivial.
For example, how to handle non-public wiki links is an odd issue. I currently remove non-public links. But, that affects the reading experience. The way you write a sentence in anticipation of a wiki link is different than how you write a "normal" sentence. The way you write content for a website is different than the way you write content for a Personal Knowledge and Information Management Tool.
For example, the way this very blog post opens doesn't acknowledge the truth that if anyone besides me is reading this, they are doing it from my website. But, still, I talk about my website like neither my reader nor myself are currently on it! I could change the perspective, but I'm not sure I want to because of the effect it would have on my Obsidian writing experience. Hence the dilemma.
Another example is two paragraphs up, where I have weirdly capitalized the phrase Personal Knowledge and Information Management Tool. Why? Because it's a wiki link that got removed! The upside is that if I ever make that link public it will magically become a link. The downside is the odd grammatical errors. I could probably fix this one with a string manipulation improvement in my pipeline though...
One last point: I manipulate wiki links via metamark, which takes in a function to do so. Metamark itself passes that function to a remark lib I wrote with a similar API, remark-obsidian-link. I'm not in love with the API, but I can't think of a cleaner approach at the moment.
- Other Obsidian features are also not "to spec"
For each current and future Obsidian syntactic feature, the processing pipeline would have to handle its incorporation or removal. (It would be interesting if Obsidian were open source so I could see how they do it with Obsidian Publish.) The core problem is that I will be forever chasing down Obsidian syntax and making sure my own pipeline can handle it appropriately. I fear my pipeline will balloon in complexity over time.
Obsidian embed syntax, `![[...]]`, is one of the more important features to incorporate. It should be as simple as an embedded iframe, but that isn't per se a good reading experience, so I choose to only address links at this time.
Unified.js is the bee's knees
The most important thing enabling all this functionality is the unified.js ecosystem, particularly remark.
A lot of the work I'm doing is building on top of the remark-wiki-link library, so shout-out to that lib for making my life easier.
I am tremendously grateful to the unifiedjs team for taking the time to chat with me about my approach. While it can be difficult to get started with the ecosystem, it is empowering to manipulate your own markdown abstract syntax trees. I encourage people to check it out.
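To give a flavor of the ecosystem, the core of a markdown-to-HTML pipeline in unified is only a few lines. A minimal sketch, not metamark's actual plugin list:

```ts
import { unified } from 'unified'
import remarkParse from 'remark-parse'
import remarkRehype from 'remark-rehype'
import rehypeStringify from 'rehype-stringify'

// Parse markdown to an mdast tree, convert it to hast, then serialize to HTML.
const file = await unified()
  .use(remarkParse)
  .use(remarkRehype)
  .use(rehypeStringify)
  .process('# Hello, *vault*!')

console.log(String(file)) // <h1>Hello, <em>vault</em>!</h1>
```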
Conclusion
I am going to continue working on metamark, remark-obsidian-link, and my pipeline. I like, but don't love, my current solution; it is good enough for now. I am hopeful that my work could lead to easy-to-use utilities that help others, especially since it seems more and more people are adopting "plaintext-forward" knowledge management tools.
Thanks for reading! :D
PS: Obsidian Publish seems nice
If you are using Obsidian and would say your use case is rather straightforward: "I just want some of my vault notes made public, I don't care how", Obsidian Publish might be a good option for you.