I love Notion. I envy their illustrations more than I care to admit… but that’s beside the point. I think it’s a great tool which offers an incredible amount of flexibility to tackle a vast range of tasks. Add some collaboration in there with a sprinkle of organisation and you’ve got something special. I thought: great, let’s collaborate on some boring document work, then we can just export it to Markdown, upload it and revel in our mastery of the internets. However… we hit a bit of a snag.
Take a humble Notion export and look at it twice and you’ll probably notice some things you didn’t expect. One assumption I made was that a table would just become a simple Markdown table… but no, that’s not the case. Instead, what you’re left with is a link to a .csv file. I mean, call me old-fashioned, but I put my table on the page for a reason; I don’t want to be clicking off to another page with a tiny table front and centre. So I thought, I know things, this can probably be fixed. Then along came Jason Lengstorf’s How to Modify Nodes in an Abstract Syntax Tree and my introduction to the AST.
For our purposes, Markdown Abstract Syntax Trees (AST) and Hypertext Abstract Syntax Trees (Hast) are pretty much the same. The main difference is that we identify nodes by their “type” in the AST (paragraph, link and so on) and elements by their “tagName” in Hast (p, a and so on). That’s it, easy right? As an overall concept, ASTs are what we use here for turning Markdown content into HTML markup, but that’s not all… we can take our tree, analyse it and transform or manipulate any part of it we like. I want to give you a more visual understanding, so I’ll throw together a quick example.
Take a bit of markdown like so:
And from that our AST would look like the below:
As a Hast this would look like:
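To make the three views concrete, here is a small sketch, assuming a made-up one-line Markdown sample (a paragraph containing a link to a hypothetical data.csv). Both trees are written out by hand as plain JavaScript objects rather than generated by a parser, and position information is left out to keep things short.

```javascript
// Hypothetical Markdown sample: a paragraph holding a single link.
const markdown = '[Data](data.csv)';

// Hand-written Markdown AST for that sample.
// Nodes are identified by their `type`.
const mdast = {
  type: 'root',
  children: [
    {
      type: 'paragraph',
      children: [
        {
          type: 'link',
          url: 'data.csv',
          title: null,
          children: [{ type: 'text', value: 'Data' }],
        },
      ],
    },
  ],
};

// Hand-written Hast for the same content.
// Elements all share the type "element" and are identified by `tagName`.
const hast = {
  type: 'root',
  children: [
    {
      type: 'element',
      tagName: 'p',
      properties: {},
      children: [
        {
          type: 'element',
          tagName: 'a',
          properties: { href: 'data.csv' },
          children: [{ type: 'text', value: 'Data' }],
        },
      ],
    },
  ],
};

console.log(mdast.children[0].children[0].type); // link
console.log(hast.children[0].children[0].tagName); // a
```

The shape is the same in both trees; only the way a node says what it is changes.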
Note: We will be using Hast from here on out but have a play and see what you prefer. I couldn’t find one for Hast but check out the AST explorer, throw in some Markdown content and familiarise yourself with the AST equivalent.
Let’s keep our setup super simple and create a package.json using
We will also create a script.js as our main script file and a plugins directory where we can add our plugins.
So you should have the below:
This simple setup gives us a good starting point to construct our script.
So what is Unified? Unified is a project that will do a tremendous amount of heavy lifting for us. Through the power of open source, they have created an easy-to-use interface for interacting with and manipulating syntax trees. It sits at the centre of Rehype (HTML), Remark (Markdown) and Retext (Natural Language… whatever that is), and it’s this project that allows MDX to add JSX to Markdown files, which I didn’t know but think is pretty cool.
In terms of a mental model, think of unified as the starting block in your Lego construction: each piece of functionality can be attached to that block, but you need that block for everything to work. It’s the oven that brings all of the ingredients together. For us, we will be attaching Remark (for our Markdown), Rehype (for our HTML) and our custom plugin to convert CSV links to simple tables.
Below is the basic structure of what we need and then we can start to flesh it out from there.
Here we read the Markdown and pass it into the unified ecosystem. Then we add our HTML and Markdown plugins: remark-parse to parse the Markdown into a tree, remark-rehype to turn that Markdown tree into an HTML tree and rehype-stringify to generate the HTML markup we eventually output.
So now we have added the basics we can start to add our plugin. First, we need to create a link-to-table.js file in our plugins directory and then export a module
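A minimal sketch of what that module could look like, assuming the usual unified plugin shape: a function that returns a transformer, which unified then calls with the tree. The name linkToTable and the log message are my own choices, and the export is left as a comment so the snippet can be run on its own.

```javascript
// Sketch of plugins/link-to-table.js.
// A unified plugin is a function that returns a transformer;
// unified calls the transformer with the syntax tree.
function linkToTable() {
  return function transformer(tree) {
    // Nothing is changed yet; we only show that we received the tree.
    console.log(`Received a ${tree.type} node with ${tree.children.length} children`);
  };
}

// In the real file you would export the plugin, for example:
// module.exports = linkToTable;

// Calling it by hand with a tiny hand-written tree, which is roughly
// what unified does for us once the plugin is added with `.use()`.
const tree = { type: 'root', children: [] };
const transformer = linkToTable();
transformer(tree);
```

Returning nothing from the transformer is fine; unified carries on with the same tree, including any changes we made to it in place.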
Notice how the tree is passed in as an argument that we can use in our plugin? Next, we need to import our plugin and add it to our main script file
So, we have added our custom plugin to our unified workflow; the problem is it doesn’t exactly do anything at the moment. So let’s fix that.
Jumping back to our custom plugin, we now need to traverse the tree and grab all of the a tags that link to a CSV file. To do this we need to use the unist-util-visit-parents package, which is a unist utility to find nodes.
Here we are almost running a test on each node. First, we are checking if it’s an a tag and secondly whether it contains .csv in its href string. If it satisfies our requirements it gets passed to the callback, otherwise, it's ignored. And just like that we have all of the nodes we need to convert to tables.
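If you want to see the idea without installing anything, here is a simplified sketch of the test and the traversal. Note that visitWithParents is a hand-rolled stand-in I made up for this example, not the unist utility itself, and isCsvLink and the sample tree are hypothetical too.

```javascript
// Test: is this node an <a> element whose href mentions ".csv"?
function isCsvLink(node) {
  return (
    node.type === 'element' &&
    node.tagName === 'a' &&
    typeof node.properties?.href === 'string' &&
    node.properties.href.includes('.csv')
  );
}

// Hypothetical, simplified stand-in for the unist utility: walk the tree
// depth-first and call `visitor` for every node that passes `test`.
function visitWithParents(node, test, visitor, ancestors = []) {
  if (test(node)) visitor(node, ancestors);
  for (const child of node.children || []) {
    visitWithParents(child, test, visitor, [...ancestors, node]);
  }
}

// Hand-written Hast with one CSV link and one ordinary link.
const tree = {
  type: 'root',
  children: [
    {
      type: 'element',
      tagName: 'p',
      properties: {},
      children: [
        {
          type: 'element',
          tagName: 'a',
          properties: { href: 'data.csv' },
          children: [{ type: 'text', value: 'Data' }],
        },
        {
          type: 'element',
          tagName: 'a',
          properties: { href: 'https://example.com' },
          children: [{ type: 'text', value: 'Example' }],
        },
      ],
    },
  ],
};

const csvLinks = [];
visitWithParents(tree, isCsvLink, (node) => csvLinks.push(node));
console.log(csvLinks.map((node) => node.properties.href)); // [ 'data.csv' ]
```

Only the node that passes the test reaches the callback; the ordinary link is skipped.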
Now we need to read the data from the CSV before we can create the table. We do this by using a CSV parser package from npm and Node’s very own fs module
We are reading the CSV data using readFileSync, parsing that data into an array and then destructuring the array and assigning tableHeaders and tableRows. Destructuring at this point makes it a little easier for us later on.
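Here is a rough, dependency-free sketch of that step. To keep it runnable on its own I write a small sample file to a temp folder first, and I swap the npm parser for a naive parseCsv helper of my own that assumes there are no quoted fields or commas inside values. The file contents are made up.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Write a small sample CSV to a temp folder so the example is self-contained.
const csvDir = fs.mkdtempSync(path.join(os.tmpdir(), 'csv-'));
const csvPath = path.join(csvDir, 'data.csv');
fs.writeFileSync(csvPath, 'Name,Role\nAda,Engineer\nGrace,Admiral\n');

// Naive parser (assumption: no quoted fields, no commas inside values).
// A proper CSV parser package handles those cases for you.
function parseCsv(text) {
  return text
    .split(/\r?\n/)
    .filter((line) => line.trim() !== '')
    .map((line) => line.split(','));
}

const data = parseCsv(fs.readFileSync(csvPath, 'utf8'));

// The first row becomes the headers, everything else the body rows.
const [tableHeaders, ...tableRows] = data;

console.log(tableHeaders); // [ 'Name', 'Role' ]
console.log(tableRows); // [ [ 'Ada', 'Engineer' ], [ 'Grace', 'Admiral' ] ]
```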
Since we now have the CSV data nicely formatted and assigned to variables, we can map over those and create our table markup. Using template literals here makes it super easy for us to create the markup; imagine having to concatenate a load of strings (no thank you).
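As a sketch of that mapping, assuming the cell values are trusted and need no HTML escaping (buildTable and the sample data are names and values I picked for the example):

```javascript
// Build an HTML table string from headers and rows using template literals.
// Assumption: cell values are trusted and need no HTML escaping.
function buildTable(tableHeaders, tableRows) {
  const headerCells = tableHeaders.map((header) => `<th>${header}</th>`).join('');
  const bodyRows = tableRows
    .map((row) => `<tr>${row.map((cell) => `<td>${cell}</td>`).join('')}</tr>`)
    .join('');

  return `<table><thead><tr>${headerCells}</tr></thead><tbody>${bodyRows}</tbody></table>`;
}

const table = buildTable(['Name', 'Role'], [['Ada', 'Engineer']]);
console.log(table);
// <table><thead><tr><th>Name</th><th>Role</th></tr></thead><tbody><tr><td>Ada</td><td>Engineer</td></tr></tbody></table>
```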
So now we have our table markup, but we need to replace the a tag node with our table. We can’t do this yet because nodes need to be replaced with nodes, and right now our table is a string of HTML content. So let’s turn our table string into a node.
We are taking advantage of the parse5 and hast-util-from-parse5 packages here to create our node: parse5 to parse our HTML string and hast-util-from-parse5 to turn the HTML structure into a hast node.
Finally, we can directly overwrite the link node’s own fields with those of the table node, which replaces the link with our newly created table in place.
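A small sketch of that in-place swap. The table node here is written by hand as a stand-in for the one we generated from the HTML string, and the tree is a made-up example.

```javascript
// The link node we found earlier (hand-written here for the example).
const linkNode = {
  type: 'element',
  tagName: 'a',
  properties: { href: 'data.csv' },
  children: [{ type: 'text', value: 'Data' }],
};

const tree = {
  type: 'root',
  children: [{ type: 'element', tagName: 'p', properties: {}, children: [linkNode] }],
};

// Hand-written stand-in for the table node created from the HTML string.
const tableNode = {
  type: 'element',
  tagName: 'table',
  properties: {},
  children: [
    {
      type: 'element',
      tagName: 'tr',
      properties: {},
      children: [
        { type: 'element', tagName: 'td', properties: {}, children: [{ type: 'text', value: 'Ada' }] },
      ],
    },
  ],
};

// Overwrite the link's fields in place. The parent still points at the
// same object, so the tree now holds a table where the link used to be.
linkNode.tagName = tableNode.tagName;
linkNode.properties = tableNode.properties;
linkNode.children = tableNode.children;

console.log(tree.children[0].children[0].tagName); // table
```

Because we never touch the parent’s children array, there is no need to track the node’s index or rebuild any part of the tree.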
As a general rule you wouldn’t normally mutate data in place like this, but in the context of ASTs it has become pretty much the standard, especially when some ASTs can be pretty huge!
Now if we run the script and check the generated HTML, we should see that our table has been rendered as expected.
We have now achieved the status of ultimate AST tree wrangler.
And that’s all folks! You should now have a somewhat solid understanding of ASTs and how we can use them to manipulate Markdown. The possibilities with this technique are pretty much endless, and it lets us focus on creating simple, readable content in Markdown and add the flair and finesse later on in the build process. I do hope this has helped in some shape or form. Of course, if you spot something in this article that is untrue and in need of amending, let me know. As always, thanks for taking the time to read.
As always here are a few links I found useful whilst writing this post: