While I was working on the ToC widget, I needed a way to explore a page’s markdown parse tree as returned by markdown.parseMarkdown(text)
. At the time I printed everything to the console and often got lost in nested squiggly brackets or confused by newlines printing literal newlines in the console.
Anyway, I cleaned up that code and shoved it in a github gist. Here’s the space-lua in case I mess with the gist permissions in the future:
util = util or {}
function printTree(tree, level)
-- recursive
local indentation = string.rep(" ", level * 2)
local response = "\n" .. indentation
local typesToSkipChildren = {
"EmphasisMark",
"CodeMark",
"LuaDirectiveMark",
"TaskMark",
"WikiLinkMark",
}
if tree.type then
response = response .. tree.type .. " (" .. tree.from .. " to " .. tree.to .. ")"
if not table.includes(typesToSkipChildren, tree.type) then
response = response .. ": "
end
else
response = response .. "Leaf: "
end
if tree.text then
local text = string.gsub(tree.text, "\n", "<newline>")
response = response .. text
end
if tree.children and not table.includes(typesToSkipChildren, tree.type) then
for child in tree.children do
response = response .. printTree(child, level + 1)
end
end
return response
end
util.prettyPrintMarkdown = function(page)
parsedPage = markdown.parseMarkdown(space.readPage(page))
return printTree(parsedPage)
end
It takes a page name as input and returns a pretty-enough text string to dump in a lua expression. Here’s a screenshot of ${util.prettyPrintMarkdown('test')}
:
I called the ends of the tree branches ‘Leaf’. I stopped recursion on a few of the tag types that end in “Mark” because they were just characters I needed to sanitize anyway. I substituted \n
with <newline>
. There still seem to be some special characters sneaking through to the output, but it’s good enough for me at this point.
Long code blocks don’t look good because they’re shoved into CodeText
.