Print a readable markdown ParseTree for a page

wbh · April 29, 2025, 6:50pm

While I was working on the ToC widget, I needed a way to explore a page’s markdown parse tree as returned by markdown.parseMarkdown(text). At the time I printed everything to the console and often got lost in nested curly braces, or got confused when newline characters in the text showed up as actual line breaks in the console.

Anyway, I cleaned up that code and put it in a GitHub gist. Here’s the space-lua in case I mess with the gist permissions in the future:

util = util or {}

function printTree(tree, level)
  -- recursive; level defaults to 0 so the top-level call can omit it
  level = level or 0
  local indentation = string.rep(" ", level * 2)
  local response = "\n" .. indentation
  local typesToSkipChildren = {
    "EmphasisMark",
    "CodeMark",
    "LuaDirectiveMark",
    "TaskMark",
    "WikiLinkMark",
  }
  
  if tree.type then
    response = response .. tree.type .. " (" .. tree.from ..  " to " .. tree.to .. ")"
    if not table.includes(typesToSkipChildren, tree.type) then
      response = response .. ": "
    end
  else
    response = response .. "Leaf: "
  end
  
  if tree.text then
    local text = string.gsub(tree.text, "\n", "<newline>")
    response = response ..  text
  end
  
  if tree.children and not table.includes(typesToSkipChildren, tree.type) then
    for child in tree.children do
      response = response .. printTree(child, level + 1)
    end
  end
  
  return response
end

util.prettyPrintMarkdown = function(page)
  -- keep the parsed tree local, and start the recursion at indent level 0
  local parsedPage = markdown.parseMarkdown(space.readPage(page))
  return printTree(parsedPage, 0)
end

It takes a page name as input and returns a pretty-enough text string that you can dump into a Lua expression. Here’s a screenshot of ${util.prettyPrintMarkdown('test')}:

I called the ends of the tree branches ‘Leaf’. I stopped recursion on a few of the tag types that end in “Mark” because they were just characters I needed to sanitize anyway. I substituted \n with <newline>. There still seem to be some special characters sneaking through to the output, but it’s good enough for me at this point.
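To make the output format concrete without a screenshot, here is a minimal plain-Lua sketch of the same logic run against a hand-built tree. It is not the gist itself. The `includes` helper is a stand-in for space-lua's table.includes, the loop uses ipairs so it runs in a stock Lua interpreter, and the sample tree (node names and offsets) is made up for illustration rather than taken from a real parse.

```lua
-- Stand-in for space-lua's table.includes (assumed to behave the same way).
local function includes(list, value)
  for _, item in ipairs(list) do
    if item == value then return true end
  end
  return false
end

local typesToSkipChildren = { "EmphasisMark", "CodeMark" }

local function printTree(tree, level)
  level = level or 0
  local response = "\n" .. string.rep(" ", level * 2)
  local skip = includes(typesToSkipChildren, tree.type)
  if tree.type then
    response = response .. tree.type .. " (" .. tree.from .. " to " .. tree.to .. ")"
    if not skip then response = response .. ": " end
  else
    response = response .. "Leaf: "
  end
  if tree.text then
    -- Parentheses keep only gsub's first return value (the new string).
    response = response .. (string.gsub(tree.text, "\n", "<newline>"))
  end
  if tree.children and not skip then
    for _, child in ipairs(tree.children) do
      response = response .. printTree(child, level + 1)
    end
  end
  return response
end

-- Hypothetical tree for the text "*hi*\n"; names and offsets are illustrative.
local sample = {
  type = "Document", from = 0, to = 5,
  children = {
    { type = "Emphasis", from = 0, to = 4, children = {
      { type = "EmphasisMark", from = 0, to = 1, children = { { text = "*" } } },
      { text = "hi" },
      { type = "EmphasisMark", from = 3, to = 4, children = { { text = "*" } } },
    } },
    { text = "\n" },
  },
}

local output = printTree(sample)
print(output)
```

After the leading blank line, this prints Document (0 to 5): at the top, Emphasis (0 to 4): indented by two spaces, then EmphasisMark (0 to 1), Leaf: hi and EmphasisMark (3 to 4) indented by four spaces, and finally Leaf: <newline> back at two spaces. The two EmphasisMark lines have no trailing colon and no children because recursion stops there.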

Long code blocks don’t look good because they’re shoved into CodeText.
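One possible way to tame that, which I haven't wired into the gist, would be to cap the text printed per node. This is only a sketch: truncateText is a hypothetical helper, the default limit of 60 is arbitrary, and it counts bytes rather than characters, so it could split a multi-byte UTF-8 character at the cut.

```lua
-- Hypothetical helper: cap a node's text at maxLength bytes and mark the cut.
local function truncateText(text, maxLength)
  maxLength = maxLength or 60
  if #text <= maxLength then
    return text
  end
  local hidden = #text - maxLength
  return string.sub(text, 1, maxLength) .. "... (" .. hidden .. " more bytes)"
end

-- Short text passes through unchanged; long text is cut and annotated.
local short = truncateText("local x = 1", 60)
local long = truncateText(string.rep("a", 100), 10)
print(short)
print(long)
```

In printTree, this would presumably be applied to the text after the <newline> substitution, so a long CodeText stays on a single line of bounded length.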

zef · April 29, 2025, 8:50pm

Nice! There is also the Debug: Parse Document command which prints the AST to your browser’s JS console, but what you did here is nicer.

wbh · April 29, 2025, 10:53pm

Oh that’s much better than what I was doing. I should have known there was a command!

zef · April 30, 2025, 5:51am

Well. In your defense: it’s not documented anywhere other than in my head (I needed it)