Print a readable markdown ParseTree for a page

While I was working on the ToC widget, I needed a way to explore a page’s markdown parse tree as returned by markdown.parseMarkdown(text). At the time I printed everything to the console and often got lost in nested squiggly brackets or confused by newlines printing literal newlines in the console.

Anyway, I cleaned up that code and shoved it in a github gist. Here’s the space-lua in case I mess with the gist permissions in the future:

util = util or {}

function printTree(tree, level)
  -- recursive
  local indentation = string.rep(" ", level * 2)
  local response = "\n" .. indentation
  local typesToSkipChildren = {
    "EmphasisMark",
    "CodeMark",
    "LuaDirectiveMark",
    "TaskMark",
    "WikiLinkMark",
  }
  
  if tree.type then
    response = response .. tree.type .. " (" .. tree.from ..  " to " .. tree.to .. ")"
    if not table.includes(typesToSkipChildren, tree.type) then
      response = response .. ": "
    end
  else
    response = response .. "Leaf: "
  end
  
  if tree.text then
    local text = string.gsub(tree.text, "\n", "<newline>")
    response = response ..  text
  end
  
  if tree.children and not table.includes(typesToSkipChildren, tree.type) then
    for child in tree.children do
      response = response .. printTree(child, level + 1)
    end
  end
  
  return response
end

util.prettyPrintMarkdown = function(page)
  parsedPage = markdown.parseMarkdown(space.readPage(page))
  return printTree(parsedPage)
end

It takes a page name as input and returns a pretty-enough text string to dump in a lua expression. Here’s a screenshot of ${util.prettyPrintMarkdown('test')}:

I called the ends of the tree branches ‘Leaf’. I stopped recursion on a few of the tag types that end in “Mark” because they were just characters I needed to sanitize anyway. I substituted \n with <newline>. There still seem to be some special characters sneaking through to the output, but it’s good enough for me at this point.

Long code blocks don’t look good because they’re shoved into CodeText.

1 Like

Nice! There is also the Debug: Parse Document command which prints the AST to your browser’s JS console, but what you did here is nicer.

1 Like

Oh that’s much better than what I was doing. I should have known there was a command!

Well. In your defense: it’s not documented anywhere other than in my head (I needed it)