Logseq to SilverBullet conversion script

I'm moving my notes over from Logseq. Both use Markdown, but there are some formatting quirks to take care of. The job got done, and I'm sharing the Python conversion script here in case someone else is looking to mass-convert their Logseq notes to SilverBullet. The script takes care of the following:

  1. Convert all tabs to 2 spaces
  2. Strip :LOGBOOK: blocks: Removes :LOGBOOK: ... :END: and everything between
  3. Convert #+BEGIN_LABEL/#+END_LABEL admonitions to > **Label** blockquotes, always at column 0, with outline markers stripped from the content. SilverBullet does not support indented admonitions, so they must be unindented, which also gives the cleanest result.
  4. Strip {{embed ...}}: Removes the entire outline item. I don't use these enough to care about them so you may have to check yours
  5. Convert {{renderer ...}}: Replaces with just the URL found inside, preserving the outline item
  6. Strip HTML tags: Removes lines containing HTML (Logseq link previews, CSS artifacts)
  7. Strip Logseq properties: Removes lines like id:: ...
  8. TODO/DONE/DOING/LATER → checkboxes: DONE becomes [x], TODO/DOING/LATER become [ ]
  9. Strip - from top-level headings: - # Title becomes # Title (indented - # preserved)
  10. Fix double dashes: - - text becomes - text
  11. Dedent orphaned children after admonitions: Children that follow an unindented admonition are capped at 2 levels (2 spaces of indent). Deeper levels right after an admonition break the structure in SilverBullet. This is the only step that might somewhat change the structure of your notes, but until outlined admonitions are fixed, it has to be done
  12. Strip empty outline items: Removes lines that contain only a bare dash ("-" or "- " with nothing after it)
  13. Clean up: Collapses runs of blank lines into a single blank line, strips trailing whitespace, removes leading/trailing blank lines
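
To make two of the rules above concrete (2 and 8), here is a minimal standalone sketch of what happens to a small outline. The helper names strip_logbook and keyword_to_checkbox are made up for this illustration; the full script further down has its own, slightly more permissive versions.

```python
import re


def strip_logbook(text: str) -> str:
    """Rule 2: drop :LOGBOOK: ... :END: and everything between."""
    return re.sub(
        r'^[ \t]*:LOGBOOK:\n(?:.*\n)*?[ \t]*:END:\n?',
        '',
        text,
        flags=re.MULTILINE,
    )


def keyword_to_checkbox(line: str) -> str:
    """Rule 8: DONE becomes [x]; TODO/DOING/LATER become [ ]."""
    m = re.match(r'^(\s*- )(DONE|TODO|DOING|LATER) +(.*)$', line)
    if not m:
        return line
    box = '[x]' if m.group(2) == 'DONE' else '[ ]'
    return f"{m.group(1)}{box} {m.group(3)}"


sample = (
    "- TODO buy milk\n"
    "  :LOGBOOK:\n"
    "  CLOCK: [2024-01-01 Mon 10:00]\n"
    "  :END:\n"
    "  - DONE call Bob\n"
)

cleaned = strip_logbook(sample)
converted = "\n".join(keyword_to_checkbox(l) for l in cleaned.splitlines())
print(converted)
# - [ ] buy milk
#   - [x] call Bob
```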

Note that the script below was vibe-coded while iterating over my personal notes. It seems to do the job adequately, as long as you don't use something I didn't consider. Check your outputs!
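
One possible way to check the outputs is to look at a diff of each note before and after conversion. The sketch below uses Python's standard difflib for that; show_changes and the logseq/ and silverbullet/ labels are hypothetical names picked for the example, not part of the script.

```python
import difflib


def show_changes(before: str, after: str, name: str = "note.md") -> str:
    """Return a unified diff between the original and the converted text."""
    diff = difflib.unified_diff(
        before.splitlines(keepends=True),
        after.splitlines(keepends=True),
        fromfile=f"logseq/{name}",        # label only, no file is read
        tofile=f"silverbullet/{name}",    # label only, no file is written
    )
    return "".join(diff)


# Sample data made up for the illustration
before = "- TODO buy milk\n  id:: 1234\n"
after = "- [ ] buy milk\n"
report = show_changes(before, after)
print(report)
```

Lines starting with "-" were removed or changed by the conversion and lines starting with "+" are what replaced them, so anything unexpected stands out quickly.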

Logseq got me into organized notes for managing my projects and stuff, but the recent developments around it don't look good (abandoned Markdown, zero communication, general direction, ...). Not only that, it is extremely slow and lacks self-hosting.

#!/usr/bin/env python3
"""Convert a Logseq markdown note to a Silverbullet markdown note."""

import argparse
import re


def convert_tabs(text: str) -> str:
    """Convert all tabs into two spaces each."""
    return text.replace("\t", "  ")


def convert_logbook_blocks(text: str) -> str:
    """Strip :LOGBOOK: ... :END: blocks and everything between them."""
    # Handle optional leading whitespace on the markers
    return re.sub(r'^[ \t]*:LOGBOOK:\n(?:.*\n)*?[ \t]*:END:\n?', '', text, flags=re.MULTILINE)


def convert_admonitions(text: str) -> str:
    """Convert #+BEGIN_XYZ ... #+END_XYZ blocks to markdown admonitions.

    Handles two forms:
    1) Outline items: "- #+BEGIN_TYPE" ... "  #+END_TYPE" (no dash on END)
    2) Inline blocks:  "  #+BEGIN_TYPE" ... "  #+END_TYPE"
    """
    labels = {
        'WARNING': 'Warning',
        'TIP': 'Tip',
        'IMPORTANT': 'Important',
        'NOTE': 'Note',
    }

    def _build_admonition(block_type, inner_text, leading_indent):
        label = labels.get(block_type.upper(), block_type.upper())

        content_lines = inner_text.strip().splitlines()
        admonition_lines = [f"> **{label}** *{label}*"]
        for cl in content_lines:
            admonition_lines.append(f"> {cl.strip()}")

        # Admonitions are always completely unindented (column 0)
        return "\n".join(admonition_lines) + "\n"

    def _strip_outline_from_lines(raw):
        lines = []
        for line in raw.splitlines():
            stripped = re.sub(r'^[ \t]*- ', '', line)
            stripped = stripped.strip()
            if stripped:
                lines.append(stripped)
        return "\n".join(lines)

    # Pattern 1: Outline-style with "- #+BEGIN_TYPE" and "  #+END_TYPE"
    # The END marker does NOT have a "- " prefix in logseq format
    pattern1 = re.compile(
        r'^([ ]*)- #\+BEGIN_(\w+)\n'
        r'(.*?)'
        r'^([ ]*)#\+END_\w+\n?',
        re.MULTILINE | re.DOTALL,
    )

    def _replace1(m):
        indent = m.group(1)
        block_type = m.group(2)
        content = m.group(3)
        inner = _strip_outline_from_lines(content)
        return _build_admonition(block_type, inner, indent)

    text = pattern1.sub(_replace1, text)

    # Pattern 2: Inline admonitions (no "- " prefix on either marker)
    pattern2 = re.compile(
        r'^([ ]*)#\+BEGIN_(\w+)\n'
        r'(.*?)'
        r'^([ ]*)#\+END_\w+\n?',
        re.MULTILINE | re.DOTALL,
    )

    def _replace2(m):
        indent = m.group(1)
        block_type = m.group(2)
        content = m.group(3)
        inner = _strip_outline_from_lines(content)
        return _build_admonition(block_type, inner, indent)

    text = pattern2.sub(_replace2, text)

    return text


def convert_embeds(text: str) -> str:
    """Strip {{embed ...}} blocks entirely."""
    return re.sub(r'^[ \t]*- \{\{embed\s+.*?\}\}[ \t]*\n?', '', text, flags=re.MULTILINE | re.DOTALL)


def convert_renderers(text: str) -> str:
    """Strip {{renderer ...}} but keep the URL found inside."""
    def _extract_link(m):
        leading = m.group(1)
        inner = m.group(2)
        url_match = re.search(r'https?://\S+', inner)
        if url_match:
            return f"{leading}- {url_match.group(0)}\n"
        return ""

    # Match the renderer block including its leading indent and dash
    return re.sub(
        r'^([ \t]*)- \{\{renderer\s+(.*?)\}\}[ \t]*\n?',
        _extract_link,
        text,
        flags=re.MULTILINE | re.DOTALL,
    )


def strip_html_and_artifacts(text: str) -> str:
    """Remove lines containing HTML tags (Logseq link previews, etc.).

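    Markdown links and images contain no angle brackets, so they are kept.
    Any line with a <...> pair is dropped, including autolinks such as
    <https://example.com>.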
    """
    lines = text.splitlines()
    result = []
    for line in lines:
        if re.search(r'<[^>]+>', line):
            continue
        if 'link_preview' in line or 'var(--ls-' in line:
            continue
        result.append(line)
    return "\n".join(result) + "\n"


def strip_logseq_properties(text: str) -> str:
    """Remove Logseq-specific properties like 'id:: ...' lines."""
    lines = text.splitlines()
    result = []
    for line in lines:
        stripped = line.strip()
        # Allow hyphens so keys like "created-at::" are stripped as well
        if re.match(r'^[\w-]+::\s*\S', stripped):
            continue
        result.append(line)
    return "\n".join(result) + "\n"


def convert_todo_keywords(text: str) -> str:
    """Convert TODO/DONE/DOING/LATER keywords to checkbox syntax."""
    lines = text.splitlines()
    result = []
    for line in lines:
        m = re.match(r'^(\s*- )?(DONE|TODO|DOING|LATER)([ ]+.*)$', line)
        if m:
            prefix = m.group(1) or '- '
            keyword = m.group(2)
            rest = m.group(3).lstrip()
            checkbox = '[x]' if keyword == 'DONE' else '[ ]'
            result.append(f"{prefix}{checkbox} {rest}")
        else:
            result.append(line)
    return "\n".join(result)


def clean_outline_under_headings(text: str) -> str:
    """Remove the leading '- ' outline marker from top-level headings only.
    Indented headings (under outline items) keep their '- ' prefix."""
    lines = text.splitlines()
    result = []
    for line in lines:
        # Require 1-6 '#' followed by a space so tags like "- #project" are left alone
        m = re.match(r'^- (#{1,6} .*)$', line)
        if m:
            result.append(m.group(1))
        else:
            result.append(line)
    return "\n".join(result)


def fix_dangling_dashes(text: str) -> str:
    """Fix lines that have double dashes like '  -  - text' -> '  - text'."""
    lines = text.splitlines()
    result = []
    for line in lines:
        fixed = re.sub(r'^(\s*)-+\s*-+\s+', r'\1- ', line)
        result.append(fixed)
    return "\n".join(result)


def fix_orphaned_children(text: str) -> str:
    """After admonition conversion, children of the original outline item
    may be orphaned. Dedent them so they're at most 2 levels deep (2 spaces)."""
    lines = text.splitlines()
    result = []
    i = 0
    while i < len(lines):
        line = lines[i]
        result.append(line)

        # Check if this line is the start of an admonition block
        if re.match(r'^> ', line):
            # Collect and append remaining admonition lines
            j = i + 1
            while j < len(lines) and re.match(r'^> ', lines[j]):
                result.append(lines[j])
                j += 1

            # Now process children that follow the admonition
            while j < len(lines):
                next_line = lines[j]
                m = re.match(r'^(\s*)- ', next_line)
                if m:
                    child_indent_len = len(m.group(1))
                    # Cap at 2 levels (2 spaces). Children at column 0 or
                    # 2 spaces are fine as-is.
                    if child_indent_len > 2:
                        dedented = "  " + next_line[child_indent_len:]
                        result.append(dedented)
                    else:
                        result.append(next_line)
                    j += 1
                else:
                    break
            i = j - 1

        i += 1

    return "\n".join(result)


def strip_empty_outline_items(text: str) -> str:
    """Remove outline items that have no content (just '-' or '- ' with nothing after)."""
    lines = text.splitlines()
    result = []
    for line in lines:
        if re.match(r'^\s*-\s*$', line):
            continue
        result.append(line)
    return "\n".join(result) + "\n"


def clean_empty_lines(text: str) -> str:
    """Remove excessive blank lines and trailing whitespace."""
    # Strip trailing whitespace first so whitespace-only lines count as blank
    lines = [line.rstrip() for line in text.splitlines()]
    text = re.sub(r'\n{3,}', '\n\n', "\n".join(lines))
    lines = text.splitlines()
    while lines and lines[0].strip() == '':
        lines.pop(0)
    while lines and lines[-1].strip() == '':
        lines.pop()
    return "\n".join(lines) + "\n"


def convert(text: str) -> str:
    """Run all conversion passes."""
    # 1. Tabs first (everything else assumes spaces)
    text = convert_tabs(text)
    # 2. LOGBOOK blocks (before other processing)
    text = convert_logbook_blocks(text)
    # 3. Admonitions
    text = convert_admonitions(text)
    # 4. Embeds
    text = convert_embeds(text)
    # 5. Renderers (keep links)
    text = convert_renderers(text)
    # 6. HTML tags and artifacts
    text = strip_html_and_artifacts(text)
    # 7. Logseq properties (id::, etc.)
    text = strip_logseq_properties(text)
    # 8. TODO/DONE/DOING/LATER keywords
    text = convert_todo_keywords(text)
    # 9. Outline markers under headings
    text = clean_outline_under_headings(text)
    # 10. Fix double dashes
    text = fix_dangling_dashes(text)
    # 11. Fix orphaned children after admonitions
    text = fix_orphaned_children(text)
    # 12. Strip empty outline items
    text = strip_empty_outline_items(text)
    # 13. Cleanup
    text = clean_empty_lines(text)
    return text


def main():
    parser = argparse.ArgumentParser(
        description="Convert a Logseq markdown note to Silverbullet format."
    )
    parser.add_argument("input", help="Input Logseq markdown file")
    parser.add_argument("output", help="Output Silverbullet markdown file")
    args = parser.parse_args()

    with open(args.input, "r", encoding="utf-8") as f:
        text = f.read()

    result = convert(text)

    with open(args.output, "w", encoding="utf-8") as f:
        f.write(result)

    print(f"Converted {args.input} -> {args.output}")


if __name__ == "__main__":
    main()
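
The script converts one file per run. For mass-converting a whole graph, a small wrapper along these lines could work. This is only a sketch: convert_tree is a hypothetical helper, the directory names in the usage note are placeholders, and the converter is passed in as a plain function so that the convert() from the script above can be plugged in.

```python
from pathlib import Path
from typing import Callable


def convert_tree(src: Path, dst: Path, convert: Callable[[str], str]) -> int:
    """Convert every .md file under src and write it to the same
    relative path under dst. Returns the number of files written.

    Keep dst outside of src, otherwise a second run would pick up
    the already converted files as input.
    """
    count = 0
    for path in sorted(src.rglob("*.md")):
        target = dst / path.relative_to(src)
        target.parent.mkdir(parents=True, exist_ok=True)
        text = path.read_text(encoding="utf-8")
        target.write_text(convert(text), encoding="utf-8")
        count += 1
    return count
```

With the script's functions available in the same file, a call could look like convert_tree(Path("logseq-graph"), Path("sb-space"), convert), where both paths are placeholders for your own folders.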

I converted my Logseq files with a similar script and ran into the issue that I used Logseq block references a fair bit. I ended up manually changing them to SB subheadings. It's not exactly the same concept, but I found subheadings work better because they show up in the automatic table of contents. I am loving Space Lua scripting for querying rather than Dataview or JavaScript in Logseq. It's much easier, and I haven't yet found something I couldn't achieve.
Cheers
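
For anyone in the same situation: the script above strips id:: lines, so it may be worth listing block references before converting. Below is a minimal sketch, assuming references use the usual ((uuid)) form; find_block_refs is a made-up helper name, and the sample UUID is invented.

```python
import re

# Logseq block references look like ((uuid)); the target block carries "id:: uuid"
BLOCK_REF = re.compile(
    r'\(\(([0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})\)\)',
    re.IGNORECASE,
)


def find_block_refs(text: str) -> list[tuple[int, str]]:
    """Return (line_number, uuid) for every block reference in text."""
    found = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for m in BLOCK_REF.finditer(line):
            found.append((lineno, m.group(1)))
    return found


sample = "- see ((64f0c2a1-1234-4abc-8def-0123456789ab))\n- plain item\n"
refs = find_block_refs(sample)
print(refs)
```

Running this over the original files gives a list of the places that need manual attention, for example when replacing references with subheadings.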