Docker image: included software

zef · March 8, 2025, 9:27am

Thus far the docker edition of SilverBullet is relatively bare-bones: it’s a Debian-based image with Deno. At some point I also added git and an ssh-client to it (I needed that).

Yesterday I was experimenting with Pandoc, which is a pretty nice CLI program to turn markdown documents into e.g. epubs, PDFs etc. Generating PDFs from your notes has been a requested feature for some time, and simply leveraging pandoc can make this very easy to implement.

My initial prototype implementation is literally this:

```space-lua
command.define {
  name = "Pandoc: Produce",
  run = function()
    local target = editor.prompt("File name")
    shell.run("pandoc", {"-o", target, editor.getCurrentPage() .. ".md"})
    editor.flashNotification "Done!"
  end
}
```

However, this raises the question: how “fat” should we make the docker image? To make this work in my main environment, I’d now have to include pandoc as well as texlive (for PDF generation) in the image. This adds perhaps 50MB or perhaps even 100MB (not easy to tell) to the docker image. Is that a problem? Is there a better way to manage this?

One alternative option is that you simply check if the tool is available every time, and apt install it on the fly if not. Perhaps that’d be better?

Opinions welcome.

nikolasdi · March 8, 2025, 11:36am

If it is only a matter of installing Pandoc, when it is not already installed, then it seems to me it is a very simple thing to do. Why package it with SilverBullet?

zef · March 8, 2025, 1:03pm

Yeah, even writing this down made the solution kind of obvious. The only annoying thing is that this would result in a lot of (re)installations, since the installed software won’t persist between SB versions. Since I’m on the edge builds and sometimes I merge stuff daily, that means every day all this stuff would have to be redownloaded.

I’m likely an edge case (hah) but still.

zeus-web · March 8, 2025, 1:46pm

kind of off-topic:
If you convert pages with pandoc, do you use “raw” md content only? What about the results of queries or Lua outputs, etc.?
At first I had in mind the publish Plug, which exports a whole page to a complete md file.
Is something like that possible, too, with your Lua command from above?

That would be even better (or at least simpler for me) for some of my current ideas, like publishing some pages to a webspace without using the Plug (as in my first experiments it lacks a few features, like adding pictures to the export).

Another idea is just like your pandoc example, but I’m experimenting with generating Presentations with “Marp”.

nikolasdi · March 8, 2025, 2:33pm

Ah, yes, you are right. In that case, I don’t know, I have my SilverBullet space available on my pc and pandoc is already there, so it is just a terminal command away to generate pdf from md.

zef · March 8, 2025, 4:16pm

There are ways to render the page to “clean” markdown. Have to check how to easily use this functionality. It’s already being used for sharing for instance (Cmd-s/Ctrl-s).

MrMugame · March 8, 2025, 5:20pm

Couldn’t you make a directory that is added to the PATH inside the container and is mountable with docker? Something like “/apps” or “/programs”? I’m no docker expert, but it should just work, at least for pandoc, as it seems to be fully statically linked. And I would guess most of the stuff that finds use cases here can be (or already is) statically linked without bigger issues.

zef · March 8, 2025, 5:31pm

Sure, that’s already possible today. I just want to have something that’s easy to set up.

dmick1954 · March 9, 2025, 4:04pm

In my case, converting a SilverBullet page to a PDF doesn’t really need to be done inside of SilverBullet. A simple terminal command does the trick. Why incorporate that functionality into SilverBullet at all? As I understand it, SilverBullet isn’t able to read PDFs at this time anyway. I see no purpose in doing this inside of SilverBullet. It has no real bearing on the functionality of the PKM or notes in SilverBullet from my perspective. An entry in the documentation with instructions on how to do this should be enough. I think the KISS principle would apply here.

MrMugame · March 9, 2025, 4:32pm

I guess the workflow of just having a one key press command is a little faster than Download → Open Terminal → Think about and type pandoc command → PDF.
Tbh it doesn’t really matter in my opinion if pandoc is bundled in the docker image. I have docker images which are close to 1GB or even more and I’ve never really noticed big differences in working with them, let alone had space concerns on my server.
My concern is primarily that having it directly bundled doesn’t really seem hackable or customizable for other apps which someone may need and it also isn’t really “scalable”.

aorith · March 9, 2025, 5:18pm

If the image size is an issue, you could have two tagged flavors of the docker image, with the one that includes pandoc and texlive tagged as “full”.

username_smusername · March 11, 2025, 9:08am

~~Docker Compose~~ Dockerfiles! It’s still an installation of each piece of software per new version, but you don’t have to do it yourself.

MrMugame · March 11, 2025, 11:13am

This seems excessively complicated: completely hand-building an image every time. I don’t know exactly how zef goes about reinstalling, but I guess he modifies an existing image using docker exec and then docker commit to reuse it, which is a lot easier than getting a Dockerfile, modifying it, building and so on.

And about that

I think it is decently easy to set up (search executable, wget, add volume to docker-compose/command) and the big advantage is that it persists between images without any work and doesn’t change the image size

zef · March 11, 2025, 11:49am

This is a potential rat hole not worth going into, but if we want to solve the general case: when using package managers like APT to install software, bits and pieces fly all over the disk (/usr/bin, /usr/lib etc.), so it’s not so easy to localize that and mount it in. There are alternative package managers that are better suited for this stuff, but again: this is not a business I have any interest in getting into.

Most users will not be upgrading SB every day, so my short-cut proposal is along the lines of:

Have an “ensure package” type API that checks if a particular package is installed and, if not, installs it right then and there (think apt install -y <package>).
Call the package

Conceptually:

shell.ensurePackages("pandoc", "texlive")
shell.run("pandoc", ...)

That should do.
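
For illustration, the check-then-install logic could be sketched roughly as below. This is Python rather than space-lua, and it is not how `shell.ensurePackages` is (or will be) implemented. The function names, the injectable `which`/`run` parameters, and the command-to-package mapping (for instance `pdflatex` being provided by the `texlive` package) are all assumptions made for the sketch.

```python
import shutil
import subprocess


def missing_commands(commands, which=shutil.which):
    """Return the command names that cannot be found on PATH."""
    return [name for name in commands if which(name) is None]


def ensure_packages(packages, which=shutil.which, run=subprocess.run):
    """Install, via apt-get, the package for every command that is missing.

    `packages` maps a command name to the apt package assumed to provide it
    (hypothetical mapping, supplied by the caller).
    Returns the list of packages that were installed.
    """
    to_install = [packages[name] for name in missing_commands(packages, which)]
    if to_install:
        # Requires root and network access; skipped when nothing is missing.
        run(["apt-get", "update"], check=True)
        run(["apt-get", "install", "-y", *to_install], check=True)
    return to_install
```

Calling `ensure_packages({"pandoc": "pandoc", "pdflatex": "texlive"})` inside the container would then be a no-op once both tools are present, so the download cost is only paid the first time after an image upgrade.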

Matt · March 12, 2025, 6:15pm

You could also just hook into the browser if you want a PDF. Just call the browser’s print function, then clean up the page a bit, save as PDF, voilà: PDF.

JS: window.print()
CSS (to clean up the page): @media print { do what you want to hide the top header stuff, etc}

As an example.

zef · March 12, 2025, 6:40pm

Yeah there is even a print button in the markdown preview if I remember correctly. I’m actually using pandoc to generate epubs.

ethan · March 12, 2025, 10:15pm

This approach is also used by linkding, which offers “latest” and “latest-plus” images (as well as experimental Alpine versions of both)

They use a simple Bash script and two Dockerfiles - one for the Debian images and another for the Alpine images - to build all four versions.

CaffeineFueled · March 14, 2025, 9:35pm

I’d love to have Pandoc in the Docker container. Provides great features that add value to it. Keeping it slim is fine, but I’d consider Pandoc essential.

Different Docker versions+tags with various functions could work, but not sure how much work it would be (pipeline + documentation). I’d stick to one image tho.

JPDVM2014 · March 16, 2025, 2:34am

I don’t know how it is implemented, but linuxserver.io has plugins (docker mods) for their images, so you can just add-on whatever you need. Might be more effort than it is worth though.

Maarrk · May 8, 2025, 2:49pm

I think that one of the big reasons why Plugs were so nice to develop was that first-party code was also using plugs. I think that now we’re in the opposite situation: pandoc is included because specifically @zef is using it, but there are no standard library features depending on it, no documentation, etc.

I would like to either:

Make pandoc a more integrated part of SilverBullet to clarify the arbitrary choice: add this snippet to the standard library, provide a command for PDF export, document both on the website
Figure out a system for installing packages by users, and make pandoc a local choice
Don’t try to integrate anything in SilverBullet itself, but figure out a way to compose it with other containers

I wonder if it would be possible to use LuaRocks instead of apt? The Docker image being based on Debian is just an implementation detail, while Lua is a key part of the application now. There seem to be a few options that can make PDFs; I don’t see anything for epub though.

Having briefly searched for “run a command in other container”, you might need to have your own, slightly modified pandoc container to run commands triggered by HTTP (for example with this Python library), but I feel like this would be the ultimate extensibility solution.
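
As a rough idea of what such a modified pandoc container could run, here is a minimal sketch using only the Python standard library (not the library linked above). It assumes pandoc is on the PATH inside that container. The route scheme (`POST /html`, `POST /epub`), the format allowlist, the port and all function names are hypothetical choices for the example.

```python
import subprocess
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical allowlist: only these pandoc output formats are accepted.
ALLOWED_FORMATS = {"html", "epub"}


def build_pandoc_command(output_format):
    """Return the pandoc argv converting Markdown on stdin to stdout."""
    if output_format not in ALLOWED_FORMATS:
        raise ValueError("unsupported format")
    # "-o -" makes pandoc write to stdout even for binary formats like epub.
    return ["pandoc", "-f", "markdown", "-t", output_format, "-o", "-"]


class ConvertHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Hypothetical route: the path names the format, the body is Markdown.
        try:
            command = build_pandoc_command(self.path.strip("/"))
        except ValueError:
            self.send_error(400, "unsupported format")
            return
        length = int(self.headers.get("Content-Length", 0))
        markdown = self.rfile.read(length)
        result = subprocess.run(command, input=markdown, capture_output=True)
        if result.returncode != 0:
            self.send_error(500, "pandoc failed")
            return
        self.send_response(200)
        self.send_header("Content-Type", "application/octet-stream")
        self.send_header("Content-Length", str(len(result.stdout)))
        self.end_headers()
        self.wfile.write(result.stdout)


def serve(port=8080):
    """Run the conversion endpoint until interrupted (port is arbitrary)."""
    HTTPServer(("0.0.0.0", port), ConvertHandler).serve_forever()
```

The SilverBullet container would then only need to send the page’s markdown over HTTP and store the response, so pandoc and texlive would never have to be installed in the SilverBullet image itself.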