I’ve been using SilverSearch for a while now and I have to say that it works beautifully - except for some of the “Some advanced tips” parts.
ext:png → 0 results, while I do have a lot of documents in .png format (searching for plain “png” returns all the places where they’re used) - maybe document names are not indexed somehow?
path:"some path" does not return anything, ever
-exclusions: it took me a while to understand that this must be written as -"words I want to exclude" and not -exclusions "words I want to exclude"
The important part here is that you always have to add some text to search for, so you can’t just get all files under a specific path, or all files with a specific extension. This is a restriction of the MiniSearch API, which has <insert long github comment> reasons. I could work around that, but it raises questions like “What would the sort order be?”, and Omnisearch doesn’t do it. If you really only want all files with a specific extension, you could search for ext:pdf pdf.
As for document names not being indexed: right now, a document for which Silversearch can’t retrieve content will not be indexed. So if you want png files to be indexed, you could do:
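event.listen {
  name = "silversearch:index",
  run = function (event)
    local meta = event.data.meta
    -- Only handle png files; everything else is left alone
    -- (note: Lua's "not equal" operator is ~=)
    if meta.contentType ~= "image/png" then
      return
    end
    -- Provide empty content so the file still gets indexed
    return { content = "" }
  end
}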
This just returns an empty string as the content for png files. I thought about adding an option that does this by default; would that be useful?
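If you want the same for more than one file type, a rough sketch could look like the following. It reuses the silversearch:index event and meta.contentType from the png example; the nameOnlyTypes table and the image/jpeg entry are just illustrative assumptions, so adjust the list to whatever is in your space.

```lua
-- Hypothetical list of content types that should be indexed with
-- empty content (an assumption for illustration, not a built-in option).
local nameOnlyTypes = {
  ["image/png"] = true,
  ["image/jpeg"] = true,
}

event.listen {
  name = "silversearch:index",
  run = function(e)
    local meta = e.data.meta
    -- Content types not in the list are left alone.
    if not nameOnlyTypes[meta.contentType] then
      return
    end
    -- Same as the png example: empty content so the file gets indexed.
    return { content = "" }
  end
}
```

Using a table as a set keeps the check to a single lookup, so adding another type is a one-line change.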
Great plug, high quality search is very valuable, thank you for sharing!
I have a question: a string I have appears exactly this way in a certain page: problems with the existing setup. However, SilverSearch’s first 14 results are other pages which have substrings like problems and existing problems. The 15th result is the page with the entire string. Is this something I’m doing wrong or something that could be improved? I tried after freshly reindexing. Thanks again!
Separately, given it is all done within the browser, also wondering: how well would this scale to, say 10k .md files and say 30 MB of data on a mobile phone? Wondering if anyone has tried it out and has a reference point to share.
Probably not super great. I haven’t really tried pushing it yet, just because I haven’t really found good markdown data to test it with, but there are definitely a few things which are suboptimal for performance, so I don’t expect it to scale too well. There are probably two big bottlenecks. The first is the search index, which is written into one big JSON blob for storage across sessions, so it is probably not very performant at scale. The second is that Silversearch stores all documents in RAM at runtime, to be able to generate these little excerpts. So if your space is 30MB (just markdown), it’s going to need at least 30MB of RAM (it’s not quite as simple as that, since the documents are loaded lazily, but this is roughly correct, especially if you search a lot).
I have a question: a string I have appears exactly this way in a certain page: problems with the existing setup. However, SilverSearch’s first 14 results are other pages which have substrings like problems and existing problems. The 15th result is the page with the entire string. Is this something I’m doing wrong or something that could be improved? I tried after freshly reindexing. Thanks again!
I couldn’t reproduce this just now, but I also haven’t put in a lot of effort. There are definitely examples where the search makes decisions which don’t quite align with what the user expects. It’s often a thing about diversity of the matches, number of matches and stuff™, that gets weighted in a way that doesn’t always lead to the best results. If you can give me a minimal space where this occurs, I can look into it.