The Search function does not support Chinese characters

coffeemirror · April 6, 2024, 1:20pm

I self-host SilverBullet on CentOS and use Syncthing to sync all files to my Synology NAS. Everything works well, so I want to replace Trilium with it, but I just found that SilverBullet’s search (Ctrl-Shift-F) doesn’t support Chinese characters.

Maarrk · April 6, 2024, 2:05pm

I think the reason might be that the search only matches full words (delimited by spaces), which I assume doesn’t work the way you need for Chinese. When I search for a full phrase it does work for me (same with Polish diacritics).

I suppose the Search Plug could be changed to optionally do “dumb” search, just looking for a substring of characters without any of this tokenization logic.
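
A minimal sketch of the difference, assuming the index splits text into words on whitespace and punctuation. This is not the Search Plug's actual code; `tokenize`, `word_search` and `substring_search` are hypothetical names used only for illustration.

```python
import re


def tokenize(text: str) -> set:
    # Split into "words": runs of letters/digits between spaces or punctuation.
    # A Chinese phrase written without spaces comes out as ONE token.
    return set(re.findall(r"\w+", text.lower()))


def word_search(query: str, text: str) -> bool:
    # Whole-word matching: every query word must equal some word in the text.
    query_tokens = tokenize(query)
    return bool(query_tokens) and query_tokens <= tokenize(text)


def substring_search(query: str, text: str) -> bool:
    # "Dumb" search: the query only has to appear somewhere in the text.
    return query.lower() in text.lower()


note = "我不会说中文, but I can read it"

print(word_search("我不会说中文", note))   # the full phrase is a single token
print(word_search("不会", note))           # part of a token, so no word match
print(substring_search("不会", note))      # found as a plain substring
```

Under these assumptions the full phrase matches as a word, while a fragment such as “不会” is only found by the substring variant.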

coffeemirror · April 7, 2024, 6:15am

Thanks for your response. You’re right, it seems to work only with a whole sentence or phrase delimited by spaces, commas, full stops, etc.
Taking the phrase you mentioned, “我不会说中文”, we need to be able to search for any one, two, or more of its characters, such as “我”, “不会”, “我不会”, “会说中”, etc.
I have tried quite a few note-taking apps, such as dokuwiki, mediawiki, wiki.js, boostack, trilium, logseq, obsidian, siyuan, roamedit, remote, OneNote, Evernote, simplenote, synology note, etc. I prefer something self-hosted and web-based, which is how I found SilverBullet. The only thing that bothers me is that the search function doesn’t work well with Chinese. I hope this issue can be solved.

daydaya · April 7, 2024, 6:52am

I have the same need. It would be great if I could configure the options myself. Thank you very much.

coffeemirror · April 7, 2024, 10:04am

I am not familiar with coding. I pasted the code of the search function into Google Gemini, but it could not give any useful suggestion. It only told me to convert Chinese phrases to Pinyin (a romanization, similar to alphabet letters) for searching, which would be inaccurate and painful.

daydaya · July 7, 2024, 9:16am

Hi there,

I was wondering if there’s any chance that Chinese search support could be implemented in the future?

I’ve been using Silverbullet for a few months now and absolutely love it, but the lack of Chinese search functionality has been a bit of a challenge for me.

I even tried asking ChatGPT how to create such a feature myself. I put in a lot of effort, but since I’m not a programmer, I eventually had to give up.

Thank you!

Maarrk · July 19, 2024, 4:17am

I made a partial solution, where every character is treated as a separate word. I don’t speak Chinese, so I can’t judge if this makes sense for your usage.

I will explain what I mean by using capital letters instead of actual characters, since that’s the keyboard I have. When you search for “ABC” (if they are ideograms) you would get in the results:

“ABC”
“CBA”
“AB”
“xxABxxxxxCx”
“xxCxxxBxxx”

This should keep the good performance of word-based search through the entire space, at the cost of giving you many false positives. It could probably work if you search for rare keywords? Again, I don’t know the language well enough to judge myself.
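
A rough sketch of the per-character behaviour described above, assuming a page matches when it contains any of the query’s characters and ranks higher the more of them it contains. The tokenizer, the CJK range, and the scoring are assumptions for illustration, not the actual change; `tokenize`, `score` and `search` are hypothetical names.

```python
import re

# Assumption: only the basic CJK Unified Ideographs block counts as "ideograms".
# Each ideogram becomes its own token; other text is split into ordinary words.
TOKEN = re.compile(r"[\u4e00-\u9fff]|[^\W\u4e00-\u9fff]+")


def tokenize(text: str) -> list:
    return TOKEN.findall(text.lower())


def score(query: str, page: str) -> int:
    # Number of distinct query tokens that occur anywhere in the page.
    return len(set(tokenize(query)) & set(tokenize(page)))


def search(query: str, pages: list) -> list:
    # Keep every page sharing at least one token, best overlap first.
    hits = [p for p in pages if score(query, p) > 0]
    return sorted(hits, key=lambda p: -score(query, p))


pages = ["我不会说中文", "文中说会不我", "我不", "今天我不想出门会", "天气很好"]

for page in search("我不会", pages):
    print(score("我不会", page), page)

# For comparison: a strict substring match keeps only pages with the exact run.
print([p for p in pages if "我不会" in p])
```

In this sketch the reversed phrase and the page with the characters scattered apart are still returned, which is the false-positive cost mentioned above.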

daydaya · July 19, 2024, 5:07am

Thank you very much for your help!
I think a simple substring match will suffice for most of our needs.
For example, when I search for “ABC,” I would like to match all documents containing this substring, such as:

“ABC”
“ABCD”
“XXABCXX”

rather than matching each character as a separate word, such as:
“A B C”
“C B A”

Thanks again for your help!

Maarrk · July 29, 2024, 8:05pm

I wrote a plug that should help with this, see the Grep Plug topic.

Tested with this phrase again, and it works exactly as described here.

daydaya · July 30, 2024, 7:46am

Thank you for developing this plugin!

I have encountered an issue where the search functionality fails when filenames or folder paths contain Chinese characters.

However, the current search method meets the expectations of Chinese users. When the path and filenames do not contain Chinese characters, the Chinese content within the documents can be correctly searched.

I’ve reported this problem on GitHub for further investigation.

Maarrk · July 30, 2024, 6:40pm

Thanks for the report, it was also breaking for my Polish diacritics. Version 2.1.0 works for me, you can try it now with the Plugs: Update command.

daydaya · July 31, 2024, 4:21am

Thank you very much. The problem is solved in this version.