Sync slow, conflicted files showing up

I run SilverBullet pretty much as described in this guide, using sync mode on three devices: my desktop, laptop, and phone. While the system works OK most of the time, conflicted versions of files keep showing up, as though some editors are fighting (even though I’m not editing the same file across multiple browsers). Indexing on the server (as seen via docker-compose logs) also appears to be quite slow compared to indexing on the client (as seen via Chrome’s JS console).

…it might not be helping that I have the following query on my homepage:

{{#let @badPages = {page where 
  name =~ "^Projects/" and 
  [
    "Active",
    "Ready",
    "Blocked",
    "Decided no",
    "Done"
  ] != status
  or name =~ "\.conflicted\."
}
}}
{{#if @badPages}}
{{#each @badPages}}
> **warning** Warning
> Page [[{{name}}]] (status "{{status}}").
{{/each}}
{{else}}
No broken pages, all good!
{{/if}}
{{/let}}

How should I go about debugging this?

Here’s a small section from the JS console (this bit gets repeated); I suspect some file is causing the sync to get stuck.

sync_service.ts:264 Already syncing, aborting individual file sync for index.md
sync_service.ts:125 Sync without activity for too long, resetting
sync_service.ts:267 Syncing file index.md
sync.ts:50 [sync] Performing a full sync cycle...
sync_service.ts:264 Already syncing, aborting individual file sync for index.md
sync.ts:105 error Error syncing file Projects/Advent of Code.md signal timed out
sync.ts:275 [sync] File changed on both ends, potential conflict SETTINGS.md
sync.ts:299 [sync] Starting conflict resolution for SETTINGS.md
sync.ts:336 [sync] Going to create conflicting copy SETTINGS.conflicted.1726743411079.md
sync_service.ts:264 Already syncing, aborting individual file sync for index.md
sync.ts:322 [sync] Files are the same, no conflict
sync.ts:112 [sync] Sync complete, operations performed 0
client.ts:296 Full sync completed
space_index.ts:14 Current space index version 6
sync_service.ts:267 Syncing file index.md

Other potential culprits: I originally imported my Obsidian-based space, but after the import I made some significant changes (e.g. moving everything from Daily to Journal/Day, fixing links). Because it appeared to be struggling to sync, I attempted various ways of telling the clients to “delete everything; start over”; perhaps there are some stray service workers about? :thinking:

As for resetting the client, there’s a Debug: Reset Client command that deletes all locally synced content and unregisters the service worker. So it should start you fresh.

As to your actual issue:

  • The “signal timed out” message is a bit suspicious. How is your connectivity between your server and the clients?
  • What are you running the server on: what kind of hardware, and is the storage local or on a NAS? What file system? (The quick check sketched below would cover most of this.)
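
Something like this would give the relevant picture, assuming a Linux host (the space path below is a placeholder):

# what the machine is, which filesystem the space sits on, and whether the disks are spinning
$ uname -a
$ df -hT /path/to/space
$ lsblk -o NAME,TYPE,SIZE,ROTA,MOUNTPOINT   # ROTA=1 means rotational (spinning) disks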

Thanks for that. Resetting the various clients helped for a while. However, now I’m getting a fresh error, which I suspect may have been the cause before:

Access to fetch at 'https://lutzky.cloudflareaccess.com/cdn-cgi/access/login/MY_SILVERBULLET_DOMAIN_REDACTED?kid=LONG_REDACTED_STRING&redirect_url=%2FProjects%2FSilverBullet.md&meta=LONG_REDACTED_STRING' (redirected from 'https://MY_SILVERBULLET_DOMAIN_REDACTED/Projects/SilverBullet.md') from origin 'https://MY_SILVERBULLET_DOMAIN_REDACTED' has been blocked by CORS policy: Response to preflight request doesn't pass access control check: No 'Access-Control-Allow-Origin' header is present on the requested resource. If an opaque response serves your needs, set the request's mode to 'no-cors' to fetch the resource with CORS disabled.

That’s quite the error message! The UI doesn’t show it; the header simply turns red. I can’t seem to make that browser “log in” again; I must’ve configured something wrong with Cloudflare.

For completeness:

  • The connectivity between my server and clients is good when they are online, but they are intermittently either asleep or disconnected (as laptops and phones do)
  • The server is a Dell OptiPlex with an Intel i5-8500, running Ubuntu 22.04. The filesystem is ZFS in a RAID-Z configuration of three 2 TB spinning-rust drives. It also serves as a NAS, Home Assistant, and Plex server.

:face_exhaling: There doesn’t appear to be a way to get the browser to “log back in to that domain on Cloudflare”; even after taking the client out of sync mode (into “online mode”), it gets a 503 when trying to GET /.ping or GET /index.json.
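
For reference, hitting the same endpoints from a terminal (so without any Cloudflare Access session cookie) shows the status codes directly; a sketch using curl against my redacted domain:

# print just the HTTP status code for each endpoint
$ curl -sS -o /dev/null -w '%{http_code}\n' https://MY_SILVERBULLET_DOMAIN_REDACTED/.ping
$ curl -sS -o /dev/null -w '%{http_code}\n' https://MY_SILVERBULLET_DOMAIN_REDACTED/index.json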

This might be a known issue, or we might be holding it wrong.

I’ll disable the Zero Trust protections for now to make it usable, and I’ll see whether this also helps with the conflicted files showing up.

Disabling Zero Trust has mostly, but not completely, eliminated conflicts showing up (even when editing on a single device).

I’ve been messing with this a bit more, and I think part of what’s going on is poor performance with my ZFS drives. I can’t quite make out why.

Firstly, some stats about my space:

$ find -type f | sed 's/.*\.//' | sort | uniq -c
      2 jpg
      1 js
    680 md
      2 pdf
     59 png
$ du -sh
70M	.

I wouldn’t think of that as huge. However, when I remove .silverbullet.* and start a fresh instance via Docker on my server (steps sketched below, after the log), indexing takes a long time (10 minutes and counting so far), and the logs show messages such as:

silverbullet_1  | [mq] Message ack timed out, requeueing {
silverbullet_1  |   id: "1726926278386-000084",
silverbullet_1  |   queue: "indexQueue",
silverbullet_1  |   body: "Journal/Day/2024-09-02.md",
silverbullet_1  |   ts: 1726926610918,
silverbullet_1  |   retries: 2
silverbullet_1  | }
silverbullet_1  | Indexing file Journal/Day/2024-09-02.md
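
For reference, the “fresh instance” procedure above is roughly the following; this is a sketch of my setup, so the service name, space path, and the .silverbullet.* glob are my own, not anything official:

# stop the server, remove the server-side index database, then restart and watch it reindex
$ docker-compose stop silverbullet
$ rm /path/to/space/.silverbullet.*
$ docker-compose up -d silverbullet
$ docker-compose logs -f silverbullet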

This isn’t a raw read-speed thing; $ time find -type f -exec md5sum {} \; takes 1.28 seconds.
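
To rule out throughput and look at synchronous-write latency instead (which, as far as I understand, is what SQLite commits depend on), a crude probe is something like the following, run inside the space’s dataset; the scratch file name is arbitrary:

# O_DSYNC forces a flush per 4 KiB block, so this measures sync-write latency, not bandwidth
$ dd if=/dev/zero of=./ddtest bs=4k count=500 oflag=dsync
$ rm ./ddtest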

After moving the space off of spinning rust and onto an NVMe drive, indexing completes in about 20 seconds (though the above “md5sum all the files” check takes roughly the same time). I don’t want to run my production copy off this drive, though, because ZFS gives me encryption and snapshots.
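
In case it’s relevant, these are the ZFS knobs I plan to look at for the spinning-rust dataset; the dataset name is a placeholder for mine, and my understanding is that many small synchronous writes are exactly where RAID-Z on HDDs without a separate log device tends to hurt:

# pool layout/health, plus the dataset properties most relevant to small sync writes
$ zpool status tank
$ zfs get sync,recordsize,compression tank/silverbullet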

I get the feeling that something about this slowness exacerbates an otherwise-rare race condition in the sync mechanism.

…specifically, I think that whenever a full reindex is triggered while other clients are still trying to communicate with the server, some level of confusion occurs.

Ah, I believe I have a likely culprit! Various people complain about Deno filesystem slowness, and many of the paths lead to “Max "op" size is 16384 which can be inefficient in user land” (denoland/deno issue #10157 on GitHub), which was apparently resolved in Deno v1.21.1. If I’m reading the Dockerfile correctly, we’re using Deno 1.46.1.
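
(A quick way to double-check the Deno version actually baked into the image; the image reference here is an assumption on my part, so adjust it to whatever your compose file pulls:)

# run the image's bundled deno binary and print its version
$ docker run --rm --entrypoint deno ghcr.io/silverbulletmd/silverbullet --version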

So that wouldn’t explain it then, right? I mean, our version is waaaaay newer, so it shouldn’t have this issue? Indeed, unless you explicitly run a super old Deno version. I’m pretty rigorous about upgrading the Deno version for the docker image with just about every single release.

Here I was, so proud of my analytical success, and yet I managed to fail at comparing two version numbers :man_facepalming:t2:
I’ll try throwing some strace at the problem.


Here’s a bit of progress: With SB_DB_BACKEND=memory, everything is as fast as expected (even on the admittedly slower drive). This makes me believe that the bottleneck is repeated writes to sqlite3.
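
For the record, the only change is an environment variable on the container. A sketch with plain docker run, where the space path, port, and image reference are placeholders for my setup:

# keep the index in RAM instead of in on-disk SQLite
$ docker run -d \
    -e SB_DB_BACKEND=memory \
    -v /path/to/space:/space \
    -p 3000:3000 \
    ghcr.io/silverbulletmd/silverbullet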

In Install/Configuration, under Memory database, you write that SB_DB_BACKEND=memory is “only useful for testing”. However, given that (IIUC) SilverBullet treats the Markdown files as the source of truth, doesn’t that make the KV store (sqlite3 in the on-disk case) effectively a cache? There’s certainly a tradeoff in RAM usage, but for me it’s 500MB instead of 250MB, which is acceptable. Is there any other reason to avoid SB_DB_BACKEND=memory when storage is slow?

Another option may be to use SB_SYNC_ONLY. In both of these cases (but even more so with SB_SYNC_ONLY), the server starts up significantly faster. I suspect that a lot of the issues I had been seeing came from clients trying to communicate with the server while it was busy with lots of I/O to SQLite on the slow disk. I’ll give this a try for a few days, and if it seems to work well, perhaps we can make it official advice for “people running with slow storage”?
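
The sync-only variant is the same idea with a different variable; again a sketch with my placeholder paths, and I’m assuming that setting it to true is enough to enable it (as I understand it, the server then just acts as a file store for the clients):

# sync-only mode: no server-side indexing, the server just serves/syncs files
$ docker run -d \
    -e SB_SYNC_ONLY=true \
    -v /path/to/space:/space \
    -p 3000:3000 \
    ghcr.io/silverbulletmd/silverbullet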

Thanks for the help (and rubber-ducking!) solving this.

I’ve filed three issues for follow-up which might make this better.