Data schemas and "real" SQL

Let me drop an idea here, and see what you think — this is still early stage, but interested in some feedback on if this would be of interest to you.

My original design for Objects in SB was to keep it very loose and dynamic. You could add arbitrary attributes to objects, and validation would always be optional. The query language is nice and simple, but has limitations compared to a “proper” SQL implementation and likely always will, because implementing a full on SQL database is hard.

Nevertheless, I see people wanting to do more and more with the data that SilverBullet collects for them, or the data they collect in it. They want effectively do joins, aggregations and other standard SQL stuff. To some degree can implement these now in Lua, but this is going to be slow and a bit of a pain to write.

One obvious solution would be to let Objects be backed by a “proper” database, like SQLite. To get the true power of SQLite, you’d have to create proper tables. Those tables require a schema, and when we talk about schemas we also talk about data migrations when schemas change. This is a pain I don’t want to make part of the SilverBullet experience, which is why I never really considered it.

But now I’m thinking: what if we offer both.

What if we offer a way (probably a Lua API) to define schemas for tags. For all the tags with a defined schema (again, this would be optional), we then create SQLite tables, and as part of regular indexing insert data into those tables as well.

If you change the schema, you’d have to a SQLite reindex: drop all the SQLite tables, create them anew with the new schema, pull all the objects from the “regular” Objects database, and reinsert them (if they match the schema, otherwise not). There’d be ways for you to see what objects don’t match the schema and will not be indexed as a result.

And then, we can simply offer a Lua API to perform real SQL queries in your pages:

${sql[[
   select hello, avg(something) from sometag
   left join bla on dah
   group by something
]]}

etc.

I don’t think would even be that hard to do, but could offer some fancy new features.

One reason not to do this is added complexity: there’s now be two query languages, one with nice Lua integrations, and one just being plain SQL.

Thoughts?

5 Likes

So if I understand you correctly, it’s sqlite on top of the current datastore and using some command you can extract the data from the datastore into sqlite (With some way to filter that data)? This then poses the possibility to query from that database and get the feature/performance benefits.

Reading this I quickly jump to the question: Couldn’t this just be a plug? This seems like all that is required

  • Lua functions
  • Run sqlite using wasm, which seems to be officially supported
  • Commands
  • Access to the datastore/index
  • Config options

If I haven’t overlooked something critical this should be implementable as a plug and wouldn’t even be too hard, depending on how you configurable the “tag schema” is and how complex your datastore -> sqlite mapping is.
This would at least rule out the “added complexity argument” in my opinion, because it doesn’t even have to be part of the SB core.

(One problem that just came to mind is, that if you want to be able to inline lua into your sql query, you would have to completely parse the sql query, evaluate the lua and then stringify it back for sqlite. Unsure what sql parsers exist and what they allow and if there may even be syntax conflicts, but this could be a little harder to do.)

Yes, correct. I was thinking you’d do something along the lines of:

sql.define {
   -- tag name to sync with
   tag = "task",
   -- table name
   name = "task",
   schema = { -- or even plain SQL CREATE TABLE body
      id = "string",
      name = "int",
      done = "boolean"
   },
   -- optional custom mapping
   map = function(obj)
      return { 
         id = obj.ref,
         name = obj.name,
         done = obj.done
      }
   end
}

I didn’t think of the implementation but indeed I’d imagine this can be implemented as a plug and probably should. The plug can contain the WASM-compiled SQLite code, and just listen to the right events and expose the right syscalls. That would be pretty neat, because indeed that way this wouldn’t need to be in SB core at all.

1 Like

I would love this — the main reason, despite my hesitation around the “offline mode”, that I switched over to v2 is that Lua lets me implement some aggregation functionality I would normally execute in SQL, and so exposing more of that database functionality would be something that fits well with the “CRM-lite” use case I currently have hacked together with Lua & LIQ.

It’s hard to react more concretely without getting deeper into what is allowed vs what isn’t, but I would love to see where this goes and think it would be tremendously powerful! :saluting_face:

1 Like

As a developer I would absolutely love it, but it gives me pause when we have two solutions to a problem. Like what would be motivation for a savvy user to use SQL or the object based index? When do you use A and when do you use B? What can A do that B doesn’t? etc

This can help?

Ok, so I think this is a project that is worth doing but perhaps not as “core” development (as in: this shouldn’t ship by default).

Probably there should just be a silverbullet-sql plug developed that implements this for advanced users.

I may pick this up at some point, but if there’s any takers until then, feel free :slight_smile: We can use this topic to keep brainstorming on the specifics.

I can confirm this. I am trying to write it and it’s quite hard to implement because the SQL properties must hold the relational logic, proper ordering etc. Nice excercise, really. I tried several approaches to attack this area from Space Lua and I ended up with a class forming an executable plan internally via chaining of methods but still struggling edge cases etc. Although it is clearly doable in Lua, of course, native SQL support would be such a great extension!

Have you considered using PGLite, storing object data in jsonb? This would still allow the loose, schema-free nature of general front-matter, while also allowing for optimized joins and aggregations.

You’d also inherit lots of other pg goodies. I’m not sure pllua is a supported extension yet, but it probably wouldn’t be too hard to compile. That would allow db functions to be written in lua.

1 Like

Cool, never heard of that. Something to look at.

I think the ideas here are interesting and do sound like something I would use. Keeping it as part of a plug does also sound sensible.

At the same time, I’d really like to see the concept of a data schema (with column/field types, default values and constraints) be a core part of Silverbullet so that it can easily be reused across community contributions. Ideally, the table definitions here would support this abstracted schema rather than raw sql so that it doesn’t require unnecessary duplication.

You already mentioned one other use case that probably doesn’t need to be part of the sql plug, a view displaying all instances of a tag that don’t match the schema. I’d also like to see improvements to enforcing/encouraging schema compliance during data entry, for example highlighting, autocomplete and automatic insertion of attributes. This would likely have a somewhat different experience/implementation per context, ex: page frontmatter, ad hoc tags within a page or a table of records all expected to conform. I don’t think any of this necessarily needs to be in Silverbullet core, but I think a stable api for reuse across community contributions does make sense.

2 Likes

I am unfamiliar with these terms, but I have used some other note taking apps (Tana、Obsidian、SiYuan、Notion、Logseq、RoamEdit、Workflowy) and browsed some people’s blogs, and watched a smart person’s speech (by the way, Jeff Huber inspires me to draft a Sublime Text accurate + fuzzy search plugin).

I think @Zef can try SQLite based meilisearch (eg: Eya Lecturer Fan) or chroma (eg: Jupyter notebook demo) to enhance the full text/database search and the introduction of semantic search of SilverBullet.

It seems that both provide local executables or packages.