string.match() in the Lua API is seriously wrong

Just played a bit with string matching and it doesn’t really give you the correct value.

Snippets to play with to see my point:

${
string.match("2025-04-30", "%d%d%d%d-(%d%d)-%d%d" )
}

should match the month

${
string.match("2025-04-30", "(%d%d%d%d)-(%d%d)-(%d%d)" )
}

should give you three components but it doesn’t, it returns 20 instead.

I never anticipated how much of a pain it would be to implement a spec-compliant Lua API. There’s a test suite for the string.* APIs here: silverbullet/common/space_lua/stdlib/string_test.lua at main · silverbulletmd/silverbullet · GitHub

You can run them with deno task test common/space_lua/lua.test.ts. If you could add (failing) tests for the cases you find, that’d help (and please check that they do work in “standard” Lua). If you want to be super adventurous, the implementation of these APIs is here: silverbullet/common/space_lua/stdlib/string.ts at main · silverbulletmd/silverbullet · GitHub
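For anyone wanting to contribute such cases, here is a rough sketch of what they might look like. The expected values are what standard Lua returns; using plain assert is an assumption on my part, so adapt it to whatever helpers string_test.lua actually uses.

```lua
-- Expected results are those of standard Lua; plain assert is an assumed
-- style, not necessarily what the existing test suite uses.

-- A hyphen directly after a capture is a literal, so this yields three captures.
local y, m, d = string.match("2025-04-30", "(%d%d%d%d)-(%d%d)-(%d%d)")
assert(y == "2025" and m == "04" and d == "30")

-- Escaping the hyphen that follows %d makes the month capture work as intended.
local month = string.match("2025-04-30", "%d%d%d%d%-(%d%d)%-%d%d")
assert(month == "04")
```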


TLDR: escape your hyphens (%-)

I wrote a whole response about how - is a special character in Lua patterns, but then I discovered that Lua patterns do not support repetition characters on captures. TIL. You shouldn’t have to escape - if it does not follow a character class, so your second example is valid and should be returning year, month, day. You can still escape the - with %- to get the correct results.

Your first example is returning the correct result (no match) by accident, though. Your first example parsed as a Lua pattern is:

  • %d%d%d%d-: match three or more digits (three required digits, then %d- matches zero or more digits, as few as possible)
  • (%d%d): capture two digits (so the pattern needs at least five digits and captures the last two)
  • -: followed by a literal hyphen, because a hyphen used as a repetition does not apply to the preceding capture
  • %d%d: followed by two digits

So it is effectively matching five or more digits (capturing the last two) followed by a hyphen and two digits. It should match the string 20255-04-30 and return 55, and it should not match 2025-04-30. In silverbullet it actually matches 20255 in the former string for some reason I haven’t looked into.
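To make that breakdown concrete, this small sketch shows what the unescaped pattern does in standard Lua (not Space Lua). The variable names are just for illustration.

```lua
-- The first example's pattern, with the hyphen after %d left unescaped.
local pattern = "%d%d%d%d-(%d%d)-%d%d"

-- Only four digits precede the first hyphen, so the five-digit minimum
-- is never met and there is no match.
local no_match = string.match("2025-04-30", pattern)
assert(no_match == nil)

-- Here 202 satisfies %d%d%d, %d- matches nothing, and the capture takes 55.
local captured = string.match("20255-04-30", pattern)
assert(captured == "55")
```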

I spent more time than I should have trying to figure out what Lua patterns do and how they might be implemented more closely in Space Lua. I gave up and opened a PR for an addition to the API page about how Space Lua patterns are a bit different from standard Lua.


Hey Will, thanks for your investigation. I personally don’t have the ability to dig that deep into SB’s source code and the string-related stuff.

I can confirm that with the %- escape I get the correct year: it returns 2025. Yet I think it should return three components, so this is still not the behavior we expect. I also tried the example you show in the updated silverbullet doc, and it behaves the same way.

I’m not asking for further investigation. I just want to confirm: does the following pattern return only 2025 on your instance as well?

${
string.match("2025-04-30", "(%d%d%d%d)%-(%d%d)%-(%d%d)" )
}

I noticed this as well, and it will require some investigation to see if it is a bug or not in the way string.match values are returned. I can confirm it is returning the three values (but only printing the first):

${string.match("2024-03-14", "(%d+)%-(%d+)%-(%d+)")}
-- prints "2024"
${table.pack(string.match("2024-03-14", "(%d+)%-(%d+)%-(%d+)"))}
-- prints:
-- * 2024
-- * 03
-- * 14

This is because the ${} syntax only renders the first value even if more are returned.
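For comparison, standard Lua has a similar effect whenever a multi-value call is adjusted to a single value. The sketch below is plain Lua rather than Space Lua, and it is only an analogy for what ${} appears to do, not a description of its implementation.

```lua
-- All three captures come back as separate return values.
local year, month, day = string.match("2024-03-14", "(%d+)%-(%d+)%-(%d+)")

-- Wrapping a call in parentheses truncates its results to the first value.
local first_only = (string.match("2024-03-14", "(%d+)%-(%d+)%-(%d+)"))

-- Packing the results into a table keeps all of them (table.pack needs Lua 5.2+).
local parts = table.pack(string.match("2024-03-14", "(%d+)%-(%d+)%-(%d+)"))
local joined = table.concat(parts, "/")
```

So if you need all the components in one rendered value, packing or concatenating them yourself is one way around it.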

Ah, I see. But it can render a list from the split, which is why I was a bit confused by the result. Maybe we should unify the behavior? I think printing out all the returned values would be preferable.