# community-help
a
Hi guys, sorry for asking before doing my due diligence, but I'm moving from Meilisearch to Typesense and would like to know this: in Meilisearch (MS), you can ask for info about the matches in the retrieved results (position in the indexed string, length, etc.). Is there a similar feature in Typesense (TS)?
k
Typesense response JSON will contain the matched tokens. So far we have not had any use case where the position of a match is useful but happy to learn more.
a
Sure, check out this:
message has been deleted
It's quite useful if you run your own UI and have your own code for highlighting.
k
I see. So you use those indices to wrap the text with your own highlighting?
And, any reason for preferring that over specifying custom highlight start and end tag options that Typesense provides?
a
In my particular case, on the UI, they are separate span elements with some events attached (i.e. they are clickable, etc.).
k
Can you give me a quick example snippet of how the HTML will look?
a
I could prob. do the same with parsing, but it's easier that way.
Wait, let me find one.
message has been deleted
k
Sorry I meant the HTML used for those highlights.
a
HTML looks like:
message has been deleted
But those <span>s have events attached
That I attach w/ JavaScript. I don't actually output HTML, but rather create the elements and append them to the parent div.
k
Okay, got it. I'm trying to see if there is an easy workaround for now. One thing you can do is set `highlight_start_tag` to `<mark class="foo">` and then attach the event handlers to the `.foo` elements?
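A minimal sketch of that workaround, for reference. `highlight_start_tag` and `highlight_end_tag` are the Typesense search parameters discussed here; the helper names (`highlightParams`, `splitHighlighted`) and the `Segment` type are hypothetical, made up for illustration. The idea is to split the highlighted snippet into plain and matched segments, so each match can become its own element with its own handlers.

```typescript
// Hypothetical helpers; only highlight_start_tag / highlight_end_tag come
// from the thread. Everything else is an assumption for illustration.
type Segment = { text: string; isMatch: boolean };

const START_TAG = '<mark class="foo">';
const END_TAG = "</mark>";

// Search parameters asking Typesense to wrap matches in the custom tags.
function highlightParams(q: string, queryBy: string): Record<string, string> {
  return {
    q,
    query_by: queryBy,
    highlight_start_tag: START_TAG,
    highlight_end_tag: END_TAG,
  };
}

// Split a highlighted snippet into plain and matched segments.
function splitHighlighted(
  snippet: string,
  start: string = START_TAG,
  end: string = END_TAG
): Segment[] {
  const segments: Segment[] = [];
  let pos = 0;
  while (pos < snippet.length) {
    const open = snippet.indexOf(start, pos);
    if (open === -1) {
      // No more matches: the rest is plain text.
      segments.push({ text: snippet.slice(pos), isMatch: false });
      break;
    }
    if (open > pos) {
      segments.push({ text: snippet.slice(pos, open), isMatch: false });
    }
    const close = snippet.indexOf(end, open + start.length);
    if (close === -1) {
      // Unbalanced start tag: treat the remainder as plain text.
      segments.push({ text: snippet.slice(open), isMatch: false });
      break;
    }
    segments.push({
      text: snippet.slice(open + start.length, close),
      isMatch: true,
    });
    pos = close + end.length;
  }
  return segments;
}
```

In the browser, each segment with `isMatch: true` could then be turned into a `span` via `document.createElement("span")`, given its listeners with `addEventListener`, and appended to the parent div, which matches the "create the elements and append them" approach described above.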
a
Yes, I thought of a similar thing as well.
Anyway, if you could provide such info in the future it would be nice, it's probably not a big deal to put it there as your engine already knows the locations.
I'll give Typesense a try now and let you know if something comes out, thanks :D
k
The actual reason for not exposing offsets is that they can get confusing when Unicode is involved, especially for non-Latin languages: the numbers differ depending on whether the encoding is UTF-8, UTF-16, etc.
a
I actually ran into that issue and asked them (MS) about it but got ignored. So, what you can do is just provide the offsets on the "buffer" of the string.
k
So we took a middle ground of exposing just the matched tokens, as find-and-replace seemed easier: you then don't have to worry about byte offsets vs. Unicode code points, etc.
👍 1
a
Yeah, that's true.
k
Typesense uses UTF-8, so we can expose those offsets, but depending on the language and client used, the encoding of the parsed response is not certain. Seemed like a can of worms 🙂
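To make the "can of worms" concrete, here is a small sketch showing that the same match has three different offsets depending on the unit used. The function names are hypothetical; the only platform API used is the standard `TextEncoder`, which always encodes to UTF-8.

```typescript
// Count Unicode code points by skipping low surrogates (the second half
// of a UTF-16 surrogate pair).
function countCodePoints(s: string): number {
  let n = 0;
  for (let i = 0; i < s.length; i++) {
    const c = s.charCodeAt(i);
    if (c < 0xdc00 || c > 0xdfff) n++;
  }
  return n;
}

// Offsets of the first occurrence of `token` in `text`, in three units.
function matchOffsets(
  text: string,
  token: string
): { utf16: number; utf8: number; codePoints: number } | null {
  const utf16 = text.indexOf(token); // JS string index = UTF-16 code units
  if (utf16 === -1) return null;
  const prefix = text.slice(0, utf16);
  return {
    utf16,
    utf8: new TextEncoder().encode(prefix).length, // UTF-8 bytes
    codePoints: countCodePoints(prefix), // Unicode code points
  };
}

// "héllo 😀 world": é takes 2 bytes in UTF-8, the emoji takes 4 bytes in
// UTF-8 and 2 code units in UTF-16.
const offsets = matchOffsets("h\u00e9llo \uD83D\uDE00 world", "world");
// offsets is { utf16: 9, utf8: 12, codePoints: 8 }
```

A client that receives a UTF-8 byte offset of 12 but indexes a JavaScript string directly would land three code units past the start of the match, which is the kind of mismatch being described.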
a
If you provide the offset using the raw buffer as reference, it's super easy to split the text (in a browser, using JavaScript).
Just for reference, it's something like:
new TextDecoder().decode(new TextEncoder().encode(<the text>).slice(from, to));
And it works.
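A slightly fuller sketch of that idea, assuming the engine returned UTF-8 byte offsets `from` and `to` (Typesense does not expose these at the time of this conversation, so the offsets here are hypothetical). `TextEncoder` always produces UTF-8 bytes; the sliced bytes then need a `TextDecoder` to become a string again.

```typescript
// Extract the substring covered by UTF-8 byte offsets [from, to).
function sliceByUtf8Offsets(text: string, from: number, to: number): string {
  const bytes = new TextEncoder().encode(text); // always UTF-8
  return new TextDecoder("utf-8").decode(bytes.slice(from, to));
}

// Split text into [before, match, after] around a byte-offset range, which
// is the shape needed to build separate elements for a custom highlighter.
function splitAtUtf8Range(
  text: string,
  from: number,
  to: number
): [string, string, string] {
  const bytes = new TextEncoder().encode(text);
  const decoder = new TextDecoder("utf-8");
  return [
    decoder.decode(bytes.slice(0, from)),
    decoder.decode(bytes.slice(from, to)),
    decoder.decode(bytes.slice(to)),
  ];
}

// "héllo 😀 world": "world" starts at byte 12 and ends at byte 17.
const parts = splitAtUtf8Range("h\u00e9llo \uD83D\uDE00 world", 12, 17);
// parts[1] is "world"
```

One caveat: if an offset falls in the middle of a multi-byte character, the decoder substitutes U+FFFD replacement characters rather than throwing, so the offsets must come from the same UTF-8 buffer the engine indexed.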
k
Got it. Thanks. I will add to our backlog. I think we should be able to expose it.
a
Thanks to you Kishore, you've been very kind.
I have another small question, now that I know you are the co-founder 😄
k
No problem. Sometimes, we think hard about whether a new feature can sufficiently be replaced with another (highlight tags in this case), since each feature adds a bit of bloat and complexity.
👍 1
a
Do you plan to add backups to hosted Typesense?
That would be a killer feature for me 😮
k
We already have a snapshot endpoint that can back up the whole Typesense DB, so it should not be too difficult to add that. What would your backup-related workflow look like? Do you, for example, want to take a snapshot, then shut down the instance, and then restart it? Or do you want to be able to export the data?
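For a self-hosted server, the snapshot endpoint mentioned here is, to the best of my knowledge, `POST /operations/snapshot` with a `snapshot_path` query parameter and the `X-TYPESENSE-API-KEY` header. The sketch below only builds the request description; the function name, host, key, and path are placeholders, and how the hosted service would expose this is not covered in the thread.

```typescript
// Shape of the request to send; names here are illustrative only.
type SnapshotRequest = {
  url: string;
  method: "POST";
  headers: Record<string, string>;
};

// Build (but do not send) a snapshot request for a Typesense server.
// snapshotPath is a directory on the server's own filesystem.
function buildSnapshotRequest(
  baseUrl: string,
  apiKey: string,
  snapshotPath: string
): SnapshotRequest {
  const url = new URL("/operations/snapshot", baseUrl);
  url.searchParams.set("snapshot_path", snapshotPath);
  return {
    url: url.toString(),
    method: "POST",
    headers: { "X-TYPESENSE-API-KEY": apiKey },
  };
}

// Placeholder values; replace with a real host, admin key, and path.
const snapshotReq = buildSnapshotRequest(
  "http://localhost:8108",
  "xyz",
  "/tmp/typesense-data-snapshot"
);
```

The resulting object could be passed to `fetch(snapshotReq.url, { method: snapshotReq.method, headers: snapshotReq.headers })`. Since the snapshot is written to the server's disk, copying it to external storage would be a separate step.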
a
Ok, but if the instance dies, for whatever reason, will the snapshot go down with it?
k
No, it will be stored externally.
a
What I would like to have is some sort of daily backup/snapshot, and being able to rollback/restore to it.
Maybe a daily one, stored for a month.
k
From a compliance or HA perspective?
a
You nailed it, compliance.
k
The wounds from implementing SOC-2 at a previous job are still raw 🙂
a
Haha
It's not SOC-2 here, but similar, yes
If you provide that I wouldn't mind paying extra, at all
k
I will be happy to prioritize this feature. We just launched a UI for browsing collections and searching on data. Next on the roadmap is self-serve for version upgrades, which is currently not available on the self-serve interface. We will be happy to tackle backups as part of that. Maybe begin with offering snapshots, and then add a way to restore them as well.
a
That would be super nice!
Well, thanks for the convo, Kishore. I'll start my workday now. Best wishes, I'll keep an eye on Typesense.
k
You're welcome. I've bookmarked this thread and I will ping you when we have snapshots ready! 👋
🙌 1