#community-help

Discussing Lower-Level API Access to the Index in Typesense

TLDR Janaka asked whether lower-level APIs can expose the index tokens of a document for similarity calculations in Typesense. Jason suggested extracting keywords and running a regular search, and noted that indices are stored only in memory. The thread concluded with a discussion about prospective Typesense extensions.

Sep 29, 2021 (26 months ago)
Janaka
10:50 PM
Hi, I think the answer is no, but are there lower-level APIs to access the index? To get at the tokens of a document and metadata like frequency so I can run doc similarity calculations. My end goal is to find documents indexed in Typesense that are similar to an external document (like a random web page on the Internet).
Janaka
10:52 PM
Is any of what I'm after persisted, or is it just in memory? I've not poked around the raw files yet.
Jason
10:53 PM
Not at the moment, but this is an interesting use case. Random suggestion without too much context: are you able to extract keywords / topics from the candidate web page, and then do a regular keyword search for them in Typesense and use the results?
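
A minimal sketch of the keyword route suggested above, assuming a simple term-frequency keyword picker. The stop-word list, the `extract_keywords` and `build_search_params` helpers, and the `title,body` field names are all hypothetical; only `q` and `query_by` are actual Typesense search parameters.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; use a fuller list in practice).
STOP_WORDS = {
    "the", "a", "an", "and", "or", "of", "to", "in", "is", "for",
    "on", "with", "that", "this", "it", "as", "are", "be", "by",
}

def extract_keywords(text, top_n=5):
    """Return the most frequent non-stop-word tokens in the text."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOP_WORDS and len(t) > 2)
    return [word for word, _ in counts.most_common(top_n)]

def build_search_params(keywords, query_by="title,body"):
    """Build a search parameter dict; the field names are hypothetical."""
    return {"q": " ".join(keywords), "query_by": query_by}

page_text = (
    "Search engines index documents. A search engine ranks documents "
    "by relevance to the search query."
)
params = build_search_params(extract_keywords(page_text, top_n=3))
print(params)
```

With the Typesense Python client, a dict like `params` could then be passed to `client.collections["pages"].documents.search(params)`, where `pages` is a placeholder collection name.
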
Jason
10:53 PM
> Is any of what I'm after persisted or just in mem?
Indices are only stored in memory
Janaka
10:56 PM
Yeah, that would be possible I guess.
Janaka
10:59 PM
Might start my experiment that way.
Janaka
11:01 PM
Backup plan for similarity is to tokenise etc. myself and store the document vector in Typesense. I'll need that code to run against the external document anyway.
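
A rough sketch of that backup plan, assuming a plain bag-of-words term-frequency vector and cosine similarity. The thread does not say which tokeniser or similarity measure would be used, so `term_vector` and `cosine_similarity` are illustrative names, and how the vector is stored in Typesense is left out.

```python
import math
import re
from collections import Counter

def term_vector(text):
    """Tokenise text and return a sparse term-frequency vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)

indexed_doc = term_vector("Typesense is a fast open source search engine")
external_doc = term_vector("An open source search engine that is fast")
unrelated_doc = term_vector("Bananas grow on tropical plants")

print(cosine_similarity(indexed_doc, external_doc))
print(cosine_similarity(indexed_doc, unrelated_doc))
```

The same `term_vector` function runs on both the indexed documents and the external web page, which is the point made above about needing that code either way.
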
Jason
11:02 PM
Right
Janaka
11:02 PM
A Typesense API to the index could be an interesting extension mechanism.
Janaka
11:03 PM
I'll let you know how I get on.

Jason
11:04 PM
We had this ask earlier this year to be able to find similar docs from within Typesense: https://github.com/typesense/typesense/issues/207

That should help in your use-case as well (when we have it 😄 )
Oct 06, 2021 (25 months ago)
Janaka
05:27 PM
Jason, I only just saw your reply. Issues #207 and #130 are similar to what I'm trying to do. If I understood correctly, the doc they want to run a similarity search on is one that is already indexed. In my case, it (the web page) will not be indexed. Is that right?

If so, exposing the index via an API would allow both cases to be solved without tightly coupling the similarity algorithm to the core of Typesense. In theory that should also make it easier for others to collaborate on those algos 🙂
Jason
08:28 PM
Without having looked into it too much just yet, I'd imagine that you would have to convert the doc (or webpage) to at least structurally resemble the docs you've already indexed (similar fields) to find similar docs.
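
One way to picture that conversion, as a sketch only: parse the web page into a dict with the same fields as the indexed docs. The `title` and `body` field names and the `webpage_to_doc` helper are assumptions for illustration; a real collection schema may differ.

```python
from html.parser import HTMLParser

class PageToDoc(HTMLParser):
    """Collect <title> text and visible text, skipping script/style content."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._skip_depth = 0
        self.title_parts = []
        self.body_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if not text or self._skip_depth:
            return
        if self._in_title:
            self.title_parts.append(text)
        else:
            self.body_parts.append(text)

def webpage_to_doc(html):
    """Map raw HTML onto hypothetical 'title' and 'body' fields."""
    parser = PageToDoc()
    parser.feed(html)
    parser.close()
    return {
        "title": " ".join(parser.title_parts),
        "body": " ".join(parser.body_parts),
    }

sample_html = (
    "<html><head><title>Hello</title><style>p{}</style></head>"
    "<body><p>Some text</p><script>var x=1;</script></body></html>"
)
print(webpage_to_doc(sample_html))
```

The resulting dict has the same shape as an indexed document, so the same tokenising or keyword-extraction code can run on both.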