Hi, I think the answer is no, but are there lower ...
# community-help
j
Hi, I think the answer is no, but are there lower-level APIs to access the index? I'd like to get at the tokens of a document and metadata like term frequency so I can run doc similarity calculations. My end goal is to find documents indexed in Typesense that are similar to an external document (like a random web page on the Internet).
Is any of what I'm after persisted or just in mem? I've not poked around the raw files yet.
j
Not at the moment, but this is an interesting use case. Random suggestion without too much context: are you able to extract keywords / topics from the candidate web page, and then do a regular keyword search for them in Typesense and use the results?
> Is any of what I'm after persisted or just in mem?
Indices are only stored in memory.
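For illustration, here's a rough sketch of that keyword idea, not anything built into Typesense. It picks the most frequent non-stopword terms from the page text and turns them into search parameters. `q` and `query_by` are the standard Typesense search parameters; the field names `title,body`, the tiny stopword list, and the helper names are all assumptions made up for this example.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption); a real one would be much longer.
STOPWORDS = {
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "from",
    "in", "is", "it", "of", "on", "or", "that", "the", "this", "to", "with",
}


def extract_keywords(text, top_n=5):
    """Return the top_n most frequent tokens that are not stopwords."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS and len(t) > 2)
    return [word for word, _ in counts.most_common(top_n)]


def build_search_params(text, query_by="title,body", top_n=5):
    """Build a search-parameter dict from the page text.

    The query_by default is hypothetical; use the fields of your own collection.
    """
    return {"q": " ".join(extract_keywords(text, top_n)), "query_by": query_by}
```

The resulting dict could then be passed to a regular search on the collection, and the hits treated as the "similar" candidates.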
j
Yeah that would be possible I guess.
Might start my experiment that way.
Backup plan for similarity is to tokenise etc. myself and store the document vector in Typesense. I'll need that code to run against the external document anyway.
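A minimal sketch of what that backup plan might look like, assuming plain term-frequency vectors and cosine similarity (one possible choice of similarity measure, not something settled in this thread). The `stored_vectors` mapping stands in for vectors that were saved alongside each document and fetched back; how they'd be stored in Typesense is left open here.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Tokenise text into lowercase terms and count their frequencies."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two term-frequency mappings."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(external_text, stored_vectors):
    """Rank stored documents by similarity to an external document.

    stored_vectors is a hypothetical {doc_id: {term: count}} mapping.
    """
    query = term_vector(external_text)
    scored = [
        (doc_id, cosine_similarity(query, vec))
        for doc_id, vec in stored_vectors.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

The same `term_vector` function runs on both the indexed docs and the external page, which is the point about needing that code either way.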
j
Right
j
A Typesense API to the index could be an interesting extension mechanism.
I'll let you know how I get on.
j
We had this ask earlier this year to be able to find similar docs from within Typesense: https://github.com/typesense/typesense/issues/207 That should help in your use case as well (once we have it 😄)
j
@Jason Bosco only just saw your reply. Issues #207 and #130 are similar to what I'm trying to do. If I understood correctly, the doc they want to run a similarity search on is one that is already indexed. In my case, it will not be indexed (the web page). Is that right? If so, exposing the index via an API would allow both cases to be solved without tightly coupling the similarity algorithm to the core of Typesense. In theory that should also make it easier for others to collaborate on those algos 🙂
j
Without having looked into it too much just yet, I'd imagine that you would have to convert the doc (or webpage) to at least structurally resemble the docs you've already indexed (similar fields) to find similar docs.