j
Coming soon!
w
So this will be really helpful for: (1) importing documents faster, and (2) scaling search request volume. Is that about right?
j
Exactly, specifically when using built-in models
w
Right. And I guess it would also enable adding some of the other, larger models as well? One of our Typesense collections contains documents for entire research PDFs (in text format). e5-small's max token length is 512, right? So about a written page. Ideally we could choose a larger token length for this collection than for the others.
k
Most embedding models only support token lengths of 512 because beyond that the meaning of the embedding gets diluted: you have only a few hundred dimensions to encode the semantic meaning of the data. So to embed large texts you have to split them up.
E.g., 1 page per document, with a `parent_doc_id` field so that results can be grouped at query time.
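The split-then-regroup idea above could look something like this minimal sketch. Note the names (`chunkDocument`, `groupByParent`) and the word-budget heuristic are illustrative assumptions, not a real client API, and the real limit depends on the model's tokenizer, not a word count:

```typescript
// Hypothetical sketch: split a long document into roughly page-sized chunks
// that each fit within an embedding model's 512-token limit, tagging every
// chunk with a parent_doc_id so search hits can be regrouped per source doc.

interface Chunk {
  parent_doc_id: string;
  chunk_index: number;
  text: string;
}

// Naive word-based splitter. Real token counts depend on the model's
// tokenizer, so the 512-token limit is approximated here by a word budget.
function chunkDocument(docId: string, text: string, maxWords = 400): Chunk[] {
  const words = text.split(/\s+/).filter(Boolean);
  const chunks: Chunk[] = [];
  for (let i = 0; i < words.length; i += maxWords) {
    chunks.push({
      parent_doc_id: docId,
      chunk_index: chunks.length,
      text: words.slice(i, i + maxWords).join(" "),
    });
  }
  return chunks;
}

// Regroup chunk-level search hits into one bucket per parent document,
// mirroring what a group-by-parent_doc_id query would do server-side.
function groupByParent(hits: Chunk[]): Map<string, Chunk[]> {
  const grouped = new Map<string, Chunk[]>();
  for (const hit of hits) {
    const list = grouped.get(hit.parent_doc_id) ?? [];
    list.push(hit);
    grouped.set(hit.parent_doc_id, list);
  }
  return grouped;
}
```

Each chunk is then indexed as its own document, and at query time the chunk hits are collapsed back to their parent so one long PDF doesn't crowd out the result list.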
w
Ok, thank you, that makes sense.