Solving Keyword Search in Document Chunks with Typesense
TLDR Dima was struggling with keyword search in divided document chunks. Kishore Nallan resolved the issue by suggesting adding a 'para_num' integer field to sorting criteria and trying the updated 0.24 RC builds.
Jan 12, 2023
I have large text documents with a title and content, which I divided into manageable chunks (~200 words each). I'm running the search against title and content and then grouping the results by document_id, so I get no more than one chunk per document. When the keyword is in the content, it works pretty well: I get only the matching chunk in the response. But when only the title contains the keyword, I get a random chunk from the document, usually the last one loaded into the index. How can I make sure that when the keyword isn't found in the content I get the first chunk of the document, and not a random one?
I have also tried adding the chunk number as a sorting parameter, but found that sort_by isn't working with 3+ parameters: https://github.com/typesense/typesense/issues/634
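For reference, the grouped query described above could be expressed roughly as follows. This is only a sketch: `build_grouped_search` is a hypothetical helper, and the collection name `chunks` in the comment is an assumption, since the thread does not name the collection. `q`, `query_by`, `group_by` and `group_limit` are standard Typesense search parameters.

```python
def build_grouped_search(query: str) -> dict:
    """Build search parameters that return at most one chunk per document.

    Hypothetical helper for illustration; not taken from the thread.
    """
    return {
        "q": query,
        # Search both fields, as described in the question.
        "query_by": "title,content",
        # document_id must be a facet field for group_by to work.
        "group_by": "document_id",
        # Keep only the top-ranked chunk of each document.
        "group_limit": 1,
    }


params = build_grouped_search("typesense")

# With the official Python client the parameters could be passed like this
# (the collection name "chunks" is an assumption):
# client.collections["chunks"].documents.search(params)
```

With `group_limit` set to 1, which chunk represents a document depends entirely on how the chunks rank within the group, which is why the sort order matters here.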
Kishore Nallan 09:09 AM
```
fields:
  - name: object_id
    type: string
    index: true
  - name: document_id
    type: string
    index: true
    facet: true
  - name: title
    type: string
    index: true
  - name: pageviews
    type: int32
    index: true
    sort: true
```
Kishore Nallan 09:22 AM
```
fields:
  - name: object_id
    type: string
    index: true
  - name: document_id
    type: string
    index: true
    facet: true
  - name: title
    type: string
    index: true
  - name: pageviews
    type: int32
    index: true
    sort: true
  - name: content
    type: string
    index: true
```
Kishore Nallan 09:23 AM
> But when only title contains keyword I got random chunk from document, usually last loaded into index.
Kishore Nallan 09:27 AM
You can add a `para_num` integer field and add this to the sorting criteria. The 3-way sorting issue that you posted about is fixed in recent 0.24 RC builds. You can use them (many people already use them in production). We will soon be releasing it fully.
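A minimal sketch of that suggestion, under some assumptions: the helper names `chunk_document` and `build_search` are hypothetical, the chunking is a plain word split, and the exact sort order (`_text_match`, then `pageviews`, then `para_num`) is a guess at the three-way sort being discussed rather than something stated in the thread.

```python
from typing import Dict, List

# Extra schema field for the chunk position (int32 is one of Typesense's integer types).
PARA_NUM_FIELD = {"name": "para_num", "type": "int32"}


def chunk_document(document_id: str, title: str, content: str,
                   words_per_chunk: int = 200) -> List[Dict]:
    """Split content into ~200-word chunks, numbering each with para_num."""
    words = content.split()
    chunks = []
    for para_num, start in enumerate(range(0, len(words), words_per_chunk)):
        chunks.append({
            "object_id": f"{document_id}-{para_num}",
            "document_id": document_id,
            "title": title,
            "content": " ".join(words[start:start + words_per_chunk]),
            "para_num": para_num,  # 0 for the first chunk of the document
        })
    return chunks


def build_search(query: str) -> Dict:
    """Grouped search that breaks ties in favour of the earliest chunk."""
    return {
        "q": query,
        "query_by": "title,content",
        "group_by": "document_id",
        "group_limit": 1,
        # Assumed ordering: relevance first, then popularity, then chunk position.
        "sort_by": "_text_match:desc,pageviews:desc,para_num:asc",
    }


chunks = chunk_document("doc1", "Example title", " ".join(["word"] * 450))
search_params = build_search("example")
```

The idea is that when only the title matches, every chunk of a document should score about the same on the first two criteria, so `para_num:asc` decides which chunk represents the group. This relies on three sort fields working, which is why the 0.24 RC builds are needed.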