Solving Keyword Search in Document Chunks with Typesense
TLDR Dima was struggling with keyword search in divided document chunks. Kishore Nallan resolved the issue by suggesting adding a 'para_num' integer field to sorting criteria and trying the updated 0.24 RC builds.
Jan 12, 2023 (9 months ago)
Dima
08:55 AM
I have large text documents with a title and content, which I divided into manageable chunks (~200 words each). I'm running the search against title and content and then grouping the results by document_id, so I get no more than one chunk per document. When the keyword is in the content, it works pretty well: I get only the matching chunk in the response. But when only the title contains the keyword, I get a random chunk from the document, usually the last one loaded into the index. How can I make sure that when the keyword isn't found in the content, I get the first chunk of the document and not a random one?
Dima
09:01 AM
I have also tried adding the chunk number as a sorting parameter, but found that sort_by doesn't work with 3 or more parameters: https://github.com/typesense/typesense/issues/634
Kishore Nallan
09:09 AM
Dima
09:21 AM
```
fields:
  - name: object_id
    type: string
    index: true
  - name: document_id
    type: string
    index: true
    facet: true
  - name: title
    type: string
    index: true
  - name: pageviews
    type: int32
    index: true
    sort: true
```
```
"query_by": "title,content",
"sort_by": "_text_match:desc,pageviews:desc",
"group_by": "document_id",
"group_limit": 1,
"highlight_full_fields": "title,content,path",
```
Kishore Nallan
09:22 AM
`content` is missing?
Dima
09:22 AM
Fixed one:
```
fields:
  - name: object_id
    type: string
    index: true
  - name: document_id
    type: string
    index: true
    facet: true
  - name: title
    type: string
    index: true
  - name: pageviews
    type: int32
    index: true
    sort: true
  - name: content
    type: string
    index: true
```
```
"query_by": "title,content",
"sort_by": "_text_match:desc,pageviews:desc",
"group_by": "document_id",
"group_limit": 1,
"highlight_full_fields": "title,content",
```
Kishore Nallan
09:23 AM
Kishore Nallan
09:23 AM
> But when only title contains keyword I got random chunk from document, usually last loaded into index.
Dima
09:24 AM
Kishore Nallan
09:25 AM
Dima
09:25 AM
Kishore Nallan
09:27 AM
Add a `para_num` integer field and add this to the sorting criteria. The 3-way sorting issue that you posted about is fixed in recent 0.24 RC builds. You can use them (many people already use them in production). We will soon be releasing it fully.
Dima
09:29 AM
Kishore Nallan
09:31 AM
Dima
10:34 AM
Kishore Nallan
10:36 AM
Dima
10:36 AM
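The suggestion in this thread (sort by a `para_num` field as a third criterion) could look roughly like the sketch below. It is an illustration under assumptions, not the exact setup used in the thread: the `para_num:asc` direction, the placeholder query, the sample hits, and the `pick_chunk_per_document` helper are all hypothetical, and the helper only imitates sort-then-group locally with made-up scores rather than calling Typesense.

```python
# Search parameters from the thread, extended with a third sort criterion.
# "para_num:asc" is an assumption: ascending order puts the first chunk on top.
search_params = {
    "q": "keyword",  # placeholder query, not from the thread
    "query_by": "title,content",
    "sort_by": "_text_match:desc,pageviews:desc,para_num:asc",
    "group_by": "document_id",
    "group_limit": 1,
}


def pick_chunk_per_document(hits):
    """Hypothetical local imitation of sort_by + group_by with group_limit 1.

    Sorts by text match (desc), pageviews (desc), para_num (asc) and keeps
    the first hit seen for each document_id.
    """
    ranked = sorted(
        hits,
        key=lambda h: (-h["text_match"], -h["pageviews"], h["para_num"]),
    )
    first_per_doc = {}
    for hit in ranked:
        first_per_doc.setdefault(hit["document_id"], hit)
    return first_per_doc


# Made-up hits: doc1 matches only in the title, so every chunk ties on score;
# doc2 has one chunk (para_num 3) whose content also matches.
hits = [
    {"document_id": "doc1", "para_num": 2, "text_match": 10, "pageviews": 5},
    {"document_id": "doc1", "para_num": 0, "text_match": 10, "pageviews": 5},
    {"document_id": "doc1", "para_num": 1, "text_match": 10, "pageviews": 5},
    {"document_id": "doc2", "para_num": 0, "text_match": 10, "pageviews": 9},
    {"document_id": "doc2", "para_num": 3, "text_match": 20, "pageviews": 9},
]

best = pick_chunk_per_document(hits)
for doc_id, hit in best.items():
    print(doc_id, "-> chunk", hit["para_num"])
```

With a title-only match all chunks of a document tie on the first two criteria, so the `para_num` tiebreaker decides which chunk represents the group; a chunk whose content matches still wins on text match score. Each chunk would also need a `para_num` value (for example an int32 field) at indexing time.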