Typesense Filter Bug Involving Similar Facets
TLDR SamHendley reported a bug in Typesense where filtering by facet returns wrong documents, providing a reproduction case. Jason and Kishore Nallan recognized the issue, tracked it on GitHub, and implemented a fix in a new Docker build.
Nov 17, 2022 (11 months ago)
SamHendley
04:21 PM
filter_by for a facet sometimes gives me the wrong documents if it has the right combination of similar facets.
SamHendley
04:22 PM
curl "" \
-X DELETE \
-H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}"
curl "" \
-X POST \
-H "Content-Type: application/json" \
-H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
-d '{
"name": "companies",
"fields": [
{"name": "company_name", "type": "string" },
{"name": "capability", "type": "string[]", "facet": true}
]
}'
curl "" \
-H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
-H "Content-Type: text/plain" \
-X POST \
-d '
{"id": "125","company_name": "Company1", "capability": ["Encoding capabilities for network communications", "Obfuscation capabilities"]}
{"id": "126","company_name": "Company2", "capability": ["Encoding capabilities for network communications"]}
{"id": "127","company_name": "Company3", "capability": ["Obfuscation capabilities"]}
{"id": "128","company_name": "Company4", "capability": ["Encoding capabilities"]}
'
# For this search, only Company4 actually has the expected facet value, but the
# search returns Company1 as well, despite it not having the filter value.
# The bug appears to need both of Company1's non-matching facet values; removing either one stops it from occurring.
curl "" \
-X POST \
-H "Content-Type: application/json" \
-H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
-d '{
"searches": [
{
"collection": "companies",
"query_by": "company_name",
"q": "",
"filter_by": "capability:=[`Encoding capabilities`]",
"facet_by": "capability"
}
]
}'
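The expected result of the exact-match filter can be checked locally, without Typesense. The sketch below is only an illustration: it reuses the four documents from the import above and uses a fixed-string grep on the quoted value as a stand-in for exact matching on array elements. The variable names docs and matches are hypothetical and do not come from the thread.

```shell
# Local sanity check (no Typesense involved): which documents contain the
# exact array element "Encoding capabilities"?
docs='{"id": "125","company_name": "Company1", "capability": ["Encoding capabilities for network communications", "Obfuscation capabilities"]}
{"id": "126","company_name": "Company2", "capability": ["Encoding capabilities for network communications"]}
{"id": "127","company_name": "Company3", "capability": ["Obfuscation capabilities"]}
{"id": "128","company_name": "Company4", "capability": ["Encoding capabilities"]}'

# Keeping the surrounding double quotes in the fixed-string pattern makes grep
# match the whole element rather than a prefix of a longer value.
matches=$(printf '%s\n' "$docs" \
  | grep -F '"Encoding capabilities"' \
  | sed 's/.*"company_name": "\([^"]*\)".*/\1/')

echo "$matches"   # prints: Company4
```

Only Company4 should come back, which is the result the filter_by search is expected to produce as well.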
Jason
04:24 PM
Could you try
"filter_by": "capability:=[Encoding capabilities]"
just to see if the bug is with the backtick escaping mechanism?
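The two variants being compared differ only in the filter_by string. A minimal sketch of building both request bodies side by side follows. The make_payload helper is hypothetical and is not part of Typesense or the thread; everything else in the body is copied from the reproduction above.

```shell
# Hypothetical helper: prints the search request body for a given filter_by value.
make_payload() {
  printf '{"searches": [{"collection": "companies", "query_by": "company_name", "q": "", "filter_by": "%s", "facet_by": "capability"}]}\n' "$1"
}

# Original report: filter value wrapped in backticks.
escaped=$(make_payload 'capability:=[`Encoding capabilities`]')
# Suggested variant: same filter value without the backticks.
unescaped=$(make_payload 'capability:=[Encoding capabilities]')

echo "$escaped"
echo "$unescaped"
```

Each body could then be passed with -d to the same search request shown in the reproduction. If both return Company1, the backtick escaping is presumably not the cause.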
Nov 23, 2022 (11 months ago)
Kishore Nallan
11:16 AM
typesense/typesense:0.24.0.rcn35 Docker build.