Production Typesense Issue with Unexpected Filter Behavior
TLDR Ankit flagged a problem with filtering on one specific field on a production Typesense server. After several exchanges about facet optimizations and version checks, Kishore Nallan provided the latest builds for troubleshooting. The `facet_query` issue persists, and potential edge cases are being investigated.
Aug 02, 2023 (4 months ago)
Ankit
12:04 PM
The index has 600K records with 30+ facets and takes up about 4GB of RAM when restarted/reindexed.
There is a field of type `string[]`, with optional, facet, and index all set to true. Suddenly, filtering on this one specific field stopped working as expected. This array field has only 3 expected values, and the responses to filters using just these 3 options were inconsistent. All other array-type filters were working as expected. The only thing that helped fix it was restarting the Typesense service, which led to a reindex.
We also tried replicating it on a smaller dataset and weren't able to do so. The production server has been live for 2-3 months and hadn't been restarted.
Have there been other cases like this, where some kind of data/index corruption led to filters not working? Is a regular restart recommended?
Let me know if any other information is needed to debug this as it's been hard to recreate it on our end.
Kishore Nallan
12:16 PM
Ankit
12:17 PM
Have facet optimizations made it into any 0.25 RCs?
Kishore Nallan
12:18 PM
Ankit
12:20 PM
But have you heard of any such issue before, or needed restarts to fix it?
Kishore Nallan
12:21 PM
Ankit
12:22 PM
Also, any other updates on facet optimizations when using a high number of facets, like in our use case?
Aug 03, 2023 (4 months ago)
Kishore Nallan
06:10 AM
`0.26.0.rc9` -- this also contains the facet optimizations.
Ankit
10:40 AM
Will try it out.
Ankit
01:01 PM
Kishore Nallan
01:02 PM
Ankit
01:06 PM
curl 'URL' \
-H 'authority: URL' \
-H 'accept: application/json, text/plain, */*' \
-H 'accept-language: en-US,en-IN;q=0.9,en;q=0.8' \
-H 'content-type: text/plain' \
-H 'origin: ' \
-H 'referer: ' \
-H 'sec-ch-ua: "Not/A)Brand";v="99", "Google Chrome";v="115", "Chromium";v="115"' \
-H 'sec-ch-ua-mobile: ?0' \
-H 'sec-ch-ua-platform: "Linux"' \
-H 'sec-fetch-dest: empty' \
-H 'sec-fetch-mode: cors' \
-H 'sec-fetch-site: cross-site' \
-H 'user-agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36' \
--data-raw '{"searches":[{"query_by":"full_name","min_len_2typo":8,"split_join_tokens":"always","sort_by":"update_time:asc","per_page":0,"highlight_full_fields":"full_name","collection":"people_v2","q":"*","facet_by":"location_state","max_facet_values":20,"page":1,"facet_query":"location_state:new"}]}' \
--compressed
Response:
[
  {
    "facet_counts": [
      {
        "counts": [],
        "field_name": "location_state",
        "sampled": false,
        "stats": {
          "total_values": 0
        }
      }
    ],
    "found": 4517283,
    "hits": [],
    "out_of": 4517283,
    "page": 1,
    "request_params": {
      "collection_name": "people_v2",
      "per_page": 0,
      "q": "*"
    },
    "search_cutoff": false,
    "search_time_ms": 1772
  }
]
Searching for "New" in state. Should return values.
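The symptom in the response above (a nonzero `found` next to an empty `counts` array for the faceted field) can be checked programmatically. The following is a minimal Python sketch, assuming the response has the shape pasted above; `empty_facets` is a hypothetical helper name and not part of any Typesense client.

```python
def empty_facets(result):
    """Return the names of facet fields that came back with no counts
    even though the search itself matched documents."""
    if result.get("found", 0) == 0:
        return []  # nothing matched, so empty facets are expected
    return [
        facet["field_name"]
        for facet in result.get("facet_counts", [])
        if not facet.get("counts")
    ]


# Trimmed copy of the response pasted above.
response = [
    {
        "facet_counts": [
            {
                "counts": [],
                "field_name": "location_state",
                "sampled": False,
                "stats": {"total_values": 0},
            }
        ],
        "found": 4517283,
        "hits": [],
        "out_of": 4517283,
    }
]

suspect = [name for result in response for name in empty_facets(result)]
print(suspect)
```

An empty facet is only suspicious when the facet query is known to match stored values, as "new" should for state names here; a query that genuinely matches nothing would produce the same shape.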
Kishore Nallan
01:12 PM
Ankit
01:12 PM
Kishore Nallan
01:13 PM
Ankit
01:13 PM
Ankit
02:04 PM
`0.26.0.rc1`
The facet_query works as expected.
Kishore Nallan
02:16 PM
Ankit
02:19 PM
Kishore Nallan
02:22 PM
Ankit
02:23 PM
`0.26.0.rc1` or should we switch to a 0.25 rc version?
Kishore Nallan
02:24 PM
Ankit
02:24 PM
Ankit
02:24 PM
Kishore Nallan
02:25 PM
Kishore Nallan
02:25 PM
Ankit
02:26 PM
Kishore Nallan
02:26 PM
Aug 04, 2023 (4 months ago)
Kishore Nallan
01:32 PM
`0.26.0.rc10`
Ankit
01:47 PM
Ankit
03:11 PM
But the facet_query still doesn't work. It's the same behavior as rc9.
Kishore Nallan
03:12 PM
Ankit
03:15 PM
This should be reproducible on any data set. Do you still need me to share specific data?
Kishore Nallan
03:20 PM
Ankit
03:23 PM
What would be the best way to do that?
We have a Typesense cluster on cloud as well. Do you want to upgrade that so I can create the scenario there?
Or I can share my schema, sample data, and the curl request written out here.
Kishore Nallan
03:25 PM
Ankit
03:43 PM
INDEX_SCHEMA = {
    "enable_nested_fields": True,
    "fields": [{
        "name": "location_state",
        "type": "string",
        "optional": True,
        "index": True,
        "facet": True
    }]
}
// DATA
[
  {
    "location_state": "New Hampshire"
  },
  {
    "location_state": "New Jersey"
  },
  {
    "location_state": "New Mexico"
  },
  {
    "location_state": "New York"
  }
]
//REQUEST
curl 'XXX' \
-H 'authority: XXX' \
-H 'accept: application/json, text/plain, */*' \
-H 'accept-language: en-US,en-IN;q=0.9,en;q=0.8' \
-H 'content-type: text/plain' \
-H 'origin: XXX' \
-H 'referer: XXX' \
-H 'sec-ch-ua: "Not/A)Brand";v="99", "Google Chrome";v="115", "Chromium";v="115"' \
-H 'sec-ch-ua-mobile: ?0' \
-H 'sec-ch-ua-platform: "Linux"' \
-H 'sec-fetch-dest: empty' \
-H 'sec-fetch-mode: cors' \
-H 'sec-fetch-site: cross-site' \
-H 'user-agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36' \
--data-raw '{"searches":[{"exhaustive_search":true,"query_by":"location_state","highlight_full_fields":"location_state","collection":"facet_state_test","q":"*","facet_by":"location_state","max_facet_values":10,"page":1,"per_page":0,"facet_query":"location_state:New"}]}' \
--compressed
//RESPONSE
{
  "results": [
    {
      "facet_counts": [
        {
          "counts": [],
          "field_name": "location_state",
          "sampled": false,
          "stats": {
            "total_values": 0
          }
        }
      ],
      "found": 4,
      "hits": [],
      "out_of": 4,
      "page": 1,
      "request_params": {
        "collection_name": "facet_state_test",
        "per_page": 0,
        "q": "*"
      },
      "search_cutoff": false,
      "search_time_ms": 0
    }
  ]
}
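For anyone trying to reproduce this, the request body above can be reduced to the search parameters alone, without the browser headers. The sketch below builds that payload in Python and, for comparison, computes which facet values one would expect back. `build_search` and `expected_facet_values` are hypothetical helpers, and the matching rule (case-insensitive prefix match on any word of the value) is a simplifying assumption for illustration, not Typesense's actual implementation.

```python
import json

# The four sample documents from the reproduction above.
STATES = ["New Hampshire", "New Jersey", "New Mexico", "New York"]


def build_search(collection, field, prefix):
    """Build a multi_search body that facets on `field` and narrows
    the facet values with a facet_query."""
    return {
        "searches": [
            {
                "collection": collection,
                "q": "*",
                "query_by": field,
                "facet_by": field,
                "facet_query": f"{field}:{prefix}",
                "max_facet_values": 10,
                "per_page": 0,
            }
        ]
    }


def expected_facet_values(values, prefix):
    """Assumed stand-in for facet_query matching: keep values where
    any word starts with the prefix, ignoring case."""
    p = prefix.lower()
    return sorted(
        {v for v in values if any(w.lower().startswith(p) for w in v.split())}
    )


payload = build_search("facet_state_test", "location_state", "New")
print(json.dumps(payload, indent=2))
print(expected_facet_values(STATES, "New"))
```

Under that assumption all four states should come back as facet values for "New", which is what makes the empty `counts` array in the response above look like a bug rather than a legitimate empty result.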
Ankit
03:43 PM
Kishore Nallan
03:52 PM
Aug 11, 2023 (4 months ago)
Ankit
11:42 AM
Kishore Nallan
11:43 AM
Ankit
11:43 AM
Kishore Nallan
12:03 PM
`0.26.0.rc12`
please try with this build.
Ankit
01:34 PM
Ankit
02:53 PM
Ankit
08:10 PM
Aug 12, 2023 (4 months ago)
Kishore Nallan
12:30 AM
Aug 14, 2023 (3 months ago)
Kishore Nallan
11:12 AM
Ankit
11:30 AM
Kishore Nallan
11:30 AM
Ankit
12:46 PM
I can share the host and an admin key with you in DM to check. What other information would you need to help debug?
Kishore Nallan
12:55 PM
Ankit
12:59 PM