#community-help

Debugging Search Query Issues in Large Document Collection

TLDR: Sinan is having trouble with a search query on a collection of about 5M documents. Jason suggests retrying on 0.24.0.rcn15 and, if the issue persists, sharing the dataset for further investigation.

Oct 17, 2022 (14 months ago)
Sinan
01:53 PM
Does anyone have any idea about this issue?
Jason
04:56 PM
Sinan, could you share your collection’s schema and the exact search query, with all the search parameters you’re using?
Jason
04:56 PM
If you can share it in this format, that would be great:

https://gist.github.com/jasonbosco/7c3432713216c378472f13e72246f46b
Oct 18, 2022 (14 months ago)
Sinan
06:40 AM
Hi Jason
Sinan
06:40 AM
curl --location -g --request POST '{{url}}/collections' \
--header 'Content-Type: application/json' \
--header 'X-TYPESENSE-API-KEY: {{TYPESENSE_API_KEY}}' \
--data-raw '{
"name": "loantest99000K",
"fields": [
{"name": "id", "type": "string" },
{"name": "LoanNumber", "type": "int64" },
{"name": "DateApproved", "type": "string" },
{"name": "SBAOfficeCode", "type": "int32" },
{"name": "ProcessingMethod", "type": "string" },
{"name": "BorrowerName", "type": "string" },
{"name": "BorrowerAddress", "type": "string"},
{"name": "BorrowerCity", "type": "string" },
{"name": "BorrowerState", "type": "string" },
{"name": "BorrowerZip", "type": "string"},
{"name": "LoanStatusDate", "type": "string" },
{"name": "LoanStatus", "type": "string" },
{"name": "Term", "type": "int32"},
{"name": "SBAGuarantyPercentage", "type": "int32" },
{"name": "InitialApprovalAmount", "type": "float" },
{"name": "CurrentApprovalAmount", "type": "float"},
{"name": "UndisbursedAmount", "type": "float" },
{"name": "FranchiseName", "type": "string" },
{"name": "ServicingLenderLocationID", "type": "int32"},
{"name": "ServicingLenderName", "type": "string" },
{"name": "ServicingLenderAddress", "type": "string" },
{"name": "ServicingLenderCity", "type": "string"},
{"name": "ServicingLenderState", "type": "string" },
{"name": "ServicingLenderZip", "type": "string" },
{"name": "RuralUrbanIndicator", "type": "string"},
{"name": "HubzoneIndicator", "type": "string" },
{"name": "LMIIndicator", "type": "string" },
{"name": "BusinessAgeDescription", "type": "string"},
{"name": "ProjectCity", "type": "string" },
{"name": "ProjectCountyName", "type": "string" },
{"name": "ProjectState", "type": "string"},
{"name": "ProjectZip", "type": "string" },
{"name": "CD", "type": "string" },
{"name": "JobsReported", "type": "float"},
{"name": "NAICSCode", "type": "int32" },
{"name": "Race", "type": "string" },
{"name": "Ethnicity", "type": "string"},
{"name": "UTILITIES_PROCEED", "type": "float" },
{"name": "PAYROLL_PROCEED", "type": "float" },
{"name": "MORTGAGE_INTEREST_PROCEED", "type": "float"},
{"name": "RENT_PROCEED", "type": "float"},
{"name": "REFINANCE_EIDL_PROCEED", "type": "float"},
{"name": "HEALTH_CARE_PROCEED", "type": "float"},
{"name": "DEBT_INTEREST_PROCEED", "type": "float"},
{"name": "BusinessType", "type": "string"},
{"name": "OriginatingLenderLocationID", "type": "int32"},
{"name": "OriginatingLender", "type": "string" },
{"name": "OriginatingLenderCity", "type": "string" },
{"name": "OriginatingLenderState", "type": "string"},
{"name": "Gender", "type": "string" },
{"name": "Veteran", "type": "string" },
{"name": "NonProfit", "type": "string"},
{"name": "ForgivenessAmount", "type": "float" },
{"name": "ForgivenessDate", "type": "string" },
{"name": "ApprovalDiff", "type": "float"},
{"name": "NotForgivenAmount", "type": "float" },
{"name": "ForgivenPercentage", "type": "float" },
{"name": "PROCEED_Diff", "type": "float"},
{"name": "TOTAL_PROCEED", "type": "float"},
{"name": "UTILITIES_PROCEED_pct", "type": "string" },
{"name": "PAYROLL_PROCEED_pct", "type": "string" },
{"name": "MORTGAGE_INTEREST_PROCEED_pct", "type": "string"},
{"name": "RENT_PROCEED_pct", "type": "string" },
{"name": "REFINANCE_EIDL_PROCEED_pct", "type": "string" },
{"name": "HEALTH_CARE_PROCEED_pct", "type": "string"},
{"name": "DEBT_INTEREST_PROCEED_pct", "type": "string" },
{"name": "PROCEED_Per_Job", "type": "string" },
{"name": "Fraud", "type": "bool"}
],
"default_sorting_field": "LoanNumber"
}'
Sinan
06:41 AM
And the search query is:
Sinan
06:41 AM
curl --location -g --request GET '{{url}}/collections/{{collection_name}}/documents/search?q=north%20carlina&query_by=BorrowerAddress&sort_by=CurrentApprovalAmount:desc&filter_by=CurrentApprovalAmount:%3C%201464733' \
--header 'X-TYPESENSE-API-KEY: {{TYPESENSE_API_KEY}}'
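For reference, the query string in the request above is encoded by hand (`%20` for the space, `%3C` for `<`). Below is a minimal sketch, assuming bash and ASCII-only values, of building the same URL from the raw parameter values. `urlencode` and `build_search_url` are hypothetical helper names, `http://localhost:8108` is only a placeholder for `{{url}}`, and using `loantest99000K` for `{{collection_name}}` is an assumption.

```shell
#!/usr/bin/env bash
# Percent-encode a string, leaving only RFC 3986 unreserved characters as-is.
# Assumes ASCII input, which holds for the parameters in this thread.
urlencode() {
  local LC_ALL=C s="$1" out="" c hex i
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "$c" in
      [a-zA-Z0-9.~_-]) out+="$c" ;;
      *) printf -v hex '%%%02X' "'$c"; out+="$hex" ;;
    esac
  done
  printf '%s' "$out"
}

# Assemble the search URL from raw (unencoded) values.
# Arguments: base URL, collection name, q, query_by, sort_by, filter_by.
build_search_url() {
  printf '%s/collections/%s/documents/search?q=%s&query_by=%s&sort_by=%s&filter_by=%s' \
    "$1" "$2" "$(urlencode "$3")" "$(urlencode "$4")" "$(urlencode "$5")" "$(urlencode "$6")"
}

# Placeholder base URL and assumed collection name; the other values come from the query above.
search_url="$(build_search_url 'http://localhost:8108' 'loantest99000K' \
  'north carlina' 'BorrowerAddress' 'CurrentApprovalAmount:desc' 'CurrentApprovalAmount:< 1464733')"
echo "$search_url"
```

This helper also encodes `:` as `%3A`, which decodes to the same value as the literal `:` in the hand-written URL. The resulting URL can be passed to `curl` together with the `X-TYPESENSE-API-KEY` header.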
Sinan
06:42 AM
This collection has about 5M documents.
Sinan
06:42 AM
On disk, all the CSVs together occupy ~6.85 GB.
Jason
06:33 PM
Nothing stands out to me here at first glance… Could you also try this on 0.24.0.rcn15 to see if the issue persists?
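Before comparing results across builds, it can help to confirm which version the server is actually running. Below is a rough sketch, assuming the server's `/debug` endpoint returns a JSON body with a `version` field (worth checking against the API documentation for your build). `extract_version` is a hypothetical helper, and the sample response is illustrative, not captured output.

```shell
# Pull the "version" value out of a JSON body read from stdin.
extract_version() {
  sed -n 's/.*"version"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Illustrative response body (an assumption, not real server output).
sample_response='{"state": 1, "version": "0.24.0.rcn15"}'
running_version="$(printf '%s\n' "$sample_response" | extract_version)"
echo "$running_version"

# Against a live server (TYPESENSE_URL and TYPESENSE_API_KEY are placeholders):
# curl -s -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" "${TYPESENSE_URL}/debug" | extract_version
```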
Jason
06:35 PM
If the issue also occurs on that RC build, it would be great if we could get access to this dataset to debug this further. Could you email a link to it to [email protected], mentioning this Slack conversation?
Oct 19, 2022 (14 months ago)
Sinan
08:25 AM
Do you have any RPM packages for 0.24.0.rcn15?
Sinan
08:25 AM
I do not use Docker images for my tests.
Jason
03:12 PM
We have a DEB package for that RC version, but not RPM yet. CC: Kishore Nallan ^
