Optimizing Bulk Indexing and Reducing RAM Usage in Typesense
TLDR Timon experienced issues with Typesense becoming unresponsive during bulk indexing and sought advice. Jason recommended larger import requests and adjusting the client-side timeout allowance, revealing a need to increase RAM allocation for Docker. Kishore Nallan undertook to find ways to optimize memory usage, particularly for geopoint indexing.
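The advice to send larger import requests can be sketched as follows. This is a minimal illustration, not code from the thread: it assumes the documents are sent as newline-delimited JSON (the format Typesense's document import endpoint accepts), and the helper name `jsonl_batches`, the sample documents, and the batch size of 10 are all hypothetical. Each payload would then be POSTed to the collection's import endpoint with a client-side timeout generous enough for a large batch.

```python
import json
from typing import Iterable, Iterator, List


def jsonl_batches(docs: Iterable[dict], batch_size: int) -> Iterator[str]:
    """Yield newline-delimited JSON payloads holding at most batch_size documents."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    batch: List[str] = []
    for doc in docs:
        batch.append(json.dumps(doc))
        if len(batch) == batch_size:
            yield "\n".join(batch)
            batch = []
    if batch:
        # Emit the final, partially filled batch.
        yield "\n".join(batch)


# Hypothetical sample data shaped like the geo-objects schema in this thread.
docs = [
    {"address": f"{n} Highway 77", "geo_point": [34.9952, -80.97693]}
    for n in range(25)
]
payloads = list(jsonl_batches(docs, batch_size=10))
batch_sizes = [payload.count("\n") + 1 for payload in payloads]
print(batch_sizes)
```

Fewer, larger payloads mean fewer HTTP round trips, but each request takes longer to complete, which is why the timeout allowance has to grow with the batch size.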
Dec 15, 2021 (23 months ago)
Timon
05:07 PM
Jason
07:00 PM
Jason
07:00 PM
Jason
07:01 PM
Timon
07:57 PM
Jason
08:00 PM
Timon
08:20 PM
Jason
08:20 PM
Timon
08:23 PM
Jason
08:24 PM
Timon
09:04 PM
Jason
09:19 PM
Timon
09:26 PM
Dec 16, 2021 (23 months ago)
Timon
01:00 PM
Timon
01:10 PM
Timon
01:10 PM
Kishore Nallan
01:13 PM
Timon
01:14 PM
Kishore Nallan
01:14 PM
Kishore Nallan
01:15 PM
Timon
01:16 PM
{
  "name": "geo-objects",
  "fields": [
    {
      "name": "address",
      "type": "string",
      "facet": false,
      "optional": false,
      "index": true
    },
    {
      "name": "geo_point",
      "type": "geopoint",
      "facet": false,
      "optional": false,
      "index": true
    }
  ],
  "default_sorting_field": ""
}
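To make the schema above concrete, here is a small sketch of what a matching document looks like and how it could be checked before import. It assumes Typesense's convention that a `geopoint` value is a two-element `[lat, lng]` array of numbers; the function name `check_document` and the sample values are hypothetical and not taken from the thread.

```python
from typing import List


def check_document(doc: dict) -> List[str]:
    """Return a list of problems for a document meant for the geo-objects schema."""
    problems: List[str] = []
    if not isinstance(doc.get("address"), str):
        problems.append("address must be a string")

    point = doc.get("geo_point")
    is_pair = (
        isinstance(point, (list, tuple))
        and len(point) == 2
        # bool is a subclass of int in Python, so exclude it explicitly.
        and all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in point)
    )
    if not is_pair:
        problems.append("geo_point must be a [lat, lng] pair of numbers")
    elif not (-90 <= point[0] <= 90 and -180 <= point[1] <= 180):
        problems.append("geo_point is out of range (expected [lat, lng] order)")
    return problems


good = {"address": "Highway 77", "geo_point": [34.9952, -80.97693]}
bad = {"address": "Highway 77", "geo_point": [120.0, 10.0]}
print(check_document(good))
print(check_document(bad))
```

Checking documents client-side like this keeps malformed records out of a large import batch, where a single bad line is harder to track down.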
Timon
01:23 PM
Kishore Nallan
01:24 PM
Timon
01:38 PM
/collections/geo-objects/documents/search/?q=Highway 77&query_by=address&sort_by=geo_point(34.995200, -80.976930):asc
Classic query: search addresses near a point (e.g. the user's position)
Kishore Nallan
01:42 PM
But I do wonder what percentage of the 10 GB of RAM is on account of the geo index. Since the geo index is fairly new, there are probably things we can do to optimize it if it takes a much larger portion than the text index.
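The "search addresses near a point" request quoted earlier in the thread contains spaces, parentheses, and a comma, so it needs URL encoding before it is sent. The following sketch builds that URL with the Python standard library only. The base URL `http://localhost:8108` (Typesense's default port) and the helper name `nearby_search_url` are assumptions for illustration; the path and the `q`, `query_by`, and `sort_by` parameters mirror the query in the thread.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit


def nearby_search_url(base_url: str, collection: str, text: str,
                      lat: float, lng: float) -> str:
    """Build a search URL that sorts text matches by distance from (lat, lng)."""
    params = {
        "q": text,
        "query_by": "address",
        # Ascending geo sort: the closest documents come first.
        "sort_by": f"geo_point({lat:.6f}, {lng:.6f}):asc",
    }
    # quote_via=quote encodes spaces as %20 rather than "+".
    query = urlencode(params, quote_via=quote)
    return f"{base_url}/collections/{quote(collection)}/documents/search?{query}"


# Assumed local server address; the coordinates are the ones from the thread.
url = nearby_search_url("http://localhost:8108", "geo-objects",
                        "Highway 77", 34.995200, -80.976930)
decoded = parse_qs(urlsplit(url).query)
print(url)
print(decoded["sort_by"][0])
```

Decoding the query string again, as done at the end, is a quick way to confirm that the server will see the same `sort_by` expression as the one written in the thread.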
Timon
01:59 PM
Timon
02:01 PM
Kishore Nallan
02:01 PM
Timon
02:57 PM
Kishore Nallan
02:58 PM
Kishore Nallan
02:58 PM
Timon
02:59 PM
Kishore Nallan
03:00 PM
Kishore Nallan
03:01 PM
Timon
03:05 PM
Timon
03:06 PM
Kishore Nallan
03:06 PM
Kishore Nallan
03:07 PM
Timon
03:07 PM
Kishore Nallan
03:07 PM
Timon
03:07 PM
Timon
03:08 PM
Kishore Nallan
03:08 PM
Timon
03:08 PM
Kishore Nallan
03:09 PM
Dec 30, 2021 (22 months ago)
Timon
12:22 PM
Kishore Nallan
12:56 PM
Dec 31, 2021 (22 months ago)
Timon
10:55 AM