Discussing Denormalization and Performance in Typesense Collections

TLDR Viji asked about the performance implications of denormalizing Typesense collections and whether self-hosting would make a difference. Jason explained that denormalized data is more performant, clarified how Typesense handles queries and latency, and offered some insight into its index structure.

Jun 03, 2022 (19 months ago)
Viji
10:06 PM
Hello, we are trying to determine whether it is better to denormalize our collections to avoid doing a two-stage search, or to keep the collections in Typesense normalized. The trade-off is between record size and performance. If we self-hosted, would the performance issue be mitigated? Are there any other considerations we are missing?
Jason
10:13 PM
In general, storing data in a denormalized way is more performant in Typesense. We only store a given field's value once in the index, so it is very memory efficient, especially if you have repeated values in your docs.

re: self-hosted vs Cloud, performance-wise there should be no difference. In fact we run the same open source version of Typesense to power Typesense Cloud.
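
To make the two-stage versus single-stage trade-off concrete, here is a minimal sketch in plain Python. It does not call Typesense. The `brands` and `products` collections, their field names, and all three helper functions are hypothetical stand-ins, chosen only to show how copying related fields into each document removes the second lookup.

```python
# Hypothetical normalized data: two collections linked by brand_id.
brands = [
    {"id": "b1", "name": "Acme", "country": "US"},
    {"id": "b2", "name": "Globex", "country": "DE"},
]
products = [
    {"id": "p1", "title": "Anvil", "brand_id": "b1"},
    {"id": "p2", "title": "Rocket skates", "brand_id": "b1"},
    {"id": "p3", "title": "Widget", "brand_id": "b2"},
]


def denormalize(products, brands):
    """Copy each product's brand fields into the product document."""
    brands_by_id = {b["id"]: b for b in brands}
    docs = []
    for p in products:
        brand = brands_by_id[p["brand_id"]]
        docs.append({
            "id": p["id"],
            "title": p["title"],
            "brand_name": brand["name"],        # repeated across products
            "brand_country": brand["country"],  # repeated across products
        })
    return docs


def two_stage_search(products, brands, brand_name):
    """Normalized layout: resolve brand ids first, then find products."""
    ids = {b["id"] for b in brands if b["name"] == brand_name}
    return [p["id"] for p in products if p["brand_id"] in ids]


def single_stage_search(docs, brand_name):
    """Denormalized layout: one pass over one collection."""
    return [d["id"] for d in docs if d["brand_name"] == brand_name]


docs = denormalize(products, brands)
```

Both `two_stage_search(products, brands, "Acme")` and `single_stage_search(docs, "Acme")` return the same product ids. The denormalized version gets there with a single query, at the cost of repeating the brand fields in every product document.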
Viji
10:24 PM
Thanks Jason - this helps regarding how to build our collections!

The performance issue I was referring to is to do with the latency of Typesense Cloud
Jason
10:28 PM
Typesense Cloud has a presence in 20 geo regions, and coupled with the Search Delivery Network feature, you should be able to get pretty low latencies, especially if you're sending queries directly from the browser/app to Typesense.

If you're sending queries from your frontend to your backend and then to Typesense, then running Typesense in the same network as your backend will have the lowest latency. You could pick a region that's closest to your backend, and that will add maybe 10-20 ms of latency.
Jason
10:29 PM
This is just network latency btw. Search (CPU) processing time will be identical
Viji
10:32 PM
Thank you - this helps! Where can I get some documentation on "we only store a given field's value once in the index"?
Jason
10:33 PM
That particular part is not documented anywhere, but it's essentially an inverted index data structure: https://en.wikipedia.org/wiki/Inverted_index
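
For readers unfamiliar with the idea, here is a toy inverted index in Python. It is only a sketch of the general data structure described in the linked article, not Typesense's actual implementation. The function name `build_inverted_index` and the sample documents are assumptions made for illustration.

```python
from collections import defaultdict


def build_inverted_index(docs, field):
    """Map each distinct token in `field` to the list of doc ids containing it."""
    index = defaultdict(list)
    for doc_id, doc in enumerate(docs):
        for token in str(doc[field]).lower().split():
            postings = index[token]
            # Record each document at most once per token.
            if not postings or postings[-1] != doc_id:
                postings.append(doc_id)
    return dict(index)


# Hypothetical denormalized documents with a repeated brand value.
sample_docs = [
    {"title": "Anvil", "brand_name": "Acme"},
    {"title": "Rocket skates", "brand_name": "Acme"},
    {"title": "Widget", "brand_name": "Globex"},
]

brand_index = build_inverted_index(sample_docs, "brand_name")
```

In this sketch the value "acme" appears as a single key whose posting list holds both matching document ids, so a value repeated across many documents costs one key plus one small entry per document rather than a full copy of the string each time.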
Viji
10:39 PM
Thank you Jason! Appreciate your quick help on this!
Jason
10:39 PM
Happy to help!
