#community-help

Discussing Data Retrieval in Typesense Cloud Tool

TLDR Ricardo asked how storing non-searched fields in Typesense records affects retrieval. Jason explained that all fields of the returned documents are fetched from disk whether or not they are indexed, so performance should be unaffected, and that keeping display-only fields in Typesense avoids a separate database API call.

Apr 30, 2021 (33 months ago)
Ricardo
07:56 PM
the consequence of this I assume is that my data will take longer to retrieve if I want to retrieve those fields (not search through them)? Anything else I should be aware of?
Jason
07:57 PM
That's pretty much it. We do use RocksDB to store these docs on disk, so it should still be fast to retrieve. But if you notice any performance issues, using SSDs would help
Ricardo
08:01 PM
so question, why would one use this? dataset too big and can't fit on ram?
Jason
08:07 PM
This would be useful for cases where you don't need to search through all the data in your record, but still want to use the data from the record for say display purposes. So instead of having to make an API call to Typesense first to search through the data and then separately make an API call to your database to fetch other fields for the record, you can put all related data in Typesense to reduce multiple API calls
Jason
08:09 PM
For eg: let's say you're storing metadata about videos in your records. You want to allow users to search by title and author of the video, and you also want to link to say the Youtube link in your search UI.

Though you're not searching directly in the Youtube link field, you can still store it in the record, so you can use it to render your search UI efficiently, with just the response from Typesense
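A minimal sketch of the video example above, assuming a schema that declares only `title` and `author`. The collection name, field names, and URL are hypothetical, and the helper is plain Python for illustration rather than part of any Typesense client.

```python
# Hypothetical schema: only the fields declared here are meant to be searched.
videos_schema = {
    "name": "videos",
    "fields": [
        {"name": "title", "type": "string"},
        {"name": "author", "type": "string"},
    ],
}

# The record also carries a display-only field that the schema does not declare.
video_doc = {
    "id": "1",
    "title": "Intro to search",
    "author": "Jane Doe",
    "youtube_url": "https://www.youtube.com/watch?v=EXAMPLE",
}


def unindexed_fields(schema, doc):
    """Return the document fields that the schema does not declare."""
    declared = {field["name"] for field in schema["fields"]}
    return sorted(key for key in doc if key not in declared and key != "id")


# Search parameters name only the searchable fields; youtube_url is not queried.
search_params = {"q": "intro", "query_by": "title,author"}

print(unindexed_fields(videos_schema, video_doc))
```

Here `youtube_url` is the only field that rides along with the record without being searched, which is the case discussed in this thread.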
Ricardo
08:20 PM
thanks for the detailed explanation
Ricardo
08:21 PM
that's what I was going to use it for. but I'm concerned it might impact the results if they all have this extra field that I don't search through. that said thinking about it, the extra database call, wouldn't be any better.
Ricardo
08:21 PM
This isn't a concern now I'm just getting started
Ricardo
08:22 PM
but is there a way to keep it in memory but not be searchable?
Ricardo
08:22 PM
ignore that I define what fields get searched anyway, so it doesn't matter
Ricardo
08:23 PM
thanks 🙂
Jason
08:42 PM
I'd recommend benchmarking to see how much performance impact there is if you add non-indexed fields to the document. I'd suspect it's minimal based on what I've seen. For eg, in the songs showcase I do exactly what I mentioned above - the URLs for each song are unindexed and stored on disk and it still seems pretty fast.
May 01, 2021 (33 months ago)
Jason
02:01 AM
Correction to what I said earlier: once the list of document IDs is determined from in-memory indices, we actually fetch all fields from disk (unless specified otherwise in the include_fields param) to assemble the final result document, regardless of whether it's indexed or not. So performance will be identical.
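A toy model of the flow described in this correction, not Typesense's actual implementation: plain dicts stand in for the in-memory index and the on-disk store, and all names and data are hypothetical. Whether the real engine trims fields before or after the disk read is not stated in the thread; this sketch trims after.

```python
# Stand-in for the in-memory index: token -> matching document IDs.
memory_index = {"intro": ["1"], "search": ["1", "2"]}

# Stand-in for the on-disk store: document ID -> full stored document.
disk_store = {
    "1": {"id": "1", "title": "Intro to search", "youtube_url": "https://example.com/v/1"},
    "2": {"id": "2", "title": "Search at scale", "youtube_url": "https://example.com/v/2"},
}


def search(token, include_fields=None):
    """Resolve IDs from the index, then assemble each result from the store."""
    results = []
    for doc_id in memory_index.get(token, []):
        doc = dict(disk_store[doc_id])  # every field comes from the store
        if include_fields is not None:
            wanted = set(include_fields.split(","))
            doc = {key: value for key, value in doc.items() if key in wanted}
        results.append(doc)
    return results


print(search("intro"))                          # all stored fields
print(search("intro", include_fields="title"))  # trimmed to the requested field
```

In this model an indexed field such as `title` and an unindexed one such as `youtube_url` take the same path into the response, which matches the point that performance is identical.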
Ricardo
05:31 AM
thanks
May 02, 2021 (33 months ago)
Ricardo
06:12 AM
Jason just a follow up on previous statement. So even if all my fields are in memory indices, everything (or what's specified in include_fields) still gets fetched from the disk?
Jason
03:15 PM
That's correct
Jason
03:17 PM
Everything = the final documents that will be returned as part of the response, so 10 documents by default since that's the default per_page value
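A rough sketch of how many documents one response assembles, assuming pagination is the only limit in play. The function name is hypothetical; the default of 10 is the per_page value mentioned above.

```python
DEFAULT_PER_PAGE = 10  # default per_page value mentioned in the thread


def documents_fetched(total_hits, page=1, per_page=DEFAULT_PER_PAGE):
    """Number of documents assembled for a single page of results."""
    already_returned = (page - 1) * per_page
    return max(0, min(per_page, total_hits - already_returned))


print(documents_fetched(250))         # first page of a large result set
print(documents_fetched(25, page=3))  # last, partially filled page
```

So even when a query matches many records, only one page of documents is read to build the response.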
