Hi All, we're using OpenAI for embeddings but we n...
# community-help
c
Hi all, we're using OpenAI for embeddings, but we noticed that search queries that go through embeddings seem to cap the returned hits at 100: we always get fewer than 100. We're not 100% sure about this one; it's possible that there really are fewer than 100 matching hits. We just want to know: is there a set limit on hits when searching via embeddings?
k
Is your per_page set to 100?
c
@Kishore Nallan nope, per_page is set to 15
k
Can you post the exact query parameters used for a sample query?
c
@Kishore Nallan
search_parameters = {
    'collection': collection_name,
    'q': 'Bauwesen',
    'filter_by': <filter list>,
    'include_fields': 'id',
    'per_page': 15,
    'page': 0,
    'prefix': False,
    'max_facet_values': 500,
    'facet_strategy': 'top_values',
    'query_by': 'title, description_de, embedded_title_de',
    'vector_query': <distance_threshold=.70>,
    'query_by_weights': '10,4,4',  # quoted so the weights are sent as a single string
}
other keywords: • Stahl • Bauwesen • Apotheke • Automatisierung • Zahnräder • Personalbüro
k
It might be because there is a distance threshold: the HNSW search needs to be more exhaustive to find more matches. See the `ef` search parameter here: https://typesense.org/docs/27.0/api/vector-search.html#configuring-hnsw-parameters You can try increasing it to 100.
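To make the suggestion above concrete, here is a minimal sketch of how `ef` could be added next to the existing distance threshold in the `vector_query` string. The helper `build_vector_query` is hypothetical (not part of any Typesense client), and the field name `embedded_title_de` and threshold `0.70` are taken from the query posted earlier; the exact option syntax should be checked against the linked docs for your Typesense version.

```python
def build_vector_query(field, distance_threshold=None, ef=None, k=None):
    """Build a vector_query string for an auto-embedding field.

    Hypothetical helper: it only formats a string. The empty [] means
    the query text in `q` is embedded by the server rather than passing
    a vector explicitly.
    """
    options = []
    if k is not None:
        options.append(f"k:{k}")
    if distance_threshold is not None:
        options.append(f"distance_threshold:{distance_threshold}")
    if ef is not None:
        # A larger ef makes the HNSW search more exhaustive (and slower).
        options.append(f"ef:{ef}")
    parts = ["[]"] + options
    return f"{field}:({', '.join(parts)})"


# Assumed values: field and threshold from the query above, ef as suggested.
vector_query = build_vector_query("embedded_title_de", distance_threshold=0.70, ef=100)
print(vector_query)
```

The resulting string would go into the `vector_query` entry of `search_parameters`; the other parameters stay unchanged.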
c
@Kishore Nallan also, we have multiple embedding columns for different languages (e.g. embedding_en, embedding_nl (from the title_nl field with Dutch titles), embedding_fr, etc.). When searching on these non-English embeddings, does it perform the same process as when searching on English embeddings? I mean, does it take into account that the embedding fields come from non-English texts?
k
Embeddings are just stored as vectors, so the language does not matter.
c
@Kishore Nallan yeah, I mean, if I search `Apotheke` (German for pharmacy), it will search on the embeddings from the title_de field. Will it return pharmacy-related hits even though the titles are in German? Sorry for these clarifications.
k
That depends on the multi-lingual capabilities of the embedding model. Whether they can project similar words in different languages in the same semantic search space.
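One way to reason about this is to look at the distance between two embeddings directly. The sketch below assumes cosine distance (1 minus cosine similarity) as the metric and uses made-up 2-dimensional toy vectors purely for illustration; real vectors would come from the embedding model for terms such as `Apotheke` and `pharmacy`. If the model does not place translations close together, their distance can exceed the `distance_threshold` and the hits are dropped.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity: 0 for same direction, 1 for orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def passes_threshold(query_vec, doc_vec, distance_threshold):
    """True if the document is close enough to the query to be kept."""
    return cosine_distance(query_vec, doc_vec) <= distance_threshold


# Toy vectors, NOT real embeddings: one document aligned with the query,
# one orthogonal to it.
query = [1.0, 0.0]
aligned_doc = [1.0, 0.0]
unrelated_doc = [0.0, 1.0]

print(passes_threshold(query, aligned_doc, 0.70))    # True  (distance 0.0)
print(passes_threshold(query, unrelated_doc, 0.70))  # False (distance 1.0)
```

Running the same comparison on real embeddings of a German keyword and a related document title would show whether the threshold of 0.70 is what filters out the non-English matches.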
c
@Kishore Nallan I see. Regarding the `ef` search parameter, thank you for this, I will take a look. I'm not sure it's the reason, though, since the issue only occurs when searching with non-English keywords and works fine with English keywords. Maybe this answers that:
That depends on the multi-lingual capabilities of the embedding model. Whether they can project similar words in different languages in the same semantic search space.