Auto-embedding generation within Typesense is a great feature (typesense #community-help)

new_in_town

08/11/2024, 2:38 PM

Auto-embedding generation within Typesense is a great feature, and it works. Still, I have some questions. There are a lot of parameters at the schema and field level, and after reading the documentation it is still unclear how they affect auto-embedding generation.

1. Schema level: symbols_to_index, token_separators

2. Field level: stem. As far as I understand, it makes no sense to use both stemming and embeddings/LLMs. Is that correct?

3. HTML content. Given a field definition like this:

{
  "name": "embedding",
  "type": "float[]",
  "embed": {
    "from": [
      "title",
      "content"
    ],
    "model_config": {
      "model_name": "ts/e5-large-v2"
    }
  }
}

Should I remove HTML tags from the "title" and "content" fields?

4. Highlighting. I am doing hybrid search, and on the client side I set this:

'query_by': 'title, content, embedding, organization.name',
'vector_query': 'embedding:([], alpha: 0.19, distance_threshold:0.25)',

As I understand it, highlight snippets are generated only for keyword matches. When a document is found by semantic search, there is no highlight. Is that correct?
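For context, the two parameters above could sit in a full request roughly like this. This is only a sketch in Python: the helper names, the example `q` value, and the response shape checked in `split_hits_by_highlight` (a `hits` list whose entries carry `document` plus `highlights` or `highlight`) are assumptions for illustration, not something confirmed in this thread.

```python
def build_hybrid_search_params(query, alpha=0.19, distance_threshold=0.25):
    """Build search parameters for a hybrid (keyword + vector) query.

    The field names mirror the snippet above; the helper itself is hypothetical.
    """
    vector_query = (
        f"embedding:([], alpha: {alpha}, distance_threshold: {distance_threshold})"
    )
    return {
        "q": query,
        "query_by": "title,content,embedding,organization.name",
        "vector_query": vector_query,
    }


def split_hits_by_highlight(response):
    """Return (ids with a highlight, ids without) for a search response.

    Assumes each hit looks like {"document": {"id": ...}, "highlights": [...]}
    and/or carries a "highlight" object.
    """
    with_highlight, without_highlight = [], []
    for hit in response.get("hits", []):
        has_highlight = bool(hit.get("highlights")) or bool(hit.get("highlight"))
        target = with_highlight if has_highlight else without_highlight
        target.append(hit["document"]["id"])
    return with_highlight, without_highlight
```

With the official Python client, the parameters would typically be passed to something like `client.collections["docs"].documents.search(params)`, where `docs` is a made-up collection name. Running `split_hits_by_highlight` on the result is one way to check which hits actually come back without a highlight.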

new_in_town

08/11/2024, 5:21 PM

And by the way: how are stopwords and synonyms related to auto-embedding generation?

Kishore Nallan

08/12/2024, 7:51 AM

symbols_to_index, token_separators and stemming are used to process the input query first. The transformed query is then used for both keyword search and embedding. It does not make sense to use stemming for embedding, but since it's a common pre-processing step, it's done.

Yes, remove HTML tags.

Even for semantic search, we will highlight if any token in the query is found within the text fields of the documents returned in the response.
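One way to remove the HTML tags before indexing is to clean the source fields on the client side. A minimal sketch using only Python's standard library; `strip_html` and `clean_document` are hypothetical helper names, and the approach (keep text nodes, collapse whitespace) is an assumption about what "remove HTML tags" should mean here:

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into characters.
        super().__init__(convert_charrefs=True)
        self._chunks = []

    def handle_data(self, data):
        self._chunks.append(data)

    def text(self):
        # Join chunks with a space so block tags do not glue words together,
        # then collapse runs of whitespace. A tag placed mid-word would
        # therefore introduce a space.
        return " ".join(" ".join(self._chunks).split())


def strip_html(value):
    """Return the plain text of an HTML fragment."""
    parser = _TextExtractor()
    parser.feed(value)
    parser.close()
    return parser.text()


def clean_document(doc, fields=("title", "content")):
    """Return a copy of doc with HTML stripped from the given string fields."""
    cleaned = dict(doc)
    for field in fields:
        if isinstance(cleaned.get(field), str):
            cleaned[field] = strip_html(cleaned[field])
    return cleaned
```

Note that this keeps the text inside `<script>` and `<style>` elements too, so content that may contain those needs extra handling.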

new_in_town

08/12/2024, 10:08 AM

"The transformed query is used for both keyword search and embedding."

Thanks, Kishore! And stopwords and synonyms?

Kishore Nallan

08/12/2024, 10:08 AM

Stopwords are dropped from the query before embedding. Synonyms are used only for keyword search.
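To make that concrete, here is a toy model of the behaviour described above. It is not Typesense's implementation: `prepare_query` is a hypothetical function, tokenization is a plain lowercase split, and it assumes stopwords are dropped for the keyword side as well.

```python
def prepare_query(query, stopwords, synonyms):
    """Toy illustration: return (text to embed, keyword search terms).

    stopwords: set of lowercase words to drop.
    synonyms: dict mapping a word to a list of alternative words.
    """
    tokens = query.lower().split()

    # Stopwords are dropped before anything else happens.
    kept = [token for token in tokens if token not in stopwords]

    # The embedding side sees the query without stopwords and without synonyms.
    embedding_text = " ".join(kept)

    # Only the keyword side is expanded with synonyms.
    keyword_terms = []
    for token in kept:
        keyword_terms.append(token)
        keyword_terms.extend(synonyms.get(token, []))

    return embedding_text, keyword_terms
```

For example, with stopwords `{"the", "for"}` and the synonym `blazer -> coat, jacket`, the query "the blazer for men" would be embedded as "blazer men", while the keyword side would also look for "coat" and "jacket".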

new_in_town

08/12/2024, 10:13 AM

Got it! I would say: just update the documentation with this info and it will be great ))
