Hey TypeSense builders and Typsense community! :wa...
# community-help
i
Hey Typesense builders and Typesense community! 👋 I was wondering if anyone has guidance or tips on how I could further optimize the hybrid search experience for my ecommerce website, https://trypricepilot.com. The products in my database are items sold on Best Buy US, Walmart, Amazon, etc. I used the SBERT model from the Cookbook to generate embeddings for the 24,000 products in my DB.

Problems
• Results for many queries don't always show the products I would consider most relevant at the top of the results grid. I am currently using hybrid search with alpha set to 0.4.
• Main example:
  ◦ I should get 0 results for certain long-tail searches because I don't have those products, e.g. red light therapy mask. Instead, I get ~2100 results across a wide range of categories.

What I need support with
• As a solo-builder-preneur who is new to search and expected "better" relevance out of the box, what optimizations could I make for my ecommerce context?
  ◦ I'm also learning that improving search relevance is an extremely iterative process and isn't easy.
• I have a hunch that I need to detect broad vs. ambiguous vs. narrow searches, and that there are likely a whole host of other things I have no idea about!
• I realize there's also paid help, so I'd be open to discussing this further if that's more appropriate for this request.

Thanks in advance! 🙏

Algorithm
const baseSearchParams = {
  enable_nested_fields: true,
  prioritize_exact_match: true,
  prioritize_token_position: true,
  typo_tokens_threshold: 1,
  min_len_1typo: 4,
  min_len_2typo: 8,
  prefix: true, // Enable prefix search for partial word matches
  exhaustive_search: true,
  split_join_tokens: 'always',
  highlight_affix_num_tokens: 4,
  drop_tokens_threshold: 0,
  min_hit_score: 3.0,
  exclude_fields: '',
  text_match_type: 'phrase',

  sort_by: "_eval([ \
    (categoryNames.lvl1:Appliance Parts & Accessories):-1, \
    (categoryNames.lvl1:Camcorder Accessories):-1, \
    (categoryNames.lvl1:Cell Phone Accessories):-1, \
    (categoryNames.lvl1:Tablet Accessories):-1, \
    (categoryNames.lvl1:Home Audio Accessories):-1, \
    (categoryNames.lvl3:Turntables & Accessories):-1 \
  ]):desc,_text_match:desc",
};

// Vector search parameters
const vectorSearchParams = {
  ...baseSearchParams,
  query_by: 'name,name_ngram,brand,modelNumber,upc,description,categoryNames.lvl1,categoryNames.lvl2,categoryNames.lvl3,categoryNames.lvl4,product_embedding',
  query_by_weights: '15,12,10,8,6,5,4,3,2,1,0',
  // Match num_typos to each field in query_by
  num_typos: '1,1,1,0,0,2,1,1,1,1,0',
  vector_query: 'product_embedding:([], alpha: 0.4)',
};
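The comment above about matching num_typos to each field in query_by can be checked mechanically. Below is a minimal sketch of such a check; findLengthMismatches and sampleParams are hypothetical names made up for illustration, not part of the Typesense client. It assumes query_by_weights needs one value per field, while num_typos may be either a single value or one value per field.

```javascript
// Hypothetical helper (not part of the Typesense client): returns the names of
// per-field parameters whose comma-separated length does not line up with query_by.
function findLengthMismatches(params) {
  const count = (csv) => csv.split(',').length;
  const expected = count(params.query_by);

  // Assumption: num_typos may be a single value applied to every field,
  // while query_by_weights needs exactly one value per field.
  const rules = [
    { key: 'query_by_weights', allowSingle: false },
    { key: 'num_typos', allowSingle: true },
  ];

  return rules
    .filter(({ key }) => typeof params[key] === 'string')
    .filter(({ key, allowSingle }) => {
      const n = count(params[key]);
      return n !== expected && !(allowSingle && n === 1);
    })
    .map(({ key }) => key);
}

// Illustrative parameters only; num_typos is one value short on purpose.
const sampleParams = {
  query_by: 'name,brand,product_embedding',
  query_by_weights: '15,10,0',
  num_typos: '1,1',
};

console.log(findLengthMismatches(sampleParams)); // [ 'num_typos' ]
```

Running the same check on vectorSearchParams should return an empty array, since query_by, query_by_weights and num_typos each list 11 entries.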
Product Schema
fields: [
    { name: 'id', type: 'string' },
    { name: 'name', type: 'string', enable_nested_fields: true },
    { 
      name: 'name_ngram', 
      type: 'string[]', 
      symbols_to_index: ['*'],
      token_separators: [' ', '-', '.'],
      enable_nested_fields: true 
    },
    { name: 'brand', type: 'string', facet: true },
    { name: 'modelNumber', type: 'string', facet: true },
    { name: 'upc', type: 'string' },
    { 
      name: 'description', 
      type: 'string', 
      optional: true,
      max_length: 125
    },
    { name: 'available_in_stores', type: 'int32', facet: true },
    { name: 'averageRating', type: 'float', facet: true, optional: true, sort: true },
    { name: 'condition', type: 'string', facet: true, optional: true },
    { name: 'image', type: 'string', optional: true },
    { name: 'url', type: 'string', optional: true },
    { name: 'categoryNames', type: 'string[]', facet: true, optional: true },
    { name: 'spec.*', type: 'string[]', facet: true, optional: true },
    { name: 'customerPrice', type: 'float', facet: true, optional: true, sort: true },
    { name: 'regularPrice', type: 'float', optional: true, sort: true },
    { name: 'onSale', type: 'bool', facet: true, optional: true, sort: true },
    { name: 'slug', type: 'string', facet: true },
    { name: 'categoryNames.lvl0', type: 'string[]', facet: true, optional: true },
    { name: 'categoryNames.lvl1', type: 'string[]', facet: true, optional: true },
    { name: 'categoryNames.lvl2', type: 'string[]', facet: true, optional: true },
    { name: 'categoryNames.lvl3', type: 'string[]', facet: true, optional: true },
    { name: 'categoryNames.lvl4', type: 'string[]', facet: true, optional: true },
   
  ],
  token_separators: ['-', '_', ' '],
  symbols_to_index: ['-', '_'],
  enable_nested_fields: true
};
k
Relevancy depends a lot on the embedding model used. I recommend trying something like e5-small-v2 and comparing its relevancy with what you are using now.
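One way to run that comparison is to add a second, auto-generated embedding field next to the existing one and query each in turn. This is only a sketch: the field name product_embedding_e5 and the choice of source fields are assumptions, and it relies on Typesense's built-in ts/e5-small-v2 model being available on your server version.

```javascript
// Hypothetical extra schema field for a side-by-side model comparison.
// Assumption: Typesense generates the embedding itself from the `from` fields.
const e5EmbeddingField = {
  name: 'product_embedding_e5', // hypothetical field name
  type: 'float[]',
  embed: {
    from: ['name', 'brand', 'description'], // assumed source fields, taken from the schema above
    model_config: { model_name: 'ts/e5-small-v2' },
  },
};

// Hypothetical query parameters pointing the same hybrid search at the new field.
const e5SearchParams = {
  query_by: `name,brand,${e5EmbeddingField.name}`,
  vector_query: `${e5EmbeddingField.name}:([], alpha: 0.4)`,
};

console.log(e5SearchParams.vector_query); // product_embedding_e5:([], alpha: 0.4)
```

Keeping alpha and the keyword fields the same across both runs should make it easier to attribute any difference in ranking to the model itself.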
"I don't have those products, e.g. red light therapy mask. I end up getting ~2100 results across a wide range of categories."

You can set a distance_threshold in the vector_query so that only results within a certain vector distance are retrieved: https://typesense.org/docs/27.1/api/vector-search.html#distance-threshold
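As a rough sketch of what that could look like with the parameters from the original post: buildVectorQuery is a hypothetical helper, and 0.3 is only an illustrative threshold that would need tuning against real queries. In hybrid mode, documents that match on keywords may still be returned, so the threshold alone might not bring a query like red light therapy mask down to 0 results.

```javascript
// Hypothetical helper: assembles a Typesense vector_query string.
function buildVectorQuery(field, options = {}) {
  const parts = [];
  if (options.alpha !== undefined) {
    parts.push(`alpha: ${options.alpha}`);
  }
  if (options.distanceThreshold !== undefined) {
    parts.push(`distance_threshold: ${options.distanceThreshold}`);
  }
  // An empty [] asks Typesense to embed the q parameter itself.
  return parts.length > 0
    ? `${field}:([], ${parts.join(', ')})`
    : `${field}:([])`;
}

// 0.3 is an illustrative value, not a recommendation.
const vectorQuery = buildVectorQuery('product_embedding', {
  alpha: 0.4,
  distanceThreshold: 0.3,
});

console.log(vectorQuery); // product_embedding:([], alpha: 0.4, distance_threshold: 0.3)
```

A lower threshold keeps only closer vector matches, so it is worth trying a few values on queries where you know the expected result count.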
i
Hey Kishore. Thanks for your response. I will look into this after work. Cheers!
a
Hi Ivan, I'm wondering if you found a good way to get meaningful results from hybrid / vector search for ecommerce?
i
Hey @Alex K. Semantic and hybrid search are back in my backlog as I decided to pursue building out other parts of my site. I'll probably be getting back to this within the next month. Did you get anywhere with it?