# community-help
m
Hi there! I have a document like this: { "title": "office bag", "img_link": "some dummy url", "img_links": [ "list of dummy url" ] }. A document may have one or more images, and I want to do a hybrid search on the product title and the image embeddings. Can I have a list of embeddings for a single document? Or should I create a separate collection for the image embeddings and do a join for the hybrid search?
j
You can use the CLIP model to generate embeddings for both your image and text together in a single embedding field
m
Thank you so much, but my real problem is how to handle embeddings for multiple images.
j
I would recommend creating one document per image in Typesense, so you can create one embedding per image per document.
And then you could potentially use `group_by` on a shared field, for example a product ID if your products have one.
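A minimal sketch of the one-document-per-image idea in Python. It assumes the product dict looks like the ones in this thread, that `page_unique` identifies a product, and it introduces hypothetical `product_id` and `image_url` fields. Computing the embedding for each `image_url` (with CLIP or a similar model) is left out.

```python
from typing import Any


def explode_product(product: dict[str, Any], id_field: str = "page_unique") -> list[dict[str, Any]]:
    """Split one product into one document per image.

    `product_id` and `image_url` are hypothetical field names; the embedding
    for each image is assumed to be computed elsewhere (e.g. CLIP/SigLIP).
    """
    # Main image first, then the gallery, skipping empty and duplicate URLs.
    urls: list[str] = []
    for url in [product.get("image_link"), *(product.get("image_links") or [])]:
        if url and url not in urls:
            urls.append(url)

    # Every per-image document keeps the other product fields for filtering/sorting.
    shared = {k: v for k, v in product.items() if k not in ("image_link", "image_links")}
    pid = str(product[id_field])

    docs: list[dict[str, Any]] = []
    for i, url in enumerate(urls):
        doc = dict(shared)
        doc["id"] = f"{pid}-{i}"   # Typesense ids are strings and must be unique
        doc["product_id"] = pid    # shared key to group_by later
        doc["image_url"] = url
        docs.append(doc)
    return docs
```

Because every image document repeats the product fields, filters and sorts keep working on it, and `product_id` is the field that would later be passed to `group_by`.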
m
So you are saying it's better to create a separate collection, insert one embedding per image per document, do a hybrid search on the image embedding and the title, and finally `group_by` title or IDs. Am I right?
j
That's correct... Although you won't be able to use Typesense's built-in models for hybrid search, given that these are image embeddings. You would have to generate these embeddings with the CLIP model outside of Typesense, and then combine keyword search (`q`) with the `vector_query` parameter.
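A sketch of building those search parameters in Python. It assumes the per-image collection has a `float[]` field named `embedding` (a hypothetical name) and that `query_vector` is the search text embedded outside Typesense by the text encoder of the same multimodal model used for the images. `vector_query` follows Typesense's `field:([...], k:N)` syntax.

```python
from typing import Sequence


def hybrid_search_params(
    query: str,
    query_vector: Sequence[float],
    embedding_field: str = "embedding",  # hypothetical float[] field name
    text_fields: str = "title",
    k: int = 100,
) -> dict[str, str]:
    """Combine a keyword query (q/query_by) with a nearest-neighbour query
    on an embedding generated outside Typesense."""
    # Fixed-point formatting avoids scientific notation such as 1e-05.
    vector = ", ".join(f"{float(x):.6f}" for x in query_vector)
    # vector_query format: <field>:([v1, v2, ...], k:<N>)
    return {
        "q": query,
        "query_by": text_fields,
        "vector_query": f"{embedding_field}:([{vector}], k:{k})",
    }
```

The resulting dict can be passed to the Python client's `documents.search(...)`. For long vectors, sending the same parameters through the `multi_search` endpoint puts them in a POST body instead of the URL.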
m
Thank you so much. I'm actually doing that already, because I'm using Google's SigLIP model for its better accuracy.
Hello again Jason, I thought about your recommendation and realized that it won't solve my problem, so let me ask it another way. Imagine I have a product like this:
```json
{
    "title": "Babyliss Mac Styler hair curler MAC STYLER titanium curling iron keratin",
    "subtitle": "",
    "page_unique": "1885",
    "current_price": "2496000",
    "old_price": "3337000",
    "availability": "instock",
    "category_name": "Electrical appliances",
    "image_link": "https://oss.sazito.com/apiuploads/offerie/uploads/image/rootimage/5195/8ea5bfd3b9d3f895935993a14f23ddb7.png",
    "image_links": [
        "https://oss.sazito.com/apiuploads/offerie/uploads/image/rootimage/5196/16fd3eb11980f587383cde28392beb5b.png",
        "https://oss.sazito.com/apiuploads/offerie/uploads/image/rootimage/5203/77894fd9802618351e9cc67fdeb72122.webp",
        "https://oss.sazito.com/apiuploads/offerie/uploads/image/rootimage/5197/caa48b74de6db19bf12bf6ccc497f735.png"
    ],
    "page_url": "https://offerie.ir/product/فر-کننده-مو-بابلس-مک-استایلر-MAC-STYLER-titanium-curling-iron-keratin",
    "short_desc": "",
    "guarantee": "",
    "registry": "",
    "spec": {
        "size": "19 mm, gold"
    }
}
```
I want to keep all the search features like filtering, sorting, etc.,
but I also want to do a hybrid search on the image embeddings and the title. The problem is that a product may have multiple images, and creating multiple documents for one product would duplicate all of the product data.
j
If you group by, say, `page_url` (or preferably some SKU or product ID field), then facet counts, filtering, etc. will produce de-duplicated data.
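A sketch of the grouping step in Python, assuming one document per image that carries a hypothetical `product_id` field (Typesense requires a `group_by` field to be declared with `facet: true`). `group_limit: 1` keeps only the best-matching image of each product, and the helper flattens the response's `grouped_hits` array into one document per product.

```python
from typing import Any


def with_grouping(params: dict[str, Any], group_field: str = "product_id") -> dict[str, Any]:
    """Return a copy of the search params that groups per-image hits by product."""
    grouped = dict(params)
    grouped["group_by"] = group_field  # must be a facet field in the schema
    grouped["group_limit"] = 1         # keep only the best-matching image per product
    return grouped


def one_hit_per_product(response: dict[str, Any]) -> list[dict[str, Any]]:
    """Flatten a grouped search response into one document per product, in rank order."""
    return [
        group["hits"][0]["document"]
        for group in response.get("grouped_hits", [])
        if group.get("hits")
    ]
```

Filters and sorts go into the same params dict as usual (`filter_by`, `sort_by`), so they still apply to the grouped search.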
m
Thank you, but I don't think I explained my question very well. I'm actually looking for something like a subquery in SQL: I want to do the hybrid search in a separate collection and then do everything else (filtering, sorting, ...) in the main collection.
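One way to approximate that subquery is to run two searches from application code. A minimal sketch, assuming a hypothetical image collection whose hits carry a `product_id` matching `page_unique` in the main collection: step 1 collects the ranked product IDs from the hybrid search response, step 2 builds a wildcard search on the main collection restricted to those IDs (`:=[...]` exact match) and combined with any other filters via `&&`.

```python
from typing import Any


def ranked_product_ids(image_response: dict[str, Any], id_field: str = "product_id") -> list[str]:
    """Step 1: distinct product IDs, in rank order, from the image-collection search."""
    ids: list[str] = []
    for hit in image_response.get("hits", []):
        pid = str(hit["document"][id_field])
        if pid not in ids:  # several images of one product may match
            ids.append(pid)
    return ids


def main_search_params(
    ids: list[str],
    filter_by: str = "",
    sort_by: str = "",
    id_field: str = "page_unique",
) -> dict[str, Any]:
    """Step 2: wildcard search on the main collection, restricted to `ids`."""
    if not ids:
        raise ValueError("no matching products from the image search")
    id_filter = f"{id_field}:=[{','.join(ids)}]"  # exact match on any listed ID
    params: dict[str, Any] = {
        "q": "*",
        "query_by": "title",
        "filter_by": f"{id_filter} && {filter_by}" if filter_by else id_filter,
        "per_page": min(len(ids), 250),  # Typesense caps per_page at 250
    }
    if sort_by:
        params["sort_by"] = sort_by
    return params


def in_rank_order(docs: list[dict[str, Any]], ids: list[str], id_field: str = "page_unique") -> list[dict[str, Any]]:
    """Re-apply the hybrid-search ranking when no explicit sort_by was used."""
    position = {pid: i for i, pid in enumerate(ids)}
    return sorted(docs, key=lambda d: position.get(str(d[id_field]), len(position)))
```

The hybrid ranking is lost once `sort_by` reorders the main results; when no explicit sort is wanted, `in_rank_order` restores it. For a price sort to be numeric, `current_price` would also need to be indexed as a number (e.g. `int64`) rather than the string shown in the product example.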