new_in_town
10/13/2024, 10:11 AM
    {
      "name": "published",
      "type": "int64",
      "index": true
    },
    {
      "name": "published_str",
      "type": "string",
      "index": false,
      "store": true
    },
    {
      "name": "title",
      "type": "string"
    },
    {
      "name": "content_ind",
      "type": "string"
    },
    {
      "name": "content",
      "type": "string",
      "index": false
    },
    {
      "name": "embedding",
      "type": "float[]",
      "embed": {
        "from": [
          "title",
          "content_ind"
        ],
        "model_config": {
          "model_name": "ts/distiluse-base-multilingual-cased-v2"
        }
      }
    }
  ]
}
Fields like `published`, `published_str` (string representation of the date), and `title` are quite short. There are a few other, similarly short fields not shown here.
About `content` and `content_ind`: they hold almost the same text; the only difference is the formatting/presence of HTML tags. The "`_ind`" suffix means "indexed, without HTML".
Why two fields: this is a workaround to show `content` in the browser (with all the HTML tags/formatting) while indexing `content_ind` without the HTML tags/formatting.
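For anyone trying the same thing, here is a rough sketch of how the two fields could be filled before import. It uses only Python's standard `html.parser`; the helper names `strip_html` and `build_document` are made up for illustration and are not part of Typesense or of my actual code.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes and drops the tags around them."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into plain characters.
        super().__init__(convert_charrefs=True)
        self._chunks = []

    def handle_data(self, data):
        self._chunks.append(data)

    def text(self):
        # Join chunks with a space so adjacent block elements do not fuse
        # into one word, then collapse the leftover whitespace runs.
        return " ".join(" ".join(self._chunks).split())


def strip_html(html):
    """Hypothetical helper: return the plain text of an HTML fragment.

    Note: text inside <script>/<style> is not filtered out here.
    """
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


def build_document(title, content_html):
    """Hypothetical helper: build a document with both content fields."""
    return {
        "title": title,
        "content": content_html,                  # shown in the browser, not indexed
        "content_ind": strip_html(content_html),  # indexed and embedded, no HTML
    }


doc = build_document("Hello", "<p>Some <b>bold</b> text &amp; more</p>")
```

Joining chunks with a space is a trade-off: it keeps neighbouring paragraphs apart, but it also splits a word that is only partly wrapped in a tag.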
Also, I managed to get good-looking highlighting in hybrid search. The `content`/`content_ind` field is usually quite long, about half a screen to a full screen of text.
@Kishore Nallan - How wrong is this trick with two fields? Are there any known side effects?