#community-help

Partial Word Search in Datasets

TLDR: Tatu asked whether partial-word (substring) search is possible in datasets. Kishore Nallan and Jason proposed solutions, with Tatu later submitting a GitHub issue for tracking.

Sep 28, 2021
Tatu
02:16 PM
Is there currently any way to search for partial matches within words? E.g., a product with code 157B2210 should be findable with the query B2210, but that doesn't seem possible at the moment, as only prefix queries are supported.
Kishore Nallan
02:19 PM
We've just had a couple of other customers ask for this feature. It requires suffix searching, which is not an easy problem to solve: either we build a suffix tree, which is complex and has a high memory footprint, or we do a fast brute-force search over the strings. It's a memory vs. speed trade-off.

How big is your dataset?
Tatu
02:31 PM
Around 300k records, using ~700MB of memory
Tatu
02:34 PM
Perhaps a brute-force search could be enabled on a field-by-field basis? For example, when searching through product codes, the strings are short and can probably be scanned quickly.
Kishore Nallan
02:39 PM
Yes that should be doable.
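
For illustration, the per-field brute-force idea discussed above could look roughly like the sketch below. This is plain Python over an in-memory list, not how the search engine implements or would implement it; the function name `brute_force_search` and the sample records are hypothetical.

```python
from typing import Any, Dict, List


def brute_force_search(
    records: List[Dict[str, Any]], field: str, query: str
) -> List[Dict[str, Any]]:
    """Return records whose `field` contains `query` as a substring.

    Every record is scanned, so the cost grows linearly with the
    dataset size, but no extra index memory is needed.
    """
    needle = query.lower()  # case-insensitive matching is an assumption
    return [r for r in records if needle in str(r.get(field, "")).lower()]


# Hypothetical sample data, using the product code from the question.
products = [
    {"id": 1, "code": "157B2210"},
    {"id": 2, "code": "998A0001"},
]

matches = brute_force_search(products, "code", "B2210")
print([r["id"] for r in matches])
```

Restricting the scan to one short field, such as a product code, keeps the per-record work small, which is the point of enabling it field by field.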
Jason
03:14 PM
One workaround in the meantime is to split each character combination into an array field and search on that.
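
A minimal sketch of this workaround, under one possible reading of "each character combination": store every suffix of the value in an extra array field. Because prefix queries are already supported, a prefix match against some suffix is equivalent to a substring match against the original value. The helper `suffixes`, the field name `code_suffixes`, and the `min_len` cutoff are all hypothetical and not part of any API.

```python
from typing import List


def suffixes(value: str, min_len: int = 2) -> List[str]:
    """Return all suffixes of `value` that are at least `min_len` long.

    Values shorter than `min_len` yield an empty list.
    """
    return [value[i:] for i in range(len(value) - min_len + 1)]


# Hypothetical document; `code_suffixes` would be indexed as a string array.
doc = {"code": "157B2210"}
doc["code_suffixes"] = suffixes(doc["code"])
print(doc["code_suffixes"])


def prefix_matches(query: str, values: List[str]) -> bool:
    """Local stand-in for the engine's prefix search over an array field."""
    return any(v.startswith(query) for v in values)


print(prefix_matches("B22", doc["code_suffixes"]))
```

The trade-off is index size: a value of length n adds roughly n extra strings, which is acceptable for short fields like product codes but grows quickly for long text.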
Jason
05:50 PM
Tatu, mind creating a GitHub issue for this so we can track it?