# community-help
h
Hi all, I am currently working on a document search using Typesense and need support for boolean operators. As far as I know, this is only possible via the "filter_by" parameter, so I set "q" to a wildcard and used only filters:
{'q': '*', 'query_by': 'title', 'filter_by': '(ocr_text:Lorem)', 'per_page': 30, 'page': 1, 'sort_by': ''}
But for large documents (PDFs with more than 40 pages), the filter does not work well: full-word searches return almost no results. Unfortunately, the PDFs have poor OCR quality, so typo tolerance is essential. Has anyone run into something similar, or does anyone have suggestions?
a
Hi, typo tolerance is only available through the `q` parameter. So you want to leave the boolean logic to `filter_by` and run the actual text search through `q`. You definitely want to break these large PDFs into smaller chunks and index each chunk as a separate document in Typesense to improve relevance. You can, for example, create a `pdf_name` field that stores which PDF each chunk belongs to and, when the search is successful, use this `pdf_name` to fetch the complete PDF.
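In case it helps, here is a rough sketch of what that could look like in Python. It only builds the chunk documents and the search parameters as plain dicts, so it runs without a Typesense server. The helper names, the `chunk_index` field, the chunk size, the overlap, and the example filter are all assumptions for illustration, not something from your setup.

```python
def chunk_words(text, size=200, overlap=20):
    """Split text into word-based chunks of `size` words that overlap by `overlap` words."""
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


def build_chunk_documents(pdf_name, ocr_text, size=200, overlap=20):
    """Turn one PDF's OCR text into several small documents that share a pdf_name."""
    return [
        {
            "id": f"{pdf_name}-{i}",
            "pdf_name": pdf_name,   # lets you fetch the complete PDF after a hit
            "chunk_index": i,       # hypothetical field, useful for ordering chunks
            "ocr_text": chunk,
        }
        for i, chunk in enumerate(chunk_words(ocr_text, size, overlap))
    ]


def build_search_params(query, filter_by=""):
    """Text search goes through q (typo tolerant); boolean logic stays in filter_by."""
    params = {
        "q": query,
        "query_by": "ocr_text",
        "num_typos": 2,
        # Assumption: pdf_name is declared with facet: true so it can be grouped on.
        "group_by": "pdf_name",
        "group_limit": 1,
        "per_page": 30,
        "page": 1,
    }
    if filter_by:
        params["filter_by"] = filter_by
    return params


if __name__ == "__main__":
    docs = build_chunk_documents("report.pdf", "Lorem ipsum dolor sit amet " * 100)
    print(len(docs), docs[0]["id"])
    # The filter below uses a hypothetical `year` field.
    print(build_search_params("Lorem", "year:>2000"))
```

With the official Python client, the documents would typically be sent with something like `client.collections['pdf_chunks'].documents.import_(docs, {'action': 'create'})` and the search run with `client.collections['pdf_chunks'].documents.search(params)`, where `pdf_chunks` is a made-up collection name. The overlap is there so that a phrase split across a chunk boundary still appears whole in one of the chunks.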