HL F
05/15/2025, 4:36 PM
{'q': '*', 'query_by': 'title', 'filter_by': '(ocr_text:Lorem)', 'per_page': 30, 'page': 1, 'sort_by': ''}
But for large documents (PDFs over 40 pages), the filter_by parameter does not work well: full-word searches return almost no results. Unfortunately, the PDFs have poor OCR quality, so typo tolerance is essential.
Has anyone run into something similar, or does anyone have suggestions?
Alan Martini
05/15/2025, 4:53 PM
q parameter. So you want to find a way to leave filter_by to the boolean searches and then run the text search on q.
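A minimal sketch of what that split could look like, building on the search parameters posted above. The text match moves from filter_by to q, and filter_by is kept only for boolean conditions. The helper name build_search_params and the example filter `year:>2020` are hypothetical; ocr_text is the field from the original query, and num_typos is Typesense's typo-tolerance setting for q.

```python
from typing import Dict, Optional


def build_search_params(text_query: str, boolean_filter: Optional[str] = None) -> Dict:
    """Build Typesense search parameters with the text match on `q`.

    `boolean_filter` is an optional filter_by expression (hypothetical
    example: "year:>2020"); it is only added when provided.
    """
    params = {
        "q": text_query,          # typo-tolerant text search happens here
        "query_by": "ocr_text",   # search the OCR text instead of filtering on it
        "num_typos": 2,           # allow up to 2 typos per token
        "per_page": 30,
        "page": 1,
    }
    if boolean_filter:
        params["filter_by"] = boolean_filter  # exact/boolean conditions only
    return params


if __name__ == "__main__":
    print(build_search_params("Lorem"))
    print(build_search_params("Lorem", "year:>2020"))
```

The resulting dict can be passed to the search call in place of the original parameters; whether 2 typos is the right setting for poor OCR is something to tune on real data.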
You definitely want to break these large PDFs into smaller chunks and index each chunk as a separate document in Typesense to improve relevance. You can, for example, create a pdf_name field that stores which PDF the chunk belongs to and, when the search is successful, use this pdf_name to fetch the complete PDF.
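A rough sketch of the chunking idea, assuming the OCR text is already available as one string per page. The function names, the chunk_index field, the id scheme, and the 5-pages-per-chunk default are all assumptions for illustration; only pdf_name and ocr_text come from the thread.

```python
from typing import Dict, List


def chunk_pages(pdf_name: str, pages: List[str], pages_per_chunk: int = 5) -> List[Dict]:
    """Group per-page OCR text into chunk documents that share a pdf_name."""
    docs = []
    for start in range(0, len(pages), pages_per_chunk):
        chunk_index = start // pages_per_chunk
        docs.append({
            "id": f"{pdf_name}-{chunk_index}",   # hypothetical id scheme
            "pdf_name": pdf_name,                # links the chunk back to its PDF
            "chunk_index": chunk_index,
            "ocr_text": "\n".join(pages[start:start + pages_per_chunk]),
        })
    return docs


def pdf_names_from_hits(result: Dict) -> List[str]:
    """Collect distinct pdf_name values from a search response, in hit order."""
    names: List[str] = []
    for hit in result.get("hits", []):
        name = hit["document"]["pdf_name"]
        if name not in names:
            names.append(name)
    return names


if __name__ == "__main__":
    pages = [f"page {i}" for i in range(12)]
    for doc in chunk_pages("report.pdf", pages):
        print(doc["id"], len(doc["ocr_text"]))
```

Each returned dict would be indexed as its own document; after a search, pdf_names_from_hits gives the PDFs to fetch in full. Typesense's group_by search parameter may also be able to collapse chunks per pdf_name, assuming that field is declared as a facet in the schema.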