#community-help

Creating a Custom Search Engine with Typesense

TLDR Tim asked whether Typesense could be used to build a custom search engine. Jason explained that it can, since Typesense indexes JSON documents, but it has no built-in crawler, so a separate one is needed. He pointed to the Gatsby plugin as a possible starting point.

Solved
May 16, 2021
Tim
12:21 AM
One question: I'm looking to functionally create something akin to a custom search engine, focusing only on spidering and searching inside a set of, say, 3,000 to 30,000 websites we have already identified, akin to what Google calls a "Programmable Search Engine", but I want far more control over indexing than Google offers... can Typesense be used for this?
Jason
01:23 AM
Hi Tim, Typesense is a JSON-document based search engine. So as long as you're able to extract the data into JSON objects and push it to Typesense, you can search for it.
Jason
01:24 AM
One thing to note, though, is that Typesense does not have a built-in crawler. So you would have to use a separate crawler that parses the webpages, transforms them into JSON, and pushes them to Typesense.
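The parse, transform, and push steps Jason describes could be sketched roughly as follows, using only the Python standard library. This is a minimal illustration, not an official crawler: the field names (`url`, `title`, `body`), the `pages` collection, and the `http://localhost:8108` host in the usage note are all assumptions made for the example, and fetching the pages is left out entirely. The request targets Typesense's bulk import endpoint, which accepts one JSON object per line.

```python
import json
import urllib.request
from html.parser import HTMLParser


class PageExtractor(HTMLParser):
    """Collects the <title> and the visible text of one HTML page."""

    SKIPPED = {"script", "style"}  # content of these tags is not indexed

    def __init__(self):
        super().__init__()
        self.title_parts = []
        self.text_parts = []
        self._in_title = False
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)
        elif self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def page_to_document(url, html):
    """Turn one fetched page into a flat, JSON-serializable dict.

    The field names here are hypothetical; they must match whatever
    schema the target collection was created with.
    """
    parser = PageExtractor()
    parser.feed(html)
    parser.close()
    return {
        "url": url,
        "title": "".join(parser.title_parts).strip(),
        "body": " ".join(parser.text_parts),
    }


def build_import_request(documents, host, collection, api_key):
    """Build (but do not send) a bulk upsert request in JSON Lines format."""
    payload = "\n".join(json.dumps(doc) for doc in documents).encode("utf-8")
    return urllib.request.Request(
        f"{host}/collections/{collection}/documents/import?action=upsert",
        data=payload,
        headers={"X-TYPESENSE-API-KEY": api_key, "Content-Type": "text/plain"},
        method="POST",
    )
```

To actually send the batch, one would pass the result of `build_import_request(docs, "http://localhost:8108", "pages", api_key)` to `urllib.request.urlopen`, assuming a collection with a matching schema already exists. Keeping the request construction separate from sending makes the transform step easy to check without a running server.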

Tim
01:33 AM
Has anyone else built a web crawler that works with Typesense this way?
Jason
01:34 AM
We have a Gatsby plugin that does probably 70% of the crawling, parsing & indexing: https://github.com/typesense/gatsby-plugin-typesense/blob/master/gatsby-node.js

But given that it's a Gatsby plugin, it goes off of a local build directory of static HTML files.

It could be used as the basis for a more generic crawling use case.
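For illustration, the general idea of working from a local build directory of static HTML files might look like the sketch below. This is not the plugin's code (the plugin is JavaScript); it is a hypothetical Python equivalent of the directory walk only, and the path-mapping convention (`docs/index.html` becomes `/docs/`) is an assumption about how a static site is typically served.

```python
from pathlib import Path


def iter_built_pages(build_dir):
    """Yield (page_path, html) for every .html file under build_dir."""
    root = Path(build_dir)
    for html_file in sorted(root.rglob("*.html")):
        relative = html_file.relative_to(root).as_posix()
        # Assumed convention: "docs/index.html" is served at "/docs/",
        # while "about.html" is served at "/about.html".
        if relative == "index.html":
            page_path = "/"
        elif relative.endswith("/index.html"):
            page_path = "/" + relative[: -len("index.html")]
        else:
            page_path = "/" + relative
        yield page_path, html_file.read_text(encoding="utf-8")
```

Each yielded pair could then be parsed into a JSON document and pushed to Typesense. A more generic crawler would replace this directory walk with fetching pages over HTTP from the list of target sites.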
