# community-help
t
One question: I'm looking to build something like a custom search engine that spiders and searches only a set of roughly 3,000 to 30,000 websites we have already identified, akin to what Google calls a "Programmable Search Engine", but I want far more control over indexing than Google offers. Can Typesense be used for this?
j
Hi Tim, Typesense is a JSON-document based search engine. So as long as you're able to extract the data into JSON objects and push it to Typesense, you can search for it.
One thing to note, though: Typesense does not have a built-in crawler. You would need a separate crawler that parses the web pages, transforms them into JSON documents, and pushes them to Typesense.
👍 1
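To make the parse, transform, and push steps concrete, here is a minimal sketch using only the Python standard library. It assumes the page HTML has already been fetched by some crawler, and that a collection (called `pages` here, a made-up name) already exists in Typesense with fields matching `url`, `title`, and `body`. The helper names `PageExtractor`, `page_to_document`, and `build_import_request` are illustrative, not part of Typesense. The request targets Typesense's bulk import endpoint, which takes one JSON document per line.

```python
import json
import urllib.request
from html.parser import HTMLParser


class PageExtractor(HTMLParser):
    """Collects the <title> and the visible text of one HTML page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.chunks = []
        self._in_title = False
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if not text or self._skip_depth:
            return
        if self._in_title:
            self.title += text
        else:
            self.chunks.append(text)


def page_to_document(url, html):
    """Transform one crawled page into a flat JSON-ready dict."""
    parser = PageExtractor()
    parser.feed(html)
    parser.close()
    return {"url": url, "title": parser.title, "body": " ".join(parser.chunks)}


def build_import_request(host, api_key, collection, documents):
    """Build (but do not send) a bulk upsert request with a JSONL body."""
    body = "\n".join(json.dumps(doc) for doc in documents)
    return urllib.request.Request(
        f"{host}/collections/{collection}/documents/import?action=upsert",
        data=body.encode("utf-8"),
        headers={"X-TYPESENSE-API-KEY": api_key, "Content-Type": "text/plain"},
        method="POST",
    )
```

Sending is then `urllib.request.urlopen(build_import_request(...))` against a running server. A real crawler for thousands of sites would also need fetching, politeness rules, deduplication, and batching, none of which are shown here.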
t
Has anyone else built a web crawler that works with Typesense this way?
j
We have a Gatsby plugin that does probably 70% of the crawling, parsing, and indexing: https://github.com/typesense/gatsby-plugin-typesense/blob/master/gatsby-node.js But since it's a Gatsby plugin, it works from a local build directory of static HTML files. It could serve as a starting point for a more generic crawler.
👍 1
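For a rough idea of what "working from a local build directory" can look like, here is a small Python sketch. It is not the plugin's actual code (the plugin is JavaScript). It assumes a directory of static `.html` files and a hypothetical `base_url`, and it only extracts each page's title. The function name `documents_from_build_dir` and the choice of the relative path as the document `id` are illustrative assumptions.

```python
from html.parser import HTMLParser
from pathlib import Path


class TitleParser(HTMLParser):
    """Captures the text inside the <title> element."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def documents_from_build_dir(build_dir, base_url):
    """Yield one JSON-ready dict per .html file under build_dir."""
    root = Path(build_dir)
    for path in sorted(root.rglob("*.html")):
        parser = TitleParser()
        parser.feed(path.read_text(encoding="utf-8", errors="replace"))
        parser.close()
        rel = path.relative_to(root).as_posix()
        yield {
            "id": rel,  # assumption: relative path doubles as a stable document id
            "url": f"{base_url.rstrip('/')}/{rel}",
            "title": parser.title.strip(),
        }
```

Generalizing this to a real crawler mostly means replacing the directory walk with fetching pages over HTTP from the list of target sites.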