Issue Integrating Typesense with Docusaurus Documentation Website
TLDR: Oz had trouble integrating Typesense with their Docusaurus documentation website. Jason suggested running the docsearch-scraper and adjusting the start URLs, which resolved the issue.
May 31, 2022 (19 months ago)
Oz
03:20 PM
I'm trying to add Typesense to my Docusaurus documentation website. I'm using Typesense Cloud. I couldn't figure out how to make the cluster index my website, and all of the requests coming from my website are returning an error:
{code: 404, error: "Could not find a field named 'hierarchy.lvl0' in the schema."}
Can anyone help?
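Context on the error: this 404 usually means the collection the search bar queries has never been populated by the docsearch-scraper, so none of the docsearch fields exist yet. A sketch of the record shape the typesense/docsearch-scraper pushes for each indexed section (field values here are illustrative, not taken from the actual site):

```python
# Each record the docsearch-scraper indexes carries flattened
# "hierarchy.lvl0".."hierarchy.lvl6" fields plus content and url.
# Values below are made up for illustration.
record = {
    "hierarchy.lvl0": "Documentation",    # top-level section
    "hierarchy.lvl1": "Getting Started",  # page title
    "hierarchy.lvl2": "Installation",     # heading on the page
    "content": "Run the installer ...",   # body text under the heading
    "url": "https://docs.blinkops.com/docs/documentation",
}

# The search UI filters and groups on these hierarchy fields; if the
# scraper never ran, the collection has no 'hierarchy.lvl0' field and
# Typesense returns the 404 seen above.
print(sorted(record))
```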
Jason
03:21 PM

Jun 01, 2022 (19 months ago)
Oz
11:53 AM
Oz
02:45 PM
start_urls and sitemap_urls. This is the result, and I'm not sure why:
docker run -it --env-file=./.env -e "CONFIG=$(cat configs/public/config.json | jq -r tostring)" typesense/docsearch-scraper
INFO:scrapy.utils.log:Scrapy 2.2.1 started (bot: scrapybot)
INFO:scrapy.utils.log:Versions: lxml 4.6.3.0, libxml2 2.9.10, cssselect 1.1.0, parsel 1.6.0, w3lib 1.22.0, Twisted 21.2.0, Python 3.6.9 (default, Dec 8 2021, 21:08:43) - [GCC 8.4.0], pyOpenSSL 20.0.1 (OpenSSL 1.1.1k 25 Mar 2021), cryptography 3.4.7, Platform Linux-5.10.47-linuxkit-x86_64-with-Ubuntu-18.04-bionic
DEBUG:scrapy.utils.log:Using reactor: twisted.internet.epollreactor.EPollReactor
INFO:scrapy.crawler:Overridden settings:
{'DUPEFILTER_CLASS': 'src.custom_dupefilter.CustomDupeFilter',
'LOG_ENABLED': '1',
'LOG_LEVEL': 'ERROR',
'TELNETCONSOLE_ENABLED': False,
'USER_AGENT': 'Algolia DocSearch Crawler'}
INFO:scrapy.middleware:Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.logstats.LogStats']
INFO:scrapy.middleware:Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.retry.RetryMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats',
'src.custom_downloader_middleware.CustomDownloaderMiddleware']
INFO:scrapy.middleware:Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
INFO:scrapy.middleware:Enabled item pipelines:
[]
INFO:scrapy.core.engine:Spider opened
INFO:scrapy.extensions.logstats:Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
DEBUG:scrapy.core.engine:Crawled (200) <GET https://docs.blinkops.com> (referer: None)
DEBUG:scrapy.core.engine:Crawled (200) <GET https://docs.blinkops.com/sitemap.xml> (referer: None)
> DocSearch: https://docs.blinkops.com (0 records)
INFO:scrapy.core.engine:Closing spider (finished)
INFO:scrapy.statscollectors:Dumping Scrapy stats:
{'downloader/request_bytes': 427,
'downloader/request_count': 2,
'downloader/request_method_count/GET': 2,
'downloader/response_bytes': 2873,
'downloader/response_count': 2,
'downloader/response_status_count/200': 2,
'elapsed_time_seconds': 0.513301,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2022, 6, 1, 14, 38, 18, 674386),
'memusage/max': 64651264,
'memusage/startup': 64651264,
'response_received_count': 2,
'scheduler/dequeued': 2,
'scheduler/dequeued/memory': 2,
'scheduler/enqueued': 2,
'scheduler/enqueued/memory': 2,
'start_time': datetime.datetime(2022, 6, 1, 14, 38, 18, 161085)}
INFO:scrapy.core.engine:Spider closed (finished)
Crawling issue: nbHits 0 for docusaurus-2
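A crawl that fetches pages (two 200 responses above) but produces 0 records usually means the CSS selectors in the scraper config match nothing in the rendered pages, or the start URLs filter everything out. A minimal config sketch for a Docusaurus v2 site — the index_name and domain mirror this thread, while the selectors follow the stock docusaurus-2 template and would need adjusting to the site's actual DOM:

```json
{
  "index_name": "docusaurus-2",
  "start_urls": ["https://docs.blinkops.com/"],
  "sitemap_urls": ["https://docs.blinkops.com/sitemap.xml"],
  "selectors": {
    "lvl0": {
      "selector": ".menu__link--sublist.menu__link--active",
      "global": true,
      "default_value": "Documentation"
    },
    "lvl1": "header h1",
    "lvl2": "article h2",
    "lvl3": "article h3",
    "lvl4": "article h4",
    "text": "article p, article li"
  }
}
```

The scraper reads its Typesense connection details (TYPESENSE_API_KEY, TYPESENSE_HOST, TYPESENSE_PORT, TYPESENSE_PROTOCOL) from the .env file passed to docker run above.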
Jason
02:47 PM

Jason
02:47 PM

Oz
02:49 PM
But I re-deployed again for testing purposes

Oz
02:49 PM

Jason
03:14 PM
https://docs.blinkops.com/docs/documentation
https://docs.blinkops.com/docs/Integrations/
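Pointing the scraper's start_urls directly at the documentation sections, as Jason's two URLs suggest, would look like this in the config — a sketch in which only the two URLs come from the thread:

```json
{
  "start_urls": [
    "https://docs.blinkops.com/docs/documentation",
    "https://docs.blinkops.com/docs/Integrations/"
  ]
}
```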
Similar Threads
Configuring Docusaurus and Typesense for a Documentation Site
Apoorv had trouble adding search functionality to a Docusaurus documentation website with Typesense. Jason worked through several troubleshooting steps, identified issues in Apoorv's setup, and provided fixes that got the search bar working.
Typesense Integration Issue in Docusaurus
Benjamin experienced an error implementing Typesense (TS) in Docusaurus. Jason identified the correct placement of the 'typesense' key inside 'themeConfig' in the Docusaurus config file, resolving the issue.
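For reference, the docusaurus-theme-search-typesense plugin expects the typesense key nested inside themeConfig; a sketch with placeholder values (host, API key, and collection name are not from this thread):

```js
// docusaurus.config.js -- sketch; all values below are placeholders
module.exports = {
  themeConfig: {
    typesense: {
      typesenseCollectionName: 'docusaurus-2',
      typesenseServerConfig: {
        nodes: [{ host: 'xxx.a1.typesense.net', port: 443, protocol: 'https' }],
        apiKey: 'search-only-api-key',
      },
      typesenseSearchParameters: {},
    },
  },
};
```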
Solving Typesense Docsearch Scraper Issues
Sandeep was having issues with Typesense's docsearch scraper and getting fewer results than with Algolia's scraper. Jason helped by sharing the query they use and advised checking the running version of the scraper. The issue was resolved when Sandeep ran the regular (non-base) Docker image.
Troubleshooting Issues with DocSearch Hits and Scraper Configuration
Rubai encountered issues with search result priorities and ellipsis. Jason helped debug the issue and suggested using different versions of typesense-docsearch.js, updating initialization parameters, and running the scraper on a Linux-based environment. The issues related to hits structure and scraper configuration were resolved.
Crawler Deleting Old Collection and Creating New Name
James faced issues with Typesense when the crawler changed collection names, breaking their production website. Jason suggested setting "index_name" in the config file to the desired name and explained why the generated collection names differ.