# community-help
o
Hi everyone, I’m trying to add Typesense to my Docusaurus documentation website. I’m using Typesense Cloud. I couldn’t figure out how to make the cluster index my website, and all of the requests coming from my website are returning an error:
{code: 404, error: "Could not find a field named 'hierarchy.lvl0' in the schema."}
Can anyone help?
j
@Oz Heymann Did you already run the docsearch-scraper against your site? The scraper is the one that indexes your site into Typesense and creates the schema for you
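When you run it, the scraper reads your cluster's details from an env file; it would look something like this (the hostname and API key below are placeholders for your own Typesense Cloud values):
```bash
# .env for typesense/docsearch-scraper (placeholder values)
TYPESENSE_API_KEY=xyz                 # an admin API key for your cluster
TYPESENSE_HOST=xxx.a1.typesense.net   # your Typesense Cloud hostname
TYPESENSE_PORT=443
TYPESENSE_PROTOCOL=https
```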
o
For some reason I thought that with Typesense Cloud this happens automatically. Will do it now, thank you
👍 1
@Jason Bosco I’ve tried running the scraper now, using the recommended config for Docusaurus exactly, except for the `start_urls` and `sitemap_urls`.
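Here’s the gist of my config.json for reference (selectors abridged; the actual recommended Docusaurus config uses a longer XPath selector for lvl0, and only the URLs were changed for my site):
```json
{
  "index_name": "docusaurus-2",
  "start_urls": ["https://docs.blinkops.com"],
  "sitemap_urls": ["https://docs.blinkops.com/sitemap.xml"],
  "selectors": {
    "lvl0": {
      "selector": "nav.navbar a.navbar__link--active",
      "global": true,
      "default_value": "Documentation"
    },
    "lvl1": "header h1",
    "lvl2": "article h2",
    "lvl3": "article h3",
    "lvl4": "article h4",
    "text": "article p, article li"
  }
}
```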
This is the result, and I’m not sure why:
```
docker run -it --env-file=./.env -e "CONFIG=$(cat configs/public/config.json | jq -r tostring)" typesense/docsearch-scraper
INFO:scrapy.utils.log:Scrapy 2.2.1 started (bot: scrapybot)
INFO:scrapy.utils.log:Versions: lxml 4.6.3.0, libxml2 2.9.10, cssselect 1.1.0, parsel 1.6.0, w3lib 1.22.0, Twisted 21.2.0, Python 3.6.9 (default, Dec  8 2021, 21:08:43) - [GCC 8.4.0], pyOpenSSL 20.0.1 (OpenSSL 1.1.1k  25 Mar 2021), cryptography 3.4.7, Platform Linux-5.10.47-linuxkit-x86_64-with-Ubuntu-18.04-bionic
DEBUG:scrapy.utils.log:Using reactor: twisted.internet.epollreactor.EPollReactor
INFO:scrapy.crawler:Overridden settings:
{'DUPEFILTER_CLASS': 'src.custom_dupefilter.CustomDupeFilter',
 'LOG_ENABLED': '1',
 'LOG_LEVEL': 'ERROR',
 'TELNETCONSOLE_ENABLED': False,
 'USER_AGENT': 'Algolia DocSearch Crawler'}
INFO:scrapy.middleware:Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.memusage.MemoryUsage',
 'scrapy.extensions.logstats.LogStats']
INFO:scrapy.middleware:Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats',
 'src.custom_downloader_middleware.CustomDownloaderMiddleware']
INFO:scrapy.middleware:Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
INFO:scrapy.middleware:Enabled item pipelines:
[]
INFO:scrapy.core.engine:Spider opened
INFO:scrapy.extensions.logstats:Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
DEBUG:scrapy.core.engine:Crawled (200) <GET https://docs.blinkops.com> (referer: None)
DEBUG:scrapy.core.engine:Crawled (200) <GET https://docs.blinkops.com/sitemap.xml> (referer: None)
> DocSearch: https://docs.blinkops.com 0 records)
INFO:scrapy.core.engine:Closing spider (finished)
INFO:scrapy.statscollectors:Dumping Scrapy stats:
{'downloader/request_bytes': 427,
 'downloader/request_count': 2,
 'downloader/request_method_count/GET': 2,
 'downloader/response_bytes': 2873,
 'downloader/response_count': 2,
 'downloader/response_status_count/200': 2,
 'elapsed_time_seconds': 0.513301,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2022, 6, 1, 14, 38, 18, 674386),
 'memusage/max': 64651264,
 'memusage/startup': 64651264,
 'response_received_count': 2,
 'scheduler/dequeued': 2,
 'scheduler/dequeued/memory': 2,
 'scheduler/enqueued': 2,
 'scheduler/enqueued/memory': 2,
 'start_time': datetime.datetime(2022, 6, 1, 14, 38, 18, 161085)}
INFO:scrapy.core.engine:Spider closed (finished)

Crawling issue: nbHits 0 for docusaurus-2
```
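For what it’s worth, querying the collection directly would also confirm the document count; a quick sketch (cluster hostname and API key are placeholders):
```bash
# Fetch the collection's metadata; "num_documents" in the response
# shows how many records were actually indexed
curl -s "https://xxx.a1.typesense.net/collections/docusaurus-2" \
  -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}"
```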
j
I see a 404 when I visit https://docs.blinkops.com
Is that expected?
o
Yep, it was up when I tried it - I removed the deployment because the website is still a work in progress. But I’ve re-deployed it for testing purposes
It’s up now - same output
j
@Oz Heymann Looks like there's a client-side redirect when you visit https://docs.blinkops.com/.... Could you try setting your start URLs to:
https://docs.blinkops.com/docs/documentation
https://docs.blinkops.com/docs/Integrations/
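That way the crawler starts from pages that serve their content directly instead of redirecting. The relevant part of your config.json would then look roughly like this (everything else unchanged):
```json
{
  "start_urls": [
    "https://docs.blinkops.com/docs/documentation",
    "https://docs.blinkops.com/docs/Integrations/"
  ]
}
```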