#community-help

Resolving Typesense Errors in Docusaurus Site

TLDR: Kevin encounters problems with Typesense on his Docusaurus site. Jason assists with config variable and port number issues, but Scrapy limitations lead to unresolved troubleshooting.

Powered by Struct AI

Jun 30, 2022 (18 months ago)
Kevin
01:04 PM
👋 Hi everyone! I don't know if this is the right place, but I have just discovered typesense and am trying to get it to work. Would anyone know where to look? I have described a particular problem on Stack Overflow. Thanks!
Jason
03:28 PM
Kevin I don't see any errors in the logs you posted. That's just the scraper doing its thing and indexing docs in Typesense.

You want to wait for it to fully complete before trying to search
Jul 01, 2022 (18 months ago)
Kevin
06:19 AM
Thanks Jason.
Kevin
06:21 AM
You wouldn't have any comments on the env and config files, would you? Thanks again.
Jason
03:55 PM
The env and config files look fine to me
Jul 04, 2022 (18 months ago)
Kevin
12:00 PM
Then would you have any idea why the search box on our Docusaurus site just hangs there? Thanks!
Kevin
01:48 PM
Actually I figured this out. The typesenseCollectionName parameter was not set. Once this parameter is set, the search box no longer hangs.
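For reference, a minimal sketch of where that parameter lives, assuming the docusaurus-theme-search-typesense plugin is in use (the collection name, host, port, and API key values here are placeholders):

```javascript
// docusaurus.config.js -- sketch of the Typesense search themeConfig.
// All concrete values below are placeholders, not from the thread.
module.exports = {
  themeConfig: {
    typesense: {
      // Leaving this unset is what caused the search box to hang.
      typesenseCollectionName: 'docusaurus-docs',
      typesenseServerConfig: {
        nodes: [{ host: 'localhost', port: 8108, protocol: 'http' }],
        apiKey: 'search-only-api-key',
      },
      // Extra search parameters passed through to Typesense (optional).
      typesenseSearchParameters: {},
    },
  },
};
```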

Kevin
03:36 PM
There is a new problem - how to run the docsearch scraper on a URL that contains a port number? Is there something called 'allowed_domains' that can be modified to accept URLs with ports?
Jason
08:52 PM
The scraper should work with port numbers already…
Jul 05, 2022 (18 months ago)
Kevin
03:02 PM
It seems that it works by default only with ports 80 and 443. Port 80 has to be forwarded to port 3000 in order for docsearch to scrape port 3000. There is a discussion of this on GitHub here - https://github.com/typesense/typesense/issues/628#issuecomment-1174040085.
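The port-forwarding step can be sketched with an iptables rule along these lines (an assumption-laden sketch: Linux host, dev server on port 3000, run as root; the exact rule used in the linked issue may differ):

```shell
# Redirect locally generated traffic aimed at port 80 over to port 3000,
# so a scraper that only speaks to port 80 reaches the dev server.
# Requires root; restricted to the loopback interface.
iptables -t nat -A OUTPUT -o lo -p tcp --dport 80 -j REDIRECT --to-ports 3000
```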
Kevin
03:21 PM
The heart of the problem is that docsearch 'finds' the different web pages in sitemap.xml, but when it attempts to crawl them it substitutes the localhost IP address with the organization's URL. It cannot find the Docusaurus pages under that URL for the simple reason that they are just not there! If this substitution could be prevented or compensated for, then I could run docsearch on my development site.
Jason
06:02 PM
Ah my bad, I didn't realize this was a limitation of Scrapy, which is the underlying library the docsearch-scraper uses...
Jul 06, 2022 (18 months ago)
Kevin
07:55 AM
Thanks. How did you know that this was a limitation of Scrapy?
Kevin
09:45 AM
And would you know of a workaround? Kind regards
Jason
04:08 PM
I figured that out from the stacktrace posted in the issue:

WARNING:py.warnings:/root/.local/share/virtualenvs/root-BuDEOXnJ/lib/python3.6/
site-packages/scrapy/spidermiddlewares/offsite.py:69: PortWarning: allowed_domains 
accepts only domains without ports. Ignoring entry host.docker.internal:3000 in 
allowed_domains.  warnings.warn(message, PortWarning)

The file that's emitting that warning is inside the scrapy package. I also searched their source code for that message, and confirmed that it is coming from within Scrapy.

re: workaround, the iptables-based approach mentioned in the GitHub issue shared above seems to work.
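The behavior behind that warning can be approximated in a few lines (an illustrative sketch, not Scrapy's actual implementation):

```python
import re
import warnings

def filter_allowed_domains(allowed_domains):
    """Approximate how Scrapy's OffsiteMiddleware treats allowed_domains:
    entries that carry a port are warned about and silently dropped."""
    kept = []
    for domain in allowed_domains:
        if re.search(r':\d+$', domain):
            warnings.warn(
                f"allowed_domains accepts only domains without ports. "
                f"Ignoring entry {domain} in allowed_domains."
            )
            continue  # the entry with a port never takes effect
        kept.append(domain)
    return kept

print(filter_allowed_domains(['host.docker.internal:3000', 'example.com']))
# -> ['example.com']
```

This is why a dev server on a non-standard port is effectively invisible to the offsite filter: the entry is discarded, not matched.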
Jul 07, 2022 (18 months ago)
Kevin
07:33 AM
Redirecting the port helped resolve one problem, but unfortunately scrapy 'converted' the IP address that was passed in docker to the URL of the organization. A new set of error messages appeared: DEBUG:scrapy.core.engine:Crawled (404) <GET https://www.algotrader.com/docs/virtual_spot_positions> (referer: http://host.docker.internal:3000/sitemap.xml)
Kevin
07:55 AM
Scrapy can read the sitemap.xml file via the redirected port, since it was able to detect virtual_spot_positions, which is the name of a page in the Docusaurus site. Unfortunately, it substitutes the address implied by host.docker.internal with the organization's URL, and of course it can't find virtual_spot_positions there. Arrgh. Redirecting the port does solve one problem but led in this case to another. I will attempt to duplicate the environment described in the GitHub posting. When I encountered this problem, the Docusaurus site was running on Windows while Typesense was running via docker run on WSL Ubuntu on the same physical machine. In the configuration where port redirection succeeded, the Docusaurus site was running on Ubuntu. That might explain why it works in one situation but not the other.
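One hypothetical way to compensate for the substitution would be to rewrite the production URLs that sitemap.xml advertises so they point back at the dev host before the scraper crawls them (a sketch only; rewrite_sitemap is an invented helper, not part of docsearch-scraper, and the URLs are taken from the log line above):

```python
def rewrite_sitemap(sitemap_xml, prod_base, dev_base):
    """Replace every occurrence of the production base URL in a sitemap
    document with the local development base URL, so the crawler fetches
    pages from the dev server instead of the (empty) production site."""
    return sitemap_xml.replace(prod_base, dev_base)

sitemap = (
    '<urlset><url><loc>'
    'https://www.algotrader.com/docs/virtual_spot_positions'
    '</loc></url></urlset>'
)
print(rewrite_sitemap(
    sitemap,
    'https://www.algotrader.com',
    'http://host.docker.internal:3000',
))
```

The rewritten sitemap would then be served to the scraper in place of the original, e.g. from a local static file.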
Jul 08, 2022 (18 months ago)
Jason
07:21 PM
Another shotgun approach could be to run something like ngrok to create a tunnel endpoint for your localhost site, so it's accessible via HTTPS on port 443, and then point the docsearch-scraper at that ngrok tunnel endpoint.
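That approach might look roughly like this (a sketch assuming ngrok is installed and the Docusaurus dev server runs on port 3000; the tunnel hostname is assigned by ngrok at runtime):

```shell
# Expose the local dev server through an HTTPS tunnel on port 443.
ngrok http 3000
# ngrok prints a public https URL; point the scraper config's
# start_urls and sitemap_urls at that URL instead of localhost,
# then run the scraper as usual.
```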
