# community-help
a
I am working on a project which uses Docusaurus for documentation, and I am trying to build a search bar for the documentation site. I am done up to Step 2 from https://typesense.org/docs/0.21.0/guide/docsearch.html#step-2-run-the-scraper and am confused about how to proceed with Step 3.
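For context, Step 2 of the linked guide amounts to pointing the scraper's Docker container at a Typesense cluster and at a docsearch config file. A rough sketch of that step, with placeholder host, key, and filenames:

```bash
# .env consumed by the scraper container (placeholder values):
#   TYPESENSE_API_KEY=xyz
#   TYPESENSE_HOST=xxx.a1.typesense.net
#   TYPESENSE_PORT=443
#   TYPESENSE_PROTOCOL=https

# Run the scraper, passing the docsearch config as an env var (requires jq)
docker run -it --env-file=./.env \
  -e "CONFIG=$(cat docsearch.config.json | jq -r tostring)" \
  typesense/docsearch-scraper
```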
k
👋 What specific query do you have on Step 3?
a
Step 3 requires us to add the snippet to the docs navigation, right? But my project is using Docusaurus and the navigation is handled through the config file, so I am not sure where to paste the snippet.
j
@Apoorv Tiwari Docusaurus uses a custom react component to add search into the UI. Let me take a closer look at this tomorrow and get back to you with an update on how to set this up with Docusaurus.
a
Sure 😀
j
@Apoorv Tiwari Quick update on this: it looks like I'm going to have to fork some Docusaurus components and make tweaks to them to get them to work out of the box with Typesense. I've been meaning to do this for a while, but now that you asked I'll make these updates and let you know. I'll try and share something with you late next week...
a
Thanks for the update, Jason 🙂
j
@Apoorv Tiwari Good news! I managed to get the Docusaurus search plugin to work with Typesense. Here's an alpha version of the plugin with some quick instructions: https://www.npmjs.com/package/docusaurus-theme-search-typesense I also had to make some tiny tweaks to the scraper to make it work with docusaurus sites. So you'd want to delete the existing collection you have already created and re-run the docsearch scraper after pulling the latest docker image. Let me know how it goes!
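For anyone following along, wiring the plugin into docusaurus.config.js looks roughly like the sketch below. The option names follow the package's README at the time, so double-check them against the current docs; the server details and collection name here are placeholders:

```js
// docusaurus.config.js (sketch — verify option names against the plugin README)
module.exports = {
  // ...existing Docusaurus config...
  themes: ['docusaurus-theme-search-typesense'],
  themeConfig: {
    typesense: {
      // Must match index_name in docsearch.config.json
      typesenseCollectionName: 'tooljet_docs',
      typesenseServerConfig: {
        nodes: [{ host: 'xxx.a1.typesense.net', port: 443, protocol: 'https' }],
        apiKey: 'SEARCH_ONLY_API_KEY',
      },
      typesenseSearchParameters: {},
    },
  },
};
```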
a
Hi @Jason Bosco I did follow the steps mentioned and I was able to add a search bar to my documentation site, but the search feature is not working. This is what my docsearch.config.json looks like. I have copied all the configuration from the Docusaurus-2.json file and only changed index_name and start_url. I am also attaching the output of running the scraper and my docusaurus.config.js. Also, the search option is disabled for cloud Typesense.
j
@Apoorv Tiwari Could you share a copy-pasteable version of the docsearch config in, say, a GitHub gist?
I can then look deeper
a
j
@Apoorv Tiwari The issue is that docsearch expects `start_urls` to be the base URL for all pages in the documentation. So if you set it to https://docs.tooljet.io/docs/intro, it expects all pages in your documentation to have that as the base URL. You'd ideally want to set the base URL (`start_urls`) to `["https://docs.tooljet.io/docs/"]`. But then if you visit https://docs.tooljet.io/docs/ it does an infinite redirect. If you can fix that and then update `start_urls`, I think it should work after that...
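In other words, the top of docsearch.config.json would look something like this (a trimmed sketch; in practice you would keep the full selector block from the Docusaurus 2 template mentioned earlier, and sitemap_urls is optional):

```json
{
  "index_name": "tooljet_docs",
  "start_urls": ["https://docs.tooljet.io/docs/"],
  "sitemap_urls": ["https://docs.tooljet.io/sitemap.xml"],
  "selectors": {
    "lvl1": "header h1",
    "lvl2": "article h2",
    "lvl3": "article h3",
    "lvl4": "article h4",
    "text": "article p, article li"
  }
}
```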
a
Yeah, this needs to be fixed. However, the docs are working fine on localhost, so let me try setting the start_url to "http://localhost:3001/docs/".
I am getting a connection refused error when I try to run it against localhost. Is there any workaround for it?
j
Ah, that's because the scraper is running inside a Docker container, so localhost refers to the Docker container itself. Could you try using your computer's private IP instead? A 192.168.x.x or 10.x.x.x one?
a
Yeah, I tried with http://0.0.0.0:3000/docs/ and still got the same error
j
You want to run `ifconfig` and use the 192.168.x.x IP or the 10.x.x.x IP from there.
Another option is to use something like ngrok to open a tunnel to your local web server, and use that tunnel host name in the scraper config.
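Both suggestions in shell form (the IP, port, and tunnel URL are placeholders; as it turns out later in the thread, the scraper doesn't cope with explicit port numbers, so the ngrok route is the one that ends up working):

```bash
# Option 1: use the machine's LAN IP instead of localhost
ifconfig | grep "inet "        # look for a 192.168.x.x or 10.x.x.x address
# then set "start_urls": ["http://192.168.1.42:3000/docs/"]

# Option 2: tunnel the local dev server through ngrok and scrape the public URL
ngrok http 3000                # prints an https://xxxx.ngrok.io forwarding URL
# then set "start_urls": ["https://xxxx.ngrok.io/docs/"]
```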
a
So I tried using ngrok and this is the URL I get: https://6380-61-2-246-175.ngrok.io. My documentation home page is /docs/intro. I have tried https://6380-61-2-246-175.ngrok.io/docs as start_url, but there are 0 nbhits, and if I try https://6380-61-2-246-175.ngrok.io it throws an error.
j
Let me take a look now... Could you keep the tunnel running?
a
Yeah sure
Is it because of a 404 on https://6380-61-2-246-175.ngrok.io?
j
I don't see a 404 there when I visit from the browser
a
j
Ah yes, it's a client-side 404... So it's not able to find any links there
Can you redirect /docs/ to /docs/intro on the client-side?
a
There is this plugin https://docusaurus.io/docs/next/api/plugins/@docusaurus/plugin-client-redirects but it does not work in the development environment 😑
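For completeness, the plugin config for that redirect would look roughly like the sketch below; as noted above it only takes effect in production builds (yarn build), not in the dev server, and the route mapping is just the one this thread needs:

```js
// docusaurus.config.js (sketch — client-redirects only applies to production builds)
module.exports = {
  // ...existing config...
  plugins: [
    [
      '@docusaurus/plugin-client-redirects',
      {
        redirects: [
          { from: '/docs', to: '/docs/intro' },
        ],
      },
    ],
  ],
};
```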
j
How do you currently redirect the root https://6380-61-2-246-175.ngrok.io/ to /docs/intro?
Actually it doesn't matter if it's server-side or client-side. As long as /docs takes you to /docs/intro
Btw, on a slightly unrelated note, I noticed that the sitemap has references to links which all throw a 404: https://docs.tooljet.io/sitemap.xml
a
I have asked the owner of the project how he has redirected it. Waiting for a response
👍 1
We are redirecting '/' to /docs/intro via index.html in the static folder
j
Could you try the same for /docs to /docs/intro?
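If the root redirect lives in static/index.html, the equivalent for /docs would be a static/docs/index.html along these lines (a minimal sketch assuming /docs/intro is the landing page; worth confirming it doesn't clash with any generated route):

```html
<!-- static/docs/index.html — copied into the build output as /docs/index.html -->
<!DOCTYPE html>
<html>
  <head>
    <meta http-equiv="refresh" content="0; url=/docs/intro" />
    <link rel="canonical" href="/docs/intro" />
  </head>
  <body>
    <p>Redirecting to <a href="/docs/intro">/docs/intro</a>…</p>
  </body>
</html>
```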
a
Couldn't do the same for '/docs/' to '/docs/intro'. I think it has to be handled server-side through the hosting settings. I've asked the relevant person on the project. Let's see
Hi Jason, we added a simple md file and now we have a page at https://docs.tooljet.io/docs/. I tried to run the scraper and it crawls the pages but finds 0 records. Do you think there is something wrong with the config file https://gist.github.com/apoorv1316/7585b00a0cf5ce94edb3f1a8558a1c07? I just copy-pasted it from the Docusaurus 2 config and only changed index_name and start_url
j
@Apoorv Tiwari is the docs site source public? Would be great if I can run it on my local machine to debug the scraper
a
Yes it’s public. Btw do you think everything is right with the config file?
j
Oh wait, you need to remove intro from the start URL
a
Oh, that I removed... sorry, the gist was older
Yes, this is the start_url: https://docs.tooljet.io/docs/
j
The config file looks ok to me… but let me try debugging locally
a
j
It's getting late here for me, so I'll look into this tomorrow and get back to you
a
Sure
j
@Apoorv Tiwari After hours of stepping through the scraper code, I finally figured out what's wrong! It turns out that these two characters in your docusaurus config are tripping up the docsearch scraper: https://github.com/ToolJet/ToolJet/blob/12d8c4af282a9e2ab97f0522ce4c4f3b27fa44a8/docs/docusaurus.config.js#L15-L16. If you change those two to something else that doesn't start with `\00`, it works fine!! In fact, since you have `display: none` set further down, you can just remove those two lines and it still hides the dark/light mode icons, if that's your intention. Also, you want to set `start_urls` to `["https://docs.tooljet.io/docs/", "https://docs.tooljet.io/docs/intro/"]`. Finally, if you're trying this on localhost, you want to use ngrok even when running locally, since the scraper apparently doesn't work with port numbers.
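For illustration only: if the two offending lines are the dark/light toggle icon characters under themeConfig.colorMode.switchConfig (an assumption; the real values are in the commit linked above), the fix amounts to something like this:

```js
// docusaurus.config.js (hypothetical sketch — verify against the linked commit)
themeConfig: {
  colorMode: {
    switchConfig: {
      // darkIcon: '\u00a0',   // characters like these trip up the scraper; drop them
      // lightIcon: '\u00a0',
      darkIconStyle: { display: 'none' },   // the existing display:none keeps the icons hidden
      lightIconStyle: { display: 'none' },
    },
  },
},
```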
a
@Jason Bosco It worked perfectly for the Typesense Cloud platform but it's not working for my local docs site. There is something wrong with the collection name, as per the error. I am using 'tooljet_docs' as index_name in docsearch.config and as typesenseCollectionName in docusaurus.config.js, but the collection name on Typesense Cloud is automatically generated as 'tooljet_docs_1630430294', and in the error log the collection_name shows as 'tooljet_docs_1630429560'.
j
Hmm, so this error happens when you scrape your local docs site and index to a Typesense cloud cluster?
a
Yes
j
And are you using ngrok to create a host name for your local machine and using that in the docsearch config?
Because port numbers don’t work with the scraper apparently
a
Yes I’m using ngrok
j
Ok cool, just making sure. Did you happen to delete any old collections manually from the Typesense Cloud dashboard?
That’s what could have caused the error you see.
I’d recommend just deleting all collections, and then running the scraper. You shouldn’t see that error after that
a
Yes, as I ran the scraper multiple times, I used to delete the old collections from the cloud platform
Tried again. Deleted the old collection and then ran the scraper. Same error, with the same collection_name reported in the log.
j
Oh forgot one more thing, you want to delete ALL collections and also delete all the aliases (another tab in cloud dashboard) and try again.
It’s actually the alias pointing to a non-existent old collection that’s causing the issue. So we’re trying to start from a clean slate
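If it's easier than clicking through the dashboard, the same clean-up can be done directly against the Typesense API (host and admin key below are placeholders; the alias and collection names are the ones from this thread):

```bash
TYPESENSE_HOST="https://xxx.a1.typesense.net"
TYPESENSE_API_KEY="ADMIN_API_KEY"

# See what aliases and collections are left over
curl -s -H "X-TYPESENSE-API-KEY: $TYPESENSE_API_KEY" "$TYPESENSE_HOST/aliases"
curl -s -H "X-TYPESENSE-API-KEY: $TYPESENSE_API_KEY" "$TYPESENSE_HOST/collections"

# Delete the stale alias and any old timestamped collections, then re-run the scraper
curl -s -X DELETE -H "X-TYPESENSE-API-KEY: $TYPESENSE_API_KEY" "$TYPESENSE_HOST/aliases/tooljet_docs"
curl -s -X DELETE -H "X-TYPESENSE-API-KEY: $TYPESENSE_API_KEY" "$TYPESENSE_HOST/collections/tooljet_docs_1630429560"
```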
a
Deleted the alias and now I'm not getting an error, but the search is still not working. index_name is 'tooljet_docs', typesenseCollectionName is 'tooljet_docs', and the collection_name on the cloud is tooljet_docs_1630431866.
j
@Apoorv Tiwari Could you commit your local docs so far to a new branch and push it to origin? I can then look closely to see why the search is not returning results.
a
j
Could you also share the latest docsearch scraper config you're using @Apoorv Tiwari?
a
j
Ok, I'll keep you posted
@Apoorv Tiwari I just tried it out and it seems to work. Here's the diff that shows the config changes needed: https://github.com/ToolJet/ToolJet/compare/ToolJet:93bb91a...jasonbosco:b101acb Also, when you're running it locally, you want to run `yarn build` and then `yarn serve`, and then run the scraper. When you just run `yarn start`, the content is client-side rendered via JS, so the scraper doesn't pick it up. When you run `yarn build`, the site gets statically built with the full HTML, and `yarn serve` then just serves the build directory.
@Apoorv Tiwari Did this work? ^
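For reference, the local workflow described above, end to end (the port is the Docusaurus default and the filenames are the placeholders used earlier in the thread):

```bash
yarn build                      # statically builds the site with full HTML
yarn serve                      # serves the build directory on http://localhost:3000
ngrok http 3000                 # tunnel it, since the scraper doesn't handle port numbers

# Put the https://xxxx.ngrok.io URL in start_urls, then run the scraper:
docker run -it --env-file=./.env \
  -e "CONFIG=$(cat docsearch.config.json | jq -r tostring)" \
  typesense/docsearch-scraper
```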