# community-help
a
I am working on a project which uses Docusaurus for documentation, and I am trying to build a search bar for the documentation site. I am done up to Step 2 from https://typesense.org/docs/0.21.0/guide/docsearch.html#step-2-run-the-scraper and am confused about how to proceed with Step 3.
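For context, Step 2 of the linked guide amounts to pointing the scraper's Docker container at a Typesense cluster and at a docsearch config file. A rough sketch of that step, with placeholder host, key, and filenames:

```bash
# .env consumed by the scraper container (placeholder values):
#   TYPESENSE_API_KEY=xyz
#   TYPESENSE_HOST=xxx.a1.typesense.net
#   TYPESENSE_PORT=443
#   TYPESENSE_PROTOCOL=https

# Run the scraper, passing the docsearch config as an env var (requires jq)
docker run -it --env-file=./.env \
  -e "CONFIG=$(cat docsearch.config.json | jq -r tostring)" \
  typesense/docsearch-scraper
```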
k
👋 What specific query do you have on Step 3?
a
Step 3 requires us to add the snippet to the docs navigation, right? But my project is using Docusaurus and the navigation is handled through the config file, so I am not sure where to paste the snippet.
j
@Apoorv Tiwari Docusaurus uses a custom react component to add search into the UI. Let me take a closer look at this tomorrow and get back to you with an update on how to set this up with Docusaurus.
a
Sure 😀
j
@Apoorv Tiwari Quick update on this: it looks like I'm going to have to fork some Docusaurus components and make tweaks to them to get them to work out of the box with Typesense. I've been meaning to do this for a while, but now that you asked I'll make these updates and let you know. I'll try and share something with you late next week...
a
Thanks for the update, Jason 🙂
j
@Apoorv Tiwari Good news! I managed to get the Docusaurus search plugin to work with Typesense. Here's an alpha version of the plugin with some quick instructions: https://www.npmjs.com/package/docusaurus-theme-search-typesense I also had to make some tiny tweaks to the scraper to make it work with docusaurus sites. So you'd want to delete the existing collection you have already created and re-run the docsearch scraper after pulling the latest docker image. Let me know how it goes!
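For anyone following along, wiring the plugin into docusaurus.config.js looks roughly like the sketch below. The option names follow the package's README at the time, so double-check them against the current docs; the server details and collection name here are placeholders:

```js
// docusaurus.config.js (sketch — verify option names against the plugin README)
module.exports = {
  // ...existing Docusaurus config...
  themes: ['docusaurus-theme-search-typesense'],
  themeConfig: {
    typesense: {
      // Must match index_name in docsearch.config.json
      typesenseCollectionName: 'tooljet_docs',
      typesenseServerConfig: {
        nodes: [{ host: 'xxx.a1.typesense.net', port: 443, protocol: 'https' }],
        apiKey: 'SEARCH_ONLY_API_KEY',
      },
      typesenseSearchParameters: {},
    },
  },
};
```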
a
Hi @Jason Bosco I did follow the steps mentioned and I was able to add a search bar to my documentation site, but the search feature is not working. This is what my docsearch.config.json looks like. I have copied all the configuration from the Docusaurus-2.json file and only changed index_name and start_url. I am also attaching the output of running the scraper and my docusaurus.config.js. Also, the search option is disabled for cloud Typesense.
j
@Apoorv Tiwari Could you share a copy-pasteable version of the docsearch config in, say, a GitHub gist?
I can then look deeper
a
j
@Apoorv Tiwari The issue is that docsearch expects `start_urls` to be the base URL for all pages in the documentation. So if you set it to https://docs.tooljet.io/docs/intro, it expects all pages in your documentation to have that as the base URL. You'd ideally want to set the base URL (`start_urls`) to `["https://docs.tooljet.io/docs/"]`. But then if you visit https://docs.tooljet.io/docs/ it does an infinite redirect. If you can fix that and then update `start_urls`, I think it should work after that...
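In other words, the top of docsearch.config.json would look something like this (a trimmed sketch; in practice you would keep the full selector block from the Docusaurus 2 template mentioned earlier, and sitemap_urls is optional):

```json
{
  "index_name": "tooljet_docs",
  "start_urls": ["https://docs.tooljet.io/docs/"],
  "sitemap_urls": ["https://docs.tooljet.io/sitemap.xml"],
  "selectors": {
    "lvl1": "header h1",
    "lvl2": "article h2",
    "lvl3": "article h3",
    "lvl4": "article h4",
    "text": "article p, article li"
  }
}
```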
a
Yeah, this needs to be fixed. However, the docs are working fine on localhost, so let me try setting the start_url to "http://localhost:3001/docs/".
I am getting a connection refused error when I try to run it against localhost. Is there any workaround for it?
j
Ah, that's because the scraper is running inside a Docker container, so localhost refers to the Docker container itself. Could you try using your computer's private IP instead? A 192.168.x.x or 10.x.x.x one?
a
Yeah, I tried with http://0.0.0.0:3000/docs/ and still got the same error
j
You want to run `ifconfig` and use the 192.168.x.x IP or the 10.x.x.x IP from there.
Another option is to use something like ngrok to open a tunnel to your local web server, and use that tunnel host name in the scraper config.
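Both suggestions in shell form (the IP, port, and tunnel URL are placeholders; as it turns out later in the thread, the scraper doesn't cope with explicit port numbers, so the ngrok route is the one that ends up working):

```bash
# Option 1: use the machine's LAN IP instead of localhost
ifconfig | grep "inet "        # look for a 192.168.x.x or 10.x.x.x address
# then set "start_urls": ["http://192.168.1.42:3000/docs/"]

# Option 2: tunnel the local dev server through ngrok and scrape the public URL
ngrok http 3000                # prints an https://xxxx.ngrok.io forwarding URL
# then set "start_urls": ["https://xxxx.ngrok.io/docs/"]
```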
a
So I tried using ngrok and this is the URL I get: https://6380-61-2-246-175.ngrok.io. My documentation home page is /docs/intro. I have tried https://6380-61-2-246-175.ngrok.io/docs as start_url, but there are 0 nbhits, and if I try https://6380-61-2-246-175.ngrok.io it throws an error.
j
Let me take a look now... Could you keep the tunnel running?
a
Yeah sure
Is it because of a 404 on https://6380-61-2-246-175.ngrok.io?
j
I don't see a 404 there when I visit from the browser
a
j
Ah yes, it's a client-side 404... So it's not able to find any links there
Can you redirect /docs/ to /docs/intro on the client-side?
a
There is this plugin https://docusaurus.io/docs/next/api/plugins/@docusaurus/plugin-client-redirects but it does not work in the development environment 😑
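For completeness, the plugin config for that redirect would look roughly like the sketch below; as noted above it only takes effect in production builds (yarn build), not in the dev server, and the route mapping is just the one this thread needs:

```js
// docusaurus.config.js (sketch — client-redirects only applies to production builds)
module.exports = {
  // ...existing config...
  plugins: [
    [
      '@docusaurus/plugin-client-redirects',
      {
        redirects: [
          { from: '/docs', to: '/docs/intro' },
        ],
      },
    ],
  ],
};
```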
j
How do you currently redirect the root https://6380-61-2-246-175.ngrok.io/ to /docs/intro?
Actually it doesn't matter if it's server-side or client-side. As long as /docs takes you to /docs/intro
Btw, on a slightly unrelated note, I noticed that the sitemap has references to links which all throw a 404: https://docs.tooljet.io/sitemap.xml
a
I have asked the owner of the project how he has redirected it. Waiting for a response
👍 1
We are redirecting '/' to /docs/intro via index.html in the static folder
j
Could you try the same for /docs to /docs/intro?
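If the root redirect lives in static/index.html, the equivalent for /docs would be a static/docs/index.html along these lines (a minimal sketch assuming /docs/intro is the landing page; worth confirming it doesn't clash with any generated route):

```html
<!-- static/docs/index.html — copied into the build output as /docs/index.html -->
<!DOCTYPE html>
<html>
  <head>
    <meta http-equiv="refresh" content="0; url=/docs/intro" />
    <link rel="canonical" href="/docs/intro" />
  </head>
  <body>
    <p>Redirecting to <a href="/docs/intro">/docs/intro</a>…</p>
  </body>
</html>
```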
a
Couldn't do the same for '/docs/' to '/docs/intro'. I think it has to be handled server-side through the hosting settings. I've asked the relevant person on the project. Let's see
Hi Jason, we added a simple md file and now we have a page at https://docs.tooljet.io/docs/. I tried to run the scraper and it crawls the pages but finds 0 records. Do you think there is something wrong with the config file https://gist.github.com/apoorv1316/7585b00a0cf5ce94edb3f1a8558a1c07? I just copy-pasted it from the Docusaurus 2 config and only changed index_name and start_url
j
@Apoorv Tiwari is the docs site source public? Would be great if I can run it on my local machine to debug the scraper
a
Yes it’s public. Btw do you think everything is right with the config file?
j
Oh wait, you need to remove intro from the start URL
a
Oh, that I removed... sorry, the gist was older
Yes, this is the start_url: https://docs.tooljet.io/docs/
j
The config file looks ok to me… but let me try debugging locally
a
j
It's getting late here for me, so I'll look into this tomorrow and get back to you
a
Sure
j
@Apoorv Tiwari After hours of stepping through the scraper code, I finally figured out what's wrong! It turns out that these two characters in your docusaurus config are tripping up the docsearch scraper: https://github.com/ToolJet/ToolJet/blob/12d8c4af282a9e2ab97f0522ce4c4f3b27fa44a8/docs/docusaurus.config.js#L15-L16. If you change those two to something else that doesn't start with `\00`, it works fine!! In fact, since you have `display: none` set further down, you can just remove those two lines and it still hides the dark/light mode icons, if that's your intention. Also, you want to set `start_urls` to `["https://docs.tooljet.io/docs/", "https://docs.tooljet.io/docs/intro/"]`. Finally, if you're trying this on localhost, you want to use ngrok even when running locally, since the scraper apparently doesn't work with port numbers.
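For illustration only: if the two offending lines are the dark/light toggle icon characters under themeConfig.colorMode.switchConfig (an assumption; the real values are in the commit linked above), the fix amounts to something like this:

```js
// docusaurus.config.js (hypothetical sketch — verify against the linked commit)
themeConfig: {
  colorMode: {
    switchConfig: {
      // darkIcon: '\u00a0',   // characters like these trip up the scraper; drop them
      // lightIcon: '\u00a0',
      darkIconStyle: { display: 'none' },   // the existing display:none keeps the icons hidden
      lightIconStyle: { display: 'none' },
    },
  },
},
```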
a
@Jason Bosco It worked perfectly for the Typesense Cloud platform but it's not working for my local docs site. There is something wrong with the collection name, as per the error. I am using 'tooljet_docs' as index_name in docsearch.config and as typesenseCollectionName in docusaurus.config.js, but the collection name on Typesense Cloud is automatically generated as 'tooljet_docs_1630430294', and in the error log the collection_name shows as 'tooljet_docs_1630429560'.
j
Hmm, so this error happens when you scrape your local docs site and index to a Typesense cloud cluster?
a
Yes
j
And are you using ngrok to create a host name for your local machine and using that in the docsearch config?
Because port numbers don’t work with the scraper apparently
a
Yes I’m using ngrok
j
Ok cool, just making sure. Did you happen to delete any old collections manually from the Typesense Cloud dashboard?
That’s what could have caused the error you see.
I’d recommend just deleting all collections, and then running the scraper. You shouldn’t see that error after that
a
Yes, as I ran the scraper multiple times, I used to delete the old collections from the cloud platform
Tried again. Deleted the old collection and then ran the scraper. Same error, with the same collection_name reported in the log.
j
Oh forgot one more thing, you want to delete ALL collections and also delete all the aliases (another tab in cloud dashboard) and try again.
It’s actually the alias pointing to a non-existent old collection that’s causing the issue. So we’re trying to start from a clean slate
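If it's easier than clicking through the dashboard, the same clean-up can be done directly against the Typesense API (host and admin key below are placeholders; the alias and collection names are the ones from this thread):

```bash
TYPESENSE_HOST="https://xxx.a1.typesense.net"
TYPESENSE_API_KEY="ADMIN_API_KEY"

# See what aliases and collections are left over
curl -s -H "X-TYPESENSE-API-KEY: $TYPESENSE_API_KEY" "$TYPESENSE_HOST/aliases"
curl -s -H "X-TYPESENSE-API-KEY: $TYPESENSE_API_KEY" "$TYPESENSE_HOST/collections"

# Delete the stale alias and any old timestamped collections, then re-run the scraper
curl -s -X DELETE -H "X-TYPESENSE-API-KEY: $TYPESENSE_API_KEY" "$TYPESENSE_HOST/aliases/tooljet_docs"
curl -s -X DELETE -H "X-TYPESENSE-API-KEY: $TYPESENSE_API_KEY" "$TYPESENSE_HOST/collections/tooljet_docs_1630429560"
```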
a
Deleted the alias and now I'm not getting an error, but the search is still not working. index_name is 'tooljet_docs', typesenseCollectionName is 'tooljet_docs', and the collection_name on the cloud is tooljet_docs_1630431866.
j
@Apoorv Tiwari Could you commit your local docs so far to a new branch and push it to origin? I can then look closely to see why the search is not returning results.
a
j
Could you also share the latest docsearch scraper config you're using @Apoorv Tiwari?
a
j
Ok, I'll keep you posted
@Apoorv Tiwari I just tried it out and it seems to work. Here's the diff that shows the config changes needed: https://github.com/ToolJet/ToolJet/compare/ToolJet:93bb91a...jasonbosco:b101acb Also, when you're running it locally, you want to run `yarn build` and then `yarn serve`, and then run the scraper. When you just run `yarn start`, the content is client-side rendered via JS, so the scraper doesn't pick it up. When you run `yarn build`, the site gets statically built with the full HTML, and `yarn serve` then just serves the build directory.
@Apoorv Tiwari Did this work? ^
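For reference, the local workflow described above, end to end (the port is the Docusaurus default and the filenames are the placeholders used earlier in the thread):

```bash
yarn build                      # statically builds the site with full HTML
yarn serve                      # serves the build directory on http://localhost:3000
ngrok http 3000                 # tunnel it, since the scraper doesn't handle port numbers

# Put the https://xxxx.ngrok.io URL in start_urls, then run the scraper:
docker run -it --env-file=./.env \
  -e "CONFIG=$(cat docsearch.config.json | jq -r tostring)" \
  typesense/docsearch-scraper
```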