# community-help
a
Hi all! We are using Typesense in our project, which has almost 5M users. The home page fetches users with pagination and a search query filtered on parameters such as minimum and maximum age and location (latitude, longitude). It loads very slowly, taking almost 3-4 seconds, whereas with only a few thousand profiles it was very fast. We are currently using 16 GB of RAM and have also added caching, but it is still slow. Can someone help if they have faced the same issue with large datasets? Did we miss a configuration setting, or do we need to change the query or add sharding?
j
Could you share a curl request with all the search parameters you're using, minus your API key and hostname?
a
Thanks for the quick response, @Jason Bosco. Here is the curl:
curl --location 'https://api.datinglab.com/v1/recommendedProfiles/?page=1&limit=30' \
  --header 'Authorization: Bearer <redacted>'
{'q': '*', 'filter_by': 'id:!=675ab1e4126d3859c2264bf4 && age:>=18 && age:<=80 && gender:=[Man,Woman,Nonbinary] && geoLocation:(30.7046486, 76.71787259999999, 1609344.0 km) && searchPreferences.location.country:=India', 'sort_by': 'isPopularUser:desc, creationTs:desc', 'per_page': 30, 'include_fields': 'id,username,dateOfBirth,firstName,lastName,gender,country,location,searchPreferences,userLocationDetails,isometricChatUserId,genderPronoun,showDistanceOnProfile,distance,profilePic,profileVerified,subscription,accountType,countryCodeName,isPopularUser,isOnline,lastOnlineTimestamp', 'page': 1}
@Fanis Tharropoulos @Jason Bosco please check if you are able to help me with this. The response time seems to be 3-4 seconds.
j
You want to add `range_index: true` to the `age` field in the collection schema, since you're using it with the `>` and `<` operators. This will improve the performance of that operation. I'd also recommend using the range operator: instead of `age:>=18 && age:<=80`, use `age:[18..80]`. The other heavy operation is the geo search with a 1.6 million kilometer radius, `geoLocation:(30.7046486, 76.71787259999999, 1609344.0 km)`. Any reason you have such a large area? For context, the earth's circumference itself is only about 40K kilometers. If you reduce that value to a smaller number, the query should speed up.
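For anyone following along, here is a minimal sketch of those two changes as request payloads. It assumes `age` is stored as `int32` (the thread never states the field type) and that the change is sent as a `PATCH` to the collection, where an existing field is dropped and re-added in the same request. The helper names are made up for illustration.

```python
import json


def age_range_index_patch(field_name="age", field_type="int32"):
    """Body for PATCH /collections/<name>: drop the field and re-add it
    with range_index enabled. field_type is an assumption; use the type
    the field already has in your schema."""
    return {
        "fields": [
            {"name": field_name, "drop": True},
            {"name": field_name, "type": field_type, "range_index": True},
        ]
    }


def age_range_filter(min_age, max_age):
    """Single range clause instead of two separate comparisons."""
    return f"age:[{min_age}..{max_age}]"


if __name__ == "__main__":
    print(json.dumps(age_range_index_patch(), indent=2))
    print(age_range_filter(18, 80))
```

Re-adding a field makes Typesense rebuild the index for it, so on a 5M-document collection it is probably worth trying on a staging copy first.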
a
Thanks, I will try this
Hi @Jason Bosco, The geo_location is working well and returning accurate results. However, the response time is still around 2-3 seconds, which we understand may be due to the large database of 5 million users. Do you have any suggestions for improving the response time? Would implementing indexing or sharding in Typesense help? Please advise. Thank you!
j
Could you share a curl request of the query that's now slow?
Because when I ran the query you shared above with the changes I mentioned above, it completed in less than 400ms IIRC
a
Hi @Jason Bosco. For 5M records we have kept the RAM size at 16 GB in Typesense. Is that enough? We find the response takes 2-3 seconds. Here is the query I send from the Flutter app:
final searchParameters = {
  'q': '*',
  'filter_by': [
    if (selfUserId.isNotEmpty) 'id:!=$selfUserId',
    statusCodeQuery,
    'isPopularUser:!=true',
    'profileVerified:=$verifiedProfilesOnly',
    'age:[$minimumAge..$maximumAge]',
    if (searchAttributes.isNotEmpty) searchAttributeQuery,
    if (whatGenderOfMatchesAreYouInterestedIn.isNotEmpty) 'gender:=$whatGenderOfMatchesAreYouInterestedIn',
    'geoLocation:($latitude, $longitude, ${distance * (i + 1)} km)',
  ].join(' && '),
  'sort_by': 'geoLocation($latitude, $longitude):asc',
  'per_page': '45',
  'page': page.toString(),
};
j
Could you share the exact curl command, without any placeholder variables?
a
@Jason Bosco please check this. Thanks for your help.
j
This is the same issue again. The geo search filter covers a radius of 6437 kilometers:
geoLocation:(12.9715987, 77.5945627, 6437.36 km)
The area of a circle with a radius of 6437.36 km is about 13 times the area of the entire United States. A geosearch over such a large area is bound to be slow.
You'd need to restrict the radius to smaller, practical values.
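A quick back-of-the-envelope check of that comparison, as a sketch. The third value in the geo filter is a radius, so the area is computed as a flat circle with that radius (ignoring the earth's curvature), and the US total area is taken as roughly 9.83 million km², which is an approximate figure not given in the thread.

```python
import math

# Approximate total area of the United States in km^2 (assumed figure).
US_AREA_KM2 = 9.83e6


def flat_circle_area_km2(radius_km):
    """Area of a flat circle; ignores the earth's curvature."""
    return math.pi * radius_km ** 2


radius_km = 6437.36  # radius used in the geoLocation filter
ratio = flat_circle_area_km2(radius_km) / US_AREA_KM2
print(f"search area is about {ratio:.1f} times the area of the US")
```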
a
@Jason Bosco Thanks for the quick response. We have a requirement to keep showing profiles in the feed when none are found nearby: for example, we first search for profiles in India, and when those run out we increase the radius further to find people in other countries. What would be the right approach? Also, we tried a small distance and the response is still a little slow.
@Jason Bosco Any thoughts on this?
j
You could try breaking out the search into multiple smaller radius searches and do multiple parallel searches when expanding out
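One way this could look, as a rough sketch rather than a definitive implementation: build one search per radius, send them together (as a single `multi_search` body or as separate concurrent requests, whichever measures better), then merge the hits nearest-radius first and drop duplicates, since the circles are concentric and overlap. The function names and the radii are hypothetical; the collection and field names are taken from the queries in this thread.

```python
def expanding_radius_searches(lat, lon, radii_km, base_filter="",
                              per_page=30, collection="users",
                              geo_field="geoLocation"):
    """Build a multi_search body with one search per radius."""
    searches = []
    for radius in radii_km:
        geo = f"{geo_field}:({lat}, {lon}, {radius} km)"
        filter_by = f"{base_filter} && {geo}" if base_filter else geo
        searches.append({
            "collection": collection,
            "q": "*",
            "filter_by": filter_by,
            "sort_by": f"{geo_field}({lat}, {lon}):asc",
            "per_page": per_page,
        })
    return {"searches": searches}


def merge_hits(multi_search_response, limit=30):
    """Collect documents in search order, skipping ids already seen,
    because a larger circle also contains every smaller one."""
    seen = set()
    merged = []
    for result in multi_search_response.get("results", []):
        for hit in result.get("hits", []):
            doc = hit["document"]
            if doc["id"] in seen:
                continue
            seen.add(doc["id"])
            merged.append(doc)
            if len(merged) == limit:
                return merged
    return merged


if __name__ == "__main__":
    body = expanding_radius_searches(51.5072, 0.1276, [10, 50, 250],
                                     base_filter="age:[18..80]")
    for search in body["searches"]:
        print(search["filter_by"])
```

Because the smallest radius is listed first, nearby profiles fill the page before the wider searches contribute anything.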
Could you give me the curl request for the “small distance” query you mentioned?
a
@Jason Bosco Please check this one; the response time is close to 2 seconds:
curl --location 'https://71vf8usilhjkac6gp-1.a1.typesense.net/collections/users/documents/search?q=%2A&filter_by=id%3A%21%3D674a833de7a564b77c9ef2da+%26%26+statusCode%3A%3D1+%26%26+isPopularUser%3A%21%3Dtrue+%26%26+profileVerified%3A%3Dfalse+%26%26+age%3A%5B18..80%5D+%26%26+gender%3A%3D%5BMan%2C+Woman%2C+Non-Binary%5D+%26%26+geoLocation%3A%2851.5072%2C+0.1276%2C+10+km%29&sort_by=geoLocation%2851.5072%2C+0.1276%29%3Aasc&per_page=30&page=1' \
  --header 'user-agent: Dart/3.6 (dart:io)' \
  --header 'content-type: application/json' \
  --header 'x-typesense-api-key: <redacted>' \
  --header 'accept-encoding: gzip' \
  --header 'content-length: 0' \
  --header 'host: 71vf8usilhjkac6gp-1.a1.typesense.net'
j
In the 2nd curl request you shared, the actual query consistently takes around 800 ms for me (you want to look at `search_time_ms` in the response):
curl -s 'https://71vf8usilhjkac6gp-1.a1.typesense.net/multi_search' \
  -X POST \
  -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
  -H 'Content-Type: application/json' \
  -d '{
    "searches": [
      {
        "collection": "users",
        "q": "*",
        "filter_by": "id:!=674a833de7a564b77c9ef2da && statusCode:=1 && isPopularUser:!=true && profileVerified:=false && age:[18..80] && gender:=[Man, Woman, Non-Binary] && geoLocation:(51.5072, 0.1276, 10 km)",
        "sort_by": "geoLocation(51.5072, 0.1276):asc",
        "page": 1,
        "per_page": 30
      }
    ]
  }' | jq '.results[].search_time_ms'

798
To speed this up more, you want to add `range_index: true` to the `age` field in the collection schema, given the type of numeric comparison you're doing in the `filter_by`. You also want to use `gender:[...]` instead of `gender:=[]`. If you search the docs for exact vs non-exact match, you'll find what this does.
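To make the difference between the two forms concrete, here is a small sketch of a filter builder. In Typesense filter syntax, `:=` asks for an exact match on a string field and `:` for a non-exact match; the helper name `multi_value_filter` is hypothetical and only assembles the clause string.

```python
def multi_value_filter(field, values, exact=False):
    """Build a multi-value filter clause.

    exact=False -> field:[a, b]   (non-exact match)
    exact=True  -> field:=[a, b]  (exact match)
    """
    operator = ":=" if exact else ":"
    return f"{field}{operator}[{', '.join(values)}]"


if __name__ == "__main__":
    genders = ["Man", "Woman", "Non-Binary"]
    print(multi_value_filter("gender", genders))              # suggested form
    print(multi_value_filter("gender", genders, exact=True))  # form used in the slow query
```

Whether the non-exact form returns the same documents depends on the values stored in `gender`, so it is worth comparing the `found` counts of both queries before switching.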