#community-help

Understanding Document ID Fields and Rectifying Duplicate Document Error

TLDR John asked how the document id field works and, with Jason's guidance, traced a duplicate document to a trailing space in its id. They then found a bug that prevented deleting a document whose id contains a character requiring URL encoding, and John agreed to open a GitHub issue for it.

Solved
Apr 08, 2022 (21 months ago)
John
12:17 AM
Typesense rocks my friends! 😃

Now, I just got a question regarding the id field. I read that if the field is a string and does not contain characters that would require URL encoding, the id field would actually be used to identify a document. Correct?

It just happened that in order to update a document I upserted a document with the same exact id. And it seems like it created a second entry...

How do I know if the id field is actually used as the actual document identifier?

What would you recommend in my case?

Currently running with these versions:
https://cdn.jsdelivr.net/npm/[email protected]/dist/algoliasearch-lite.umd.js
https://cdn.jsdelivr.net/npm/[email protected]/dist/instantsearch.production.min.js
https://cdn.jsdelivr.net/npm/typesense-instantse[email protected]/dist/typesense-instantsearch-adapter.min.js
Jason
12:19 AM
> field is a string and does not contain characters that would require URL encoding, the id field would actually be used to identify a document. Correct?
That's correct.

Could you run one or more searches that return these two items? The response will include the id field of each document.
John
12:26 AM
Does it need to be through the API or could I do it manually in the cloud db interface?
Jason
12:27 AM
Sure, you can do it in the cloud search interface as well
John
12:27 AM
OK!
Jason
12:27 AM
You can then look at the network requests in the browser to see the exact API response
John
12:28 AM
OK perfect
John
12:28 AM
And there it will be possible to see if the id is the identifier?
Jason
12:29 AM
Yeah it will show you the IDs used or assigned
John
12:30 AM
👍
John
12:39 AM
This is the duplication. Where do I see the network request?
Jason
12:40 AM
In the browser's dev tools, under the network tab
John
12:43 AM
😅 Sure, and in the API response, where do I look to check the actual identifier?
Jason
12:45 AM
Look for an API call to multi_search, click "Preview" for that response, then expand results, then hits, and then the document object inside each hit. The id is shown there.
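The path Jason describes (results, then hits, then document) can also be walked in code. The following is a minimal sketch, assuming the response body has already been parsed into a Python dict; the `sample` data and the `document_ids` helper are hypothetical and only mirror the fields mentioned in this thread.

```python
def document_ids(multi_search_response: dict) -> list:
    """Collect the id of every document in a multi_search response body."""
    ids = []
    for result in multi_search_response.get("results", []):
        for hit in result.get("hits", []):
            ids.append(hit["document"]["id"])
    return ids


# Hypothetical response body, trimmed to the fields discussed above.
sample = {
    "results": [
        {
            "hits": [
                {"document": {"id": "ART41554-4742", "title": "Example"}},
                {"document": {"id": "ART41554-4742 ", "title": "Example"}},
            ]
        }
    ]
}

for doc_id in document_ids(sample):
    # repr() wraps the value in quotes, so a trailing space becomes visible.
    print(repr(doc_id))
```

Printing with `repr()` rather than `print(doc_id)` is the useful part here: two ids that look identical on screen show up as different strings.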
John
12:48 AM
OK, I FOUND IT. It's a space that I forgot to trim at the end of the ID!!!
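A stray space like this can be caught before the document is sent. The sketch below is one possible guard, not part of Typesense or its clients: `needs_url_encoding` and `clean_id` are hypothetical helper names, and the rule "reject anything that percent-encoding would change" is an assumption based on the id requirement discussed at the top of this thread.

```python
from urllib.parse import quote


def needs_url_encoding(doc_id: str) -> bool:
    """True if percent-encoding would change the id (for example, it contains a space)."""
    return quote(doc_id, safe="") != doc_id


def clean_id(raw_id: str) -> str:
    """Strip surrounding whitespace, then reject ids that are empty or still need encoding."""
    doc_id = raw_id.strip()
    if not doc_id or needs_url_encoding(doc_id):
        raise ValueError(
            f"id {raw_id!r} is empty or contains characters that require URL encoding"
        )
    return doc_id


# The id from this thread, with the forgotten trailing space.
print(needs_url_encoding("ART41554-4742 "))  # the untrimmed id would need encoding
print(repr(clean_id("ART41554-4742 ")))      # trimmed id, safe to use as-is
```

Calling `clean_id` on every id before an upsert would make the trimmed and untrimmed values map to the same document instead of creating a second entry.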
John
12:49 AM
But how would it look if the document identifier were a different one? Would there be another field there?
Jason
12:50 AM
Didn't get you... Could you give me an example?
John
12:54 AM
Trying to delete the dupe there is an error - what do you suggest?
John
12:57 AM
Let me try to explain better: if the id field is not used as the identifier (that is possible, correct?), the document identifier would actually be an auto-generated id, right? Would I see this identifier in each document? What would its name be?
Jason
12:58 AM
> the document identifier would actually be an auto-generated id, right?
Correct.

> Would I see this identifier in each document?
Yup.

> What would its name be?
It will be called id and it will be an incrementing number
John
12:59 AM
So we would have 2 fields named "id" then?
Jason
12:59 AM
No, if there is a field called id in the document, that will be used as the identifier field
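The rule Jason describes can be pictured with a toy model. This is only an illustration of the behavior stated in this thread, not Typesense's implementation: `with_identifier` is a hypothetical function, and both the starting value of the counter and the choice to store the generated id as a string are assumptions.

```python
from itertools import count


def with_identifier(document: dict, auto_ids) -> dict:
    """Return a copy of the document that is guaranteed to carry an id field."""
    if "id" in document:
        # A supplied id is used as the identifier; no second id field is added.
        return dict(document)
    # No id supplied: fill the same field from an incrementing counter.
    return {**document, "id": str(next(auto_ids))}


auto_ids = count(0)  # assumption: the real starting value is not stated in the thread

print(with_identifier({"id": "ART41554-4742", "title": "With id"}, auto_ids))
print(with_identifier({"title": "Without id"}, auto_ids))
```

Either way there is exactly one field named id per document, which is why a supplied id and a generated id never appear side by side.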
John
12:59 AM
Ok then!
Jason
01:00 AM
> Trying to delete the dupe there is an error - what do you suggest?
Could you post the full JSON of this document you're trying to delete?
John
01:01 AM
Does this help?
John
01:02 AM
I did not try to delete through the API, only through the cloud interface...
John
01:03 AM
Request failed with HTTP code 404 | Server said: Could not find a document with id: ART41554-4742%20
Jason
01:03 AM
Hmm, I think this might be a bug. To confirm, could you try deleting this document via the API?
Jason
01:04 AM
So send a DELETE request to /collections/<name>/documents/<url encoded document_id>
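As a rough sketch of that request in Python, the snippet below builds (but does not send) the DELETE call using only the standard library. The host is the placeholder hostname format and the collection name `arts` is the one that appears in this thread; `build_delete_request` and the API key value are hypothetical. The `X-TYPESENSE-API-KEY` header is how Typesense requests are authenticated, although it is not mentioned in the conversation itself.

```python
from urllib.parse import quote
from urllib.request import Request


def build_delete_request(base_url: str, collection: str, doc_id: str, api_key: str) -> Request:
    """Build a DELETE request for one document, percent-encoding the id in the path."""
    path = (
        f"/collections/{quote(collection, safe='')}"
        f"/documents/{quote(doc_id, safe='')}"
    )
    return Request(
        base_url.rstrip("/") + path,
        method="DELETE",
        headers={"X-TYPESENSE-API-KEY": api_key},
    )


# Placeholder host and key; the id includes the trailing space from this thread.
req = build_delete_request(
    "https://xxx-1.a1.typesense.net", "arts", "ART41554-4742 ", "YOUR_ADMIN_API_KEY"
)
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req)
```

Passing `safe=""` to `quote` makes sure every reserved character in the id, including `/`, is encoded, so the id always stays a single path segment.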
John
01:04 AM
Let's see... but it seems like we hit the actual problem of having a char that is URL encodable... the space character!
John
01:07 AM
API replied "Could not find a document with id: ART41554-4742%20"
John
01:07 AM
🤔
John
01:14 AM
OK, but what is the start of the URL to send the DELETE request to?
Jason
01:14 AM
Could you open a GitHub issue for this, with a minimal reproducible example?
Jason
01:14 AM
Your Typesense cluster hostname
John
01:14 AM
OK
Jason
01:15 AM
Something like <https://xxx-1.a1.typesense.net>
John
01:17 AM
curl -X DELETE https://cloud.typesense.org/clusters/1ow83v9khrefbpump/collections/arts/documents/ART41554-4742%20

But I guess I need to authenticate, 'cause the above did not work
John
01:19 AM
OK got it
John
01:30 AM
Seems like a tricky one Jason... reply from the DELETE request:

{"message": "Could not find a document with id: ART41554-4742%20"}
Jason
03:14 AM
Yeah, this is a bug. Could you create a GitHub issue for it?
John
02:51 PM
OK I will
Apr 10, 2022 (21 months ago)
Apr 15, 2022 (21 months ago)
John
11:26 PM
Hi Jason, just received this notification:

Fixed in 0.23.0.rc54

Could you let me know where I can find the fix for the issue? Sorry I could not find it...
Jason
11:27 PM
Since you're using Typesense Cloud, could you email [email protected] with your cluster ID? We can then upgrade your cluster to this RC build
Jason
11:27 PM
Do you have a dev/staging environment that you can first test with?
John
11:27 PM
No, don't have a dev/staging anymore
John
11:28 PM
But OK, I can email support
