#community-help

Discussing Ability to Sort by a Text Field

TLDR Aljosa asks about sorting by text fields and proposes a workaround. Kishore Nallan discusses database implications and possible workarounds, informing that changes may require reindexing. Andrew expresses interest in string sorting.

Powered by Struct AI
23
25mo
Solved
Join the chat
Oct 23, 2021 (25 months ago)
Aljosa
Photo of md5-6bb7313b20c5179141d6908d6c09b2d5
Aljosa
03:48 PM
Good morning :man-raising-hand: - Is there a technical reason for the inability to sort by a text field? I imagine dynamically doing it is much slower but I thought the index would handle that at indexing time.

Or perhaps I'm crazy and you can do alphabetical sorts ?
03:48
Aljosa
03:48 PM
My workaround would be to add a numerical field which is equivalent to the alphabetical sort
03:49
Aljosa
03:49 PM
Kishore Nallan
Photo of md5-4e872368b2b2668460205b409e95c2ea
Kishore Nallan
03:49 PM
03:50
Kishore Nallan
03:50 PM
Certainly want to support sorting on fixed string lengths soon. Will be useful for model numbers etc.
Aljosa
Photo of md5-6bb7313b20c5179141d6908d6c09b2d5
Aljosa
03:51 PM
Would it be possible, technically speaking, to sort at indexing time and create a sort of "virtual" numerical field on the item which is used as a proxy to sort alphabetically ?
03:51
Aljosa
03:51 PM
Meaning not as a workaround but as a product feature (not a feature request 😄 , just curious)
Kishore Nallan
Photo of md5-4e872368b2b2668460205b409e95c2ea
Kishore Nallan
03:54 PM
One way to do this with existing feature set will be to convert your string to a lexographically ordered integer. For e.g. given a string "apple", you can create an int64 value that just concatenates the 8-but ascii representation of the individual letters: a, p, p, l, e
03:55
Kishore Nallan
03:55 PM
Then you can sort_by that numerical field. Will support sorting strings of upto 8 chars. If you want 16 chars, just add another field for the next 8 chars and then sort_by=field1:asc,field2:asc
03:56
Kishore Nallan
03:56 PM
For indexing time sorting, yes, one can technically order the entire data set on a given string field's sorted order and then index the records in the same order.
03:57
Kishore Nallan
03:57 PM
If you had a numerical field with a running sequence of the original sort order for each document, then just sorting by that number will give you sort by string.
Aljosa
Photo of md5-6bb7313b20c5179141d6908d6c09b2d5
Aljosa
03:58 PM
had already written this so I'm sending it 🙂 but mostly for confirmation as I think you answered my question

I feel like I'm missing something in my understanding. In our case we have a regular "name" field which is maybe 30 characters at most. We could call it a "title" field to better illustrate the type of field.

So before actually sending the documents to typesense for indexing, is there anything preventing me from pre-sorting in my code and creating a numerical value from 1 to X based on the output of that sort ?
03:59
Aljosa
03:59 PM
I think that's basically what you were saying in your last two messages
Kishore Nallan
Photo of md5-4e872368b2b2668460205b409e95c2ea
Kishore Nallan
04:00 PM
Yes, you can just pre-sort and index that way. In fact you can even pre-sort multiple fields as long as you have 1 corresponding numerical value that represents the document's position in that particular sort order.
04:01
Kishore Nallan
04:01 PM
But downside is that any update of any single title: poof, the magic is gone 🙂
04:03
Kishore Nallan
04:03 PM
Still not decided whether we should support sorting on fixed string lengths, which is a bit meh, or do it for all lengths, which is prone to be abused and will be very slow beyond a point.
Aljosa
Photo of md5-6bb7313b20c5179141d6908d6c09b2d5
Aljosa
04:08 PM
Ah there's the downside! 😄 Hadn't thought of that but it's true. Any update to the title field requires all items to be reindexed.

In our case we push items to the database and have a listener/subscriber on @AfterInsert and @AfterUpdate events which keeps Typesense in sync. We'd need to add a trigger which updates the sort on all items if the title field is modified on one of them.

It works better for our case because despite having millions of documents, they are segregated cleanly into about 5-10k chunks.


As for supporting sorting on any string length, it doesn't seem worth it. Algolia certainly discourage people from doing it except in rare circumstances and even then as a separate index or something.

Sorting by a fixed length title sort of field could be interesting but I'm not sure how to determine what an optimal length is. 8 characters is probably good enough because at a glance I doubt most people can mentally sort past a certain number of characters, although in titles starting with the same word it would be easier to identify something that's not sorted correctly past the first 8 characters.
Kishore Nallan
Photo of md5-4e872368b2b2668460205b409e95c2ea
Kishore Nallan
04:13 PM
Yes, that's what I think too. Technically English ASCII is just 7 bits, so with 64 bits you get 9 chars 🙂 It is mostly useful for eye-balling data as you said.
04:13
Kishore Nallan
04:13 PM
The larger consideration is for other languages which use several unicode points for representing data.
Aljosa
Photo of md5-6bb7313b20c5179141d6908d6c09b2d5
Aljosa
04:15 PM
Good point! Makes it quite a bit more difficult in those scenarios. Either way, I believe you have quite a few things on your roadmap with a higher priority than this. Fun problem to solve though 👌
04:16
Aljosa
04:16 PM
Thank you once again for your input 🙂
Kishore Nallan
Photo of md5-4e872368b2b2668460205b409e95c2ea
Kishore Nallan
04:16 PM
:yw:
Andrew
Photo of md5-88d88db4789daa0e3abef8c3ca27772b
Andrew
05:43 PM
We are looking forward to string sorting. (Need to sort the names of German courts alphabetically)