# community-help
d
Hey-hey! Could someone help me troubleshoot GPU usage? Details in the thread
I have an instance of typesense 27.1 built on top of CUDA 11.8:
```dockerfile
FROM    nvidia/cuda:11.8.0-runtime-ubuntu22.04

RUN     set -ex; \
        apt-get update; \
        apt-get install -y curl; \
        curl -s https://dl.typesense.org/releases/${TYPESENSE_VERSION}/typesense-server-${TYPESENSE_VERSION}-linux-amd64.tar.gz | tar xvz typesense-server; \
        mv typesense-server /bin/typesense-server; \
        echo "${TYPESENSE_CHECKSUM} /bin/typesense-server" | md5sum -c -; \
        curl -O https://dl.typesense.org/releases/${TYPESENSE_VERSION}/typesense-gpu-deps-${TYPESENSE_VERSION}-amd64.deb; \
        dpkg -i ./typesense-gpu-deps-${TYPESENSE_VERSION}-amd64.deb
```
One of the fields has auto-embedding with the local model `ts/e5-small-v2`. When the instance starts it produces no "onnx shared libs off" logs, but indexing seems to be stuck indefinitely, and no progress is reported in the logs either. GPU usage is also 0%. What should I check to make sure Typesense can use the GPU?
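(For anyone landing here with the same question: a minimal sketch of first-pass checks, assuming Linux with Docker and the NVIDIA Container Toolkit on the host. The `check` helper is hypothetical; it prints PASS/FAIL instead of aborting so every check runs even on a machine without a GPU.)

```shell
# check <label> <command...> — print PASS/FAIL based on the command's exit code
check () {
  label=$1; shift
  if "$@" > /dev/null 2>&1; then echo "PASS: $label"; else echo "FAIL: $label"; fi
}

# Is the NVIDIA driver working on the host at all?
check "host nvidia-smi" nvidia-smi

# Are cuDNN libs on the dynamic loader path? (The ONNX Runtime CUDA provider,
# which Typesense uses for local embedding models, needs them.)
check "cuDNN on loader path" sh -c 'ldconfig -p | grep -qi cudnn'

# To confirm a container can see the GPU, run (not executed here, pulls an image):
#   docker run --rm --gpus all nvidia/cuda:11.8.0-runtime-ubuntu22.04 nvidia-smi
```

If the host-level checks pass but the container-level one fails, the problem is in the Docker GPU wiring rather than in Typesense.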
f
There was a similar CUDA question a while back and the user seemed to find the solution (the GPU utilization wasn't showing in top but was actually being utilized). They also posted their Dockerfile there: https://threads.typesense.org/2J28e88 Could this apply to your use-case as well?
d
Not sure 🤔 I'm using essentially the same build, but even if the 0% GPU reading is just a container/host reporting quirk, indexing isn't happening anyway
f
Is your `/health` endpoint responding with an `ok: true` response?
d
No, because indexing "is in progress":
```
E20241205 09:47:00.267133   162 raft_server.cpp:762] Node not ready yet (known_applied_index is 0).
I20241205 09:47:03.268337   162 raft_server.cpp:706] Term: 49, pending_queue: 0, last_index: 0, committed: 0, known_applied: 0, applying: 0, pendi
I20241205 09:47:03.268383   162 raft_server.cpp:1067] Snapshot timer is active, current_ts: 1733392023, last_snapshot_ts: 1733391423
I20241205 09:47:03.268395   162 node.cpp:943] node default_group:172.26.164.24:8107:8108 starts to do snapshot
E20241205 09:47:03.268494   194 raft_server.cpp:1157] Timed snapshot failed, error: Is loading another snapshot, code: 16
```
f
How many records are you indexing at that point? Also are you using the import endpoint or are you sending out multiple single document indexing requests?
d
I have 6 collections in total; 5 of them are quite small (<100k documents), have no local auto-embedding, and were indexed fine. The 6th has about 300k documents and an auto-embedding field. Usually on restart the node spends about 5 minutes reindexing everything in CPU mode. With GPU it indexes the 5 collections without embeddings in the first minute or so, and then gets stuck indefinitely in this state on the last collection with auto-embedding enabled
In CPU mode Typesense also reports progress (something like `loaded XXX documents so far`), but with GPU I see nothing (I left it running overnight as well)
> Also are you using the import endpoint or are you sending out multiple single document indexing requests?
It's indexing on restart 🙏
o
@Dima Can you try removing CUDA completely and having a fresh install of CUDA libraries?
👀 1
If this doesn't work we probably will need some sample data from your collections if possible.
d
It's a bit painful to do in Docker (8 GB of dependencies need to be downloaded and built), but I'll try
Found the issue. I was using the `runtime` variant of the CUDA image and should have used the `devel` one
k
Is there somewhere we can document this so that someone else can avoid the mistake?
d
🤔 Maybe something like: "if you're going to use CUDA Docker images, use a `devel` one, e.g. `cuda:11.8.0-cudnn8-devel-ubuntu22.04`"
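(For the record, a sketch of the working build: the same download and checksum steps as the original Dockerfile above, with only the base image switched to the `devel` variant. The `ARG` lines are an assumption, since the original snippet referenced `${TYPESENSE_VERSION}` and `${TYPESENSE_CHECKSUM}` without showing where they were defined.)

```dockerfile
# The devel variant ships the full CUDA/cuDNN toolchain that the ONNX Runtime
# CUDA provider needs; the runtime variant does not, which left indexing stuck.
FROM    nvidia/cuda:11.8.0-cudnn8-devel-ubuntu22.04

# Assumed build args, e.g. TYPESENSE_VERSION=27.1
ARG     TYPESENSE_VERSION
ARG     TYPESENSE_CHECKSUM

RUN     set -ex; \
        apt-get update; \
        apt-get install -y curl; \
        curl -s https://dl.typesense.org/releases/${TYPESENSE_VERSION}/typesense-server-${TYPESENSE_VERSION}-linux-amd64.tar.gz | tar xvz typesense-server; \
        mv typesense-server /bin/typesense-server; \
        echo "${TYPESENSE_CHECKSUM} /bin/typesense-server" | md5sum -c -; \
        curl -O https://dl.typesense.org/releases/${TYPESENSE_VERSION}/typesense-gpu-deps-${TYPESENSE_VERSION}-amd64.deb; \
        dpkg -i ./typesense-gpu-deps-${TYPESENSE_VERSION}-amd64.deb
```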