# community-help
d
Hey-hey! Could someone help me troubleshoot GPU usage? Details in the thread
I have an instance of typesense 27.1 built on top of CUDA 11.8:
```dockerfile
FROM    nvidia/cuda:11.8.0-runtime-ubuntu22.04

RUN     set -ex; \
        apt-get update; \
        apt-get install -y curl; \
        curl -s https://dl.typesense.org/releases/${TYPESENSE_VERSION}/typesense-server-${TYPESENSE_VERSION}-linux-amd64.tar.gz | tar xvz typesense-server; \
        mv typesense-server /bin/typesense-server; \
        echo "${TYPESENSE_CHECKSUM} /bin/typesense-server" | md5sum -c -; \
        curl -O https://dl.typesense.org/releases/${TYPESENSE_VERSION}/typesense-gpu-deps-${TYPESENSE_VERSION}-amd64.deb; \
        dpkg -i ./typesense-gpu-deps-${TYPESENSE_VERSION}-amd64.deb
```
One of the fields has auto-embedding with the local model `ts/e5-small-v2`. When the instance starts it produces no "onnx shared libs off" logs, but indexing seems to be stuck indefinitely, and no progress is reported in the logs either. GPU usage is also 0%. What should I check to make sure Typesense can use the GPU?
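(For anyone landing here with the same question: a minimal sketch of first-pass checks, assuming Linux with Docker and the NVIDIA Container Toolkit on the host. The `check` helper is hypothetical; it prints PASS/FAIL instead of aborting so every check runs even on a machine without a GPU.)

```shell
# check <label> <command...> — print PASS/FAIL based on the command's exit code
check () {
  label=$1; shift
  if "$@" > /dev/null 2>&1; then echo "PASS: $label"; else echo "FAIL: $label"; fi
}

# Is the NVIDIA driver working on the host at all?
check "host nvidia-smi" nvidia-smi

# Are cuDNN libs on the dynamic loader path? (The ONNX Runtime CUDA provider,
# which Typesense uses for local embedding models, needs them.)
check "cuDNN on loader path" sh -c 'ldconfig -p | grep -qi cudnn'

# To confirm a container can see the GPU, run (not executed here, pulls an image):
#   docker run --rm --gpus all nvidia/cuda:11.8.0-runtime-ubuntu22.04 nvidia-smi
```

If the host-level checks pass but the container-level one fails, the problem is in the Docker GPU wiring rather than in Typesense.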
f
There was a similar CUDA question a while back and the user seemed to find the solution (the GPU utilization wasn't showing in top but was actually being utilized). They also posted their Dockerfile there: https://threads.typesense.org/2J28e88 Could this apply to your use-case as well?
d
Not sure 🤔 I'm using essentially the same build, but even if the 0% GPU reading is just a container/host reporting quirk, indexing isn't happening anyway
f
Is your `/health` endpoint responding with an `ok: true` response?
d
No, because indexing "is in progress":
```
E20241205 09:47:00.267133   162 raft_server.cpp:762] Node not ready yet (known_applied_index is 0).
I20241205 09:47:03.268337   162 raft_server.cpp:706] Term: 49, pending_queue: 0, last_index: 0, committed: 0, known_applied: 0, applying: 0, pendi
I20241205 09:47:03.268383   162 raft_server.cpp:1067] Snapshot timer is active, current_ts: 1733392023, last_snapshot_ts: 1733391423
I20241205 09:47:03.268395   162 node.cpp:943] node default_group:172.26.164.24:8107:8108 starts to do snapshot
E20241205 09:47:03.268494   194 raft_server.cpp:1157] Timed snapshot failed, error: Is loading another snapshot, code: 16
```
f
How many records are you indexing at that point? Also are you using the import endpoint or are you sending out multiple single document indexing requests?
d
I have 6 collections in total; 5 of them are quite small (<100k documents), have no local auto-embedding, and were indexed fine. The 6th has about 300k documents and an auto-embedding field. Usually on restart the node spends about 5 minutes reindexing everything in CPU mode. With GPU it indexes the 5 collections without embeddings in the first minute or so, and then gets stuck indefinitely in this state on the last collection with auto-embedding enabled
In CPU mode Typesense also reports progress (something like `loaded XXX documents so far`), but with GPU I see nothing (I left it running overnight as well)
> Also are you using the import endpoint or are you sending out multiple single document indexing requests?
It's indexing on restart 🙏
o
@Dima Can you try removing CUDA completely and having a fresh install of CUDA libraries?
👀 1
If this doesn't work we probably will need some sample data from your collections if possible.
d
It's a bit painful to do in Docker (8 GB of dependencies need to be downloaded and built), but I'll try
Found the issue. I was using the `runtime` variant of the CUDA image and should have used the `devel` one
k
Is there somewhere we can document this so that someone else can avoid the mistake?
d
🤔 Maybe something like: "if you're going to use CUDA Docker images, use a `devel` one, e.g. `cuda:11.8.0-cudnn8-devel-ubuntu22.04`"
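(For the record, a sketch of the working build: the same download and checksum steps as the original Dockerfile above, with only the base image switched to the `devel` variant. The `ARG` lines are an assumption, since the original snippet referenced `${TYPESENSE_VERSION}` and `${TYPESENSE_CHECKSUM}` without showing where they were defined.)

```dockerfile
# The devel variant ships the full CUDA/cuDNN toolchain that the ONNX Runtime
# CUDA provider needs; the runtime variant does not, which left indexing stuck.
FROM    nvidia/cuda:11.8.0-cudnn8-devel-ubuntu22.04

# Assumed build args, e.g. TYPESENSE_VERSION=27.1
ARG     TYPESENSE_VERSION
ARG     TYPESENSE_CHECKSUM

RUN     set -ex; \
        apt-get update; \
        apt-get install -y curl; \
        curl -s https://dl.typesense.org/releases/${TYPESENSE_VERSION}/typesense-server-${TYPESENSE_VERSION}-linux-amd64.tar.gz | tar xvz typesense-server; \
        mv typesense-server /bin/typesense-server; \
        echo "${TYPESENSE_CHECKSUM} /bin/typesense-server" | md5sum -c -; \
        curl -O https://dl.typesense.org/releases/${TYPESENSE_VERSION}/typesense-gpu-deps-${TYPESENSE_VERSION}-amd64.deb; \
        dpkg -i ./typesense-gpu-deps-${TYPESENSE_VERSION}-amd64.deb
```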