Fix NVIDIA WSL2 container issues - Part I (in hindsight not the right way)
When I build a Docker image on an Ubuntu system with NVIDIA support and then try to run a container based on that image inside a WSL2 environment, the container fails to start.
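The command I use to start the container has the following syntax (note that docker passes everything after the image name to the startup command, so the trailing --runtime=nvidia ends up as an argument of STARTUP_COMMAND_INSIDE_CONTAINER rather than as a docker option):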
docker run --rm --gpus all -p OUTSIDE_PORT:INSIDE_PORT --name YOUR_CONTAINER_NAME IMAGE_REPO:VERSION STARTUP_COMMAND_INSIDE_CONTAINER --runtime=nvidia
The error I get is:
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: mount error: file creation failed: /var/lib/docker/overlay2/1b40bec7d31333f4b36892d7b3ba6589217f2fcf928f57e8aaa22b05ba3532ea/merged/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1: file exists: unknown.
Searching for the issue, I stumbled upon "WSL Modulus Docker run error (libnvidia-ml.so.1: file exists: unknown.)". It showed how to remove certain NVIDIA files from an image. I tried it, but it wasn't sufficient.
I found out that I also had to remove all libraries matching "/usr/lib/x86_64-linux-gnu/libcuda*", and that, because of the different driver version on the Ubuntu machine (550), I had to adjust/add a pattern for that version as well. So the first step is to check the driver version on your Ubuntu machine with nvidia-smi:
Tue Jan 6 20:54:19 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.120 Driver Version: 550.120 CUDA Version: 12.4 |
Write down the major driver version. In my case that is '550'.
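If you would rather not read the version off the banner by eye, a small helper can extract it. This is only a minimal sketch: the function names driver_version_from_banner and driver_major are my own (hypothetical), and it assumes the banner contains a "Driver Version: X.Y" field like the output above.

```shell
# Extract the "Driver Version" field from nvidia-smi banner text on stdin.
# Assumes the field looks like "Driver Version: 550.120".
driver_version_from_banner() {
    sed -n 's/.*Driver Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Print the major part of a version string such as "550.120".
driver_major() {
    printf '%s\n' "${1%%.*}"
}

# Usage on a machine with the NVIDIA driver installed:
#   driver_major "$(nvidia-smi | driver_version_from_banner)"
# nvidia-smi can also print the version directly:
#   nvidia-smi --query-gpu=driver_version --format=csv,noheader
```

The functions only parse text, so they can be tried on a saved copy of the nvidia-smi output as well.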
The steps to get things working within WSL2:
1. Export a running container from the Ubuntu machine.
2. Import the exported container.
3. Remove specific NVIDIA/CUDA/driver-related libraries.
On the Ubuntu machine:
#list running containers
docker container list
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
e7bb657826d8 your_image_name:version "/home/johndoe/foo…" 19 hours ago Up 2 hours 0.0.0.0:5000->5000/tcp, :::5000->5000/tcp some_name
#export container: docker export -o DESTINATION_FILE CONTAINER_ID
docker export -o /media/nas_tmp/your_image_name.tar e7bb657826d8
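Before moving the archive to the Windows machine, it can help to see which driver libraries it actually contains. A hedged sketch: list_driver_libs is a hypothetical helper name, and its pattern simply mirrors the library globs under /usr/lib/x86_64-linux-gnu that I remove in my Dockerfile. It reads one path per line on stdin, such as the output of tar -tf.

```shell
# Filter a list of archive paths (one per line on stdin) down to the
# NVIDIA/CUDA driver libraries under usr/lib/x86_64-linux-gnu.
# The leading "(^|/)" tolerates entries with or without a "./" prefix.
list_driver_libs() {
    grep -E '(^|/)usr/lib/x86_64-linux-gnu/(libcuda|libnvcuvid\.so|libnvidia-[^/]*\.so)'
}

# Usage with the exported archive:
#   tar -tf /media/nas_tmp/your_image_name.tar | list_driver_libs
```

An empty result would suggest the image does not ship these libraries under that directory.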
On the Windows machine:
#show available distros
wsl --list
Windows Subsystem for Linux Distributions:
YOUR_DISTRO
#Start your distro in WSL2. From command prompt:
wsl -u USERNAME -d YOUR_DISTRO
#within the WSL2 distro, import the exported container:
#docker import FILE_NAME TAG
docker import /mnt/t/Temp/your_image_name.tar YOUR_DISTRO:1.0.0
mkdir Docker
cd Docker
vim Dockerfile
Content of the Dockerfile (be sure to use the driver version you wrote down earlier; mine is 550):
FROM YOUR_DISTRO:1.0.0
RUN rm -rf \
/usr/lib/x86_64-linux-gnu/libcuda.so* \
/usr/lib/x86_64-linux-gnu/libnvcuvid.so* \
/usr/lib/x86_64-linux-gnu/libnvidia-*.so* \
/usr/lib/x86_64-linux-gnu/libcuda* \
/usr/local/cuda/compat/lib/*.550*
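Since the driver version is the only part of this Dockerfile that changes between machines, the file could also be generated. The following is only a sketch: generate_dockerfile is a hypothetical helper, and it reproduces the paths from my Dockerfile with the base image and the major driver version passed as arguments.

```shell
# Print a Dockerfile that removes the NVIDIA/CUDA driver libraries.
#   $1: base image, e.g. YOUR_DISTRO:1.0.0
#   $2: major driver version, e.g. 550
generate_dockerfile() {
    base_image=$1
    driver_major=$2
    printf 'FROM %s\n' "$base_image"
    printf 'RUN rm -rf \\\n'
    printf '    /usr/lib/x86_64-linux-gnu/libcuda.so* \\\n'
    printf '    /usr/lib/x86_64-linux-gnu/libnvcuvid.so* \\\n'
    printf '    /usr/lib/x86_64-linux-gnu/libnvidia-*.so* \\\n'
    printf '    /usr/lib/x86_64-linux-gnu/libcuda* \\\n'
    printf '    /usr/local/cuda/compat/lib/*.%s*\n' "$driver_major"
}

# Usage:
#   generate_dockerfile YOUR_DISTRO:1.0.0 550 > Dockerfile
```

The globs are written literally into the Dockerfile and are only expanded by the shell inside the image when the RUN step executes.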
#Build new image based on Dockerfile
docker build .
Output:
[+] Building 0.5s (6/6) FINISHED docker:default
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 348B 0.0s
=> [internal] load metadata for docker.io/library/YOUR_DISTRO:1.0.0 0.0s
=> [1/2] FROM docker.io/library/YOUR_DISTRO:1.0.0 0.0s
=> [2/2] RUN rm -rf /usr/lib/x86_64-linux-gnu/libcuda.so* /usr/lib/x86_64-linux-gnu/libnvcuvid.so* /usr/lib/x86_64-linux-gnu/libnvidia-*.so* /usr/lib/x86_64-linux-gnu/libcuda* /usr/ 0.3s
=> exporting to image 0.1s
=> => exporting layers 0.1s
=> => writing image sha256:97a14ead7fcdecbe9a18631611f725f4a66b1c2ae67d6c083b0421b28db82276
Show the created image (for example with docker images):
REPOSITORY TAG IMAGE ID CREATED SIZE
<none> <none> 97a14ead7fcd 2 minutes ago 18.3GB
YOUR_DISTRO 1.0.0 031d7fcf5460 3 minutes ago 18.3GB
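The freshly built image is the one showing <none> in both the REPOSITORY and TAG columns. A small sketch for picking its ID out of such a listing: untagged_image_ids is a hypothetical helper that reads the table on stdin and assumes the column layout shown above.

```shell
# Print the IMAGE ID of every row whose REPOSITORY and TAG are both "<none>".
# Reads "docker images"-style table text on stdin.
untagged_image_ids() {
    awk '$1 == "<none>" && $2 == "<none>" { print $3 }'
}

# Usage:
#   docker images | untagged_image_ids
# Docker can also do the filtering itself:
#   docker images --filter dangling=true -q
```

If several untagged images exist, all of their IDs are printed, so check which one belongs to the build you just ran.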
Tag the new image as YOUR_DISTRO:1.0.1:
#docker tag IMAGE_ID YOUR_TAG
docker tag 97a14ead7fcd YOUR_DISTRO:1.0.1
Run the container:
#docker run --rm --gpus all -p OUTSIDE_PORT:INSIDE_PORT --name YOUR_CONTAINER_NAME IMAGE_REPO:VERSION STARTUP_COMMAND_INSIDE_CONTAINER --runtime=nvidia
docker run --rm --gpus all -p 5000:5000 --name YOUR_CONTAINER_NAME YOUR_DISTRO:1.0.1 /home/johndoe/foo.bar --runtime=nvidia
That's it! At least on my machines.