Fix NVIDIA WSL2 container issues - Part I (in hindsight not the right way)

When I build a Docker image on an Ubuntu system with NVIDIA support and then try to run a container from that image inside a WSL2 environment, the container fails to start.
The command I use to start the container has the following syntax:

docker run --rm --gpus all -p OUTSIDE_PORT:INSIDE_PORT --name YOUR_CONTAINER_NAME IMAGE_REPO:VERSION STARTUP_COMMAND_INSIDE_CONTAINER --runtime=nvidia

The error I get is:
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: mount error: file creation failed: /var/lib/docker/overlay2/1b40bec7d31333f4b36892d7b3ba6589217f2fcf928f57e8aaa22b05ba3532ea/merged/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1: file exists: unknown.
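
The last line of the error names the file that the NVIDIA hook could not create because the image already ships it. As a small sketch (the helper name `conflicting_path` is made up for illustration, and the pattern assumes the message layout shown above), the in-image path can be pulled out of the message like this:

```shell
# Hypothetical helper: print the in-image path that nvidia-container-cli
# reported as already existing. Reads the error text on stdin.
conflicting_path() {
  # Keep only the part between ".../merged" and ": file exists".
  sed -n 's|.*file creation failed: .*/merged\(/[^:]*\): file exists.*|\1|p'
}

# The error line from above, used here as sample input.
err='nvidia-container-cli: mount error: file creation failed: /var/lib/docker/overlay2/1b40bec7d31333f4b36892d7b3ba6589217f2fcf928f57e8aaa22b05ba3532ea/merged/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1: file exists: unknown.'

path=$(printf '%s\n' "$err" | conflicting_path)
echo "$path"
```

The printed path is the library inside the image that collides with the one WSL2 tries to mount.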

Searching for the error, I found the thread "WSL Modulus Docker run error (libnvidia-ml.so.1: file exists: unknown.)". It shows how to remove certain NVIDIA files from an image. I tried it, but it wasn't sufficient.
I found that I also had to remove all libraries matching "/usr/lib/x86_64-linux-gnu/libcuda*" and, because the driver version on my Ubuntu machine is different (550), adjust the version-specific pattern as well. So the first step is to check the driver version on your Ubuntu machine:

nvidia-smi

Tue Jan 6 20:54:19 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.120 Driver Version: 550.120 CUDA Version: 12.4 |

Write down the major version. In my case that is 550.
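
If you would rather not read the number off the banner, a short filter can extract it. This is only a sketch: `driver_major` is a name made up here, and it assumes the `Driver Version: X.Y` banner format shown above.

```shell
# Hypothetical helper: read nvidia-smi output on stdin and print the
# major driver version (the digits before the first dot).
driver_major() {
  sed -n 's/.*Driver Version: \([0-9][0-9]*\)\..*/\1/p' | head -n 1
}

# Banner line copied from the output above, used as sample input.
sample='| NVIDIA-SMI 550.120 Driver Version: 550.120 CUDA Version: 12.4 |'

major=$(printf '%s\n' "$sample" | driver_major)
echo "$major"
```

On the Ubuntu machine you would pipe the real output into it: `nvidia-smi | driver_major`. Alternatively, `nvidia-smi --query-gpu=driver_version --format=csv,noheader` prints the full version string without the banner.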

The steps to get things working within WSL2:

export a running container from the Ubuntu machine
import the exported container as an image inside WSL2
remove the NVIDIA/CUDA/driver-related libraries from that image

On the Ubuntu machine:

# list running containers
docker container list
CONTAINER ID   IMAGE                     COMMAND                CREATED        STATUS       PORTS                                       NAMES
e7bb657826d8   your_image_name:version   "/home/johndoe/foo…"   19 hours ago   Up 2 hours   0.0.0.0:5000->5000/tcp, :::5000->5000/tcp   some_name

# export container: docker export -o DESTINATION_FILE CONTAINER_ID
docker export -o /media/nas_tmp/your_image_name.tar e7bb657826d8

On the Windows machine:

# show available distros
wsl --list
Windows Subsystem for Linux Distributions:
YOUR_DISTRO

# start your distro in WSL2, from the command prompt:
wsl -u USERNAME -d YOUR_DISTRO

# within the WSL2 distro, import the exported container:
# docker import FILE_NAME TAG
docker import /mnt/t/Temp/your_image_name.tar YOUR_DISTRO:1.0.0

mkdir Docker
cd Docker
vim Dockerfile

Content of Dockerfile (be sure to use your driver version which you wrote down in one of the previous steps, mine is 550):

FROM YOUR_DISTRO:1.0.0
RUN rm -rf \
    /usr/lib/x86_64-linux-gnu/libcuda.so* \
    /usr/lib/x86_64-linux-gnu/libnvcuvid.so* \
    /usr/lib/x86_64-linux-gnu/libnvidia-*.so* \
    /usr/lib/x86_64-linux-gnu/libcuda* \
    /usr/local/cuda/compat/lib/*.550*
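
Before deleting anything, it can help to preview which files those patterns would hit. The sketch below is my own illustration, not part of the original fix: `list_nvidia_libs` is a hypothetical helper that takes a filesystem root and the driver major version and prints the matching paths. It is exercised here against a throwaway directory with made-up file names.

```shell
# Hypothetical helper: list files under ROOT that the Dockerfile's rm
# patterns would remove. libcuda* already covers libcuda.so*, so that
# pattern appears only once here.
list_nvidia_libs() {
  root=$1
  major=$2
  for f in \
    "$root"/usr/lib/x86_64-linux-gnu/libcuda* \
    "$root"/usr/lib/x86_64-linux-gnu/libnvcuvid.so* \
    "$root"/usr/lib/x86_64-linux-gnu/libnvidia-*.so* \
    "$root"/usr/local/cuda/compat/lib/*."$major"*
  do
    # An unmatched glob stays literal, so skip anything that does not exist.
    if [ -e "$f" ] || [ -L "$f" ]; then
      printf '%s\n' "$f"
    fi
  done
  return 0
}

# Demo on a temporary tree (file names are invented for the example).
demo=$(mktemp -d)
mkdir -p "$demo/usr/lib/x86_64-linux-gnu" "$demo/usr/local/cuda/compat/lib"
touch "$demo/usr/lib/x86_64-linux-gnu/libcuda.so.1" \
      "$demo/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1" \
      "$demo/usr/lib/x86_64-linux-gnu/libc.so.6" \
      "$demo/usr/local/cuda/compat/lib/libcuda.so.550.54.15"

matches=$(list_nvidia_libs "$demo" 550)
count=$(printf '%s\n' "$matches" | wc -l | tr -d ' ')
printf '%s\n' "$matches"
rm -rf "$demo"
```

Inside a shell in the imported image you would call it with an empty root and your own version, for example `list_nvidia_libs "" 550`. Unrelated libraries such as `libc.so.6` are not matched.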

# build new image based on Dockerfile
docker build .

Output:

[+] Building 0.5s (6/6) FINISHED    docker:default
 => [internal] load .dockerignore    0.0s
 => => transferring context: 2B    0.0s
 => [internal] load build definition from Dockerfile    0.0s
 => => transferring dockerfile: 348B    0.0s
 => [internal] load metadata for docker.io/library/YOUR_DISTRO:1.0.0    0.0s
 => [1/2] FROM docker.io/library/YOUR_DISTRO:1.0.0    0.0s
 => [2/2] RUN rm -rf /usr/lib/x86_64-linux-gnu/libcuda.so* /usr/lib/x86_64-linux-gnu/libnvcuvid.so* /usr/lib/x86_64-linux-gnu/libnvidia-*.so* /usr/lib/x86_64-linux-gnu/libcuda* /usr/  0.3s
 => exporting to image    0.1s
 => => exporting layers    0.1s
 => => writing image sha256:97a14ead7fcdecbe9a18631611f725f4a66b1c2ae67d6c083b0421b28db82276

Show created image:

docker image list

REPOSITORY    TAG       IMAGE ID       CREATED         SIZE
<none>        <none>    97a14ead7fcd   2 minutes ago   18.3GB
YOUR_DISTRO   1.0.0     031d7fcf5460   3 minutes ago   18.3GB
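
The freshly built image is the one with `<none>` as both repository and tag. As a sketch, its ID can be picked out of the listing with a small filter. `untagged_ids` is a hypothetical helper name, and it assumes the column layout shown above.

```shell
# Hypothetical helper: read `docker image list` output on stdin and print
# the IMAGE ID of every row whose repository and tag are both <none>.
untagged_ids() {
  awk 'NR > 1 && $1 == "<none>" && $2 == "<none>" { print $3 }'
}

# Listing copied from above, used as sample input.
listing='REPOSITORY       TAG       IMAGE ID       CREATED          SIZE
<none>           <none>    97a14ead7fcd   2 minutes ago    18.3GB
YOUR_DISTRO      1.0.0     031d7fcf5460   3 minutes ago    18.3GB'

new_id=$(printf '%s\n' "$listing" | untagged_ids)
echo "$new_id"
```

With Docker available this would be `docker image list | untagged_ids`. If several untagged images exist, it prints all of them. Building with `docker build -t YOUR_DISTRO:1.0.1 .` would tag the image at build time and make the lookup unnecessary.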

Tag the new image: YOUR_DISTRO:1.0.1

# docker tag IMAGE_ID YOUR_TAG
docker tag 97a14ead7fcd YOUR_DISTRO:1.0.1

Run the container:

# docker run --rm --gpus all -p OUTSIDE_PORT:INSIDE_PORT --name YOUR_CONTAINER_NAME IMAGE_REPO:VERSION STARTUP_COMMAND_INSIDE_CONTAINER --runtime=nvidia
docker run --rm --gpus all -p 5000:5000 --name YOUR_CONTAINER_NAME YOUR_DISTRO:1.0.1 /home/johndoe/foo.bar --runtime=nvidia
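
One thing worth knowing about this command: Docker reads its own options only up to the image name, and everything after it is handed to the container as the command and its arguments. The sketch below shows what the container receives. The helper `container_args` is hypothetical. It only splits the words at the first occurrence of the image name and does not call Docker.

```shell
# Hypothetical helper: given the image name followed by the words of a
# `docker run` command line, print the words that come after the image,
# i.e. the command and arguments passed into the container.
container_args() {
  image=$1
  shift
  seen=0
  out=
  for w in "$@"; do
    if [ "$seen" -eq 1 ]; then
      out="${out:+$out }$w"
    elif [ "$w" = "$image" ]; then
      seen=1
    fi
  done
  printf '%s\n' "$out"
}

args=$(container_args YOUR_DISTRO:1.0.1 \
  docker run --rm --gpus all -p 5000:5000 --name YOUR_CONTAINER_NAME \
  YOUR_DISTRO:1.0.1 /home/johndoe/foo.bar --runtime=nvidia)
echo "$args"
```

In this form `--runtime=nvidia` ends up as an argument to `/home/johndoe/foo.bar` rather than as a Docker option. For Docker itself to interpret it, it would have to come before the image name, and that assumes the `nvidia` runtime is registered with your Docker daemon. Whether it is needed on top of `--gpus all` depends on your setup.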

That's it! At least on my machines.
