Proxmox: Installing Ollama as an LXC container

A quick test of installing Ollama as an LXC container via a script:

bash -c "$(wget -qLO - https://github.com/tteck/Proxmox/raw/main/ct/ollama.sh)"

Let's see how it goes… at the moment my NVIDIA card (or the BIOS) does not support Proxmox passthrough.

root@balkany:~# dmesg | grep -e DMAR -e IOMMU | grep "enable"
[    0.333769] DMAR: IOMMU enabled

root@balkany:~# dmesg | grep 'remapping'
[    0.821036] DMAR-IR: Enabled IRQ remapping in xapic mode
[    0.821038] x2apic: IRQ remapping doesn't support X2APIC mode

# lspci -nn | grep 'NVIDIA'
0a:00.0 VGA compatible controller [0300]: NVIDIA Corporation GF100GL [Quadro 4000] [10de:06dd] (rev a3)
0a:00.1 Audio device [0403]: NVIDIA Corporation GF100 High Definition Audio Controller [10de:0be5] (rev a1)

# cat /etc/default/grub | grep "GRUB_CMDLINE_LINUX_DEFAULT"
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt video=vesafb:off video=efifb:off initcall_blacklist=sysfb_init"
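
After changing /etc/default/grub, the bootloader configuration has to be regenerated and the host rebooted (this machine boots in legacy BIOS mode, per the efibootmgr output below, so plain GRUB applies):

# update-grub
# reboot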

# efibootmgr -v
EFI variables are not supported on this system.

# cat /etc/modules
vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd

# cat /etc/modprobe.d/pve-blacklist.conf | grep nvidia
blacklist nvidiafb
blacklist nvidia

So I added this:

# cat  /etc/modprobe.d/iommu_unsafe_interrupts.conf
options vfio_iommu_type1 allow_unsafe_interrupts=1
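
After adding module options like this one, the initramfs normally has to be rebuilt (followed by a reboot) so the change is applied at boot; this is the standard Debian/Proxmox step:

# update-initramfs -u -k all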

I do have a single IOMMU group for the NVIDIA card.
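
A minimal way to double-check this, using the card's PCI address 0a:00 from the lspci output above:

# find /sys/kernel/iommu_groups/ -type l | grep '0a:00'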

When I run the script, it ends with an error:

 
  ____  ____
  / __ \/ / /___ _____ ___  ____ _
 / / / / / / __ `/ __ `__ \/ __ `/
/ /_/ / / / /_/ / / / / / / /_/ /
\____/_/_/\__,_/_/ /_/ /_/\__,_/

Using Default Settings
Using Distribution: ubuntu
Using ubuntu Version: 22.04
Using Container Type: 1
Using Root Password: Automatic Login
Using Container ID: 114
Using Hostname: ollama
Using Disk Size: 24GB
Allocated Cores 4
Allocated Ram 4096
Using Bridge: vmbr0
Using Static IP Address: dhcp
Using Gateway IP Address: Default
Using Apt-Cacher IP Address: Default
Disable IPv6: No
Using Interface MTU Size: Default
Using DNS Search Domain: Host
Using DNS Server Address: Host
Using MAC Address: Default
Using VLAN Tag: Default
Enable Root SSH Access: No
Enable Verbose Mode: No
Creating a Ollama LXC using the above default settings
 ✓ Using datastore2 for Template Storage.
 ✓ Using datastore2 for Container Storage.
 ✓ Updated LXC Template List
 ✓ LXC Container 114 was successfully created.
 ✓ Started LXC Container
bash: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
 //bin/bash: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
 ✓ Set up Container OS
 ✓ Network Connected: 192.168.1.45 
 ✓ IPv4 Internet Connected
 ✗ IPv6 Internet Not Connected
 ✓ DNS Resolved github.com to 140.82.121.3
 ✓ Updated Container OS
 ✓ Installed Dependencies
 ✓ Installed Golang
 ✓ Set up Intel® Repositories
 ✓ Set Up Hardware Acceleration
 ✓ Installed Intel® oneAPI Base Toolkit
 / Installing Ollama (Patience)   
[ERROR] in line 23: exit code 0: while executing command "$@" > /dev/null 2>&1
The silent function has suppressed the error, run the script with verbose mode enabled, which will provide more detailed output.

What a pain.
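
The error handler suggests verbose mode: re-running the script and picking Advanced Settings instead of the defaults summarized above exposes the "Enable Verbose Mode" toggle. Another option is to enter the container and finish the Ollama installation by hand, which is what I end up trying below:

# pct enter 114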

Ollama is not using the card's GPU… what a pain.


My OS version is "Ubuntu 22.04.5 LTS".

My NVIDIA card / driver versions:

# uname -a
Linux 5.15.0-130-generic #140-Ubuntu SMP Wed Dec 18 17:59:53 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

# nvidia-smi -L
GPU 0: Quadro 4000 (UUID: GPU-13797e5d-a72f-4c72-609f-686fa4a8c956)

# nvidia-smi 
Mon Jan 20 16:41:52 2025       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.157                Driver Version: 390.157                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro 4000         Off  | 00000000:00:10.0 Off |                  N/A |
| 36%   62C   P12    N/A /  N/A |      1MiB /  1985MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

# cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module  390.157  Wed Oct 12 09:19:07 UTC 2022
GCC version:  gcc version 11.4.0 (Ubuntu 11.4.0-1ubuntu1~22.04)

# nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Tue_Oct_29_23:50:19_PDT_2024
Cuda compilation tools, release 12.6, V12.6.85
Build cuda_12.6.r12.6/compiler.35059454_0
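
Note the mismatch: the CUDA 12.6 toolkit (nvcc) is installed, but the kernel driver is 390.157, which dates from the CUDA 9.x era, and the driver is what the CUDA runtime actually depends on. The driver side can be queried directly:

# nvidia-smi --query-gpu=name,driver_version --format=csv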

# ubuntu-drivers devices
== /sys/devices/pci0000:00/0000:00:10.0 ==
modalias : pci:v000010DEd000006DDsv000010DEsd00000780bc03sc00i00
vendor   : NVIDIA Corporation
model    : GF100GL [Quadro 4000]
driver   : nvidia-driver-390 - distro non-free recommended
driver   : xserver-xorg-video-nouveau - distro free builtin


The Ollama service log:

# journalctl -u ollama -f --no-pager
ollama[2974]: llama_kv_cache_init: kv_size = 8192, offload = 1, type_k = 'f16', type_v = 'f16', n_layer = 32, can_shift = 1
ollama[2974]: llama_kv_cache_init:        CPU KV buffer size =  1024.00 MiB
ollama[2974]: llama_new_context_with_model: KV self size  = 1024.00 MiB, K (f16):  512.00 MiB, V (f16):  512.00 MiB
ollama[2974]: llama_new_context_with_model:        CPU  output buffer size =     0.56 MiB
ollama[2974]: llama_new_context_with_model:        CPU compute buffer size =   560.01 MiB
ollama[2974]: llama_new_context_with_model: graph nodes  = 1030
ollama[2974]: llama_new_context_with_model: graph splits = 1
ollama[2974]: time=2025-01-20T16:30:12.070Z level=INFO source=server.go:594 msg="llama runner started in 4.28 seconds"
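
All the buffers above are "CPU" buffers, so nothing is offloaded to the GPU. A quick way to confirm, while a model is loaded, is ollama ps, which reports the processor actually in use:

# ollama ps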

When I run the installation I do get "NVIDIA GPU installed":

# curl -fsSL https://ollama.com/install.sh | sh
>>> Cleaning up old version at /usr/local/lib/ollama
>>> Installing ollama to /usr/local
>>> Downloading Linux amd64 bundle
######################################################################## 100,0%
>>> Adding ollama user to render group...
>>> Adding ollama user to video group...
>>> Adding current user to ollama group...
>>> Creating ollama systemd service...
>>> Enabling and starting ollama service...
>>> NVIDIA GPU installed.

But there is a library loading problem:

ollama[3917]: time=2025-01-20T16:52:30.680Z level=INFO source=routes.go:1238 msg="Listening on 127.0.0.1:11434 (version 0.5.7)"
ollama[3917]: time=2025-01-20T16:52:30.681Z level=INFO source=routes.go:1267 msg="Dynamic LLM libraries" runners="[cuda_v12_avx rocm_avx cpu cpu_avx cpu_avx2 cuda_v11_avx]"
ollama[3917]: time=2025-01-20T16:52:30.681Z level=INFO source=gpu.go:226 msg="looking for compatible GPUs"
ollama[3917]: time=2025-01-20T16:52:30.702Z level=INFO source=gpu.go:630 msg="Unable to load cudart library /usr/lib/x86_64-linux-gnu/libcuda.so.390.157: symbol lookup for cuDeviceGetUuid failed: /usr/lib/x86_64-linux-gnu/libcuda.so.390.157: undefined symbol: cuDeviceGetUuid"
ollama[3917]: time=2025-01-20T16:52:30.741Z level=INFO source=gpu.go:392 msg="no compatible GPUs were discovered"
ollama[3917]: time=2025-01-20T16:52:30.742Z level=INFO source=types.go:131 msg="inference compute" id=0 library=cpu variant=avx compute="" driver=0.0 name="" total="61.4 GiB" available="59.4 GiB"
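
The missing symbol can be verified directly on the driver library; cuDeviceGetUuid only appeared in CUDA/driver generations newer than the 390 series, so it really is absent here:

# nm -D /usr/lib/x86_64-linux-gnu/libcuda.so.390.157 | grep cuDeviceGetUuid || echo "symbol absent"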

In short: the 390.xx series is the newest driver that still supports this Fermi-generation Quadro 4000, and it is too old for Ollama's CUDA runners, so everything falls back to the CPU.

To be continued…

Munin: Fixing problems with the mysql_ plugins


To test manually, I ran:

/usr/sbin/munin-run --debug mysql_
/usr/sbin/munin-run --debug mysql_isam_space_

I could see that some Perl modules were missing:

apt-get install -y libcache-cache-perl
apt-get install libdbd-mysql-perl
apt-get install libgd-gd2-perl
apt-get install libgd-graph-perl
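
To check that the modules now load (the module names below correspond to the packages above), a quick sketch, after which the munin-run commands can be re-run:

perl -MCache::Cache -e 'print "Cache::Cache OK\n"'
perl -MDBD::mysql -e 'print "DBD::mysql OK\n"'
perl -MGD -e 'print "GD OK\n"'
perl -MGD::Graph -e 'print "GD::Graph OK\n"'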

MariaDB: Optimization for HumHub

I use MariaDB on Debian for HumHub.

Change in /etc/sysctl.conf:

vm.swappiness=1

And applied live:

/usr/sbin/sysctl -w vm.swappiness=1
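
To verify the live value (it should print 1):

cat /proc/sys/vm/swappiness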

Change in /etc/mysql/mariadb.conf.d/50-server.cnf:

innodb_buffer_pool_size = 4G

Before the restart:

mysql
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 52245
Server version: 10.5.26-MariaDB-0+deb11u2 Debian 11

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]> select @@innodb_buffer_pool_size;
+---------------------------+
| @@innodb_buffer_pool_size |
+---------------------------+
|                 134217728 |
+---------------------------+
1 row in set (0,001 sec)

MariaDB [(none)]> select @@innodb_buffer_pool_size/1024/1024/1024;
+------------------------------------------+
| @@innodb_buffer_pool_size/1024/1024/1024 |
+------------------------------------------+
|                           0.125000000000 |
+------------------------------------------+
1 row in set (0,001 sec)

MariaDB [(none)]> show variables like 'innodb_buffer_pool%';
+-------------------------------------+----------------+
| Variable_name                       | Value          |
+-------------------------------------+----------------+
| innodb_buffer_pool_chunk_size       | 134217728      |
| innodb_buffer_pool_dump_at_shutdown | ON             |
| innodb_buffer_pool_dump_now         | OFF            |
| innodb_buffer_pool_dump_pct         | 25             |
| innodb_buffer_pool_filename         | ib_buffer_pool |
| innodb_buffer_pool_instances        | 1              |
| innodb_buffer_pool_load_abort       | OFF            |
| innodb_buffer_pool_load_at_startup  | ON             |
| innodb_buffer_pool_load_now         | OFF            |
| innodb_buffer_pool_size             | 134217728      |
+-------------------------------------+----------------+
10 rows in set (0,003 sec)
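
Then restart MariaDB so the new buffer pool size is taken into account:

systemctl restart mariadb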

After:

mysql
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 5
Server version: 10.5.26-MariaDB-0+deb11u2 Debian 11

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]> show variables like 'innodb_buffer_pool%';
+-------------------------------------+----------------+
| Variable_name                       | Value          |
+-------------------------------------+----------------+
| innodb_buffer_pool_chunk_size       | 134217728      |
| innodb_buffer_pool_dump_at_shutdown | ON             |
| innodb_buffer_pool_dump_now         | OFF            |
| innodb_buffer_pool_dump_pct         | 25             |
| innodb_buffer_pool_filename         | ib_buffer_pool |
| innodb_buffer_pool_instances        | 1              |
| innodb_buffer_pool_load_abort       | OFF            |
| innodb_buffer_pool_load_at_startup  | ON             |
| innodb_buffer_pool_load_now         | OFF            |
| innodb_buffer_pool_size             | 4294967296     |
+-------------------------------------+----------------+
10 rows in set (0,003 sec)

I did not see much change, so I added the following to /etc/mysql/mariadb.conf.d/50-server.cnf:

innodb_buffer_pool_size = 8G
innodb_log_file_size = 512M
thread_cache_size = 16
query_cache_size = 128M
query_cache_type = 1
table_open_cache = 4096

slow_query_log = 1
slow_query_log_file = /var/log/mysql/mysql-slow.log
long_query_time = 2
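
After restarting the service, the new values can be sanity-checked from the shell (a sketch; the variables queried are the ones set above):

mysql -e "SELECT @@innodb_buffer_pool_size/1024/1024/1024 AS buffer_pool_gb,
                 @@innodb_log_file_size/1024/1024 AS log_file_mb,
                 @@query_cache_type, @@query_cache_size,
                 @@slow_query_log, @@long_query_time;"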

To be continued.