I replaced the NVIDIA card with two NVIDIA cards with 8 GB each. Both are seen by the VM launched by Proxmox:
# nvidia-smi --list-gpus
GPU 0: Quadro M5000 (UUID: GPU-)
GPU 1: Quadro M4000 (UUID: GPU-)
# nvidia-smi
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.86.15              Driver Version: 570.86.15      CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Quadro M5000                   Off |   00000000:00:10.0 Off |                  Off |
| 38%   37C    P8             13W /  150W |       5MiB /   8192MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  Quadro M4000                   Off |   00000000:00:11.0 Off |                  N/A |
| 46%   39C    P8             13W /  120W |       5MiB /   8192MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
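To run the same check from a script rather than by eye, here is a minimal sketch that reads the CSV form of the nvidia-smi output and adds up the video memory. The helper names parse_gpus and query_gpus are hypothetical, and SAMPLE is typed by hand from the table above, it is not captured output.

```python
import subprocess

# CSV query form of nvidia-smi: one line per GPU, values in MiB without units.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,name,memory.total,memory.used",
    "--format=csv,noheader,nounits",
]


def parse_gpus(csv_text):
    """Turn 'index, name, total, used' lines into a list of dicts."""
    gpus = []
    for line in csv_text.strip().splitlines():
        index, name, total, used = [field.strip() for field in line.split(",")]
        gpus.append({
            "index": int(index),
            "name": name,
            "total_mib": int(total),
            "used_mib": int(used),
        })
    return gpus


def query_gpus():
    """Run nvidia-smi inside the VM and parse its output (needs the driver)."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpus(result.stdout)


# Hand-written sample matching the table above (assumption, not real output).
SAMPLE = "0, Quadro M5000, 8192, 5\n1, Quadro M4000, 8192, 5\n"

gpus = parse_gpus(SAMPLE)
total = sum(gpu["total_mib"] for gpu in gpus)
print(f"{len(gpus)} GPUs, {total} MiB in total")
```

On the VM itself, calling query_gpus() instead of parse_gpus(SAMPLE) should return the live values.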
The benchmark results are as follows:
# llm_benchmark run
-------Linux----------
{'id': '0', 'name': 'Quadro M5000', 'driver': '570.86.15', 'gpu_memory_total': '8192.0 MB', 'gpu_memory_free': '8110.0 MB',
'gpu_memory_used': '5.0 MB', 'gpu_load': '0.0%', 'gpu_temperature': '36.0°C'}
{'id': '1', 'name': 'Quadro M4000', 'driver': '570.86.15', 'gpu_memory_total': '8192.0 MB', 'gpu_memory_free': '8110.0 MB',
'gpu_memory_used': '5.0 MB', 'gpu_load': '0.0%', 'gpu_temperature': '38.0°C'}
At least two GPU cards
Total memory size : 61.36 GB
cpu_info: Intel(R) Xeon(R) CPU E5-2450 v2 @ 2.50GHz
gpu_info: Quadro M5000
Quadro M4000
os_version: Ubuntu 22.04.5 LTS
ollama_version: 0.5.7
----------
....
-------Linux----------
{'id': '0', 'name': 'Quadro M5000', 'driver': '570.86.15', 'gpu_memory_total': '8192.0 MB', 'gpu_memory_free': '3277.0 MB',
'gpu_memory_used': '4838.0 MB', 'gpu_load': '0.0%', 'gpu_temperature': '65.0°C'}
{'id': '1', 'name': 'Quadro M4000', 'driver': '570.86.15', 'gpu_memory_total': '8192.0 MB', 'gpu_memory_free': '2348.0 MB',
'gpu_memory_used': '5767.0 MB', 'gpu_load': '0.0%', 'gpu_temperature': '76.0°C'}
At least two GPU cards
{
"mistral:7b": "16.56",
"llama3.1:8b": "15.71",
"phi4:14b": "8.01",
"qwen2:7b": "15.27",
"gemma2:9b": "15.81",
"llava:7b": "17.82",
"llava:13b": "13.14",
"uuid": "1a60faf0-e97b-5d47-8de5-03d3b22dfbbc",
"ollama_version": "0.5.7"
}
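To compare the models at a glance, a small sketch that loads the report above and sorts it from fastest to slowest. It assumes the scores are tokens per second, which the output itself does not state; the variable names are illustrative only.

```python
import json

# The report printed by llm_benchmark above, copied verbatim.
REPORT = """
{
  "mistral:7b": "16.56",
  "llama3.1:8b": "15.71",
  "phi4:14b": "8.01",
  "qwen2:7b": "15.27",
  "gemma2:9b": "15.81",
  "llava:7b": "17.82",
  "llava:13b": "13.14",
  "uuid": "1a60faf0-e97b-5d47-8de5-03d3b22dfbbc",
  "ollama_version": "0.5.7"
}
"""

# Keys that describe the run rather than a model.
METADATA_KEYS = {"uuid", "ollama_version"}

report = json.loads(REPORT)

# Scores are stored as strings, so convert them before sorting.
rates = {
    model: float(score)
    for model, score in report.items()
    if model not in METADATA_KEYS
}

ranked = sorted(rates.items(), key=lambda item: item[1], reverse=True)

for model, rate in ranked:
    print(f"{model:<12} {rate:6.2f}")
```

The two largest models (phi4:14b and llava:13b) end up at the bottom of the list.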
I currently use "llama3.1:8b", so I have gone from 1.12 (unusable) to 15.71. Ideally I would have more than 32 …, so I will need to find two new cards.
What a pain.
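To put numbers on it, a quick calculation using the three figures quoted above (the variable names are mine): the new setup is roughly 14 times faster than the old one, but still about half of the target.

```python
# Scores for llama3.1:8b quoted above, plus the target I would like to reach.
old_rate = 1.12    # previous card
new_rate = 15.71   # Quadro M5000 + Quadro M4000
target = 32.0      # "ideal" score

speedup = new_rate / old_rate      # gain from the card swap
remaining = target / new_rate      # factor still missing to hit the target

print(f"speedup: {speedup:.1f}x")
print(f"still needed: {remaining:.2f}x")
```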