updated default model to ministral 8b

Mistral AI just released their latest generation of open-source models.
Ministral 8B shows strong performance even at very small quantizations.
So for now the Ministral 8B Q2 quantization will be used as the new default. This
significantly decreases the size of the container while improving performance.
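The size claim can be sanity-checked with a rough back-of-envelope estimate. The figures below are assumptions, not from this commit: fp16 stores 16 bits per weight, and Q2_K averages roughly 2.6 bits per weight:

```shell
# Rough model-file size estimate for an 8B-parameter model.
# Assumptions: fp16 = 16 bits/weight; Q2_K ~ 2.6 bits/weight on average.
PARAMS=8000000000
FP16_GB=$(( PARAMS * 16 / 8 / 1000000000 ))       # bits -> bytes -> decimal GB
Q2_GB=$((  PARAMS * 26 / 10 / 8 / 1000000000 ))   # 2.6 bits/weight, integer math
echo "fp16 ~ ${FP16_GB} GB, Q2_K ~ ${Q2_GB} GB"
```

So the quantized weights land on the order of 2–3 GB rather than ~16 GB, which is what makes the container-size reduction "significant".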
judge 2025-12-08 00:39:26 +01:00
parent 642ffc60c6
commit b3ad72a7a2
GPG key ID: 6512C30DD8E017B5
2 changed files with 5 additions and 2 deletions


@@ -7,6 +7,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
## [Unreleased]
### Changed
- Changed default model shipped with paperless-llm-workflow to ministral 8b base (smaller model with better results)
### Fixed
- Increased default num GPU layers to 1024 for better performance with GPU
- Updated llama-cpp bindings to version b7314 (2025-12-07)


@@ -1,7 +1,7 @@
 ARG INFERENCE_BACKEND="vulkan"
 # using quantized version of qwen3 8b for more resource efficiency
-ARG MODEL_URL="https://huggingface.co/unsloth/Qwen3-8B-GGUF/resolve/main/Qwen3-8B-UD-Q2_K_XL.gguf?download=true"
-ARG MODEL_LICENSE_URL="https://huggingface.co/Qwen/Qwen3-8B-GGUF/resolve/main/LICENSE?download=true"
+ARG MODEL_URL="https://huggingface.co/robolamp/Ministral-3-8B-Base-2512-GGUF/resolve/main/Ministral-3-8B-Base-2512-Q2_K.gguf?download=true"
+ARG MODEL_LICENSE_URL="https://www.apache.org/licenses/LICENSE-2.0.txt"
 FROM docker.io/rust:latest as builder
 ARG INFERENCE_BACKEND
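Because the model is wired in through build args, a different model or quantization can be substituted at build time without editing the Dockerfile. A sketch, assuming a standard `docker build` (the image tag and override URLs below are placeholders, not from this commit):

```shell
# Build with the default Ministral 8B Q2_K model:
docker build -t paperless-llm-workflow .

# Or override the model via the MODEL_URL / MODEL_LICENSE_URL build args
# (placeholder URLs — point these at any other GGUF file and its license):
docker build -t paperless-llm-workflow \
  --build-arg MODEL_URL="https://example.com/some-other-model.gguf" \
  --build-arg MODEL_LICENSE_URL="https://example.com/LICENSE" \
  .
```

The same `--build-arg` flags work with `podman build`, which matches the fully-qualified `docker.io/rust:latest` base image reference used here.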