updated default model to Ministral 8B
Mistral AI just released their latest generation of open-source models. Ministral 8B holds up well even at very small quantizations, so the Ministral 8B Q2 quant is now the default. This significantly decreases the size of the container while improving performance.
This commit is contained in:
parent
642ffc60c6
commit
b3ad72a7a2
2 changed files with 5 additions and 2 deletions
CHANGELOG.md
@@ -7,6 +7,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## [Unreleased]
 
+### Changed
+
+- Changed default model shipped with paperless-llm-workflow to ministral 8b base (smaller model with better results)
 
 ### Fixed
 
 - increase default num gpu layers to 1024 for better performance with gpu
 - updated llama-cpp bindings to version b7314 2025-12-07
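A note on the num gpu layers entry above: llama.cpp treats this value as an upper bound on offloaded layers, so 1024 effectively means "offload everything" for an 8B model. The project consumes llama.cpp through bindings rather than the CLI, but the equivalent knob on the stock llama.cpp CLI is sketched below; the model path is an assumed placeholder, not taken from this commit.

# --n-gpu-layers caps GPU offload; 1024 exceeds any 8B model's layer
# count, so every layer is offloaded to the GPU
llama-cli -m /models/model.gguf --n-gpu-layers 1024 -p "Hello"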
Dockerfile
@@ -1,7 +1,7 @@
 ARG INFERENCE_BACKEND="vulkan"
 # using quantized version of qwen3 8b for more resource efficiency
-ARG MODEL_URL="https://huggingface.co/unsloth/Qwen3-8B-GGUF/resolve/main/Qwen3-8B-UD-Q2_K_XL.gguf?download=true"
-ARG MODEL_LICENSE_URL="https://huggingface.co/Qwen/Qwen3-8B-GGUF/resolve/main/LICENSE?download=true"
+ARG MODEL_URL="https://huggingface.co/robolamp/Ministral-3-8B-Base-2512-GGUF/resolve/main/Ministral-3-8B-Base-2512-Q2_K.gguf?download=true"
+ARG MODEL_LICENSE_URL="https://www.apache.org/licenses/LICENSE-2.0.txt"
 
 FROM docker.io/rust:latest as builder
 ARG INFERENCE_BACKEND
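Because the model is wired in through build args declared before the first FROM, a different GGUF can be substituted at image build time without editing the Dockerfile. A minimal sketch of such an override, assuming a standard docker build from the repository root; the Q4_K_M filename and the paperless-llm-workflow tag are illustrative assumptions, not taken from this commit.

# Hypothetical override: build with a larger quantization of the same model.
# The Q4_K_M filename is assumed to follow the upstream repo's naming scheme.
docker build \
  --build-arg MODEL_URL="https://huggingface.co/robolamp/Ministral-3-8B-Base-2512-GGUF/resolve/main/Ministral-3-8B-Base-2512-Q4_K_M.gguf?download=true" \
  --build-arg MODEL_LICENSE_URL="https://www.apache.org/licenses/LICENSE-2.0.txt" \
  -t paperless-llm-workflow .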