# Model Catalog
KULVEX maintains a curated catalog of abliterated models optimized for self-hosted inference.
## Available Models

### Chat / General Purpose
| Model | Params | GGUF Quants | Min VRAM | Source |
|---|---|---|---|---|
| Qwen 3.5 27B | 27B | Q4_K_M, Q6_K, Q8_0 | 16 GB | mradermacher |
| Gemma 3 27B | 27B | Q4_K_M, Q6_K, Q8_0 | 16 GB | mlabonne |
| Qwen 3 14B | 14B | Q4_K_M, Q6_K, Q8_0 | 8 GB | huihui-ai |
| Qwen 3 8B | 8B | Q4_K_M, Q6_K, Q8_0 | 5 GB | huihui-ai |
| Qwen 3 4B | 4B | Q4_K_M, Q8_0 | 3 GB | huihui-ai |
### Code
| Model | Params | GGUF Quants | Min VRAM | Source |
|---|---|---|---|---|
| GLM-4.7 Flash 30B | 30B (MoE, 3.3B active) | Q4_K_M, Q8_0 | 18 GB | huihui-ai |
| Devstral 24B | 24B | Q4_K_M, Q6_K | 14 GB | mradermacher |
### Vision
| Model | Params | GGUF Quants | Min VRAM | Source |
|---|---|---|---|---|
| Gemma 3 12B Vision | 12B | Q4_K_M, Q6_K | 8 GB | mlabonne |
| GLM-OCR 0.9B | 0.9B | Q8_0 | 2 GB | huihui-ai |
## Quantization Guide
| Quant | Quality | Size vs FP16 | When to Use |
|---|---|---|---|
| Q8_0 | ~97% | ~50% | Best quality, fits in VRAM |
| Q6_K | ~92% | ~40% | Good balance |
| Q4_K_M | ~85% | ~28% | Most popular, fits smaller GPUs |
Higher-bit quants (Q8_0) preserve more quality but need more VRAM; lower-bit quants (Q4_K_M) trade quality for a smaller footprint. The installer picks the highest-quality quant that fits your GPU.
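As a rough back-of-the-envelope check, file size can be estimated from the size ratios in the table above (the helper below is illustrative, not part of the installer):

```python
# Approximate GGUF file size from the "Size vs FP16" ratios in the table above.
QUANT_SIZE_VS_FP16 = {"Q8_0": 0.50, "Q6_K": 0.40, "Q4_K_M": 0.28}

def estimated_gguf_gb(params_billions: float, quant: str) -> float:
    """FP16 weights are ~2 bytes per parameter, so ~2 GB per billion params;
    scale that by the quant's size ratio."""
    fp16_gb = params_billions * 2
    return fp16_gb * QUANT_SIZE_VS_FP16[quant]

# e.g. a 27B model at Q4_K_M: 27 * 2 * 0.28 ≈ 15.1 GB on disk
```

Actual files vary a little because different tensors may be quantized at different precisions, so treat these numbers as estimates.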
## How Selection Works
- Installer detects your GPU(s) and VRAM
- Model selector iterates the catalog by priority
- For each model, tries quants from highest to lowest (Q8_0 → Q6_K → Q4_K_M)
- Picks the largest model at the highest quant that fits
- Downloads the GGUF from HuggingFace and creates a symlink (`mnemo-chat.gguf`)
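The steps above can be sketched roughly as follows (the names and catalog shape here are illustrative, not the installer's actual API):

```python
# Minimal sketch of the selection logic: walk the catalog in priority order,
# try quants from highest quality to lowest, return the first fit.
QUANT_ORDER = ["Q8_0", "Q6_K", "Q4_K_M"]

def select_model(catalog: list[dict], vram_mb: int):
    """Return (model_name, quant) for the first catalog entry that fits,
    or None if nothing fits the detected VRAM."""
    for model in catalog:  # catalog is assumed pre-sorted by priority
        for quant in QUANT_ORDER:
            required = model["vram_mb"].get(quant)
            if required is not None and required <= vram_mb:
                return model["name"], quant
    return None

catalog = [
    {"name": "Qwen 3 14B", "vram_mb": {"Q8_0": 16000, "Q6_K": 12000, "Q4_K_M": 8000}},
    {"name": "Qwen 3 8B",  "vram_mb": {"Q8_0": 9000,  "Q6_K": 7000,  "Q4_K_M": 5000}},
]

select_model(catalog, 12000)  # → ("Qwen 3 14B", "Q6_K")
select_model(catalog, 6000)   # → ("Qwen 3 8B", "Q4_K_M")
```

Because quants are tried highest-quality first within each model, a larger model at Q4_K_M is preferred over a smaller model at Q8_0 whenever the larger one fits at all.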
## Adding Models
The model catalog lives in `core/ai_engine/model_catalog.py`. To add a model:
- Find an abliterated GGUF on HuggingFace
- Add a `CatalogModel` entry with GGUF quants, VRAM requirements, and capabilities
- The selector will automatically consider it during installation
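An entry might look something like the sketch below; the real field names and the exact `CatalogModel` definition are in `core/ai_engine/model_catalog.py`, so check that file before copying this shape:

```python
# Hypothetical shape of a CatalogModel entry. Field names, the repo id, and
# the VRAM numbers here are illustrative -- consult model_catalog.py for the
# actual dataclass definition.
from dataclasses import dataclass, field

@dataclass
class CatalogModel:
    name: str
    repo_id: str                         # HuggingFace repo hosting the GGUF files
    params_b: float                      # parameter count in billions
    quants: dict = field(default_factory=dict)  # quant name -> min VRAM in MB
    capabilities: tuple = ("chat",)      # e.g. ("chat",), ("code",), ("vision",)

new_model = CatalogModel(
    name="Qwen 3 8B",
    repo_id="huihui-ai/Qwen3-8B-abliterated-GGUF",  # illustrative repo id
    params_b=8,
    quants={"Q8_0": 9000, "Q6_K": 7000, "Q4_K_M": 5000},
    capabilities=("chat",),
)
```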
## API
```shell
# List all catalog models
curl http://localhost:9100/api/ai/catalog

# Filter by VRAM (quote the URL so the shell doesn't interpret ? and &)
curl "http://localhost:9100/api/ai/catalog?vram_mb=12000"

# See current model assignment
curl http://localhost:9100/api/ai/model-assignment
```