Model Catalog

KULVEX maintains a curated catalog of abliterated models optimized for self-hosted inference.

Available Models

Chat / General Purpose

Model	Params	GGUF Quants	Min VRAM	Source
Qwen 3.5 27B	27B	Q4_K_M, Q6_K, Q8_0	16 GB	mradermacher
Gemma 3 27B	27B	Q4_K_M, Q6_K, Q8_0	16 GB	mlabonne
Qwen 3 14B	14B	Q4_K_M, Q6_K, Q8_0	8 GB	huihui-ai
Qwen 3 8B	8B	Q4_K_M, Q6_K, Q8_0	5 GB	huihui-ai
Qwen 3 4B	4B	Q4_K_M, Q8_0	3 GB	huihui-ai

Code

Model	Params	GGUF Quants	Min VRAM	Source
GLM-4.7 Flash 30B	30B (MoE, 3.3B active)	Q4_K_M, Q8_0	18 GB	huihui-ai
Devstral 24B	24B	Q4_K_M, Q6_K	14 GB	mradermacher

Vision

Model	Params	GGUF Quants	Min VRAM	Source
Gemma 3 12B Vision	12B	Q4_K_M, Q6_K	8 GB	mlabonne
GLM-OCR 0.9B	0.9B	Q8_0	2 GB	huihui-ai

Quantization Guide

Quant	Quality	Size vs FP16	When to Use
Q8_0	~97%	~50%	Best quality, when it fits in VRAM
Q6_K	~92%	~40%	Good balance
Q4_K_M	~85%	~28%	Most popular, fits smaller GPUs

Higher-bit quants (Q8_0 over Q6_K over Q4_K_M) preserve more quality but need more VRAM. The installer picks the highest-quality quant that fits your GPU.
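
As a rough illustration of the size column, the sketch below estimates weight size from parameter count using the approximate ratios in the table. The function name and the 2-bytes-per-parameter FP16 baseline are assumptions for illustration, not part of KULVEX, and the result excludes KV cache and runtime overhead.

```python
# Approximate size ratios relative to FP16, copied from the table above.
SIZE_VS_FP16 = {"Q8_0": 0.50, "Q6_K": 0.40, "Q4_K_M": 0.28}

FP16_BYTES_PER_PARAM = 2  # FP16 stores each weight in 2 bytes


def estimate_weights_gb(params_billion: float, quant: str) -> float:
    """Rough weight size in GB (10^9 bytes) for a model at a given quant.

    Covers weights only: KV cache, context buffers, and runtime overhead
    are ignored, so real VRAM use will be higher than this number.
    """
    fp16_gb = params_billion * FP16_BYTES_PER_PARAM
    return fp16_gb * SIZE_VS_FP16[quant]


if __name__ == "__main__":
    for quant in ("Q8_0", "Q6_K", "Q4_K_M"):
        print(f"14B at {quant}: ~{estimate_weights_gb(14, quant):.1f} GB")
```

For a 14B model this gives a little under 8 GB at Q4_K_M, which lines up with the 8 GB minimum listed for Qwen 3 14B.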

How Selection Works

1. Installer detects your GPU(s) and VRAM
2. Model selector iterates the catalog by priority
3. For each model, tries quants from highest to lowest (Q8_0 → Q6_K → Q4_K_M)
4. Picks the largest model at the highest quant that fits
5. Downloads the GGUF from HuggingFace and creates a symlink (mnemo-chat.gguf)
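
The selection steps above can be sketched as a nested loop. This is a minimal sketch, not the actual selector: the `Candidate` class, its `vram_mb_by_quant` field, and `select_model` are hypothetical names, and the catalog is assumed to be already sorted by priority.

```python
from dataclasses import dataclass
from typing import Optional

# Quants in the order the selector tries them, highest to lowest.
QUANT_ORDER = ("Q8_0", "Q6_K", "Q4_K_M")


@dataclass
class Candidate:
    """Hypothetical catalog entry used only for this sketch."""
    name: str
    vram_mb_by_quant: dict[str, int]  # assumed field: VRAM needed per quant, in MB


def select_model(
    catalog: list[Candidate], available_vram_mb: int
) -> Optional[tuple[str, str]]:
    """Return (model name, quant) for the first entry and quant that fits."""
    for model in catalog:  # assumed to be in priority order
        for quant in QUANT_ORDER:
            needed = model.vram_mb_by_quant.get(quant)
            if needed is not None and needed <= available_vram_mb:
                return model.name, quant
    return None  # nothing in the catalog fits this GPU


# Illustrative numbers only, not real VRAM requirements.
EXAMPLE_CATALOG = [
    Candidate("large", {"Q8_0": 30000, "Q6_K": 24000, "Q4_K_M": 16000}),
    Candidate("small", {"Q8_0": 9000, "Q4_K_M": 5000}),
]
```

With the example catalog, a 16000 MB budget selects the large model at Q4_K_M, while a 12000 MB budget falls through to the small model at Q8_0.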

Adding Models

The model catalog is at core/ai_engine/model_catalog.py. To add a model:

1. Find an abliterated GGUF on HuggingFace
2. Add a CatalogModel entry with GGUF quants, VRAM requirements, and capabilities
3. The selector will automatically consider it during installation
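
A catalog entry might look roughly like the sketch below. The dataclass here is a stand-in: the real CatalogModel in core/ai_engine/model_catalog.py may use different field names and types, so check that file before copying anything. The values come from the Qwen 3 8B row of the chat table; the capability label "chat" is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogModel:
    """Stand-in for the real class; every field name here is a guess."""
    name: str
    params_b: float                # parameter count in billions
    quants: tuple[str, ...]        # available GGUF quants
    min_vram_gb: int               # minimum VRAM for the smallest quant
    source: str                    # HuggingFace publisher
    capabilities: tuple[str, ...]  # assumed labels such as "chat" or "vision"


# Values copied from the "Qwen 3 8B" row of the chat table.
QWEN3_8B = CatalogModel(
    name="Qwen 3 8B",
    params_b=8,
    quants=("Q4_K_M", "Q6_K", "Q8_0"),
    min_vram_gb=5,
    source="huihui-ai",
    capabilities=("chat",),
)
```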

API

# List all catalog models
curl http://localhost:9100/api/ai/catalog
 
# Filter by VRAM
curl "http://localhost:9100/api/ai/catalog?vram_mb=12000"
 
# See current model assignment
curl http://localhost:9100/api/ai/model-assignment
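
The same endpoints can be called from Python using only the standard library. This is a sketch under two assumptions: the API returns JSON, and the helper names `catalog_url` and `fetch_json` are made up for illustration. The response schema is not documented here, so the result is returned as-is.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:9100/api/ai"


def catalog_url(vram_mb: Optional[int] = None) -> str:
    """Build the catalog URL, adding the vram_mb filter when given."""
    url = f"{BASE_URL}/catalog"
    if vram_mb is not None:
        url += "?" + urlencode({"vram_mb": vram_mb})
    return url


def fetch_json(url: str, timeout: float = 5.0):
    """GET a URL and decode the body as JSON (assumed response format)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

With the server running, `fetch_json(catalog_url(vram_mb=12000))` mirrors the filtered curl call, and `fetch_json(f"{BASE_URL}/model-assignment")` mirrors the assignment lookup.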
