Model Catalog

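KULVEX maintains a curated catalog of abliterated models (models whose built-in refusal behavior has been removed). Mnemo, the KULVEX AI engine, uses these models, which are packaged as GGUF quantizations for self-hosted inference.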

Available Models

Chat / General Purpose

| Model | Params | GGUF Quants | Min VRAM | Source |
|---|---|---|---|---|
| Qwen 3.5 27B | 27B | Q4_K_M, Q6_K, Q8_0 | 16 GB | mradermacher |
| Gemma 3 27B | 27B | Q4_K_M, Q6_K, Q8_0 | 16 GB | mlabonne |
| Qwen 3 14B | 14B | Q4_K_M, Q6_K, Q8_0 | 8 GB | huihui-ai |
| Qwen 3 8B | 8B | Q4_K_M, Q6_K, Q8_0 | 5 GB | huihui-ai |
| Qwen 3 4B | 4B | Q4_K_M, Q8_0 | 3 GB | huihui-ai |

Code

| Model | Params | GGUF Quants | Min VRAM | Source |
|---|---|---|---|---|
| GLM-4.7 Flash 30B | 30B (MoE, 3.3B active) | Q4_K_M, Q8_0 | 18 GB | huihui-ai |
| Devstral 24B | 24B | Q4_K_M, Q6_K | 14 GB | mradermacher |

Vision

| Model | Params | GGUF Quants | Min VRAM | Source |
|---|---|---|---|---|
| Gemma 3 12B Vision | 12B | Q4_K_M, Q6_K | 8 GB | mlabonne |
| GLM-OCR 0.9B | 0.9B | Q8_0 | 2 GB | huihui-ai |

Quantization Guide

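| Quant | Approx. Quality | Size vs FP16 | When to Use |
|---|---|---|---|
| Q8_0 | ~97% | ~50% | Best quality, when it fits in VRAM |
| Q6_K | ~92% | ~40% | Good balance of quality and size |
| Q4_K_M | ~85% | ~28% | Most popular; fits smaller GPUs |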

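More bits per weight means better quality but more VRAM: Q8_0 beats Q6_K, which beats Q4_K_M, on quality, and the order is reversed for size. The installer picks the highest-quality quant that fits your GPU.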
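A back-of-the-envelope sketch of where the size column comes from: GGUF weight size is roughly parameters × bits per weight ÷ 8. The bits-per-weight values below are nominal llama.cpp figures, not values read from the KULVEX installer. Q8_0 and Q6_K follow from their fixed block layouts, and Q4_K_M is an approximate average because it mixes Q4_K and Q6_K tensors. The helper names are hypothetical. Real VRAM use is higher than this estimate, because the KV cache and runtime buffers sit on top of the weights.

```python
# Nominal bits per weight for llama.cpp quant types (assumed values).
# Q8_0 and Q6_K follow from their fixed block layouts; Q4_K_M is an
# approximate average because it mixes Q4_K and Q6_K tensors.
BITS_PER_WEIGHT = {
    "F16": 16.0,
    "Q8_0": 8.5,
    "Q6_K": 6.5625,
    "Q4_K_M": 4.85,
}


def estimate_weights_gb(params_billion: float, quant: str) -> float:
    """Approximate size of the quantized weights in GB (1 GB = 1e9 bytes)."""
    total_bits = params_billion * 1e9 * BITS_PER_WEIGHT[quant]
    return total_bits / 8 / 1e9


def ratio_vs_fp16(quant: str) -> float:
    """Size of a quant relative to FP16 weights."""
    return BITS_PER_WEIGHT[quant] / BITS_PER_WEIGHT["F16"]


for q in ("Q8_0", "Q6_K", "Q4_K_M"):
    print(f"{q}: 27B ~ {estimate_weights_gb(27, q):.1f} GB, "
          f"{ratio_vs_fp16(q):.0%} of FP16")
```

For a 27B model this gives roughly 28.7 GB, 22.1 GB, and 16.4 GB of weights (53%, 41%, and 30% of FP16). These results are close to the approximate figures in the table.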

How Selection Works

  1. Installer detects your GPU(s) and VRAM
  2. Model selector iterates the catalog by priority
  3. For each model, tries quants from highest to lowest (Q8_0 → Q6_K → Q4_K_M)
  4. Picks the largest model at the highest quant that fits
  5. Downloads the GGUF file from Hugging Face and creates a symlink to it named mnemo-chat.gguf
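The selection steps can be sketched roughly as follows. This is not the installer's code. `Candidate`, `select_model`, and the per-quant VRAM numbers are hypothetical. The sketch also assumes that the catalog's priority order lists larger models first, so the first model with any fitting quant is also the largest one that fits.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Quant preference order from the docs: highest quality first.
QUANT_ORDER = ("Q8_0", "Q6_K", "Q4_K_M")


@dataclass
class Candidate:
    """Hypothetical, simplified stand-in for a catalog entry."""
    name: str
    priority: int              # assumption: lower value = tried first
    vram_mb: Dict[str, int]    # assumption: quant name -> VRAM needed (MB)


def select_model(catalog: List[Candidate],
                 available_vram_mb: int) -> Optional[Tuple[str, str]]:
    """Return (model name, quant) for the first fit, or None if nothing fits."""
    for model in sorted(catalog, key=lambda m: m.priority):
        # Try quants from highest to lowest quality.
        for quant in QUANT_ORDER:
            need = model.vram_mb.get(quant)
            if need is not None and need <= available_vram_mb:
                return model.name, quant
    return None


# Made-up numbers, for illustration only.
example = [
    Candidate("big-model", 0, {"Q8_0": 30_000, "Q6_K": 23_000, "Q4_K_M": 17_000}),
    Candidate("small-model", 1, {"Q8_0": 9_000, "Q4_K_M": 5_000}),
]
print(select_model(example, 24_000))  # ('big-model', 'Q6_K')
```

With 12,000 MB available, the sketch falls through to `small-model` at Q8_0. Below 5,000 MB it returns `None`. The sketch prefers a larger model at a lower quant over a smaller model at a higher quant, which matches step 4.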

Adding Models

The model catalog is at core/ai_engine/model_catalog.py. To add a model:

  1. Find an abliterated GGUF on Hugging Face
  2. Add a CatalogModel entry with GGUF quants, VRAM requirements, and capabilities
  3. The selector will automatically consider it during installation
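Before adding an entry, you may want to sanity-check it. The sketch below uses a hypothetical `ExampleCatalogModel` with made-up field names and numbers. The real `CatalogModel` in core/ai_engine/model_catalog.py may use different fields. The checks reject unknown quant names and VRAM figures that grow as the quant gets smaller.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

KNOWN_QUANTS = ("Q8_0", "Q6_K", "Q4_K_M")  # highest to lowest quality


@dataclass
class ExampleCatalogModel:
    """Hypothetical entry shape; the real CatalogModel may differ."""
    name: str
    hf_repo: str                        # Hugging Face repo with the GGUF files
    quant_vram_mb: Dict[str, int]       # quant name -> minimum VRAM (MB)
    capabilities: Tuple[str, ...] = ("chat",)


def validate(entry: ExampleCatalogModel) -> List[str]:
    """Return a list of problems found in an entry (empty if none)."""
    problems = []
    if not entry.quant_vram_mb:
        problems.append("no quants listed")
    unknown = set(entry.quant_vram_mb) - set(KNOWN_QUANTS)
    if unknown:
        problems.append(f"unknown quants: {sorted(unknown)}")
    # A lower-bit quant should never need more VRAM than a higher-bit one.
    needs = [entry.quant_vram_mb[q] for q in KNOWN_QUANTS if q in entry.quant_vram_mb]
    if needs != sorted(needs, reverse=True):
        problems.append("VRAM should decrease from Q8_0 to Q4_K_M")
    return problems


# Made-up entry, for illustration only.
entry = ExampleCatalogModel(
    name="example-8b-abliterated",
    hf_repo="example-user/example-8b-abliterated-GGUF",
    quant_vram_mb={"Q8_0": 9_500, "Q6_K": 7_500, "Q4_K_M": 5_000},
)
print(validate(entry))  # []
```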

API

```bash
# List all catalog models
curl http://localhost:9100/api/ai/catalog

# Filter by available VRAM (in MB); quote the URL so the shell does not treat "?" as a glob
curl "http://localhost:9100/api/ai/catalog?vram_mb=12000"

# See the current model assignment
curl http://localhost:9100/api/ai/model-assignment
```
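The same calls can be made from Python with only the standard library. This sketch assumes that the endpoints return JSON. It makes no assumptions about the response fields. `catalog_url` and `get_json` are hypothetical helper names, and the base URL is the one used in the curl examples.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:9100"  # host and port from the curl examples


def catalog_url(vram_mb=None):
    """Build the catalog URL, optionally filtered by available VRAM (MB)."""
    url = f"{BASE_URL}/api/ai/catalog"
    if vram_mb is not None:
        url += "?" + urllib.parse.urlencode({"vram_mb": int(vram_mb)})
    return url


def get_json(url, timeout=10.0):
    """GET a URL and decode the body as JSON (assumes a JSON response)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

Once the engine is running, `get_json(catalog_url(vram_mb=12000))` returns the filtered catalog, and `get_json(f"{BASE_URL}/api/ai/model-assignment")` returns the current assignment.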