Hardware Guide

KULVEX auto-detects your hardware and selects the best model configuration.

GPU Tiers

Recommended: 24GB+ VRAM

Full local inference with the highest-quality models: 27B+ parameter models at Q6_K or Q8_0 quantization.

  • Best chat quality
  • Fast inference (~20-40 tok/s)
  • Room for additional models (vision, code)

Standard: 12GB+ VRAM (RTX 3060, RTX 4060 Ti)

Core models at Q4_K_M quantization. Excellent quality for daily use.

  • 14B-27B models at Q4_K_M
  • Good inference speed (~15-25 tok/s)
  • Single model fits comfortably

Minimum: 8GB VRAM (RTX 3060 8GB, RTX 4060)

Smaller models (8B) or aggressive quantization.

  • 8B models at Q4_K_M
  • Functional but less capable
  • Cloud fallback recommended for complex tasks

Cloud-only: No GPU

All inference via Claude API. Requires an ANTHROPIC_API_KEY.

  • No local models downloaded
  • Full functionality via cloud
  • Requires internet connection
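To see which tier applies to your machine, you can check total VRAM before installing. A minimal sketch for NVIDIA GPUs, assuming nvidia-smi is on your PATH:

```shell
# Print total VRAM per GPU; fall back to cloud-only advice if no NVIDIA driver is found.
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=memory.total --format=csv,noheader
else
  echo "No NVIDIA GPU detected; use cloud-only mode (set ANTHROPIC_API_KEY)"
fi
```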

Multi-GPU

If you have 2+ NVIDIA GPUs, KULVEX assigns:

  • Largest VRAM GPU → code model (mnemo:code)
  • Second GPU → chat model (mnemo)

This gives you dedicated inference for both chat and coding tasks.
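The assignment above can be sketched as a sort by VRAM. This is an illustrative sketch, not KULVEX's actual code; the sample data stands in for `nvidia-smi --query-gpu=index,memory.total --format=csv,noheader,nounits` output:

```shell
# Sample "index, VRAM (MiB)" pairs for two GPUs.
gpus="0, 12288
1, 24576"
# Largest-VRAM GPU gets mnemo:code; the runner-up gets mnemo.
code_gpu=$(echo "$gpus" | sort -t, -k2 -rn | head -1 | cut -d, -f1)
chat_gpu=$(echo "$gpus" | sort -t, -k2 -rn | sed -n 2p | cut -d, -f1)
echo "mnemo:code on GPU $code_gpu"   # GPU 1 (24 GB)
echo "mnemo on GPU $chat_gpu"        # GPU 0 (12 GB)
```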

Apple Silicon

macOS with Apple Silicon uses unified memory. KULVEX estimates ~75% of total RAM as available GPU memory and selects models accordingly.

  • M1 Pro (16GB) → ~12GB for models
  • M1 Max (32GB) → ~24GB for models
  • M2 Ultra (64GB+) → full 27B+ models
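The ~75% estimate is easy to reproduce yourself. On macOS, `sysctl -n hw.memsize` reports total unified memory in bytes; the arithmetic below shows the estimate for a 16 GB machine:

```shell
# On Apple Silicon: total unified memory in bytes
#   sysctl -n hw.memsize
# KULVEX's ~75% estimate (3/4 of total RAM), e.g. for 16 GB:
ram_gb=16
echo "$((ram_gb * 3 / 4)) GB available for models"   # 12 GB
```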

Model Selection

The installer picks the best abliterated model from the catalog:

VRAM     Model          Quant     Size
32GB+    Qwen 3.5 27B   Q8_0      ~29 GB
24GB+    Qwen 3.5 27B   Q6_K      ~22 GB
16GB+    Qwen 3.5 27B   Q4_K_M    ~16.6 GB
12GB+    Qwen 3 14B     Q4_K_M    ~8.5 GB
8GB+     Qwen 3 8B      Q4_K_M    ~5 GB

All models are abliterated (uncensored) — sourced from huihui-ai and mradermacher communities.
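The table above is a simple threshold ladder. A hypothetical sketch of that selection logic (not the installer's actual code), keyed on VRAM in GB:

```shell
# Pick a model/quant from the catalog table given VRAM in GB.
pick_model() {
  local vram_gb=$1
  if   [ "$vram_gb" -ge 32 ]; then echo "Qwen 3.5 27B Q8_0"
  elif [ "$vram_gb" -ge 24 ]; then echo "Qwen 3.5 27B Q6_K"
  elif [ "$vram_gb" -ge 16 ]; then echo "Qwen 3.5 27B Q4_K_M"
  elif [ "$vram_gb" -ge 12 ]; then echo "Qwen 3 14B Q4_K_M"
  elif [ "$vram_gb" -ge 8  ]; then echo "Qwen 3 8B Q4_K_M"
  else echo "cloud-only"
  fi
}

pick_model 12   # Qwen 3 14B Q4_K_M
```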

Users never see base model names. Everything is branded as mnemo.

Checking Your Hardware

After installation:

# From the web dashboard
# Go to AI > Status to see GPU utilization, model info, VRAM usage
 
# API endpoint
curl http://localhost:9100/api/ai/hardware