Hardware Guide
KULVEX auto-detects your hardware and selects the best model configuration.
GPU Tiers
Recommended: 24GB+ VRAM (RTX 4090, RTX 5090)
Full local inference with the highest-quality models: 27B+ parameter models at Q6_K or Q8_0 quantization.
- Best chat quality
- Fast inference (~20-40 tok/s)
- Room for additional models (vision, code)
Standard: 12GB+ VRAM (RTX 3060, RTX 4060 Ti)
Core models at Q4_K_M quantization. Excellent quality for daily use.
- 14B-27B models at Q4_K_M
- Good inference speed (~15-25 tok/s)
- Single model fits comfortably
Minimum: 8GB VRAM (RTX 3060 8GB, RTX 4060)
Smaller models (8B) or aggressive quantization.
- 8B models at Q4_K_M
- Functional but less capable
- Cloud fallback recommended for complex tasks
Cloud-only: No GPU
All inference runs via the Claude API. Requires an ANTHROPIC_API_KEY.
- No local models downloaded
- Full functionality via cloud
- Requires internet connection
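The tier thresholds above can be sketched as a simple lookup. This is an illustrative sketch only; `select_tier` is a hypothetical name, not KULVEX's actual API, and the thresholds simply mirror the tiers listed in this guide.

```python
def select_tier(vram_gb):
    """Map detected VRAM (in GB, or None for no GPU) to a hardware tier."""
    if vram_gb is None:
        return "cloud-only"   # no GPU: all inference via the Claude API
    if vram_gb >= 24:
        return "recommended"  # 27B+ models at Q6_K / Q8_0
    if vram_gb >= 12:
        return "standard"     # 14B-27B models at Q4_K_M
    if vram_gb >= 8:
        return "minimum"      # 8B models at Q4_K_M
    return "cloud-only"       # below 8 GB, fall back to cloud
```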
Multi-GPU
If you have 2+ NVIDIA GPUs, KULVEX assigns:
- Largest VRAM GPU → code model (mnemo:code)
- Second GPU → chat model (mnemo)
This gives you dedicated inference for both chat and coding tasks.
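The assignment rule above amounts to sorting GPUs by VRAM and handing the largest to the code model. A minimal sketch, assuming GPUs are given as `(name, vram_gb)` pairs; the function and data shape are hypothetical, only the model names (`mnemo:code`, `mnemo`) come from the guide.

```python
def assign_models(gpus):
    """gpus: list of (name, vram_gb) pairs. Returns model-name -> GPU name."""
    by_vram = sorted(gpus, key=lambda g: g[1], reverse=True)
    assignment = {"mnemo:code": by_vram[0][0]}  # largest VRAM -> code model
    if len(by_vram) > 1:
        assignment["mnemo"] = by_vram[1][0]     # second GPU -> chat model
    return assignment
```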
Apple Silicon
macOS with Apple Silicon uses unified memory. KULVEX estimates ~75% of total RAM as available GPU memory and selects models accordingly.
- M1 Pro (16GB) → ~12GB for models
- M1 Max (32GB) → ~24GB for models
- M2 Ultra (64GB+) → full 27B+ models
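The unified-memory estimate above is a flat 75% of total RAM. As a sketch (the helper name is hypothetical):

```python
def usable_unified_memory(total_ram_gb):
    """Estimate GPU-usable memory on Apple Silicon (~75% of total RAM)."""
    return total_ram_gb * 0.75

# e.g. a 16 GB M1 Pro yields 12.0 GB, a 32 GB M1 Max yields 24.0 GB
```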
Model Selection
The installer picks the best abliterated model from the catalog:
| VRAM | Model | Quant | Size |
|---|---|---|---|
| 32GB+ | Qwen 3.5 27B | Q8_0 | ~29 GB |
| 24GB+ | Qwen 3.5 27B | Q6_K | ~22 GB |
| 16GB+ | Qwen 3.5 27B | Q4_K_M | ~16.6 GB |
| 12GB+ | Qwen 3 14B | Q4_K_M | ~8.5 GB |
| 8GB+ | Qwen 3 8B | Q4_K_M | ~5 GB |
All models are abliterated (uncensored) — sourced from huihui-ai and mradermacher communities.
Users never see base model names. Everything is branded as mnemo.
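The selection table can be read as "take the first catalog row whose VRAM floor the machine meets." A sketch under that assumption; the entries mirror the table above, but the data structure and function are illustrative, not the real catalog format.

```python
CATALOG = [  # (min_vram_gb, model, quant, approx_size_gb), largest first
    (32, "Qwen 3.5 27B", "Q8_0",   29.0),
    (24, "Qwen 3.5 27B", "Q6_K",   22.0),
    (16, "Qwen 3.5 27B", "Q4_K_M", 16.6),
    (12, "Qwen 3 14B",   "Q4_K_M", 8.5),
    (8,  "Qwen 3 8B",    "Q4_K_M", 5.0),
]

def pick_model(vram_gb):
    """Return (model, quant, size_gb) for the best fit, or None for cloud-only."""
    for floor, model, quant, size in CATALOG:
        if vram_gb >= floor:
            return model, quant, size
    return None  # below 8 GB: cloud-only
```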
Checking Your Hardware
After installation:
```shell
# From the web dashboard:
# go to AI > Status to see GPU utilization, model info, and VRAM usage.

# Or query the API endpoint directly:
curl http://localhost:9100/api/ai/hardware
```