Hardware Guide

KULVEX auto-detects your hardware and selects the best model configuration.

GPU Tiers

Recommended: 24GB+ VRAM

Full local inference with the highest-quality models: 27B+ parameter models at Q6_K or Q8_0 quantization.

  • Best chat quality
  • Fast inference (~20-40 tok/s)
  • Room for additional models (vision, code)

Standard: 12GB+ VRAM (RTX 3060, RTX 4060 Ti)

Core models at Q4_K_M quantization. Excellent quality for daily use.

  • 14B-27B models at Q4_K_M
  • Good inference speed (~15-25 tok/s)
  • Single model fits comfortably

Minimum: 8GB VRAM (RTX 3060 8GB, RTX 4060)

Smaller models (8B) or aggressive quantization.

  • 8B models at Q4_K_M
  • Functional but less capable
  • Cloud fallback recommended for complex tasks

Cloud-only: No GPU

All inference via Claude API. Requires an ANTHROPIC_API_KEY.

  • No local models downloaded
  • Full functionality via cloud
  • Requires internet connection
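To see which tier applies to your machine, you can check total VRAM before installing. A minimal sketch for NVIDIA GPUs, assuming nvidia-smi is on your PATH:

```shell
# Print total VRAM per GPU; fall back to cloud-only advice if no NVIDIA driver is found.
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=memory.total --format=csv,noheader
else
  echo "No NVIDIA GPU detected; use cloud-only mode (set ANTHROPIC_API_KEY)"
fi
```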

Multi-GPU

If you have 2+ NVIDIA GPUs, KULVEX assigns:

  • Largest VRAM GPU → code model (mnemo:code)
  • Second GPU → chat model (mnemo)

This gives you dedicated inference for both chat and coding tasks.
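The assignment above can be sketched as a sort by VRAM. This is an illustrative sketch, not KULVEX's actual code; the sample data stands in for `nvidia-smi --query-gpu=index,memory.total --format=csv,noheader,nounits` output:

```shell
# Sample "index, VRAM (MiB)" pairs for two GPUs.
gpus="0, 12288
1, 24576"
# Largest-VRAM GPU gets mnemo:code; the runner-up gets mnemo.
code_gpu=$(echo "$gpus" | sort -t, -k2 -rn | head -1 | cut -d, -f1)
chat_gpu=$(echo "$gpus" | sort -t, -k2 -rn | sed -n 2p | cut -d, -f1)
echo "mnemo:code on GPU $code_gpu"   # GPU 1 (24 GB)
echo "mnemo on GPU $chat_gpu"        # GPU 0 (12 GB)
```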

Apple Silicon

macOS with Apple Silicon uses unified memory. KULVEX estimates ~75% of total RAM as available GPU memory and selects models accordingly.

  • M1 Pro (16GB) → ~12GB for models
  • M1 Max (32GB) → ~24GB for models
  • M2 Ultra (64GB+) → full 27B+ models
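The ~75% estimate is easy to reproduce yourself. On macOS, `sysctl -n hw.memsize` reports total unified memory in bytes; the arithmetic below shows the estimate for a 16 GB machine:

```shell
# On Apple Silicon: total unified memory in bytes
#   sysctl -n hw.memsize
# KULVEX's ~75% estimate (3/4 of total RAM), e.g. for 16 GB:
ram_gb=16
echo "$((ram_gb * 3 / 4)) GB available for models"   # 12 GB
```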

Model Selection

The installer picks the best abliterated model from the catalog:

VRAM     Model          Quant     Size
32GB+    Qwen 3.5 27B   Q8_0      ~29 GB
24GB+    Qwen 3.5 27B   Q6_K      ~22 GB
16GB+    Qwen 3.5 27B   Q4_K_M    ~16.6 GB
12GB+    Qwen 3 14B     Q4_K_M    ~8.5 GB
8GB+     Qwen 3 8B      Q4_K_M    ~5 GB

All models are abliterated (uncensored) — sourced from huihui-ai and mradermacher communities.
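The table above is a simple threshold ladder. A hypothetical sketch of that selection logic (not the installer's actual code), keyed on VRAM in GB:

```shell
# Pick a model/quant from the catalog table given VRAM in GB.
pick_model() {
  local vram_gb=$1
  if   [ "$vram_gb" -ge 32 ]; then echo "Qwen 3.5 27B Q8_0"
  elif [ "$vram_gb" -ge 24 ]; then echo "Qwen 3.5 27B Q6_K"
  elif [ "$vram_gb" -ge 16 ]; then echo "Qwen 3.5 27B Q4_K_M"
  elif [ "$vram_gb" -ge 12 ]; then echo "Qwen 3 14B Q4_K_M"
  elif [ "$vram_gb" -ge 8  ]; then echo "Qwen 3 8B Q4_K_M"
  else echo "cloud-only"
  fi
}

pick_model 12   # Qwen 3 14B Q4_K_M
```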

Users never see base model names. Everything is branded as mnemo.

Checking Your Hardware

After installation:

# From the web dashboard
# Go to AI > Status to see GPU utilization, model info, VRAM usage
 
# API endpoint
curl http://localhost:9100/api/ai/hardware