Mnemo — AI Engine

Mnemo is KULVEX’s local AI inference system. It runs abliterated (uncensored) language models on your GPU via llama.cpp.

Architecture

User Message

    ├── Local path: Mnemo (llama.cpp on GPU)
    │     └── Ollama-compatible API → streaming response

    └── Cloud path: Claude API (Anthropic)
          └── Native tool_use → streaming response

The user toggles between local and cloud mode in the chat UI. Both paths support:

  • Streaming token-by-token responses
  • Tool use (17 domain agents)
  • Conversation history with context building
  • RAG (Retrieval-Augmented Generation) from knowledge base
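Both paths deliver the reply as a token-by-token stream. For the local path, an Ollama-compatible stream is newline-delimited JSON, with each chunk carrying a content fragment and a done flag. A minimal sketch of assembling such a stream (the chunk shape shown is the assumed Ollama /api/chat format; the sample lines are illustrative, not real server output):

```python
import json

def assemble_stream(lines):
    """Assemble an Ollama-style NDJSON stream into the full reply.

    Each line is one JSON object; the token fragment lives in
    message.content and the final chunk sets done=true (assumed shape).
    """
    parts = []
    for line in lines:
        chunk = json.loads(line)
        parts.append(chunk.get("message", {}).get("content", ""))
        if chunk.get("done"):
            break
    return "".join(parts)

# Simulated chunks, as a streaming server might send them:
chunks = [
    '{"message": {"content": "Hel"}, "done": false}',
    '{"message": {"content": "lo"}, "done": true}',
]
print(assemble_stream(chunks))  # → Hello
```

In a real client the lines would come from an HTTP response read incrementally, with each fragment forwarded to the chat UI as it arrives rather than buffered.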

Mnemo Branding

Users never see base model names. All models are branded as mnemo:

  Internal Name                     User Sees
  Qwen 3.5 27B abliterated          mnemo
  GLM-4.7 Flash 30B abliterated     mnemo:code
  GLM-OCR 0.9B                      mnemo:scanner
  Whisper large-v3                  mnemo:voice
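The branding layer reduces to a lookup from the user-facing alias to an internal model identifier. A sketch of that mapping (the internal identifier strings are illustrative, not actual catalog entries):

```python
# Maps user-facing mnemo aliases to internal model identifiers.
# Identifier strings here are illustrative, not real catalog keys.
MNEMO_ALIASES = {
    "mnemo":         "qwen3.5-27b-abliterated",
    "mnemo:code":    "glm-4.7-flash-30b-abliterated",
    "mnemo:scanner": "glm-ocr-0.9b",
    "mnemo:voice":   "whisper-large-v3",
}

def resolve_model(name: str) -> str:
    """Translate a branded name to its internal model.

    Unknown aliases fall back to the general-purpose mnemo model,
    so the user never sees a base model name or an error leaking one.
    """
    return MNEMO_ALIASES.get(name, MNEMO_ALIASES["mnemo"])
```

Keeping the mapping in one place means the chat UI, logs, and API responses can all be scrubbed through the same table.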

Why Abliterated?

Standard LLMs have built-in content filters that refuse certain requests. Abliterated models have these refusal mechanisms removed, giving you an AI that:

  • Answers any question without moralizing
  • Doesn’t lecture about safety or ethics
  • Follows your instructions directly
  • Acts as a tool, not a nanny

All models in the KULVEX catalog are pre-abliterated from trusted community sources (huihui-ai, mlabonne, mradermacher).

Key Components

  • Model Catalog — Curated database of abliterated models with GGUF quantizations
  • Model Selector — Automatic hardware-aware model selection
  • llama-server — llama.cpp HTTP server (Docker container with CUDA)
  • Model Router — Routes requests to the right model based on task type
  • Context Builder — Constructs conversation context with system prompts and RAG
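The Model Selector's hardware-aware choice can be sketched as picking the largest GGUF quantization that fits the detected GPU, leaving headroom for the KV cache. The quantization names are standard GGUF tiers, but the VRAM figures and headroom value below are rough illustrative numbers, not measurements from the KULVEX catalog:

```python
# Hypothetical sketch of hardware-aware model selection: choose the
# largest quantization whose estimated VRAM need fits the detected GPU.
# VRAM estimates are illustrative placeholders for a ~27B model.
QUANTS = [  # ordered largest (best quality) first
    ("Q8_0",   30.0),
    ("Q5_K_M", 20.0),
    ("Q4_K_M", 17.0),
    ("Q3_K_M", 13.5),
]

def select_quant(vram_gib: float, headroom_gib: float = 2.0):
    """Return the best-fitting quant, reserving headroom for the KV cache."""
    for name, need_gib in QUANTS:
        if need_gib + headroom_gib <= vram_gib:
            return name
    return None  # nothing fits: fall back to CPU offload or a smaller model

print(select_quant(24.0))  # → Q5_K_M
```

Ordering the table from highest quality downward means the first fit is also the best fit, so selection stays a single linear scan.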