KCode — AI Coding Assistant

KCode is KULVEX’s standalone AI coding assistant CLI. By default it runs entirely on your local GPU, so your code never leaves your machine.

Features

48 built-in tools — bash, read, write, edit, multi-edit, glob, grep, grep-replace, rename, git, agent, browser, deploy, image-gen, web-fetch, web-search, LSP, cron, worktrees, plan mode, tasks, notebooks, and more
Streaming TUI — React/Ink terminal UI with thinking blocks, permission dialogs, spinner, 11 color themes
Subagents — Spawn general/explore/plan agents, or define custom agents in ~/.kcode/agents/
Git worktree isolation — Agents work on isolated copies of your repo
Multi-agent swarm — Orchestrate parallel sub-agents with --agents for divide-and-conquer workflows
MCP support — Connect external tool servers with per-server tool allow/block lists
Memory system — Persistent YAML-frontmatter memories across sessions
Session transcripts — Full conversation logs in JSONL with full-text search
150+ slash commands — /plan, /pin, /memory, /search, /compact, /rewind, /stats, /benchmark, /fix, /cloud, /toggle, and more
Audit engine — Built-in code audit with /fix recipes for every registered audit pattern, auto-skip for generated projects
Web & API engines — Generate full-stack web apps and REST/GraphQL APIs from a single prompt, across 23+ language/framework stacks
Enterprise managed policies — Admin-deployed policy files with locked settings, model restrictions, and audit logging
Lifecycle hooks — 28 hook events (PreToolUse, PostToolUse, SubagentStart, etc.) with command, prompt, and HTTP webhook types
Security hardening — SSRF protection, protected directories, symlink resolution, permission rules, sensitive file guards
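
As an illustration of the session-transcript feature listed above, here is a minimal sketch of a case-insensitive full-text search over a JSONL log. It is not KCode's actual implementation: the `role` and `content` field names, the `TranscriptHit` type, and the `searchTranscript` function are all hypothetical, since the transcript schema is not documented here.

```typescript
// Hypothetical shape of a search hit; KCode's real transcript schema may differ.
interface TranscriptHit {
  lineNumber: number; // 1-based line in the JSONL file
  role: string;       // assumed field: who produced the message
  text: string;       // assumed field: the message content
}

// Scan JSONL text (one JSON object per line) for a case-insensitive substring.
function searchTranscript(jsonl: string, query: string): TranscriptHit[] {
  const needle = query.toLowerCase();
  const hits: TranscriptHit[] = [];
  jsonl.split("\n").forEach((line, index) => {
    if (line.trim() === "") return; // ignore blank lines
    let parsed: unknown;
    try {
      parsed = JSON.parse(line);
    } catch {
      return; // skip malformed lines instead of aborting the search
    }
    if (typeof parsed !== "object" || parsed === null) return;
    const entry = parsed as { role?: unknown; content?: unknown };
    const content = entry.content;
    if (typeof content !== "string") return;
    if (content.toLowerCase().includes(needle)) {
      hits.push({
        lineNumber: index + 1,
        role: typeof entry.role === "string" ? entry.role : "unknown",
        text: content,
      });
    }
  });
  return hits;
}
```

Because JSONL keeps one record per line, a search like this can report the line number of each hit and tolerate a single corrupt line without losing the rest of the log.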

Quick Start

# Interactive REPL
kk
 
# Single-shot query
k "explain this function"
 
# Print mode (no TUI, pipe-friendly)
k --print "list all TODO comments"
 
# With a specific model
k -m mnemo:mark5-max "refactor this module"

How It Works

KCode talks directly to llama-server (port 10091) via OpenAI-compatible SSE streaming. There’s no KULVEX API middleman — it’s a direct connection to Mnemo for minimal latency.

kcode → llama-server:10091 (SSE streaming)
         └── Mnemo model on GPU

It also supports any OpenAI-compatible API (Ollama, vLLM, LM Studio, cloud providers) via the model registry.
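
The direct connection described above can be sketched as a small client that reads OpenAI-style SSE events and extracts the text deltas. This is an illustrative sketch, not KCode's own code: the `localhost` host, the `/v1/chat/completions` path, and the `parseSseChunk` and `streamChat` helpers are assumptions. Only the port (10091) and the use of OpenAI-compatible SSE streaming come from the description above.

```typescript
interface SseParseResult {
  deltas: string[]; // text fragments from complete "data:" events
  rest: string;     // trailing partial line, to prepend to the next chunk
  done: boolean;    // true once the "[DONE]" sentinel was seen
}

// Parse whatever complete lines are in the buffer; keep the unfinished tail.
function parseSseChunk(buffer: string): SseParseResult {
  const lines = buffer.split("\n");
  const rest = lines.pop() ?? ""; // incomplete unless the buffer ended in "\n"
  const deltas: string[] = [];
  let done = false;
  for (const raw of lines) {
    const line = raw.trim();
    if (!line.startsWith("data:")) continue; // blank separators, comments
    const payload = line.slice(5).trim();
    if (payload === "[DONE]") {
      done = true;
      break;
    }
    const event = JSON.parse(payload);
    const text = event.choices?.[0]?.delta?.content;
    if (typeof text === "string") deltas.push(text);
  }
  return { deltas, rest, done };
}

// Assumed endpoint: an OpenAI-compatible server listening on localhost:10091.
async function streamChat(prompt: string, onText: (text: string) => void): Promise<void> {
  const res = await fetch("http://localhost:10091/v1/chat/completions", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      messages: [{ role: "user", content: prompt }],
      stream: true,
    }),
  });
  if (!res.ok || !res.body) throw new Error(`request failed: ${res.status}`);

  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let pending = "";
  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    pending += decoder.decode(value, { stream: true });
    const parsed = parseSseChunk(pending);
    pending = parsed.rest; // carry the partial line into the next read
    parsed.deltas.forEach(onText);
    if (parsed.done) break;
  }
}
```

Keeping the parser separate from the network loop means it can be exercised without a running server. With a server up, a call such as `streamChat("explain this function", (t) => process.stdout.write(t))` would print tokens as they arrive.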

Architecture

~157,000 lines of original TypeScript
577 source files across core/, tools/, ui/, web/, cli/
Compiled to a ~107MB standalone Bun binary
5,700+ tests across 329 test files, all passing
