KCode — AI Coding Assistant
KCode is KULVEX’s standalone AI coding assistant CLI. It runs 100% on your GPU — your code never leaves your machine.
Features
- 48 built-in tools — bash, read, write, edit, multi-edit, glob, grep, grep-replace, rename, git, agent, browser, deploy, image-gen, web-fetch, web-search, LSP, cron, worktrees, plan mode, tasks, notebooks, and more
- Streaming TUI — React/Ink terminal UI with thinking blocks, permission dialogs, spinner, 11 color themes
- Subagents — Spawn general/explore/plan agents, or define custom agents in ~/.kcode/agents/
- Git worktree isolation — Agents work on isolated copies of your repo
- Multi-agent swarm — Orchestrate parallel sub-agents with --agents for divide-and-conquer workflows
- MCP support — Connect external tool servers with per-server tool allow/block lists
- Memory system — Persistent YAML-frontmatter memories across sessions
- Session transcripts — Full conversation logs in JSONL with full-text search
- 150+ slash commands — /plan, /pin, /memory, /search, /compact, /rewind, /stats, /benchmark, /fix, /cloud, /toggle, and more
- Audit engine — Built-in code audit with /fix recipes for every registered audit pattern, auto-skip for generated projects
- Web & API engines — Generate full-stack web apps and REST/GraphQL APIs from a single prompt, across 23+ language/framework stacks
- Enterprise managed policies — Admin-deployed policy files with locked settings, model restrictions, and audit logging
- Lifecycle hooks — 28 hook events (PreToolUse, PostToolUse, SubagentStart, etc.) with command, prompt, and HTTP webhook types
- Security hardening — SSRF protection, protected directories, symlink resolution, permission rules, sensitive file guards
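To make the session-transcript feature above more concrete, here is a minimal sketch of full-text search over a JSONL log. The record shape (`role`, `content`) and the `searchTranscript` helper are hypothetical and chosen only for illustration; KCode's actual transcript schema and search implementation may differ.

```typescript
// Hypothetical transcript record; the real KCode schema is not documented here.
interface TranscriptEntry {
  role: string;
  content: string;
}

/**
 * Case-insensitive substring search over JSONL text (one JSON object per line).
 * Blank lines, malformed lines, and records without string fields are skipped.
 */
function searchTranscript(jsonl: string, query: string): TranscriptEntry[] {
  const needle = query.toLowerCase();
  const hits: TranscriptEntry[] = [];
  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue;
    let parsed: unknown;
    try {
      parsed = JSON.parse(line);
    } catch {
      continue; // tolerate a truncated or corrupt line
    }
    const entry = parsed as Partial<TranscriptEntry> | null;
    if (
      entry &&
      typeof entry.role === "string" &&
      typeof entry.content === "string" &&
      entry.content.toLowerCase().includes(needle)
    ) {
      hits.push({ role: entry.role, content: entry.content });
    }
  }
  return hits;
}
```

Because each line is parsed independently, a log can be scanned line by line and one bad record does not invalidate the rest of the file.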
Quick Start
# Interactive REPL
kk
# Single-shot query
k "explain this function"
# Print mode (no TUI, pipe-friendly)
k --print "list all TODO comments"
# With a specific model
k -m mnemo:mark5-max "refactor this module"
How It Works
KCode talks directly to llama-server (port 10091) via OpenAI-compatible SSE streaming. There’s no KULVEX API middleman — it’s a direct connection to Mnemo for minimal latency.
kcode → llama-server:10091 (SSE streaming)
          └── Mnemo model on GPU
It also supports any OpenAI-compatible API (Ollama, vLLM, LM Studio, cloud providers) via the model registry.
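The streaming connection described above can be sketched roughly as follows. This is an illustrative client, not KCode's own code: it assumes the server exposes the usual OpenAI-style `/v1/chat/completions` route on `localhost:10091` and emits `data: <json>` lines terminated by `data: [DONE]`. The function names `extractDeltas` and `streamCompletion` are hypothetical.

```typescript
// Shape of one streamed chunk in the OpenAI-compatible chat format.
interface ChatChunk {
  choices: { delta: { content?: string } }[];
}

/** Pull the text deltas out of a block of complete SSE lines. */
function extractDeltas(sseText: string): string[] {
  const deltas: string[] = [];
  for (const rawLine of sseText.split("\n")) {
    const line = rawLine.trim();
    if (!line.startsWith("data:")) continue; // skip blank lines and SSE comments
    const payload = line.slice("data:".length).trim();
    if (payload === "[DONE]") break; // end-of-stream sentinel
    const chunk = JSON.parse(payload) as ChatChunk;
    const text = chunk.choices[0]?.delta?.content;
    if (text) deltas.push(text);
  }
  return deltas;
}

/** Stream one completion; the base URL and request body are assumptions. */
async function streamCompletion(
  prompt: string,
  baseUrl = "http://localhost:10091",
): Promise<string> {
  const res = await fetch(`${baseUrl}/v1/chat/completions`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      stream: true,
      messages: [{ role: "user", content: prompt }],
    }),
  });
  if (!res.ok || !res.body) throw new Error(`request failed: ${res.status}`);

  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let buffer = "";
  let output = "";
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    // Only parse up to the last newline; a partial line stays in the buffer.
    const lastNewline = buffer.lastIndexOf("\n");
    if (lastNewline === -1) continue;
    output += extractDeltas(buffer.slice(0, lastNewline)).join("");
    buffer = buffer.slice(lastNewline + 1);
  }
  output += extractDeltas(buffer).join(""); // flush any final unterminated line
  return output;
}
```

Keeping the line parser separate from the network loop means the parsing logic can be exercised on canned SSE text without a running server.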
Architecture
- ~157,000 lines of original TypeScript
- 577 source files across core/, tools/, ui/, web/, cli/
- Compiled to a ~107MB standalone Bun binary
- 5,700+ tests across 329 test files, all passing