Octopus — A Lisp‑Native MCP Server Appliance

What Octopus is

Octopus is a machine that boots directly into a live Common Lisp image. The image is the application. Claude connects to it over MCP (the Model Context Protocol) and can read it, call it, extend it, and rewrite it in real time, while it's running, without a redeploy.

Here is what that looks like in practice. Claude sends this to the running server:

;; Claude defines a new tool on the live server:
(define-tool query-influx
  "Run a Flux query against the local InfluxDB and return results as JSON."
  (jobj "type" "object"
        "properties" (jobj "query" (jobj "type" "string"
                                         "description" "Flux query string")))
  (influx-query (gethash "query" args)))

;; The tool now exists. Claude can call it immediately.
;; It is written to tools.lisp and will be there after every reboot.

That function is now part of the running server. Not queued for the next deploy. Not waiting for a container rebuild. Not sitting in a pull request. It exists now, in the image, and it will exist after every future restart because it was written to tools.lisp at the moment of definition.

This is the core idea: the server is not a static binary that gets replaced — it is a living Lisp image that grows. Claude is the author, the deployer, and the operator, all in the same conversation.

The underlying OS is a minimal Linux appliance (~50MB) built with Buildroot. It boots in seconds, runs seven supervised services under s6, and exposes the MCP server through a Cloudflare Tunnel, so no inbound ports need to be open and no static IP is required. The entire infrastructure exists to get out of the way and let the Lisp image run.

Maybe it's time to rethink the sandbox

"If the AI can write the code, can't it also handle security?"

A friend who graduated with a computer science degree in the early 90s recently smiled when he saw Octopus modifying its own code at runtime. He remembered when self-modifying code was the boogeyman of software engineering — the thing you were told never to do. And yet here it is, working cleanly, supervised by an AI that understands what it's changing and why.

That smile contains a real question worth sitting with: was the sandbox model always the right answer, or was it the best answer available at the time?

Why the sandbox model exists

The security model that dominates software today — sandboxes, least privilege, capability restrictions, static analysis, input validation, memory isolation — was built around one core assumption: code is written by humans, executed by machines, and the machine cannot be trusted to know the difference between intended and malicious behaviour.

So we built walls. Processes can't touch each other's memory. Syscalls are filtered. Network access is restricted. Everything is sandboxed because the runtime has no judgement — it will execute whatever it's given. The sandbox is the substitute for judgement.

The assumption is changing

When Claude evaluates Lisp code in the Octopus image, it isn't blindly executing a string. It composed that string. It knows what it's trying to accomplish. It can reason about whether the operation makes sense and whether it's consistent with prior context. This is qualitatively different from a runtime that executes whatever arrives on stdin.

The traditional sandbox assumes a dumb executor. An AI-driven system isn't dumb. It has the one thing the sandbox was built to compensate for: judgement.

What a different model might look like

Intent verification — the agent declares what it's trying to accomplish before acting.
Behavioural auditing — every action is logged with the reasoning that produced it.
Reversibility over prevention — snapshots and rollback instead of blocked operations.
Earned trust tiers — broader access as the agent demonstrates reliable behaviour over time.
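None of the list above is described as something Octopus ships today; it is a direction. To make behavioural auditing and reversibility more concrete, here is a minimal Common Lisp sketch in which every evaluation is logged with its declared intent, and every redefinition keeps the previous function so it can be rolled back. audited-eval, redefine, rollback, *audit-log* and *snapshots* are hypothetical names, and the in-memory stack is only a stand-in for real image snapshots.

```lisp
;; Hypothetical sketch: AUDITED-EVAL, REDEFINE and ROLLBACK are illustrative
;; names, and the in-memory stack stands in for real image snapshots.

(defvar *audit-log* '()
  "Audit entries as plists, newest first.")

(defvar *snapshots* (make-hash-table :test #'eq)
  "Function name -> stack of its previous definitions, newest first.")

(defun audited-eval (intent form)
  "Evaluate FORM, logging the declared INTENT, the form and the outcome.
Errors are logged and then allowed to propagate."
  (let ((entry (list :time (get-universal-time) :intent intent :form form)))
    (handler-bind ((error (lambda (condition)
                            (push (list* :status :error
                                         :message (princ-to-string condition)
                                         entry)
                                  *audit-log*))))
      (let ((result (eval form)))
        (push (list* :status :ok entry) *audit-log*)
        result))))

(defun redefine (name lambda-form intent)
  "Snapshot NAME's current definition, then install LAMBDA-FORM via AUDITED-EVAL."
  (when (fboundp name)
    (push (fdefinition name) (gethash name *snapshots*)))
  (audited-eval intent `(setf (fdefinition ',name) ,lambda-form)))

(defun rollback (name)
  "Reinstall NAME's most recent snapshot. Returns T, or NIL if none exists."
  (let ((previous (pop (gethash name *snapshots*))))
    (when previous
      (setf (fdefinition name) previous)
      (push (list :status :ok :time (get-universal-time)
                  :intent "rollback" :form `(rollback ',name))
            *audit-log*)
      t)))
```

For example, (redefine 'tax-rate '(lambda () 25) "Apply the new rate") swaps the function in place and logs why, and (rollback 'tax-rate) puts the old definition back. Keeping the old function object instead of blocking the change is "reversibility over prevention" in miniature.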

The counterargument

We are not there yet. Current AI systems hallucinate, make mistakes, and can be manipulated through prompt injection. The sandbox contains well-intentioned mistakes as much as malicious ones — and AI systems make plenty of those.

Octopus is an experiment on a machine its owner controls, used by a single trusted user. Giving an AI unrestricted eval access to a production system handling sensitive data for thousands of users is a different proposition entirely. We need to earn that trust incrementally, not assume it.

The deeper shift

Security has always been about managing the gap between what a system is supposed to do and what it might be made to do. As AI agents become more capable and more auditable, that gap narrows — not because AI is infallible, but because an agent that understands intent can participate in its own security model rather than simply being constrained by walls imposed from outside.

The friend's smile was recognition of something real: the old rules were written for a world where code had no author in the room at runtime. That world is changing. What the new rules look like is still an open question, but it's the right question to be asking.

Boot sequence

BIOS/UEFI
  └─► GRUB2 (EFI)
        └─► Linux 6.11.11 kernel
              └─► initramfs (embedded in bzImage)
                    └─► /init
                          └─► /usr/local/bin/octopus-init  (PID 1, stage 1)
                                │
                                ├─ mount /proc /sys /dev
                                ├─ load /etc/octopus.env
                                ├─ set hostname
                                ├─ bring up eth0
                                ├─ write SSH authorized_keys
                                ├─ mkdir /data/projects
                                │
                                └─► exec s6-svscan /etc/s6/services  (PID 1, stage 2)
                                          │
                                          ├─ mcp-server
                                          ├─ cloudflared
                                          ├─ sshd
                                          ├─ chrony
                                          ├─ unbound
                                          ├─ dhcpcd
                                          └─ watchdog

octopus-init is a small shell script that runs as PID 1. It handles the basics (mounts, environment, network), then execs s6-svscan, which takes over PID 1 and supervises all services. If any service dies, s6 restarts it automatically. If s6-svscan itself dies, PID 1 is gone and the kernel panics; with panic=5 on the kernel command line (as in the netboot example) the kernel reboots after five seconds, and the watchdog brings the machine back if that reboot does not happen.

The eight arms

Each service runs as a supervised s6 service with a run script and a finish script that logs crashes.

mcp-server

The SBCL Common Lisp MCP server. Listens on port 8765. Starts in ~100ms from a pre-compiled executable core.

Restart policy: immediate, unlimited

cloudflared

Cloudflare Tunnel daemon. Opens an outbound HTTPS connection to Cloudflare's edge, exposing port 8765 as a public URL without any inbound firewall rules.

Waits for a default route before starting

sshd

OpenSSH server. Key-only auth (no passwords). Authorized keys written from /etc/octopus.env at boot.

Port 22, ed25519 + rsa host keys

chrony

NTP time sync. Essential for TLS certificate validation, log timestamps, and token expiry.

Replaces ntpd — faster initial sync

unbound

Local caching DNS resolver. All services use it via 127.0.0.1:53. Provides DNSSEC validation.

Listens on 127.0.0.1 only

dhcpcd

DHCP client. Brings up the primary network interface and obtains an IP.

Interface configured via NETWORK_INTERFACE env

watchdog

Kicks the kernel hardware watchdog every 10 seconds. If this process dies, the kernel reboots the machine after ~60 seconds.

/dev/watchdog, busybox watchdog

s6-svscan (PID 1)

Not a service — it is the supervisor. Scans /etc/s6/services, starts each service, restarts on exit.

Replaces systemd/init — ~500KB total

How the Cloudflare Tunnel works

Claude.ai ──HTTPS──► Cloudflare edge ──tunnel──► cloudflared on Octopus ──► mcp-server:8765

When Octopus boots, cloudflared opens outbound connections to Cloudflare's edge, and Cloudflare routes incoming requests for the tunnel back through those connections. No inbound ports need to be open: from Octopus's perspective all tunnel traffic is outbound, by default to port 7844 on Cloudflare's side (QUIC over UDP, falling back to HTTP/2 over TCP).

Setup: create a tunnel at one.dash.cloudflare.com, copy the token, set it as TUNNEL_TOKEN in /etc/octopus.env, configure the public hostname to point at http://localhost:8765.

The MCP server

The MCP server is a live Common Lisp image built with SBCL. At build time the server and all of its Quicklisp dependencies are loaded and compiled into an SBCL image, and sb-ext:save-lisp-and-die dumps that image as a single self-contained executable (~12MB). At boot it starts in ~100ms, with no Quicklisp loading and no dependency resolution.
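A minimal sketch of that build step, assuming SBCL and a build script that has already loaded the server and its dependencies (for example through Quicklisp) before this point. main and build-image are hypothetical names; sb-ext:save-lisp-and-die with :executable and :toplevel is SBCL's own API.

```lisp
;; build.lisp: hypothetical build step, e.g. sbcl --non-interactive --load build.lisp
;; SBCL-specific. Assumes the server's systems (hunchentoot, dexador, yason, ...)
;; were already loaded above this point, so they are compiled into the heap.

(defun main ()
  "Placeholder entry point; a real server would start its HTTP listener here."
  (format t "mcp-server starting~%")
  (finish-output))

(defun build-image (path)
  "Write the current heap, plus the SBCL runtime, to PATH as one executable.
This call does not return: the dumping process exits when it is done."
  (sb-ext:save-lisp-and-die path
                            :executable t      ; bundle the runtime into the file
                            :toplevel #'main)) ; run MAIN when the binary starts

;; At the end of the real build script:
;; (build-image "octopus-mcp")
```

Because everything is loaded and compiled before the dump, the resulting binary has nothing left to fetch or compile when it starts.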

Built-in tools

read_file — read a file from the server's filesystem
write_file — write a file
list_directory — list a directory
exec_command — run a shell command
eval_lisp — evaluate arbitrary Lisp in the running image
server_info — uptime, tool count, hostname
grafana — REST calls to Grafana (persisted tool, auto-loaded)

Defining a new tool from Claude

;; Claude calls eval_lisp with this form:
(define-tool my-tool
  "Does something useful."
  (jobj "type" "object"
        "properties" (jobj "arg" (jobj "type" "string")))
  (format nil "You passed: ~A" (gethash "arg" args)))

;; define-tool registers it in *tool-registry* AND appends it
;; to tools.lisp so it survives restarts.
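jobj and the implicit args variable are not defined in this document. A plausible minimal jobj, assuming it simply builds an EQUAL hash table (yason, which the server uses, encodes hash tables as JSON objects and by default decodes JSON objects into them), might look like this. example-handler and *example-schema* are illustrative names.

```lisp
;; Hypothetical JOBJ: assumed to build an EQUAL hash table, which yason encodes
;; as a JSON object. The server's real helper may differ.

(defun jobj (&rest keys-and-values)
  "Build a JSON-object-like hash table from alternating string keys and values."
  (assert (evenp (length keys-and-values)) ()
          "JOBJ needs an even number of arguments, got ~D"
          (length keys-and-values))
  (let ((table (make-hash-table :test #'equal)))
    (loop for (key value) on keys-and-values by #'cddr
          do (setf (gethash key table) value))
    table))

;; The schema from the example above, and the same handler body run against
;; ARGS as a tool call would presumably supply it (a hash table decoded from JSON).
(defparameter *example-schema*
  (jobj "type" "object"
        "properties" (jobj "arg" (jobj "type" "string"))))

(defun example-handler (args)
  (format nil "You passed: ~A" (gethash "arg" args)))
```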

Persistence model

User-defined tools are written to tools.lisp on the server. At startup, tools.lisp is loaded automatically, restoring all previously defined tools. The *loading-tools* guard prevents re-appending definitions already in the file.
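A minimal sketch of how define-tool, *tool-registry* and the *loading-tools* guard could fit together. The macro body, the persist-tool-form and load-persisted-tools helpers, the *tools-file* variable and the lowercase tool-naming rule are all assumptions, not the server's actual code.

```lisp
;; Hypothetical DEFINE-TOOL: a sketch of the described behaviour, not the
;; server's real macro. *TOOLS-FILE* and PERSIST-TOOL-FORM are assumed names.

(defvar *tool-registry* (make-hash-table :test #'equal)
  "Tool name (a string) -> plist with :DESCRIPTION, :SCHEMA and :HANDLER.")

(defvar *loading-tools* nil
  "True while tools.lisp is being loaded, so definitions are not re-appended.")

(defvar *tools-file* "tools.lisp"
  "Where user-defined tools are persisted.")

(defun persist-tool-form (form)
  "Append FORM to *TOOLS-FILE*, printed so that LOAD can read it back."
  (with-open-file (out *tools-file* :direction :output
                                    :if-exists :append
                                    :if-does-not-exist :create)
    (with-standard-io-syntax
      (terpri out)
      (prin1 form out)
      (terpri out))))

(defmacro define-tool (name description schema &body body)
  "Register a tool whose BODY sees the request arguments as ARGS, and persist
the definition unless it is being replayed from *TOOLS-FILE*."
  (let ((tool-name (string-downcase (symbol-name name)))) ; naming rule assumed
    `(progn
       (setf (gethash ,tool-name *tool-registry*)
             (list :description ,description
                   :schema ,schema
                   ;; ARGS is captured on purpose so tool bodies can use it.
                   :handler (lambda (args)
                              (declare (ignorable args))
                              ,@body)))
       (unless *loading-tools*
         (persist-tool-form '(define-tool ,name ,description ,schema ,@body)))
       ,tool-name)))

(defun load-persisted-tools ()
  "Replay every form in *TOOLS-FILE* with the guard set, e.g. at startup."
  (let ((*loading-tools* t))
    (when (probe-file *tools-file*)
      (load *tools-file*))))
```

With this sketch, redefining a tool appends another copy to the file; because load replays forms in order, the latest definition wins on the next start.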

OAuth 2.0 + PKCE

The server implements OAuth 2.0 with PKCE for Claude.ai integration. All MCP requests are authenticated via bearer token.
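PKCE only matters when a token is first obtained: the client sends a code_challenge derived from a secret code_verifier (for the S256 method, the base64url-encoded SHA-256 hash of the verifier) and proves it holds the verifier when exchanging the authorization code. After that, each MCP request only needs its bearer token checked. A minimal sketch of that per-request check, where *issued-tokens*, bearer-token and authorized-p are hypothetical names and token issuance is omitted:

```lisp
;; Hypothetical sketch of the per-request bearer check. *ISSUED-TOKENS* stands in
;; for wherever the OAuth flow records access tokens; the real storage may differ.

(defvar *issued-tokens* (make-hash-table :test #'equal)
  "Access token (a string) -> expiry time as a universal time.")

(defun bearer-token (header)
  "Extract the token from an Authorization header value such as \"Bearer abc\".
The scheme name is matched case-insensitively. Returns NIL if there is none."
  (let ((prefix "Bearer "))
    (when (and header
               (> (length header) (length prefix))
               (string-equal prefix header :end2 (length prefix)))
      (let ((token (string-trim " " (subseq header (length prefix)))))
        (when (plusp (length token))
          token)))))

(defun authorized-p (header &optional (now (get-universal-time)))
  "True when HEADER carries a token we issued that has not expired yet."
  (let* ((token (bearer-token header))
         (expiry (and token (gethash token *issued-tokens*))))
    (and expiry (< now expiry))))
```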

Why Lisp

The image model

A Lisp image is a snapshot of the entire running system: code, data, compiled functions, and state. save-lisp-and-die freezes the image at build time so the server starts instantly with everything already compiled. At runtime, new tools can be eval'd in and appended to tools.lisp, which is loaded on every restart, so they persist without rebuilding the image or redeploying.

Code is data

The define-tool macro expands into a setf on a hash table plus an append to a file. Tools are just functions in a table, so adding a tool is as simple as putting a new function in the table. The MCP protocol handler iterates that table to build the tool list, with each tool's input schema, that it returns to the client. There is no separate registration step, no restart, and no YAML.
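A minimal sketch of the table-driven side, assuming the registry maps tool names to plists: register-tool stands in for what a define-tool expansion reduces to, and the tools/list and tools/call answers are built straight from the table. The function names are hypothetical; the name, description and inputSchema keys follow the MCP tool listing format.

```lisp
;; Hypothetical sketch: a tool registry and the two MCP requests that use it.
;; REGISTER-TOOL stands in for what a DEFINE-TOOL expansion boils down to.

(defvar *tool-registry* (make-hash-table :test #'equal)
  "Tool name (a string) -> plist with :DESCRIPTION, :SCHEMA and :HANDLER.")

(defun register-tool (name description schema handler)
  "Adding a tool is a single SETF on the hash table."
  (setf (gethash name *tool-registry*)
        (list :description description :schema schema :handler handler))
  name)

(defun tool-name (tool)
  (cdr (assoc "name" tool :test #'string=)))

(defun tools-list-result ()
  "Answer tools/list by walking the registry. Alists stand in for JSON objects."
  (let ((tools '()))
    (maphash (lambda (name entry)
               (push (list (cons "name" name)
                           (cons "description" (getf entry :description))
                           (cons "inputSchema" (getf entry :schema)))
                     tools))
             *tool-registry*)
    (list (cons "tools" (sort tools #'string< :key #'tool-name)))))

(defun call-tool (name args)
  "Answer tools/call by looking the tool up and calling its handler with ARGS."
  (let ((entry (gethash name *tool-registry*)))
    (unless entry
      (error "Unknown tool: ~A" name))
    (funcall (getf entry :handler) args)))
```

Nothing outside the table describes the tools, so a tool added at runtime shows up in the next tools/list response automatically.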

Live REPL over MCP

The eval_lisp tool gives Claude a full Lisp REPL into the running server. Claude can inspect state, redefine existing functions, query variables, or build new abstractions that persist across sessions. The running server is the development environment.
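The document does not specify how eval_lisp reports results. A minimal sketch of such a handler, assuming it returns printed output followed by the values of the last form; eval-lisp-tool is a hypothetical name. Binding *read-eval* to nil only stops #. evaluation while reading: the forms themselves still run with the full power of the image, which is the point of the tool.

```lisp
;; Hypothetical sketch of an eval_lisp handler; the real tool's behaviour and
;; output format are not specified in this document.

(defun eval-lisp-tool (source)
  "Read and evaluate every form in SOURCE in order. Return anything printed to
*STANDARD-OUTPUT*, followed by the values of the last form, as one string.
Errors are caught and reported instead of reaching the debugger."
  (let ((results '()))
    (handler-case
        (let ((output
                (with-output-to-string (*standard-output*)
                  (with-input-from-string (in source)
                    ;; Refuse #. read-time evaluation. This does not sandbox
                    ;; EVAL itself: the forms still run with full access.
                    (let ((*read-eval* nil))
                      (loop for form = (read in nil in)
                            until (eq form in)
                            do (setf results (multiple-value-list (eval form)))))))))
          (format nil "~A~{~S~^~%~}" output results))
      (error (condition)
        (format nil "ERROR: ~A" condition)))))
```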

Why not Python/Node/Go

Python/Node — dynamic but no persistent image model. Restarting means reloading everything. Nothing survives the process.
Go/Rust — fast but completely static. No runtime redefinition at all.
Java — hot-swap exists but is fragile and limited. No equivalent to save-lisp-and-die.

Lisp is the only mainstream environment that combines runtime redefinition, persistent compiled images, macro-driven extension, native performance, and full OS access. Octopus is a modern Lisp Machine — embedded, networked, and AI-driven.

Drop Docker. Drop the update cycle. Let AI write the services.

"What if the answer to the patch treadmill isn't better patch management — it's not having the software that needs patching?"

Consider what a typical organisation runs: a web server, a database, a cache, a queue, a monitoring stack, maybe a few internal APIs. Each of those is a third-party binary, written by people who have never seen your environment, carrying a dependency tree that extends hundreds of packages deep. Every one of those packages has a CVE history. Every one needs patching. Each patch requires downloading a new binary from the internet, testing it, and rebooting or restarting the service — ideally at 2am to minimise user impact.

This is the current state of operations, and it is genuinely insane. Not because the people running it are incompetent, but because the model itself is broken. The attack surface of modern software is almost entirely composed of code nobody in the organisation wrote, nobody fully understands, and nobody can meaningfully audit. Security patches are downstream acknowledgements of upstream failures in software that was never designed with your specific environment in mind.

What if you wrote your own services instead?

For most organisations, this has historically been laughable — you can't staff a team to write and maintain a web server, a database engine, and a TLS stack. But that assumption was based on the cost of human software development. That cost is collapsing.

With an AI that can write, test, deploy, and iterate on software in a live Lisp image, the calculus changes. A Lisp application that serves your specific API doesn't need a generic web framework with 40 middleware layers. A Lisp data store that holds your specific schema doesn't need a general-purpose SQL engine with decades of legacy surface area. The application is exactly as complex as it needs to be — no more.

Traditional stack — the real costs

nginx + openssl + libssl dependency chain
PostgreSQL with 30 years of legacy code paths
Docker daemon, container runtime, image registry
Kubernetes control plane (if you went there)
CVE alerts, patch windows, reboot cycles
IT staff whose job is applying other people's updates
Vendor lock-in on software you didn't write and can't fully change
Configuration drift between environments

Octopus model — what changes

HTTP server is 50 lines of Lisp you can read in 5 minutes
Data persistence is exactly the schema you need, nothing else
No container runtime — the service is the application
No package manager — dependencies are compiled into the image
Security surface is the kernel, musl, the SBCL runtime, the handful of supervised daemons, and your code
Updates are AI-generated patches applied to a live image
Full understanding of every line running on the machine
Rollback is a Lisp image snapshot

The dependency problem is the security problem

Most CVEs are not in your code. They are in code three layers below yours that you pulled in transitively through a dependency you didn't choose to have. The Log4j vulnerability (Log4Shell) didn't affect organisations because they wrote bad Java; it affected them because Log4j was somewhere in a stack they couldn't fully enumerate. The XZ Utils backdoor reached rolling and pre-release versions of several major Linux distributions, and came close to their stable releases, because it sat in a dependency nobody was watching closely enough.

The answer is not better dependency scanning. The answer is fewer dependencies. A Lisp application written by an AI for a specific purpose carries exactly the dependencies required for that purpose. There is no general-purpose web framework pulling in an XML parser you never use. There is no logging library with a JNDI lookup nobody thought to disable.

What this looks like in practice

Imagine describing what you need to an AI: "I need an API that accepts sensor readings from 50 devices, stores the last 30 days of data, and serves a dashboard showing current values and trends." In the current model, that becomes: choose a web framework, choose a database, write glue code, deploy containers, set up monitoring, manage updates. In the Octopus model: Claude writes a Lisp application that does exactly that, deploys it to an Octopus machine, and modifies it in response to your feedback — live, without a reboot.
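To give a sense of scale, here is a minimal sketch of the storage core such an application might start from: per-device readings kept in memory, pruned to a 30-day window, with the current value and a simple average for trends. All names are hypothetical, and it assumes readings arrive in time order; persistence and the HTTP and dashboard layers are left out.

```lisp
;; Hypothetical storage core for the sensor example. Names, the in-memory
;; layout and the assumption that readings arrive in time order are illustrative.

(defparameter *seconds-per-day* 86400)
(defparameter *retention-seconds* (* 30 *seconds-per-day*))

(defvar *readings* (make-hash-table :test #'equal)
  "Device id -> list of (timestamp . value), newest first.")

(defun record-reading (device value &optional (now (get-universal-time)))
  "Store VALUE for DEVICE and drop that device's readings older than the window."
  (let ((cutoff (- now *retention-seconds*)))
    (setf (gethash device *readings*)
          (remove-if (lambda (reading) (< (car reading) cutoff))
                     (cons (cons now value) (gethash device *readings*)))))
  value)

(defun current-value (device)
  "Most recent value for DEVICE, or NIL if it has never reported."
  (cdr (first (gethash device *readings*))))

(defun average-since (device seconds &optional (now (get-universal-time)))
  "Mean of DEVICE's readings from the last SECONDS, or NIL if there are none."
  (let ((recent (remove-if (lambda (reading) (< (car reading) (- now seconds)))
                           (gethash device *readings*))))
    (when recent
      (/ (reduce #'+ recent :key #'cdr) (length recent)))))
```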

The security properties are different in kind. You know what the application does because you described it and the AI wrote it in front of you. You can read every line. There is no opaque binary from a vendor. There is no update schedule driven by someone else's CVE disclosure process. If a vulnerability is found, the AI patches the specific function, pushes the change to the live image, and the fix is live in seconds.

The honest scope of this today

This is not yet a replacement for all enterprise software. Databases handling millions of transactions, web servers under serious DDoS load, cryptographic implementations — these still benefit from decades of hardening in well-maintained open source software. Rewriting OpenSSL in Lisp is not the move.

But the vast middle ground of internal tooling, line-of-business applications, dashboards, APIs, data pipelines, and monitoring systems, which make up the majority of most organisations' operational surface area, is an excellent candidate. These systems are straightforward enough for an AI to write correctly, specific enough that a general-purpose tool adds unnecessary complexity, and important enough that the current update-and-hope model carries real risk.

The model is: understand exactly what you need, have an AI build exactly that, run it on hardware you control, update it by describing what changed. The patch treadmill stops because there is no upstream to patch from.

How it's built

Build system: Buildroot 2024.11.1 (cross-compilation toolchain, package management, image assembly)
C library: musl libc (smaller and cleaner than glibc for an appliance)
Kernel: Linux 6.11.11 (stripped config, x86_64, PCI, E1000E, VirtIO, no DRM/sound/wireless)
Init (stage 1): octopus-init (shell script, PID 1, mounts + env + network, then execs s6)
Init (stage 2): s6-svscan (service supervision, restart on crash, SIGCHLD handling)
Utilities: busybox (shell, mount, ip, watchdog, and ~200 other tools in ~1MB)
MCP server: SBCL + hunchentoot + dexador + yason, built into a 12MB executable at build time
Bootloader: GRUB2 EFI (boots from USB or netboot)
Image size: ~50MB ISO, ~23MB bzImage (kernel + embedded initramfs)

Netboot

#!ipxe
dhcp
kernel http://your-server/octo/bzImage-pci ip=dhcp console=tty0 earlyprintk=vga nomodeset panic=5
# the initramfs is embedded in bzImage, so no separate initrd line is needed
boot

Configuration

All settings live in /etc/octopus.env, which octopus-init loads at boot:

HOSTNAME=octopus
NETWORK_INTERFACE=eth0
TUNNEL_TOKEN=your-cloudflare-tunnel-token
SSH_AUTHORIZED_KEYS="ssh-ed25519 AAAA... your-key"
NTP_SERVERS="0.pool.ntp.org 1.pool.ntp.org"
MCP_PORT=8765
MCP_ROOT=/data/projects
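The file is plain KEY=VALUE lines. For illustration, a minimal Common Lisp parser for the same format might look like this; parse-env-line and parse-env are hypothetical helpers, not part of Octopus, and they only handle the simple quoting used above.

```lisp
;; Hypothetical helper, not part of Octopus: parse the KEY=VALUE format above.
;; Handles blank lines, # comments and one pair of surrounding double quotes;
;; it does not implement full shell quoting, escapes or variable expansion.

(defun parse-env-line (line)
  "Return (KEY . VALUE) for a KEY=VALUE line, or NIL for blank and comment lines."
  (let* ((trimmed (string-trim '(#\Space #\Tab #\Return) line))
         (equals (position #\= trimmed)))
    (when (and equals
               (plusp equals)
               (char/= (char trimmed 0) #\#))
      (let ((key (subseq trimmed 0 equals))
            (value (subseq trimmed (1+ equals))))
        ;; Strip one layer of double quotes, as in SSH_AUTHORIZED_KEYS="...".
        (when (and (>= (length value) 2)
                   (char= (char value 0) #\")
                   (char= (char value (1- (length value))) #\"))
          (setf value (subseq value 1 (1- (length value)))))
        (cons key value)))))

(defun parse-env (text)
  "Parse TEXT into an alist of (KEY . VALUE) pairs, in file order."
  (with-input-from-string (in text)
    (loop for line = (read-line in nil)
          while line
          when (parse-env-line line) collect it)))
```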

Why live Lisp evaluation isn't dangerous here

Historically "self-modifying code" meant programs writing bytes into their own executable memory — the province of malware and 1970s assembly tricks. Octopus does none of that. Lisp's runtime redefinition is a first-class, language-level operation: redefining a function is updating a function cell, the same operation a REPL-driven development workflow has used safely for decades.
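A small illustration in standard Common Lisp: a caller that invokes a function by name picks up a redefinition on its very next call, with no restart. greeting and handle-request are illustrative names.

```lisp
;; Runtime redefinition in standard Common Lisp (illustrative names).
(defun greeting (name)
  (format nil "Hello, ~A" name))

(defun handle-request (name)
  ;; Calls GREETING through its global function cell on every invocation.
  (greeting name))

(handle-request "Claude")   ; => "Hello, Claude"

;; Redefine GREETING while the image is running: DEFUN stores a new function
;; in the cell, and HANDLE-REQUEST picks it up on its next call.
(defun greeting (name)
  (format nil "Hello, ~A (patched)" name))

(handle-request "Claude")   ; => "Hello, Claude (patched)"
```

The caveat is that a compiler may copy a function's code into its callers when it is declared inline; ordinary global calls go through the function cell, which is what makes this kind of live patch possible.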

Every tool definition is logged to tools.lisp. The image can be snapshotted. The reasoning behind each eval is visible in the conversation. The audit trail is structural, not bolted on.