Containerized AI Development with Governance and Intelligence Layers
Changes: v1.5 → v1.7
Two architecture versions deployed in a single push. v1.6 adds containerization and workspace governance. v1.7 adds the intelligence and automation layer on top.
| Version | Title | Components Added |
|---|---|---|
| v1.6 | Containerization and Workspace Governance | Docker orchestration, sandbox, dashboard, SOPS, 5-layer governance, pre-commit enforcement |
| v1.7 | Intelligence and Automation Layer | Prometheus, Cassandra, Sentinel, OpenClaw nightly cron, skills library, extensions |
Previous state: bare-metal services on WSL2, flat workspace structure, manual secret handling, no overnight automation.
New state: Docker-orchestrated containers with security hardening, five-layer workspace hierarchy, SOPS-encrypted secrets, 12-hook pre-commit enforcement, and 11 overnight cron jobs handling everything from code analysis to content generation.
v1.6: Containerization and Governance
Container Architecture
Three containers run in production, each with security hardening applied.
| Container | Role | Port | Managed By |
|---|---|---|---|
| Gateway | AI agent orchestration, session routing | 18789 | Docker Compose |
| Dev Sandbox | Isolated code execution, multi-runtime environment | 9500 | Standalone Dockerfile |
| Buddy Dashboard | Second-brain UI, memory curation, automation monitoring | 5050 | Docker Compose |
Docker Compose orchestrates the gateway and dashboard. The dev sandbox runs via a standalone Dockerfile with the same hardening. All containers share:
- cap_drop: ALL (no Linux capabilities granted by default)
- security_opt: no-new-privileges (prevents privilege escalation)
- Memory and PID limits (prevent resource exhaustion)
- Isolated Docker bridge network
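As a rough illustration of how the shared hardening options could be checked, the sketch below inspects a service definition expressed as a Python dict. The key names (cap_drop, security_opt, mem_limit, pids_limit) follow the Docker Compose file format; the function name, the limit values, and the network name are assumptions for illustration and are not taken from the actual deployment.

```python
# Minimal sketch: report which hardening options a service definition lacks.
# The example values (2g, 256, "internal") are assumptions, not the real config.
REQUIRED_CAP_DROP = "ALL"
REQUIRED_SECURITY_OPT = "no-new-privileges"  # Compose also accepts "no-new-privileges:true"


def hardening_gaps(service: dict) -> list[str]:
    """Return the hardening options missing from one service definition."""
    gaps = []
    if REQUIRED_CAP_DROP not in service.get("cap_drop", []):
        gaps.append("cap_drop: ALL")
    opts = service.get("security_opt", [])
    if not any(opt.startswith(REQUIRED_SECURITY_OPT) for opt in opts):
        gaps.append("security_opt: no-new-privileges")
    if "mem_limit" not in service:
        gaps.append("memory limit")
    if "pids_limit" not in service:
        gaps.append("PID limit")
    return gaps


# Hypothetical hardened service and an unhardened one for comparison.
gateway = {
    "cap_drop": ["ALL"],
    "security_opt": ["no-new-privileges:true"],
    "mem_limit": "2g",
    "pids_limit": 256,
    "networks": ["internal"],
}
unhardened = {"image": "example:latest"}

print(hardening_gaps(gateway))
print(hardening_gaps(unhardened))
```

A check like this could run in CI next to the governance tests, so a new service cannot be added without the shared baseline.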
Secret Management
Secrets are encrypted with Mozilla SOPS using age keys. The pattern:
- At rest: all secrets are SOPS-encrypted in the repository
- At decrypt: decryption writes to tmpfs (a RAM-backed filesystem) only
- At runtime: containers mount the decrypted secrets from tmpfs as read-only
- At shutdown: tmpfs is cleared, so no plaintext persists
No plaintext secrets exist in any repository, on any persistent filesystem, or in any container image.
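The decrypt step might look roughly like the sketch below. It relies on `sops --decrypt FILE` printing plaintext to stdout; the function names, file names, and the use of a temporary directory as a stand-in for the tmpfs mount are assumptions for illustration only.

```python
import os
import subprocess
import tempfile
from pathlib import Path


def decrypt_command(encrypted: Path) -> list[str]:
    """Build the sops invocation; plaintext goes to stdout, never to disk."""
    return ["sops", "--decrypt", str(encrypted)]


def write_private(plaintext: bytes, target_dir: Path, name: str) -> Path:
    """Write plaintext into target_dir with owner-only permissions (0600)."""
    target = target_dir / name
    fd = os.open(target, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "wb") as handle:
        handle.write(plaintext)
    return target


def decrypt_to_tmpfs(encrypted: Path, tmpfs_dir: Path, name: str) -> Path:
    """Decrypt one file and place the result on a RAM-backed mount."""
    result = subprocess.run(decrypt_command(encrypted), check=True, capture_output=True)
    return write_private(result.stdout, tmpfs_dir, name)


# Demo of the write step only, with a temporary directory standing in for tmpfs.
with tempfile.TemporaryDirectory() as stand_in:
    written = write_private(b"API_KEY=example\n", Path(stand_in), "app.env")
    mode = written.stat().st_mode & 0o777
    content = written.read_bytes()
```

In a real setup, `tmpfs_dir` would point at a tmpfs mount that the containers then bind-mount read-only; that path is deployment-specific and is not shown here.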
Workspace Governance
Five-layer workspace hierarchy enforcing separation of concerns:
| Layer | Purpose | Editable by agents? |
|---|---|---|
| _governance | Workspace rules, policies, boundary documents | No (human-only) |
| _foundation | Shared libraries (lib-verification, lib-harmonia) | With review |
| _active | Live services and applications | Yes (governed) |
| _archive | Retired services, preserved for reference | No (read-only) |
| _experiments | Sandbox for prototyping, no production promotion | Yes (ungoverned) |
The hierarchy is enforced by pre-commit hooks and CI validation.
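One way a hook could apply the layer table is sketched below. The policy labels mirror the table; the function names and example paths are hypothetical, and the real hooks may work differently.

```python
from pathlib import PurePosixPath

# Agent edit policy per top-level layer, taken from the table above.
LAYER_POLICY = {
    "_governance": "human-only",
    "_foundation": "with-review",
    "_active": "governed",
    "_archive": "read-only",
    "_experiments": "ungoverned",
}
# Policies under which an agent may change files at all.
AGENT_WRITABLE = {"with-review", "governed", "ungoverned"}


def layer_of(path: str) -> str:
    """Return the workspace layer a repository-relative path belongs to."""
    parts = PurePosixPath(path).parts
    if not parts or parts[0] not in LAYER_POLICY:
        raise ValueError(f"{path!r} is outside the five-layer hierarchy")
    return parts[0]


def agent_may_edit(path: str) -> bool:
    """True if the layer's policy lets an agent modify the file."""
    return LAYER_POLICY[layer_of(path)] in AGENT_WRITABLE


print(agent_may_edit("_active/gateway/main.py"))
print(agent_may_edit("_governance/rules.md"))
```

Rejecting paths outside the five known layers keeps stray top-level directories from bypassing the policy.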
Pre-commit Enforcement
12 hooks run on every commit across the workspace:
| Category | Hooks | Count |
|---|---|---|
| Python quality | ruff (lint), ruff-format | 2 |
| File hygiene | trailing-whitespace, end-of-file-fixer, check-merge-conflict, detect-private-key, check-added-large-files | 5 |
| Format validation | check-yaml, check-json, check-toml | 3 |
| Governance | governance-tests (lib-verification), check-cross-brand-imports | 2 |
Cross-brand detection prevents code in one project (brand) from importing or referencing another, maintaining strict workspace isolation.
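A cross-brand import check could be built on Python's `ast` module, as in the sketch below. This is not the actual check-cross-brand-imports hook; the brand package names (`brand_a`, `brand_b`) and function names are hypothetical.

```python
import ast


def imported_top_level_modules(source: str) -> set[str]:
    """Collect the top-level package names of all absolute imports in source."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            modules.add(node.module.split(".")[0])
    return modules


def cross_brand_imports(source: str, own_brand: str, all_brands: set[str]) -> set[str]:
    """Return brand packages imported by source other than its own brand."""
    return (imported_top_level_modules(source) & all_brands) - {own_brand}


# Hypothetical file belonging to brand_a that reaches into brand_b.
sample = "import os\nimport brand_a.core\nfrom brand_b.utils import helper\n"
violations = cross_brand_imports(sample, "brand_a", {"brand_a", "brand_b"})
print(violations)
```

Parsing the syntax tree rather than matching text avoids false positives from brand names that appear in comments or strings.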
Dev Sandbox Capabilities
The sandbox container provides an isolated multi-runtime development environment:
Language Runtimes: Python 3.12, Node.js 22, Hugo extended edition
ML & Data Stack: PyTorch, scikit-learn, pandas, numpy, Jupyter
Media & Browser: Chromium (headless), ffmpeg
API: Sandbox API on port 9500 for isolated code execution via OpenClaw’s sandbox-exec extension.
Permissions Framework
Agent permissions use a scoped, time-limited design:
- Scoped: each permission grants access to specific operations, not blanket access
- Expiring: all grants expire after 90 days, requiring re-authorization
- Auditable: every permission grant and usage is logged
- Layered: workspace governance, pre-commit hooks, and runtime checks each enforce independently
No single bypass disables all layers.
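The scoped, expiring, and auditable properties can be sketched with a small data model. Only the 90-day lifetime comes from the text; the class, field names, agent name, and operation labels are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

GRANT_LIFETIME = timedelta(days=90)  # expiry period stated above


@dataclass(frozen=True)
class Grant:
    agent: str
    operation: str  # one specific operation, never a wildcard
    issued_at: datetime

    @property
    def expires_at(self) -> datetime:
        return self.issued_at + GRANT_LIFETIME

    def allows(self, agent: str, operation: str, now: datetime) -> bool:
        """A grant covers exactly one agent and operation until it expires."""
        return agent == self.agent and operation == self.operation and now < self.expires_at


audit_log: list[tuple[str, str, str, bool]] = []


def check(grant: Grant, agent: str, operation: str, now: datetime) -> bool:
    """Evaluate a request and record the decision, allowed or not."""
    allowed = grant.allows(agent, operation, now)
    audit_log.append((now.isoformat(), agent, operation, allowed))
    return allowed


# Hypothetical grant issued on 2025-01-01; it expires 90 days later.
issued = datetime(2025, 1, 1, tzinfo=timezone.utc)
grant = Grant(agent="buddy", operation="deploy:staging", issued_at=issued)
in_window = check(grant, "buddy", "deploy:staging", datetime(2025, 3, 1, tzinfo=timezone.utc))
after_expiry = check(grant, "buddy", "deploy:staging", datetime(2025, 4, 2, tzinfo=timezone.utc))
wrong_scope = check(grant, "buddy", "deploy:production", datetime(2025, 3, 1, tzinfo=timezone.utc))
```

Logging denied requests as well as allowed ones keeps the audit trail useful for spotting agents that probe beyond their scope.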
v1.7: Intelligence and Automation Layer
Intelligence Pipelines
Three specialized pipelines track the AI landscape:
| Pipeline | Purpose | Status |
|---|---|---|
| Prometheus | AI model evolution tracking | Active |
| Cassandra | AI predictions lifecycle management | Active |
| AI Sentinel | News intelligence and daily digests | Active |
OpenClaw Skills Library
The library contains 25 task-specific skills that give the AI agent structured access to operations: builds, deploys, content generation, video processing, pipeline management, and system status checks. Each skill is defined in SKILL.md format with metadata for Buddy’s command system.
OpenClaw Extensions
Four enforcement plugins add runtime governance:
| Extension | Purpose |
|---|---|
| governance-enforcement | Workspace and security rule enforcement |
| sandbox-exec | Routes execution to dev sandbox container |
| cron-circuit-breaker | Auto-disables jobs after consecutive failures |
| webserver-enforcement | Web server access controls |
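The cron-circuit-breaker behaviour (disable a job after consecutive failures) can be illustrated with the sketch below. The threshold of three and all names are assumptions; the extension's real threshold and interface are not documented here.

```python
from dataclasses import dataclass


@dataclass
class CircuitBreaker:
    max_consecutive_failures: int = 3  # assumed threshold
    consecutive_failures: int = 0
    enabled: bool = True

    def record(self, succeeded: bool) -> None:
        """Update the failure streak; disable the job once the streak hits the limit."""
        if succeeded:
            self.consecutive_failures = 0
            return
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.max_consecutive_failures:
            self.enabled = False


breaker = CircuitBreaker()
for outcome in [False, False, True, False, False]:
    breaker.record(outcome)
still_enabled = breaker.enabled  # the success in the middle reset the streak

breaker.record(False)  # third consecutive failure
tripped = not breaker.enabled
```

In this sketch a success resets the streak but never re-enables a disabled job, which leaves re-enabling as a deliberate human action.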
Overnight Cron Automation
OpenClaw manages 11 scheduled jobs (8 daily + 3 weekly), not a single monolithic runner. Each job runs in an isolated session with its own model, timeout, and safety rules.
Daily Schedule:
| Time | Job | Model |
|---|---|---|
| 22:00 | Infrastructure health + architecture drift detection | Sonnet |
| 22:50 | Deep code analysis (CLAUDE.md generation, code reviews) | Opus |
| 01:30 | Content generation | Opus |
| 02:00 | Content & research (blog posts, social batches) | Opus |
| 03:00 | Feature discovery (proposals for new content and features) | Sonnet |
| 04:00 | Test & CI generation (test files, GitHub Actions workflows) | Sonnet |
| 05:30 | Morning briefing (summary of all overnight work) | Haiku |
| 06:00 | Content generation (weekdays only) | Sonnet |
Weekly: prediction updates, feedback synthesis, staging cleanup.
The 22:00 infrastructure health job includes architecture drift detection: it reads the architecture page and changelog, compares against the actual environment, and writes drift reports to staging if anything has changed. This is how gaps between documentation and reality get caught automatically.
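At its core, drift detection is a comparison between documented and observed state. The sketch below compares container ports only. The documented ports come from the container table earlier in this document; the container identifiers, the observed values, and the function name are hypothetical, and the real job works on the architecture page and changelog rather than a dict.

```python
def detect_drift(documented: dict[str, int], observed: dict[str, int]) -> list[str]:
    """List differences between documented container ports and observed ones."""
    findings = []
    for name in sorted(documented.keys() | observed.keys()):
        if name not in observed:
            findings.append(f"{name}: documented but not observed")
        elif name not in documented:
            findings.append(f"{name}: observed but not documented")
        elif documented[name] != observed[name]:
            findings.append(
                f"{name}: port {observed[name]} differs from documented {documented[name]}"
            )
    return findings


documented = {"gateway": 18789, "dev-sandbox": 9500, "buddy-dashboard": 5050}
observed = {"gateway": 18789, "dev-sandbox": 9501, "metrics": 9090}  # hypothetical
report = detect_drift(documented, observed)
for line in report:
    print(line)
```

An empty report means nothing needs to be written to staging for that run.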
All jobs follow safety rules: never push code, never delete files, never modify cron jobs. Output goes to staging for human review in the morning briefing.
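The three safety rules amount to a deny-list applied to whatever a job proposes to do. A minimal sketch, assuming jobs describe their intended actions with string labels (the labels and function name below are hypothetical):

```python
# Hypothetical action labels corresponding to the three safety rules.
FORBIDDEN_ACTIONS = {"push_code", "delete_file", "modify_cron"}


def partition_actions(actions: list[str]) -> tuple[list[str], list[str]]:
    """Split proposed actions into those allowed to run and those blocked."""
    allowed = [action for action in actions if action not in FORBIDDEN_ACTIONS]
    blocked = [action for action in actions if action in FORBIDDEN_ACTIONS]
    return allowed, blocked


proposed = ["write_staging_report", "push_code", "generate_tests", "delete_file"]
allowed, blocked = partition_actions(proposed)
print(allowed)
print(blocked)
```

Blocked actions could be listed in the morning briefing so a human can decide whether to carry them out manually.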
Rationale
Why containerize? The bare-metal WSL2 setup worked, but created risks: dependency conflicts between services, no resource isolation, and difficulty reproducing the environment. Docker provides reproducible, isolated, hardened containers.
Why workspace governance? As the number of services grew past 15, flat directory structures became unmanageable. The five-layer system provides clear boundaries: what’s shared, what’s active, what’s experimental, and what’s archived.
Why SOPS? Secrets management was ad hoc: some secrets lived in .env files, some in memory. SOPS + age provides encryption at rest with a simple, auditable decryption pattern. The tmpfs constraint means a powered-off machine holds no plaintext secrets.
Why OpenClaw cron instead of system cron? OpenClaw’s cron system runs jobs through the AI gateway with session isolation, model selection, timeout enforcement, and circuit breaker protection. System cron can run scripts; OpenClaw cron can run AI agents with governance.
Architecture version: v1.5 → v1.7. Architecture drift detection runs nightly at 22:00 via OpenClaw cron Job 1.
Configuration details reflect a production environment at time of writing. Implementation specifics vary based on tooling versions, platform updates, and organizational requirements. Validate approaches against current documentation before deployment.