Lab · Agentic AI · Active Exploration

Agentic AI in Business

A living exploration of autonomous agents, personal AI, and what actually works — from someone who’s deploying them in a real business.

I’m exploring agentic AI frameworks hands-on — not as demos, but as actual deployments against real problems. This Lab section documents what I find: what works, what breaks, what the hype gets wrong, and what the security risks actually look like.

Read the first blog post →

Entry 01 · April 2026

OpenClaw — The Personal AI Agent

An open-source AI agent framework that runs locally on your machine, connects to LLMs, and executes tasks through WhatsApp, Telegram, Slack, or Discord.

Read the full blog post: I Gave an AI Agent the Keys to My Business →

What It Is

OpenClaw is an open-source AI agent framework created by Peter Steinberger, originally launched as Clawdbot in late 2025. It connects large language models to your tools — file system, browser, messaging apps, APIs — and executes real tasks autonomously. You interact with it through WhatsApp or Telegram like texting a colleague. It has 100+ built-in skills and a community registry (ClawHub) with hundreds more.

Core Architecture

Local Node.js service — gateway and message router; runs on your machine

LLM reasoning layer — Claude, GPT, DeepSeek, or local models via your own API keys

100+ built-in skills — plus ClawHub community registry with hundreds more

20+ messaging channels — WhatsApp, Telegram, Slack, Discord as the interaction interface

Scheduled execution — cron jobs ("heartbeats") for autonomous task execution

Privacy-first — data stays local by default; no third-party ingestion
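The "heartbeat" idea is simple: a timer fires on a schedule and hands the agent a task. A minimal sketch of that loop, with the caveat that OpenClaw's real scheduler is cron-based and the names here (makeHeartbeat, tick, runLog) are my own illustration, not its API:

```javascript
// Illustrative heartbeat sketch: fires a task only when the configured
// interval has elapsed, and logs each run so a watchdog can spot misses.
const runLog = [];

function makeHeartbeat(name, intervalMs, task) {
  let lastRun = -Infinity; // no prior run yet
  return {
    name,
    // Drive tick() from a timer loop (e.g. setInterval); it decides
    // whether enough time has passed to fire the task again.
    tick(now) {
      if (now - lastRun < intervalMs) return false;
      lastRun = now;
      runLog.push({ name, at: now, result: task() });
      return true;
    },
  };
}

// Example: a daily-digest heartbeat driven by an explicit clock value.
const digest = makeHeartbeat("daily-digest", 24 * 60 * 60 * 1000, () => "sent");
```

Keeping a run log outside the agent is what makes the reliability problems below observable at all: if the log has a gap, the heartbeat silently failed.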

What Works in Practice

Structured, well-defined tasks — research, content drafting, script execution, code deployment

Low-friction interface — existing messaging apps mean no new UI to learn

Local-first architecture — sensitive data stays off third-party servers

Extensible skill system — custom capability building for domain-specific workflows
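To make the skill-system point concrete, here is a minimal registry sketch in the spirit of custom skills. The shape is assumed for illustration; OpenClaw's actual skill format and ClawHub packaging differ:

```javascript
// Hypothetical skill registry: named handlers the agent can invoke.
const skills = new Map();

function registerSkill(name, handler, { description = "" } = {}) {
  if (skills.has(name)) throw new Error(`skill already registered: ${name}`);
  skills.set(name, { handler, description });
}

function runSkill(name, input) {
  const skill = skills.get(name);
  if (!skill) throw new Error(`unknown skill: ${name}`);
  return skill.handler(input);
}

// A domain-specific skill: sum invoice amounts for a weekly report.
registerSkill(
  "invoice-total",
  (invoices) => invoices.reduce((sum, inv) => sum + inv.amount, 0),
  { description: "Sum invoice amounts for reporting" }
);
```

The useful property is the narrow contract: the model picks a skill by name and passes structured input, rather than improvising shell commands.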

What Doesn’t — Yet

Reliability — scheduled tasks don't always execute as expected and require monitoring

Memory — context degrades across sessions; a user-built memory architecture is needed to compensate

Self-learning — autonomous improvement and proactive idea generation not observed in practice

Agent scaling — no built-in mechanism for spawning or delegating to sub-agents when complexity grows

Security Considerations

This is not optional reading. OpenClaw operates at the system level with significant permissions. The security risks are real, documented, and have affected real deployments — including mine.

Shell and file access — agents can execute shell commands, access file systems, and make network requests

Exposed instances — misconfigured OpenClaw instances have been found exposed to the internet (42,000+ documented)

Third-party skill risk — the ClawHub registry lacks comprehensive vetting; malicious skills have been documented

Prompt injection — embedded instructions in documents or messages can hijack agent behavior

Unintended actions — agents may create, modify, or expose resources beyond the scope of instructions (this happened to me — see the blog post)

Enterprise responses — NVIDIA released NemoClaw (March 2026) with sandboxing and container isolation; Cisco released DefenseClaw for scanning skills and generated code before execution
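One cheap mitigation short of full sandboxing is refusing to pass agent-generated commands straight to a shell. A defensive sketch, where the allowlist and guard are illustrative and real isolation (containers, NemoClaw-style sandboxing) remains far stronger than string checks:

```javascript
// Gate agent shell access behind an explicit allowlist of binaries and
// reject metacharacters that could chain extra commands.
const ALLOWED_COMMANDS = new Set(["git", "ls", "cat", "npm"]);

function guardCommand(cmdLine) {
  // Reject shell metacharacters (command chaining, redirection, subshells).
  if (/[;&|`$><]/.test(cmdLine)) {
    return { allowed: false, reason: "shell metacharacters rejected" };
  }
  const [binary] = cmdLine.trim().split(/\s+/);
  if (!ALLOWED_COMMANDS.has(binary)) {
    return { allowed: false, reason: `binary not allowlisted: ${binary}` };
  }
  return { allowed: true };
}
```

This does nothing against prompt injection itself, but it shrinks the blast radius when an injected instruction tries to run something destructive.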

When to Use an Agent

Use an agent when all three conditions are met. Skip it when any of them isn’t.

Use when:

The task is structured and repeatable

The cost of failure is low or recoverable

You can verify the output before it becomes permanent

Skip when:

The task involves sensitive credentials or intellectual property without sandboxing

You can't verify output before it takes effect (public code pushes, live emails, API calls)

The task requires judgment or context the agent doesn't have
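The checklist above is mechanical enough to write down as a function. Field names here are mine, not from any framework:

```javascript
// Encode the use/skip decision: an agent is appropriate only when all
// three conditions hold; any single failure means skip.
function shouldUseAgent({ structured, lowCostOfFailure, verifiableBeforePermanent }) {
  const conditions = [structured, lowCostOfFailure, verifiableBeforePermanent];
  return conditions.every(Boolean) ? "use" : "skip";
}
```

The point of the AND, not OR, logic: a task that is structured and verifiable but carries unrecoverable failure cost still gets a "skip".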

Operational Principles

These are the principles I’m operating with after real-world use. Not theory — practice.

Security first — Evaluate every integration for attack surface. Assume every layer can be compromised.

Structure over natural language — Agents perform better with disciplined instructions than conversational prompts.

Verify before trust — Until reliability improves, treat agent output as draft, not final.

Scope tightly — One agent, one domain. Overloading agents with diverse tasks degrades performance.

Build memory deliberately — Separate brand memory, project memory, and agent memory. Don't rely on built-in persistence.
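"Verify before trust" in practice means agent output lands in a holding area and nothing ships until a human signs off. A minimal sketch of that gate, with all names illustrative:

```javascript
// Draft queue: agent output is held as "pending" and only reaches the
// published list after explicit human approval.
class DraftQueue {
  constructor() {
    this.drafts = [];
    this.published = [];
  }
  submit(content) {
    const id = this.drafts.length;
    this.drafts.push({ id, content, status: "pending" });
    return id;
  }
  approve(id) {
    const draft = this.drafts[id];
    if (!draft || draft.status !== "pending") {
      throw new Error(`no pending draft: ${id}`);
    }
    draft.status = "approved";
    this.published.push(draft.content);
  }
  reject(id) {
    const draft = this.drafts[id];
    if (draft && draft.status === "pending") draft.status = "rejected";
  }
}
```

The irreversible step (publish, push, send) is the only one that needs the human in the loop; everything before it can stay fully autonomous.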

Coming Next in This Series

Entry 02: Hermes Agent

Self-improving AI and the learning loop — a different model of agent intelligence.

Entry 03: Paperclip AI

Scaling autonomous agent teams — the orchestration problem.

Entry 04: Security Architecture

Designing multi-agent deployments with security as a first-class concern.

This series is updated as I go. Follow the work at the blog or reach out directly.

Exploring agentic AI in your organization?

I help founders and organizations think through AI deployment — security, architecture, and what’s actually worth building. Let’s talk.

Start a Conversation