WASP is a self-hosted cognitive runtime that plans, executes, and improves itself, running real tasks with real tools, powered by an 18-layer memory system, 37 skills, and 27 autonomous background processes. No cloud dependencies. Built for control. Designed to operate.
WASP is under active development. New capabilities and improvements ship continuously to enhance performance and intelligence.
Most AI tools answer questions. WASP executes tasks. Describe an objective once: the agent breaks it into steps, runs each one with real tools, monitors outcomes, and reports back. You manage nothing in between.
Your agent does the heavy lifting. Goal planning, real tool execution, 18 layers of persistent memory, self-improvement, and 27 continuous background jobs, all running on infrastructure you control, with zero cloud dependencies.
Set a goal. Get results, without supervising each step. WASP generates a dependency graph (DAG), executes each node with real tools, detects failures, replans automatically, and reports completion with full output.
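The plan-execute-replan loop can be sketched in a few lines. This is a minimal illustration of DAG execution with a replan budget, not WASP's actual GoalOrchestrator; the step runner and node names are hypothetical.

```python
# Minimal sketch of dependency-graph (DAG) goal execution with a
# replan budget. Illustrative only -- not WASP's GoalOrchestrator API.
from graphlib import TopologicalSorter

def execute_goal(dag, run_step, max_replans=3):
    """Run steps in dependency order; on failure, replan (here: retry)."""
    failed = None
    for _attempt in range(max_replans + 1):
        results = {}
        failed = None
        for step in TopologicalSorter(dag).static_order():
            ok, output = run_step(step, results)
            if not ok:
                failed = step          # detected failure -> trigger replan
                break
            results[step] = output
        if failed is None:
            return results             # goal complete, full output returned
    raise RuntimeError(f"goal failed after {max_replans} replans at: {failed}")

# Toy goal: "report" depends on "fetch" and "parse" (hypothetical steps).
dag = {"fetch": set(), "parse": {"fetch"}, "report": {"fetch", "parse"}}
out = execute_goal(dag, lambda step, results: (True, f"{step}-done"))
```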
GoalOrchestrator

Remembers everything that matters, across sessions, model switches, and reboots. 11 primary layers injected into every context (episodic, semantic, procedural, visual, KG, behavioral, temporal, vector, working, goal-scoped, self-model) + 7 auxiliary layers running in background (learning examples, dream log, recovery memory, skill patterns, entity states, predictions, capabilities), all persisted in PostgreSQL and Redis.
MemoryManager

Your agent browses the web, runs code, sends emails, searches the internet, captures screenshots, and more, using real tool execution, not text simulation. Custom Python skills can be defined at runtime through natural language, and persist automatically across reboots.
SkillRegistry

When a task is too complex for one agent, WASP creates a team. Named sub-agents run with isolated memory namespaces and capability sandboxes. MetaSupervisor decomposes complex goals into coordinated agent teams, automatically, without manual wiring.
AgentOrchestrator

WASP can fix and improve itself, with no reboot, no redeploy. The agent reads, proposes, applies, and rolls back patches to its own source code at runtime. All changes persist across container rebuilds via automatic patch replay.
self_improve

While you're offline, WASP works. During idle periods (1–7am), it consolidates episodic memory, extracts knowledge graph entities, runs LLM reflection on recent events, and pre-fetches context, so it's sharper when you return.
DreamJob

No vendor lock-in. WASP routes each task to the best available model across Anthropic, OpenAI, Google, xAI, Mistral, DeepSeek, Moonshot, OpenRouter, and local Ollama, automatically. If a context overflows, it recovers gracefully without losing output or breaking the task.
ModelManager

WASP never truly stops. 27 background jobs run continuously: consolidating memory, monitoring the world, detecting automation opportunities, validating every response, governing resources, and evolving new capabilities. No human supervision required.
Scheduler

Correct it once. It won't make the same mistake again. WASP detects corrections in real time, analyzes them via LLM, extracts persistent behavioral rules, and injects them into every future prompt automatically. The agent that served you yesterday is already better today.
BehavioralLearner

Every response is validated before delivery. Nine deterministic checks block hallucinations, price fabrication, grounding failures, and silent task omissions, with no LLM calls. If validation fails, a 2-attempt auto-recovery loop completes the missing work and re-validates before returning a safe, accurate answer.
ResponseValidator

WASP does not rely on AI for every step. Routing, execution, and output validation happen deterministically, with no model calls, no latency, no cost. The LLM is invoked when it genuinely adds value. Everything else runs without it.
Most requests never reach the LLM. A pre-model heuristic classifier with 13 fast-path patterns handles task lists, reminders, agent CRUD, model switching, and more, bypassing the model entirely. GOAL and SCHEDULED_TASK route directly to their handlers with zero tokens consumed.
5 strategies · direct dispatch

Skills run through a deterministic validation layer before the model ever sees the result. Template validation, weighted scoring, output completeness checks, and safety validation, all without a model call. Validated successes are cached for reuse.
pre-LLM · zero model calls

The agent builds on what it already knows, not rediscovering it from scratch. Procedural memory, behavioral rules, and recovery patterns are injected into context. The model skips reasoning it has already done, reducing round count and avoiding redundant calls.
procedural · behavioral · recovery

Output quality is verified without invoking the model again. After each LLM response, the Response Validator checks grounding, completeness, and drift, deterministically, with zero model calls. Auto-recovery runs only on confirmed failure, and only then calls the model a second time.
grounding · drift · completeness

Multiple tools run at once, not one at a time. A single LLM round can trigger concurrent tool calls via <parallel> blocks and asyncio.gather(). Fewer round-trips means fewer total model invocations for the same outcome.
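The concurrent dispatch pattern looks roughly like the sketch below; the two tool coroutines are placeholder stand-ins, not WASP's real skills.

```python
import asyncio

# Sketch of concurrent tool dispatch within one LLM round. The two
# tool coroutines are placeholder stand-ins, not WASP's real skills.
async def web_search(query):
    await asyncio.sleep(0.01)              # simulated I/O latency
    return f"results for {query}"

async def capture_screenshot(url):
    await asyncio.sleep(0.01)
    return f"png of {url}"

async def run_parallel_block(calls):
    """Run every tool call from a <parallel> block concurrently."""
    return await asyncio.gather(*(tool(arg) for tool, arg in calls))

outputs = asyncio.run(run_parallel_block([
    (web_search, "wasp agent"),
    (capture_screenshot, "https://example.com"),
]))
```

Both calls overlap their waits, so one round-trip returns both results.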
Every LLM call moves execution forward, nothing is wasted. Rather than looping the model until something works, WASP uses structured DAG planning with a Plan Critic, budget enforcement (max 3 replans), and goal stability checks. Each invocation has a defined purpose and a defined exit.
DAG · budget · stability

Runs on any VPS or server with a single docker compose up. Each container has a single responsibility. Together, they form a complete autonomous cognitive system with zero cloud dependencies.
The brain. Goal orchestration, skill execution, 18-layer memory, LLM routing, 27 background jobs, response validation, and the web dashboard on port 8080, all in a single container.
FastAPI · Quart · SQLAlchemy

The nervous system. Event bus via Redis Streams with consumer groups, handling real-time event delivery, state caching, ephemeral flags, circuit breaker state, and working memory.
Redis 7

Long-term memory. 21 tables store everything that persists: episodic conversations, semantic facts, knowledge graph, procedural memory, behavioral rules, audit logs, world timeline, recovery patterns, and more.
PostgreSQL 16

Reverse proxy with HTTPS termination. Handles TLS certificates, static assets, and routes all HTTP traffic to agent-core with zero configuration.
Nginx

Polling bridge translating Telegram messages to Redis Stream events. Handles text, photos, videos, voice messages, and documents bidirectionally.
python-telegram-bot

Privileged root sidecar with Docker socket access. Exposes a strict allowlist of Docker operations (list/start/stop/logs/inspect) to the core container, with privileged access and contained scope.
Root sidecar

WASP doesn't just store data. It builds a structured understanding over time. Every conversation, correction, and outcome feeds into 18 dedicated memory layers. 11 inject directly into every context turn. 7 run continuously in background, enriching the agent's knowledge without interrupting execution. All of it persists across sessions, reboots, and model switches.
Full conversation history in PostgreSQL with timestamps, chat IDs, and model metadata. Complete recall across all sessions.
Distilled facts, preferences, and important context extracted from conversations and stored as structured knowledge entries.
Multi-step solutions abstracted into named procedures with trigger keywords. Injected as few-shot hints for similar future tasks.
Screenshots automatically indexed with metadata and descriptions. Enables cross-session visual reference without re-capturing.
Rule-based entity and relationship extraction after every message. PostgreSQL-backed graph with Redis cache, injected per chat.
User corrections trigger LLM rule extraction. Rules persist and inject into every future system prompt automatically.
World timeline tracking: crypto prices, user state changes, entity states. Trend detection and automated change alerts.
Dense embeddings in PostgreSQL JSONB. Cosine similarity search across all memory types. No external vector DB required.
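A minimal sketch of that search, assuming embeddings come out of the JSONB column as plain Python lists (the row shape and function names are illustrative):

```python
import math

# Cosine-similarity sketch over embeddings stored as plain JSON arrays
# (as in a PostgreSQL JSONB column) -- no external vector database.
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query_vec, rows, k=3):
    """rows: (memory_id, embedding) pairs loaded from the JSONB column."""
    scored = [(cosine(query_vec, emb), mem_id) for mem_id, emb in rows]
    return sorted(scored, reverse=True)[:k]

# Toy memories with 2-d embeddings for illustration.
rows = [("m1", [1.0, 0.0]), ("m2", [0.0, 1.0]), ("m3", [0.7, 0.7])]
best = top_k([1.0, 0.1], rows, k=2)
```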
Redis-backed ephemeral state: active goals, reminders, subscriptions, task queues, dream state, CPI flags, budget tracking.
Observations isolated per active goal. Prevents cross-goal context pollution and injects only what's relevant to the current objective.
Redis-backed living self-model: skill success rates, known failures, strengths, and per-domain confidence (0–1). Injected into every prompt so the agent knows what it knows.
7 additional memory systems that run independently, enriching context without interrupting the main execution loop.
Positive and negative few-shot training examples captured from user feedback. Injected as in-context examples to reinforce correct behavior patterns going forward.
Records from nightly offline consolidation sessions (DreamJob). Captures what was synthesized, promoted, and reflected during idle periods, never lost between runs.
Redis FIFO store (50 entries, 7-day TTL) of validated recovery patterns. Only successful recoveries are stored, with no noise. Reused automatically when a similar failure is detected. (v2.2)
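The capped-FIFO-with-TTL pattern can be sketched in pure Python. This is a stand-in for the Redis LPUSH/LTRIM/EXPIRE commands, and the entry shape (a dict with a failure signature) is an assumption for illustration.

```python
import time
from collections import deque

# Pure-Python stand-in for a Redis capped FIFO (LPUSH + LTRIM to 50)
# whose entries expire after 7 days. Sizes and TTL mirror the text;
# the class and entry shape are illustrative.
class RecoveryMemory:
    def __init__(self, max_entries=50, ttl_seconds=7 * 24 * 3600):
        self.max_entries, self.ttl = max_entries, ttl_seconds
        self._entries = deque()            # (timestamp, pattern) pairs

    def store(self, pattern):
        """Store only validated recoveries; evict oldest past the cap."""
        self._entries.appendleft((time.time(), pattern))
        while len(self._entries) > self.max_entries:
            self._entries.pop()

    def match(self, failure_signature):
        now = time.time()
        for ts, pattern in self._entries:
            if now - ts < self.ttl and pattern["signature"] == failure_signature:
                return pattern             # reuse a known-good recovery
        return None
```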
SkillEvolutionEngine tracking of tool usage patterns, success rates, and composition opportunities. Drives autonomous capability evolution. New skills are generated from detected gaps.
WorldModel structured tracking of real-world entities: crypto prices, user-defined monitored values, external service states. Updated by background perception and world model jobs.
TemporalReasoner trend summaries and directional predictions from historical entity data. Used to anticipate changes before they happen and trigger proactive alerts.
LLM insight after every goal completion or failure. Max 3 reflections per goal, 7-day TTL. Injected into future context so the agent learns what worked and what to avoid.
Every skill is real tool execution, not a text response. Each is policy-gated across 5 capability tiers, audit-logged with automatic secret redaction, and composable with other skills. Custom Python skills can be created at runtime through natural language, loaded immediately, and persisted across reboots.
WASP doesn't wait for your input to be useful. These 27 jobs run continuously in the background, monitoring the world, consolidating memory, detecting automation opportunities, validating outputs, and evolving capabilities. Your agent stays sharp around the clock. Every job runs independently, with no single point of failure.
Every interaction makes WASP better, not just for the current session, but permanently. Five systems let WASP detect its own gaps, seize automation opportunities, learn from every outcome, validate its own responses, and govern its resources, all without human intervention.
If WASP can't do something, it learns how. When repeated failures or self-reflection detect a missing capability, CEE generates new tool code via LLM, validates it through a sandbox (AST parse + security blocklist + structural checks), and registers it into the live SkillRegistry, all at runtime, with no reboot needed.
WASP notices what you repeat and offers to automate it. Every 2 hours, it scans episodic memory for patterns (3+ occurrences in 24h). When a clear opportunity is found, it surfaces the suggestion via Telegram, before you think to ask. Rate-limited with 48h dedup to avoid noise.
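The repetition scan can be sketched as below, assuming each episodic event carries an action name and a timestamp (the event shape and function name are illustrative):

```python
from collections import Counter
from datetime import datetime, timedelta

# Sketch of the repetition scan: flag any action seen 3+ times in the
# last 24 hours. The event shape (action + ts) is an assumption.
def automation_candidates(events, now, min_count=3, window_hours=24):
    cutoff = now - timedelta(hours=window_hours)
    recent = Counter(e["action"] for e in events if e["ts"] >= cutoff)
    return sorted(a for a, n in recent.items() if n >= min_count)

now = datetime(2025, 1, 1, 12, 0)
events = [
    {"action": "export_report", "ts": now - timedelta(hours=h)}
    for h in (1, 5, 9)
] + [{"action": "check_btc", "ts": now - timedelta(hours=30)}]
hits = automation_candidates(events, now)   # check_btc is outside the window
```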
WASP learns from every goal, whether a success or a failure. After each completion, it asks the LLM for a learning insight: what worked, what failed, what to do differently. Up to 3 reflections per goal are stored in Redis and the most relevant ones injected into every future system prompt.
Every response is verified before it reaches you. A deterministic validator checks grounding, completeness, and drift. Zero LLM calls, zero false positives. On confirmed failure: 2-retry auto-recovery with real skill execution. Successful recoveries are stored in RecoveryMemory and reused automatically next time the same failure pattern appears.
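One deterministic check, grounding, can be sketched as a pure string comparison under a simple "every claimed number must appear in tool output" rule. This is an illustration, not WASP's full nine-check validator.

```python
import re

# Sketch of one deterministic check (grounding): every number the model
# states must appear somewhere in the tool outputs it was given.
# Illustrative only -- WASP's ResponseValidator runs nine such checks.
def grounded(response: str, tool_outputs: list[str]) -> bool:
    evidence = " ".join(tool_outputs)
    claimed = set(re.findall(r"\d+(?:\.\d+)?", response))
    known = set(re.findall(r"\d+(?:\.\d+)?", evidence))
    return claimed <= known                # no fabricated figures allowed

ok = grounded("BTC is trading at 97000", ["price api: BTC 97000 USD"])
bad = grounded("BTC is trading at 98500", ["price api: BTC 97000 USD"])
```

A failed check here is what triggers the auto-recovery loop; no model call is needed to detect the fabrication.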
Autonomy without guardrails is a liability. Redis-backed rate limiting applies across every creation path: goals, agents, tasks, and LLM calls, per user, per minute, per hour. Graceful degradation on Redis failure (fail-open, never blocks). Prevents runaway autonomy while preserving full responsiveness under normal load.
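A sliding-window limiter with fail-open semantics might look like this sketch. WASP keeps its counters in Redis; this stand-in uses an in-process deque per user.

```python
import time
from collections import defaultdict, deque

# Sliding-window rate limiter sketch. WASP backs this with Redis; an
# in-process dict stands in here. fail-open mirrors the described
# behavior: if the backend errors, allow rather than block.
class RateLimiter:
    def __init__(self, limit, window_seconds):
        self.limit, self.window = limit, window_seconds
        self._hits = defaultdict(deque)

    def allow(self, user_id):
        try:
            now = time.monotonic()
            hits = self._hits[user_id]
            while hits and now - hits[0] > self.window:
                hits.popleft()             # drop hits outside the window
            if len(hits) >= self.limit:
                return False               # over budget for this window
            hits.append(now)
            return True
        except Exception:
            return True                    # fail-open: never block on errors

rl = RateLimiter(limit=2, window_seconds=60)
```

The same shape applies per creation path (goals, agents, tasks, LLM calls) by keying the counters on path plus user.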
You're never tied to one provider. Switch at runtime with a single command, or just ask WASP to switch. Task-based routing picks the right model for each job automatically. If context overflows, it recovers without losing output or breaking execution.
You're not running an experiment. A production security audit surfaced 21 findings, including path traversal, SSRF, prompt injection, and CSRF vulnerabilities; every one was fixed with regression tests. WASP is production-hardened by design.
Argon2 password hashing, Redis-backed sessions with 24h TTL, 5-attempt rate limit with 5-minute lockout, full login audit logging.
Single-use tokens bound to authenticated sessions. Header-based X-CSRF-Token validation on all mutating requests. Anonymous session bypass fully blocked.
os.path.realpath() throughout self_improve and skill evolution. Symlink traversal blocked. Strict containment with os.sep suffix checks.
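The containment rule can be sketched as below (the helper name is hypothetical):

```python
import os

# Sketch of the containment check: resolve symlinks via realpath, then
# require an os.sep-suffixed prefix so "/app/src-evil" can never pass
# as being inside "/app/src". Helper name is illustrative.
def is_contained(path: str, root: str) -> bool:
    real = os.path.realpath(path)          # collapses ".." and symlinks
    real_root = os.path.realpath(root)
    return real == real_root or real.startswith(real_root + os.sep)
```

Without the `os.sep` suffix, a plain `startswith(real_root)` would wrongly accept sibling directories that merely share the prefix.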
http_request skill blocks RFC-1918 private IPs, loopback, link-local, and cloud metadata endpoints (169.254.169.254) with full DNS resolution.
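The pre-request gate might look like this sketch using the standard ipaddress module; the helper names are hypothetical, and the blocked classes mirror the ones listed above.

```python
import ipaddress
import socket

# Sketch of an SSRF gate. Helper names are hypothetical; the blocked
# classes mirror those listed above.
METADATA_NET = ipaddress.ip_network("169.254.169.254/32")  # cloud metadata

def is_blocked_ip(ip_str: str) -> bool:
    ip = ipaddress.ip_address(ip_str)
    return (ip.is_private or ip.is_loopback or ip.is_link_local
            or ip in METADATA_NET)

def host_allowed(hostname: str) -> bool:
    """Resolve DNS first so a public name can't hide a private A record."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return False                       # unresolvable -> reject
    return not any(is_blocked_ip(info[4][0]) for info in infos)
```

Resolving before checking is the important part: a hostname with a public look can still point at an RFC-1918 address.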
AST validation of all LLM-generated skill code. Blocks subprocess, eval, exec, ctypes, pickle, importlib. No class definition = auto-rejected.
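An AST gate of this shape can be sketched as below. The blocklist mirrors the modules named above; the function name is hypothetical.

```python
import ast

# Sketch of AST vetting for LLM-generated skill code. The blocklist
# mirrors the modules named above; the function name is hypothetical.
BLOCKED_MODULES = {"subprocess", "ctypes", "pickle", "importlib"}
BLOCKED_CALLS = {"eval", "exec"}

def skill_code_ok(source: str) -> bool:
    try:
        tree = ast.parse(source)            # must be syntactically valid
    except SyntaxError:
        return False
    has_class = False
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef):
            has_class = True
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            roots = {alias.name.split(".")[0] for alias in node.names}
            if isinstance(node, ast.ImportFrom) and node.module:
                roots.add(node.module.split(".")[0])
            if roots & BLOCKED_MODULES:
                return False
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in BLOCKED_CALLS:
                return False
    return has_class                # no class definition = auto-rejected
```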
All audit log entries auto-redacted for API keys: OpenAI sk-, Anthropic sk-ant-, Google AIza, Stripe sk_live, HuggingFace hf_, Bearer tokens, and more.
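The redaction pass can be sketched with a handful of regexes. Only the prefixes come from the text above; the exact token character classes are assumptions, not WASP's actual patterns.

```python
import re

# Redaction sketch. Prefixes mirror the list above; the token character
# classes are assumptions, not WASP's actual patterns.
SECRET_PATTERNS = [
    re.compile(r"sk-ant-[A-Za-z0-9_-]+"),   # Anthropic
    re.compile(r"sk_live_[A-Za-z0-9]+"),    # Stripe
    re.compile(r"sk-[A-Za-z0-9_-]+"),       # OpenAI
    re.compile(r"AIza[A-Za-z0-9_-]+"),      # Google
    re.compile(r"hf_[A-Za-z0-9]+"),         # HuggingFace
    re.compile(r"Bearer\s+\S+"),            # Authorization headers
]

def redact(entry: str) -> str:
    """Replace any recognized secret with a fixed marker before logging."""
    for pattern in SECRET_PATTERNS:
        entry = pattern.sub("[REDACTED]", entry)
    return entry

safe = redact("call with Authorization: Bearer sk-abc123XYZ")
```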
5 tiers: SAFE / MONITORED / CONTROLLED / RESTRICTED / PRIVILEGED. Agent-level sandbox allowlists. All skill executions audit-logged with redacted summaries.
All app containers run as UID 1000 (non-root). Docker socket access only via agent-broker allowlist proxy. Privileged operations strictly gated and logged.
Everything you need to know about running a self-hosted autonomous AI system.
Everything runs from a single docker compose up. All data, including memory, goals, audit logs, and behavioral rules, stays on your machines. You connect your own LLM API keys. No vendor accesses your conversations or agent state.
One VPS or local machine, one docker compose up, and your agent is live in minutes with full cognitive autonomy.
You are not just using an AI. You are operating one.