What is cuddlytoddly?

cuddlytoddly is an open-source Python AI agent framework that builds an explicit, editable task graph (DAG) before executing anything. It lets you inspect, edit, and redirect the plan at any point during execution. It supports Claude, OpenAI, and local llama.cpp models.

Which LLM backends does cuddlytoddly support?

cuddlytoddly supports Anthropic Claude (via pip install cuddlytoddly[claude]), OpenAI (via pip install cuddlytoddly[openai]), any OpenAI-compatible API, and fully local models via llama.cpp (via pip install cuddlytoddly[local]). Switch backends by editing one line in config.toml.

Can I run cuddlytoddly entirely offline?

Yes. Install with pip install cuddlytoddly[local] and cuddlytoddly will run Llama 3.3 70B Instruct Q4_K_M via llama.cpp on Apple Silicon, NVIDIA GPU, or CPU — no API key or internet connection required.

What happens if cuddlytoddly crashes mid-run?

Every graph mutation is appended to a JSONL event log. Restart after a crash and the system picks up exactly where it left off — completed tasks stay done, only interrupted ones re-run.

Can I edit the task plan while it is running?

Yes. The plan is live and mutable throughout execution. You can pause the LLM, edit a task's description or dependencies, promote a task into a subgoal for a finer breakdown, or switch to a different goal entirely. Only affected branches re-run — work already completed is preserved.

Is cuddlytoddly free and open source?

Yes. cuddlytoddly is MIT licensed and hosted on GitHub at https://github.com/3IVIS/cuddlytoddly. It is free to use, modify, and distribute.

cuddlytoddly — AI Agent Framework with Explicit Task Planning & Execution

Q: How is cuddlytoddly different from other AI agent frameworks?

Most LLM agents jump straight into action. cuddlytoddly builds a complete task plan as a DAG before executing anything, so you always know what it intends to do and can change that intent before, during, or after execution. It also features automatic quality-checking, crash-proof JSONL event logging, and task-to-subgoal expansion.

How it works

Plan first, execute second —
with you in control throughout

Most LLM agents jump straight into action and hope for the best. cuddlytoddly builds a complete task graph before executing anything, so you always know what it intends to do — and can change that intent before, during, or after execution.

Plan

Goal seeding

A plain-English goal is seeded into the task graph as the root node. Nothing runs yet.
Plan

Context extracted and clarified before planning begins

Before decomposing the goal, the planner reads it carefully and extracts every concrete fact you've already stated — budget, size, hard constraints, locations, roles — and pre-fills those as known context fields. It then identifies what's genuinely missing and surfaces only that for your optional input. Facts that can be fetched at runtime (market prices, public data) are never asked of you. This grounded context flows into every task in the plan.
Plan

Explicit plan built before execution

The planner decomposes the goal into a DAG of tasks — each with declared dependencies and expected outputs — using the clarified context. The raw plan passes through an optional self-review pass (the LLM critiques its own draft for completeness and realism), then structural validation and constraint checks before any node is committed. You can inspect the full plan and edit it freely before anything runs.
Control

Inspect, edit, and redirect — at any time

The plan is live and mutable throughout execution. Pause the LLM, edit a task's description or dependencies, promote a task into a subgoal for a finer breakdown, or switch to a different goal entirely. Execution resumes from the updated graph — only affected branches re-run, everything already completed stays done.
Execute

Ready tasks dispatched concurrently

The orchestrator picks up nodes whose dependencies are met and dispatches them in parallel, up to the configured worker count.
Execute

LLM + tools do the actual work

Each task runs as a multi-turn LLM loop with access to real tools: code execution, file I/O, web access, and any custom skills you've registered. Results are concrete outputs passed directly to downstream tasks, not summaries.
Execute

Results quality-checked automatically

The QualityGate compares each result against the task's declared outputs. If something is missing, a bridging task is injected and the gap is closed before downstream nodes proceed.
Control

Crash-proof state throughout

Every graph mutation is appended to a JSONL event log. Restart after a crash and the system picks up exactly where it left off — completed tasks stay done, only interrupted ones re-run. When resuming, the full token history from the prior session is restored so the UI always shows the correct cumulative count.

What you can do at any point during a run

Pause the LLM — all in-flight tasks complete, no new ones start until you resume.

Edit any task's description or dependencies — the graph updates live and only affected nodes re-run.

Promote a task to a subgoal — the planner decomposes it into a finer sub-DAG without rebuilding the whole plan.

Retry a failed node, or reset a subtree to re-execute an entire branch with updated context.

Switch to a different goal entirely — the running orchestrator stops cleanly and a new one starts from the updated state.

full loop

goal → LLMPlanner → [extract facts + clarify] → [decompose] → [scrutinize?] → [validate] → [constraints]
                                                                                                    │
                                                                                          TaskGraph  ← inspect & edit anytime
                                                                                                    │
                                                                                        Orchestrator
                                                                                        ├── LLMExecutor + tools + skills
                                                                                        └── QualityGate  (verify / bridge)
                                                                                                    │
                                                                                               EventLog (JSONL) → crash-proof replay

Demo runs

Real goals, real plans,
real output

Every run produces a standalone interactive snapshot — the full task graph with node details, results, and a replay of the plan's evolution — and an optional terminal view for headless environments. The two examples below are complete, unedited runs.

cuddlytoddly terminal UI showing the same SaaS business plan executing step by step

web ui

Goal

How to build a SaaS business

Full plan decomposition covering market research, product strategy, go-to-market, pricing, and launch sequencing — executed with web research and structured outputs at each step.

Interactive DAG snapshot

cuddlytoddly terminal UI showing the raise negotiation plan executing

web ui

Goal

How to negotiate a raise

Research-backed plan covering market rate benchmarking, timing strategy, argument construction, and conversation scripts — grounded in publicly available salary data fetched at runtime.

Interactive DAG snapshot

Interactive snapshots are fully self-contained HTML files — no server connection required. Export your own via the web UI's ↓ Export → Snapshot HTML button.

Features

Everything needed to plan
precisely and execute reliably

Explicit Plan Before Execution

The full task graph — every step, dependency, and expected output — is built and visible before the first tool is called. No action without a declared intent.

New

Grounded Clarification

Before planning, the system extracts every concrete fact already in your goal — budget, constraints, size, location — and pre-fills them as known fields. Only genuinely missing information is surfaced for input.

New

Plan Scrutiny Pass

An optional second LLM call reviews each draft plan for goal coverage, task realism, output completeness, and missing implicit steps before anything reaches the graph.

New

Constraint Enforcement

A deterministic pass after validation catches cycles, removes duplicate edges, strips orphaned inputs, and resolves ghost nodes — before any node executes.

New

Task → Subgoal Expansion

Any task can be promoted to a subgoal at any time for a finer-grained breakdown. The planner decomposes it into a sub-DAG without touching the rest of the plan.

Live Plan Editing

Edit task descriptions, add or remove dependencies, and restructure the graph while execution is running. Only affected branches re-run — completed work is always preserved.

Pause & Redirect

Pause the LLM at any point, inspect progress, make changes, then resume — or switch to a different goal entirely.

Real Tool Execution

Tasks execute with real tools: code execution, file I/O, web access. Add custom skills by dropping a SKILL.md folder into the skills directory.

Automatic Gap Bridging

The QualityGate checks each result against declared outputs. Missing pieces automatically inject a bridging task so the plan stays coherent.

Crash & Resume

Every mutation is logged to JSONL. Restart after a crash and completed tasks stay done — only interrupted work re-runs.

Multi-Backend Support

Swap between Claude, OpenAI, any compatible API, or a fully local llama.cpp model with one config line.

Terminal & Web UI

A live curses terminal for headless use and a web UI that shows the full task graph in real time, lets you edit nodes directly, and exports standalone interactive HTML snapshots.

Local Model Support

Run entirely offline with llama.cpp on Apple Silicon, NVIDIA, or CPU. Full privacy, no API costs.

Backends

One config line.
Any model.

The planning and execution layers are fully model-agnostic. Switch backends by editing [llm] backend in config.toml — every prompt, schema, and tool call works identically across all backends.

BackendInstallEnv var

claudepip install cuddlytoddly[claude]ANTHROPIC_API_KEY

openaipip install cuddlytoddly[openai]OPENAI_API_KEY

llamacpppip install cuddlytoddly[local]— (offline)

any OpenAI-compatset base_url in config.tomlprovider key

The pip extra installs the required packages. The active backend is written to config.toml on first run and auto-detected from your environment: ANTHROPIC_API_KEY set → claude, OPENAI_API_KEY set → openai, neither → llamacpp.

Local default: Llama 3.3 70B Instruct Q4_K_M — auto-detected from llama.cpp or Hugging Face cache.

Quick Start

Up and running
in three steps

Step 1 — Install with your chosen backend

terminal

pip install cuddlytoddly[claude]   # Anthropic Claude
pip install cuddlytoddly[openai]   # OpenAI / compatible
pip install cuddlytoddly[local]    # Local llama.cpp
pip install cuddlytoddly[all]      # Everything

Step 2 — Set your API key

terminal

export ANTHROPIC_API_KEY=sk-ant-...
# or OPENAI_API_KEY — whichever key is set becomes the default backend on first run

Step 3 — Run a goal

terminal

cuddlytoddly "Write a market analysis for electric scooters"

# The web UI opens automatically showing the full task plan.
# Inspect or edit it before execution starts, or just let it run.
python -c "from cuddlytoddly.config import CONFIG_PATH; print(CONFIG_PATH)"

Requires Python 3.11+ and git on your PATH (for the DAG visualiser).

Python API

Use it as a library,
not just a CLI

Every component is importable and independently configurable. Swap backends with a single argument — planning, execution, and quality-checking all work identically across Claude, OpenAI, and llama.cpp.

python

from cuddlytoddly.planning.llm_interface  import create_llm_client
from cuddlytoddly.planning.llm_planner    import LLMPlanner
from cuddlytoddly.planning.llm_executor   import LLMExecutor
from cuddlytoddly.engine.quality_gate     import QualityGate
from cuddlytoddly.engine.llm_orchestrator import Orchestrator
from cuddlytoddly.core.task_graph         import TaskGraph
from cuddlytoddly.skills.skill_loader     import SkillLoader

llm    = create_llm_client("claude", model="claude-opus-4-6")
graph  = TaskGraph()
skills = SkillLoader()

orchestrator = Orchestrator(
    graph=graph,
    planner=LLMPlanner(
        llm_client=llm, graph=graph,
        skills_summary=skills.prompt_summary,
        scrutinize_plan=True,
    ),
    executor=LLMExecutor(llm_client=llm, tool_registry=skills.registry),
    quality_gate=QualityGate(llm_client=llm, tool_registry=skills.registry),
)
orchestrator.start()

Documentation

Everything else
is in the docs

Open Source Observatories

Explore the AI agent
ecosystem

We maintain two free, searchable indexes of open-source AI agent tooling — updated daily from GitHub.

Observatory

AI Agent Frameworks

Ranked, tagged, and searchable index of open-source agent frameworks. Stars, forks, language, license, and topics — refreshed daily.

Browse frameworks ↗

Observatory

AI Agent Memory Frameworks

Classified by layer, memory class, and architecture. From production-ready personalization layers to research-stage causal graphs.

Browse memory frameworks ↗

Plan first, execute second —with you in control throughout

Goal seeding

Context extracted and clarified before planning begins

Explicit plan built before execution

Inspect, edit, and redirect — at any time

Ready tasks dispatched concurrently

LLM + tools do the actual work

Results quality-checked automatically

Crash-proof state throughout

Real goals, real plans,real output

How to build a SaaS business

How to negotiate a raise

Everything needed to planprecisely and execute reliably

Explicit Plan Before Execution

Grounded Clarification

Plan Scrutiny Pass

Constraint Enforcement

Task → Subgoal Expansion

Live Plan Editing

Pause & Redirect

Real Tool Execution

Automatic Gap Bridging

Crash & Resume

Multi-Backend Support

Terminal & Web UI

Local Model Support

One config line.Any model.

Up and runningin three steps

Use it as a library,not just a CLI

Everything elseis in the docs

Explore the AI agentecosystem

Plan first, execute second —
with you in control throughout

Real goals, real plans,
real output

Everything needed to plan
precisely and execute reliably

One config line.
Any model.

Up and running
in three steps

Use it as a library,
not just a CLI

Everything else
is in the docs

Explore the AI agent
ecosystem