How It Works

Everything begins with understanding.

Type your idea to discover matching projects. Start with what's already great so you never have to build from scratch. Build and inspire together—because greatness is never achieved alone.


shareAI-lab/learn-claude-code

This is a deep technical analysis of learn-claude-code.

learn-claude-code

29.2k

Bash is all you need - A nano Claude Code–like agent, built from 0 to 1

by shareAI-lab
Report Contents
Product — positioning, core features & user journeys
Assessment — architecture, tech stack & implementation
Assets — APIs, data models & key modules
shareAI-lab/learn-claude-code
@e57ced7 · en

How shareAI-lab/learn-claude-code Works

This project sits between toy agent demos and production coding agents. Compared with one-shot examples that only show basic tool calling, it is much more complete because it progressively adds planning, skill loading, context management, persistent tasks, background execution, team coordination, and isolated work lanes. Compared with full commercial agent products, it is intentionally smaller and openly pedagogical: it exposes the core runtime pattern, keeps each mechanism separable, and documents the trade-offs instead of hiding them behind a polished product surface. Its main competitive advantage is not enterprise readiness, but the clarity of its staged architecture and the fact that every new capability is introduced without breaking the core loop mental model.

Overview

A teaching-oriented nano coding agent project that helps developers and AI product builders understand, from first principles, how a Claude Code-like agent evolves from a single tool loop into a planning, memory-aware, team-capable, and optionally isolated execution system.

How It Works: End-to-End Flows

Developer runs a coding task through the autonomous agent loop

A developer starts with a plain-language coding request, such as inspecting files, making a change, or running checks. The system appends that request to the conversation and sends the current context plus available tools to the model. If the model can answer directly, the session ends. More often, it asks to read files, execute commands, update a plan, load a skill, or launch a focused subtask. The runtime executes those requests, wraps the results back into the conversation, and immediately lets the model continue reasoning without waiting for another human turn. Along the way, the system enforces workspace-safe file access, blocks a small set of dangerous shell patterns, truncates oversized tool output, and may inject planning reminders or compact older context when the session grows too large. The flow ends when the model stops requesting tools and returns a final text answer to the user.

  1. User submits a coding request
  2. Model evaluates the request and decides whether to answer directly or use tools
  3. Runtime executes requested file, shell, planning, skill, or subtask actions within guardrails
  4. Tool results are returned into the conversation and the loop continues automatically
  5. Long sessions trigger reminders or context reduction before the next reasoning turn
  6. Model finishes by returning a direct textual outcome
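
The six steps above can be sketched in a few lines of Python. This is an illustrative reconstruction, not the project's actual code: the `call_model` client, the message format, and the handler signatures are all assumptions; only the 50,000-character output cap and the stop condition (no further tool requests) come from the report.

```python
# Minimal sketch of the autonomous tool-use loop. `call_model` is a stand-in
# for the real model API and returns {"text": ..., "tool_calls": [...]}.
TRUNCATE_LIMIT = 50_000  # per the report: each tool output is capped at 50k chars

def run_tool(name, args, handlers):
    """Dispatch a requested tool to its registered handler, with truncation."""
    handler = handlers.get(name)
    if handler is None:
        return f"error: unknown tool '{name}'"  # explicit error, never silent
    return str(handler(**args))[:TRUNCATE_LIMIT]

def agent_loop(user_request, call_model, handlers, max_turns=50):
    """Keep calling the model until it stops asking for tools."""
    history = [{"role": "user", "content": user_request}]
    for _ in range(max_turns):
        reply = call_model(history)
        history.append({"role": "assistant", "content": reply["text"]})
        if not reply.get("tool_calls"):
            return reply["text"]  # a direct text answer ends the session
        results = [run_tool(c["name"], c["args"], handlers)
                   for c in reply["tool_calls"]]
        # Tool results return as the next user-side turn; no human input needed.
        history.append({"role": "user", "content": "\n".join(results)})
    return "stopped: max turns reached"
```

The `max_turns` guard is an added safety margin for the sketch; the report notes that without extra policy layers, repeated tool loops remain possible.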

Agent breaks a large goal into a durable task graph and keeps working while slow jobs run

When the requested job is too large for one continuous chain of thought, the agent first decomposes it into named tasks and records the dependency structure on disk. That external task board gives the model a durable execution plan that survives context compression better than ordinary conversation. As work proceeds, the agent can complete prerequisite tasks and automatically unblock downstream tasks. If one step requires a slow operation such as installing dependencies or running a build, the agent launches it in the background and immediately continues with other analysis instead of freezing the entire session. Before each new reasoning turn, the runtime checks whether any background jobs have finished and injects those results back into the conversation so the model can react naturally. The user experiences this as a more resilient workflow: planning remains visible, long-running operations do not stall progress, and asynchronous completions arrive back in context at the right moment.

  1. Agent decomposes the goal into persisted tasks and dependency links
  2. Agent works on ready tasks while blocked tasks wait for prerequisites
  3. Agent launches a slow command as a background job and receives an immediate job identifier
  4. Background completion is injected into the next reasoning cycle without manual polling
  5. Agent updates task state, unblocks dependents, and moves the workflow forward
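
The background-execution half of this flow can be sketched with a worker thread and a shared notification queue. The 300-second timeout matches the report; function names and the tuple format are illustrative assumptions.

```python
# Sketch of non-blocking background command execution with a notification
# queue that the runtime drains before each reasoning turn.
import queue
import subprocess
import threading
import uuid

notifications = queue.Queue()  # completed jobs land here

def run_in_background(command, timeout=300):
    """Start a slow shell command and return a job identifier immediately."""
    job_id = uuid.uuid4().hex[:8]
    def worker():
        try:
            proc = subprocess.run(command, shell=True, capture_output=True,
                                  text=True, timeout=timeout)
            notifications.put((job_id, "done", proc.stdout[:2000]))
        except subprocess.TimeoutExpired:
            notifications.put((job_id, "timeout", ""))
    threading.Thread(target=worker, daemon=True).start()
    return job_id  # the agent keeps working while the command runs

def drain_notifications():
    """Called before each model call: collect any finished job results."""
    finished = []
    while not notifications.empty():
        finished.append(notifications.get())
    return finished
```

The agent never polls manually; the runtime calls `drain_notifications()` before each model turn and injects anything found into the conversation.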

Lead agent coordinates a team to advance parallel work

When one agent is not enough, the lead creates named teammates with specific roles and lets them work across multiple cycles instead of treating them as disposable subroutines. Instructions, updates, and approvals move through personal mailboxes, so each participant can think in its own loop and react to new messages on the next cycle. For sensitive actions, the team uses a request-response protocol with correlation identifiers, allowing the lead to approve plans or ask a worker to shut down gracefully. Once workers become idle, they can scan the shared task board, claim ready work for themselves, and resume execution without waiting for explicit assignment. This produces a lightweight but legible collaboration model: ownership is visible, delegation is asynchronous, and the lead focuses on coordination rather than handholding every task. The flow is especially useful when several coding threads must advance at once but still need some shared rules.

  1. Lead creates or revives a named specialist teammate
  2. Lead and teammates exchange instructions and updates through mailbox communication
  3. Risky plans or shutdown requests are handled through correlated approvals
  4. Idle teammates scan the task board and claim unblocked work on their own
  5. Team members continue work and report progress back through the same coordination channels
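
The mailbox layer behind this flow can be sketched as one append-only JSON-lines file per teammate, read destructively before each reasoning turn. The file layout is an assumption; the destructive-read semantics come from the report.

```python
# Sketch of on-disk mailbox communication between teammates.
import json
import os
import time

def send(mail_dir, recipient, sender, content):
    """Append one message to the recipient's personal mailbox file."""
    os.makedirs(mail_dir, exist_ok=True)
    msg = {"from": sender, "content": content, "ts": time.time()}
    with open(os.path.join(mail_dir, f"{recipient}.jsonl"), "a") as f:
        f.write(json.dumps(msg) + "\n")

def broadcast(mail_dir, sender, content, roster):
    """Deliver the same message to every teammate except the sender."""
    for member in roster:
        if member != sender:
            send(mail_dir, member, sender, content)

def read_inbox(mail_dir, recipient):
    """Destructive read: each message is consumed exactly once."""
    path = os.path.join(mail_dir, f"{recipient}.jsonl")
    if not os.path.exists(path):
        return []
    with open(path) as f:
        messages = [json.loads(line) for line in f]
    os.remove(path)  # clear the mailbox after reading, as described above
    return messages
```

The `os.remove` call is exactly the fragility the report notes: a crash after draining the inbox loses those messages permanently.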

Operator isolates a risky task into its own delivery lane

For parallel or high-risk code changes, the operator can bind a task to its own isolated work lane instead of letting every action happen in one shared directory. The flow starts by selecting a task and requesting a validated lane name. The system confirms repository availability, creates a new git-based working directory for that lane, records the lane in a local index, and binds it back to the task so status and ownership remain visible. Commands executed for that task then run only inside the isolated lane, reducing cross-task collisions. When work is complete, the operator makes an explicit closeout choice: preserve the lane for later inspection or remove it and optionally mark the task finished. Lifecycle events are recorded along the way so the operator can inspect what happened after the fact. The value of the flow is safer parallel delivery with a very concrete mapping between task ownership and filesystem isolation.

  1. Operator selects a task and requests an isolated work lane
  2. System validates repository state, creates the lane, and binds it to the task
  3. Commands for that task run only inside the isolated lane
  4. Operator chooses to keep or remove the lane and optionally completes the task
  5. Recent lifecycle events are reviewed for visibility and troubleshooting
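
If the "git-based working directory" per lane maps onto `git worktree` (an assumption consistent with the description, not confirmed by the report), lane creation might look like this; the index file layout and function signature are illustrative.

```python
# Sketch of creating a task-bound isolated work lane as a git worktree.
import json
import os
import re
import subprocess

def create_lane(repo, lanes_root, lane_name, task_id, base_ref="HEAD"):
    """Validate the lane name, create an isolated worktree, record the binding."""
    if not re.fullmatch(r"[A-Za-z0-9_-]+", lane_name):
        raise ValueError(f"invalid lane name: {lane_name!r}")
    os.makedirs(lanes_root, exist_ok=True)
    lane_dir = os.path.join(lanes_root, lane_name)
    # One isolated working directory per lane, branched from base_ref.
    subprocess.run(["git", "-C", repo, "worktree", "add",
                    "-b", f"lane/{lane_name}", lane_dir, base_ref],
                   check=True, capture_output=True)
    # Record the lane in a local index so ownership stays visible.
    index_path = os.path.join(lanes_root, "index.json")
    index = json.load(open(index_path)) if os.path.exists(index_path) else {}
    index[lane_name] = {"task": task_id, "dir": lane_dir}
    with open(index_path, "w") as f:
        json.dump(index, f)
    return lane_dir
```

Commands bound to the task would then run with `lane_dir` as their working directory, which is what keeps parallel changes from colliding.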

Learner studies the 12-session curriculum through the web platform

A learner enters the web platform in English, Chinese, or Japanese and uses the persistent sidebar to move through the 12-session progression. On a session page, the product presents a structured deep-dive with tutorial content, architecture views, source browsing, and a step-by-step simulator that explains how that stage of the agent behaves. If the learner wants a broader mental model, they can switch to the layer view to understand whether a session belongs to tools, planning, memory, concurrency, or collaboration. If they want to understand the exact evolution of the system, they can open the comparison workspace, pick two versions, and inspect line-count growth, new tools, architecture deltas, and code differences side by side. The result is not just documentation but a guided reverse-engineering environment where code, diagrams, and narrative stay aligned closely enough to support true understanding rather than passive reading.

  1. Learner enters the site in a selected language and navigates the curriculum
  2. Learner opens a session and explores tutorial, simulation, code, and architecture views
  3. Learner uses layer framing to understand the conceptual category of that session
  4. Learner compares two versions to inspect structural deltas and source changes

Key Features

Core agent execution and safety boundaries

This module defines the irreducible runtime of the product: a persistent loop in which the model reasons, requests tools, receives results, and continues until it can answer directly. The design strategy is to keep the central loop almost unchanged while layering new abilities around it, so learners can see which parts are truly fundamental and which are extensions. The system also puts strict guardrails around file access and command execution, making the teaching agent usable for local coding tasks without allowing obvious workspace escape or destructive host operations. The product value is clarity and reliability: users get a visible, reusable agent pattern rather than a monolithic black box, though the safety model remains intentionally minimal rather than enterprise-grade.

  • Persistent tool-use execution loop — 【User value】 Users need an agent that can finish multi-step tasks without waiting for a human after every action. A single model reply is not enough for coding work, because file reading, command execution, and follow-up reasoning must happen in sequence. 【Design strategy】 The product treats tool use as the default continuation path, not as a special case. The model keeps receiving the latest conversation state, asks for one or more tools when needed, gets structured results back, and immediately continues reasoning until it can produce a final answer. 【Business logic】 Step 1: The user sends an initial instruction, which is appended to the conversation history. Step 2: The system sends the full history plus available tools to the model. Step 3: If the model indicates that it wants to use tools, the runtime scans every requested tool action in that reply, executes each matching handler, and captures the output. Step 4: Each tool result is wrapped as structured feedback and appended back into the conversation as the next user-side turn. Step 5: The system automatically starts another model turn without waiting for a real human message. Step 6: The loop only stops when the model no longer asks for tools and instead returns a direct text answer. Step 7: To prevent oversized feedback from destabilizing the loop, each tool output is truncated to 50,000 characters before being returned. 【Trade-off】 This design makes the agent autonomous for local multi-step work and keeps the core mental model extremely simple. The trade-off is that all higher-order behavior still depends on model judgment, so poor tool choices or repeated loops are possible without additional policy layers.
  • Extensible capability registration — 【Design strategy】 The product avoids redesigning the runtime every time a new capability is added. Instead, each new ability is treated as one more named action that can be registered into the same execution loop. 【Business logic】 Step 1: The runtime exposes a catalog of tool names and their input schemas to the model. Step 2: When the model requests a tool by name, the system looks up the corresponding handler in a centralized dispatch map. Step 3: If a matching handler exists, it is executed with the requested inputs. Step 4: Its output is normalized into the same structured result format used by every other tool. Step 5: If the tool name is unknown, the system returns a clear error instead of silently failing. Step 6: Because the loop itself does not change, adding a new tool means defining its schema and registering one handler, while keeping all conversation semantics intact. 【Trade-off】 The benefit is strong conceptual simplicity and low extension cost. The limitation is that all tools must conform to the same request-response shape, which is excellent for teaching but less expressive than richer workflow orchestration systems.
  • Workspace-scoped file and command safety — 【User value】 Users want the agent to edit project files and run commands, but they do not want a teaching agent to damage the machine or modify files outside the intended working area. 【Design strategy】 The product applies two hard boundaries: all file operations must stay inside the current workspace, and shell execution rejects a small set of obviously dangerous commands while also enforcing hard time limits. 【Business logic】 Step 1: For any file read, write, or edit request, the system resolves the requested path against the current working directory. Step 2: It checks whether the resolved target still remains inside that workspace boundary. Step 3: If the target escapes the workspace, the request is rejected with an explicit error. Step 4: For shell execution, the system scans the command text for banned destructive patterns such as privileged shutdown or whole-system deletion commands. Step 5: If a banned pattern is found, execution is refused. Step 6: If the command is allowed, it is run with a fixed timeout of 120 seconds in the core tool runtime. Step 7: The resulting output is returned to the model, again with truncation limits to avoid oversized context injection. 【Trade-off】 This gives learners a concrete safety baseline and keeps experimentation practical. The downside is that pattern blocking is shallow and cannot replace real authorization, policy, or isolation in multi-user environments.
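
The two guardrails above can be sketched briefly. The 120-second timeout and 50,000-character truncation come from the report; the banned-pattern list here is illustrative, not the project's actual list.

```python
# Sketch of workspace path containment and shallow dangerous-command blocking.
import os
import re
import subprocess

# Illustrative patterns; the real list is the project's own small blocklist.
BANNED_PATTERNS = [r"\brm\s+-rf\s+/", r"\bshutdown\b", r"\bmkfs\b"]

def resolve_in_workspace(workspace, requested_path):
    """Reject any path that escapes the workspace after resolution."""
    workspace = os.path.realpath(workspace)
    target = os.path.realpath(os.path.join(workspace, requested_path))
    if os.path.commonpath([workspace, target]) != workspace:
        raise PermissionError(f"path escapes workspace: {requested_path}")
    return target

def run_command(command, timeout=120):
    """Refuse obviously destructive commands, then run with a hard timeout."""
    for pattern in BANNED_PATTERNS:
        if re.search(pattern, command):
            return "error: command blocked by safety policy"
    proc = subprocess.run(command, shell=True, capture_output=True,
                          text=True, timeout=timeout)
    return (proc.stdout + proc.stderr)[:50_000]  # truncate before re-injection
```

As the report notes, this kind of pattern matching is shallow by design: `realpath` containment stops workspace escapes via `..` or symlinks, but regex blocklists are no substitute for real sandboxing.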

Planning, knowledge loading, and context continuity

This module addresses the main failure mode of autonomous agents: they drift, forget, or overload their context window during long tasks. The design strategy is to separate three concerns. First, the agent is nudged to externalize a plan instead of improvising indefinitely. Second, domain knowledge is loaded only when needed rather than preloaded into every session. Third, conversation history is compressed progressively so long-running work can continue without crashing on context limits. The product value is a more durable and cost-aware agent workflow, with the trade-off that some memory fidelity is sacrificed when compression and summarization replace raw history.

  • Structured todo planning with anti-drift reminders — 【User value】 Complex coding tasks often fail because the agent starts acting before it has broken the goal into manageable steps. Users need the agent to maintain a visible plan and update it as work progresses. 【Design strategy】 The product introduces a lightweight todo board with strict status rules, then reinforces its use by reminding the model when too many turns pass without a planning update. 【Business logic】 Step 1: The agent can create or update a todo list containing up to 20 subtasks. Step 2: The plan enforces a status model in which only one task may be marked as actively in progress at any moment. Step 3: As the conversation continues, the runtime tracks how many consecutive turns occurred without the todo tool being used. Step 4: If 3 consecutive turns pass without a planning update, the system injects a reminder telling the agent to update its todos. Step 5: The agent is expected to revise the task list as work advances, moving items from pending to in progress to completed. Step 6: Because the plan lives outside raw conversational prose, it remains more stable than informal reasoning scattered across tool outputs. 【Trade-off】 The benefit is better task discipline and lower drift on multi-step work. The cost is additional overhead: the model must spend some turns maintaining explicit state rather than only pushing execution forward.
  • On-demand skill loading — 【User value】 Users want the agent to access specialized workflows such as code review or building a protocol server, but loading every domain manual into the initial prompt would waste context and raise token costs. 【Design strategy】 The product uses a two-layer knowledge exposure model: short skill descriptions are always visible, while full skill bodies are only loaded when the model explicitly asks for them. 【Business logic】 Step 1: The system prompt contains a lightweight catalog of available skills, each identified by name and short description. Step 2: When the model detects that a user request needs one of these domain playbooks, it calls the skill-loading tool with the selected skill name. Step 3: The runtime looks for a matching skill document in the skill library. Step 4: If found, it returns the full skill body as structured injected content, wrapped as a named skill block so the model can distinguish it from ordinary conversation. Step 5: If no such skill exists, the system returns an explicit unknown-skill error and the model must continue using general reasoning. Step 6: After the skill body is injected, the model re-enters the normal tool-use loop and can apply the newly loaded guidance to subsequent file, shell, or planning actions. Step 7: Full skill bodies are expected to be much larger than their descriptions, so this pattern shifts the heavy token cost to only the sessions that truly need it. 【Trade-off】 The gain is lower baseline prompt size and higher relevance. The cost is one extra decision step: the model must recognize when a skill is worth loading, and unknown or malformed skill files degrade gracefully only through error strings.
  • Subtask delegation with isolated context — 【User value】 Some work is noisy, exploratory, or only tangentially relevant to the main conversation. Users need a way to let the agent investigate without polluting the primary thread with every intermediate tool call. 【Design strategy】 The product treats a subtask as a temporary child agent with its own fresh context, its own tool permissions, and a bounded execution budget. The parent receives only the final summary. 【Business logic】 Step 1: The parent agent invokes the subtask tool with a description, a prompt, and a child role type such as exploration, coding, or planning. Step 2: The runtime creates a clean child conversation history rather than inheriting the parent transcript. Step 3: It assigns the child a role-specific system instruction and a filtered tool set, including read-only or fuller access depending on the child type. Step 4: The child runs the same core tool-use loop as the parent, but with a safety cap of 30 iterations. Step 5: The child is not allowed to spawn further subtasks, preventing recursive delegation loops. Step 6: When the child stops, the runtime extracts its final text answer and returns only that summary to the parent. Step 7: If the child ends without usable text, the parent receives a fallback message stating that no text was returned. 【Trade-off】 This preserves cleanliness in the main conversation and makes focused exploration easier. The trade-off is information loss: only the summary comes back, so important nuance can disappear if the child condenses poorly.
  • Progressive context compaction — 【User value】 Long-running coding sessions eventually hit context limits. Without intervention, the agent either crashes on request size or becomes too expensive to operate. 【Design strategy】 The product uses a three-stage memory reduction strategy. Lightweight historical tool outputs are compressed first, full transcript summarization happens only after a threshold, and manual compaction remains available when the operator wants explicit control. 【Business logic】 Step 1: On every turn, the system applies micro-compaction to older tool-result blocks, preserving only the most recent 3 loops in full detail and replacing older ones with minimal placeholders that note which tool had been used. Step 2: The system continuously estimates conversation size using a rough character-based token heuristic. Step 3: When the estimated size exceeds 50,000, it triggers automatic compaction. Step 4: Before shrinking the live history, it writes the full conversation transcript to disk in a transcript archive directory for later inspection or recovery. Step 5: It then asks the model to summarize the accumulated history and replaces the active conversation with that compressed summary. Step 6: The session continues from the summarized state instead of the original full transcript. Step 7: A manual compaction path can trigger the same summarization process on demand. 【Trade-off】 The value is continued operation across very long sessions. The cost is fidelity loss: older tool outputs are reduced to placeholders, and summarized memory depends on the model preserving the right details. The current token estimate is also intentionally naive, which can trigger compaction too early or too late.
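
The compaction stages above can be sketched as follows. The keep-last-3-loops rule and the 50,000 threshold come from the report; `summarize` is a stand-in for the model call, and the transcript-archiving step is omitted for brevity.

```python
# Sketch of three-stage context reduction: micro-compaction of older tool
# results, a naive size heuristic, and threshold-triggered summarization.
KEEP_RECENT_LOOPS = 3        # most recent tool loops kept in full detail
COMPACT_THRESHOLD = 50_000   # rough token estimate that triggers compaction

def estimate_tokens(history):
    """Intentionally naive heuristic: roughly 4 characters per token."""
    return sum(len(m["content"]) for m in history) // 4

def micro_compact(history):
    """Replace older tool-result turns with minimal placeholders."""
    tool_turns = [i for i, m in enumerate(history) if m.get("tool")]
    for i in tool_turns[:-KEEP_RECENT_LOOPS]:
        history[i]["content"] = f"[output of {history[i]['tool']} elided]"
    return history

def maybe_compact(history, summarize):
    """Above the threshold, collapse the history into a model-written summary."""
    if estimate_tokens(history) > COMPACT_THRESHOLD:
        summary = summarize(history)
        return [{"role": "user", "content": summary}]
    return history
```

The character-count heuristic makes the trade-off visible: it can fire too early or too late, exactly as the report cautions.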

Persistent task orchestration and background work

This module turns one conversation into a durable work process. Instead of relying purely on in-memory dialogue state, the agent can break a goal into persisted tasks with dependencies and can launch slow operations without blocking its own reasoning. The design strategy is to externalize workflow state onto disk and re-inject asynchronous completion signals back into the agent at the right moment. The product value is continuity and responsiveness: plans survive compression and restarts better than raw chat alone, and the agent can continue thinking while builds or installs are running. The main trade-off is that local file persistence and basic threading are easy to inspect but not robust enough for high-concurrency production workloads.

  • Dependency-aware task board — 【User value】 Users need the agent to manage large goals as an ordered set of steps, not as a single undifferentiated objective. The task state should survive context compaction and be inspectable outside the live conversation. 【Design strategy】 The product persists each task as its own disk record, including dependency links that describe what must finish first and what becomes unblocked afterward. 【Business logic】 Step 1: The agent creates tasks one by one, each with a unique identifier, a subject, a description, and a status. Step 2: Tasks can declare dependency relationships by listing the tasks that block them and the tasks they themselves block. Step 3: A downstream task remains blocked while its prerequisite task identifiers still appear in its dependency list. Step 4: When a task is updated to completed, the system scans the stored task board and removes that completed task from other tasks' blocking lists. Step 5: Any task whose blocking list becomes empty is now effectively ready for work. Step 6: The agent can list all tasks, inspect a single task, or update task fields over time. Step 7: Because tasks are persisted as separate disk records, the work graph remains available even if the conversation is compressed. 【Trade-off】 This creates a durable execution plan and supports later multi-agent collaboration. The downside is that file-based updates do not include strong concurrency guarantees, so simultaneous writers can corrupt task state under heavier coordination loads.
  • Non-blocking background command execution — 【User value】 Long-running commands such as package installation, test suites, or builds can stall the agent if they occupy the only active execution thread. Users need the agent to stay responsive while waiting. 【Design strategy】 Slow operations are launched as background jobs that immediately return a task identifier, while completion status is captured later and delivered back into the next reasoning cycle. 【Business logic】 Step 1: When the agent identifies a potentially slow shell command, it starts it through the background execution tool rather than the foreground shell tool. Step 2: The system immediately returns a unique job identifier so the agent can move on to other work. Step 3: The command continues in a background worker with a hard timeout of 300 seconds. Step 4: Standard output and error output are captured and truncated if they become too large. Step 5: When the command finishes, times out, or errors, the result is placed into a shared notification queue. Step 6: The agent does not need to manually poll; it can continue solving other parts of the task while the background job runs. 【Trade-off】 The key gain is responsiveness and parallel progress within a single session. The limitation is that resource governance is minimal: there are no global quotas or multi-tenant scheduling controls.
  • Automatic injection of background completions — 【User value】 Background work only helps if the agent naturally becomes aware when it finishes. Otherwise the model must remember to poll, which adds friction and failure risk. 【Design strategy】 The system treats background completions as a first-class conversational event and inserts them into the next reasoning cycle before the model thinks again. 【Business logic】 Step 1: Before every new model call, the runtime checks whether the background notification queue contains finished job results. Step 2: If one or more results exist, it drains the queue in chronological order. Step 3: The system formats each completion with its job identifier, terminal status, and truncated logs. Step 4: These updates are injected into the conversation as a new user-side message dedicated to background results. Step 5: To keep turn-taking valid for the model API, the runtime immediately follows this with a minimal assistant acknowledgment. Step 6: The next model call therefore starts with the agent already aware that previous long-running work has completed and can react accordingly. 【Trade-off】 This removes manual polling and makes async execution feel native. The trade-off is that completions are only surfaced at the next loop opportunity, so there is no true push UI or real-time interrupt mechanism.
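
The dependency-unblocking mechanics described in this module can be sketched with one JSON record per task. The on-disk layout is illustrative; the unblock-on-completion behavior follows the report.

```python
# Sketch of a persisted, dependency-aware task board.
import json
import os

def _task_path(board_dir, task_id):
    return os.path.join(board_dir, f"{task_id}.json")

def create_task(board_dir, task_id, subject, blocked_by=()):
    """Persist one task as its own disk record, with dependency links."""
    os.makedirs(board_dir, exist_ok=True)
    task = {"id": task_id, "subject": subject,
            "status": "pending", "blocked_by": list(blocked_by)}
    with open(_task_path(board_dir, task_id), "w") as f:
        json.dump(task, f)
    return task

def load_task(board_dir, task_id):
    with open(_task_path(board_dir, task_id)) as f:
        return json.load(f)

def complete_task(board_dir, task_id):
    """Mark a task completed and remove it from other tasks' blocking lists."""
    task = load_task(board_dir, task_id)
    task["status"] = "completed"
    with open(_task_path(board_dir, task_id), "w") as f:
        json.dump(task, f)
    for name in os.listdir(board_dir):
        other = load_task(board_dir, name[:-5])  # strip ".json"
        if task_id in other["blocked_by"]:
            other["blocked_by"].remove(task_id)  # empty list means ready
            with open(_task_path(board_dir, other["id"]), "w") as f:
                json.dump(other, f)
```

Note the read-modify-write cycle has no locking, which is precisely the concurrency weakness the report flags for simultaneous writers.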

Multi-agent teamwork and isolated delivery lanes

This module expands the product from one agent into a small coordinated team. The design strategy evolves in stages: persistent named teammates, asynchronous mailboxes, request-response protocols for approvals, self-claiming workers that pull from a shared task board, and finally separate git work lanes tied to tasks. The business value is clear delegation and safer parallel progress without requiring the lead agent to micromanage every step. The main trade-off is that coordination remains intentionally local and lightweight, so it is highly understandable for learning but not durable enough for hosted or adversarial environments.

  • Persistent specialist teammates — 【User value】 Users often want one agent to delegate work to named specialists instead of repeatedly creating disposable helpers that forget their role after each subtask. 【Design strategy】 The product maintains a visible teammate roster with persistent identity, role, and status so delegation feels like managing coworkers rather than launching anonymous child runs. 【Business logic】 Step 1: The lead creates or revives a teammate by providing a name, a role, and an initial prompt. Step 2: If a teammate with the same name already exists and is still actively working, the system rejects duplicate spawning to avoid identity collisions. Step 3: If the existing teammate is idle or already shut down, it may be restarted under the same identity. Step 4: The system persists the roster entry, marks the member as working, and starts that teammate's loop. Step 5: Each teammate receives a role-specific system instruction, keeps its own local conversation history, checks its inbox before each reasoning turn, and uses the same core tool-use pattern to act. Step 6: When normal work for that cycle ends, status returns to idle; a confirmed shutdown path instead marks the member as shut down. 【Trade-off】 This creates clearer ownership and reusable specialization. The downside is that teammates live only within one local process, so they are not durable worker services.
  • Asynchronous mailbox communication — 【User value】 A team agent setup needs communication that does not force every participant into the same active conversation. Users need a simple way to send instructions, updates, and broadcasts across workers. 【Design strategy】 The product gives each participant a personal mailbox on disk. Messages are appended asynchronously and consumed by the recipient before its next reasoning cycle. 【Business logic】 Step 1: A sender chooses a recipient and sends a message containing sender identity, content, timestamp, and any extra protocol fields. Step 2: The system appends that message to the recipient's mailbox file. Step 3: A broadcast path repeats this for all teammates except the sender. Step 4: Before each model call, the recipient checks its mailbox and reads all queued entries. Step 5: If messages exist, they are injected into the conversation as user-side content so the model reacts on its next cycle. Step 6: After reading, the mailbox is cleared, meaning each message is consumed once rather than replayed indefinitely. 【Trade-off】 The benefit is a very inspectable and low-complexity async communication layer. The cost is fragility: messages are destructive on read, have no retry or replay path, and can be lost if a process crashes after draining the inbox.
  • Approval and shutdown negotiation — 【User value】 Delegated work needs control points. Users may want a teammate to stop gracefully or ask the lead to review a risky execution plan before proceeding. 【Design strategy】 The product uses one shared request-response pattern built around correlation identifiers so multiple protocol types can follow the same lifecycle: pending, approved, or rejected. 【Business logic】 Step 1: When the lead wants a teammate to stop, it creates a short request identifier and records a pending shutdown request. Step 2: The request is mailed to the target teammate. Step 3: The teammate answers with the same request identifier and an approval decision. Step 4: The system updates the stored request state to approved or rejected. Step 5: If approved, the teammate exits its loop and its final status becomes shut down. Step 6: A similar pattern supports plan approval: a teammate submits a proposed plan with a new request identifier, and the lead later approves or rejects it, optionally with feedback. Step 7: If a request identifier is unknown, the system returns an explicit error rather than pretending the workflow succeeded. 【Trade-off】 This gives the team a shared interaction grammar for sensitive decisions. The main weakness is durability: approval state is tracked in memory rather than persisted, so outstanding negotiations disappear on restart.
  • Self-claiming autonomous workers — 【User value】 Lead agents become a bottleneck if they must manually assign every task. Users need teammates that can pick up ready work on their own. 【Design strategy】 When teammates go idle, they temporarily switch into a watch mode that first checks direct messages, then scans the shared task board for work that is ready, unowned, and unblocked. 【Business logic】 Step 1: After a work burst, a teammate can declare itself idle. Step 2: The system marks that member as idle and starts a polling window that lasts up to 60 seconds. Step 3: Every 5 seconds, the teammate first checks its inbox; if a direct message arrives, it immediately resumes work from that message. Step 4: If no message is waiting, the system scans the task board for the first task whose status is pending, whose owner is empty, and whose blocking list is empty. Step 5: Under a claim lock, it assigns that task to the teammate and changes the task status to in progress. Step 6: The system injects an auto-claimed task prompt, optionally reasserting teammate identity if the recent conversation history is too short. Step 7: If no message and no eligible task appears before the timeout window ends, the teammate shuts down. 【Trade-off】 The benefit is lower coordination overhead and a more self-organizing team. The trade-off is simplistic scheduling: task selection is first-match rather than priority-based, and fairness is not addressed.
  • Task-bound isolated work lanes — 【User value】 Parallel coding work becomes risky when multiple agents share one directory. Users need a way to isolate changes per task while keeping ownership visible. 【Design strategy】 The product binds tasks to named git work lanes, so each risky or parallel effort can run in its own directory while still remaining linked to task state. 【Business logic】 Step 1: The operator creates or selects a task and optionally requests a dedicated isolated lane for it. Step 2: The system validates the lane name and confirms that the repository and task exist when task binding is requested. Step 3: It creates a new git work lane under a dedicated directory, based on the chosen reference branch. Step 4: The work lane is recorded in an index, and if tied to a task, that task is bound to the lane and can be promoted from pending to in progress. Step 5: Commands executed for that lane run only inside the corresponding isolated directory and still inherit timeout and dangerous-command restrictions. Step 6: When work is finished, the operator explicitly chooses one of two closeout paths: keep the lane for later or remove it. Step 7: Removal can optionally mark the linked task completed and clear the binding. 【Trade-off】 This reduces file collision risk and makes parallel ownership legible. The limitation is that it depends on a real local git repository and is only partially integrated with the earlier teammate runtime, so isolation is not automatic across all team scenarios.
  • Lightweight lifecycle visibility for task and work lanes — 【Design strategy】 To make local multi-task delivery understandable, the product records important task and work-lane events in an append-only local event stream rather than requiring external observability tooling. 【Business logic】 Step 1: Before and after key lifecycle actions such as creating or removing a work lane, the system writes an event with timestamp and relevant payload. Step 2: Failure events also include error context. Step 3: If removing a work lane also completes its linked task, that completion is written as a separate event. Step 4: Operators can request the recent event history, with a default small window and a hard cap of 200 events to keep inspection bounded. Step 5: These logs give users a chronological picture of what happened to task bindings and isolated work lanes. 【Trade-off】 The gain is simple local observability and post-hoc inspection. The trade-off is that this is only retrospective logging; there are no alerts, dashboards, or retention policies.
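
The mailbox mechanism described above can be sketched as a few functions over per-agent JSONL files. This is a minimal illustrative sketch, not the project's actual code: the `team_mail` directory, field names, and function names are assumptions about the pattern, chosen to show the append-on-send, drain-before-reasoning, destructive-read behavior.

```python
# Hypothetical sketch of the mailbox pattern: append-only per-agent JSONL
# files that are drained (and cleared) before each model call.
import json
import time
from pathlib import Path

MAIL_DIR = Path("team_mail")  # assumed location, not the project's real path

def send_message(sender: str, recipient: str, content: str, **extra) -> None:
    """Append one message to the recipient's mailbox file."""
    MAIL_DIR.mkdir(exist_ok=True)
    entry = {"from": sender, "content": content, "ts": time.time(), **extra}
    with open(MAIL_DIR / f"{recipient}.jsonl", "a") as f:
        f.write(json.dumps(entry) + "\n")

def broadcast(sender: str, teammates: list[str], content: str) -> None:
    """Repeat the send for every teammate except the sender."""
    for name in teammates:
        if name != sender:
            send_message(sender, name, content)

def drain_mailbox(recipient: str) -> list[dict]:
    """Read all queued messages, then clear the file (destructive read)."""
    box = MAIL_DIR / f"{recipient}.jsonl"
    if not box.exists():
        return []
    messages = [json.loads(line) for line in box.read_text().splitlines() if line]
    box.unlink()  # consumed once; lost if the process crashes after this point
    return messages
```

The `box.unlink()` line is exactly where the fragility noted in the trade-off lives: a crash after the unlink loses every drained message.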

Interactive learning interface and curriculum navigation

This module packages the runtime into a productized learning experience rather than leaving it as a bare code repository. The design strategy is to convert the 12 sessions into a structured, locale-aware curriculum with persistent navigation, layered mental models, and per-session deep dives. Instead of asking learners to inspect code files manually, the platform lets them move through the progression, compare versions, and inspect architecture from multiple angles. The product value is educational usability and global accessibility, with the trade-off that the content structure is partly hardcoded and depends on prebuilt metadata.

  • Locale-aware learning navigation — 【User value】 Learners need to explore the curriculum in their preferred language without losing navigation continuity across pages. 【Design strategy】 The product organizes all learning routes under a locale-aware shell and keeps a persistent navigation frame around session pages. 【Business logic】 Step 1: The user enters the learning site under a selected locale, currently English, Chinese, or Japanese. Step 2: The root layout initializes language context and shared page chrome. Step 3: Learning pages share a dedicated layout that keeps the sidebar visible while the main content changes. Step 4: When the user opens a specific session, the system resolves its previous and next sessions to support linear curriculum movement. Step 5: Navigation therefore works both as a structured course path and as a persistent reference browser. 【Trade-off】 This makes the experience smoother and more accessible across regions. The cost is that supported locales and session ordering are pre-generated rather than dynamically discovered.
  • Architecture layer framing — 【User value】 Learners can struggle to understand whether a session is about tools, planning, memory, concurrency, or collaboration. They need a higher-level map, not just a chronological list. 【Design strategy】 The product groups sessions into five conceptual layers and visually tags each session with its layer so users can learn both linearly and by theme. 【Business logic】 Step 1: Every session is assigned to one of five predefined layers: tools, planning, memory, concurrency, or collaboration. Step 2: Each layer has a consistent visual identity, including a color treatment used throughout the interface. Step 3: A dedicated layer view aggregates sessions by these categories so learners can see how the stack grows over time. Step 4: Session badges and metadata repeat this categorization in navigation and detail views, reinforcing the mental model. 【Trade-off】 The benefit is stronger conceptual scaffolding. The trade-off is rigidity: adding new layers or changing taxonomy requires manual updates.
  • Cross-version comparison workspace — 【User value】 The core promise of the project is progressive learning. Users need to see exactly what changed between two stages, not just read that a feature was added. 【Design strategy】 The product turns version comparison into a first-class learning surface, combining structural deltas, tool additions, architecture diagrams, and source diffs in one place. 【Business logic】 Step 1: The learner selects any two session versions to compare. Step 2: The system loads precomputed metadata for both versions. Step 3: It calculates differences such as line-count change, newly introduced tools, and newly introduced classes or functions. Step 4: The interface renders side-by-side architecture views and highlights what is new in the later version. Step 5: A source diff view lets the learner inspect implementation changes directly. Step 6: Because the comparison is metadata-driven, the page can present analytical summaries without running the Python runtime live. 【Trade-off】 This sharply improves pattern recognition across sessions. The limitation is that comparison quality depends on the correctness of the pre-extracted metadata pipeline.
  • Session-specific deep-dive views — 【Design strategy】 A single page is not enough for each session because learners want different lenses: tutorial explanation, simulated execution, source browsing, and deeper architecture detail. 【Business logic】 Step 1: When a learner opens a session, the page loads that version's metadata and hands it to a rich client-side detail view. Step 2: The interface exposes multiple tabs, including learning content, simulation, code inspection, and deeper architecture explanation. Step 3: The page also renders a version-specific execution flow view showing how the logic of that stage works. Step 4: This allows one session to serve both beginners who want narrative guidance and advanced users who want implementation detail. 【Trade-off】 The value is flexible educational depth on one route. The cost is a more content-heavy interface that depends on curated metadata and version-specific assets.

Content extraction, simulation, and visual explanation

This module is the technical backbone of the web learning product. Its role is to convert source code and documentation into synchronized educational artifacts: metadata, comparison data, interactive flow diagrams, and step-by-step simulators. The design strategy is to precompute most teaching data from the repository itself so the web app stays static, fast, and aligned with the reference implementation. The product value is that the educational interface does not drift too far from the code it explains, though the extraction method remains heuristic rather than parser-grade.

  • Source-to-learning metadata extraction — 【User value】 Manually maintaining educational summaries for every session is slow and error-prone. Learners benefit when metrics and structural highlights stay aligned with the real code. 【Design strategy】 The product runs a prebuild extraction pass that scans source and documentation folders, derives structural metadata, and emits generated assets for the web app. 【Business logic】 Step 1: During the content build process, the extractor recursively scans the agent source and documentation directories. Step 2: It identifies top-level Python classes and functions, tracks their line positions, and counts meaningful lines of code. Step 3: It also inspects tool definitions to infer which tools are available in each version. Step 4: Across versions, it calculates deltas such as new tools, new classes, and line-count growth. Step 5: The resulting structured metadata is written into generated JSON assets that the web app can consume directly. Step 6: The comparison and detail pages rely on this generated dataset rather than repeatedly parsing the repository at runtime. 【Trade-off】 The gain is synchronized, low-latency educational data. The trade-off is extraction fragility: regex-based detection can miss symbols if formatting patterns change.
  • Step-by-step agent loop simulator — 【User value】 Many learners want to understand the agent loop behavior without setting up local credentials or running live model calls. A deterministic walkthrough lowers the barrier to understanding. 【Design strategy】 The product uses preauthored scenarios and a playback state machine to animate how user messages, tool calls, and results unfold turn by turn. 【Business logic】 Step 1: A learner opens a session that includes a simulator scenario. Step 2: The simulator loads the session's structured scenario file, which contains an ordered list of message and action events. Step 3: The playback controller resets to the beginning and waits for the learner to play, pause, or scrub. Step 4: When playback starts, the simulator advances through the event list at a default tempo of 1200 milliseconds divided by the selected speed factor. Step 5: As each step becomes active, the visual console reveals the corresponding message or tool interaction and scrolls to keep the newest content visible. Step 6: The learner can therefore watch the reasoning-action-observation cycle unfold without needing a live backend. 【Trade-off】 This makes the core loop much easier to grasp. The limitation is that scenarios are curated examples, not live runtime traces.
  • Lazy-loaded session visualizations — 【Design strategy】 Rich per-session diagrams are valuable, but loading every interactive asset up front would make the learning site slower than necessary. 【Business logic】 Step 1: Each session has its own specialized visualization asset. Step 2: The application defers loading that asset until the learner actually opens the relevant session. Step 3: While the visualization is loading, the interface shows a large placeholder skeleton to preserve layout continuity. Step 4: Once loaded, the version-specific renderer displays the corresponding architecture or execution view. Step 5: This keeps the initial bundle smaller even though the site supports many distinct interactive diagrams. 【Trade-off】 The benefit is better page performance and more scalable content packaging. The cost is a short loading delay the first time a specific visualization is opened.
  • Abstract execution flow diagrams — 【User value】 Reading implementation alone makes it hard to see how each session changes the agent's decision structure. Learners need a simplified process view. 【Design strategy】 The product defines session logic as visual node-and-edge flows so each architectural step can be understood as a change in control flow, not just a code diff. 【Business logic】 Step 1: For each session, the product defines a set of flow nodes such as start, process, decision, subprocess, and end. Step 2: These nodes are positioned on a consistent visual grid so the diagrams feel related across sessions. Step 3: The session page renders the relevant flow definition as an SVG-based architecture view. Step 4: Because the same visual language is reused from session to session, learners can quickly spot what structural branch, loop, or subsystem was added at each stage. 【Trade-off】 The gain is stronger mental-model learning. The cost is abstraction: diagrams intentionally omit implementation edge cases and lower-level runtime detail.

Core Technical Capabilities

Invariant agent loop that scales from one tool to a full autonomous runtime

Problem: How can a coding agent gain many new abilities without rewriting its core execution model every time? Without a stable central loop, each new mechanism would create a different mental model, making the system harder to extend, debug, and teach.

Solution: Step 1: The runtime standardizes all interaction around a repeating cycle: send conversation plus tool schemas to the model, inspect whether the reply requests tools, and either execute those tools or finish with text. Step 2: Tool requests are handled through a centralized name-to-handler registry, so extension happens by registration rather than loop redesign. Step 3: Structured tool results are fed back into the same conversation stream, allowing the model to reason over outcomes and chain actions autonomously. Step 4: The same loop pattern is reused by parent agents, subagents, and teammate agents, which is why later features can be layered while preserving conceptual continuity. The smart part is architectural restraint: the project treats the loop as a platform contract and moves complexity into tools, prompts, and lifecycle wrappers instead of constantly changing control flow.
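
The four steps above can be sketched as a single loop with a registry. This is a minimal offline sketch, not the project's implementation: the real runtime calls the Anthropic Messages API, whereas here a scripted `fake_model` stands in so the loop runs without credentials; the reply shape and registry helper are assumptions.

```python
# Minimal sketch of the invariant loop: call model, dispatch requested tools
# through a name-to-handler registry, feed results back, repeat until text.
import subprocess

TOOLS = {}  # centralized name-to-handler registry

def register(name):
    def deco(fn):
        TOOLS[name] = fn  # extension by registration, not loop redesign
        return fn
    return deco

@register("bash")
def run_bash(command: str) -> str:
    return subprocess.run(command, shell=True, capture_output=True,
                          text=True, timeout=30).stdout

def agent_loop(model, messages: list[dict]) -> str:
    """Repeat: call model -> execute requested tool -> feed result back."""
    while True:
        reply = model(messages)               # model sees the full history
        if reply.get("tool") is None:
            return reply["text"]              # finished with plain text
        handler = TOOLS[reply["tool"]]        # dispatch by registry lookup
        result = handler(**reply["args"])     # run the tool
        messages.append({"role": "assistant", "content": f"call {reply['tool']}"})
        messages.append({"role": "user", "content": f"tool result: {result}"})

# Scripted stand-in for the LLM: request one tool call, then finish.
def fake_model(messages):
    if not any("tool result" in m["content"] for m in messages):
        return {"tool": "bash", "args": {"command": "echo hello"}}
    return {"tool": None, "text": "done"}
```

Parent agents, subagents, and teammates can all reuse `agent_loop` unchanged — only the registry contents and the prompt differ, which is the "platform contract" the text describes.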

Technologies: Anthropic Messages API, tool-calling, dispatch map

Boundaries & Risks: This works well for educational clarity and local extensibility, but it still relies heavily on model judgment. Without stronger policy, retry, and supervision layers, loops can wander, repeat, or make poor tool selections.

Token-efficient long-session continuity through layered memory reduction

Problem: How can the agent keep working across long coding sessions without hitting context-window limits or paying to resend the full transcript forever? Without compression, large tool outputs and long histories would either exceed request limits or make the runtime too expensive and fragile.

Solution: Step 1: The runtime applies a silent first layer that reduces older tool outputs to compact placeholders while keeping the most recent 3 loops intact. Step 2: It tracks approximate context size with a simple heuristic and uses 50,000 as the threshold for automatic intervention. Step 3: Before shrinking active memory, it writes the full history to disk as a transcript archive. Step 4: It then asks the model to summarize the conversation and replaces the live context with that compressed summary. Step 5: A manual compaction path gives operators direct control when they want to trigger the same mechanism earlier. The clever part is combining cheap per-turn compression with heavier summarization only when needed, while keeping a disk trail for recovery rather than fully discarding history.
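
The two compaction layers can be sketched as follows. This is an illustrative sketch under stated assumptions: the placeholder text and message shapes are invented, the size heuristic here counts characters as a stand-in for the project's token estimate, and `summarize`/`archive` are passed in because the real runtime uses the model and a JSONL transcript for those roles.

```python
# Layer 1 silently shrinks old tool outputs; layer 2 archives then summarizes.
COMPACT_THRESHOLD = 50_000   # heuristic trigger from the write-up
KEEP_RECENT_LOOPS = 3        # most recent tool outputs left intact

def estimate_size(messages: list[dict]) -> int:
    """Naive heuristic: total characters stand in for tokens."""
    return sum(len(m["content"]) for m in messages)

def micro_compact(messages: list[dict]) -> list[dict]:
    """Layer 1: reduce older tool outputs to compact placeholders."""
    tool_idx = [i for i, m in enumerate(messages) if m.get("kind") == "tool_result"]
    old = set(tool_idx[:-KEEP_RECENT_LOOPS])  # everything but the last 3
    return [
        {**m, "content": "[old tool output elided]"} if i in old else m
        for i, m in enumerate(messages)
    ]

def compact(messages: list[dict], summarize, archive) -> list[dict]:
    """Layer 2: archive the full history to disk, then replace it."""
    if estimate_size(messages) < COMPACT_THRESHOLD:
        return messages
    archive(messages)  # disk trail kept for recovery before anything is dropped
    return [{"role": "user", "kind": "summary", "content": summarize(messages)}]
```

Calling `archive` before shrinking is the key ordering: nothing is discarded until the full transcript is safely on disk.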

Technologies: LLM summarization, JSONL transcript archiving, heuristic token estimation

Boundaries & Risks: The current token estimate is intentionally naive and can misfire. Summary quality depends on model fidelity, and micro-compaction removes detail aggressively, so this is useful for continuity but not reliable enough for exact state preservation in production.

On-demand knowledge injection that avoids bloated system prompts

Problem: How can the product expose many specialized workflows without stuffing every session with all reference material up front? If all domain knowledge is preloaded, prompt cost rises, relevance drops, and the model wastes context on skills it may never use.

Solution: Step 1: The system scans the skill library and surfaces only lightweight name-and-description entries in the base prompt. Step 2: When the model recognizes a relevant domain task, it explicitly requests the full skill body through a tool call. Step 3: The runtime loads the matching skill document and injects it back as structured content in the next loop turn. Step 4: The agent then continues execution with that newly loaded domain guidance in context. The smart design choice is separating discoverability from payload: the model always knows what skills exist, but only pays the full token cost when a skill is actually needed.
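
The discoverability/payload split can be sketched as two functions over a skill directory. The directory layout, frontmatter fields, and function names below are assumptions about the pattern, and the frontmatter parser is deliberately minimal rather than a full YAML implementation.

```python
# Sketch of two-phase skill loading: cheap name/description index up front,
# full body injected only when the model asks for it by name.
from pathlib import Path

def parse_skill(text: str) -> dict:
    """Split minimal YAML frontmatter (--- ... ---) from the markdown body."""
    _, front, body = text.split("---", 2)
    meta = dict(line.split(":", 1) for line in front.strip().splitlines())
    return {k.strip(): v.strip() for k, v in meta.items()} | {"body": body.strip()}

def skill_index(skill_dir: Path) -> str:
    """What the base prompt sees: names and descriptions only."""
    lines = []
    for path in sorted(skill_dir.glob("*.md")):
        meta = parse_skill(path.read_text())
        lines.append(f"- {meta['name']}: {meta['description']}")
    return "\n".join(lines)

def load_skill(skill_dir: Path, name: str) -> str:
    """Tool handler: return the full skill body for injection next turn."""
    for path in skill_dir.glob("*.md"):
        meta = parse_skill(path.read_text())
        if meta["name"] == name:
            return meta["body"]
    return f"error: unknown skill '{name}'"  # explicit failure, not silence
```

Only `skill_index` output pays a permanent token cost; each skill body is billed only on the turns where the model actually requested it.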

Technologies: skill library, YAML frontmatter, tool-result knowledge injection

Boundaries & Risks: This assumes the model correctly decides when to load a skill. Missing or malformed skill assets surface as tool errors rather than being repaired automatically, and there is no evidence of a hardened production parser or a permissions model around skill access.

Local durable workflow state that survives beyond the live conversation

Problem: How can an agent keep track of multi-step execution state when the chat history may be compacted, summarized, or restarted? Without externalized workflow state, long goals become fragile because the model must remember everything through conversation alone.

Solution: Step 1: The runtime stores each task as its own structured file, including status and dependency relations. Step 2: Completion events trigger dependency cleanup so downstream tasks automatically become ready. Step 3: Because the task graph lives on disk instead of only in the conversation, the agent can recover or continue planning even after memory compaction. Step 4: The same task board later becomes the coordination substrate for autonomous teammates and isolated work lanes. The clever part is using a humble local file model to solve several product problems at once: plan durability, readiness tracking, and future team coordination.
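
The file-per-task model with dependency cleanup can be sketched in a few functions. The `tasks` directory, field names, and schema below are assumptions about the pattern, not the project's exact format — the point is the mechanism: completion rewrites downstream files so blocked tasks become ready.

```python
# Sketch of an on-disk task graph: one JSON file per task, with dependency
# cleanup on completion so downstream tasks automatically unblock.
import json
from pathlib import Path

TASK_DIR = Path("tasks")  # assumed location

def save_task(task: dict) -> None:
    TASK_DIR.mkdir(exist_ok=True)
    (TASK_DIR / f"{task['id']}.json").write_text(json.dumps(task))

def load_task(task_id: str) -> dict:
    return json.loads((TASK_DIR / f"{task_id}.json").read_text())

def complete_task(task_id: str) -> list[str]:
    """Mark done, then unblock downstream tasks that depended on it."""
    task = load_task(task_id)
    task["status"] = "completed"
    save_task(task)
    unblocked = []
    for path in TASK_DIR.glob("*.json"):
        other = json.loads(path.read_text())
        if task_id in other.get("blocked_by", []):
            other["blocked_by"].remove(task_id)   # dependency cleanup
            if not other["blocked_by"]:
                unblocked.append(other["id"])     # now ready to claim
            save_task(other)
    return unblocked
```

The non-transactional read-modify-write in `complete_task` is exactly the race-condition surface the Boundaries & Risks note warns about.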

Technologies: filesystem persistence, JSON task records, dependency graph

Boundaries & Risks: This is easy to inspect and good for local learning, but file-based reads and writes are not transactional. Under concurrent agents or multiple processes, race conditions can corrupt dependencies or ownership state.

Non-blocking background execution with conversational result delivery

Problem: How can the agent run slow shell work without freezing its reasoning loop or forcing manual polling? If long commands stay in the foreground, the agent becomes unresponsive and cannot use the waiting time productively.

Solution: Step 1: Slow operations are launched in background worker threads and immediately return a job identifier. Step 2: The worker executes the command with a 300-second timeout and captures bounded output. Step 3: Completion status is placed into a shared notification queue. Step 4: Before the next model call, the runtime drains that queue and injects the results back into the conversation as if the environment had reported in. Step 5: A small acknowledgment message is inserted to satisfy turn-taking rules in the model API. The smart part is that asynchronous system events are translated into ordinary conversational context, so the model does not need a separate polling protocol.
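
The launch/notify/drain pattern can be sketched with a worker thread and a shared queue. Function names, the output cap, and the message format are illustrative assumptions; the 300-second timeout comes from the description above.

```python
# Sketch of non-blocking background execution: a worker thread runs the
# command and pushes its result into a queue, which is drained into the
# conversation before the next model call.
import queue
import subprocess
import threading
import uuid

notifications: "queue.Queue[dict]" = queue.Queue()

def run_in_background(command: str, timeout: int = 300) -> str:
    """Launch the command on a worker thread; return a job id immediately."""
    job_id = uuid.uuid4().hex[:8]

    def worker():
        try:
            proc = subprocess.run(command, shell=True, capture_output=True,
                                  text=True, timeout=timeout)
            notifications.put({"job": job_id, "output": proc.stdout[:4000]})
        except subprocess.TimeoutExpired:
            notifications.put({"job": job_id, "output": "error: timed out"})

    threading.Thread(target=worker, daemon=True).start()
    return job_id

def drain_notifications(messages: list[dict]) -> None:
    """Before each model call: inject finished jobs as conversation content."""
    while not notifications.empty():
        note = notifications.get()
        messages.append({"role": "user",
                         "content": f"[job {note['job']} finished]\n{note['output']}"})
```

Because results arrive as ordinary user-side content, the model needs no polling protocol — it simply sees the job report on its next turn.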

Technologies: Python threading, subprocess timeouts, notification queue

Boundaries & Risks: This improves responsiveness in single-user local settings, but there are no global concurrency controls, per-user quotas, or resource isolation. It is not safe as-is for heavy multi-tenant workloads.

Low-overhead multi-agent coordination through inspectable local protocols

Problem: How can a single local process simulate a small agent team with delegation, updates, and approvals without introducing external queues or orchestration services? Without a coordination layer, multiple workers would either collide in one conversation or require constant lead micromanagement.

Solution: Step 1: Each teammate is given a persisted identity and its own execution loop. Step 2: Communication happens through append-only personal mailboxes on disk, which are consumed before each reasoning cycle. Step 3: Sensitive interactions such as plan approval and shutdown use short request identifiers so responses can be matched to the original request. Step 4: Idle workers can also scan the shared task board and claim unblocked work, reducing the lead's assignment burden. The clever part is that messaging, approvals, and autonomy all reuse the same simple local artifacts: mailbox files, task files, and lightweight in-memory tracking.
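
The request-correlation piece (Step 3) can be sketched as a tiny in-memory registry keyed by short identifiers. Names are illustrative assumptions; the in-memory dictionary deliberately mirrors the durability caveat below — state vanishes on restart.

```python
# Sketch of correlation-id request tracking for approvals and shutdowns:
# every sensitive request gets a short id, and replies must quote it.
import uuid

pending_requests: dict[str, dict] = {}  # in-memory only: lost on restart

def open_request(kind: str, sender: str, payload: str) -> str:
    """Record a pending request (e.g. 'shutdown' or 'plan') and return its id."""
    req_id = uuid.uuid4().hex[:6]
    pending_requests[req_id] = {"kind": kind, "from": sender,
                                "payload": payload, "state": "pending"}
    return req_id

def respond(req_id: str, approved: bool, feedback: str = "") -> dict:
    """Match a reply to its request; unknown ids fail loudly, not silently."""
    if req_id not in pending_requests:
        return {"error": f"unknown request id {req_id}"}
    req = pending_requests[req_id]
    req["state"] = "approved" if approved else "rejected"
    req["feedback"] = feedback
    return req
```

Both shutdown negotiation and plan approval can share this one lifecycle, which is what gives the team a single "interaction grammar" for sensitive decisions.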

Technologies: JSONL mailboxes, thread-based workers, request correlation identifiers

Boundaries & Risks: The design is deliberately teaching-grade. Mailbox reads are destructive, most coordination is file-based, approvals are not durable across restarts, and there is no authentication or permission boundary between agents.

Task-linked directory isolation for safer parallel code changes

Problem: How can multiple coding efforts proceed in parallel without stepping on each other's files in one shared workspace? Without directory isolation, delegated work can create cross-task interference and unclear ownership.

Solution: Step 1: The runtime validates a named work lane and creates a dedicated git-based directory for it. Step 2: That lane is linked to a task record so execution state and filesystem state stay connected. Step 3: Commands for that lane run only inside its directory, inheriting the same timeout and command-safety checks used elsewhere. Step 4: Closeout explicitly separates keeping a lane from removing it, and lifecycle events are logged for operator inspection. The smart part is the separation of coordination state from execution surface: tasks manage intent and status, while work lanes isolate file mutations physically.

Technologies: git worktree, JSON index files, local event log

Boundaries & Risks: This requires a real local git repository and is not automatically enforced across the earlier teammate runtime. Index updates remain file-based, so concurrency robustness is limited.

Repository-driven educational content pipeline that keeps learning assets close to the code

Problem: How can an educational web product stay aligned with a fast-changing reference codebase without manually rewriting every diagram, metric, and comparison page? Without automation, the teaching surface would quickly drift from the implementation it claims to explain.

Solution: Step 1: A prebuild extraction script scans source and docs, derives structural metadata such as classes, functions, tools, and line counts, and computes version-to-version deltas. Step 2: The generated metadata is emitted as static assets for the frontend. Step 3: The web app uses these assets to power session pages, comparison views, and architecture explanations without needing live repository parsing. Step 4: Rich session-specific visualizations and simulator content are loaded only when needed, preserving frontend performance. The smart choice is using the repository itself as the content source so the learning product remains a projection of the codebase rather than a separately curated knowledge base.
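
The actual extraction scripts are TypeScript; the Python sketch below only illustrates the same regex-heuristic idea for one input — find top-level classes and functions, count meaningful lines, and diff two versions. The patterns are invented and share the fragility noted below: unusual formatting will evade them.

```python
# Illustrative regex-based structural extraction and version delta, in the
# spirit of the prebuild pipeline described above.
import re

CLASS_RE = re.compile(r"^class\s+(\w+)", re.MULTILINE)
FUNC_RE = re.compile(r"^def\s+(\w+)", re.MULTILINE)  # top-level only (unindented)

def extract_metadata(source: str) -> dict:
    meaningful = [
        ln for ln in source.splitlines()
        if ln.strip() and not ln.strip().startswith("#")  # skip blanks/comments
    ]
    return {
        "classes": CLASS_RE.findall(source),
        "functions": FUNC_RE.findall(source),
        "loc": len(meaningful),
    }

def version_delta(old: dict, new: dict) -> dict:
    """Cross-version comparison: what did the later session add?"""
    return {
        "new_classes": sorted(set(new["classes"]) - set(old["classes"])),
        "new_functions": sorted(set(new["functions"]) - set(old["functions"])),
        "loc_growth": new["loc"] - old["loc"],
    }
```

Emitting these dictionaries as static JSON is what lets the comparison pages stay analytical without parsing the repository at runtime.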

Technologies: TypeScript extraction scripts, static JSON assets, lazy-loaded visualizations

Boundaries & Risks: The extraction approach relies partly on regex heuristics, so unusual code formatting can reduce accuracy. The frontend also depends on rebuilds to reflect repository changes.

Related Projects

Discover more public DeepDive reports to compare architecture decisions.

  • openclaw/openclaw @97a7dcf48e1c - en
  • HKUDS/nanobot @84565d702c31 - en
  • usememos/memos @c4176b4ef1c1 - en
  • selop/pokebox @2a11d7da068a - en
  • imanian/appointmate @66f1c0a89b98 - en
  • bytedance/UI-TARS-desktop @3f254968e627 - en