Designing Lightweight AI Agents for Edge Deployment
A Minimal Capability Framework with Insights from Literature Synthesis
This chapter demonstrates how the three core MCD principles (Section 4.2: Bounded Rationality, Degeneracy Detection, Minimality by Default) manifest across system layers as concrete architectural implementation patterns—from prompt structure to deployment tier selection (Bommasani et al., 2021).
The prompt-only agent is guided by a minimal architecture pattern documented in Appendix D. This template explicitly omits orchestration layers and persistent state, conforming to the MCD Layered Model from Chapter 4 (Singh et al., 2023). The core of this instantiation is a fail-safe control loop where prompt logic serves as the decision tree (Mitchell, 2019).
This fail-safe design means that each loop iteration either terminates with a symbolic ‘exit’ state, re-prompts the user for clarification, or degrades into a predefined default behavior (Amodei et al., 2016). No persistent state is assumed between turns. This instantiation directly applies the Prompt Layer (4.3.1), and its reliance on statelessness is evaluated via the Memory Layer tests (4.6.2). Its structure is a concrete application of the Minimality by Default principle (4.2.3).
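A minimal sketch of this fail-safe loop is given below (TypeScript). It only illustrates the three terminal outcomes described above; the function name, keyword cues, and messages are hypothetical and not part of the template in Appendix D.

```typescript
// Illustrative fail-safe control loop step: every iteration resolves to
// exactly one of three outcomes, and no state survives between calls.
type LoopOutcome =
  | { kind: "exit"; message: string }        // symbolic 'exit' state
  | { kind: "reprompt"; question: string }   // ask the user to clarify
  | { kind: "default"; message: string };    // degrade to predefined behaviour

function controlLoopStep(userInput: string): LoopOutcome {
  const input = userInput.toLowerCase();
  if (input.includes("cancel") || input.includes("exit")) {
    return { kind: "exit", message: "Session ended." };
  }
  if (input.trim().length === 0) {
    return { kind: "reprompt", question: "Could you rephrase your request?" };
  }
  // No matching route: fall back to the predefined default behaviour.
  return { kind: "default", message: "I can only help with appointment booking." };
}
```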
System-Wide Principle Application:
The stateless template embodies all three MCD principles simultaneously (Strubell et al., 2019):
- Bounded Rationality: Each control loop iteration operates within fixed token budgets, preventing runaway reasoning chains
- Degeneracy Detection: The template systematically excludes orchestration layers, persistent databases, and external tool dependencies unless specific failure cases demand them
- Minimality by Default: The architecture begins with zero external dependencies, adding only essential components validated through constraint testing
In MCD, the prompt is not just a query mechanism but an executable symbolic script (Liu et al., 2023; Wei et al., 2022). It contains embedded routing logic that acts as a runtime pathway, eliminating the need for external orchestration. This is achieved through:
- Intent Identification: The prompt itself is structured to parse the user’s intent (Brown et al., 2020).
- Decision Delegation: The agent uses token patterns to route tasks. For example, it encodes decision branches as token-level cues (e.g., ‘If intent contains booking, delegate to appointment_slot_logic’) (Kojima et al., 2022).
- Task Routing: The agent uses a minimal symbolic input to trigger the correct execution path (Shinn et al., 2023).
These symbolic decisions are evaluated in Chapter 6 under the Prompt Routing test (T3) to verify their capability under compressed prompt windows (Min et al., 2022).
A sample agent prompt implementing executable routing might look like:
System: You are a lightweight stateless assistant.
User: I want to book an appointment.
Agent: [intent = 'book_appointment'] → Run booking_routine
If [specialty missing] → Ask: "What kind of doctor?"
If [time missing] → Ask: "What date or time works for you?"
Else → Confirm with minimal prompt.
This structure uses symbolic token decisions to implement stateless routing logic (Sahoo et al., 2024).
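To make the routing mechanism concrete, the following TypeScript sketch shows how token-level cues of this kind could be mapped to named execution paths. The routine names and keyword lists are illustrative assumptions chosen to mirror the prompt above, not a fixed part of the MCD template.

```typescript
// Illustrative stateless router: intent is inferred from token patterns
// and mapped to an execution path, mirroring the prompt cue
// "If intent contains booking, delegate to appointment_slot_logic".
const ROUTES: Record<string, string[]> = {
  appointment_slot_logic: ["book", "appointment", "schedule"],
  cancellation_routine: ["cancel", "reschedule"],
};

function routeIntent(userInput: string): string {
  const tokens = userInput.toLowerCase().split(/\s+/);
  for (const [routine, cues] of Object.entries(ROUTES)) {
    if (cues.some((cue) => tokens.includes(cue))) return routine;
  }
  return "fallback_routine"; // safe default when no symbolic cue matches
}

// routeIntent("I want to book an appointment") → "appointment_slot_logic"
```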
Validation Preview: This symbolic routing approach demonstrates constraint-resilience in healthcare appointment scenarios (W1), maintaining 80% success rate under standard conditions while achieving 75% performance retention under Q1 constraint pressure—compared to 40% retention for Few-Shot and 25% for conversational approaches under identical constraint conditions. T4 testing validates 96% context preservation in stateless reconstruction, confirming the effectiveness of token-level decision logic.
Beyond Prompt Engineering:
This approach represents Bounded Rationality applied to decision architecture—symbolic routing constrains computational pathways within minimal token boundaries, eliminating the need for complex orchestration layers that would violate resource constraints in edge deployment scenarios (Xu et al., 2023).
5.2.1 Domain-Specific Prompt Adaptation Patterns
The symbolic routing logic introduced above manifests differently across the three domain-specific walkthroughs in Chapter 7, revealing fundamental differences in how MCD prompts must adapt to task characteristics (Yin, 2017). Understanding these adaptation patterns clarifies when dynamic intent parsing versus deterministic rule execution is necessary under constraint-first design principles.
Dynamic Slot-Filling: Healthcare Appointment Booking (W1)
The healthcare booking agent implements dynamic slot-filling logic that adapts based on user input completeness:
MCD Structured Implementation:
Task: Extract appointment slots [doctortype, date, time]
Rules: Complete slots → "Confirmed [type, date, time]. ID [ID]"
Missing slots → "Missing [slots] for [type] appointment"
Adaptive Behavior:
Input: "I want to book an appointment" → Output: "Missing [time, date, type] for appointment" Input: "Cardiology tomorrow at 2pm" → Output: "Confirmed Cardiology, tomorrow, 2PM. ID [generated]"
This dynamic routing is necessary because natural language appointment requests vary unpredictably in information density. The prompt must conditionally identify missing slots and request specific information, requiring symbolic intent parsing at runtime (Brown et al., 2020).
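A minimal sketch of this conditional slot logic is shown below, assuming the slots have already been extracted from the user utterance (the extraction itself is performed by the prompt at runtime); the interface and function names are illustrative.

```typescript
// Hypothetical slot-filling check for the W1 booking agent: identify
// which of the three slots are present, then either confirm or request
// the missing information, as in the structured prompt above.
interface BookingSlots {
  doctortype?: string;
  date?: string;
  time?: string;
}

function bookingResponse(slots: BookingSlots): string {
  const missing = (["doctortype", "date", "time"] as const).filter(
    (slot) => slots[slot] === undefined
  );
  if (missing.length === 0) {
    return `Confirmed ${slots.doctortype}, ${slots.date}, ${slots.time}. ID [generated]`;
  }
  return `Missing [${missing.join(", ")}] for appointment`;
}

// bookingResponse({}) → "Missing [doctortype, date, time] for appointment"
```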
Deterministic Spatial Logic: Indoor Navigation (W2)
In contrast, the navigation agent uses coordinate-based transformation rules that follow predictable spatial logic:
MCD Structured Implementation:
Navigate: Parse coordinates [start→target], identify [obstacles]
Output format: Direction→Distance→Obstacles
Constraints: Structured spatial logic, max 20 tokens, no explanations
Semi-Static Behavior:
Input: "Navigate from A1 to B3" → Output: "North 2m, East 1m" Input: "A1 to B3, avoid C2" → Output: "North 2m (avoid C2), East 1m"
This deterministic approach is viable because navigation operates on structured coordinate systems with fixed spatial relationships. The directional calculations (North/South/East/West) from coordinate pairs follow mathematical rules rather than requiring natural language interpretation (Lynch, 1960). While implemented through MCD's stateless prompt architecture for consistency, the underlying logic could theoretically be hardcoded as coordinate transformation functions.
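A minimal sketch of such a hardcoded transformation is given below, assuming a labelled grid (letter column, numeric row) with 1 m cell spacing; the function name and grid conventions are illustrative assumptions rather than the thesis implementation.

```typescript
// Illustrative hard-coded coordinate transform for W2-style grid cells
// (e.g. "A1" → column A, row 1), assuming 1 m spacing between cells.
function gridDelta(start: string, target: string): string {
  const col = (cell: string) => cell.charCodeAt(0) - "A".charCodeAt(0);
  const row = (cell: string) => parseInt(cell.slice(1), 10);

  const northSouth = row(target) - row(start); // positive → North
  const eastWest = col(target) - col(start);   // positive → East

  const steps: string[] = [];
  if (northSouth !== 0)
    steps.push(`${northSouth > 0 ? "North" : "South"} ${Math.abs(northSouth)}m`);
  if (eastWest !== 0)
    steps.push(`${eastWest > 0 ? "East" : "West"} ${Math.abs(eastWest)}m`);
  return steps.join(", ");
}

// gridDelta("A1", "B3") → "North 2m, East 1m"
```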
Dynamic Classification: System Diagnostics (W3)
System diagnostics require heuristic classification logic that routes based on issue complexity:
MCD Structured Implementation:
Task: Classify system issues into [category, priority, diagnosticsteps]
Rules: P1→P2→P3 priority | Category [type], Priority [level], Steps [sequence]
Missing info → "Insufficient data for [category] classification"
Adaptive Behavior:
Input: "Server crash" → Output: "Category: Infrastructure, Priority: P1, Steps: [Check logs→services→hardware]" Input: "Something's slow" → Output: "Insufficient data for classification"
This dynamic classification adapts based on diagnostic information availability, requiring heuristic pattern matching across multiple categories and priority levels with varying step sequences depending on issue type (Basili et al., 1994).
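The sketch below illustrates one way such heuristic routing could be expressed in code. The cue lists, categories, priorities, and step sequences are illustrative assumptions chosen to reproduce the two example outputs above.

```typescript
// Hypothetical classification heuristic for W3: route on keyword cues,
// fall back to the "insufficient data" response when no category matches.
interface Diagnosis {
  category: string;
  priority: "P1" | "P2" | "P3";
  steps: string[];
}

const HEURISTICS: Array<{ cues: string[]; result: Diagnosis }> = [
  {
    cues: ["crash", "down", "outage"],
    result: {
      category: "Infrastructure",
      priority: "P1",
      steps: ["Check logs", "services", "hardware"],
    },
  },
  {
    cues: ["login", "password", "access"],
    result: { category: "Access", priority: "P2", steps: ["Verify credentials", "reset"] },
  },
];

function classifyIssue(report: string): string {
  const text = report.toLowerCase();
  const match = HEURISTICS.find((h) => h.cues.some((cue) => text.includes(cue)));
  if (!match) return "Insufficient data for classification";
  const { category, priority, steps } = match.result;
  return `Category: ${category}, Priority: ${priority}, Steps: [${steps.join("→")}]`;
}
```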
Architectural Implications for MCD Design
| Walkthrough | Prompt Type | Adaptation Mechanism | Design Rationale |
|---|---|---|---|
| W1: Healthcare Booking | Dynamic | Conditional slot extraction with variable missing-data prompts | Natural language request variability requires runtime intent parsing |
| W2: Spatial Navigation | Semi-Static | Deterministic coordinate calculations with fixed directional rules | Structured spatial relationships enable mathematical transformation logic |
| W3: System Diagnostics | Dynamic | Heuristic category routing with priority-based step sequencing | Issue complexity variation demands adaptive classification paths |
This pattern distinction demonstrates a critical MCD principle: constraint-first design must match prompt logic complexity to task structure (Kahneman, 2011). Over-engineering navigation with dynamic NLP parsing wastes tokens; under-engineering diagnostics with hardcoded rules fails to handle variable issue patterns. W1 and W3 implement symbolic routing that adapts to user intent, while W2 leverages deterministic logic where task structure permits (Kojima et al., 2022).
Cross-Reference to Validation: These adaptation patterns are empirically validated through comparative strategy testing in Chapter 7, where MCD's structured approaches achieve 75-80% performance retention under Q1 constraint pressure compared to 25-40% for conversational baselines (detailed in Sections 7.2-7.4).
To operate without persistent memory, context is anchored entirely within the prompt using several techniques (Lewis et al., 2020; Thoppilan et al., 2022):
- Declarative Token Packing: Semantically rich content is transformed into token-efficient representations (e.g., an appointment request becomes [intent: book], [time: today], [specialty: neuro]) (Radford et al., 2021).
- Token Window Budgeting: Each prompt is budgeted using the formula total_window = core_logic_tokens + fallback_tokens + input_compression_tokens (Howard et al., 2017). This budget is typically constrained to 128–256 tokens for browser-based WebLLM deployments. For example, in the Drone Navigation walkthrough (W2, Ch. 7), waypoint data is expressed as compressed spatial tokens like [N, 2], [E, 3] instead of verbose instructions, preserving space for fallback logic (see the budgeting sketch after this list).
- Symbol Compression for Inference: If total_window exceeds the pre-set budget, the Capability Plateau Detector (4.5) is invoked to flag potential prompt bloat (Perez et al., 2022).
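The budgeting formula and overflow check can be summarised as a short sketch; the field names are illustrative, and the bloat flag stands in for the Capability Plateau Detector (4.5) rather than reproducing its actual diagnostics.

```typescript
// Sketch of the token-window budgeting formula:
// total_window = core_logic_tokens + fallback_tokens + input_compression_tokens
interface PromptBudget {
  coreLogicTokens: number;
  fallbackTokens: number;
  inputCompressionTokens: number;
}

const MAX_WINDOW = 256; // typical upper bound for browser-based WebLLM deployments

function totalWindow(b: PromptBudget): number {
  return b.coreLogicTokens + b.fallbackTokens + b.inputCompressionTokens;
}

function checkBudget(b: PromptBudget): { total: number; bloated: boolean } {
  const total = totalWindow(b);
  return { total, bloated: total > MAX_WINDOW }; // bloated → flag potential prompt bloat
}

// checkBudget({ coreLogicTokens: 120, fallbackTokens: 60, inputCompressionTokens: 40 })
// → { total: 220, bloated: false }
```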
This token efficiency is validated in T1 and T5 in Chapter 6, ensuring the design remains within deployment constraints (Li et al., 2024).
MCD agents are designed to recover gracefully from ambiguity or user error by invoking structured fallback loops embedded in their prompt logic (Kadavath et al., 2022). All fallback loops terminate in one of three states: task completion, symbolic abandonment, or escalation (e.g., a ‘defer to human’ message) (Lin et al., 2022). This involves:
- Re-prompting for clarification.
- Controlled failure and safe exits.
- Stateless retry logic.
These fallback flows are mapped using failure diagrams (Appendix D) and validated using the loop complexity and semantic collapse diagnostics in Appendix E (Basili et al., 1994). For example, the appointment booking agent’s Loop 2 recovery (see Table 5.1) maps to the Redundancy Index thresholds defined in simulation test T6. As a concrete example from Chapter 7, the agent maps the input ‘I want to book something for tomorrow’ to a symbolic routing node: {intent: ‘appointment_booking’, time: ‘tomorrow’}, which is encoded directly in the prompt logic.
Table 5.1: Fallback loop stages for the appointment booking agent
| Loop Stage | Condition Trigger | Action |
|---|---|---|
| Loop 1 | Missing time or specialty | Re-prompt: “Please specify a time and specialty.” |
| Loop 2 | Invalid doctor name or unavailability | Re-prompt with a list of available options. |
| Loop 3 | Repeated error or ambiguity | Exit with: “Unable to book. Please try again later.” |
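The three loop stages in Table 5.1 can be encoded as a small, bounded decision function; the trigger labels below are illustrative, while the messages follow the table.

```typescript
// Hypothetical encoding of the recovery loops in Table 5.1. The loop
// count is bounded: the third stage (or any repeated error) always
// exits, keeping the fallback flow stateless and terminating.
type LoopAction =
  | { kind: "reprompt"; message: string }
  | { kind: "exit"; message: string };

function fallbackStep(loop: 1 | 2 | 3, trigger: string): LoopAction {
  if (loop === 1 && trigger === "missing_time_or_specialty") {
    return { kind: "reprompt", message: "Please specify a time and specialty." };
  }
  if (loop === 2 && trigger === "invalid_or_unavailable_doctor") {
    return { kind: "reprompt", message: "Available options: [list]. Please choose one." };
  }
  // Loop 3 or repeated error/ambiguity: controlled failure and safe exit.
  return { kind: "exit", message: "Unable to book. Please try again later." };
}
```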
Empirical Fallback Validation:
Structured fallback loops achieve 83% recovery from degraded inputs compared to 41% for free-form conversational approaches (T3) (Ouyang et al., 2022). The two-loop maximum prevents semantic drift while maintaining 420ms average resolution time, validating bounded recovery design (T9).
To rigorously explore minimal capability agents, we formalize a three-tier capability structure based on quantization levels that reflects real-world deployment constraints (Jacob et al., 2018; Nagel et al., 2021).
| Capability Tier | Architectural Purpose | Representative Models | Target Environment |
|---|---|---|---|
| Q1 | Ultra-minimal simulation— Extreme constraint testing; evaluates framework stability under severe resource limitations | Simulated decoding (Top-1, 0 temp, ≤16 tokens) | Embedded or ultra-low-power devices |
| Q4 | Optimal balance point— Realistic minimal models that maintain capability while respecting constraint boundaries | TinyLlama, SmolLM, Qwen 1.5B/3B (q4f16) | Web, mobile, edge |
| Q8 | Strategic fallback tier— Higher-capability models for complex task recovery while preserving minimality principles | Phi-3.5, Gemma, Mixtral (q4f32 or q8) | Full-stack fallback or cloud |
Constraint-Progressive Validation: This tiered structure enables systematic testing of constraint-resilience—measuring how agents maintain functionality across progressive capability tiers. Unlike traditional benchmarking that optimizes for peak performance (Q8), MCD validates minimal sufficiency by testing whether lower tiers (Q1/Q4) achieve equivalent task completion with superior resource efficiency (Dettmers et al., 2022).
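The tier structure can be read as a set of deployment presets with a simple escalation rule. In the sketch below, the model names and target environments follow the table above; the Q1 decoding limits follow the simulation protocol later in this section, while the Q4/Q8 token and temperature values are placeholders for illustration only.

```typescript
// Illustrative tier presets; only the Q1 limits are taken from the
// simulation protocol, Q4/Q8 numeric values are placeholders.
interface TierPreset {
  models: string[];     // representative models from the tier table
  target: string;       // target deployment environment
  maxTokens: number;    // decoding budget (Q1 value from protocol; others illustrative)
  temperature: number;  // 0 → deterministic Top-1 decoding
}

const TIERS: Record<"Q1" | "Q4" | "Q8", TierPreset> = {
  Q1: { models: ["simulated decoding"], target: "embedded / ultra-low-power", maxTokens: 16, temperature: 0 },
  Q4: { models: ["TinyLlama", "SmolLM", "Qwen 1.5B/3B (q4f16)"], target: "web, mobile, edge", maxTokens: 128, temperature: 0.2 },
  Q8: { models: ["Phi-3.5", "Gemma", "Mixtral (q4f32/q8)"], target: "full-stack fallback / cloud", maxTokens: 256, temperature: 0.7 },
};

// Strategic fallback: start at the lowest sufficient tier and escalate
// only when a task cannot be completed within that tier's constraints.
function nextTier(current: "Q1" | "Q4" | "Q8"): "Q4" | "Q8" {
  return current === "Q1" ? "Q4" : "Q8";
}
```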
Architectural Minimality Across Tiers:
Each tier implements Minimality by Default through progressive capability restriction (Frantar et al., 2023):
- Q1: Ultra-minimal baseline with zero external dependencies
- Q4: Optimal balance maintaining MCD principles while enabling practical deployment
- Q8: Strategic fallback preserving minimalist architecture while providing recovery capability
Degeneracy Detection operates across all tiers, systematically removing unused computational pathways regardless of available resources (Zafrir et al., 2019).
Q1 Ultra-Minimal Simulation Protocol
Since true 1-bit quantized LLMs remain technically infeasible as of 2025 (though emerging research suggests future viability), Q1 conditions are simulated through architectural constraints that functionally replicate extreme quantization effects rather than actual bit-precision reduction (Haas et al., 2017). This simulation protocol creates a conservative constraint boundary that tests framework resilience beyond currently available quantization implementations (Jacob et al., 2018; Nagel et al., 2021).
The Q1 tier enforces the following constraints:
- Token budget constraint: ≤16 tokens maximum per interaction
- Deterministic decoding: Top-1 greedy selection (temperature = 0)
- Stateless enforcement: Zero context retention between interactions
- Latency simulation: Introduces realistic edge processing delays
By simulating 1-bit conditions through deterministic decoding and extreme token budgets, this approach ensures MCD principles remain valid even as hardware capabilities advance toward true 1-bit inference (Dettmers et al., 2022; Frantar et al., 2023). The simulation approximates the resource scarcity and performance characteristics expected from ultra-low-precision quantization without requiring actual 1-bit hardware implementations, enabling systematic validation of constraint-resilience under conditions that exceed current deployment limitations (Zafrir et al., 2019).
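The sketch below shows how these Q1 constraints could be enforced around an arbitrary model call. The `generate` parameter is a hypothetical stand-in for the underlying inference function (e.g. a WebLLM call), and the latency value is an arbitrary placeholder rather than a measured figure.

```typescript
// Illustrative enforcement of the Q1 simulation constraints listed above.
interface Q1Constraints {
  maxTokens: number;   // token budget: ≤16 tokens per interaction
  temperature: number; // 0 → deterministic Top-1 greedy decoding
  latencyMs: number;   // simulated edge processing delay (placeholder value)
}

async function q1Generate(
  prompt: string,
  generate: (p: string, opts: { maxTokens: number; temperature: number }) => Promise<string>,
  constraints: Q1Constraints = { maxTokens: 16, temperature: 0, latencyMs: 150 }
): Promise<string> {
  // Stateless enforcement: only the current prompt is passed in; no
  // conversation history object exists in this wrapper.
  await new Promise((resolve) => setTimeout(resolve, constraints.latencyMs));
  return generate(prompt, {
    maxTokens: constraints.maxTokens,
    temperature: constraints.temperature,
  });
}
```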
Modern agents span multiple architectural paradigms (Park et al., 2023; Qin et al., 2023). For minimal agents under resource constraints, it is critical to choose architectures that balance capability and cost:
- Prompt-based agents: Stateless, lowest memory footprint, excellent for edge/WASM deployment.
- Context-aware agents: Retain minimal session context or page state. May use embeddings or Redis-backed memory (Karpukhin et al., 2020).
- Self-reflective agents: Implement chain-of-thought or reflection cycles. High accuracy but incompatible with MCD goals (Zhang et al., 2022).
This thesis adopts prompt- and page-context-based approaches, avoiding persistent memory for fallback compatibility.
Table 5.4: Agent Architecture Comparison
| Agent Type | Memory Required | Toolchain Dependence | Prompt Size Flexibility | MCD Compatibility | Deployment Fit |
|---|---|---|---|---|---|
| Prompt-only | ❌ No | ✅ Minimal | ⚠️ Moderate | ✅ Full | Edge/Mobile |
| Context-aware | ✅ Yes | ⚠️ Redis/Embedding | ✅ Large | ❌ Limited | Full-stack Web |
| Self-reflective | ✅ Yes | ❌ High | ✅ Expansive | ❌ Incompatible | Cloud / R&D |
| MCD Tiered Agent | ❌ No | ✅ Quantization only | ⚠️ Constrained | ✅ Full | Web, Edge |
Deployment Context Differentiation: This analysis demonstrates that MCD’s prompt-centric approach sacrifices peak performance capabilities for constraint-resilience and deployment flexibility (Schwartz et al., 2020). While context-aware and self-reflective agents excel in resource-abundant environments, MCD provides stable functionality when resource constraints eliminate traditional architectural approaches.
MCD principles operate across all architectural layers, not just prompt design (Bommasani et al., 2021):
System Architecture Level:
- Bounded Rationality: WebAssembly deployment constraints enforce computational frugality across runtime, memory allocation, and execution cycles
- Degeneracy Detection: Systematic removal of unused JavaScript modules, redundant API endpoints, and dormant execution paths
- Minimality by Default: Zero-dependency deployment baseline, with external tools added only after constraint-bounded failure analysis
Runtime Execution Level:
- Bounded Rationality: Token budgets constrain not just prompts but tool invocations, context reconstruction, and fallback iterations
- Degeneracy Detection: Dynamic pruning of unused routing branches and idle capability modules during execution
- Minimality by Default: Stateless regeneration protocols that reconstruct context without persistent storage systems
This comprehensive application distinguishes MCD from optimization approaches that focus solely on model compression or prompt efficiency.
For safety and compatibility with minimal contexts, agents in this thesis support bounded adaptation using regeneration protocols (e.g., MCP) (Anthropic, 2024). These protocols reconstruct sufficient local context without storing state, ensuring:
- Compatibility with browser and serverless environments
- Avoidance of over-engineering (e.g., full memory graphs, chat threading)
- Safe fallback behavior under uncertain input
Entropy-based heuristics and stateless fallback ensure robust behavior even in failure-prone, low-capacity models (Q1/Q4 tiers) (Barocas et al., 2017).
For instance, in a symbolic calendar agent, the MCP may represent user state as:
[MCP] = [intent: 'add_event'], [date: '2025-09-01'], [time: '10:00'], [desc: 'Team Sync']
This minimal context can be reconstructed from user text like “Add a meeting at 10am on September 1” without retaining prior dialogue turns. Each prompt regeneration encodes such MCP states inline, preserving context without memory.
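A minimal sketch of such stateless reconstruction is shown below, assuming the calendar example above; the field names, keyword cues, and regular expressions are illustrative and deliberately simplified, since no dialogue history is available to the agent.

```typescript
// Hypothetical reconstruction of an MCP-style state from raw user text,
// without consulting any prior dialogue turns.
interface McpState {
  intent: string;
  date?: string;
  time?: string;
  desc?: string;
}

function reconstructMcp(userText: string): McpState {
  const text = userText.toLowerCase();
  const state: McpState = {
    intent: text.includes("add") || text.includes("meeting") ? "add_event" : "unknown",
  };
  const time = text.match(/(\d{1,2})(am|pm)/);
  if (time) state.time = `${time[1]}:00 ${time[2].toUpperCase()}`;
  // Date parsing kept deliberately minimal for this sketch.
  if (text.includes("september 1")) state.date = "2025-09-01";
  return state;
}

// reconstructMcp("Add a meeting at 10am on September 1")
// → { intent: "add_event", time: "10:00 AM", date: "2025-09-01" }
```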
These architectural decisions reflect comprehensive MCD implementation rather than isolated prompt optimization, validated through systematic constraint testing in Chapter 6 and applied domain analysis in Chapter 7.
Safety Validation Evidence:
Under constraint overload, MCD approaches exhibit safe failure modes with transparent limitation acknowledgment, while over-engineered systems generate confident but incorrect responses (87% hallucination rate vs 0% for MCD in T7 stress testing). This validates bounded adaptation as a safety mechanism.
This chapter detailed how the MCD framework is instantiated into a concrete, testable agent template. The template’s stateless logic, symbolic prompt routing, and fallback-safe control flows are designed for minimal hardware assumptions. This instantiation serves as the operational baseline for the framework’s evaluation in the subsequent chapters.
The Prompt Layer (4.3.1) is validated via tests T1–T3 for symbolic routing and minimal reasoning.
The principles of the Memory Layer (4.6.2) are tested in T4–T5 for stateless regeneration.
The Fallback Readiness (4.6.4) is assessed in T6–T9 for controlled failure recovery.
These components are then applied in the domain-specific walkthroughs in Chapter 7, ensuring that the theoretical design translates into practical, edge-ready agent behavior.
These stateless designs are mapped directly to simulation tests T1–T9 described in Chapter 6, allowing for structured validation of each agent behavior under symbolic, quantized, and degraded conditions. This connection ensures that theoretical design principles are not merely assumed but empirically tested.
Having defined and instantiated the MCD framework, we now turn to its validation. Part III begins with constrained simulations that probe MCD’s robustness, followed by applied walkthroughs, comparative evaluation, and conclusions. These empirical and practical evaluations determine whether MCD, as designed, holds up under real-world limitations.
🧩 Part III: Validation, Extension, and Conclusion
Having laid the conceptual foundation of Minimal Capability Design (MCD) in Parts I and II, this final part transitions into validation and evaluation. It demonstrates how MCD performs under real-world constraints, both in controlled simulations and applied agent workflows.
This part follows a coherent arc: it begins with simulation tests that probe MCD’s core principles under stress (Chapter 6), then applies these principles in domain-specific walkthroughs (Chapter 7). Next, it evaluates MCD’s sufficiency and trade-offs against full-stack frameworks (Chapter 8), proposes forward-looking extensions (Chapter 9), and concludes with a synthesis of findings (Chapter 10).
Together, these chapters test the viability, robustness, and generalizability of MCD in constrained environments.

