Designing Lightweight AI Agents for Edge Deployment
A Minimal Capability Framework with Insights from Literature Synthesis
Contents Overview
This appendix provides detailed architectural diagrams for each of the MCD layers: the Prompt Layer, the Stateless Control Layer, the Execution Layer, and the integrated Fallback mechanisms. These visual representations clarify how MCD avoids orchestration-heavy pipelines while maintaining architectural discipline.
Purpose Statement
To visually link the subsystem designs from Chapter 4 with the instantiated agent architecture in Chapter 5, demonstrating how MCD principles (Minimality by Default, Bounded Rationality, Degeneracy Detection) manifest in concrete system architecture without requiring complex orchestration frameworks.
Figure D.1: Complete MCD Layer Architecture
┌─────────────────────────────────────────────────────────────┐
│ PROMPT LAYER (Section 4.3.1) │
├─────────────────────────────────────────────────────────────┤
│ • 90-130 token capability plateau (Bounded Rationality) │
│ • Zero-shot baseline prompting (Minimality by Default) │
│ • Embedded fallback logic (Degeneracy Detection) │
│ • Symbolic routing with IF-THEN decision trees │
│ │
│ Input: User Query → Intent Router → Decision Prompt │
│ Output: Symbolic routing tokens + Execution instructions │
└─────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────┐
│ STATELESS CONTROL LAYER (Section 4.3.2) │
├─────────────────────────────────────────────────────────────┤
│ • In-prompt routing logic (No external orchestration) │
│ • Deterministic fallback paths (Bounded Rationality) │
│ • Symbolic decision trees (≤3 depth, ≤4 branches) │
│ • Context regeneration without persistent memory │
│ │
│ Flow: Intent Classification → Route Selection → Context │
│ Anchoring → Execution Triggering │
└─────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────┐
│ EXECUTION LAYER (Section 4.3.3) │
├─────────────────────────────────────────────────────────────┤
│ • Q1/Q4/Q8 quantization tiers (Hardware-aware) │
│ • Local inference only (WebAssembly/llama.cpp) │
│ • Dynamic tier routing: Q1→Q4→Q8 (drift >10% threshold) │
│ • Resource constraints: <512MB RAM, <500ms latency │
│ │
│ Components: Quantized LLM → Local Runtime → Response │
└─────────────────────────────────────────────────────────────┘
↓
RESPONSE OUTPUT
Figure D.2: Prompt Layer Design Pattern
USER INPUT
↓
┌─────────────────────────────────────────────────┐
│ PROMPT STRUCTURE │
├─────────────────────────────────────────────────┤
│ System: [Lightweight stateless assistant] │
│ Context: [Compressed state tokens] │
│ Intent Router (Symbolic Decision Tree): │
│ • IF intent=booking → appointment_logic │
│ • IF intent=navigation → spatial_logic │
│ • IF intent=diagnostic → heuristic_logic │
│ • ELSE → clarification_logic │
│ Fallback: [Bounded loops ≤2 iterations] │
│ Output Format: [Structured symbolic tokens] │
└─────────────────────────────────────────────────┘
↓
SYMBOLIC ROUTING DECISION
↓
EXECUTION PATHWAY
Key Components:
- Token-efficient context packing: intent=book, time=today, specialty=neuro (explicit slot passing, T4 validation)
- Embedded routing logic: Decision branches encoded as IF-THEN token patterns (Section 5.2.1)
- Fallback safety: Bounded clarification loops (≤2 iterations, Anti-Pattern 4)
- Adaptation patterns: Dynamic (W1/W3), Semi-Static (W2) routing strategies (Table 5.1)
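The prompt structure in Figure D.2 can be illustrated with a short sketch. This is a minimal Python example under stated assumptions, not the implementation used in the thesis: the function names (pack_slots, select_route, build_prompt) and the exact prompt wording are hypothetical, while the route names and the two-iteration fallback bound are taken from the figure.

```python
from typing import Dict

# Route table mirroring the IF-THEN branches in Figure D.2.
ROUTES: Dict[str, str] = {
    "booking": "appointment_logic",
    "navigation": "spatial_logic",
    "diagnostic": "heuristic_logic",
}
DEFAULT_ROUTE = "clarification_logic"  # the ELSE branch
MAX_FALLBACK_LOOPS = 2                 # "Bounded loops <=2 iterations"


def pack_slots(slots: Dict[str, str]) -> str:
    """Serialize slots as compact 'key=value' pairs (hypothetical helper)."""
    return ", ".join(f"{key}={value}" for key, value in slots.items())


def select_route(intent: str) -> str:
    """Return the symbolic route for an intent, or the ELSE branch."""
    return ROUTES.get(intent, DEFAULT_ROUTE)


def build_prompt(user_input: str, slots: Dict[str, str]) -> str:
    """Assemble a single self-contained prompt; the wording is an assumption."""
    router_lines = [
        f"IF intent={intent} -> {route}" for intent, route in ROUTES.items()
    ]
    router_lines.append(f"ELSE -> {DEFAULT_ROUTE}")
    return "\n".join([
        "System: Lightweight stateless assistant",
        f"Context: {pack_slots(slots)}",
        "Intent Router:",
        *router_lines,
        f"Fallback: bounded loops <= {MAX_FALLBACK_LOOPS} iterations",
        "Output Format: structured symbolic tokens",
        f"User: {user_input}",
    ])


if __name__ == "__main__":
    slots = {"intent": "book", "time": "today", "specialty": "neuro"}
    print(build_prompt("I need a neurologist today", slots))
```

Because the routing table and the slots both travel inside the prompt text, no external orchestrator is needed to hold them between turns.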
Figure D.3: Control Layer Decision Logic
PROMPT INPUT
↓
┌─────────────────────────────────────────────────┐
│ INTENT CLASSIFICATION │
│ ┌─────────────┐ ┌─────────────┐ ┌──────────┐ │
│ │ BOOKING │ │ NAVIGATION │ │DIAGNOSTIC│ │
│ │ Route A │ │ Route B │ │ Route C │ │
│ │ (Dynamic) │ │(Semi-Static)│ │(Dynamic) │ │
│ └─────────────┘ └─────────────┘ └──────────┘ │
└─────────────────────────────────────────────────┘
↓ ↓ ↓
ROUTE A: Booking ROUTE B: Navigation ROUTE C: Diagnostic
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ • Dynamic slot │ │ • Deterministic │ │ • Heuristic │
│ extraction │ │ coordinate │ │ category │
│ • Clarification │ │ calculation │ │ routing │
│ • Confirmation │ │ • Landmark refs │ │ • Priority │
│ (W1 pattern) │ │ (W2 pattern) │ │ (W3 pattern) │
└─────────────────┘ └─────────────────┘ └─────────────────┘
↓ ↓ ↓
FALLBACK ROUTE (if needed)
┌─────────────────────────┐
│ • Bounded clarification │
│ • Safe limitation exit │
│ • Controlled failure │
└─────────────────────────┘
↓
EXECUTION LAYER
Control Flow Characteristics:
- No persistent state: Each decision cycle is self-contained (T4: 5/5 stateless success)
- Symbolic routing: Token patterns trigger execution paths (Section 5.2.1)
- Bounded fallback: Maximum 2-loop recovery prevents semantic drift (T5: >3 steps causes drift)
- Context regeneration: State reconstructed from explicit slot reinjection (Section 4.2)
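The control flow characteristics above can be sketched as a stateless decision cycle plus a bounds check on the decision tree. This is an illustrative Python sketch, not the thesis implementation: the Node class, the REQUIRED_SLOTS table, and the convention that the root counts as depth 1 are all assumptions introduced here.

```python
from dataclasses import dataclass, field
from typing import Dict, List

MAX_DEPTH = 3     # "<=3 depth" from Figure D.1
MAX_BRANCHES = 4  # "<=4 branches" from Figure D.1

# Hypothetical slot requirements per intent, for illustration only.
REQUIRED_SLOTS: Dict[str, List[str]] = {
    "booking": ["time", "specialty"],
    "navigation": ["destination"],
    "diagnostic": ["symptom"],
}


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)


def within_bounds(node: Node, depth: int = 1) -> bool:
    """Check the tree against the depth and branching limits (root = depth 1)."""
    if depth > MAX_DEPTH or len(node.children) > MAX_BRANCHES:
        return False
    return all(within_bounds(child, depth + 1) for child in node.children)


def decision_cycle(slots: Dict[str, str]) -> Dict[str, str]:
    """One self-contained cycle: everything it needs arrives in `slots`."""
    intent = slots.get("intent", "")
    if intent not in REQUIRED_SLOTS:
        return {"route": "fallback", "reason": "unknown_intent"}
    missing = [name for name in REQUIRED_SLOTS[intent] if name not in slots]
    if missing:
        return {"route": "clarify", "missing": missing[0]}
    context = ", ".join(f"{key}={value}" for key, value in slots.items())
    return {"route": "execute", "context": context}


if __name__ == "__main__":
    first = decision_cycle({"intent": "booking", "time": "today"})
    print(first)  # asks for the missing slot
    # Context regeneration: the caller reinjects every slot explicitly.
    second = decision_cycle(
        {"intent": "booking", "time": "today", "specialty": "neuro"}
    )
    print(second)
```

The function keeps no module-level or session state, so the second call succeeds only because the caller passes all slots again, which is the explicit slot reinjection the list describes.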
Figure D.4: Tiered Execution Model
TASK COMPLEXITY ASSESSMENT
↓
┌─────────────────────────────────────────────────┐
│ TIER SELECTION LOGIC (T10) │
├─────────────────────────────────────────────────┤
│ Q1: Ultra-minimal (Qwen2-0.5B, 300MB RAM) │
│ ↓ (if semantic drift >10%) │
│ Q4: Optimal balance (TinyLlama-1.1B, 560MB) │
│ ↓ (if performance <80% or timeout) │
│ Q8: Strategic fallback (Llama-3.2-1B, 800MB) │
│ │
│ Evidence: Q4 optimal for 80% of tasks (T10) │
└─────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────┐
│ LOCAL EXECUTION RUNTIME (T8) │
├─────────────────────────────────────────────────┤
│ WebAssembly Runtime (Browser deployment) │
│ OR │
│ llama.cpp (Native/Raspberry Pi deployment) │
│ OR │
│ WebLLM (JavaScript-based inference) │
│ │
│ Validated Constraints (T8): │
│ • No backend servers (edge-first principle) │
│ • Local inference only │
│ • <500ms average latency (Q4 tier: 430ms) │
│ • <512MB memory stable deployment │
└─────────────────────────────────────────────────┘
↓
RESPONSE OUTPUT
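The tier selection logic in Figure D.4 can be expressed as a small escalation loop. The sketch below is a hedged Python illustration: the tier names, models, memory figures, and thresholds come from the figure, but the Result fields, the infer callback, and the way drift and performance are scored are assumptions, since this appendix does not define how those quantities are measured.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Tier:
    name: str
    model: str
    ram_mb: int


# Tiers in escalation order, as listed in Figure D.4.
TIERS: List[Tier] = [
    Tier("Q1", "Qwen2-0.5B", 300),
    Tier("Q4", "TinyLlama-1.1B", 560),
    Tier("Q8", "Llama-3.2-1B", 800),
]

DRIFT_LIMIT = 0.10        # Q1 -> Q4 when semantic drift exceeds 10%
PERFORMANCE_FLOOR = 0.80  # Q4 -> Q8 when performance falls below 80%


@dataclass
class Result:
    """Hypothetical inference result; the scoring method is an assumption."""
    text: str
    drift: float
    performance: float
    timed_out: bool = False


def should_escalate(tier: Tier, result: Result) -> bool:
    """Apply the per-tier escalation condition from Figure D.4."""
    if tier.name == "Q1":
        return result.drift > DRIFT_LIMIT
    if tier.name == "Q4":
        return result.performance < PERFORMANCE_FLOOR or result.timed_out
    return False  # Q8 is the last tier; nothing to escalate to


def run_with_escalation(
    prompt: str, infer: Callable[[Tier, str], Result]
) -> Tuple[Tier, Optional[Result]]:
    """Try tiers in order and stop at the first one that passes its check."""
    tier, result = TIERS[0], None
    for tier in TIERS:
        result = infer(tier, prompt)
        if not should_escalate(tier, result):
            break
    return tier, result
```

A caller supplies infer as a wrapper around whichever local runtime is in use, so the escalation policy stays independent of the runtime choice.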
Figure D.5: Fallback Recovery Paths
TASK EXECUTION
↓
MONITORING LAYER (Continuous Validation)
┌─────────────────────────────────────────────────┐
│ • Semantic Drift Detection (>10% threshold, T10) │
│ • Confidence Scoring (below threshold triggers) │
│ • Response Timeout (>latency limit detection) │
│ • Input Ambiguity (unclear intent classification)│
└─────────────────────────────────────────────────┘
↓ (if failure detected)
┌─────────────────────────────────────────────────┐
│ BOUNDED FALLBACK SEQUENCE │
├─────────────────────────────────────────────────┤
│ Loop 1: Specific clarification request │
│ "Please specify [missing_slot]" │
│ ↓ (if still unclear) │
│ Loop 2: Bounded options or constraints │
│ "Choose: [option_A, option_B, option_C]" │
│ ↓ (if continued failure, max depth=2) │
│ Safe Exit: Transparent limitation │
│ "Unable to complete [task]. Limitation: │
│ [specific_constraint]. Please [action]." │
└─────────────────────────────────────────────────┘
↓
CONTROLLED TERMINATION (T7: 80% success)
Fallback Characteristics (Empirically Validated):
- Bounded loops: Maximum 2 recovery attempts (T5: >3 steps causes semantic drift)
- Progressive degradation: Each loop reduces complexity, narrows scope
- Transparent limitation: Clear acknowledgment of constraint boundaries (W2/W3 safety-critical)
- Stateless recovery: No dependency on session memory (T4: 5/5 stateless success)
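The bounded fallback sequence can be sketched as a loop that gives up after two attempts. This Python example is illustrative only: the message templates follow Figure D.5, while the function names and the ask callback (a stand-in for one user-facing turn) are hypothetical.

```python
from typing import Callable, List, Optional

MAX_FALLBACK_LOOPS = 2  # maximum recovery attempts, per Figure D.5


def clarification_prompt(loop: int, missing_slot: str, options: List[str]) -> str:
    """Loop 1 asks an open question; loop 2 narrows to bounded options."""
    if loop == 1:
        return f"Please specify {missing_slot}"
    return "Choose: [" + ", ".join(options) + "]"


def safe_exit(task: str, constraint: str, action: str) -> str:
    """Transparent limitation message used when recovery fails."""
    return f"Unable to complete {task}. Limitation: {constraint}. Please {action}."


def bounded_fallback(
    missing_slot: str,
    options: List[str],
    ask: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Return the recovered slot value, or None after the loop budget is spent.

    `ask` sends one prompt and returns the reply, or None if the reply
    is still unusable.
    """
    for loop in range(1, MAX_FALLBACK_LOOPS + 1):
        answer = ask(clarification_prompt(loop, missing_slot, options))
        if answer:
            return answer
    return None


if __name__ == "__main__":
    value = bounded_fallback("time", ["morning", "afternoon"], lambda _: None)
    if value is None:
        print(safe_exit("booking", "time not provided", "call the front desk"))
```

The loop holds no session memory of its own; each prompt is generated from the slot name and options alone, in keeping with the stateless recovery property listed above.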
Figure D.6: Complete MCD Agent Lifecycle
USER QUERY
↓
┌─────────────────────────────────────────────────┐
│ PROMPT LAYER: Intent parsing + Route selection │
│ • Adaptation pattern determination (W1/W2/W3)│
├─────────────────────────────────────────────────┤
│ CONTROL LAYER: Symbolic routing + Context mgmt │
│ • Decision tree execution (≤3 depth, ≤4 branch)│
├─────────────────────────────────────────────────┤
│ EXECUTION LAYER: Q-tier selection + Local exec │
│ • Dynamic tier routing Q1→Q4→Q8 (T10) │
├─────────────────────────────────────────────────┤
│ FALLBACK MONITORING: Error detection + Recovery│
│ • Bounded loops ≤2, transparent limitations │
└─────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────┐
│ SUCCESS PATH │
│ Task Completion → Validated Response Output │
│ Performance: 85% retention under Q1 (T10) │
└─────────────────────────────────────────────────┘
OR
┌─────────────────────────────────────────────────┐
│ FALLBACK PATH │
│ Controlled Degradation → Safe Limitation Exit │
│ Transparency: Clear constraint acknowledgment │
└─────────────────────────────────────────────────┘
Cross-References:
- Chapter 4, Section 4.6: MCD Subsystem Definitions
- Chapter 5: Instantiated Agent Design Patterns
- Chapter 6, Tests T1-T10: Empirical validation of layer interactions
- Chapter 7, Walkthroughs W1-W3: Applied layer architecture in domain scenarios
This configuration framework ensures reproducible, statistically valid results while maintaining the ecological validity of real-world deployment constraints. All parameters were optimized for browser-based execution environments typical of edge AI deployment scenarios.