🧱 Part I: Foundations

📘 Chapter 3: Methodology

This chapter outlines the research strategy used to formulate, instantiate, and evaluate the Minimal Capability Design (MCD) framework (Peffers et al., 2007). The methodology combines constructive design—deriving the MCD framework from literature synthesis—with evaluation via constrained simulations and domain walkthroughs (March & Smith, 1995). This design-science approach creates the artifact (the framework) and tests its internal coherence through use-oriented demonstration (Gregor & Hevner, 2013).

3.1 Research Design

The research is grounded in two complementary paradigms (Creswell & Creswell, 2017):

Constructive Design: The MCD framework is inductively derived from an extensive literature analysis, grounded in architectural failures and over-engineering patterns observed in existing AI agents (Järvinen, 2007; Kasanen et al., 1993). This process emphasizes abstraction, simplification, and design synthesis over direct empirical comparison.

Evaluative Demonstration: Rather than proving universal superiority through performance benchmarks, this work validates MCD principles through constraint-resilience testing via progressive resource degradation scenarios (Chapter 6) and domain-specific walkthroughs (Chapter 7) (Venable et al., 2016). This approach specifically measures how agents maintain functionality as computational resources decrease, testing MCD’s core hypothesis that predictable constraint-handling outweighs peak performance optimization in edge deployment scenarios (Singh et al., 2023).

This dual strategy reflects the epistemic stance of design science research: creating an artifact (the MCD framework) and validating its internal coherence and utility through demonstration (Hevner et al., 2004; March & Smith, 1995).

Table 3.1: Methodological Framework Components

| Methodological Element | Description |
| --- | --- |
| Framework Construction | Literature-grounded synthesis of design principles. |
| Simulation | Browser-based heuristic stress tests under emulated edge constraints. |
| Walkthroughs | Domain-grounded validation of MCD principles in realistic scenarios. |
| Evaluation | Qualitative comparison of MCD agents against orchestration-heavy design patterns. |
| Risk Analysis | Identification of failure modes related to prompt dependency and architectural brittleness. |

Agent architecture selection (TinyLLMs, symbolic agents, minimal prompt-executors) was informed not by simplicity alone, but through a structured exclusion of over-engineered patterns (e.g., MoE, PEFT-heavy stacks, orchestration-reliant agents) as evaluated in Chapter 2 (Bommasani et al., 2021). Design decisions favor architectures with provable fallback behavior, auditability, and stateless re-instantiation—criteria formalized in the MCD validation matrix (Ribeiro et al., 2016).

3.2 Literature Synthesis Method

The framework’s development involved a structured analysis of over 70 academic and industry sources related to lightweight agents, model compression, prompt engineering, stateless inference, and over-engineered toolchains (Webster & Watson, 2002; Vom Brocke et al., 2009).

Synthesis Protocol:
This research analyzed 73 peer-reviewed papers and technical reports using a structured approach (Petticrew & Roberts, 2006). Search terms included “minimal capability AI,” “edge agent deployment,” “prompt minimalism,” and “lightweight LLM optimization” across databases such as ACL Anthology, NeurIPS, ICML, and arXiv, focusing on publications from 2020-2025 (Kitchenham & Charters, 2007). Papers were selected for inclusion based on three criteria: (a) demonstration of lightweight reasoning, (b) deployment or benchmarking on real or simulated edge hardware, and (c) evidence of prompt minimalism or a stateless/lean design philosophy (Braun & Clarke, 2006). Insights were extracted and coded using a three-layer taxonomy: (1) Prompt Layer patterns, (2) Memory management strategies, and (3) Execution optimization techniques (Thomas, 2006). Exclusion criteria eliminated papers focusing solely on cloud-based agents or those without empirical data. This synthesis method directly informed the MCD framework components detailed in Chapters 4-7 (Miles et al., 2013).

3.3 Simulation Validation Strategy

To evaluate the robustness of MCD principles under real-world constraints, a browser-based simulation testbed was created (Li et al., 2024). This setup emulates edge-like conditions (no backend, no persistent memory, no external tools) and allows for controlled interaction with lightweight language models (Xu et al., 2023).

Simulation Setup (a minimal harness sketch follows this list):

  • Platform: Purely browser-executed LLMs (e.g., WebLLM, Transformers.js, or quantized GGUF models via WebAssembly) to ensure local execution (Haas et al., 2017; Chen et al., 2024).
  • Constraints: No backend calls, no server-side memory, and strictly token-limited prompts to mirror edge limitations (Banbury et al., 2021).
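As a minimal, non-authoritative sketch of this harness, the snippet below assumes a `generate(prompt, { maxTokens })` function supplied by whichever in-browser runtime is loaded (WebLLM, Transformers.js, or a WebAssembly GGUF wrapper); the budget constant, the whitespace-based token estimate, and the helper names are illustrative assumptions rather than the exact validation code.

```javascript
// Sketch of a stateless, token-budgeted harness turn. `generate` is assumed to
// be provided by an in-browser runtime (e.g. WebLLM or Transformers.js); its
// exact signature is an assumption, not a documented API.

const TOKEN_BUDGET = 512; // edge-style prompt budget (illustrative value)

function estimateTokens(text) {
  // Crude whitespace proxy; a real harness would use the runtime's tokenizer.
  return text.trim().split(/\s+/).length;
}

async function runConstrainedTurn(generate, prompt) {
  if (estimateTokens(prompt) > TOKEN_BUDGET) {
    // Refuse rather than silently truncate, so over-budget prompts surface as
    // explicit, loggable failures.
    return { ok: false, failureType: 'prompt-over-budget', output: null };
  }
  const started = performance.now();
  const output = await generate(prompt, { maxTokens: TOKEN_BUDGET });
  return {
    ok: true,
    output,
    latencyMs: performance.now() - started,
    promptTokens: estimateTokens(prompt),
  };
}
```

Because no state survives between calls, every turn can be re-run or re-instantiated from the prompt alone, mirroring the no-backend, no-persistent-memory constraints listed above.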

Programming Language and Runtime Selection for Edge Deployment

The choice of programming language and runtime environment fundamentally impacts edge deployment viability, particularly for resource-constrained scenarios. JavaScript with WebAssembly (Wasm) compilation was selected for MCD validation due to several constraint-alignment factors:

Cross-Platform Portability: JavaScript executes consistently across browsers, embedded systems (via Node.js), and microcontrollers (ESP32, RP2040), eliminating platform-specific compilation dependencies that increase deployment fragility.

Memory Efficiency: WebAssembly enables near-native execution performance with minimal memory overhead—critical for devices with 512MB RAM constraints where Python interpreters consume 100-200MB baseline memory before model loading.

Zero-Dependency Deployment: Browser-native JavaScript requires no external runtime installation, aligning with MCD's Minimality by Default principle. In contrast, Python-based deployments introduce dependency management complexity (pip, conda environments) that violates stateless design requirements.

Latency Characteristics: The validated 430 ms average latency in browser-based WebAssembly environments provides a realistic proxy for ARM-based edge-device performance without hardware-procurement variability.

Alternative language considerations were architecturally evaluated but excluded:

  • Python: High interpretive overhead, runtime dependency complexity, and 3× memory footprint compared to WebAssembly make it unsuitable for ultra-constrained edge scenarios despite mature ML ecosystem support.
  • C/C++: Near-optimal performance but compilation complexity, platform-specific binary management, and development overhead conflict with MCD's reproducibility and rapid prototyping requirements.
  • Rust: Excellent memory safety and performance characteristics, but limited edge AI ecosystem maturity and steep learning curve reduce accessibility for framework validation and adoption.

This runtime selection ensures that MCD validation reflects realistic edge deployment constraints—where computational efficiency, zero-dependency execution, and cross-platform consistency determine deployment viability rather than optimal-condition performance benchmarks.

For each MCD principle under test, 3–5 runs are conducted per variation, logging token usage, recovery success rate, and failure type to assess robustness (Cohen, 1988).

Table 3.2: Metrics Tracked

| Metric | Measurement Method | Purpose |
| --- | --- | --- |
| Token Budget Utilization | Average tokens per successful interaction. | Measures prompt efficiency. |
| Inference Latency | Time from prompt submission to response completion (ms). | Assesses real-time viability. |
| Memory Load | Peak browser tab memory usage during inference (MB). | Validates low-footprint design. |
| Recovery Success Rate | % of successful task completions after prompt degradation. | Tests fallback robustness. |
| Failure Type | Categorization of errors (e.g., hallucination, context loss). | Diagnoses architectural weaknesses. |
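Purely as an illustration of how these metrics might be recorded, the sketch below repeats each prompt variation several times, matching the 3–5 runs per variation described above; `runTurn` stands in for the harness turn sketched earlier in this section, and `checkOutput` is an assumed task-specific success test rather than part of the actual tooling.

```javascript
// Illustrative run loop for the Table 3.2 metrics: token use, latency,
// recovery success, and failure type are logged for each repetition.
// `runTurn` is e.g. the runConstrainedTurn sketch above, partially applied to
// a loaded model; `variation.checkOutput` is an assumed success test.
async function evaluateVariation(runTurn, variation, runs = 5) {
  const log = [];
  for (let i = 0; i < runs; i += 1) {
    const result = await runTurn(variation.prompt);
    log.push({
      variation: variation.name,
      run: i + 1,
      promptTokens: result.promptTokens ?? null,
      latencyMs: result.latencyMs ?? null,
      recovered: result.ok && variation.checkOutput(result.output),
      failureType: result.ok ? null : result.failureType,
    });
  }
  const recoveryRate = log.filter(r => r.recovered).length / runs;
  return { log, recoveryRate };
}
```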

Constraint-Progression Methodology: Each simulation test implements progressive resource degradation (Q8→Q4→Q1 quantization, token budget reduction, memory limitation) to validate the hypothesis that MCD maintains stable performance while alternative approaches show significant degradation (Jacob et al., 2018; Nagel et al., 2021). This methodology specifically tests constraint-resilience rather than optimal-condition performance, reflecting real-world edge deployment scenarios where resources fluctuate unpredictably (Strubell et al., 2019).

Threshold Calibration: Token efficiency thresholds were calibrated based on edge deployment constraints where 512-token budgets represent realistic limits (Howard et al., 2017). The 90% recovery success rate threshold reflects reliability requirements for safety-critical applications, while semantic drift detection at 10% deviation provides early warning for capability degradation under constraint conditions where traditional approaches show significant degradation (Amodei et al., 2016).
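The progression and thresholds above can be summarised as a small test configuration; the sketch below is one reading of that design rather than the actual configuration, and the per-tier budget and memory values are placeholder assumptions.

```javascript
// Illustrative constraint-progression schedule: each tier is strictly tighter
// than the last, and results are judged against the calibrated thresholds
// rather than peak-performance baselines. Numeric values are placeholders.
const constraintLadder = [
  { tier: 'Q8', tokenBudget: 512, memoryLimitMB: 256 },
  { tier: 'Q4', tokenBudget: 384, memoryLimitMB: 192 },
  { tier: 'Q1', tokenBudget: 256, memoryLimitMB: 128 },
];

const thresholds = {
  maxPromptTokens: 512,   // realistic edge token budget (Section 3.3)
  minRecoveryRate: 0.9,   // 90% recovery for safety-relevant reliability
  maxSemanticDrift: 0.1,  // 10% deviation triggers an early capability warning
};

function passesThresholds(tierResult) {
  return tierResult.recoveryRate >= thresholds.minRecoveryRate
      && tierResult.semanticDrift <= thresholds.maxSemanticDrift;
}
```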

The purpose of these simulations is not to benchmark raw task performance but to stress-test the framework’s design principles, such as fallback robustness, stateless regeneration, and symbolic prompt sufficiency (Venable et al., 2016).

Among the various optimization strategies surveyed (e.g., pruning, PEFT, distillation), only quantization is implemented in the simulation layer (Dettmers et al., 2022; Frantar et al., 2023). This is due to its runtime applicability without training infrastructure, its full compatibility with stateless agents, and its ability to enable multiple capability tiers (1-bit, 4-bit, 8-bit) without retraining or persistent memory overhead (Zafrir et al., 2019). Other techniques, while valuable architecturally, introduce session state, model retraining, or external dependencies that violate MCD deployment assumptions. This distinction reflects the design-time trade-off analysis discussed in Chapter 2. Subsequent validation confirms Q4 quantization as optimal for 80% of constraint-bounded reasoning tasks, with Q1→Q4 fallback mechanisms providing a safety margin for ultra-minimal deployments, while Q8 represents over-provisioning for most edge scenarios.
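A minimal sketch of such a Q1→Q4 fallback is given below, assuming hypothetical `loadTier` and `generate` helpers and a caller-supplied validity check; it illustrates the tier-escalation idea rather than the exact mechanism evaluated in Chapters 6–7.

```javascript
// Sketch of a Q1 -> Q4 capability fallback: try the cheapest tier first and
// escalate only when the output fails a task-level validity check. `loadTier`,
// `generate`, and `isValid` are assumed helpers, not a fixed API.
async function answerWithFallback(task, { loadTier, generate, isValid }) {
  for (const tier of ['Q1', 'Q4']) {          // Q8 treated as over-provisioning
    const model = await loadTier(tier);
    const output = await generate(model, task.prompt);
    if (isValid(output)) {
      return { tier, output };
    }
    // Stateless escalation: nothing from the failed attempt carries over
    // except the decision to try the next tier.
  }
  return { tier: null, output: null, failureType: 'all-tiers-exhausted' };
}
```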

Crucially, validation demonstrates that under progressive constraint pressure, MCD approaches maintain 85% performance retention when quantization drops to Q1, compared to 40% retention for Few-Shot approaches and 25% for conversational methods—validating the constraint-first design philosophy (Sahoo et al., 2024).

3.4 Walkthrough Design Method

Chapter 7 demonstrates MCD principles through three domain-specific walkthroughs using comparative multi-strategy evaluation (Yin, 2017). Each domain tests MCD against four alternative prompt engineering approaches (Conversational, Few-Shot Pattern, System Role Professional, Hybrid Multi-Strategy) under progressive resource pressure across quantization tiers (Q1/Q4/Q8).

Domain Selection

Healthcare Appointment Booking: Tests structured slot-filling extraction, dialogue completion under tight token constraints, and predictable failure patterns in high-stakes medical contexts (Berg, 2001).

Symbolic Indoor Navigation: Tests stateless spatial reasoning, coordinate processing without persistent maps, and safety-critical decision-making where route hallucination poses liability risks (Lynch, 1960).

System Diagnostics: Tests heuristic classification under complexity scaling, bounded diagnostic scope, and transparent limitation acknowledgment when data is insufficient (Basili et al., 1994).

Together, these domains cover structured extraction, symbolic reasoning, and heuristic classification task types under resource constraints (Eisenhardt, 1989).
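To make the first of these domains concrete, the sketch below shows the kind of token-lean slot-filling scaffold the healthcare walkthrough relies on; the slot names and prompt wording are illustrative assumptions, not the walkthrough's actual implementation.

```javascript
// Illustrative slot-filling scaffold for appointment booking: a fixed slot
// schema keeps prompts short and makes missing fields explicit instead of
// letting the model improvise. Slot names are assumptions for illustration.
const appointmentSlots = ['patientName', 'department', 'preferredDate', 'contact'];

function missingSlots(extracted) {
  // `extracted` is assumed to be the model's structured output for one turn.
  return appointmentSlots.filter(slot => !extracted[slot]);
}

function nextPrompt(extracted) {
  const missing = missingSlots(extracted);
  if (missing.length === 0) {
    return 'Confirm the collected appointment details with the user.';
  }
  // Ask only for what is still missing, keeping each follow-up turn inside
  // the token budget.
  return `Ask the user for: ${missing.join(', ')}. Do not repeat known details.`;
}
```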

Methodological Framework

Constraints: All walkthroughs simulate edge deployment with <256MB RAM, <512 token budgets, and no external APIs or persistent storage (Banbury et al., 2021).

Models: Quantized general-purpose LLMs (Q1: Qwen2-0.5B, Q4: TinyLlama-1.1B, Q8: Llama-3.2-1B) maintain consistency with Chapter 6 architecture (Dettmers et al., 2022).

Evaluation: Rather than optimal task performance, walkthroughs prioritize constraint-resilience evaluation: predictable degradation patterns under resource pressure (Q4→Q1 transitions), transparent failure modes that acknowledge capability boundaries rather than hallucinating, and production-reliability trade-offs between peak performance and constraint-tolerance (Amodei et al., 2016; Singh et al., 2023).

Scope Note

Walkthroughs employ generalized implementations demonstrating MCD architectural principles rather than domain-specific optimization. Domain enhancements (medical databases, SLAM algorithms, code parsers) would improve performance but fall outside the constraint-first architecture validation scope (Venable et al., 2016).

3.5 Evaluation Criteria

The evaluation of MCD agents relies on qualitative and behavior-driven criteria, emphasizing design principles over raw performance scores (Patton, 2014; Lincoln & Guba, 1985):

Table 3.3: MCD Agent Evaluation Criteria

| Criterion | Evaluation Method |
| --- | --- |
| Capability Sufficiency | Task completion under the minimal viable architecture. |
| Statelessness | % of correct state reconstructions after a simulated context reset. |
| Fallback Robustness | Success rate after a 30% random token degradation in the prompt. |
| Degeneracy Detection | Absence of unused component calls or empty API scaffolds in the execution trace. |
| Token Efficiency | Average tokens per response must remain below a predefined budget (e.g., 256 tokens). |
| Interpretability | A human reviewer rating of the clarity and logical coherence of the agent’s execution trace. |
| Design Simplicity | The number of distinct functional components must not exceed the MCD threshold for the task. |
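As one concrete illustration, the Fallback Robustness criterion can be operationalised roughly as sketched below; uniform random word dropping at a 30% rate and the `runTurn`/`checkOutput` helpers are assumptions carried over from the earlier sketches, not the exact procedure.

```javascript
// Sketch of the Fallback Robustness check (Table 3.3): randomly drop roughly
// 30% of prompt words and measure how often the agent still completes the task.
// `runTurn` is e.g. the harness turn from Section 3.3; `task.checkOutput` is an
// assumed task-level success test.
function degradePrompt(prompt, dropRate = 0.3) {
  return prompt
    .split(/\s+/)
    .filter(() => Math.random() >= dropRate)
    .join(' ');
}

async function fallbackRobustness(runTurn, task, runs = 5) {
  let successes = 0;
  for (let i = 0; i < runs; i += 1) {
    const result = await runTurn(degradePrompt(task.prompt));
    if (result.ok && task.checkOutput(result.output)) successes += 1;
  }
  return successes / runs; // compared against the 90% threshold from Section 3.3
}
```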

No agent is expected to excel at every task, particularly in resource-abundant scenarios where other approaches may perform better (Venable et al., 2016); rather, the evaluation assesses whether the agent’s design remains coherent and functional when subjected to architectural minimality and context degradation.

3.6 Ethical Assumptions and Risks

This research assumes agents will be deployed in constrained, non-critical environments (IEEE, 2017; Jobin et al., 2019). Nonetheless, ethical considerations are integrated into the framework:

Failure Transparency: In MCD, stateless agents deliberately omit persistent memory, which can cause silent failures (Barocas et al., 2017). Walkthroughs explicitly surface and log these cases to prevent invisible errors and ensure that system limitations are auditable (Selbst et al., 2019).

Constraint-Induced Safety: Under resource-overload conditions, validation demonstrates that MCD approaches fail transparently (with clear limitation acknowledgment), while over-engineered systems exhibit dangerous failure patterns, including confident hallucination in 87% of cases (Lin et al., 2022). This constraint-safety advantage validates the framework’s conservative design philosophy.

User Misinterpretation: Minimal agents may offer plausible but incorrect responses under prompt limits (Kadavath et al., 2022). The framework includes heuristics that guide prompt design to ensure user awareness of confidence boundaries and system limitations (Ribeiro et al., 2016).

Security and Privacy: All simulations are local; no real user data or internet tools are invoked (Papernot et al., 2016). The MCD principle of minimalism inherently reduces the attack surface (e.g., fewer dependencies, no data retention), but the framework also mandates that any adaptation to sensitive domains must include additional security layers (Barocas et al., 2017).

3.7 Tooling Artifacts and Future Hardware Evaluation

In line with design science methodology, the MCD validation includes diagnostic checklists and agent failure detection matrices (see Appendix E), used both during walkthrough design and retrospective evaluation (Hevner et al., 2004). These artifacts serve to formalize tacit design trade-offs into reusable tooling.

While not implemented in this thesis, future iterations of MCD agent evaluation are envisioned for hardware environments like the Raspberry Pi 4 and NVIDIA Jetson Nano (NVIDIA, 2020). These tests would track real-time latency, energy consumption, and memory profiles under live execution constraints, grounding the framework’s deployment assumptions in empirical data (Banbury et al., 2021).

Table 3.4: Target Hardware Deployment Environments

| Device Class | Recommended Models | MCD Components Supported | Max Agent Complexity |
| --- | --- | --- | --- |
| Ultra-Low Power | ESP32-S3 | Prompt Layer only | Single-turn Q&A |
| Edge Computing | Jetson Nano | All layers | Multi-turn + RAG |
| Browser Runtime | WebAssembly | Prompt + Memory | Stateless dialogue |

Validation Continuity Framework: Browser-based WebAssembly simulation (430ms average latency) provides baseline measurements for ARM device comparison, ensuring that constraint-resilience findings translate to real hardware deployment scenarios (Haas et al., 2017). This methodology bridges controlled validation with practical deployment requirements.

Table 3.5: Tooling Differentiator Table

| Optimization Tool | MCD Compatibility | Runtime Dependency | Design Justification |
| --- | --- | --- | --- |
| Quantization (Q1–Q8) | ✅ High | ❌ None | Enables tiered fallback and edge runtime |
| Small Language Models (SLMs) | ✅ High | ❌ None | Domain specialization with parameter efficiency at model level |
| Distillation | ❌ Low | ✅ Training infra | Requires teacher models and session state |
| PEFT (e.g., LoRA) | ❌ Low | ✅ Persistent modules | Adds latency and memory fragility |
| Pruning | ⚠️ Medium | ⚠️ Requires retraining | Potential loss of logical structure |
| Adaptive Computation | ❌ Low | ✅ Dynamic graphing | Incompatible with stateless inference |

Of these, quantization and Small Language Models maintain minimal architectural complexity while enabling runtime adaptability. Quantization achieves efficiency through post-training compression across tiers (Q1/Q4/Q8), while SLMs achieve similar goals through domain-focused pre-training and parameter reduction (Belcak et al., 2025). Both approaches align naturally with MCD's stateless, constraint-first design principles without requiring persistent modules or dynamic runtime infrastructure, making them the primary MCD-aligned optimization strategies (Jacob et al., 2018; Microsoft Research, 2024).

However, empirical validation of purpose-built SLMs (e.g., Phi-3-mini, SmolLM) was not conducted in this research. The simulations and walkthroughs utilized quantized general-purpose LLMs (Chapters 6-7), making SLM-MCD integration validation an important direction for future research (Hu et al., 2021; Hinton et al., 2015).

Next Part Preview

🧱 Part II: The MCD Framework

Part II introduces the core contribution of this thesis: the Minimal Capability Design (MCD) framework. This section defines MCD’s conceptual underpinnings (Chapter 4) and then instantiates it as a practical, deployable agent architecture (Chapter 5).

Unlike traditional agent stacks that add memory, orchestration, and redundancy by default, MCD is a design-first approach grounded in statelessness, prompt sufficiency, and failure-resilient minimalism.

This part lays the architectural groundwork upon which simulation and walkthrough validations in Part III are built.