Designing Lightweight AI Agents for Edge Deployment
A Minimal Capability Framework with Insights from Literature Synthesis
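Formula:
$$d = \frac{\bar{x}_1 - \bar{x}_2}{s_p}$$
where $\bar{x}_1$ and $\bar{x}_2$ are the two group means and $s_p$ is the pooled standard deviation.
Example: W3 MCD Structured (80%) vs Few-Shot (40%)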
- Mean difference: 0.40
- Pooled SD: 0.490
- Cohen's d = 0.82 (Large effect, d > 0.8)
Additional Comparisons:
Comparison | Cohen's d | Interpretation |
---|---|---|
T1: MCD vs Ultra-Minimal (100% vs 0%) | 2.00 | Extreme effect |
W1: Hybrid vs System Role (100% vs 60%) | 1.00 | Large effect |
W2: MCD vs Few-Shot (60% vs 40%) | 0.40 | Small-to-medium effect |
Interpretation: Large effects (d > 0.8) dominate key MCD comparisons, providing practical significance despite small sample sizes.
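The W3 example can be reproduced with a few lines of arithmetic. The sketch below takes the reported mean difference and pooled SD as given, since the source does not state how the pooled SD was derived. The function names `cohens_d` and `interpret_d` are illustrative, and the thresholds are Cohen's conventional benchmarks (0.2, 0.5, 0.8).

```python
def cohens_d(mean_a: float, mean_b: float, pooled_sd: float) -> float:
    """Standardized mean difference: (mean_a - mean_b) / pooled_sd."""
    if pooled_sd == 0:
        # With zero within-group variance the ratio is undefined.
        raise ValueError("pooled SD is zero; Cohen's d is undefined")
    return (mean_a - mean_b) / pooled_sd


def interpret_d(d: float) -> str:
    """Label |d| using Cohen's conventional benchmarks (an assumption here)."""
    magnitude = abs(d)
    if magnitude >= 0.8:
        return "large"
    if magnitude >= 0.5:
        return "medium"
    if magnitude >= 0.2:
        return "small"
    return "negligible"


# W3: MCD Structured (0.80) vs Few-Shot (0.40), pooled SD 0.490 as reported.
d_w3 = cohens_d(0.80, 0.40, 0.490)
print(f"d = {d_w3:.2f} ({interpret_d(d_w3)})")
```

The explicit zero-variance check matters for complete-separation cases such as 5/5 vs 0/5, where both groups have no within-group spread.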
Formula:
$$\eta^2 = \frac{SS_{\text{between}}}{SS_{\text{total}}}$$
T1 Token Efficiency Analysis:
- Approaches: MCD (0.297), Verbose (0.114), Baseline (0.125), CoT (0.159), Few-Shot (0.297)
- Grand mean: 0.198
- η² = 0.14-0.16 (Large effect by conventional standards, η² > 0.14)
Interpretation: The variance in token efficiency across approaches corresponds to a large practical effect, supporting the claim that the architectures differ meaningfully.
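As a rough illustration of how η² is obtained, the sketch below computes the ratio of between-group to total sum of squares. Per-trial token-efficiency values are not listed here, so the `toy_groups` data are hypothetical and serve only to exercise the function. The grand mean, by contrast, is checked against the five reported approach means.

```python
from statistics import mean


def eta_squared(groups):
    """Eta squared = SS_between / SS_total for a list of groups of observations."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    ss_total = sum((x - grand_mean) ** 2 for x in all_values)
    if ss_total == 0:
        raise ValueError("no variance in the data; eta squared is undefined")
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups
    )
    return ss_between / ss_total


# Reported T1 token-efficiency means: MCD, Verbose, Baseline, CoT, Few-Shot.
reported_means = [0.297, 0.114, 0.125, 0.159, 0.297]
print(round(mean(reported_means), 3))  # grand mean of the reported means

# Hypothetical toy data (NOT study measurements), three groups of three.
toy_groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
print(round(eta_squared(toy_groups), 4))
```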
Extreme Case: MCD (5/5) vs Ultra-Minimal (0/5)
Approach | Success | Failure |
---|---|---|
MCD Structured | 5 | 0 |
Ultra-Minimal | 0 | 5 |
- Odds ratio: Infinite (complete separation)
- p-value = 0.0079 (Fisher's exact test, two-sided; p < 0.05, statistically significant)
Moderate Case: MCD (4/5) vs Few-Shot (2/5)
Approach | Success | Failure |
---|---|---|
MCD Structured | 4 | 1 |
Few-Shot | 2 | 3 |
- Odds ratio: 6.00
- p-value = 0.524 (Fisher's exact test, two-sided; not statistically significant, n=5 per group is insufficient)
Interpretation: Extreme binary outcomes (5/5 vs 0/5) reach statistical significance even at n=5 per group. Moderate differences (4/5 vs 2/5) show large effect sizes, but the test is underpowered to detect them at this sample size.
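Both p-values and the finite odds ratio can be checked by hand from the hypergeometric distribution. The following is a minimal standard-library sketch of a two-sided Fisher's exact test, which sums the probabilities of all tables no more likely than the observed one. The helper names are illustrative, not taken from the study's code.

```python
from math import comb, inf


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x successes in row 1, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Sum every table at least as unlikely as the observed one.
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Sample odds ratio (a*d)/(b*c); infinite when b*c is zero."""
    return inf if b * c == 0 else (a * d) / (b * c)


# Rows are (success, failure) for each approach.
extreme = (5, 0, 0, 5)   # MCD Structured vs Ultra-Minimal
moderate = (4, 1, 2, 3)  # MCD Structured vs Few-Shot

print(round(fisher_exact_two_sided(*extreme), 4), odds_ratio(*extreme))
print(round(fisher_exact_two_sided(*moderate), 3), odds_ratio(*moderate))
```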
95% Confidence Intervals for Completion Rates (n=5):
Scenario | Point Estimate | 95% CI |
---|---|---|
MCD Structured (5/5) | 1.00 | [0.57, 1.00] |
MCD Structured (4/5) | 0.80 | [0.38, 0.96] |
Few-Shot (3/5) | 0.60 | [0.23, 0.88] |
Few-Shot (2/5) | 0.40 | [0.12, 0.77] |
Ultra-Minimal (0/5) | 0.00 | [0.00, 0.43] |
Interpretation: Wide confidence intervals reflect estimation uncertainty with n=5, emphasizing the need for effect size analysis and cross-tier replication over point estimates.
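The tabulated bounds are consistent with Wilson score intervals. That the Wilson method was used is an inference from the numbers, not something stated above. Under that assumption, the sketch below reproduces the table with the standard library only; `wilson_interval` is an illustrative name.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes: int, n: int, confidence: float = 0.95) -> tuple:
    """Wilson score interval for a binomial proportion."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    p_hat = successes / n
    denom = 1 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1] to absorb floating-point overshoot at the extremes.
    return max(0.0, center - half_width), min(1.0, center + half_width)


for k in (5, 4, 3, 2, 0):
    low, high = wilson_interval(k, 5)
    print(f"{k}/5: [{low:.2f}, {high:.2f}]")
```

Unlike the simple normal-approximation interval, the Wilson interval stays informative at 0/5 and 5/5, where the approximation would collapse to zero width.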
MCD Cross-Tier Performance:
- Q1: 0.80, Q4: 0.80, Q8: 0.80
- Mean: 0.80, SD = 0.00 (perfect consistency)
Few-Shot Cross-Tier Performance:
- Q1: 0.40, Q4: 0.30, Q8: 0.20
- Mean: 0.30, SD = 0.10 (high variance)
Reliability Ratio: MCD shows zero variance across tiers, while Few-Shot degrades by 50% in relative terms from Q1 to Q8 (0.40 → 0.20), supporting the constraint-resilience claim.
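The cross-tier figures follow directly from the three per-tier completion rates. The sketch below recomputes the mean, the sample standard deviation, and the relative Q1-to-Q8 drop. It assumes the reported SD is the sample SD (n − 1 denominator), which matches the value of 0.10; `tier_summary` is an illustrative helper.

```python
from statistics import mean, stdev


def tier_summary(rates):
    """Mean, sample SD, and relative drop from the first to the last tier."""
    first, last = rates[0], rates[-1]
    relative_drop = (first - last) / first if first else 0.0
    return {
        "mean": mean(rates),
        "sd": stdev(rates),  # sample SD, n - 1 denominator
        "relative_drop": relative_drop,
    }


# Completion rates ordered Q1, Q4, Q8.
mcd = tier_summary([0.80, 0.80, 0.80])
few_shot = tier_summary([0.40, 0.30, 0.20])

for name, summary in (("MCD", mcd), ("Few-Shot", few_shot)):
    print(
        f"{name}: mean={summary['mean']:.2f}, "
        f"sd={summary['sd']:.2f}, drop={summary['relative_drop']:.0%}"
    )
```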
Comparison | Metric | Value | Interpretation | Sample |
---|---|---|---|---|
MCD vs Ultra-Minimal (T1) | Cohen's d | ∞ (5/5 vs 0/5) | Extreme effect | n=5/group |
MCD vs Few-Shot (W3) | Cohen's d | 0.82 | Large effect | n=5/group |
Hybrid vs System Role (W1) | Cohen's d | 1.00 | Large effect | n=5/group |
Token Efficiency (T1) | η² | 0.14-0.16 | Large practical effect | n=5 groups |
Cross-Tier Consistency | Cross-tier SD (σ) | MCD: 0.00 vs Few-Shot: 0.10 | Perfect vs variable | n=3 tiers |
Sample Size Limitations: Small sample sizes (n=5 per variant) limit statistical power and generalizability. Traditional parametric assumptions (normality, homogeneity of variance) cannot be reliably assessed.
Effect Size Emphasis: Analysis prioritizes practical significance (effect sizes) over statistical significance (p-values):
- Cohen's d > 0.8 = large effect (practically meaningful)
- η² > 0.14 = large effect (substantial variance explained)
- Wide CIs reflect uncertainty, but extreme point estimates (1.00 vs 0.00) provide categorical evidence
Validation Strategy: Strength of claims derives from:
- Extreme effect sizes (d = 2.0, η² = 0.14-0.16)
- Cross-tier replication (Q1/Q4/Q8 consistent patterns)
- Cross-domain validation (W1/W2/W3 convergent evidence)
- Categorical outcomes (100% vs 0% completion where applicable)
Appropriate Use Cases:
- ✅ Fisher's Exact Test for extreme binary outcomes (5/5 vs 0/5)
- ✅ Effect size calculations for practical significance
- ✅ Wide CIs to reflect estimation uncertainty
- ❌ Parametric tests (t-tests, ANOVA) underpowered with n=5
- ❌ Point estimates without confidence intervals