Designing Lightweight AI Agents for Edge Deployment
A Minimal Capability Framework with Insights from Literature Synthesis
This appendix provides comprehensive implementation guidance for the MCD Framework Application Decision Tree introduced in Section 8.7.2. Practitioners applying MCD principles to real-world deployment scenarios should consult this appendix for detailed decision logic, validation workflows, and empirically-derived thresholds from Chapters 4-7.
Structure Overview:
- G.1 – Phase 1: Context assessment questions and priority classification
- G.2 – Phase 2: Approach selection decision trees with anti-pattern enforcement
- G.3 – Phase 3: MCD principle validation workflows (minimality, rationality, degeneracy)
- G.4 – Phase 4: Three-layer implementation with pseudocode examples
- G.5 – Phase 5: Evidence-based validation test protocols
Each decision point includes empirical thresholds (e.g., token budgets, complexity limits, performance criteria) validated through browser-based simulations (T1-T10) and domain walkthroughs (W1-W3), ensuring practitioners can apply the MCD framework with quantified deployment expectations.
G.1 Phase 1: Context Assessment and Priority Classification (in reference to the Chapter 8 MCD Framework Decision Tree)
Purpose: Establish deployment profile through systematic questioning, determining whether MCD principles align with task requirements and resource constraints.
Q1: Primary Deployment Context Classification
CONTEXT_DECISION_TREE:
IF deployment IN [Edge Device, RAM <1GB, Offline, Battery-Powered]:
→ CONTEXT = CONSTRAINED
→ RATIONALE: Hardware limits require resource-efficient approaches
→ PROCEED TO Q2
ELIF deployment IN [Browser, WebAssembly, Client-Side]:
→ CONTEXT = BROWSER_EDGE
→ RATIONALE: WASM environment validated in T8 (Q4 tier optimal)
→ PROCEED TO Q2
ELIF deployment IN [Cloud, Full-Stack, RAM >2GB]:
→ EXIT_RECOMMENDATION: AutoGPT, LangChain, LangGraph
→ RATIONALE: Resource abundance enables richer frameworks
→ MCD not optimal for unconstrained environments
ELIF deployment == Hybrid:
→ CONTEXT = HYBRID_CONSTRAINTS
→ PROCEED TO Q2 with detailed constraint profiling
Deployment Context Examples:
- Constrained: Raspberry Pi, Jetson Nano, smartphone edge inference
- Browser Edge: In-browser agents, PWAs, WebAssembly deployment
- Hybrid: Progressive enhancement (edge-first with cloud fallback)
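The Q1 branching above can be expressed as a short routine. The sketch below is illustrative only: the DeploymentProfile fields, the RAM cut-offs (1GB/2GB), and the return labels simply mirror the decision tree and are not part of any published MCD API.

```python
# Minimal sketch of the Q1 context classification (illustrative names and fields).
from dataclasses import dataclass

@dataclass
class DeploymentProfile:
    target: str              # "edge", "browser", "cloud", or "hybrid"
    ram_mb: int
    offline: bool = False
    battery_powered: bool = False

def classify_context(p: DeploymentProfile) -> str:
    if p.target == "edge" or p.ram_mb < 1024 or p.offline or p.battery_powered:
        return "CONSTRAINED"          # hardware limits -> resource-efficient approaches
    if p.target == "browser":
        return "BROWSER_EDGE"         # WASM environment validated in T8 (Q4 tier)
    if p.target == "cloud" or p.ram_mb > 2048:
        return "EXIT_NON_MCD"         # resource abundance -> richer frameworks
    return "HYBRID_CONSTRAINTS"       # profile constraints in detail before Q2

# Example: Raspberry Pi-class device
print(classify_context(DeploymentProfile(target="edge", ram_mb=512, offline=True)))
# -> CONSTRAINED
```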
Q2: Optimization Priority Assignment
PRIORITY_MATRIX:
[1] Resource Efficiency (EFFICIENCY_PRIORITY = HIGH):
→ Optimization: Token minimization, memory footprint, latency
→ Empirical validation: T1/T6 token efficiency analysis
[2] User Experience Quality (UX_PRIORITY = HIGH):
→ Optimization: Natural language, conversation flow, error messages
→ Empirical validation: W1 UX scoring (89% conversational vs 68% MCD)
[3] Professional Output (QUALITY_PRIORITY = HIGH):
→ Optimization: Accuracy, completeness, domain expertise
→ Empirical validation: W3 diagnostic quality (96% hybrid vs 84% MCD)
[4] Educational/Learning (EDUCATION_PRIORITY = HIGH):
→ Optimization: Explanatory depth, pedagogical structure
→ Use case: Tutoring agents, learning assistants
[5] Balanced Multi-Objective (HYBRID_PRIORITY = HIGH):
→ Optimization: Weighted balance across dimensions
→ Requires advanced prompt engineering (74% accessibility threshold)
Note: Priority selection determines approach selection in Phase 2.
Q3: Stateless Capability Assessment
STATELESS_VALIDATION_CHECKLIST:
Task Requirements Analysis:
[Q3.1] Persistent conversation history needed? YES/NO
[Q3.2] Learning across sessions required? YES/NO
[Q3.3] Cumulative knowledge updates required? YES/NO
DECISION LOGIC:
IF ALL_ANSWERS == NO:
→ Task = STATELESS_COMPATIBLE
→ T4 Validation: 5/5 stateless regeneration success
→ PROCEED TO Q4
ELIF PARTIAL_YES (1-2 requirements):
→ Evaluate HYBRID_MCD_ARCHITECTURE
→ Design: Stateless core + external state manager
→ Document: State dependencies (Section 4.2)
→ WARNING: Increased complexity vs pure MCD
ELSE (ALL_YES):
→ MCD NOT SUITABLE
→ RECOMMENDATION: RAG/Vector DB + LangChain
→ EXIT with architectural justification
Stateless Viability Examples:
- ✅ Suitable: FAQ, appointment booking, navigation, single-turn diagnostics
- ⚠️ Hybrid: Multi-turn conversations with session context
- ❌ Unsuitable: Personalized learning, customer relationship management
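As a compact illustration of the Q3 decision logic, the sketch below maps the three checklist answers (Q3.1-Q3.3) to the branch labels above; the function name and return labels are hypothetical.

```python
# Sketch of Q3: map the three checklist answers to a decision branch.
def assess_statelessness(needs_history: bool,
                         needs_cross_session_learning: bool,
                         needs_cumulative_updates: bool) -> str:
    yes_count = sum([needs_history, needs_cross_session_learning, needs_cumulative_updates])
    if yes_count == 0:
        return "STATELESS_COMPATIBLE"      # proceed to Q4 (T4: 5/5 stateless success)
    if yes_count < 3:
        return "HYBRID_MCD_ARCHITECTURE"   # stateless core + external state manager
    return "MCD_NOT_SUITABLE"              # recommend RAG/vector DB + LangChain

# FAQ bot: no history, no cross-session learning, no cumulative updates
assert assess_statelessness(False, False, False) == "STATELESS_COMPATIBLE"
```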
Q4: Token Budget Classification
TOKEN_BUDGET_DECISION_TREE:
User specifies acceptable token budget:
[1] budget < 60 tokens:
→ MODE = ULTRA_MINIMAL
→ RISK: T6 validation shows a 60% failure rate below 60 tokens
→ RECOMMENDATION: Relax constraints if feasible
→ IF MANDATORY: Use symbolic logic, IF-THEN routing
[2] 60 ≤ budget ≤ 150 tokens:
→ MODE = MINIMAL (VALIDATED RANGE)
→ EVIDENCE: T1/T6 show 94% success rate maintenance
→ OPTIMAL: 75-85 token sweet spot (Section 8.3)
[3] 150 < budget ≤ 512 tokens:
→ MODE = MODERATE
→ NOTE: Approaching 90-130 token capability plateau
→ CONSIDERATION: Diminishing returns beyond 90 tokens
[4] budget > 512 tokens:
→ MODE = RESOURCE_ABUNDANT
→ EXIT_RECOMMENDATION: Non-MCD approaches likely optimal
→ RATIONALE: MCD sacrifices peak performance for constraints
[5] budget = Variable/Dynamic:
→ MODE = ADAPTIVE
→ IMPLEMENTATION: Dynamic allocation (Section 5.3)
→ VALIDATION: Tier-based routing (Q1→Q4→Q8)
Empirical Token Budget Guidance (from T1/T6):
- Minimum viable: 60 tokens (94% success floor)
- Optimal range: 75-90 tokens (peak efficiency-to-performance)
- Plateau threshold: 90-130 tokens (< 5% improvement beyond)
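The Q4 thresholds above translate directly into a small classifier; the sketch below is illustrative, with the 60/150/512-token cut-offs taken from the decision tree.

```python
# Sketch of the Q4 token-budget classification (thresholds from the tree above).
from typing import Optional

def classify_token_budget(budget: Optional[int]) -> str:
    if budget is None:
        return "ADAPTIVE"            # variable/dynamic allocation (Section 5.3)
    if budget < 60:
        return "ULTRA_MINIMAL"       # ~60% failure risk below 60 tokens (T6)
    if budget <= 150:
        return "MINIMAL"             # validated range; 75-85 token sweet spot
    if budget <= 512:
        return "MODERATE"            # diminishing returns beyond ~90 tokens
    return "RESOURCE_ABUNDANT"       # non-MCD approaches likely optimal

assert classify_token_budget(80) == "MINIMAL"
```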
G.1 Output: Context profile fully documented → PROCEED TO PHASE 2 (Appendix G.2)
G.2 Phase 2: Approach Selection Decision Trees with Anti-Pattern Enforcement
Purpose: Select the optimal prompt engineering approach based on the context profile from Phase 1, using empirically validated performance data from Chapters 6-7. Each priority (Efficiency, UX, Quality, Education, Hybrid) maps to specific approaches with quantified trade-offs.
Decision Framework: Priority-driven selection trees route practitioners to approaches validated through T1-T10 simulations and W1-W3 domain walkthroughs, with explicit anti-pattern enforcement preventing empirically-documented failure modes.
G.2.1 Efficiency Priority Decision Tree
When to Use: EFFICIENCY_PRIORITY = HIGH (from G.1 Q2) — Deployments prioritizing token minimization, memory footprint reduction, and latency optimization.
EFFICIENCY_APPROACH_SELECTOR:
[Branch 1] Token Budget < 60 tokens (ULTRA_MINIMAL):
→ APPROACH: MCD STRUCTURED (MANDATORY)
→ PERFORMANCE: 92% efficiency, 81% context-optimal
→ VALIDATION: T1 approach comparison, T6 over-engineering detection
→ RATIONALE: Only viable approach at extreme constraints
→ QUANTIZATION: Force Q1 tier (Qwen2-0.5B, 300MB)
→ RISK: 60% failure rate if budget <60 (T6 evidence)
[Branch 2] 60 ≤ Token Budget ≤ 150 (MINIMAL):
→ APPROACH: HYBRID MCD+FEW-SHOT
→ PERFORMANCE: 88% efficiency, 86% context-optimal
→ VALIDATION: T1/W1/W2/W3 cross-domain validation
→ RATIONALE: Balances efficiency with pattern learning
→ QUANTIZATION: Start Q4 tier, fallback to Q1 if needed
→ IMPLEMENTATION: MCD structure + 2-3 Few-Shot examples
[Branch 3] Hardware RAM < 256MB (HARDWARE OVERRIDE):
→ APPROACH: MCD STRUCTURED (MANDATORY)
→ PERFORMANCE: Same as Branch 1
→ RATIONALE: Hardware constraint supersedes token budget
→ QUANTIZATION: Force Q1/Q4 tiers only
→ VALIDATION: T8 deployment environment testing
→ NOTE: Hardware limitations override task complexity
[Branch 4] DEFAULT (Budget >150, RAM ≥256MB):
→ APPROACH: MCD STRUCTURED with Q4 tier
→ PERFORMANCE: 85% retention under Q1, 95% under Q4
→ FALLBACK: Escalate to Hybrid if performance <80%
→ QUANTIZATION: Q4 optimal (TinyLlama-1.1B, 560MB)
→ VALIDATION: T10 quantization tier validation
Practical Example:
- Scenario: Edge device FAQ chatbot, 256MB RAM, 80-token budget
- Selection: Branch 2 → Hybrid MCD+Few-Shot
- Implementation: MCD slot-filling structure + 3 Few-Shot Q&A examples
- Expected Performance: 88% efficiency, 430ms average latency (W1 data)
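A sketch of the G.2.1 branch logic follows; it assumes only a token budget and available RAM as inputs, and the function name and return dictionary are illustrative. The hardware override (Branch 3) is checked first because it supersedes the token budget.

```python
# Sketch of the G.2.1 efficiency selector (thresholds from the branches above).
def select_efficiency_approach(token_budget: int, ram_mb: int) -> dict:
    if ram_mb < 256:                                     # Branch 3: hardware override
        return {"approach": "MCD_STRUCTURED", "tier": "Q1/Q4 only"}
    if token_budget < 60:                                # Branch 1: ultra-minimal
        return {"approach": "MCD_STRUCTURED", "tier": "Q1"}
    if token_budget <= 150:                              # Branch 2: minimal
        return {"approach": "HYBRID_MCD_FEW_SHOT", "tier": "Q4 (fallback Q1)"}
    return {"approach": "MCD_STRUCTURED", "tier": "Q4"}  # Branch 4: default

# Edge FAQ chatbot from the example above: 256MB RAM, 80-token budget -> Branch 2
print(select_efficiency_approach(token_budget=80, ram_mb=256))
```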
G.2.2 User Experience Priority Decision Tree
When to Use: UX_PRIORITY = HIGH (from G.1 Q2) — Deployments prioritizing natural language interaction, conversation flow, and user-friendly error handling.
UX_APPROACH_SELECTOR:
[Branch 1] Deployment Constraints = Unconstrained (>2GB RAM, >512 tokens):
→ APPROACH: CONVERSATIONAL
→ PERFORMANCE: 89% user experience score
→ VALIDATION: W1 healthcare booking walkthrough
→ TRADEOFF: 1.5x token cost, 2.1x latency vs MCD
→ RATIONALE: Natural flow maximizes satisfaction when resources permit
→ WARNING: Degrades severely under constraint pressure (28% at <512 tokens)
[Branch 2] Deployment Constraints = Moderate (512MB-2GB, 150-512 tokens):
→ APPROACH: SYSTEM ROLE PROFESSIONAL
→ PERFORMANCE: 82% UX, 78% context-optimal
→ VALIDATION: W1/W2 walkthroughs
→ BALANCE: Professional framing + constraint-awareness
→ QUANTIZATION: Q4 tier recommended
→ IMPLEMENTATION: Structured persona with graceful degradation
[Branch 3] Deployment Constraints = Tight (<512MB, <150 tokens):
→ APPROACH: FEW-SHOT PATTERN
→ PERFORMANCE: 68% UX, 78% context-optimal
→ VALIDATION: W3 diagnostics walkthrough
→ JUSTIFICATION: Best UX achievable under strict constraints
→ QUANTIZATION: Q1/Q4 adaptive routing
→ NOTE: Conversational approach fails here (28% completion)
[Branch 4] FALLBACK (Constraints = Severe):
→ APPROACH: MCD STRUCTURED with enhanced error messages
→ PERFORMANCE: 60% UX (baseline), 92% efficiency
→ COMPROMISE: Sacrifice conversational flow for reliability
→ ENHANCEMENT: Add user-friendly clarification templates
→ VALIDATION: T7 constraint stress test (80% controlled degradation)
Practical Example:
- Scenario: Browser-based appointment booking, moderate constraints
- Selection: Branch 2 → System Role Professional
- Implementation: "Healthcare scheduling assistant" persona + structured prompts
- Expected Performance: 82% UX, 1724ms latency (W1 data)
G.2.3 Quality Priority Decision Tree
When to Use: QUALITY_PRIORITY = HIGH (from G.1 Q2) — Deployments prioritizing accuracy, completeness, and domain expertise over efficiency or UX.
QUALITY_APPROACH_SELECTOR:
[Branch 1] Context = Professional Domain (Healthcare, Legal, Finance):
→ APPROACH: SYSTEM ROLE PROFESSIONAL
→ PERFORMANCE: 86% completion, 82% UX
→ VALIDATION: W1 healthcare, W3 diagnostics
→ RATIONALE: Expertise framing improves accuracy perception
→ QUANTIZATION: Q4/Q8 tier for complex reasoning
→ DOMAINS: Healthcare, diagnostics, formal communication
[Branch 2] Requirement = Technical Accuracy (>90% correctness):
→ APPROACH: HYBRID MULTI-STRATEGY
→ PERFORMANCE: 96% completion, 91% accuracy
→ VALIDATION: W3 system diagnostics (highest quality)
→ WARNING: Requires 75% engineering sophistication threshold
→ IMPLEMENTATION: MCD + Few-Shot + System Role coordination
→ QUANTIZATION: Q8 tier preferred (Llama-3.2-1B, 800MB)
→ TRADEOFF: 2.3x complexity vs MCD alone
[Branch 3] Requirement = Balanced Quality (80-90% target):
→ APPROACH: FEW-SHOT PATTERN
→ PERFORMANCE: 84% completion, balanced across metrics
→ VALIDATION: W2 spatial navigation
→ RATIONALE: Pattern learning without full hybrid complexity
→ QUANTIZATION: Q4 tier optimal
→ ACCESSIBILITY: 89% engineering accessibility (vs 74% hybrid)
[Branch 4] EVALUATION REQUIRED (Ambiguous quality needs):
→ DECISION POINT: Task complexity vs resource availability
→ IF complex_reasoning AND resources_available:
→ TRY: Hybrid Multi-Strategy
→ ELIF moderate_complexity:
→ TRY: Few-Shot Pattern
→ ELSE:
→ FALLBACK: MCD with domain-specific examples
→ VALIDATE: Run T1-style comparison before deployment
Practical Example:
- Scenario: System diagnostics agent, technical accuracy critical
- Selection: Branch 2 → Hybrid Multi-Strategy
- Implementation: MCD routing + Few-Shot diagnostic examples + System Role expertise
- Expected Performance: 96% completion, 91% accuracy (W3 data)
G.2.4 Hybrid Priority Decision Tree
When to Use: HYBRID_PRIORITY = HIGH (from G.1 Q2) — Deployments requiring balanced optimization across efficiency, UX, and quality.
HYBRID_APPROACH_SELECTOR:
[Branch 1] Prompt Engineering Expertise = Advanced (ML engineering team):
→ APPROACH: HYBRID MULTI-STRATEGY
→ COORDINATION: MCD + Few-Shot + System Role
→ PERFORMANCE: Superior across all metrics (W1/W2/W3)
→ ACCESSIBILITY: 74% engineering threshold
→ QUANTIZATION: Dynamic tier routing (Q1→Q4→Q8)
→ MAINTENANCE: High complexity, requires ongoing tuning
→ VALIDATION: All T1-T10 tests + W1-W3 walkthroughs
[Branch 2] Expertise = Moderate (Software engineering background):
→ APPROACH: FEW-SHOT + SYSTEM ROLE (Two-Strategy)
→ COORDINATION: Simpler than full hybrid
→ PERFORMANCE: Good balance without complexity overhead
→ ACCESSIBILITY: 82% engineering threshold
→ QUANTIZATION: Q4 tier with Q8 fallback
→ IMPLEMENTATION: System Role persona + Few-Shot examples
[Branch 3] Expertise = Basic (Product/UX team):
→ APPROACH: MCD + FEW-SHOT
→ PROVEN COMBINATION: 88% efficiency, 86% context-optimal
→ JUSTIFICATION: Validated in W1/W2, accessible implementation
→ ACCESSIBILITY: 94% engineering threshold
→ QUANTIZATION: Start Q4, fallback Q1
→ MAINTENANCE: Low complexity, stable performance
[Branch 4] ITERATIVE STRATEGY (Unknown expertise):
→ START: MCD STRUCTURED baseline
→ MEASURE: Performance across efficiency/UX/quality dimensions
→ ITERATE: Add Few-Shot examples incrementally
→ VALIDATE: T1 approach comparison after each iteration
→ STOP: When improvement <5% for 2 consecutive iterations
→ RESULT: Custom-tuned hybrid adapted to team capabilities
Practical Example:
- Scenario: Navigation assistant, balanced requirements, moderate expertise
- Selection: Branch 2 → Few-Shot + System Role
- Implementation: "Navigation expert" persona + spatial reasoning examples
- Expected Performance: Balanced 80%+ across efficiency/UX/quality (W2 data)
G.2.5 Anti-Pattern Enforcement (Critical Validation)
Purpose: Prevent empirically-validated failure modes that cause catastrophic degradation under constraint conditions.
FORBIDDEN_APPROACHES_VALIDATOR:
[Anti-Pattern 1] Chain-of-Thought under Constraints:
IF approach_includes(CoT) AND constraints == True:
→ REJECT: Empirically validated failures
→ EVIDENCE: T6/T7/T8 browser crashes, token overflow
→ COMPLETION RATE: 2/5 with CoT vs 5/5 with Few-Shot
→ ROOT CAUSE: Reasoning chains exceed token budgets
→ ALTERNATIVE: Replace with Few-Shot examples (T6 validation)
→ EXCEPTION: None — CoT universally incompatible with constraints
[Anti-Pattern 2] Verbose Conversational under Budget Pressure:
IF approach == Conversational AND token_budget < 512:
→ REJECT: 28% completion rate (W1 evidence)
→ EVIDENCE: Conversational requires 1.5x tokens vs MCD
→ FAILURE MODE: Natural language phrasing exceeds budgets
→ ALTERNATIVE: System Role Professional (82% UX at constraints)
→ THRESHOLD: Conversational viable only when budget ≥512
[Anti-Pattern 3] Q8 without Q4 Justification:
IF quantization == Q8 AND NOT performance_inadequacy_at_Q4:
→ REJECT: Violates minimality principle (Section 4.2)
→ EVIDENCE: T10 shows Q4 optimal for 80% of tasks
→ VALIDATION REQUIRED: Document Q4 failures before Q8 escalation
→ RATIONALE: Resource efficiency core to MCD philosophy
→ PROCESS: Try Q4 → Measure drift → Escalate if drift >10%
[Anti-Pattern 4] Unbounded Clarification Loops:
IF clarification_loops == Unbounded:
→ REJECT: 1/4 recovery rate, semantic drift (T5: 2/4 drift)
→ EVIDENCE: Loops >2 iterations cause confusion
→ FAILURE MODE: Progressive semantic drift accumulation
→ ALTERNATIVE: Bounded loops (≤2 iterations, explicit termination)
→ IMPLEMENTATION: Hard limit + graceful escalation message
→ VALIDATION: T3 structured fallback (4/5 success with bounds)
Critical Implementation Note: All four anti-patterns must be checked before deployment. Violations historically correlate with >70% failure rates in constraint conditions.
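The four anti-patterns can be checked mechanically before deployment. The sketch below assumes a simple configuration object; the field names and violation strings are hypothetical, while the thresholds (512-token budget, two-iteration loop bound, Q4-before-Q8 justification) come from the rules above.

```python
# Sketch of a pre-deployment anti-pattern check (illustrative field names).
from dataclasses import dataclass
from typing import Optional

@dataclass
class DesignSpec:
    approach: str                        # e.g. "mcd", "conversational", "cot", "few_shot"
    token_budget: int
    quantization: str                    # "Q1", "Q4", or "Q8"
    q4_failure_documented: bool = False  # evidence of Q4 inadequacy before Q8
    max_clarification_loops: Optional[int] = None   # None = unbounded

def check_anti_patterns(spec: DesignSpec, constrained: bool = True) -> list:
    violations = []
    if spec.approach == "cot" and constrained:
        violations.append("AP1: CoT under constraints (2/5 completion, T6)")
    if spec.approach == "conversational" and spec.token_budget < 512:
        violations.append("AP2: Conversational below 512 tokens (28% completion, W1)")
    if spec.quantization == "Q8" and not spec.q4_failure_documented:
        violations.append("AP3: Q8 without documented Q4 inadequacy (T10)")
    if spec.max_clarification_loops is None or spec.max_clarification_loops > 2:
        violations.append("AP4: Unbounded or >2 clarification loops (T5 drift)")
    return violations

# A compliant MCD design produces an empty violation list
assert check_anti_patterns(DesignSpec("mcd", 80, "Q4", max_clarification_loops=2)) == []
```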
G.2 Output: Primary approach selected, validated, and anti-pattern checked → PROCEED TO PHASE 3 (Appendix G.3)
G.3 Phase 3: MCD Principle Validation Workflows
Purpose: Systematically apply MCD's three core principles—Minimality by Default, Bounded Rationality, Degeneracy Detection—to validate and refine architectural designs from Phase 2. Each principle includes empirically derived validation workflows with quantified thresholds from Chapters 4-7.
Critical Context: Phase 3 transforms selected approaches into constraint-compliant architectures through iterative component validation, ensuring every element justifies its token/memory cost through measurable performance contribution.
G.3.1 Step 1: Minimality by Default Validation
Principle Foundation: Remove all components unless empirical evidence demonstrates necessity (Section 4.2). Default assumption: simpler architectures outperform complex ones under constraints.
Q5: Component Necessity Assessment
For Each Component in [Memory, Tools/APIs, Orchestration Layers]:
Q5.1 Memory Component Validation
MEMORY_NECESSITY_TEST:
Question: Can task complete without persistent state?
TEST PROTOCOL (T4 Methodology):
1. Implement stateless regeneration workflow
2. Run 5 trials with explicit context reinjection
3. Run 5 trials with implicit reference (baseline)
4. Measure completion rate for both conditions
DECISION LOGIC:
IF stateless_completion_rate ≥ 90% (5/5 trials succeed):
→ ACTION: REMOVE memory component
→ EVIDENCE: T4 validation shows 5/5 stateless vs 2/5 implicit
→ BENEFIT: -200 tokens, -40MB RAM, 15% latency reduction
→ DOCUMENT: Stateless viability confirmed
→ IMPLEMENTATION: Use explicit slot reinjection (Section 4.2)
ELSE (stateless_rate < 90%):
→ ACTION: KEEP memory, justify with fallback design
→ CALCULATE: Memory Fragility Score (Appendix E.2.2)
→ FORMULA: MFS = state_dependencies / total_interactions
→ THRESHOLD: If MFS > 40% → High fragility, redesign required
→ MITIGATION: Implement hybrid stateless core + external state
Practical Example:
- Task: Healthcare appointment booking (W1)
- Test Results: 5/5 stateless completions with {doctor_type, date, time} reinjection
- Decision: Remove session memory, use explicit slot passing
- Benefit: 200-token reduction, simplified architecture
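A minimal sketch of the Q5.1 decision follows, combining the T4-style stateless trial outcome with the Memory Fragility Score fallback; the function name is hypothetical, and the usage example reuses the W1 figures quoted above.

```python
# Sketch of the Q5.1 memory-necessity decision (thresholds from the protocol above).
def memory_necessity_decision(stateless_successes: int, trials: int,
                              state_dependencies: int, total_interactions: int) -> str:
    stateless_rate = stateless_successes / trials
    if stateless_rate >= 0.90:
        return "REMOVE_MEMORY"                     # T4: explicit slot reinjection suffices
    mfs = state_dependencies / total_interactions  # Memory Fragility Score (Appendix E.2.2)
    if mfs > 0.40:
        return "REDESIGN_REQUIRED"                 # high fragility
    return "KEEP_MEMORY_WITH_FALLBACK"             # hybrid: stateless core + external state

# W1 booking: 5/5 stateless completions -> memory component removed
assert memory_necessity_decision(5, 5, 0, 25) == "REMOVE_MEMORY"
```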
Q5.2 Tool/API Component Validation
TOOL_UTILIZATION_TEST:
Question: Utilization rate >10%? (T7 Degeneracy Threshold)
MEASUREMENT PROTOCOL:
1. Track tool invocations across test scenarios
2. Calculate: utilization_rate = invocations / total_interactions
3. Measure latency impact: latency_with_tool vs latency_baseline
DECISION LOGIC:
IF utilization_rate < 10%:
→ ACTION: REMOVE tool/API component
→ EVIDENCE: T7 shows <10% triggers degeneracy detection
→ RATIONALE: Maintenance overhead outweighs rare utility
→ DOCUMENT: Degeneracy threshold violated
→ BENEFIT: Reduced complexity, faster response times
IF 10% ≤ utilization_rate < 30%:
→ ACTION: CONDITIONAL KEEP (monitor closely)
→ REQUIREMENT: Document specific use cases justifying inclusion
→ VALIDATE: Latency improvement must be >15% when triggered
→ WARNING: Borderline utility, candidate for future removal
IF utilization_rate ≥ 30%:
→ ACTION: KEEP tool, document usage patterns
→ VALIDATE: Latency improvement justifies inclusion cost
→ MONITOR: Track utilization trends over deployment lifecycle
Practical Example:
- Tool: Medical terminology API for appointment booking
- Utilization: 8% (only triggered for ambiguous specialty names)
- Decision: Remove API, use Few-Shot examples of common specialties
- Benefit: -50ms average latency, simplified deployment
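The Q5.2 thresholds map onto a short decision function; the sketch below is illustrative, with the 10%/30% utilization bands and the 15% latency condition taken from the protocol above.

```python
# Sketch of the Q5.2 tool/API decision (illustrative function name).
def tool_decision(invocations: int, total_interactions: int,
                  latency_gain_when_triggered: float) -> str:
    utilization = invocations / total_interactions
    if utilization < 0.10:
        return "REMOVE_TOOL"           # degeneracy threshold violated (T7)
    if utilization < 0.30:
        # conditional keep only if the tool delivers >15% latency improvement
        return "CONDITIONAL_KEEP" if latency_gain_when_triggered > 0.15 else "REMOVE_TOOL"
    return "KEEP_TOOL"

# Medical terminology API from the example above: 8% utilization -> removed
assert tool_decision(8, 100, 0.20) == "REMOVE_TOOL"
```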
Q5.3 Orchestration Layer Validation
ORCHESTRATION_NECESSITY_TEST:
Question: Does prompt-level routing suffice? (Section 5.3)
TEST PROTOCOL:
1. Implement IF-THEN routing directly in prompt
2. Implement equivalent orchestration layer routing
3. Run T3-style structured fallback test (5 trials each)
4. Measure: completion rate, latency, token cost
DECISION LOGIC:
IF prompt_routing_success ≥ 80% (4/5 trials):
→ ACTION: REMOVE orchestration layer
→ EVIDENCE: T3 shows 4/5 structured fallback success
→ BENEFIT: -30 tokens overhead, -25ms latency
→ DOCUMENT: Prompt-native routing validated
→ IMPLEMENTATION: Use symbolic IF-THEN in prompt text
ELIF 60% ≤ prompt_routing_success < 80% (3/5 trials):
→ ACTION: HYBRID APPROACH
→ DESIGN: Simple router for complex cases only
→ FALLBACK: Default to prompt routing when possible
→ JUSTIFY: Document specific failure modes requiring orchestration
ELSE (prompt_routing < 60%):
→ ACTION: KEEP orchestration layer
→ JUSTIFY: Document complexity vs performance gain
→ VALIDATE: Calculate Redundancy Index (Step 3)
→ THRESHOLD: RI must be ≤10 to justify complexity
Practical Example:
- Task: Navigation routing between {booking, navigation, diagnostic} intents
- Prompt Routing: 4/5 successful classifications with IF-THEN structure
- Orchestration Layer: 5/5 successes but +30 tokens, +25ms latency
- Decision: Remove orchestration, use prompt-native IF-THEN
- Benefit: Simpler architecture, validated performance
G.3.2 Step 2: Bounded Rationality Application
Principle Foundation: Limit reasoning complexity to ≤3 sequential steps; replace natural language reasoning chains with symbolic compression (Section 4.2).
Q6: Reasoning Chain Complexity Assessment
REASONING_COMPLEXITY_ANALYZER:
Task Decomposition Protocol:
1. Break task into atomic reasoning steps
2. COUNT: number_of_sequential_steps
3. IDENTIFY: dependencies between steps
4. MEASURE: token cost per reasoning step
COMPLEXITY DECISION TREE:
IF sequential_steps > 3:
→ RISK_LEVEL = HIGH
→ EVIDENCE: T5 shows semantic drift in 2/4 cases beyond 3 steps
→ WARNING: Failure probability increases exponentially >3 steps
→ PROCEED TO MITIGATION OPTIONS
ELIF sequential_steps == 3:
→ RISK_LEVEL = MODERATE
→ ACTION: Apply symbolic compression (Option 1)
→ VALIDATE: Ensure no cascading failures
→ MONITOR: Track drift rates in production
ELIF sequential_steps < 3:
→ RISK_LEVEL = LOW
→ ACTION: PROCEED with bounded reasoning design
→ VALIDATION: Standard T1-style testing sufficient
Mitigation Options for High-Complexity Tasks (>3 steps)
COMPLEXITY_REDUCTION_STRATEGIES:
[Option 1] Symbolic Compression:
TECHNIQUE: Replace natural language with symbolic logic
BEFORE (Natural Language, 45 tokens):
"Think carefully about the route from your current location to
the destination, considering all landmarks and directions..."
AFTER (Symbolic, 12 tokens):
"Calculate: current_pos → landmarks → destination"
VALIDATION: Maintains semantics, reduces token cost 73%
EVIDENCE: W2 navigation shows equivalent accuracy
BENEFIT: -33 tokens per reasoning step
⭐ ADAPTATION PATTERN NOTE:
Symbolic compression effectiveness varies by domain structure (Section 5.2.1):
- Semi-Static domains (W2 navigation): Deterministic rules enable aggressive compression
- Dynamic domains (W1 booking, W3 diagnostics): Moderate compression with adaptive logic
Implementation guidance: See G.4.1 Adaptation Pattern Classification
[Option 2] Task Decomposition:
TECHNIQUE: Split into independent sub-agents
DESIGN:
- Each sub-agent: ≤3 reasoning steps maximum
- Coordination: Sequential execution, NOT chained reasoning
- State passing: Explicit outputs → explicit inputs
EXAMPLE (System Diagnostics):
- Sub-agent 1: Symptom classification (2 steps)
- Sub-agent 2: Priority assignment (2 steps)
- Sub-agent 3: Action recommendation (2 steps)
Total: 6 steps divided into 3 independent agents
VALIDATION: T3 shows modular agents maintain 4/5 success rate
TRADEOFF: +50ms coordination latency, but safer than chaining
[Option 3] Chain-of-Thought Replacement (CRITICAL):
RULE: IF CoT seems necessary → FORBIDDEN under constraints
EVIDENCE: T6/T7/T8 show catastrophic CoT failures
- T6: 2/5 completion rate with CoT vs 5/5 with Few-Shot
- T7: Browser crashes with CoT under memory pressure
- T8: Token overflow in 4/5 WASM deployments
ALTERNATIVE: Few-Shot examples showing reasoning patterns
BEFORE (CoT, 120 tokens):
"Let's think step by step. First, I need to understand..."
AFTER (Few-Shot, 60 tokens):
Example 1: Input X → Output Y (reasoning implicit in examples)
Example 2: Input A → Output B
Apply to current: Input Z → Output ?
VALIDATION: T6 shows 5/5 Few-Shot success vs 2/5 CoT
BENEFIT: 2x token reduction; completion improves from 2/5 (CoT) to 5/5 (Few-Shot)
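To make Option 2 concrete, the sketch below shows three independent sub-agents coordinated sequentially with explicit state passing; the stage functions are hypothetical stand-ins for prompt-backed calls, and the example values follow the system-diagnostics decomposition above.

```python
# Option 2 sketch: three sub-agents (<=3 reasoning steps each), run sequentially,
# passing explicit outputs as explicit inputs -- no chained reasoning, no shared state.
def classify_symptoms(report: str) -> dict:
    # Sub-agent 1: symptom classification (placeholder for a prompt-backed call)
    return {"category": "Infrastructure", "symptom": report}

def assign_priority(classification: dict) -> dict:
    # Sub-agent 2: heuristic P1/P2/P3 assignment
    priority = "P1" if classification["category"] == "Infrastructure" else "P2"
    return {**classification, "priority": priority}

def recommend_action(assessment: dict) -> dict:
    # Sub-agent 3: priority-dependent action steps
    steps = ["check logs", "check services", "check hardware"]
    return {**assessment, "steps": steps}

def run_diagnostic_pipeline(report: str) -> dict:
    # Sequential coordination: each stage's output is the next stage's input
    return recommend_action(assign_priority(classify_symptoms(report)))

print(run_diagnostic_pipeline("Server crash"))
```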
Q7: Token Budget Allocation
TOKEN_BUDGET_ALLOCATOR:
Input: Total_Budget (from G.1 Q4)
ALLOCATION FORMULA (Empirically Validated):
Core_Logic: 40-60% of Total_Budget
Fallback_Handling: 20-30% of Total_Budget
Input_Processing: 10-20% of Total_Budget
Buffer_Variations: 10-15% of Total_Budget
VALIDATION CHECKS:
CHECK 1: Budget sum must equal 100%
IF SUM(allocations) ≠ 1.0:
→ ERROR: "Budget allocation must total 100%"
→ ACTION: Rebalance percentages
CHECK 2: Core logic must dominate
IF Core_Logic < 40% OR Core_Logic > 60%:
→ WARNING: "Unbalanced allocation may cause failures"
→ RECOMMENDATION: Shift tokens to core from buffer/input
CHECK 3: Fallback budget adequate
IF Fallback < 20%:
→ ERROR: "Insufficient fallback budget"
→ EVIDENCE: T3/T7 show ≥20% required for recovery
WORKED EXAMPLE (Total_Budget = 80 tokens):
Allocation Calculation:
Core_Logic: 48 tokens (60% - upper bound for complex task)
Fallback: 20 tokens (25% - mid-range for safety)
Input: 8 tokens (10% - minimal for slot extraction)
Buffer: 4 tokens ( 5% - tight but acceptable)
─────────────────────────────
Total: 80 tokens (100% ✓)
Validation:
✓ Core dominates (60%)
✓ Fallback adequate (25%)
✓ Sum equals 100%
→ APPROVED for deployment
Critical Note: Token budgets <60 total require proportional adjustment but maintain relative percentages. For example, 50-token budget: Core 30 (60%), Fallback 10 (20%), Input 5 (10%), Buffer 5 (10%).
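The Q7 checks reduce to a few arithmetic assertions. The sketch below is illustrative (the function name and message strings are not part of the framework); the 40-60% core band, 20% fallback floor, and sum check follow CHECK 1-3 above.

```python
# Sketch of the Q7 allocation validation (thresholds from CHECK 1-3 above).
def validate_allocation(total_budget: int, core: int, fallback: int,
                        input_proc: int, buffer: int) -> list:
    issues = []
    if core + fallback + input_proc + buffer != total_budget:
        issues.append("CHECK 1: allocation must sum to the total budget")
    if not 0.40 <= core / total_budget <= 0.60:
        issues.append("CHECK 2: core logic outside the 40-60% band")
    if fallback / total_budget < 0.20:
        issues.append("CHECK 3: fallback below 20% (T3/T7 recovery floor)")
    return issues

# Worked example above: 80-token budget split 48/20/8/4 -> no issues
assert validate_allocation(80, 48, 20, 8, 4) == []
```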
G.3.3 Step 3: Degeneracy Detection
Principle Foundation: Quantify component value through Redundancy Index; remove elements contributing <10% marginal improvement (T6 methodology).
Q8: Redundancy Index Calculation
REDUNDANCY_INDEX_PROTOCOL:
FORMULA:
RI = excess_tokens / marginal_correctness_improvement
MEASUREMENT PROCEDURE:
STEP 1: Establish Baseline
- Implement minimal prompt (Section 4.2 guidance)
- Run 5 test trials across representative scenarios
- MEASURE:
* task_success_rate_baseline (0-100%)
* token_count_baseline
* latency_baseline (ms)
STEP 2: Test Enhanced Version
- Add proposed component/feature to baseline
- Run 5 test trials with identical scenarios
- MEASURE:
* task_success_rate_enhanced (0-100%)
* token_count_enhanced
* latency_enhanced (ms)
STEP 3: Calculate Metrics
excess_tokens = token_count_enhanced - token_count_baseline
improvement = task_success_rate_enhanced - task_success_rate_baseline
RI = excess_tokens / improvement
latency_overhead = latency_enhanced - latency_baseline
INTERPRETATION THRESHOLDS:
IF RI > 10:
→ CLASSIFICATION: OVER-ENGINEERED
→ EVIDENCE: T6 verbose case study
* Verbose prompt: 145 tokens
* Minimal prompt: 58 tokens
* Improvement: +0.2 on 0-4 scale (+5% absolute)
* RI = (145-58) / 0.05 = 87 / 0.05 = 1,740
* Conclusion: Extreme over-engineering
→ ACTION: Remove enhancement, revert to baseline
→ BENEFIT: Token savings without performance loss
IF 5 ≤ RI ≤ 10:
→ CLASSIFICATION: BORDERLINE ACCEPTABLE
→ ACTION: Conditional keep with monitoring
→ REQUIREMENT: Document specific justification
→ REVIEW: Reassess after deployment data collection
IF RI < 5:
→ CLASSIFICATION: JUSTIFIED COMPLEXITY
→ ACTION: Keep enhanced version
→ RATIONALE: Improvement justifies token cost
→ DOCUMENT: RI value for future reference
Worked Example:
CASE STUDY: Healthcare Booking Enhanced Clarification
Baseline Version:
- Token count: 65 tokens
- Success rate: 84% (21/25 trials)
- Latency: 380ms
Enhanced Version (added multi-turn clarification):
- Token count: 95 tokens
- Success rate: 92% (23/25 trials)
- Latency: 450ms
Calculation:
excess_tokens = 95 - 65 = 30 tokens
improvement = 0.92 - 0.84 = 0.08 (8%)
RI = 30 / 0.08 = 375
latency_overhead = 450 - 380 = +70ms
Interpretation:
RI = 375 >> 10 → OVER-ENGINEERED
Decision: Remove multi-turn clarification
Alternative: Single-turn bounded clarification (RI = 6.2, acceptable)
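The RI computation and its interpretation thresholds can be packaged as below; the sketch reuses the healthcare booking numbers from the worked example, and the verdict labels mirror the classification bands above.

```python
# Sketch of the Redundancy Index (RI) calculation and G.3.3 thresholds.
def redundancy_index(tokens_base: int, tokens_enhanced: int,
                     success_base: float, success_enhanced: float):
    excess_tokens = tokens_enhanced - tokens_base
    improvement = success_enhanced - success_base
    if improvement <= 0:
        return float("inf"), "OVER-ENGINEERED"   # no measurable gain from the addition
    ri = excess_tokens / improvement
    if ri > 10:
        return ri, "OVER-ENGINEERED"
    if ri >= 5:
        return ri, "BORDERLINE"
    return ri, "JUSTIFIED"

ri, verdict = redundancy_index(65, 95, 0.84, 0.92)
print(round(ri), verdict)   # -> 375 OVER-ENGINEERED
```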
Q9: Usage Pattern Analysis
USAGE_PATTERN_VALIDATOR:
FOR EACH component_or_pathway IN architecture:
METRIC: Utilization Rate
utilization_rate = actual_uses / total_possible_uses
DECISION LOGIC:
IF utilization_rate < 10%:
→ FLAG: Unused or rarely-triggered component
→ ACTION: REMOVE component immediately
→ RATIONALE: Maintenance cost exceeds rare utility
→ DOCUMENT: "Degeneracy threshold violated"
→ CROSS-CHECK: Verify no edge-case dependencies
IF 10% ≤ utilization_rate < 25%:
→ FLAG: Low-usage component
→ ACTION: Mark for review after deployment
→ MONITOR: Track trend over time (increasing/decreasing)
→ CONDITION: Keep if critical for edge cases
IF utilization_rate ≥ 25%:
→ STATUS: VALIDATED
→ ACTION: Keep component
→ DOCUMENT: Usage patterns for long-term monitoring
→ OPTIMIZE: Consider frequency-based caching
DEAD PATH DETECTION:
FOR EACH decision_pathway IN prompt_logic:
IF pathway_triggered_count == 0 across all test cases:
→ ALERT: "DEAD PATH IDENTIFIED"
→ INVESTIGATION: Why was pathway never triggered?
* Unreachable condition?
* Redundant with other pathways?
* Test coverage gap?
→ ACTION OPTIONS:
1. Remove dead pathway entirely
2. Merge with active pathways
3. Add test coverage if genuinely needed
→ UPDATE: Decision tree structure after removal
Practical Example:
CASE STUDY: Indoor Navigation Agent Path Analysis (W2 Domain)
Pathway Usage Results (n=100 navigation queries):
- direct_route: 52 triggers (52% utilization) → KEEP ✓
- obstacle_avoidance: 31 triggers (31% utilization) → KEEP ✓
- multi_waypoint: 11 triggers (11% utilization) → KEEP ✓
- accessibility_route: 4 triggers ( 4% utilization) → REMOVE ✗
- emergency_exit: 2 triggers ( 2% utilization) → REMOVE ✗
- scenic_route: 0 triggers ( 0% utilization) → REMOVE ✗ (DEAD PATH)
Actions Taken:
1. Remove accessibility_route pathway (below 10% threshold)
- Justification: Specialized requests should escalate to human assistance
2. Remove emergency_exit pathway (below 10% threshold)
- Justification: Safety-critical routing requires real-time fire alarm integration
3. Remove scenic_route pathway (never triggered)
- Justification: Dead path with no real-world usage patterns
4. Token savings: -22 tokens from removed pathways
5. Simplified decision tree: 6 branches → 3 branches
6. Latency improvement: -15ms average
Result: Focused navigation agent maintains 94% route success (direct + obstacle + waypoint)
with 27% token reduction and improved response times
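The Q9 utilization audit over the W2 pathway counts can be reproduced with a short loop; the sketch below uses the trigger counts from the case study. Note that multi_waypoint (11%) lands in the review band but was retained as critical in the walkthrough.

```python
# Sketch of the Q9 utilization thresholds applied per pathway (W2 counts above).
def audit_pathways(trigger_counts: dict, total_queries: int) -> dict:
    verdicts = {}
    for pathway, count in trigger_counts.items():
        rate = count / total_queries
        if count == 0:
            verdicts[pathway] = "REMOVE (dead path)"
        elif rate < 0.10:
            verdicts[pathway] = "REMOVE (<10% utilization)"
        elif rate < 0.25:
            verdicts[pathway] = "REVIEW (10-25%, keep if critical)"
        else:
            verdicts[pathway] = "KEEP"
    return verdicts

counts = {"direct_route": 52, "obstacle_avoidance": 31, "multi_waypoint": 11,
          "accessibility_route": 4, "emergency_exit": 2, "scenic_route": 0}
print(audit_pathways(counts, total_queries=100))
```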
G.3 Output: Clean minimal architecture validated through three-principle workflow → PROCEED TO PHASE 4 (Appendix G.4)
G.4 Phase 4: Three-Layer Implementation
Purpose: Implement the validated MCD architecture from Phase 3 through a three-layer structure—Prompt Layer (intent classification/slot extraction), Control Layer (routing logic), Execution Layer (quantization-aware model selection). Each layer includes constraint validation and empirical thresholds from T3/T5/T10.
Critical Context: Layer separation enables modular testing, maintenance, and dynamic tier routing while maintaining stateless operation principles.
G.4.1 Layer 1: Prompt Layer Design (With Adaptation Patterns)
Purpose: Embed decision logic directly into prompt text using IF-THEN structures, intent classification trees, and slot extraction workflows. Implementation strategy varies by task structure following Table 5.1 adaptation pattern taxonomy from Section 5.2.1.
Critical Design Principle: Match prompt logic complexity to task structure—over-engineering navigation wastes tokens; under-engineering diagnostics fails variable patterns (Section 5.2.1).
Adaptation Pattern Classification (Table 5.1 Integration)
Before Implementation: Determine adaptation mechanism based on task characteristics.
| Pattern Type | When to Use | Implementation Strategy | Validation Evidence |
|---|---|---|---|
| Dynamic | Natural language variability, unpredictable information density | Conditional slot extraction with runtime intent parsing | W1: 84% completion with dynamic slot-filling |
| Semi-Static | Structured relationships, mathematical transformations | Deterministic coordinate calculations with fixed rules | W2: 85% success with coordinate logic |
| Dynamic | Heuristic classification, variable complexity patterns | Adaptive category routing with priority-based sequencing | W3: 91% accuracy with heuristic classification |
Intent Classification Decision Tree Structure
# Pseudocode for Prompt Layer Intent Classification
# Constraints: Depth ≤3, Branches ≤4, Token ≤25% budget per path
def intent_classification_tree(user_input):
"""
ROOT-level intent detection with bounded complexity.
Validation Constraints (T5/T3):
- Maximum depth: ≤3 levels
- Branching factor: ≤4 per node
- Token allocation: ≤25% total budget per path
- Fallback: Every path must have explicit recovery
"""
# PRIMARY INTENT DETECTION (Level 0)
primary_intent = classify_primary_intent(user_input)
if primary_intent == "booking":
# ADAPTATION PATTERN: Dynamic (Section 5.2.1, W1)
return booking_subtree(user_input, depth=1)
elif primary_intent == "navigation":
# ADAPTATION PATTERN: Semi-Static (Section 5.2.1, W2)
return navigation_subtree(user_input, depth=1)
elif primary_intent == "diagnostic":
# ADAPTATION PATTERN: Dynamic (Section 5.2.1, W3)
return diagnostic_subtree(user_input, depth=1)
else: # DEFAULT FALLBACK (T3: 4/5 success with explicit fallback)
return escalation_node(
message="Intent unclear. Please specify: booking, navigation, or diagnostic.",
retry_allowed=True,
max_retries=2 # Bounded loops (G.2.5 Anti-Pattern 4)
)
Pattern 1: Dynamic Slot-Filling (W1 - Healthcare Booking)
Design Rationale: Natural language appointment requests vary unpredictably in information density, requiring conditional slot identification with runtime adaptation (Section 5.2.1).
def booking_subtree(user_input, depth):
"""
ADAPTATION PATTERN: Dynamic Slot-Filling (W1 domain).
Characteristics (Section 5.2.1):
- Conditional slot extraction with variable missing-data prompts
- Natural language request variability requires runtime intent parsing
- Information density unpredictable (complete vs partial inputs)
Slot Structure: {doctor_type, date, time}
Validation: W1 shows 84% completion with dynamic adaptation
Token Budget: ≤40% total (from G.3.2 Q7)
"""
# DEPTH LIMIT ENFORCEMENT (T5 validation)
if depth > 3:
return fallback_response(
message="Booking request too complex. Please simplify.",
escalation_recommended=True
)
# DYNAMIC SLOT EXTRACTION (Level 1)
# Adapts to variable input completeness
slots = extract_slots(user_input) # Returns: {doctor_type, date, time}
# COMPLETENESS CHECK (Level 2)
# Different paths based on information density
if slots_complete(slots):
# Complete input: "Cardiology tomorrow at 2pm"
return confirm_booking(slots)
# Output: "Confirmed Cardiology, tomorrow, 2PM. ID [generated]"
else:
# ADAPTIVE CLARIFICATION (Level 3 - Maximum depth)
# Identifies specific missing slots dynamically
missing_slots = identify_missing_slots(slots)
# Example adaptive behavior (Section 5.2.1):
# Input: "I want to book an appointment"
# → Output: "Missing [time, date, type] for appointment"
return clarify_missing_slots(
missing=missing_slots,
partial_context=serialize_slots(slots), # T4: Explicit state passing
depth=depth + 1
)
Pattern 2: Semi-Static Deterministic Logic (W2 - Navigation)
Design Rationale: Navigation operates on structured coordinate systems with fixed spatial relationships, enabling mathematical transformation rules rather than NLP interpretation (Section 5.2.1).
def navigation_subtree(user_input, depth):
"""
ADAPTATION PATTERN: Semi-Static Deterministic (W2 domain).
Characteristics (Section 5.2.1):
- Deterministic coordinate calculations with fixed directional rules
- Structured spatial relationships enable mathematical transformations
- Predictable logic follows coordinate geometry, not natural language parsing
Logic: Stateless coordinate transformation (A1→B3 = North 2m, East 1m)
Validation: W2 shows 85% success with symbolic compression
Token Budget: ≤25% per path (constrained spatial reasoning)
"""
# DEPTH LIMIT ENFORCEMENT
if depth > 3:
return fallback_response(
message="Route too complex. Provide simpler waypoints.",
simplification_hint="Use landmarks: library, cafeteria, main entrance"
)
# DETERMINISTIC SPATIAL PARSING (Level 1)
# Follows fixed mathematical rules, not adaptive interpretation
route = parse_spatial_instructions(user_input)
# Returns: {start_pos, landmarks[], destination, direction}
# VALIDITY CHECK (Level 2)
# Coordinate transformation validation
if route_valid(route):
# SEMI-STATIC EXECUTION
# Fixed directional calculations from coordinate pairs
# Example (Section 5.2.1):
# Input: "Navigate from A1 to B3"
# → Output: "North 2m, East 1m"
# Input: "A1 to B3, avoid C2"
# → Output: "North 2m (avoid C2), East 1m"
return execute_navigation(route)
else:
# SPATIAL CLARIFICATION (Level 3)
# Still deterministic: requires structured coordinate/landmark
return clarify_spatial_reference(
message="Unclear location. Specify building/floor/landmark.",
current_context=route.start_pos, # Stateless context passing
expected_format="Use format: [Building][Floor][Room] or [Landmark]"
)
Implementation Note: While MCD maintains a stateless prompt architecture (consistency principle), the underlying logic is a deterministic coordinate transformation that could, in principle, be hardcoded as functions; MCD embeds it in prompts to preserve deployment flexibility (Section 5.2.1).
Pattern 3: Dynamic Heuristic Classification (W3 - Diagnostics)
Design Rationale: System diagnostics require adaptive pattern matching across multiple categories with variable complexity, demanding heuristic routing that adjusts to issue characteristics (Section 5.2.1).
def diagnostic_subtree(user_input, depth):
"""
ADAPTATION PATTERN: Dynamic Heuristic Classification (W3 domain).
Characteristics (Section 5.2.1):
- Heuristic category routing with priority-based step sequencing
- Issue complexity variation demands adaptive classification paths
- Multiple categories (Infrastructure, Software, Network) with varying priorities
Logic: Heuristic classification P1/P2/P3 with adaptive step sequences
Validation: W3 shows 91% accuracy with bounded scope
Token Budget: ≤30% (diagnostic reasoning requires more tokens)
"""
# DEPTH LIMIT ENFORCEMENT
if depth > 3:
return fallback_response(
message="Diagnostic too complex. Break into sub-issues.",
recommendation="Focus on primary symptom first"
)
# DYNAMIC SYMPTOM CLASSIFICATION (Level 1)
# Adapts to variable diagnostic information availability
symptoms = extract_symptoms(user_input)
# ADAPTIVE PRIORITY ASSIGNMENT (Level 2)
# Different routing based on symptom clarity and severity
if symptoms_clear(symptoms):
# HEURISTIC CLASSIFICATION
priority = classify_priority(symptoms) # P1/P2/P3
# Adaptive step sequencing based on category and priority
# Example (Section 5.2.1):
# Input: "Server crash"
# → Output: "Category: Infrastructure, Priority: P1,
# Steps: [Check logs→services→hardware]"
return diagnostic_recommendation(symptoms, priority)
else:
# ADAPTIVE CLARIFICATION (Level 3)
# Requests specific diagnostic information based on ambiguity type
# Example (Section 5.2.1):
# Input: "Something's slow"
# → Output: "Insufficient data for classification"
return request_symptom_details(
message="Provide: error code, frequency, impact on operations.",
bounded_scope="Focus on most critical issue only", # Prevent scope creep
classification_hint="Specify: Server/Network/Application/Database"
)
Architectural Decision Guide (Table 5.1 Application)
def select_adaptation_pattern(task_characteristics):
"""
Match implementation pattern to task structure (Section 5.2.1).
Critical Principle: Over-engineering navigation wastes tokens;
under-engineering diagnostics fails variable patterns.
"""
# PATTERN SELECTION DECISION TREE
if task_characteristics["information_density"] == "unpredictable":
if task_characteristics["requires_nlu_parsing"] == True:
return "DYNAMIC" # W1 Healthcare, W3 Diagnostics
# Rationale: Natural language variability demands runtime adaptation
elif task_characteristics["has_structured_relationships"] == True:
if task_characteristics["allows_mathematical_transform"] == True:
return "SEMI-STATIC" # W2 Navigation
# Rationale: Fixed spatial logic enables deterministic calculation
elif task_characteristics["requires_heuristic_classification"] == True:
if task_characteristics["complexity_varies"] == True:
return "DYNAMIC" # W3 Diagnostics
# Rationale: Issue patterns require adaptive routing
else:
return "DYNAMIC" # Default to dynamic for safety (handles variability)
G.4.2 Layer 2: Control Layer Decision Tree
Purpose: Route user inputs through appropriate processing paths based on complexity classification. Node complexity ≤5 decision points, path depth ≤3 levels (validated in T3/T7).
Route Selection Control Logic
def control_layer_router(user_input, context):
"""
Control layer decision tree architecture.
Constraints:
- Node complexity: ≤5 decision points per node
- Path depth: ≤3 levels maximum
- Exit conditions: Explicitly defined for all paths
- Fallback routes: From every decision point (T3/T7/T9 validation)
"""
# INPUT COMPLEXITY CLASSIFICATION
input_classification = classify_input_complexity(user_input)
# ROUTING DECISION TREE
if input_classification == "simple_query":
# Single-turn resolution, no state tracking needed
return direct_response_path(user_input)
elif input_classification == "complex_request":
# Multi-step workflow with state management
return multi_step_path(user_input, context)
elif input_classification == "ambiguous_input":
# Clarification required before processing
return clarification_path(user_input)
elif input_classification == "invalid_input":
# Error handling with recovery guidance
return error_handling_path(user_input)
else:
# FALLBACK: Unrecognized pattern
return fallback_escalation(
message="Unrecognized input pattern.",
suggestion="Rephrase or contact support."
)
def multi_step_path(user_input, context):
"""
Multi-step workflow implementation (e.g., booking, diagnostics).
Example: Healthcare booking workflow (W1)
Validation: Each step has explicit exit condition
State Management: Stateless with explicit context passing (T4)
"""
# DETERMINE CURRENT WORKFLOW STEP
step = determine_current_step(context)
# STEP 1: INTENT CLASSIFICATION
if step == "intent_classification":
intent = classify_intent(user_input)
context.update({"intent": intent, "step_count": 1})
return transition_to_step("slot_extraction")
# STEP 2: SLOT EXTRACTION
elif step == "slot_extraction":
slots = extract_slots(user_input)
context.update({"slots": slots, "step_count": 2})
if slots_complete(slots):
return transition_to_step("validation")
else:
# BOUNDED CLARIFICATION (≤2 iterations)
return clarification_path(identify_missing(slots))
# STEP 3: VALIDATION
elif step == "validation":
validated = validate_booking(context["slots"])
context.update({"validated": validated, "step_count": 3})
if validated:
return transition_to_step("confirmation")
else:
return error_handling_path("Validation failed: " + validated.error)
# STEP 4: CONFIRMATION
elif step == "confirmation":
return complete_booking(context["slots"])
# FALLBACK: Inconsistent workflow state
else:
return fallback_escalation(
message="Workflow state inconsistent.",
context_snapshot=context,
recovery="Restart from intent classification."
)
def ensure_fallback_coverage(control_tree):
"""
Validation function: Every node must have explicit fallback route.
Evidence: T3/T7/T9 show ≥80% controlled degradation with fallbacks
"""
for node in control_tree.all_nodes():
assert node.has_fallback() == True, \
f"CRITICAL: Node {node.id} missing fallback (T3/T7/T9 requirement)"
G.4.3 Layer 3: Execution Layer (Quantization-Aware)
Purpose: Select optimal quantization tier based on task complexity and hardware constraints, with dynamic routing Q1→Q4→Q8 when drift detected (T10 validation).
Quantization Tier Selection
def quantization_tier_selector(task_complexity, hardware_constraints):
"""
Quantization-aware execution with dynamic tier routing.
Based on T10 findings:
- Q4 optimal for 80% of tasks
- Q1→Q4 escalation when drift >10%
- Q4→Q8 escalation when performance inadequate (<80%)
Parameters:
task_complexity: "simple" | "moderate" | "complex"
hardware_constraints: {"ram_mb": int, "platform": str}
"""
# TASK COMPLEXITY ASSESSMENT
if task_complexity == "simple": # FAQ, basic classification
return try_q1_with_fallback()
elif task_complexity == "moderate": # Slot-filling, navigation
return start_with_q4()
elif task_complexity == "complex": # Multi-step reasoning
return start_with_q8()
else:
# HARDWARE OVERRIDE: Constraints supersede task complexity
return hardware_constraint_override(hardware_constraints)
def try_q1_with_fallback():
"""
Q1 tier: Ultra-minimal (Qwen2-0.5B, 300MB).
Strategy: Start with Q1 for efficiency, escalate if drift detected.
Validation: T10 shows 85% retention under Q1, 15% require escalation.
"""
# LOAD Q1 MODEL
model = load_model(tier="Q1", model_name="Qwen2-0.5B-Q1")
response = model.generate(prompt)
# SEMANTIC DRIFT DETECTION (T10 methodology)
drift_score = calculate_semantic_drift(response, expected_output)
if drift_score > 0.10: # T10 threshold: >10% drift
logger.warning(f"Q1 drift detected: {drift_score:.2f} > 0.10")
logger.info("Escalating to Q4 tier...")
return fallback_to_q4()
else:
logger.info(f"Q1 optimal efficiency: drift={drift_score:.2f}")
return response
def start_with_q4():
"""
Q4 tier: Optimal balance (TinyLlama-1.1B, 560MB).
Evidence: T8 validation shows Q4 optimal for browser/WASM
Performance: 95% task success rate, 430ms average latency
"""
# LOAD Q4 MODEL
model = load_model(tier="Q4", model_name="TinyLlama-1.1B-Q4")
response = model.generate(prompt)
# PERFORMANCE EVALUATION
performance_score = evaluate_performance(response)
if performance_score < 0.80: # Performance inadequate threshold
logger.warning(f"Q4 insufficient: performance={performance_score:.2f}")
logger.info("Escalating to Q8 tier...")
return escalate_to_q8()
else:
logger.info(f"Q4 validated sweet spot: performance={performance_score:.2f}")
return response
def start_with_q8():
"""
Q8 tier: Complex reasoning (Llama-3.2-1B, 800MB).
Use case: Multi-step diagnostics, complex spatial reasoning
Validation: Required Q4 justification per G.2.5 Anti-Pattern 3
"""
# LOAD Q8 MODEL
model = load_model(tier="Q8", model_name="Llama-3.2-1B-Q8")
response = model.generate(prompt)
# OVERPROVISIONING CHECK
if is_overprovisioned(response, task_complexity):
logger.info("Q8 overkill detected, downgrading to Q4...")
return downgrade_to_q4()
else:
logger.info("Q8 necessary for task complexity")
return response
def hardware_constraint_override(constraints):
"""
Hardware limitations override task complexity decisions.
Priority: Hardware constraints > Task complexity preferences
Evidence: T8 shows platform-specific optimal tiers
"""
ram_available = constraints["ram_mb"]
platform = constraints.get("platform", "unknown")
# CONSTRAINT 1: Severe RAM limitation
if ram_available < 256:
logger.warning(f"RAM {ram_available}MB < 256MB: Forcing Q1/Q4 only")
return force_q1_q4_only()
# CONSTRAINT 2: Moderate RAM limitation
elif 256 <= ram_available < 1024:
logger.info(f"RAM {ram_available}MB: Q4/Q8 acceptable")
return allow_q4_q8()
# CONSTRAINT 3: Browser/WASM platform
elif platform == "browser_wasm":
logger.info("Browser/WASM detected: Q4 optimal (T8 validation)")
return force_q4_tier()
# CONSTRAINT 4: Unconstrained
else:
logger.info(f"RAM {ram_available}MB >1GB: All tiers available")
return allow_all_tiers()
def dynamic_tier_router(prompt, initial_tier="Q4"):
"""
Continuous monitoring with automatic escalation/degradation.
Adaptive Strategy: Start conservative, adjust based on drift
Validation: T10 shows dynamic routing improves efficiency 18%
"""
current_tier = initial_tier
drift_history = []
max_iterations = 3 # Prevent infinite escalation loops
for iteration in range(max_iterations):
# GENERATE WITH CURRENT TIER
response = generate_with_tier(prompt, current_tier)
# CALCULATE DRIFT
drift = calculate_semantic_drift(response)
drift_history.append(drift)
# ESCALATION DECISION
if drift > 0.10 and current_tier < "Q8":
logger.info(f"Iteration {iteration}: Drift {drift:.2f} >10%, escalating")
current_tier = escalate_tier(current_tier)
continue
# DEGRADATION DECISION
elif drift < 0.05 and current_tier > "Q1" and is_overprovisioned():
logger.info(f"Iteration {iteration}: Drift {drift:.2f} <5%, downgrading")
current_tier = degrade_tier(current_tier)
continue
# STABLE TIER FOUND
else:
logger.info(f"Tier {current_tier} stable: drift={drift:.2f}")
return response
# MAX ITERATIONS REACHED
logger.warning(f"Max iterations reached, using tier {current_tier}")
return response
G.4 Output: Three-layer architecture with embedded decision logic and quantization-aware execution → PROCEED TO PHASE 5 (Appendix G.5)
G.5 Phase 5: Evidence-Based Validation Test Protocols
Purpose: Validate the MCD implementation against empirical thresholds from Chapters 6-7 using T1-T10 test methodologies and W1-W3 domain-specific protocols. All tests reference established baselines with quantified pass/fail criteria.
G.5.1 Core MCD Validation Suite (T1-T10 Protocols)
| Test | Objective | Pass Threshold | Evidence Source |
|---|---|---|---|
| T1-Style | Approach effectiveness vs alternatives | ≥90% expected performance | Chapter 6.2 |
| T4-Style | Stateless context reconstruction | ≥90% recovery (5/5 vs 2/5 implicit) | Section 6.3.4 |
| T6-Style | Over-engineering detection (RI calculation) | RI ≤10, no components >20% overhead | Section 6.3.6 |
| T7-Style | Constraint stress testing | ≥80% controlled failure, no hallucination | Section 6.3.7 |
| T8-Style | Deployment environment (browser/WASM) | Zero crashes, <500MB RAM, <500ms latency | Section 6.3.8 |
| T10-Style | Quantization tier validation (Q1→Q4→Q8) | Optimal tier selected ≥90% cases | Section 6.3.10 |
Implementation Note: Run each test with n=5 trials minimum per configuration. Calculate 95% confidence intervals for completion rates. Document all failures with root cause analysis.
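The note above calls for 95% confidence intervals on completion rates. One standard choice for small trial counts is the Wilson score interval, sketched below; the specific interval method is an assumption, not something prescribed by the framework.

```python
# Wilson score interval for a completion rate (suitable for small n such as 5 trials).
from math import sqrt

def wilson_interval(successes: int, trials: int, z: float = 1.96):
    if trials == 0:
        return (0.0, 0.0)
    p = successes / trials
    denom = 1 + z**2 / trials
    centre = (p + z**2 / (2 * trials)) / denom
    half = (z / denom) * sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2))
    return (max(0.0, centre - half), min(1.0, centre + half))

# Example: 5/5 completions still leaves a wide interval at n=5
print(wilson_interval(5, 5))   # roughly (0.57, 1.00)
```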
G.5.2 Domain-Specific Validation (W1-W3 Protocols)
W1 Protocol (Healthcare Booking):
- Task domain deployment with comparative performance vs Few-Shot/Conversational
- Metrics: Completion rate, token efficiency, latency, UX score
- Target: ≥85% completion under Q4 constraints (Chapter 7.2)
W2 Protocol (Spatial Navigation):
- Real-world scenario execution under stateless constraints
- Metrics: Route accuracy, coordinate precision, safety communication
- Target: ≥80% successful navigation with transparent limitation acknowledgment (Chapter 7.3)
W3 Protocol (System Diagnostics):
- Failure mode documentation with priority classification (P1/P2/P3)
- Metrics: Diagnostic accuracy, bounded scope adherence, systematic troubleshooting
- Target: ≥85% correct priority assignment, no fabricated root causes (Chapter 7.4)
G.5.3 Multi-Dimensional Diagnostic Checks
Decision Tree Health Metrics:
- Average path length: ≤3 levels (T5 constraint)
- Branching factor: ≤4 per node (complexity limit)
- Fallback activation frequency: Monitor for >15% (indicates edge case gaps)
- Dead paths: Zero unused routes after test coverage
Context-Optimality Scoring:
- Resource-constrained: Efficiency score ≥80%
- User experience: UX score ≥75%
- Professional quality: Quality score ≥85%
Performance vs Complexity Analysis:
- Plot: Efficiency vs resource usage
- Identify: Pareto frontier for optimal trade-offs
- Validate: Token cost justified by measurable improvement
G.5.4 Final Deployment Decision Matrix
DEPLOYMENT_READINESS_CHECKLIST:
✓ Core Tests (T1-T10):
- All tests PASS with thresholds met
- Decision trees validated (depth ≤3, branches ≤4)
- Redundancy Index ≤10 for all components
✓ Domain Tests (W1-W3):
- Representative domain scenarios tested
- Comparative analysis vs baseline approaches documented
- Failure modes characterized with recovery strategies
✓ Context Requirements:
- Efficiency priority → Score ≥80%
- UX priority → Score ≥75%
- Quality priority → Score ≥85%
DECISION LOGIC:
IF all_core_tests == PASS AND domain_validation == PASS AND context_requirements == MET:
→ DEPLOY MCD AGENT ✅
→ Document: Performance baselines, monitoring thresholds
ELSE:
→ RETURN TO FAILED PHASE for redesign
→ Document: Specific failure modes, remediation plan
→ ITERATE: Fix issues, re-run validation
UNSUITABLE DETERMINATION:
IF multiple_iterations_fail OR fundamental_constraint_mismatch:
→ Recommend alternative frameworks (LangChain, AutoGPT)
→ Document: Justification with empirical evidence
G.5.5 Monitoring Integration Post-Deployment
Ongoing Validation (Production Environment):
- Semantic Drift Monitor: Continuous comparison across quantization tiers, alert if drift >10%
- Dynamic Tier Selection: Automatic Q1→Q4→Q8 escalation with performance tracking
- Performance Benchmarking: Weekly validation against established efficiency thresholds from Chapter 6
- Usage Pattern Analysis: Monthly review of component utilization, flag if any <10% (G.3.3 Q9)
G.5 Output: Validated MCD implementation ready for deployment with documented performance characteristics and monitoring plan.