Designing Lightweight AI Agents for Edge Deployment
A Minimal Capability Framework with Insights from Literature Synthesis
This appendix provides comprehensive implementation guidance for the MCD Framework Application Decision Tree introduced in Section 8.7.2. Practitioners applying MCD principles to real-world deployment scenarios should consult this appendix for detailed decision logic, validation workflows, and empirically-derived thresholds from Chapters 4-7.
Structure Overview:
- G.1 – Phase 1: Context assessment questions and priority classification
- G.2 – Phase 2: Approach selection decision trees with anti-pattern enforcement
- G.3 – Phase 3: MCD principle validation workflows (minimality, rationality, degeneracy)
- G.4 – Phase 4: Three-layer implementation with pseudocode examples
- G.5 – Phase 5: Evidence-based validation test protocols
Each decision point includes empirical thresholds (e.g., token budgets, complexity limits, performance criteria) validated through browser-based simulations (T1-T10) and domain walkthroughs (W1-W3), ensuring practitioners can apply the MCD framework with quantified deployment expectations.
G.1 Phase 1: Context Assessment and Priority Classification (in reference to the Chapter 8 MCD Framework Decision Tree)
Purpose: Establish deployment profile through systematic questioning, determining whether MCD principles align with task requirements and resource constraints.
Q1: Primary Deployment Context Classification
CONTEXT_DECISION_TREE:
IF deployment IN [Edge Device, RAM <1GB, Offline, Battery-Powered]:
→ CONTEXT = CONSTRAINED
→ RATIONALE: Hardware limits require resource-efficient approaches
→ PROCEED TO Q2
ELIF deployment IN [Browser, WebAssembly, Client-Side]:
→ CONTEXT = BROWSER_EDGE
→ RATIONALE: WASM environment validated in T8 (Q4 tier optimal)
→ PROCEED TO Q2
ELIF deployment IN [Cloud, Full-Stack, RAM >2GB]:
→ EXIT_RECOMMENDATION: AutoGPT, LangChain, LangGraph
→ RATIONALE: Resource abundance enables richer frameworks
→ MCD not optimal for unconstrained environments
ELIF deployment == Hybrid:
→ CONTEXT = HYBRID_CONSTRAINTS
→ PROCEED TO Q2 with detailed constraint profiling
Deployment Context Examples:
- Constrained: Raspberry Pi, Jetson Nano, smartphone edge inference
- Browser Edge: In-browser agents, PWAs, WebAssembly deployment
- Hybrid: Progressive enhancement (edge-first with cloud fallback)
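The Q1 branching above can be expressed as a short routine. The sketch below is illustrative only: the DeploymentProfile fields, the RAM cut-offs (1GB/2GB), and the return labels simply mirror the decision tree and are not part of any published MCD API.

```python
# Minimal sketch of the Q1 context classification (illustrative names and fields).
from dataclasses import dataclass

@dataclass
class DeploymentProfile:
    target: str              # "edge", "browser", "cloud", or "hybrid"
    ram_mb: int
    offline: bool = False
    battery_powered: bool = False

def classify_context(p: DeploymentProfile) -> str:
    if p.target == "edge" or p.ram_mb < 1024 or p.offline or p.battery_powered:
        return "CONSTRAINED"          # hardware limits -> resource-efficient approaches
    if p.target == "browser":
        return "BROWSER_EDGE"         # WASM environment validated in T8 (Q4 tier)
    if p.target == "cloud" or p.ram_mb > 2048:
        return "EXIT_NON_MCD"         # resource abundance -> richer frameworks
    return "HYBRID_CONSTRAINTS"       # profile constraints in detail before Q2

# Example: Raspberry Pi-class device
print(classify_context(DeploymentProfile(target="edge", ram_mb=512, offline=True)))
# -> CONSTRAINED
```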
Q2: Optimization Priority Assignment
PRIORITY_MATRIX:
[1] Resource Efficiency (EFFICIENCY_PRIORITY = HIGH):
→ Optimization: Token minimization, memory footprint, latency
→ Empirical validation: T1/T6 token efficiency analysis
[2] User Experience Quality (UX_PRIORITY = HIGH):
→ Optimization: Natural language, conversation flow, error messages
→ Empirical validation: W1 UX scoring (89% conversational vs 68% MCD)
[3] Professional Output (QUALITY_PRIORITY = HIGH):
→ Optimization: Accuracy, completeness, domain expertise
→ Empirical validation: W3 diagnostic quality (96% hybrid vs 84% MCD)
[4] Educational/Learning (EDUCATION_PRIORITY = HIGH):
→ Optimization: Explanatory depth, pedagogical structure
→ Use case: Tutoring agents, learning assistants
[5] Balanced Multi-Objective (HYBRID_PRIORITY = HIGH):
→ Optimization: Weighted balance across dimensions
→ Requires advanced prompt engineering (74% accessibility threshold)
Note: Priority selection determines approach selection in Phase 2.
Q3: Stateless Capability Assessment
STATELESS_VALIDATION_CHECKLIST:
Task Requirements Analysis:
[Q3.1] Persistent conversation history needed? YES/NO
[Q3.2] Learning across sessions required? YES/NO
[Q3.3] Cumulative knowledge updates required? YES/NO
DECISION LOGIC:
IF ALL_ANSWERS == NO:
→ Task = STATELESS_COMPATIBLE
→ T4 Validation: 5/5 stateless regeneration success
→ PROCEED TO Q4
ELIF PARTIAL_YES (1-2 requirements):
→ Evaluate HYBRID_MCD_ARCHITECTURE
→ Design: Stateless core + external state manager
→ Document: State dependencies (Section 4.2)
→ WARNING: Increased complexity vs pure MCD
ELSE (ALL_YES):
→ MCD NOT SUITABLE
→ RECOMMENDATION: RAG/Vector DB + LangChain
→ EXIT with architectural justification
Stateless Viability Examples:
- ✅ Suitable: FAQ, appointment booking, navigation, single-turn diagnostics
- ⚠️ Hybrid: Multi-turn conversations with session context
- ❌ Unsuitable: Personalized learning, customer relationship management
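As a compact illustration of the Q3 decision logic, the sketch below maps the three checklist answers (Q3.1-Q3.3) to the branch labels above; the function name and return labels are hypothetical.

```python
# Sketch of Q3: map the three checklist answers to a decision branch.
def assess_statelessness(needs_history: bool,
                         needs_cross_session_learning: bool,
                         needs_cumulative_updates: bool) -> str:
    yes_count = sum([needs_history, needs_cross_session_learning, needs_cumulative_updates])
    if yes_count == 0:
        return "STATELESS_COMPATIBLE"      # proceed to Q4 (T4: 5/5 stateless success)
    if yes_count < 3:
        return "HYBRID_MCD_ARCHITECTURE"   # stateless core + external state manager
    return "MCD_NOT_SUITABLE"              # recommend RAG/vector DB + LangChain

# FAQ bot: no history, no cross-session learning, no cumulative updates
assert assess_statelessness(False, False, False) == "STATELESS_COMPATIBLE"
```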
Q4: Token Budget Classification
TOKEN_BUDGET_DECISION_TREE:
User specifies acceptable token budget:
[1] budget < 60 tokens:
→ MODE = ULTRA_MINIMAL
→ RISK: T6 validation shows a 60% failure rate below 60 tokens
→ RECOMMENDATION: Relax constraints if feasible
→ IF MANDATORY: Use symbolic logic, IF-THEN routing
[2] 60 ≤ budget ≤ 150 tokens:
→ MODE = MINIMAL (VALIDATED RANGE)
→ EVIDENCE: T1/T6 show 94% success rate maintenance
→ OPTIMAL: 75-85 token sweet spot (Section 8.3)
[3] 150 < budget ≤ 512 tokens:
→ MODE = MODERATE
→ NOTE: Approaching 90-130 token capability plateau
→ CONSIDERATION: Diminishing returns beyond 90 tokens
[4] budget > 512 tokens:
→ MODE = RESOURCE_ABUNDANT
→ EXIT_RECOMMENDATION: Non-MCD approaches likely optimal
→ RATIONALE: MCD sacrifices peak performance for constraints
[5] budget = Variable/Dynamic:
→ MODE = ADAPTIVE
→ IMPLEMENTATION: Dynamic allocation (Section 5.3)
→ VALIDATION: Tier-based routing (Q1→Q4→Q8)
Empirical Token Budget Guidance (from T1/T6):
- Minimum viable: 60 tokens (94% success floor)
- Optimal range: 75-90 tokens (peak efficiency-to-performance)
- Plateau threshold: 90-130 tokens (< 5% improvement beyond)
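The Q4 thresholds above translate directly into a small classifier; the sketch below is illustrative, with the 60/150/512-token cut-offs taken from the decision tree.

```python
# Sketch of the Q4 token-budget classification (thresholds from the tree above).
from typing import Optional

def classify_token_budget(budget: Optional[int]) -> str:
    if budget is None:
        return "ADAPTIVE"            # variable/dynamic allocation (Section 5.3)
    if budget < 60:
        return "ULTRA_MINIMAL"       # ~60% failure risk below 60 tokens (T6)
    if budget <= 150:
        return "MINIMAL"             # validated range; 75-85 token sweet spot
    if budget <= 512:
        return "MODERATE"            # diminishing returns beyond ~90 tokens
    return "RESOURCE_ABUNDANT"       # non-MCD approaches likely optimal

assert classify_token_budget(80) == "MINIMAL"
```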
G.1 Output: Context profile fully documented → PROCEED TO PHASE 2 (Appendix G.2)
G.2 Phase 2: Approach Selection Decision Trees with Anti-Pattern Enforcement
Purpose: Select the optimal prompt engineering approach based on the context profile from Phase 1, using empirically validated performance data from Chapters 6-7. Each priority (Efficiency, UX, Quality, Education, Hybrid) maps to specific approaches with quantified trade-offs.
Decision Framework: Priority-driven selection trees route practitioners to approaches validated through T1-T10 simulations and W1-W3 domain walkthroughs, with explicit anti-pattern enforcement preventing empirically-documented failure modes.
G.2.1 Efficiency Priority Decision Tree
When to Use: EFFICIENCY_PRIORITY = HIGH (from G.1 Q2) — Deployments prioritizing token minimization, memory footprint reduction, and latency optimization.
EFFICIENCY_APPROACH_SELECTOR:
[Branch 1] Token Budget < 60 tokens (ULTRA_MINIMAL):
→ APPROACH: MCD STRUCTURED (MANDATORY)
→ PERFORMANCE: 92% efficiency, 81% context-optimal
→ VALIDATION: T1 approach comparison, T6 over-engineering detection
→ RATIONALE: Only viable approach at extreme constraints
→ QUANTIZATION: Force Q1 tier (Qwen2-0.5B, 300MB)
→ RISK: 60% failure rate if budget <60 (T6 evidence)
[Branch 2] 60 ≤ Token Budget ≤ 150 (MINIMAL):
→ APPROACH: HYBRID MCD+FEW-SHOT
→ PERFORMANCE: 88% efficiency, 86% context-optimal
→ VALIDATION: T1/W1/W2/W3 cross-domain validation
→ RATIONALE: Balances efficiency with pattern learning
→ QUANTIZATION: Start Q4 tier, fallback to Q1 if needed
→ IMPLEMENTATION: MCD structure + 2-3 Few-Shot examples
[Branch 3] Hardware RAM < 256MB (HARDWARE OVERRIDE):
→ APPROACH: MCD STRUCTURED (MANDATORY)
→ PERFORMANCE: Same as Branch 1
→ RATIONALE: Hardware constraint supersedes token budget
→ QUANTIZATION: Force Q1/Q4 tiers only
→ VALIDATION: T8 deployment environment testing
→ NOTE: Hardware limitations override task complexity
[Branch 4] DEFAULT (Budget >150, RAM ≥256MB):
→ APPROACH: MCD STRUCTURED with Q4 tier
→ PERFORMANCE: 85% retention under Q1, 95% under Q4
→ FALLBACK: Escalate to Hybrid if performance <80%
→ QUANTIZATION: Q4 optimal (TinyLlama-1.1B, 560MB)
→ VALIDATION: T10 quantization tier validation
Practical Example:
- Scenario: Edge device FAQ chatbot, 256MB RAM, 80-token budget
- Selection: Branch 2 → Hybrid MCD+Few-Shot
- Implementation: MCD slot-filling structure + 3 Few-Shot Q&A examples
- Expected Performance: 88% efficiency, 430ms average latency (W1 data)
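A sketch of the G.2.1 branch logic follows; it assumes only a token budget and available RAM as inputs, and the function name and return dictionary are illustrative. The hardware override (Branch 3) is checked first because it supersedes the token budget.

```python
# Sketch of the G.2.1 efficiency selector (thresholds from the branches above).
def select_efficiency_approach(token_budget: int, ram_mb: int) -> dict:
    if ram_mb < 256:                                     # Branch 3: hardware override
        return {"approach": "MCD_STRUCTURED", "tier": "Q1/Q4 only"}
    if token_budget < 60:                                # Branch 1: ultra-minimal
        return {"approach": "MCD_STRUCTURED", "tier": "Q1"}
    if token_budget <= 150:                              # Branch 2: minimal
        return {"approach": "HYBRID_MCD_FEW_SHOT", "tier": "Q4 (fallback Q1)"}
    return {"approach": "MCD_STRUCTURED", "tier": "Q4"}  # Branch 4: default

# Edge FAQ chatbot from the example above: 256MB RAM, 80-token budget -> Branch 2
print(select_efficiency_approach(token_budget=80, ram_mb=256))
```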
G.2.2 User Experience Priority Decision Tree
When to Use: UX_PRIORITY = HIGH (from G.1 Q2) — Deployments prioritizing natural language interaction, conversation flow, and user-friendly error handling.
UX_APPROACH_SELECTOR:
[Branch 1] Deployment Constraints = Unconstrained (>2GB RAM, >512 tokens):
→ APPROACH: CONVERSATIONAL
→ PERFORMANCE: 89% user experience score
→ VALIDATION: W1 healthcare booking walkthrough
→ TRADEOFF: 1.5x token cost, 2.1x latency vs MCD
→ RATIONALE: Natural flow maximizes satisfaction when resources permit
→ WARNING: Degrades severely under constraint pressure (28% at <512 tokens)
[Branch 2] Deployment Constraints = Moderate (512MB-2GB, 150-512 tokens):
→ APPROACH: SYSTEM ROLE PROFESSIONAL
→ PERFORMANCE: 82% UX, 78% context-optimal
→ VALIDATION: W1/W2 walkthroughs
→ BALANCE: Professional framing + constraint-awareness
→ QUANTIZATION: Q4 tier recommended
→ IMPLEMENTATION: Structured persona with graceful degradation
[Branch 3] Deployment Constraints = Tight (<512MB, <150 tokens):
→ APPROACH: FEW-SHOT PATTERN
→ PERFORMANCE: 68% UX, 78% context-optimal
→ VALIDATION: W3 diagnostics walkthrough
→ JUSTIFICATION: Best UX achievable under strict constraints
→ QUANTIZATION: Q1/Q4 adaptive routing
→ NOTE: Conversational approach fails here (28% completion)
[Branch 4] FALLBACK (Constraints = Severe):
→ APPROACH: MCD STRUCTURED with enhanced error messages
→ PERFORMANCE: 60% UX (baseline), 92% efficiency
→ COMPROMISE: Sacrifice conversational flow for reliability
→ ENHANCEMENT: Add user-friendly clarification templates
→ VALIDATION: T7 constraint stress test (80% controlled degradation)
Practical Example:
- Scenario: Browser-based appointment booking, moderate constraints
- Selection: Branch 2 → System Role Professional
- Implementation: "Healthcare scheduling assistant" persona + structured prompts
- Expected Performance: 82% UX, 1724ms latency (W1 data)
G.2.3 Quality Priority Decision Tree
When to Use: QUALITY_PRIORITY = HIGH (from G.1 Q2) — Deployments prioritizing accuracy, completeness, and domain expertise over efficiency or UX.
QUALITY_APPROACH_SELECTOR:
[Branch 1] Context = Professional Domain (Healthcare, Legal, Finance):
→ APPROACH: SYSTEM ROLE PROFESSIONAL
→ PERFORMANCE: 86% completion, 82% UX
→ VALIDATION: W1 healthcare, W3 diagnostics
→ RATIONALE: Expertise framing improves accuracy perception
→ QUANTIZATION: Q4/Q8 tier for complex reasoning
→ DOMAINS: Healthcare, diagnostics, formal communication
[Branch 2] Requirement = Technical Accuracy (>90% correctness):
→ APPROACH: HYBRID MULTI-STRATEGY
→ PERFORMANCE: 96% completion, 91% accuracy
→ VALIDATION: W3 system diagnostics (highest quality)
→ WARNING: Requires 75% engineering sophistication threshold
→ IMPLEMENTATION: MCD + Few-Shot + System Role coordination
→ QUANTIZATION: Q8 tier preferred (Llama-3.2-1B, 800MB)
→ TRADEOFF: 2.3x complexity vs MCD alone
[Branch 3] Requirement = Balanced Quality (80-90% target):
→ APPROACH: FEW-SHOT PATTERN
→ PERFORMANCE: 84% completion, balanced across metrics
→ VALIDATION: W2 spatial navigation
→ RATIONALE: Pattern learning without full hybrid complexity
→ QUANTIZATION: Q4 tier optimal
→ ACCESSIBILITY: 89% engineering accessibility (vs 74% hybrid)
[Branch 4] EVALUATION REQUIRED (Ambiguous quality needs):
→ DECISION POINT: Task complexity vs resource availability
→ IF complex_reasoning AND resources_available:
→ TRY: Hybrid Multi-Strategy
→ ELIF moderate_complexity:
→ TRY: Few-Shot Pattern
→ ELSE:
→ FALLBACK: MCD with domain-specific examples
→ VALIDATE: Run T1-style comparison before deployment
Practical Example:
- Scenario: System diagnostics agent, technical accuracy critical
- Selection: Branch 2 → Hybrid Multi-Strategy
- Implementation: MCD routing + Few-Shot diagnostic examples + System Role expertise
- Expected Performance: 96% completion, 91% accuracy (W3 data)
G.2.4 Hybrid Priority Decision Tree
When to Use: HYBRID_PRIORITY = HIGH (from G.1 Q2) — Deployments requiring balanced optimization across efficiency, UX, and quality.
HYBRID_APPROACH_SELECTOR:
[Branch 1] Prompt Engineering Expertise = Advanced (ML engineering team):
→ APPROACH: HYBRID MULTI-STRATEGY
→ COORDINATION: MCD + Few-Shot + System Role
→ PERFORMANCE: Superior across all metrics (W1/W2/W3)
→ ACCESSIBILITY: 74% engineering threshold
→ QUANTIZATION: Dynamic tier routing (Q1→Q4→Q8)
→ MAINTENANCE: High complexity, requires ongoing tuning
→ VALIDATION: All T1-T10 tests + W1-W3 walkthroughs
[Branch 2] Expertise = Moderate (Software engineering background):
→ APPROACH: FEW-SHOT + SYSTEM ROLE (Two-Strategy)
→ COORDINATION: Simpler than full hybrid
→ PERFORMANCE: Good balance without complexity overhead
→ ACCESSIBILITY: 82% engineering threshold
→ QUANTIZATION: Q4 tier with Q8 fallback
→ IMPLEMENTATION: System Role persona + Few-Shot examples
[Branch 3] Expertise = Basic (Product/UX team):
→ APPROACH: MCD + FEW-SHOT
→ PROVEN COMBINATION: 88% efficiency, 86% context-optimal
→ JUSTIFICATION: Validated in W1/W2, accessible implementation
→ ACCESSIBILITY: 94% engineering threshold
→ QUANTIZATION: Start Q4, fallback Q1
→ MAINTENANCE: Low complexity, stable performance
[Branch 4] ITERATIVE STRATEGY (Unknown expertise):
→ START: MCD STRUCTURED baseline
→ MEASURE: Performance across efficiency/UX/quality dimensions
→ ITERATE: Add Few-Shot examples incrementally
→ VALIDATE: T1 approach comparison after each iteration
→ STOP: When improvement <5% for 2 consecutive iterations
→ RESULT: Custom-tuned hybrid adapted to team capabilities
Practical Example:
- Scenario: Navigation assistant, balanced requirements, moderate expertise
- Selection: Branch 2 → Few-Shot + System Role
- Implementation: "Navigation expert" persona + spatial reasoning examples
- Expected Performance: Balanced 80%+ across efficiency/UX/quality (W2 data)
G.2.5 Anti-Pattern Enforcement (Critical Validation)
Purpose: Prevent empirically-validated failure modes that cause catastrophic degradation under constraint conditions.
FORBIDDEN_APPROACHES_VALIDATOR:
[Anti-Pattern 1] Chain-of-Thought under Constraints:
IF approach_includes(CoT) AND constraints == True:
→ REJECT: Empirically validated failures
→ EVIDENCE: T6/T7/T8 browser crashes, token overflow
→ COMPLETION RATE: 2/5 with CoT vs 5/5 with Few-Shot
→ ROOT CAUSE: Reasoning chains exceed token budgets
→ ALTERNATIVE: Replace with Few-Shot examples (T6 validation)
→ EXCEPTION: None — CoT universally incompatible with constraints
[Anti-Pattern 2] Verbose Conversational under Budget Pressure:
IF approach == Conversational AND token_budget < 512:
→ REJECT: 28% completion rate (W1 evidence)
→ EVIDENCE: Conversational requires 1.5x tokens vs MCD
→ FAILURE MODE: Natural language phrasing exceeds budgets
→ ALTERNATIVE: System Role Professional (82% UX at constraints)
→ THRESHOLD: Conversational viable only when budget ≥512
[Anti-Pattern 3] Q8 without Q4 Justification:
IF quantization == Q8 AND NOT performance_inadequacy_at_Q4:
→ REJECT: Violates minimality principle (Section 4.2)
→ EVIDENCE: T10 shows Q4 optimal for 80% of tasks
→ VALIDATION REQUIRED: Document Q4 failures before Q8 escalation
→ RATIONALE: Resource efficiency core to MCD philosophy
→ PROCESS: Try Q4 → Measure drift → Escalate if drift >10%
[Anti-Pattern 4] Unbounded Clarification Loops:
IF clarification_loops == Unbounded:
→ REJECT: 1/4 recovery rate, semantic drift (T5: 2/4 drift)
→ EVIDENCE: Loops >2 iterations cause confusion
→ FAILURE MODE: Progressive semantic drift accumulation
→ ALTERNATIVE: Bounded loops (≤2 iterations, explicit termination)
→ IMPLEMENTATION: Hard limit + graceful escalation message
→ VALIDATION: T3 structured fallback (4/5 success with bounds)
Critical Implementation Note: All four anti-patterns must be checked before deployment. Violations historically correlate with >70% failure rates in constraint conditions.
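The four anti-patterns can be checked mechanically before deployment. The sketch below assumes a simple configuration object; the field names and violation strings are hypothetical, while the thresholds (512-token budget, two-iteration loop bound, Q4-before-Q8 justification) come from the rules above.

```python
# Sketch of a pre-deployment anti-pattern check (illustrative field names).
from dataclasses import dataclass
from typing import Optional

@dataclass
class DesignSpec:
    approach: str                        # e.g. "mcd", "conversational", "cot", "few_shot"
    token_budget: int
    quantization: str                    # "Q1", "Q4", or "Q8"
    q4_failure_documented: bool = False  # evidence of Q4 inadequacy before Q8
    max_clarification_loops: Optional[int] = None   # None = unbounded

def check_anti_patterns(spec: DesignSpec, constrained: bool = True) -> list:
    violations = []
    if spec.approach == "cot" and constrained:
        violations.append("AP1: CoT under constraints (2/5 completion, T6)")
    if spec.approach == "conversational" and spec.token_budget < 512:
        violations.append("AP2: Conversational below 512 tokens (28% completion, W1)")
    if spec.quantization == "Q8" and not spec.q4_failure_documented:
        violations.append("AP3: Q8 without documented Q4 inadequacy (T10)")
    if spec.max_clarification_loops is None or spec.max_clarification_loops > 2:
        violations.append("AP4: Unbounded or >2 clarification loops (T5 drift)")
    return violations

# A compliant MCD design produces an empty violation list
assert check_anti_patterns(DesignSpec("mcd", 80, "Q4", max_clarification_loops=2)) == []
```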
G.2 Output: Primary approach selected, validated, and anti-pattern checked → PROCEED TO PHASE 3 (Appendix G.3)
G.3 Phase 3: MCD Principle Validation Workflows
Purpose: Systematically apply MCD's three core principles—Minimality by Default, Bounded Rationality, Degeneracy Detection—to validate and refine architectural designs from Phase 2. Each principle includes empirically derived validation workflows with quantified thresholds from Chapters 4-7.
Critical Context: Phase 3 transforms selected approaches into constraint-compliant architectures through iterative component validation, ensuring every element justifies its token/memory cost through measurable performance contribution.
G.3.1 Step 1: Minimality by Default Validation
Principle Foundation: Remove all components unless empirical evidence demonstrates necessity (Section 4.2). Default assumption: simpler architectures outperform complex ones under constraints.
Q5: Component Necessity Assessment
For Each Component in [Memory, Tools/APIs, Orchestration Layers]:
Q5.1 Memory Component Validation
MEMORY_NECESSITY_TEST:
Question: Can task complete without persistent state?
TEST PROTOCOL (T4 Methodology):
1. Implement stateless regeneration workflow
2. Run 5 trials with explicit context reinjection
3. Run 5 trials with implicit reference (baseline)
4. Measure completion rate for both conditions
DECISION LOGIC:
IF stateless_completion_rate ≥ 90% (5/5 trials succeed):
→ ACTION: REMOVE memory component
→ EVIDENCE: T4 validation shows 5/5 stateless vs 2/5 implicit
→ BENEFIT: -200 tokens, -40MB RAM, 15% latency reduction
→ DOCUMENT: Stateless viability confirmed
→ IMPLEMENTATION: Use explicit slot reinjection (Section 4.2)
ELSE (stateless_rate < 90%):
→ ACTION: KEEP memory, justify with fallback design
→ CALCULATE: Memory Fragility Score (Appendix E.2.2)
→ FORMULA: MFS = state_dependencies / total_interactions
→ THRESHOLD: If MFS > 40% → High fragility, redesign required
→ MITIGATION: Implement hybrid stateless core + external state
Practical Example:
- Task: Healthcare appointment booking (W1)
- Test Results: 5/5 stateless completions with {doctor_type, date, time} reinjection
- Decision: Remove session memory, use explicit slot passing
- Benefit: 200-token reduction, simplified architecture
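A minimal sketch of the Q5.1 decision follows, combining the T4-style stateless trial outcome with the Memory Fragility Score fallback; the function name is hypothetical, and the usage example reuses the W1 figures quoted above.

```python
# Sketch of the Q5.1 memory-necessity decision (thresholds from the protocol above).
def memory_necessity_decision(stateless_successes: int, trials: int,
                              state_dependencies: int, total_interactions: int) -> str:
    stateless_rate = stateless_successes / trials
    if stateless_rate >= 0.90:
        return "REMOVE_MEMORY"                     # T4: explicit slot reinjection suffices
    mfs = state_dependencies / total_interactions  # Memory Fragility Score (Appendix E.2.2)
    if mfs > 0.40:
        return "REDESIGN_REQUIRED"                 # high fragility
    return "KEEP_MEMORY_WITH_FALLBACK"             # hybrid: stateless core + external state

# W1 booking: 5/5 stateless completions -> memory component removed
assert memory_necessity_decision(5, 5, 0, 25) == "REMOVE_MEMORY"
```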
Q5.2 Tool/API Component Validation
TOOL_UTILIZATION_TEST:
Question: Utilization rate >10%? (T7 Degeneracy Threshold)
MEASUREMENT PROTOCOL:
1. Track tool invocations across test scenarios
2. Calculate: utilization_rate = invocations / total_interactions
3. Measure latency impact: latency_with_tool vs latency_baseline
DECISION LOGIC:
IF utilization_rate < 10%:
→ ACTION: REMOVE tool/API component
→ EVIDENCE: T7 shows <10% triggers degeneracy detection
→ RATIONALE: Maintenance overhead outweighs rare utility
→ DOCUMENT: Degeneracy threshold violated
→ BENEFIT: Reduced complexity, faster response times
IF 10% ≤ utilization_rate < 30%:
→ ACTION: CONDITIONAL KEEP (monitor closely)
→ REQUIREMENT: Document specific use cases justifying inclusion
→ VALIDATE: Latency improvement must be >15% when triggered
→ WARNING: Borderline utility, candidate for future removal
IF utilization_rate ≥ 30%:
→ ACTION: KEEP tool, document usage patterns
→ VALIDATE: Latency improvement justifies inclusion cost
→ MONITOR: Track utilization trends over deployment lifecycle
Practical Example:
- Tool: Medical terminology API for appointment booking
- Utilization: 8% (only triggered for ambiguous specialty names)
- Decision: Remove API, use Few-Shot examples of common specialties
- Benefit: -50ms average latency, simplified deployment
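The Q5.2 thresholds map onto a short decision function; the sketch below is illustrative, with the 10%/30% utilization bands and the 15% latency condition taken from the protocol above.

```python
# Sketch of the Q5.2 tool/API decision (illustrative function name).
def tool_decision(invocations: int, total_interactions: int,
                  latency_gain_when_triggered: float) -> str:
    utilization = invocations / total_interactions
    if utilization < 0.10:
        return "REMOVE_TOOL"           # degeneracy threshold violated (T7)
    if utilization < 0.30:
        # conditional keep only if the tool delivers >15% latency improvement
        return "CONDITIONAL_KEEP" if latency_gain_when_triggered > 0.15 else "REMOVE_TOOL"
    return "KEEP_TOOL"

# Medical terminology API from the example above: 8% utilization -> removed
assert tool_decision(8, 100, 0.20) == "REMOVE_TOOL"
```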
Q5.3 Orchestration Layer Validation
ORCHESTRATION_NECESSITY_TEST:
Question: Does prompt-level routing suffice? (Section 5.3)
TEST PROTOCOL:
1. Implement IF-THEN routing directly in prompt
2. Implement equivalent orchestration layer routing
3. Run T3-style structured fallback test (5 trials each)
4. Measure: completion rate, latency, token cost
DECISION LOGIC:
IF prompt_routing_success ≥ 80% (4/5 trials):
→ ACTION: REMOVE orchestration layer
→ EVIDENCE: T3 shows 4/5 structured fallback success
→ BENEFIT: -30 tokens overhead, -25ms latency
→ DOCUMENT: Prompt-native routing validated
→ IMPLEMENTATION: Use symbolic IF-THEN in prompt text
ELIF 60% ≤ prompt_routing_success < 80% (3/5 trials):
→ ACTION: HYBRID APPROACH
→ DESIGN: Simple router for complex cases only
→ FALLBACK: Default to prompt routing when possible
→ JUSTIFY: Document specific failure modes requiring orchestration
ELSE (prompt_routing < 60%):
→ ACTION: KEEP orchestration layer
→ JUSTIFY: Document complexity vs performance gain
→ VALIDATE: Calculate Redundancy Index (Step 3)
→ THRESHOLD: RI must be ≤10 to justify complexity
Practical Example:
- Task: Navigation routing between {booking, navigation, diagnostic} intents
- Prompt Routing: 4/5 successful classifications with IF-THEN structure
- Orchestration Layer: 5/5 successes but +30 tokens, +25ms latency
- Decision: Remove orchestration, use prompt-native IF-THEN
- Benefit: Simpler architecture, validated performance
G.3.2 Step 2: Bounded Rationality Application
Principle Foundation: Limit reasoning complexity to ≤3 sequential steps; replace natural language reasoning chains with symbolic compression (Section 4.2).
Q6: Reasoning Chain Complexity Assessment
REASONING_COMPLEXITY_ANALYZER:
Task Decomposition Protocol:
1. Break task into atomic reasoning steps
2. COUNT: number_of_sequential_steps
3. IDENTIFY: dependencies between steps
4. MEASURE: token cost per reasoning step
COMPLEXITY DECISION TREE:
IF sequential_steps > 3:
→ RISK_LEVEL = HIGH
→ EVIDENCE: T5 shows semantic drift in 2/4 cases beyond 3 steps
→ WARNING: Failure probability increases exponentially >3 steps
→ PROCEED TO MITIGATION OPTIONS
ELIF sequential_steps == 3:
→ RISK_LEVEL = MODERATE
→ ACTION: Apply symbolic compression (Option 1)
→ VALIDATE: Ensure no cascading failures
→ MONITOR: Track drift rates in production
ELIF sequential_steps < 3:
→ RISK_LEVEL = LOW
→ ACTION: PROCEED with bounded reasoning design
→ VALIDATION: Standard T1-style testing sufficient
Mitigation Options for High-Complexity Tasks (>3 steps)
COMPLEXITY_REDUCTION_STRATEGIES:
[Option 1] Symbolic Compression:
TECHNIQUE: Replace natural language with symbolic logic
BEFORE (Natural Language, 45 tokens):
"Think carefully about the route from your current location to
the destination, considering all landmarks and directions..."
AFTER (Symbolic, 12 tokens):
"Calculate: current_pos → landmarks → destination"
VALIDATION: Maintains semantics, reduces token cost 73%
EVIDENCE: W2 navigation shows equivalent accuracy
BENEFIT: -33 tokens per reasoning step
⭐ ADAPTATION PATTERN NOTE:
Symbolic compression effectiveness varies by domain structure (Section 5.2.1):
- Semi-Static domains (W2 navigation): Deterministic rules enable aggressive compression
- Dynamic domains (W1 booking, W3 diagnostics): Moderate compression with adaptive logic
Implementation guidance: See G.4.1 Adaptation Pattern Classification
[Option 2] Task Decomposition:
TECHNIQUE: Split into independent sub-agents
DESIGN:
- Each sub-agent: ≤3 reasoning steps maximum
- Coordination: Sequential execution, NOT chained reasoning
- State passing: Explicit outputs → explicit inputs
EXAMPLE (System Diagnostics):
- Sub-agent 1: Symptom classification (2 steps)
- Sub-agent 2: Priority assignment (2 steps)
- Sub-agent 3: Action recommendation (2 steps)
Total: 6 steps divided into 3 independent agents
VALIDATION: T3 shows modular agents maintain 4/5 success rate
TRADEOFF: +50ms coordination latency, but safer than chaining
[Option 3] Chain-of-Thought Replacement (CRITICAL):
RULE: IF CoT seems necessary → FORBIDDEN under constraints
EVIDENCE: T6/T7/T8 show catastrophic CoT failures
- T6: 2/5 completion rate with CoT vs 5/5 with Few-Shot
- T7: Browser crashes with CoT under memory pressure
- T8: Token overflow in 4/5 WASM deployments
ALTERNATIVE: Few-Shot examples showing reasoning patterns
BEFORE (CoT, 120 tokens):
"Let's think step by step. First, I need to understand..."
AFTER (Few-Shot, 60 tokens):
Example 1: Input X → Output Y (reasoning implicit in examples)
Example 2: Input A → Output B
Apply to current: Input Z → Output ?
VALIDATION: T6 shows 5/5 Few-Shot success vs 2/5 CoT
BENEFIT: 2x token reduction; completion improves from 2/5 (CoT) to 5/5 (Few-Shot)
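To make Option 2 concrete, the sketch below shows three independent sub-agents coordinated sequentially with explicit state passing; the stage functions are hypothetical stand-ins for prompt-backed calls, and the example values follow the system-diagnostics decomposition above.

```python
# Option 2 sketch: three sub-agents (<=3 reasoning steps each), run sequentially,
# passing explicit outputs as explicit inputs -- no chained reasoning, no shared state.
def classify_symptoms(report: str) -> dict:
    # Sub-agent 1: symptom classification (placeholder for a prompt-backed call)
    return {"category": "Infrastructure", "symptom": report}

def assign_priority(classification: dict) -> dict:
    # Sub-agent 2: heuristic P1/P2/P3 assignment
    priority = "P1" if classification["category"] == "Infrastructure" else "P2"
    return {**classification, "priority": priority}

def recommend_action(assessment: dict) -> dict:
    # Sub-agent 3: priority-dependent action steps
    steps = ["check logs", "check services", "check hardware"]
    return {**assessment, "steps": steps}

def run_diagnostic_pipeline(report: str) -> dict:
    # Sequential coordination: each stage's output is the next stage's input
    return recommend_action(assign_priority(classify_symptoms(report)))

print(run_diagnostic_pipeline("Server crash"))
```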
Q7: Token Budget Allocation
TOKEN_BUDGET_ALLOCATOR:
Input: Total_Budget (from G.1 Q4)
ALLOCATION FORMULA (Empirically Validated):
Core_Logic: 40-60% of Total_Budget
Fallback_Handling: 20-30% of Total_Budget
Input_Processing: 10-20% of Total_Budget
Buffer_Variations: 10-15% of Total_Budget
VALIDATION CHECKS:
CHECK 1: Budget sum must equal 100%
IF SUM(allocations) ≠ 1.0:
→ ERROR: "Budget allocation must total 100%"
→ ACTION: Rebalance percentages
CHECK 2: Core logic must dominate
IF Core_Logic < 40% OR Core_Logic > 60%:
→ WARNING: "Unbalanced allocation may cause failures"
→ RECOMMENDATION: Shift tokens to core from buffer/input
CHECK 3: Fallback budget adequate
IF Fallback < 20%:
→ ERROR: "Insufficient fallback budget"
→ EVIDENCE: T3/T7 show ≥20% required for recovery
WORKED EXAMPLE (Total_Budget = 80 tokens):
Allocation Calculation:
Core_Logic: 48 tokens (60% - upper bound for complex task)
Fallback: 20 tokens (25% - mid-range for safety)
Input: 8 tokens (10% - minimal for slot extraction)
Buffer: 4 tokens ( 5% - tight but acceptable)
─────────────────────────────
Total: 80 tokens (100% ✓)
Validation:
✓ Core dominates (60%)
✓ Fallback adequate (25%)
✓ Sum equals 100%
→ APPROVED for deployment
Critical Note: Token budgets <60 total require proportional adjustment but maintain relative percentages. For example, 50-token budget: Core 30 (60%), Fallback 10 (20%), Input 5 (10%), Buffer 5 (10%).
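The Q7 checks reduce to a few arithmetic assertions. The sketch below is illustrative (the function name and message strings are not part of the framework); the 40-60% core band, 20% fallback floor, and sum check follow CHECK 1-3 above.

```python
# Sketch of the Q7 allocation validation (thresholds from CHECK 1-3 above).
def validate_allocation(total_budget: int, core: int, fallback: int,
                        input_proc: int, buffer: int) -> list:
    issues = []
    if core + fallback + input_proc + buffer != total_budget:
        issues.append("CHECK 1: allocation must sum to the total budget")
    if not 0.40 <= core / total_budget <= 0.60:
        issues.append("CHECK 2: core logic outside the 40-60% band")
    if fallback / total_budget < 0.20:
        issues.append("CHECK 3: fallback below 20% (T3/T7 recovery floor)")
    return issues

# Worked example above: 80-token budget split 48/20/8/4 -> no issues
assert validate_allocation(80, 48, 20, 8, 4) == []
```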
G.3.3 Step 3: Degeneracy Detection
Principle Foundation: Quantify component value through Redundancy Index; remove elements contributing <10% marginal improvement (T6 methodology).
Q8: Redundancy Index Calculation
REDUNDANCY_INDEX_PROTOCOL:
FORMULA:
RI = excess_tokens / marginal_correctness_improvement
MEASUREMENT PROCEDURE:
STEP 1: Establish Baseline
- Implement minimal prompt (Section 4.2 guidance)
- Run 5 test trials across representative scenarios
- MEASURE:
* task_success_rate_baseline (0-100%)
* token_count_baseline
* latency_baseline (ms)
STEP 2: Test Enhanced Version
- Add proposed component/feature to baseline
- Run 5 test trials with identical scenarios
- MEASURE:
* task_success_rate_enhanced (0-100%)
* token_count_enhanced
* latency_enhanced (ms)
STEP 3: Calculate Metrics
excess_tokens = token_count_enhanced - token_count_baseline
improvement = task_success_rate_enhanced - task_success_rate_baseline
RI = excess_tokens / improvement
latency_overhead = latency_enhanced - latency_baseline
INTERPRETATION THRESHOLDS:
IF RI > 10:
→ CLASSIFICATION: OVER-ENGINEERED
→ EVIDENCE: T6 verbose case study
* Verbose prompt: 145 tokens
* Minimal prompt: 58 tokens
* Improvement: +0.2 on 0-4 scale (+5% absolute)
* RI = (145-58) / 0.05 = 87 / 0.05 = 1,740
* Conclusion: Extreme over-engineering
→ ACTION: Remove enhancement, revert to baseline
→ BENEFIT: Token savings without performance loss
IF 5 ≤ RI ≤ 10:
→ CLASSIFICATION: BORDERLINE ACCEPTABLE
→ ACTION: Conditional keep with monitoring
→ REQUIREMENT: Document specific justification
→ REVIEW: Reassess after deployment data collection
IF RI < 5:
→ CLASSIFICATION: JUSTIFIED COMPLEXITY
→ ACTION: Keep enhanced version
→ RATIONALE: Improvement justifies token cost
→ DOCUMENT: RI value for future reference
Worked Example:
CASE STUDY: Healthcare Booking Enhanced Clarification
Baseline Version:
- Token count: 65 tokens
- Success rate: 84% (21/25 trials)
- Latency: 380ms
Enhanced Version (added multi-turn clarification):
- Token count: 95 tokens
- Success rate: 92% (23/25 trials)
- Latency: 450ms
Calculation:
excess_tokens = 95 - 65 = 30 tokens
improvement = 0.92 - 0.84 = 0.08 (8%)
RI = 30 / 0.08 = 375
latency_overhead = 450 - 380 = +70ms
Interpretation:
RI = 375 >> 10 → OVER-ENGINEERED
Decision: Remove multi-turn clarification
Alternative: Single-turn bounded clarification (RI = 6.2, acceptable)
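The RI computation and its interpretation thresholds can be packaged as below; the sketch reuses the healthcare booking numbers from the worked example, and the verdict labels mirror the classification bands above.

```python
# Sketch of the Redundancy Index (RI) calculation and G.3.3 thresholds.
def redundancy_index(tokens_base: int, tokens_enhanced: int,
                     success_base: float, success_enhanced: float):
    excess_tokens = tokens_enhanced - tokens_base
    improvement = success_enhanced - success_base
    if improvement <= 0:
        return float("inf"), "OVER-ENGINEERED"   # no measurable gain from the addition
    ri = excess_tokens / improvement
    if ri > 10:
        return ri, "OVER-ENGINEERED"
    if ri >= 5:
        return ri, "BORDERLINE"
    return ri, "JUSTIFIED"

ri, verdict = redundancy_index(65, 95, 0.84, 0.92)
print(round(ri), verdict)   # -> 375 OVER-ENGINEERED
```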
Q9: Usage Pattern Analysis
USAGE_PATTERN_VALIDATOR:
FOR EACH component_or_pathway IN architecture:
METRIC: Utilization Rate
utilization_rate = actual_uses / total_possible_uses
DECISION LOGIC:
IF utilization_rate < 10%:
→ FLAG: Unused or rarely-triggered component
→ ACTION: REMOVE component immediately
→ RATIONALE: Maintenance cost exceeds rare utility
→ DOCUMENT: "Degeneracy threshold violated"
→ CROSS-CHECK: Verify no edge-case dependencies
IF 10% ≤ utilization_rate < 25%:
→ FLAG: Low-usage component
→ ACTION: Mark for review after deployment
→ MONITOR: Track trend over time (increasing/decreasing)
→ CONDITION: Keep if critical for edge cases
IF utilization_rate ≥ 25%:
→ STATUS: VALIDATED
→ ACTION: Keep component
→ DOCUMENT: Usage patterns for long-term monitoring
→ OPTIMIZE: Consider frequency-based caching
DEAD PATH DETECTION:
FOR EACH decision_pathway IN prompt_logic:
IF pathway_triggered_count == 0 across all test cases:
→ ALERT: "DEAD PATH IDENTIFIED"
→ INVESTIGATION: Why was pathway never triggered?
* Unreachable condition?
* Redundant with other pathways?
* Test coverage gap?
→ ACTION OPTIONS:
1. Remove dead pathway entirely
2. Merge with active pathways
3. Add test coverage if genuinely needed
→ UPDATE: Decision tree structure after removal
Practical Example:
CASE STUDY: Indoor Navigation Agent Path Analysis (W2 Domain)
Pathway Usage Results (n=100 navigation queries):
- direct_route: 52 triggers (52% utilization) → KEEP ✓
- obstacle_avoidance: 31 triggers (31% utilization) → KEEP ✓
- multi_waypoint: 11 triggers (11% utilization) → KEEP ✓
- accessibility_route: 4 triggers ( 4% utilization) → REMOVE ✗
- emergency_exit: 2 triggers ( 2% utilization) → REMOVE ✗
- scenic_route: 0 triggers ( 0% utilization) → REMOVE ✗ (DEAD PATH)
Actions Taken:
1. Remove accessibility_route pathway (below 10% threshold)
- Justification: Specialized requests should escalate to human assistance
2. Remove emergency_exit pathway (below 10% threshold)
- Justification: Safety-critical routing requires real-time fire alarm integration
3. Remove scenic_route pathway (never triggered)
- Justification: Dead path with no real-world usage patterns
4. Token savings: -22 tokens from removed pathways
5. Simplified decision tree: 6 branches → 3 branches
6. Latency improvement: -15ms average
Result: Focused navigation agent maintains 94% route success (direct + obstacle + waypoint)
with 27% token reduction and improved response times
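The Q9 utilization audit over the W2 pathway counts can be reproduced with a short loop; the sketch below uses the trigger counts from the case study. Note that multi_waypoint (11%) lands in the review band but was retained as critical in the walkthrough.

```python
# Sketch of the Q9 utilization thresholds applied per pathway (W2 counts above).
def audit_pathways(trigger_counts: dict, total_queries: int) -> dict:
    verdicts = {}
    for pathway, count in trigger_counts.items():
        rate = count / total_queries
        if count == 0:
            verdicts[pathway] = "REMOVE (dead path)"
        elif rate < 0.10:
            verdicts[pathway] = "REMOVE (<10% utilization)"
        elif rate < 0.25:
            verdicts[pathway] = "REVIEW (10-25%, keep if critical)"
        else:
            verdicts[pathway] = "KEEP"
    return verdicts

counts = {"direct_route": 52, "obstacle_avoidance": 31, "multi_waypoint": 11,
          "accessibility_route": 4, "emergency_exit": 2, "scenic_route": 0}
print(audit_pathways(counts, total_queries=100))
```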
G.3 Output: Clean minimal architecture validated through three-principle workflow → PROCEED TO PHASE 4 (Appendix G.4)
G.4 Phase 4: Three-Layer Implementation
Purpose: Implement the validated MCD architecture from Phase 3 through a three-layer structure—Prompt Layer (intent classification/slot extraction), Control Layer (routing logic), Execution Layer (quantization-aware model selection). Each layer includes constraint validation and empirical thresholds from T3/T5/T10.
Critical Context: Layer separation enables modular testing, maintenance, and dynamic tier routing while maintaining stateless operation principles.
G.4.1 Layer 1: Prompt Layer Design (With Adaptation Patterns)
Purpose: Embed decision logic directly into prompt text using IF-THEN structures, intent classification trees, and slot extraction workflows. Implementation strategy varies by task structure following Table 5.1 adaptation pattern taxonomy from Section 5.2.1.
Critical Design Principle: Match prompt logic complexity to task structure—over-engineering navigation wastes tokens; under-engineering diagnostics fails variable patterns (Section 5.2.1).
Adaptation Pattern Classification (Table 5.1 Integration)
Before Implementation: Determine adaptation mechanism based on task characteristics.
| Pattern Type | When to Use | Implementation Strategy | Validation Evidence |
|---|---|---|---|
| Dynamic | Natural language variability, unpredictable information density | Conditional slot extraction with runtime intent parsing | W1: 84% completion with dynamic slot-filling |
| Semi-Static | Structured relationships, mathematical transformations | Deterministic coordinate calculations with fixed rules | W2: 85% success with coordinate logic |
| Dynamic | Heuristic classification, variable complexity patterns | Adaptive category routing with priority-based sequencing | W3: 91% accuracy with heuristic classification |
Intent Classification Decision Tree Structure
# Pseudocode for Prompt Layer Intent Classification
# Constraints: Depth ≤3, Branches ≤4, Token ≤25% budget per path
def intent_classification_tree(user_input):
"""
ROOT-level intent detection with bounded complexity.
Validation Constraints (T5/T3):
- Maximum depth: ≤3 levels
- Branching factor: ≤4 per node
- Token allocation: ≤25% total budget per path
- Fallback: Every path must have explicit recovery
"""
# PRIMARY INTENT DETECTION (Level 0)
primary_intent = classify_primary_intent(user_input)
if primary_intent == "booking":
# ADAPTATION PATTERN: Dynamic (Section 5.2.1, W1)
return booking_subtree(user_input, depth=1)
elif primary_intent == "navigation":
# ADAPTATION PATTERN: Semi-Static (Section 5.2.1, W2)
return navigation_subtree(user_input, depth=1)
elif primary_intent == "diagnostic":
# ADAPTATION PATTERN: Dynamic (Section 5.2.1, W3)
return diagnostic_subtree(user_input, depth=1)
else: # DEFAULT FALLBACK (T3: 4/5 success with explicit fallback)
return escalation_node(
message="Intent unclear. Please specify: booking, navigation, or diagnostic.",
retry_allowed=True,
max_retries=2 # Bounded loops (G.2.5 Anti-Pattern 4)
)
Pattern 1: Dynamic Slot-Filling (W1 - Healthcare Booking)
Design Rationale: Natural language appointment requests vary unpredictably in information density, requiring conditional slot identification with runtime adaptation (Section 5.2.1).
def booking_subtree(user_input, depth):
"""
ADAPTATION PATTERN: Dynamic Slot-Filling (W1 domain).
Characteristics (Section 5.2.1):
- Conditional slot extraction with variable missing-data prompts
- Natural language request variability requires runtime intent parsing
- Information density unpredictable (complete vs partial inputs)
Slot Structure: {doctor_type, date, time}
Validation: W1 shows 84% completion with dynamic adaptation
Token Budget: ≤40% total (from G.3.2 Q7)
"""
# DEPTH LIMIT ENFORCEMENT (T5 validation)
if depth > 3:
return fallback_response(
message="Booking request too complex. Please simplify.",
escalation_recommended=True
)
# DYNAMIC SLOT EXTRACTION (Level 1)
# Adapts to variable input completeness
slots = extract_slots(user_input) # Returns: {doctor_type, date, time}
# COMPLETENESS CHECK (Level 2)
# Different paths based on information density
if slots_complete(slots):
# Complete input: "Cardiology tomorrow at 2pm"
return confirm_booking(slots)
# Output: "Confirmed Cardiology, tomorrow, 2PM. ID [generated]"
else:
# ADAPTIVE CLARIFICATION (Level 3 - Maximum depth)
# Identifies specific missing slots dynamically
missing_slots = identify_missing_slots(slots)
# Example adaptive behavior (Section 5.2.1):
# Input: "I want to book an appointment"
# → Output: "Missing [time, date, type] for appointment"
return clarify_missing_slots(
missing=missing_slots,
partial_context=serialize_slots(slots), # T4: Explicit state passing
depth=depth + 1
)
Pattern 2: Semi-Static Deterministic Logic (W2 - Navigation)
Design Rationale: Navigation operates on structured coordinate systems with fixed spatial relationships, enabling mathematical transformation rules rather than NLP interpretation (Section 5.2.1).
def navigation_subtree(user_input, depth):
"""
ADAPTATION PATTERN: Semi-Static Deterministic (W2 domain).
Characteristics (Section 5.2.1):
- Deterministic coordinate calculations with fixed directional rules
- Structured spatial relationships enable mathematical transformations
- Predictable logic follows coordinate geometry, not natural language parsing
Logic: Stateless coordinate transformation (A1→B3 = North 2m, East 1m)
Validation: W2 shows 85% success with symbolic compression
Token Budget: ≤25% per path (constrained spatial reasoning)
"""
# DEPTH LIMIT ENFORCEMENT
if depth > 3:
return fallback_response(
message="Route too complex. Provide simpler waypoints.",
simplification_hint="Use landmarks: library, cafeteria, main entrance"
)
# DETERMINISTIC SPATIAL PARSING (Level 1)
# Follows fixed mathematical rules, not adaptive interpretation
route = parse_spatial_instructions(user_input)
# Returns: {start_pos, landmarks[], destination, direction}
# VALIDITY CHECK (Level 2)
# Coordinate transformation validation
if route_valid(route):
# SEMI-STATIC EXECUTION
# Fixed directional calculations from coordinate pairs
# Example (Section 5.2.1):
# Input: "Navigate from A1 to B3"
# → Output: "North 2m, East 1m"
# Input: "A1 to B3, avoid C2"
# → Output: "North 2m (avoid C2), East 1m"
return execute_navigation(route)
else:
# SPATIAL CLARIFICATION (Level 3)
# Still deterministic: requires structured coordinate/landmark
return clarify_spatial_reference(
message="Unclear location. Specify building/floor/landmark.",
current_context=route.start_pos, # Stateless context passing
expected_format="Use format: [Building][Floor][Room] or [Landmark]"
)
Implementation Note: While MCD maintains a stateless prompt architecture (consistency principle), the underlying logic is a deterministic coordinate transformation that could, in principle, be hardcoded as functions; MCD embeds it in prompts to preserve deployment flexibility (Section 5.2.1).
Pattern 3: Dynamic Heuristic Classification (W3 - Diagnostics)
Design Rationale: System diagnostics require adaptive pattern matching across multiple categories with variable complexity, demanding heuristic routing that adjusts to issue characteristics (Section 5.2.1).
def diagnostic_subtree(user_input, depth):
"""
ADAPTATION PATTERN: Dynamic Heuristic Classification (W3 domain).
Characteristics (Section 5.2.1):
- Heuristic category routing with priority-based step sequencing
- Issue complexity variation demands adaptive classification paths
- Multiple categories (Infrastructure, Software, Network) with varying priorities
Logic: Heuristic classification P1/P2/P3 with adaptive step sequences
Validation: W3 shows 91% accuracy with bounded scope
Token Budget: ≤30% (diagnostic reasoning requires more tokens)
"""
# DEPTH LIMIT ENFORCEMENT
if depth > 3:
return fallback_response(
message="Diagnostic too complex. Break into sub-issues.",
recommendation="Focus on primary symptom first"
)
# DYNAMIC SYMPTOM CLASSIFICATION (Level 1)
# Adapts to variable diagnostic information availability
symptoms = extract_symptoms(user_input)
# ADAPTIVE PRIORITY ASSIGNMENT (Level 2)
# Different routing based on symptom clarity and severity
if symptoms_clear(symptoms):
# HEURISTIC CLASSIFICATION
priority = classify_priority(symptoms) # P1/P2/P3
# Adaptive step sequencing based on category and priority
# Example (Section 5.2.1):
# Input: "Server crash"
# → Output: "Category: Infrastructure, Priority: P1,
# Steps: [Check logs→services→hardware]"
return diagnostic_recommendation(symptoms, priority)
else:
# ADAPTIVE CLARIFICATION (Level 3)
# Requests specific diagnostic information based on ambiguity type
# Example (Section 5.2.1):
# Input: "Something's slow"
# → Output: "Insufficient data for classification"
return request_symptom_details(
message="Provide: error code, frequency, impact on operations.",
bounded_scope="Focus on most critical issue only", # Prevent scope creep
classification_hint="Specify: Server/Network/Application/Database"
)
Architectural Decision Guide (Table 5.1 Application)
def select_adaptation_pattern(task_characteristics):
"""
Match implementation pattern to task structure (Section 5.2.1).
Critical Principle: Over-engineering navigation wastes tokens;
under-engineering diagnostics fails variable patterns.
"""
# PATTERN SELECTION DECISION TREE
if task_characteristics["information_density"] == "unpredictable":
if task_characteristics["requires_nlu_parsing"] == True:
return "DYNAMIC" # W1 Healthcare, W3 Diagnostics
# Rationale: Natural language variability demands runtime adaptation
elif task_characteristics["has_structured_relationships"] == True:
if task_characteristics["allows_mathematical_transform"] == True:
return "SEMI-STATIC" # W2 Navigation
# Rationale: Fixed spatial logic enables deterministic calculation
elif task_characteristics["requires_heuristic_classification"] == True:
if task_characteristics["complexity_varies"] == True:
return "DYNAMIC" # W3 Diagnostics
# Rationale: Issue patterns require adaptive routing
else:
return "DYNAMIC" # Default to dynamic for safety (handles variability)
G.4.2 Layer 2: Control Layer Decision Tree
Purpose: Route user inputs through appropriate processing paths based on complexity classification. Node complexity ≤5 decision points, path depth ≤3 levels (validated in T3/T7).
Route Selection Control Logic
def control_layer_router(user_input, context):
"""
Control layer decision tree architecture.
Constraints:
- Node complexity: ≤5 decision points per node
- Path depth: ≤3 levels maximum
- Exit conditions: Explicitly defined for all paths
- Fallback routes: From every decision point (T3/T7/T9 validation)
"""
# INPUT COMPLEXITY CLASSIFICATION
input_classification = classify_input_complexity(user_input)
# ROUTING DECISION TREE
if input_classification == "simple_query":
# Single-turn resolution, no state tracking needed
return direct_response_path(user_input)
elif input_classification == "complex_request":
# Multi-step workflow with state management
return multi_step_path(user_input, context)
elif input_classification == "ambiguous_input":
# Clarification required before processing
return clarification_path(user_input)
elif input_classification == "invalid_input":
# Error handling with recovery guidance
return error_handling_path(user_input)
else:
# FALLBACK: Unrecognized pattern
return fallback_escalation(
message="Unrecognized input pattern.",
suggestion="Rephrase or contact support."
)
def multi_step_path(user_input, context):
"""
Multi-step workflow implementation (e.g., booking, diagnostics).
Example: Healthcare booking workflow (W1)
Validation: Each step has explicit exit condition
State Management: Stateless with explicit context passing (T4)
"""
# DETERMINE CURRENT WORKFLOW STEP
step = determine_current_step(context)
# STEP 1: INTENT CLASSIFICATION
if step == "intent_classification":
intent = classify_intent(user_input)
context.update({"intent": intent, "step_count": 1})
return transition_to_step("slot_extraction")
# STEP 2: SLOT EXTRACTION
elif step == "slot_extraction":
slots = extract_slots(user_input)
context.update({"slots": slots, "step_count": 2})
if slots_complete(slots):
return transition_to_step("validation")
else:
# BOUNDED CLARIFICATION (≤2 iterations)
return clarification_path(identify_missing(slots))
# STEP 3: VALIDATION
elif step == "validation":
validated = validate_booking(context["slots"])
context.update({"validated": validated, "step_count": 3})
if validated:
return transition_to_step("confirmation")
else:
return error_handling_path("Validation failed: " + validated.error)
# STEP 4: CONFIRMATION
elif step == "confirmation":
return complete_booking(context["slots"])
# FALLBACK: Inconsistent workflow state
else:
return fallback_escalation(
message="Workflow state inconsistent.",
context_snapshot=context,
recovery="Restart from intent classification."
)
def ensure_fallback_coverage(control_tree):
"""
Validation function: Every node must have explicit fallback route.
Evidence: T3/T7/T9 show ≥80% controlled degradation with fallbacks
"""
for node in control_tree.all_nodes():
assert node.has_fallback() == True, \
f"CRITICAL: Node {node.id} missing fallback (T3/T7/T9 requirement)"
G.4.3 Layer 3: Execution Layer (Quantization-Aware)
Purpose: Select optimal quantization tier based on task complexity and hardware constraints, with dynamic routing Q1→Q4→Q8 when drift detected (T10 validation).
Quantization Tier Selection
def quantization_tier_selector(task_complexity, hardware_constraints):
"""
Quantization-aware execution with dynamic tier routing.
Based on T10 findings:
- Q4 optimal for 80% of tasks
- Q1→Q4 escalation when drift >10%
- Q4→Q8 escalation when performance inadequate (<80%)
Parameters:
task_complexity: "simple" | "moderate" | "complex"
hardware_constraints: {"ram_mb": int, "platform": str}
"""
# TASK COMPLEXITY ASSESSMENT
if task_complexity == "simple": # FAQ, basic classification
return try_q1_with_fallback()
elif task_complexity == "moderate": # Slot-filling, navigation
return start_with_q4()
elif task_complexity == "complex": # Multi-step reasoning
return start_with_q8()
else:
# HARDWARE OVERRIDE: Constraints supersede task complexity
return hardware_constraint_override(hardware_constraints)
def try_q1_with_fallback():
"""
Q1 tier: Ultra-minimal (Qwen2-0.5B, 300MB).
Strategy: Start with Q1 for efficiency, escalate if drift detected.
Validation: T10 shows 85% retention under Q1, 15% require escalation.
"""
# LOAD Q1 MODEL
model = load_model(tier="Q1", model_name="Qwen2-0.5B-Q1")
response = model.generate(prompt)
# SEMANTIC DRIFT DETECTION (T10 methodology)
drift_score = calculate_semantic_drift(response, expected_output)
if drift_score > 0.10: # T10 threshold: >10% drift
logger.warning(f"Q1 drift detected: {drift_score:.2f} > 0.10")
logger.info("Escalating to Q4 tier...")
return fallback_to_q4()
else:
logger.info(f"Q1 optimal efficiency: drift={drift_score:.2f}")
return response
def start_with_q4():
"""
Q4 tier: Optimal balance (TinyLlama-1.1B, 560MB).
Evidence: T8 validation shows Q4 optimal for browser/WASM
Performance: 95% task success rate, 430ms average latency
"""
# LOAD Q4 MODEL
model = load_model(tier="Q4", model_name="TinyLlama-1.1B-Q4")
response = model.generate(prompt)
# PERFORMANCE EVALUATION
performance_score = evaluate_performance(response)
if performance_score < 0.80: # Performance inadequate threshold
logger.warning(f"Q4 insufficient: performance={performance_score:.2f}")
logger.info("Escalating to Q8 tier...")
return escalate_to_q8()
else:
logger.info(f"Q4 validated sweet spot: performance={performance_score:.2f}")
return response
def start_with_q8():
"""
Q8 tier: Complex reasoning (Llama-3.2-1B, 800MB).
Use case: Multi-step diagnostics, complex spatial reasoning
Validation: Required Q4 justification per G.2.5 Anti-Pattern 3
"""
# LOAD Q8 MODEL
model = load_model(tier="Q8", model_name="Llama-3.2-1B-Q8")
response = model.generate(prompt)
# OVERPROVISIONING CHECK
if is_overprovisioned(response, task_complexity):
logger.info("Q8 overkill detected, downgrading to Q4...")
return downgrade_to_q4()
else:
logger.info("Q8 necessary for task complexity")
return response
def hardware_constraint_override(constraints):
"""
Hardware limitations override task complexity decisions.
Priority: Hardware constraints > Task complexity preferences
Evidence: T8 shows platform-specific optimal tiers
"""
ram_available = constraints["ram_mb"]
platform = constraints.get("platform", "unknown")
# CONSTRAINT 1: Severe RAM limitation
if ram_available < 256:
logger.warning(f"RAM {ram_available}MB < 256MB: Forcing Q1/Q4 only")
return force_q1_q4_only()
# CONSTRAINT 2: Moderate RAM limitation
elif 256 <= ram_available < 1024:
logger.info(f"RAM {ram_available}MB: Q4/Q8 acceptable")
return allow_q4_q8()
# CONSTRAINT 3: Browser/WASM platform
elif platform == "browser_wasm":
logger.info("Browser/WASM detected: Q4 optimal (T8 validation)")
return force_q4_tier()
# CONSTRAINT 4: Unconstrained
else:
logger.info(f"RAM {ram_available}MB >1GB: All tiers available")
return allow_all_tiers()
def dynamic_tier_router(prompt, initial_tier="Q4"):
"""
Continuous monitoring with automatic escalation/degradation.
Adaptive Strategy: Start conservative, adjust based on drift
Validation: T10 shows dynamic routing improves efficiency 18%
"""
current_tier = initial_tier
drift_history = []
max_iterations = 3 # Prevent infinite escalation loops
for iteration in range(max_iterations):
# GENERATE WITH CURRENT TIER
response = generate_with_tier(prompt, current_tier)
# CALCULATE DRIFT
drift = calculate_semantic_drift(response)
drift_history.append(drift)
# ESCALATION DECISION
if drift > 0.10 and current_tier < "Q8":
logger.info(f"Iteration {iteration}: Drift {drift:.2f} >10%, escalating")
current_tier = escalate_tier(current_tier)
continue
# DEGRADATION DECISION
elif drift < 0.05 and current_tier > "Q1" and is_overprovisioned():
logger.info(f"Iteration {iteration}: Drift {drift:.2f} <5%, downgrading")
current_tier = degrade_tier(current_tier)
continue
# STABLE TIER FOUND
else:
logger.info(f"Tier {current_tier} stable: drift={drift:.2f}")
return response
# MAX ITERATIONS REACHED
logger.warning(f"Max iterations reached, using tier {current_tier}")
return response
G.4 Output: Three-layer architecture with embedded decision logic and quantization-aware execution → PROCEED TO PHASE 5 (Appendix G.5)
G.5 Phase 5: Evidence-Based Validation Test Protocols
Purpose: Validate the MCD implementation against empirical thresholds from Chapters 6-7 using T1-T10 test methodologies and W1-W3 domain-specific protocols. All tests reference established baselines with quantified pass/fail criteria.
G.5.1 Core MCD Validation Suite (T1-T10 Protocols)
| Test | Objective | Pass Threshold | Evidence Source |
|---|---|---|---|
| T1-Style | Approach effectiveness vs alternatives | ≥90% expected performance | Chapter 6.2 |
| T4-Style | Stateless context reconstruction | ≥90% recovery (5/5 vs 2/5 implicit) | Section 6.3.4 |
| T6-Style | Over-engineering detection (RI calculation) | RI ≤10, no components >20% overhead | Section 6.3.6 |
| T7-Style | Constraint stress testing | ≥80% controlled failure, no hallucination | Section 6.3.7 |
| T8-Style | Deployment environment (browser/WASM) | Zero crashes, <500MB RAM, <500ms latency | Section 6.3.8 |
| T10-Style | Quantization tier validation (Q1→Q4→Q8) | Optimal tier selected ≥90% cases | Section 6.3.10 |
Implementation Note: Run each test with n=5 trials minimum per configuration. Calculate 95% confidence intervals for completion rates. Document all failures with root cause analysis.
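The note above calls for 95% confidence intervals on completion rates. One standard choice for small trial counts is the Wilson score interval, sketched below; the specific interval method is an assumption, not something prescribed by the framework.

```python
# Wilson score interval for a completion rate (suitable for small n such as 5 trials).
from math import sqrt

def wilson_interval(successes: int, trials: int, z: float = 1.96):
    if trials == 0:
        return (0.0, 0.0)
    p = successes / trials
    denom = 1 + z**2 / trials
    centre = (p + z**2 / (2 * trials)) / denom
    half = (z / denom) * sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2))
    return (max(0.0, centre - half), min(1.0, centre + half))

# Example: 5/5 completions still leaves a wide interval at n=5
print(wilson_interval(5, 5))   # roughly (0.57, 1.00)
```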
G.5.2 Domain-Specific Validation (W1-W3 Protocols)
W1 Protocol (Healthcare Booking):
- Task domain deployment with comparative performance vs Few-Shot/Conversational
- Metrics: Completion rate, token efficiency, latency, UX score
- Target: ≥85% completion under Q4 constraints (Chapter 7.2)
W2 Protocol (Spatial Navigation):
- Real-world scenario execution under stateless constraints
- Metrics: Route accuracy, coordinate precision, safety communication
- Target: ≥80% successful navigation with transparent limitation acknowledgment (Chapter 7.3)
W3 Protocol (System Diagnostics):
- Failure mode documentation with priority classification (P1/P2/P3)
- Metrics: Diagnostic accuracy, bounded scope adherence, systematic troubleshooting
- Target: ≥85% correct priority assignment, no fabricated root causes (Chapter 7.4)
G.5.3 Multi-Dimensional Diagnostic Checks
Decision Tree Health Metrics:
- Average path length: ≤3 levels (T5 constraint)
- Branching factor: ≤4 per node (complexity limit)
- Fallback activation frequency: Monitor for >15% (indicates edge case gaps)
- Dead paths: Zero unused routes after test coverage
Context-Optimality Scoring:
- Resource-constrained: Efficiency score ≥80%
- User experience: UX score ≥75%
- Professional quality: Quality score ≥85%
Performance vs Complexity Analysis:
- Plot: Efficiency vs resource usage
- Identify: Pareto frontier for optimal trade-offs
- Validate: Token cost justified by measurable improvement
G.5.4 Final Deployment Decision Matrix
DEPLOYMENT_READINESS_CHECKLIST:
✓ Core Tests (T1-T10):
- All tests PASS with thresholds met
- Decision trees validated (depth ≤3, branches ≤4)
- Redundancy Index ≤10 for all components
✓ Domain Tests (W1-W3):
- Representative domain scenarios tested
- Comparative analysis vs baseline approaches documented
- Failure modes characterized with recovery strategies
✓ Context Requirements:
- Efficiency priority → Score ≥80%
- UX priority → Score ≥75%
- Quality priority → Score ≥85%
DECISION LOGIC:
IF all_core_tests == PASS AND domain_validation == PASS AND context_requirements == MET:
→ DEPLOY MCD AGENT ✅
→ Document: Performance baselines, monitoring thresholds
ELSE:
→ RETURN TO FAILED PHASE for redesign
→ Document: Specific failure modes, remediation plan
→ ITERATE: Fix issues, re-run validation
UNSUITABLE DETERMINATION:
IF multiple_iterations_fail OR fundamental_constraint_mismatch:
→ Recommend alternative frameworks (LangChain, AutoGPT)
→ Document: Justification with empirical evidence
G.5.5 Monitoring Integration Post-Deployment
Ongoing Validation (Production Environment):
- Semantic Drift Monitor: Continuous comparison across quantization tiers, alert if drift >10%
- Dynamic Tier Selection: Automatic Q1→Q4→Q8 escalation with performance tracking
- Performance Benchmarking: Weekly validation against established efficiency thresholds from Chapter 6
- Usage Pattern Analysis: Monthly review of component utilization, flag if any <10% (G.3.3 Q9)
G.5 Output: Validated MCD implementation ready for deployment with documented performance characteristics and monitoring plan.