Architecture Decision Records (ADR)
Version: 4.8
Context: Log of significant architectural decisions.
Index
- ADR-001: Hexagonal Architecture
- ADR-002: No Async in Physics
- ADR-003: Shared Secret Auth
- ADR-004: Postcard for Deterministic Hashing
- ADR-005: Feature Gating TypeScript Bindings
- ADR-006: Universal Domain Validation
- ADR-007: Lowercase Alpha Normalization
- ADR-008: Synthetic Corpus Injection
- ADR-009: Optimal Choice for Duplicate Keys
- ADR-010: Distributed Coordination via Valkey
- ADR-011: The Thin Client CLI (Agent Runner)
- ADR-012: The Control/Asset Plane Split
- ADR-013: Hybrid Development Environment
- ADR-014: Subdomain Architecture
- ADR-015: Data Decoupling and Testing Strategy
- ADR-016: Multi-Tiered Scoring Implementations
ADR-001: Hexagonal Architecture
- Status: Accepted
- Date: 2024-01-01
- Context: We need to swap databases and frontends without rewriting core logic.
- Decision: Use Ports & Adapters. keyforge-physics must have ZERO dependencies on keyforge-infra.
- Consequences:
- (+) Core logic is purely testable.
- (-) Boilerplate for DTO conversion.
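A minimal sketch of the Ports & Adapters split, to make the dependency direction concrete. The names `LayoutStore`, `InMemoryStore`, and `record_if_better` are hypothetical and chosen for illustration; they are not taken from the KeyForge crates. The sketch also assumes that a lower score is better.

```rust
use std::collections::HashMap;

/// Port: what the core needs from storage. In this sketch it would live in
/// the core crate (hypothetical trait, not the project's actual API).
pub trait LayoutStore {
    fn save(&mut self, name: &str, score: f32);
    fn load(&self, name: &str) -> Option<f32>;
}

/// Core logic depends only on the port, never on a concrete database.
/// Assumption for this sketch: a lower score is better.
pub fn record_if_better(store: &mut dyn LayoutStore, name: &str, score: f32) -> bool {
    match store.load(name) {
        // An equal or better score is already stored: keep it.
        Some(existing) if existing <= score => false,
        _ => {
            store.save(name, score);
            true
        }
    }
}

/// Adapter: an in-memory implementation, as an infra or test crate might provide.
#[derive(Default)]
pub struct InMemoryStore {
    scores: HashMap<String, f32>,
}

impl LayoutStore for InMemoryStore {
    fn save(&mut self, name: &str, score: f32) {
        self.scores.insert(name.to_string(), score);
    }

    fn load(&self, name: &str) -> Option<f32> {
        self.scores.get(name).copied()
    }
}
```

Because the core function only sees `dyn LayoutStore`, a database-backed adapter could replace `InMemoryStore` without touching the core, which is the property this ADR is after.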
ADR-002: No Async in Physics
- Status: Accepted
- Date: 2024-01-15
- Context: Scoring is CPU-bound. Async overhead slows down tight loops.
- Decision: keyforge-physics and keyforge-evolution will be synchronous.
- Consequences:
- (+) Maximum CPU throughput.
- (-) Workers must use spawn_blocking to prevent blocking the reactor.
ADR-003: Shared Secret Auth
- Status: Accepted
- Date: 2024-02-01
- Context: We need a simple way to secure a private cluster. OAuth is overkill.
- Decision: Use a static Shared Secret (Bearer Token).
- Consequences:
- (+) Simple to implement.
- (-) Key rotation requires restarting all nodes.
ADR-004: Postcard for Deterministic Hashing
- Status: Accepted
- Date: 2025-12-30
- Context: bincode is unmaintained. We need a stable, deterministic binary serialization format for generating JobIdentifier hashes.
- Decision: Replace bincode with postcard.
- Consequences:
- (+) Maintained, no_std compatible, designed for embedded/deterministic use.
- (-) Breaking change for existing Job IDs (hashes will change).
ADR-005: Feature Gating TypeScript Bindings
- Status: Accepted
- Date: 2025-12-30
- Context: ts-rs emits warnings about #[serde] attributes it doesn't understand (like deserialize_with). These warnings pollute the build log.
- Decision: Gate ts-rs behind a ts_bindings feature flag.
- Consequences:
- (+) Clean build logs for standard development (cargo test).
- (+) Reduced dependency footprint for production builds.
- (-) Developers must explicitly run cargo test --features ts_bindings to verify bindings.
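One way such gating is commonly written is with `cfg_attr`, sketched below. The struct `JobSummary` and its fields are hypothetical. The sketch assumes that Cargo.toml declares ts-rs as an optional dependency switched on by the `ts_bindings` feature; the project's actual wiring may differ.

```rust
// With the `ts_bindings` feature off, both cfg_attr lines expand to nothing,
// so the ts-rs crate is never referenced and none of its warnings are emitted.
// With the feature on, they become `#[derive(ts_rs::TS)]` and `#[ts(export)]`.
#[cfg_attr(feature = "ts_bindings", derive(ts_rs::TS))]
#[cfg_attr(feature = "ts_bindings", ts(export))]
#[derive(Debug, Clone, PartialEq)]
pub struct JobSummary {
    // Hypothetical fields, for illustration only.
    pub id: u64,
    pub best_score: f32,
}
```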
ADR-006: Universal Domain Validation
- Status: Accepted
- Date: 2025-12-31
- Context: Domain entities were being deserialized at various boundaries (API, Disk, WASM) without consistent validation.
- Decision: Implement the Validator trait for all Domain Entities in keyforge-model. Enforce .validate() calls immediately after deserialization at all system boundaries.
- Consequences:
- (+) Guarantees invalid data is caught at the IO boundary before entering the domain.
- (+) Centralizes validation logic in the Model layer.
- (-) Requires boilerplate Validator implementations for simple structs.
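A sketch of the validate-at-the-boundary pattern follows. The trait shape, the `KeyPosition` entity, and the comma-separated input format are all assumptions made for illustration; the real Validator trait in keyforge-model may look different. Plain string parsing stands in for real deserialization.

```rust
/// Hypothetical shape of the trait; the real one in keyforge-model may differ.
pub trait Validator {
    fn validate(&self) -> Result<(), String>;
}

/// Hypothetical domain entity used only for this sketch.
#[derive(Debug, PartialEq)]
pub struct KeyPosition {
    pub x: f32,
    pub y: f32,
    pub finger: u8,
}

impl Validator for KeyPosition {
    fn validate(&self) -> Result<(), String> {
        if !self.x.is_finite() || !self.y.is_finite() {
            return Err("coordinates must be finite".to_string());
        }
        if self.finger > 9 {
            return Err(format!("finger index {} out of range 0..=9", self.finger));
        }
        Ok(())
    }
}

/// Boundary helper: parse, then validate immediately, before the value can
/// reach domain code. Parsing "x,y,finger" stands in for real deserialization.
pub fn key_position_from_boundary(raw: &str) -> Result<KeyPosition, String> {
    let parts: Vec<&str> = raw.split(',').map(str::trim).collect();
    if parts.len() != 3 {
        return Err(format!("expected 3 fields, got {}", parts.len()));
    }
    let key = KeyPosition {
        x: parts[0].parse::<f32>().map_err(|e| format!("bad x: {e}"))?,
        y: parts[1].parse::<f32>().map_err(|e| format!("bad y: {e}"))?,
        finger: parts[2].parse::<u8>().map_err(|e| format!("bad finger: {e}"))?,
    };
    key.validate()?;
    Ok(key)
}
```

Note that an input such as "NaN, 0.0, 1" parses successfully as a float but is rejected by `validate()`, which is the kind of gap between "deserializes" and "is valid" that the decision targets.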
ADR-007: Lowercase Alpha Normalization
- Status: Accepted
- Date: 2025-12-31
- Context: Keyboards send keycodes (e.g., KC_A), but text corpora usually contain lowercase characters ('a').
- Decision: Normalize all alphabetic keycodes to lowercase internally within the Domain Model. Uppercase is treated purely as a Presentation Layer concern.
- Consequences:
- (+) Simplifies scoring logic.
- (+) Consistent with standard text corpora.
- (-) UI must explicitly uppercase keys for display if desired.
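The normalization rule could be sketched as below. The function names `normalize_alpha` and `display_label` are hypothetical, and the sketch assumes alphabetic keycodes have the form `KC_` followed by a single ASCII letter, as in the KC_A example.

```rust
/// Map an alphabetic keycode such as "KC_A" to its lowercase character.
/// Returns None for anything that is not "KC_" followed by exactly one
/// ASCII letter (for example a multi-character name or a digit).
pub fn normalize_alpha(keycode: &str) -> Option<char> {
    let rest = keycode.strip_prefix("KC_")?;
    let mut chars = rest.chars();
    match (chars.next(), chars.next()) {
        (Some(c), None) if c.is_ascii_alphabetic() => Some(c.to_ascii_lowercase()),
        _ => None,
    }
}

/// Presentation-layer helper: uppercase only when rendering a key label.
pub fn display_label(c: char) -> String {
    c.to_ascii_uppercase().to_string()
}
```

Keeping `display_label` separate from `normalize_alpha` mirrors the decision: the domain only ever sees lowercase, and casing is applied at the UI edge.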
ADR-008: Synthetic Corpus Injection
- Status: Accepted
- Date: 2026-01-02
- Context: Standard text corpora lack explicit Enter and Backspace events, leading to suboptimal placement of these keys.
- Decision: Inject synthetic frequency data for Enter and Backspace into _std corpora at load time based on error rate and punctuation models.
- Consequences:
- (+) Layouts optimize for realistic typing flows, including errors and formatting.
- (+) Backspace and Enter are pulled closer to the home row.
- (-) Synthetic distribution is heuristic.
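To illustrate the idea, here is a deliberately simple injection heuristic. The ADR does not specify the error-rate or punctuation models, so both formulas below are assumptions: Backspace is taken as a fixed fraction of all keystrokes, and Enter as a fixed fraction of sentence-ending punctuation. The function name and parameters are hypothetical.

```rust
use std::collections::HashMap;

/// Characters standing in for the Enter and Backspace keys in this sketch.
pub const ENTER: char = '\n';
pub const BACKSPACE: char = '\u{8}';

/// Add synthetic Enter/Backspace counts to a monogram frequency table.
/// Assumed models (not from the ADR):
///   backspaces = error_rate * total keystrokes
///   enters     = enters_per_sentence_end * count of '.', '!' and '?'
pub fn inject_synthetic(
    counts: &mut HashMap<char, u64>,
    error_rate: f64,
    enters_per_sentence_end: f64,
) {
    let total: u64 = counts.values().sum();
    let sentence_ends: u64 = ['.', '!', '?']
        .iter()
        .map(|c| counts.get(c).copied().unwrap_or(0))
        .sum();

    let backspaces = (total as f64 * error_rate).round() as u64;
    let enters = (sentence_ends as f64 * enters_per_sentence_end).round() as u64;

    // Add to any existing counts rather than overwriting them.
    *counts.entry(BACKSPACE).or_insert(0) += backspaces;
    *counts.entry(ENTER).or_insert(0) += enters;
}
```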
ADR-009: Optimal Choice for Duplicate Keys
- Status: Accepted
- Date: 2026-01-06
- Context: Keyboards often feature duplicate keys (e.g., split spacebars or bilateral modifiers). Previous logic used a "last one wins" or "distributed load" approach, which failed to model an "optimal typist" who chooses the best physical key for a given context.
- Decision: Implement "Optimal Choice" logic. For every monogram, bigram, and trigram, the engine dynamically searches all physical instances of the involved characters to find the specific key (or combination) that yields the absolute minimum cost.
- Consequences:
- (+) More realistic modeling of advanced and split layouts.
- (+) Provides the architectural foundation for future stateful keys (e.g., Repeat key).
- (+) Simplifies physics by removing SpaceHandPreference (now handled by the typist model).
- (-) Increased computational complexity in the fast-path score_layout due to dynamic searching.
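The search described in the decision can be sketched for the bigram case as follows. The `Key` type, the distance-based `pair_cost`, and the function name are placeholders for illustration; the engine's real cost model is not shown in this document.

```rust
use std::collections::HashMap;

/// Hypothetical physical key: just a position on the board.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Key {
    pub x: f32,
    pub y: f32,
}

/// Placeholder cost: Euclidean distance between the two keys.
fn pair_cost(a: Key, b: Key) -> f32 {
    ((a.x - b.x).powi(2) + (a.y - b.y).powi(2)).sqrt()
}

/// For a bigram, try every physical instance of both characters and return
/// the minimum cost together with the pair of keys that achieves it.
/// Returns None if either character has no key on the layout.
pub fn optimal_bigram(
    layout: &HashMap<char, Vec<Key>>,
    first: char,
    second: char,
) -> Option<(f32, Key, Key)> {
    let firsts = layout.get(&first)?;
    let seconds = layout.get(&second)?;
    let mut best: Option<(f32, Key, Key)> = None;
    for &a in firsts {
        for &b in seconds {
            let cost = pair_cost(a, b);
            if best.map_or(true, |(current, _, _)| cost < current) {
                best = Some((cost, a, b));
            }
        }
    }
    best
}
```

With k physical instances per character, a bigram costs up to k² evaluations and a trigram up to k³, which is the source of the added fast-path complexity noted in the consequences.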
ADR-010: Distributed Coordination via Valkey
- Status: Accepted
- Date: 2026-01-07
- Context:
- Process Isolation: The Hive WebSocket event loop was process-local (tokio::sync::broadcast). If we scaled Hive to multiple instances, users connected to Instance A could not see events from Instance B.
- Database Load: High-frequency ephemeral data (Agent Heartbeats at 1 Hz, Real-time Score updates) was being written to PostgreSQL, causing write amplification and WAL bloat for data that has no long-term value.
- Consistency: Asset caches were managed locally. If an admin updated a file on one server, others would serve stale data.
- Decision: Introduce Valkey (an open-source Redis fork) as the Coordination Layer.
- Role: Acts as the "Nervous System" for the cluster.
- Mechanism:
- Heartbeats: SETEX keys with TTL (auto-expiry for dead nodes).
- Chatter: PUBLISH/SUBSCRIBE for real-time score updates across the cluster.
- Manifest: Stores the authoritative hash of system assets to ensure consistency.
- Consequences:
- (+) Scalability: Hive is now stateless and can scale horizontally.
- (+) Observability: Real-time visibility into Agent temperature/IPS without SQL polling.
- (+) Efficiency: Zero SQL writes for ephemeral state.
- (-) Complexity: Adds a 4th container to the stack.
- (-) Dependency: Hive requires Valkey to start (hard dependency for coordination).
- (-) Refactor: Requires testcontainers for integration testing.
ADR-011: The Thin Client CLI (Agent Runner)
- Status: Accepted
- Date: 2026-01-10
- Context: The CLI previously statically linked keyforge-compute and keyforge-physics. This created two execution environments: Local (CLI-embedded engine) and Remote (Agent-embedded engine). Ensuring parity was difficult and CLI compile times were slow.
- Decision: Refactor the CLI into a "Thin Client". It performs IO and Asset Management but delegates all heavy lifting (search, bench, validate) to the keyforge-agent binary via subprocess spawning.
- Consequences:
- (+) Parity: Local and Remote runs use the exact same binary (keyforge-agent).
- (+) Build Times: CLI compiles faster as it drops heavy math dependencies.
- (-) UX: Requires user to have keyforge-agent in PATH or side-loaded.
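Delegation via subprocess might look roughly like this, using only `std::process::Command`. The helper names and the argument layout (subcommand first, then pass-through arguments) are assumptions; the actual keyforge-agent command line is not documented here.

```rust
use std::process::Command;

/// Build (but do not run) the subprocess invocation for a delegated
/// subcommand such as "search", "bench", or "validate".
/// The argument layout is an assumption made for this sketch.
pub fn agent_command(agent_bin: &str, subcommand: &str, args: &[&str]) -> Command {
    let mut cmd = Command::new(agent_bin);
    cmd.arg(subcommand).args(args);
    cmd
}

/// Run the agent and return its exit code. Stdio is inherited by default, so
/// agent output appears directly in the CLI's terminal.
/// Fails with an io::Error if the binary cannot be found or started.
pub fn run_agent(agent_bin: &str, subcommand: &str, args: &[&str]) -> std::io::Result<i32> {
    let status = agent_command(agent_bin, subcommand, args).status()?;
    // `code()` is None when the process was terminated by a signal.
    Ok(status.code().unwrap_or(-1))
}
```

Passing a bare name such as "keyforge-agent" relies on PATH lookup, while a side-loaded binary would be passed as an explicit path; both cases go through the same `Command::new` call.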
ADR-012: The Control/Asset Plane Split
- Status: Accepted
- Date: 2026-01-12
- Context: Hive currently handles both "Logic" (Jobs, Auth) and "Content" (Serving 50MB corpus files). High download traffic from agents/CLIs competes with the API's CPU resources.
- Decision: Split the architecture into two distinct services:
- Hive (Control Plane): Handles orchestration, auth, and state.
- Assets (Data Plane): A dedicated, stateless HTTP service that streams assets directly from Valkey.
- Consequences:
- (+) Scalability: Asset Server can be cached via CDN or scaled independently.
- (+) Security: Asset Server can be public (read-only), while Hive remains private/authenticated.
- (+) Architecture: Explicit separation of concerns.
- (-) Complexity: Clients must manage two base URLs (KEYFORGE_API_URL and KEYFORGE_ASSET_URL).
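On the client side, managing the two base URLs could be as small as the sketch below. Only the two variable names come from this ADR; the `Endpoints` type, the trailing-slash trimming, and the choice to fail when a variable is missing (rather than fall back to a default) are assumptions.

```rust
/// Hypothetical client-side holder for the two base URLs.
#[derive(Debug, PartialEq)]
pub struct Endpoints {
    pub api_url: String,
    pub asset_url: String,
}

impl Endpoints {
    /// Resolve both URLs through an injected lookup, which keeps the logic
    /// testable without touching the real process environment.
    pub fn from_lookup<F>(lookup: F) -> Result<Self, String>
    where
        F: Fn(&str) -> Option<String>,
    {
        let get = |name: &str| {
            lookup(name)
                // Trim trailing slashes so paths can be joined as "{base}/path".
                .map(|v| v.trim_end_matches('/').to_string())
                .filter(|v| !v.is_empty())
                .ok_or_else(|| format!("{name} is not set"))
        };
        Ok(Self {
            api_url: get("KEYFORGE_API_URL")?,
            asset_url: get("KEYFORGE_ASSET_URL")?,
        })
    }

    /// Production entry point: read from the process environment.
    pub fn from_env() -> Result<Self, String> {
        Self::from_lookup(|name| std::env::var(name).ok())
    }
}
```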
ADR-013: Hybrid Development Environment
- Status: Accepted
- Date: 2026-01-13
- Context: Developing against a full Docker stack is slow (rebuild times). Developing purely locally misses infrastructure issues (HTTPS, CORS, Proxy rules).
- Decision: Implement a Hybrid Mode (just serve).
- Infrastructure: DB, Valkey, and Web Proxy run in Docker.
- Logic: keyforge-hive runs natively on the host (Port 3002).
- Bridge: socat containers proxy traffic from the Web Container (Port 3000/3001) to the Host (Port 3002/3003).
- Consequences:
- (+) Speed: Instant compilation/restart for Rust code.
- (+) Parity: Frontend talks to "Production-like" HTTPS endpoints (https://localhost:3000).
- (-) Complexity: Requires extra_hosts configuration for Linux support.
ADR-014: Subdomain Architecture
- Status: Accepted
- Date: 2026-01-13
- Context: Cloudflare Proxy (Orange Cloud) only supports specific ports (80, 443, etc.). Our previous architecture used ports 3000/3001 for API/Assets, which exposed the origin IP and bypassed DDoS protection.
- Decision: Move to a Subdomain-based Architecture on Port 443.
- keyforge.infodungeon.com -> Frontend (Static).
- api.keyforge.infodungeon.com -> Hive API.
- assets.keyforge.infodungeon.com -> Asset Server.
- Consequences:
- (+) Security: All traffic flows through Cloudflare WAF on Port 443.
- (+) Standards: No non-standard ports required for clients.
- (-) DNS: Requires managing multiple A records.
ADR-015: Data Decoupling and Testing Strategy
- Status: Accepted
- Date: 2026-01-17
- Context:
- Coupling: High rigidity in data models (CostModel, SearchParams) meant that adding experimental parameters required full-stack refactors.
- Testing: The test suite was brittle ("Inverted Pyramid"), with unit logic tested in integration scopes and a lack of clear intent documentation.
- Decision:
- Data-Driven Configuration: Use the Parameter Map pattern (e.g., HashMap<String, f32> via serde(flatten)) for volatile configuration data. Consuming crates own the semantics; the structure is generic.
- Testing Hierarchy: Enforce a strict separation:
- Unit Tests (src/): Rigorous, exhaustive verification of logic and math.
- Integration Tests (tests/): Contract/Wiring verification only.
- Zero Duplication: Unit logic MUST NOT be re-verified in the integration layer.
- Crate Affinity: Tests must reside in the crate that owns the integration point.
- Documentation Standard: All tests must define Intent and Expected Result.
- Fixture-Based Testing: Use external JSON "Golden Files" instead of code-constructed test data.
- Consequences:
- (+) Flexibility: New physics parameters can be added without binary updates or breaking schema changes.
- (+) Stability: Tests are decoupled from internal implementation details.
- (+) Maintainability: Clearer ownership of logic vs. wiring.
- (-) Safety: Loss of compile-time type checking for configuration fields (mitigated by Runtime Schema Validation).
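The trade-off in the last consequence can be made concrete with a sketch of a parameter map guarded by runtime schema validation. To stay self-contained it omits serde and starts from an already-parsed `HashMap<String, f32>`. `ParamSpec`, `resolve_params`, and the parameter names used with them are hypothetical, as is the policy of rejecting unknown keys.

```rust
use std::collections::HashMap;

/// Hypothetical schema entry owned by the consuming crate.
pub struct ParamSpec {
    pub name: &'static str,
    pub min: f32,
    pub max: f32,
    pub default: f32,
}

/// Validate a generic parameter map against a schema at runtime.
/// Unknown keys and out-of-range values are rejected; missing keys fall
/// back to the schema default.
pub fn resolve_params(
    schema: &[ParamSpec],
    raw: &HashMap<String, f32>,
) -> Result<HashMap<String, f32>, String> {
    // Reject unknown keys so that typos do not pass silently.
    for key in raw.keys() {
        if !schema.iter().any(|spec| spec.name == key.as_str()) {
            return Err(format!("unknown parameter: {key}"));
        }
    }

    let mut resolved = HashMap::new();
    for spec in schema {
        let value = raw.get(spec.name).copied().unwrap_or(spec.default);
        // NaN fails the range check as well, since it compares false.
        if !(spec.min..=spec.max).contains(&value) {
            return Err(format!(
                "{} = {} is outside [{}, {}]",
                spec.name, value, spec.min, spec.max
            ));
        }
        resolved.insert(spec.name.to_string(), value);
    }
    Ok(resolved)
}
```

A misspelled field that the compiler would once have caught now surfaces as an "unknown parameter" error at load time, which is the mitigation the ADR refers to.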
ADR-016: Multi-Tiered Scoring Implementations
- Status: Accepted
- Date: 2026-01-19
- Context:
- The Conflict: We have two opposing needs:
- Absolute Truth: UI analysis and Hive verification require a 100% trustworthy, bit-perfect score.
- Maximum Throughput: Search algorithms (Annealing) need to evaluate millions of layouts per second. Optimization techniques (SIMD, Cache Blocking) often introduce minor precision "drift" or are hardware-specific.
- Previous State: The "Oracle Pattern" enforced that the Optimized Engine match the Naive Oracle bit-for-bit. This capped optimization headroom.
- Decision: Adopt a Three-Tiered Implementation Strategy:
- Exact (The Oracle):
- Role: Single source of truth. Used for UI analysis, final verification, and "Gold Standard" checks.
- Constraint: Must be bit-perfect and simple.
- Generic Optimized (The Workhorse):
- Role: High-speed search on unknown hardware.
- Constraint: Portable Rust. Allows documented, bounded drift from the Oracle.
- Hardware Specific (The F1 Car):
- Role: Extreme optimization for specific CPUs (starting with Intel Family 6/Comet Lake).
- Constraint: Uses CPUID/Cache detection. Optimization is theoretically unbounded. Allows drift.
- Consequences:
- (+) Performance: Unlocks hardware-specific tuning (L1/L2 cache blocking).
- (+) Reliability: "Exact" tier ensures user-facing numbers are always correct.
- (-) Complexity: Testing must now account for "allowed drift" rather than strict equality for search tiers.
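A sketch of what an "allowed drift" check might look like, as opposed to strict equality. The two scoring functions are toy stand-ins, not the project's engines: the exact one accumulates sequentially in f64, and the fast one accumulates in four independent f32 lanes, which reorders additions the way SIMD code typically does. The relative-error bound is an assumed parameter.

```rust
/// Stand-in for the Exact tier: sequential accumulation in f64.
pub fn score_exact(costs: &[f32], freqs: &[f32]) -> f64 {
    costs
        .iter()
        .zip(freqs)
        .map(|(&c, &f)| c as f64 * f as f64)
        .sum()
}

/// Stand-in for an optimized tier: f32 accumulation in four independent
/// lanes. The reordered additions give a slightly different rounding.
pub fn score_fast(costs: &[f32], freqs: &[f32]) -> f32 {
    let mut lanes = [0.0f32; 4];
    for (i, (&c, &f)) in costs.iter().zip(freqs).enumerate() {
        lanes[i % 4] += c * f;
    }
    lanes.iter().sum()
}

/// True if the fast score is within `max_rel` relative error of the exact one.
pub fn within_drift(exact: f64, fast: f32, max_rel: f64) -> bool {
    // Guard the denominator so an exact score of zero does not divide by zero.
    let denom = exact.abs().max(f64::MIN_POSITIVE);
    (fast as f64 - exact).abs() / denom <= max_rel
}
```

A test for a search tier would then assert `within_drift(exact, fast, bound)` with a documented bound, while tests for the Exact tier keep strict equality.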