Testing Standards & Verification

Version: 4.2
Context: Verification methodology, CDV, and Mutation Analysis.

1. The Oracle Pattern

To verify the correctness of our highly optimized, bit-twiddling ScoringEngine, we employ a Simple Oracle.

The Concept

  • Implementation: A naive, readable, inefficient implementation of the logic.
  • Usage: Property Tests generate random inputs and assert Optimized.result() == Oracle.result().
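
As a minimal sketch of the pattern, the block below compares a bit-twiddling routine against a naive loop on pseudo-random inputs. Everything here is illustrative: optimized_popcount, oracle_popcount, XorShift32, and check_against_oracle are hypothetical names, and the hand-rolled generator only keeps the sketch dependency-free. A real suite would more likely use a property-testing crate.

```rust
/// Hypothetical "optimized" routine: branch-free SWAR population count.
pub fn optimized_popcount(mut x: u32) -> u32 {
    x = x - ((x >> 1) & 0x5555_5555);
    x = (x & 0x3333_3333) + ((x >> 2) & 0x3333_3333);
    x = (x + (x >> 4)) & 0x0F0F_0F0F;
    x.wrapping_mul(0x0101_0101) >> 24
}

/// Oracle: naive, readable, inefficient. Inspects one bit at a time.
pub fn oracle_popcount(x: u32) -> u32 {
    (0..32u32).filter(|&bit| (x >> bit) & 1 == 1).count() as u32
}

/// Tiny deterministic generator (xorshift32), used only to produce inputs.
pub struct XorShift32(u32);

impl XorShift32 {
    pub fn new(seed: u32) -> Self {
        // A zero state would stay zero forever, so substitute a fixed constant.
        Self(if seed == 0 { 0x9E37_79B9 } else { seed })
    }

    pub fn next_u32(&mut self) -> u32 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 17;
        x ^= x << 5;
        self.0 = x;
        x
    }
}

/// Runs `cases` random inputs; returns the first input on which the two disagree.
pub fn check_against_oracle(seed: u32, cases: usize) -> Result<(), u32> {
    let mut rng = XorShift32::new(seed);
    for _ in 0..cases {
        let input = rng.next_u32();
        if optimized_popcount(input) != oracle_popcount(input) {
            return Err(input);
        }
    }
    Ok(())
}
```

Returning the failing input, rather than panicking, keeps the counterexample available for reporting or shrinking.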

Physics Oracle

  • Optimized: Uses a flattened Vec<f32>, pre-calculated lookups, and unchecked (unsafe) indexing whose bounds are validated at compile time.
  • Oracle: Uses HashMap<(char, char), f32> and standard iteration.
  • Invariant (Exact Tier): The "Exact" scoring engine must match the Oracle bit-for-bit.
  • Invariant (Search Tiers): "Search" engines (Generic/Hardware-Specific) may drift within a strict epsilon. The test harness must recognize which engine is being tested and adjust the assertion accordingly.
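
The two invariants can be sketched as a tier-aware comparison. This is an assumption-laden illustration, not the project's harness: EngineTier, matches_oracle, oracle_score, flatten, and optimized_score are hypothetical names, the 26x26 lowercase-ASCII table is an assumed layout, and safe indexing stands in for the unchecked indexing described above.

```rust
use std::collections::HashMap;

/// Which engine is under test (hypothetical enum mirroring the tiers above).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum EngineTier {
    Exact,
    Search { epsilon: f32 },
}

/// Tier-aware assertion: bit-for-bit for Exact, bounded drift for Search.
pub fn matches_oracle(tier: EngineTier, optimized: f32, oracle: f32) -> bool {
    match tier {
        EngineTier::Exact => optimized.to_bits() == oracle.to_bits(),
        EngineTier::Search { epsilon } => (optimized - oracle).abs() <= epsilon,
    }
}

/// Oracle: HashMap lookup per adjacent pair, summed left to right.
pub fn oracle_score(costs: &HashMap<(char, char), f32>, text: &str) -> f32 {
    let chars: Vec<char> = text.chars().collect();
    let mut total = 0.0f32;
    for pair in chars.windows(2) {
        total += costs.get(&(pair[0], pair[1])).copied().unwrap_or(0.0);
    }
    total
}

/// Builds the flattened table (assumed layout: row = first letter, column = second).
pub fn flatten(costs: &HashMap<(char, char), f32>) -> Vec<f32> {
    let mut table = vec![0.0; 26 * 26];
    for (&(a, b), &cost) in costs {
        if a.is_ascii_lowercase() && b.is_ascii_lowercase() {
            table[(a as usize - 'a' as usize) * 26 + (b as usize - 'a' as usize)] = cost;
        }
    }
    table
}

/// Optimized stand-in: direct table indexing, same summation order as the oracle.
pub fn optimized_score(table: &[f32], text: &str) -> f32 {
    let mut total = 0.0f32;
    for pair in text.as_bytes().windows(2) {
        if pair[0].is_ascii_lowercase() && pair[1].is_ascii_lowercase() {
            total += table[(pair[0] - b'a') as usize * 26 + (pair[1] - b'a') as usize];
        }
    }
    total
}
```

Comparing with to_bits() rather than == is deliberate for the Exact tier: it distinguishes 0.0 from -0.0, which == would treat as equal. Bit-for-bit agreement also depends on both sides adding terms in the same order, since floating-point addition is not associative.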

2. Coverage-Driven Verification (CDV)

We use code coverage (cargo-tarpaulin) to identify Logic Dark Matter: LLM-generated code branches that no test has ever executed.

Criticality Tiering

  1. Tier 1 (The Nucleus): keyforge-physics, keyforge-evolution.
    • Requirement: 95%+ Branch Coverage + Property Testing.
  2. Tier 2 (The Contract): keyforge-protocol, keyforge-model.
    • Requirement: 100% Validation Coverage.
  3. Tier 3 (The Shell): keyforge-infra, keyforge-hive.
    • Requirement: Error Path Verification.
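
One way to make the tiering enforceable is to encode it as data that a CI gate can query. The sketch below is hypothetical (Requirement, requirement_for, and passes_branch_gate are illustrative names) and assumes the measured branch-coverage percentage has already been extracted from the coverage report. Parsing that report is out of scope here.

```rust
/// Per-tier requirement, transcribed from the list above.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Requirement {
    /// Tier 1: branch coverage threshold plus property testing.
    BranchCoverageAndPropertyTests { min_branch_pct: f64 },
    /// Tier 2: 100% validation coverage.
    FullValidationCoverage,
    /// Tier 3: error path verification.
    ErrorPathVerification,
}

/// Maps a crate name to its tier requirement; unknown crates have none.
pub fn requirement_for(crate_name: &str) -> Option<Requirement> {
    match crate_name {
        "keyforge-physics" | "keyforge-evolution" => {
            Some(Requirement::BranchCoverageAndPropertyTests { min_branch_pct: 95.0 })
        }
        "keyforge-protocol" | "keyforge-model" => Some(Requirement::FullValidationCoverage),
        "keyforge-infra" | "keyforge-hive" => Some(Requirement::ErrorPathVerification),
        _ => None,
    }
}

/// Checks only the numeric Tier 1 threshold; other tiers pass through unjudged.
pub fn passes_branch_gate(crate_name: &str, measured_branch_pct: f64) -> bool {
    match requirement_for(crate_name) {
        Some(Requirement::BranchCoverageAndPropertyTests { min_branch_pct }) => {
            measured_branch_pct >= min_branch_pct
        }
        _ => true,
    }
}
```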

3. Mutation Testing (cargo-mutants)

We do not just test that the code works; we test that the tests work.

  • Tool: cargo-mutants.
  • Mechanism: The tool injects synthetic bugs (mutants) into the source code (e.g., changing + to -, or if x < y to if x <= y) and runs the test suite.
  • Success: The test suite FAILS (The mutant is "Caught").
  • Failure: The test suite PASSES (The mutant "Survived"). This indicates a gap in the tests: the mutated code is either never executed or never asserted on.

Rule: A PR affecting keyforge-physics cannot be merged if it introduces "Survived Mutants".
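
To illustrate the "Caught" versus "Survived" distinction, the sketch below writes the < to <= mutation from the Mechanism bullet by hand and runs two toy suites against it. The names (accepts, accepts_mutant, weak_suite, strong_suite) are hypothetical. In practice the tool generates the mutant and runs the real test suite.

```rust
/// Hypothetical rule under test: accept a candidate only if it is strictly better.
pub fn accepts(candidate: u32, current: u32) -> bool {
    candidate < current
}

/// Hand-written copy of the mutant: `<` replaced with `<=`.
pub fn accepts_mutant(candidate: u32, current: u32) -> bool {
    candidate <= current
}

/// Weak suite: never probes the boundary, so it cannot tell the two apart.
pub fn weak_suite(f: fn(u32, u32) -> bool) -> bool {
    f(1, 2) && !f(3, 2)
}

/// Strong suite: adds the equal-values case, which is where `<` and `<=` differ.
pub fn strong_suite(f: fn(u32, u32) -> bool) -> bool {
    weak_suite(f) && !f(2, 2)
}
```

Here weak_suite passes for both functions, so the mutant survives. strong_suite passes for accepts and fails for accepts_mutant, so the mutant is caught. Boundary mutants like this one usually call for a test at the exact boundary value.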

4. The Reflexion Loop

When a property test or mutation test fails, we enforce a reflection step:

  1. Input: Failing test case + Code.
  2. Task: "Explain the logic error in natural language."
  3. Action: "Generate the fix based on that explanation."
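
The two-stage structure can be sketched as a pair of prompt builders, where the second stage consumes only the explanation produced by the first. Failure, explain_prompt, and fix_prompt are hypothetical names, and the surrounding prompt layout is an assumption. Only the quoted Task and Action strings come from the steps above.

```rust
/// Step 1 input: the failing test case plus the code under suspicion.
pub struct Failure<'a> {
    pub test_case: &'a str,
    pub code: &'a str,
}

/// Step 2: ask for a natural-language explanation before any fix is attempted.
pub fn explain_prompt(failure: &Failure<'_>) -> String {
    format!(
        "Failing test case:\n{}\n\nCode:\n{}\n\nExplain the logic error in natural language.",
        failure.test_case, failure.code
    )
}

/// Step 3: request the fix, grounded in the explanation from step 2.
pub fn fix_prompt(explanation: &str) -> String {
    format!(
        "Explanation of the logic error:\n{}\n\nGenerate the fix based on that explanation.",
        explanation
    )
}
```

Keeping the stages as separate functions makes it hard to skip the explanation: fix_prompt cannot be built without one.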

5. Performance Budget

Tier 1 code must adhere to strict latency budgets.

Component            | Budget (per op) | Benchmark
ScoringEngine::score | < 50ns          | cargo bench --bench physics
Mutation::swap       | < 5ns           | cargo bench --bench mutation
Anneal Step          | < 100ns         | Aggregate of swap + score + decision
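
As a rough sketch of how a per-op figure relates to these budgets, the block below times a closure with std::time::Instant and compares the result against a limit. It is not the cargo bench harness referenced in the table: nanos_per_op, within_budget, and decision_headroom_ns are hypothetical helpers, and wall-clock timing like this is far noisier than a proper benchmark framework.

```rust
use std::hint::black_box;
use std::time::Instant;

/// Average wall-clock nanoseconds per call of `op` over `iterations` calls.
pub fn nanos_per_op<F: FnMut() -> u64>(iterations: u32, mut op: F) -> f64 {
    let iterations = iterations.max(1); // avoid dividing by zero
    let start = Instant::now();
    let mut sink = 0u64;
    for _ in 0..iterations {
        // black_box discourages the optimizer from deleting the measured work.
        sink = sink.wrapping_add(black_box(op()));
    }
    black_box(sink);
    start.elapsed().as_nanos() as f64 / f64::from(iterations)
}

/// Budgets in the table are strict upper bounds ("< 50ns").
pub fn within_budget(measured_ns: f64, budget_ns: f64) -> bool {
    measured_ns < budget_ns
}

/// If swap and score each consume their full budget, this is what remains
/// of the 100ns anneal-step budget for the decision logic.
pub fn decision_headroom_ns() -> f64 {
    100.0 - (5.0 + 50.0)
}
```

The arithmetic in decision_headroom_ns makes the aggregate row concrete: with swap at 5ns and score at 50ns, the decision step has at most 45ns left.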

6. Test Types & Locations

Unit Tests

  • Location: Inside src/ (typically mod tests at the bottom of the file or in a sibling file).
  • Perspective: Internal. Unit tests sit on the internal side of an interface.
  • Scope: Test specific internal logic in isolation.
  • Access: Can reach private and pub(crate) items (functions, methods, fields) and other implementation details.
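
A minimal sketch of the layout, assuming a hypothetical private helper clamp_penalty and public function total_penalty: the test module sits in the same file and reaches the private function through use super::*.

```rust
/// Private helper: not visible outside this module.
fn clamp_penalty(raw: i32) -> i32 {
    raw.clamp(0, 100)
}

/// Public entry point built on the private helper.
pub fn total_penalty(raws: &[i32]) -> i32 {
    raws.iter().map(|&raw| clamp_penalty(raw)).sum()
}

#[cfg(test)]
mod tests {
    // Child modules can see private items of their parent.
    use super::*;

    #[test]
    fn clamps_out_of_range_values() {
        assert_eq!(clamp_penalty(-5), 0);
        assert_eq!(clamp_penalty(250), 100);
    }

    #[test]
    fn sums_clamped_values() {
        assert_eq!(total_penalty(&[-5, 50, 250]), 150);
    }
}
```

The #[cfg(test)] attribute means the module is compiled only under cargo test, so it adds nothing to release builds.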

Integration Tests

  • Location: Inside tests/ folder of the crate.
  • Perspective: External. Integration tests sit on the external (public) side of an interface.
  • Scope: Interact with the component strictly through its defined public API. Verify that various parts of the system work together correctly as a whole.
  • Mechanism: Treat the component as a black box. No knowledge of internal workings or access to private state.
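
The black-box constraint can be sketched in a single file by letting an inline module stand in for the crate under test. The scoring module, Engine, and black_box_check are hypothetical. In a real crate the check would live in a file under tests/ and import the library by its crate name.

```rust
// Stand-in for the library crate. Only `new` and `score` are public.
pub mod scoring {
    pub struct Engine {
        weights: Vec<u32>, // private: unreachable from outside this module
    }

    impl Engine {
        pub fn new(weights: Vec<u32>) -> Self {
            Self { weights }
        }

        /// Sums the weights at the given indices, ignoring out-of-range ones.
        pub fn score(&self, indices: &[usize]) -> u32 {
            indices.iter().filter_map(|&i| self.weights.get(i)).sum()
        }
    }
}

/// Black-box check: exercises the component strictly through its public API.
/// Writing `engine.weights` here would not compile.
pub fn black_box_check() -> bool {
    let engine = scoring::Engine::new(vec![1, 2, 3]);
    engine.score(&[0, 2]) == 4 && engine.score(&[7]) == 0
}
```

Because the check only observes public behavior, the internal representation of Engine can change without touching the test.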