Testing Standards & Verification
Version: 4.2 Context: Verification methodology, CDV, and Mutation Analysis.
1. The Oracle Pattern
To verify the correctness of our highly optimized, bit-twiddling ScoringEngine, we employ a Simple Oracle.
The Concept
- Implementation: A naive, readable, inefficient implementation of the logic.
- Usage: Property tests generate random inputs and assert `Optimized.result() == Oracle.result()`.
Physics Oracle
- Optimized: Uses flattened `Vec<f32>`, pre-calculated lookups, and `unsafe` indices (checked at compile time).
- Oracle: Uses `HashMap<(char, char), f32>` and standard iteration.
- Invariant (Exact Tier): The "Exact" scoring engine must match the Oracle bit-for-bit.
- Invariant (Search Tiers): "Search" engines (Generic/Hardware-Specific) may drift within a strict epsilon. The test harness must recognize which engine is being tested and adjust the assertion accordingly.
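The oracle comparison and the tier-aware assertion can be sketched as follows. This is a minimal, std-only illustration, not the project's harness: the four-letter alphabet, the `Tier` enum, `agrees`, `build_engines`, `check_against_oracle`, and the small LCG that stands in for a property-testing generator are all assumptions made for the example, and the "optimized" side uses safe indexing rather than `unsafe`.

```rust
use std::collections::HashMap;

// Hypothetical 4-letter alphabet so the sketch stays small.
const N: usize = 4;
const ALPHABET: [char; N] = ['a', 'b', 'c', 'd'];

/// Oracle: naive and readable. Looks up every adjacent pair in a HashMap.
pub struct Oracle {
    costs: HashMap<(char, char), f32>,
}

impl Oracle {
    pub fn score(&self, text: &[char]) -> f32 {
        let mut total = 0.0f32;
        for pair in text.windows(2) {
            total += self.costs[&(pair[0], pair[1])];
        }
        total
    }
}

/// Stand-in for the optimized engine: the same costs flattened into a Vec<f32>.
pub struct Optimized {
    table: Vec<f32>,
}

impl Optimized {
    pub fn score(&self, text: &[char]) -> f32 {
        let mut total = 0.0f32;
        for pair in text.windows(2) {
            total += self.table[index(pair[0]) * N + index(pair[1])];
        }
        total
    }
}

fn index(c: char) -> usize {
    c as usize - 'a' as usize
}

/// Which invariant applies to the engine under test (assumed names).
pub enum Tier {
    Exact,
    Search { epsilon: f32 },
}

/// Exact tier compares bit patterns; Search tiers allow drift up to epsilon.
pub fn agrees(tier: &Tier, oracle: f32, engine: f32) -> bool {
    match tier {
        Tier::Exact => oracle.to_bits() == engine.to_bits(),
        Tier::Search { epsilon } => (oracle - engine).abs() <= *epsilon,
    }
}

/// Tiny LCG so the sketch needs no external crates.
fn next_u32(state: &mut u64) -> u32 {
    *state = state
        .wrapping_mul(6364136223846793005)
        .wrapping_add(1442695040888963407);
    (*state >> 33) as u32
}

/// Fills both engines with the same pseudo-random pair costs.
pub fn build_engines(seed: u64) -> (Oracle, Optimized) {
    let mut state = seed;
    let mut costs = HashMap::new();
    let mut table = vec![0.0f32; N * N];
    for &a in &ALPHABET {
        for &b in &ALPHABET {
            let cost = (next_u32(&mut state) % 1000) as f32 / 10.0;
            costs.insert((a, b), cost);
            table[index(a) * N + index(b)] = cost;
        }
    }
    (Oracle { costs }, Optimized { table })
}

/// Generates `cases` random texts and checks every one against the oracle.
pub fn check_against_oracle(seed: u64, cases: usize, tier: &Tier) -> bool {
    let (oracle, optimized) = build_engines(seed);
    let mut state = seed ^ 0x9E37_79B9_7F4A_7C15;
    (0..cases).all(|_| {
        let len = (next_u32(&mut state) % 32) as usize;
        let text: Vec<char> = (0..len)
            .map(|_| ALPHABET[next_u32(&mut state) as usize % N])
            .collect();
        agrees(tier, oracle.score(&text), optimized.score(&text))
    })
}
```

Both engines accumulate in the same order, which is what makes a bit-for-bit match achievable here; an engine that reorders or vectorizes the sum would fall under the epsilon rule instead.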
2. Coverage-Driven Verification (CDV)
We use code coverage (tarpaulin) to identify Logic Dark Matter: code branches generated by the LLM that no test has ever executed.
Criticality Tiering
- Tier 1 (The Nucleus): `keyforge-physics`, `keyforge-evolution`.
  - Requirement: 95%+ Branch Coverage + Property Testing.
- Tier 2 (The Contract): `keyforge-protocol`, `keyforge-model`.
  - Requirement: 100% Validation Coverage.
- Tier 3 (The Shell): `keyforge-infra`, `keyforge-hive`.
  - Requirement: Error Path Verification.
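One way to make the tiering machine-checkable is to encode it as data that a CI gate can consult. The sketch below is illustrative only: the `Requirement` enum, `requirement_for`, and `branch_gate_passes` are hypothetical names, and the branch counts are assumed to come from whatever coverage report the pipeline produces.

```rust
/// Requirement attached to each criticality tier (hypothetical encoding).
#[derive(Debug, PartialEq)]
pub enum Requirement {
    /// Tier 1: branch coverage threshold plus property testing.
    BranchCoverageAndPropertyTests { min_branch_percent: f64 },
    /// Tier 2: 100% validation coverage.
    FullValidationCoverage,
    /// Tier 3: error path verification.
    ErrorPathVerification,
}

/// Maps a crate name to its tier requirement; unknown crates have none.
pub fn requirement_for(crate_name: &str) -> Option<Requirement> {
    match crate_name {
        "keyforge-physics" | "keyforge-evolution" => {
            Some(Requirement::BranchCoverageAndPropertyTests {
                min_branch_percent: 95.0,
            })
        }
        "keyforge-protocol" | "keyforge-model" => Some(Requirement::FullValidationCoverage),
        "keyforge-infra" | "keyforge-hive" => Some(Requirement::ErrorPathVerification),
        _ => None,
    }
}

/// True when covered/total meets the minimum percentage.
/// Assumption: a crate with no branches passes vacuously.
pub fn branch_gate_passes(covered: u32, total: u32, min_percent: f64) -> bool {
    if total == 0 {
        return true;
    }
    (covered as f64) * 100.0 >= min_percent * (total as f64)
}
```

The comparison multiplies instead of dividing so that exactly 95 of 100 branches passes without depending on how a quotient rounds.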
3. Mutation Testing (cargo-mutants)
We do not just test that the code works; we test that the tests work.
- Tool: `cargo-mutants`.
- Mechanism: The tool injects synthetic bugs (mutants) into the source code (e.g., changing `+` to `-`, or `if x < y` to `if x <= y`) and runs the test suite.
- Success: The test suite FAILS (the mutant is "Caught").
- Failure: The test suite PASSES (The mutant "Survived"). This indicates a gap in test coverage.
Rule: A PR affecting keyforge-physics cannot be merged if it introduces "Survived Mutants".
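To illustrate what a survived mutant looks like, the sketch below writes a `<` to `<=` mutation out by hand next to the original. In practice the tool rewrites the source in place; the side-by-side functions, `is_improvement`, and the two "suites" are hypothetical and exist only to show why a boundary case is needed to catch this mutant.

```rust
/// Original: accept a candidate only when it strictly lowers the score.
pub fn is_improvement(candidate: f32, current: f32) -> bool {
    candidate < current
}

/// The same function after a `<` to `<=` mutation, written out by hand.
pub fn is_improvement_mutant(candidate: f32, current: f32) -> bool {
    candidate <= current
}

/// Weak suite: only checks values far from the boundary.
/// Returns true when the implementation under test passes.
pub fn weak_suite(f: fn(f32, f32) -> bool) -> bool {
    f(1.0, 2.0) && !f(3.0, 2.0)
}

/// Strong suite: adds the boundary case where both scores are equal.
pub fn strong_suite(f: fn(f32, f32) -> bool) -> bool {
    weak_suite(f) && !f(2.0, 2.0)
}
```

The weak suite passes for both versions, so the mutant survives; the strong suite passes for the original and fails for the mutant, so the mutant is caught.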
4. The Reflexion Loop
When a property test or mutation test fails, we enforce a reflection step:
- Input: Failing test case + Code.
- Task: "Explain the logic error in natural language."
- Action: "Generate the fix based on that explanation."
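The two-stage structure can be sketched as a pair of prompt builders, where the fix prompt only exists once an explanation has been produced. The `Failure` struct, the function names, and the prompt layout are assumptions for illustration; only the quoted Task and Action wording comes from the steps above.

```rust
/// Input to the loop: the failing test case plus the code under test.
pub struct Failure<'a> {
    pub test_case: &'a str,
    pub code: &'a str,
}

/// Stage 1 (Task): ask for a natural-language explanation of the error.
pub fn explain_prompt(failure: &Failure<'_>) -> String {
    format!(
        "Failing test case:\n{}\n\nCode:\n{}\n\nExplain the logic error in natural language.",
        failure.test_case, failure.code
    )
}

/// Stage 2 (Action): ask for a fix, grounded in the stage 1 explanation.
pub fn fix_prompt(failure: &Failure<'_>, explanation: &str) -> String {
    format!(
        "Code:\n{}\n\nExplanation of the logic error:\n{}\n\nGenerate the fix based on that explanation.",
        failure.code, explanation
    )
}
```

Requiring the explanation as an argument to `fix_prompt` makes it impossible to request a fix without first completing the reflection step.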
5. Performance Budget
Tier 1 code must adhere to strict latency budgets.
| Component | Budget (per op) | Benchmark |
|---|---|---|
| `ScoringEngine::score` | < 50ns | `cargo bench --bench physics` |
| `Mutation::swap` | < 5ns | `cargo bench --bench mutation` |
| Anneal Step | < 100ns | Aggregate of swap + score + decision. |
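The budgets imply that the decision logic has at most 100 - 5 - 50 = 45ns of headroom per anneal step. A minimal, std-only sketch of a budget check follows; it is a stand-in for illustration and not the project's `cargo bench` targets, and the constant and function names are assumptions.

```rust
use std::time::{Duration, Instant};

// Budgets from the table, in nanoseconds per operation.
pub const SCORE_BUDGET_NS: f64 = 50.0;
pub const SWAP_BUDGET_NS: f64 = 5.0;
pub const ANNEAL_STEP_BUDGET_NS: f64 = 100.0;

/// Average cost of one operation given the total elapsed time.
pub fn per_op_nanos(elapsed: Duration, iterations: u64) -> f64 {
    elapsed.as_nanos() as f64 / iterations as f64
}

/// Budgets are strict upper bounds ("< 50ns"), so equality fails.
pub fn within_budget(elapsed: Duration, iterations: u64, budget_ns: f64) -> bool {
    per_op_nanos(elapsed, iterations) < budget_ns
}

/// What remains for the decision logic once swap and score are paid for.
pub fn decision_headroom_ns() -> f64 {
    ANNEAL_STEP_BUDGET_NS - SWAP_BUDGET_NS - SCORE_BUDGET_NS
}

/// Times `op` over `iterations` calls. `black_box` discourages the
/// optimizer from removing work whose result is otherwise unused.
pub fn measure<T>(iterations: u64, mut op: impl FnMut() -> T) -> Duration {
    let start = Instant::now();
    for _ in 0..iterations {
        std::hint::black_box(op());
    }
    start.elapsed()
}
```

A wall-clock loop like `measure` is noisy at nanosecond scale, so a dedicated benchmark harness remains the better source of truth for enforcing these numbers.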
6. Test Types & Locations
Unit Tests
- Location: Inside `src/` (typically `mod tests` at the bottom of the file or in a sibling file).
- Perspective: Internal. Sit on the internal side of an interface.
- Scope: Test specific internal logic in isolation.
- Access: Have access to non-public items (`pub(crate)` functions, private methods, private fields) and implementation details.
Integration Tests
- Location: Inside the `tests/` folder of the crate.
- Perspective: External. Sit on the external (public) side of an interface.
- Scope: Interact with the component strictly through its defined public API. Verify that various parts of the system work together correctly as a whole.
- Mechanism: Treat the component as a black box. No knowledge of internal workings or access to private state.
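The difference in access can be seen in a small sketch. The `scoring` module, its functions, the file name `tests/scoring.rs`, and the crate name `my_crate` are all hypothetical; the point is only that the unit test reaches a private helper while the integration test is limited to the public API.

```rust
pub mod scoring {
    /// Public API: the only entry point an integration test may use.
    pub fn score(text: &str) -> u32 {
        let chars: Vec<char> = text.chars().collect();
        chars.windows(2).map(|w| pair_cost(w[0], w[1])).sum()
    }

    /// Private helper: reachable only from unit tests inside this module.
    fn pair_cost(a: char, b: char) -> u32 {
        if a == b {
            2
        } else {
            1
        }
    }

    #[cfg(test)]
    mod tests {
        use super::*;

        #[test]
        fn repeated_letter_costs_double() {
            // Unit test: calls the private helper directly.
            assert_eq!(pair_cost('a', 'a'), 2);
        }
    }
}

// An integration test would live in a separate file such as
// tests/scoring.rs (hypothetical name) and sees only the public API:
//
//     use my_crate::scoring::score;
//
//     #[test]
//     fn scores_short_text() {
//         assert_eq!(score("aab"), 3);
//     }
```

Calling `pair_cost` from the integration test would fail to compile, which is the black-box guarantee enforced by the compiler.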