Testing
Testing
Status: stable reference
This document describes the test architecture, coverage matrix, determinism policy, and execution model for Concrete.
Test Architecture
The test system has four layers, ordered by cost:
| Layer | Tool | Cost | What it catches |
|---|---|---|---|
| Pass-level | Concrete/PipelineTest.lean (Lean executable) | <1s, no I/O | Parse errors, type errors, elaboration bugs, monomorphization bugs, SSA verify/cleanup invariants, emit correctness |
| Artifact | run_tests.sh --report, --codegen | ~1s each, no clang | Report output regressions, SSA structure, LLVM IR shape, codegen differentials |
| End-to-end | run_tests.sh positive/negative | ~0.5s each, needs clang | Full compile-and-run behavior, runtime correctness |
| Stress/integration | run_tests.sh integration tests | ~1s each, needs clang | Multi-feature interactions, deep call chains, realistic programs |
Pass-Level Lean Tests
Concrete/PipelineTest.lean exercises individual compiler passes directly on in-memory source strings. No subprocess, no clang, no file I/O.
Current coverage (32 tests):
- Parse (4): valid programs parse, malformed input rejected
- Frontend (8): parse→check→elaborate on structs/enums/traits/generics, type errors and undefined vars rejected
- Monomorphize (2): generic and trait programs monomorphize
- SSA Lowering (2):
lowerModuleproduces functions with blocks - SSA Verify (3):
ssaVerifyProgramaccepts valid SSA from simple/enum/generic programs - SSA Cleanup (2):
ssaCleanupProgramruns without crash, double-cleanup is idempotent - SSA Emit (2):
emitSSAProgramproduces LLVM IR, test mode works - Full pipeline (5): source → LLVM IR for 5 program shapes
- Layout/ABI (4): scalar sizes/alignments, builtin type sizes, repr(C) struct layout with field offsets, pass-by-pointer decisions
Run: lake build pipeline-test && .lake/build/bin/pipeline-test
Artifact Tests
Report and codegen tests consume cached compiler output without invoking clang.
- Report tests (~70 assertions): content checks across all 8
--reportmodes (caps, unsafe, layout, interface, mono, alloc, authority, proof) - Codegen differential tests: SSA optimization verification (constant folding, strength reduction), codegen structure (GEP offsets, enum tags, aggregate promotion), cross-representation consistency (packed struct, enum payload, Core→SSA agreement)
Compiler output is cached by (file, flags) key. Multi-assertion report tests reuse a single compilation (26/57 cache hits per fast run).
End-to-End Tests
Compile-and-run tests in lean_tests/:
- Positive (~230): compile, run, check exit code matches expected value
- Negative (~170): compile, expect specific error message in stderr
- Abort (1): compile, run, expect crash
- Test flag (4):
--testmode with pass/fail/mixed/submodule programs - O2 (~90): same programs compiled with
-O2, check same results - Cross-target (~25): IR verified to compile for x86_64 via
clang --target - Performance (1): regression check against saved baseline (>20% warning)
Stdlib Tests
222 #[test] functions across all stdlib modules, compiled through the real compiler path. Module-targeted testing via --stdlib-module <name>.
15 collection modules verified for test presence and correctness.
Integration / Stress Tests
Named real-program corpus (13 tests):
integration_text_processing.con— string/parsing pipelineintegration_data_structures.con— struct/enum data flowintegration_error_handling.con— Result/Option error chainsintegration_collection_pipeline.con— multi-collection pipeline with Vec, generics, enums, allocation patternsintegration_generic_pipeline.con(~150 lines) — 5-layer borrow chain, trait dispatch, complex enum matchingintegration_state_machine.con(~170 lines) — 4-state × 5-command nested match, struct construction in match armsintegration_compiler_stress.con(~200 lines) — deep generic instantiation, multi-trait dispatch, nested enum matchingintegration_multi_module.con+helper.con— cross-module types, traits, and enum matchingintegration_recursive_structures.con(~200 lines) — recursive expression evaluation, stack-based computationintegration_multi_file_calculator.con(~200 lines) — 3-module RPN evaluator with trait dispatchintegration_type_registry.con(~248 lines) — 3-module catalog with validation/metricsintegration_pipeline_processor.con(~223 lines) — 4-module data transformationintegration_stress_workload.con(~280 lines) — 4-module bytecode interpreter with 11-variant enum
Coverage Matrix
By failure mode
| Failure mode | Test layer | Test count | Key tests |
|---|---|---|---|
| Parser crash/hang | Fuzz | 500 iter | test_parser_fuzz.sh |
| Parse rejection | Pass-level + E2E | 4 + ~20 | parse/*, error_resolve_* |
| Name resolution | Pass-level + E2E | 2 + ~10 | frontend/*, error_resolve_*, module_* |
| Type/capability errors | Pass-level + E2E | 8 + ~60 | frontend/*, error_type_*, error_cap_*, error_borrow_* |
| Linearity violations | E2E | ~15 | error_unconsumed*, error_use_after_*, error_linear_*, linear_* |
| Elaboration/trait bugs | Pass-level + E2E | 2 + ~20 | mono/*, trait_*, generic_*, error_trait_* |
| Lowering invariants | Pass-level + E2E | 5 + ~30 | lower/*, verify/*, struct_*, enum_*, regress_* |
| SSA verification | Pass-level | 3 | verify/valid-* |
| SSA cleanup | Pass-level | 2 | cleanup/* |
| Codegen structure | Artifact | ~16 | codegen_*, struct/enum GEP/tag checks |
| LLVM emission | Pass-level + Artifact | 2 + ~10 | emit/*, cross-representation checks |
| Runtime behavior | E2E | ~180 | All positive run_ok tests |
| -O2 regressions | E2E | ~90 | All computation, struct/enum, linearity, borrow, generic, trait, string, heap, vec, complex, integration, phase3 programs |
| Report accuracy | Artifact | ~90 | report_integration.con, report_*_check.con, test_proof_*.con, phase3 consistency cross-checks |
| ABI / FFI | E2E | 9 | test_repr_c_*, test_fn_ptr_*, test_sizeof_*, test_ptr_round_trip, test_array_bounds, phase3_abi_interop |
| Layout/ABI | Pass-level | 4 | Scalar sizes, builtin sizes, repr(C), pass-by-ptr |
| Stdlib correctness | Stdlib | 184 | All #[test] functions |
| Collection integrity | Stdlib | 15 modules | Collection verification section |
| Multi-module | E2E | 22 | module_*, summary_*, module_file/ |
| Formatter | Property | 4 | fmt_parse_roundtrip.con, golden tests |
By compiler pass
| Pass | Direct pass-level tests | Owned E2E tests | Total coverage |
|---|---|---|---|
parse | 4 | ~20 | Strong |
resolve | (via frontend) | ~12 | Moderate |
check | (via frontend) | ~60 | Strong |
elab | (via frontend) | ~20 | Moderate |
core_check | (via frontend) | ~10 | Moderate |
mono | 2 | ~15 | Moderate |
lower | 2 | ~30 | Strong |
ssa_verify | 3 | ~5 | Moderate |
ssa_cleanup | 2 | ~5 | Moderate |
emit_ssa | 2 | ~180 | Strong |
layout | 4 | ~10 | Strong |
report | — | 44 | Strong |
format | — | 4 | Light |
Dependency-Aware Test Selection
--affected mode
run_tests.sh --affected detects changed files via git diff and selects only the test sections that exercise those compiler passes.
./run_tests.sh --affected # auto-detect from git diff
./run_tests.sh --affected Concrete/Lower.lean # explicit file list
The mapping from compiler source files to test sections lives in test_dep_map.toml. The mapping is conservative — when in doubt, more sections run.
Example mappings:
Concrete/Check.lean→ positive, negative, passlevel (type/cap/linearity tests)Concrete/Lower.lean→ positive, codegen, O2, passlevel (lowering + backend tests)Concrete/Report.lean→ report (report output tests only)std/src/*→ stdlib, collection (stdlib tests only)- Unknown files → full suite (safe fallback)
Explanation
After an --affected run, the summary shows which files triggered which sections:
=== Affected mode ===
changed files: Concrete/Lower.lean,Concrete/SSACleanup.lean
sections: codegen,O2,passlevel,positive
Structured Test Metadata
Manifest
test_manifest.toml is reference metadata for test cases. It is not consumed by run_tests.sh directly — test execution is driven by the shell script’s section structure. The manifest captures the metadata that shell sections encode implicitly, making it queryable and auditable for documentation and future tooling. Each test entry includes:
file— path relative to repo rootcategory— semantic category (unit,semantic,lowering,codegen,report,integration,stress,stdlib,regression,property,fuzz)kind— execution kind (run_ok,run_err,run_abort,run_test,check_report,check_codegen,check_O2,lean_pass)passes— which compiler passes this test exercisesprofile— run profile (fast,slow,network)expected— expected outputowner_pass— primary compiler pass whose correctness this test defendsneeds_clang— whether clang is neededmulti_module— whether the test usesmod X;file imports
Dependency Map
test_dep_map.toml maps compiler source files to affected test sections and categories. It is parsed by run_tests.sh --affected to select which test sections to run based on changed files.
Determinism and Flakiness Policy
Rules
Fixed seeds: all randomized tests use fixed seeds unless deliberately exploring.
test_parser_fuzz.shuses$RANDOMseeded from iteration count, not wall-clock time.No wall-clock dependence: no test depends on absolute time, execution speed, or timing.
std.timetests check only that monotonic clock moves forward, not that specific durations elapse.Timeout classes: tests have three implicit timeout tiers:
- Fast tests: 10s (most E2E tests)
- Slow tests: 30s (compilation + complex runtime)
- Network tests: 60s (TCP round-trip with fork/accept)
Network isolation by default:
--fastmode (the default) skips all network tests. Network tests run only under--full. TheSKIP_FLAKY_TCP_TEST=1environment variable provides an additional escape hatch.Stable temp directory handling: test artifacts use
$TMPDIRor/tmp, never the working directory. Compiler output cache uses a per-run temp directory cleaned up on exit.Parallel safety: all tests are independent. No test reads another test’s output or depends on execution order. The test runner uses job-based parallelism (
-j N) without shared mutable state between test processes.Quarantine/repair expectations: if a test becomes flaky:
- Identify the non-determinism source (timing, file system, network, uninitialized memory)
- Fix the root cause or move the test to
--fullonly - Do not delete or skip flaky tests without a tracking comment
- The
SKIP_FLAKY_TCP_TESTpattern is the model for temporary quarantine
Known flakiness risks
- TCP round-trip test: depends on
fork()+ local socket bind. Can fail under port contention or slow CI. Quarantined behind--fullandSKIP_FLAKY_TCP_TEST. - Parser fuzz: timeout-based crash detection could theoretically race on very slow machines. The 5s timeout is generous for the parser’s workload.
Compile-Time and Suite-Time Baselines
Current baselines (as of Phase D hardening)
| Metric | Value |
|---|---|
| Pass-level tests | <1s (32 tests, no I/O) |
Fast suite (--fast) | ~25-35s (~1427 tests, parallel) |
Full suite (--full) | ~40-50s (~1450 tests + 468 consistency checks, 1 skipped, includes network, cross-target, perf, consistency) |
| Cache hit rate | 26/57 compilations saved per fast run |
| Compiler build | ~30-45s (lake build) |
| lli-accelerated suite | ~12s (when LLI_PATH is set) |
Tracking
Suite time is not yet automatically tracked between runs. The baselines above are manual snapshots. Future work: record timing per section in the summary output and warn on regressions beyond a threshold.
Failure Isolation
Artifact preservation
Failed tests automatically save artifacts to .test-failures/:
- Timestamped output (stdout + stderr)
- Exact rerun command
- Compiler flags used
Example:
$ cat .test-failures/lean_tests_struct_basic_con
# Failure: lean_tests/struct_basic.con
# Time: 2026-03-13 14:22:01
# Rerun: .lake/build/bin/concrete lean_tests/struct_basic.con -o /tmp/test_rerun && /tmp/test_rerun
<compiler output>
Dependency gates
Report test groups use compile_gate() to skip downstream assertions when compilation fails. This prevents cascading false failures and makes the first error obvious.
Execution Modes
| Mode | Sections | Use case |
|---|---|---|
--fast (default) | all except network | Daily driver |
--full | everything | Pre-merge |
--affected | auto-detected | After specific compiler changes |
--affected FILE | mapped sections | Targeted rerun |
--filter PAT | all, filtered by path | Iterate on one area |
--stdlib | stdlib + collection | After stdlib changes |
--stdlib-module M | one stdlib module | Iterate on one module |
--O2 | O2 regressions | After lowering changes |
--codegen | codegen + O2 | After backend changes |
--report | report | After report changes |
--manifest | none (list only) | List all tests with categories |
Recommended Verification Flow
Daily driver (default)
lake build && ./run_tests.sh
Runs --fast mode: parallel on all cores, network tests skipped.
After changing a specific compiler pass
lake build && ./run_tests.sh --affected
Auto-detects changed files and runs only affected test sections.
Pre-merge (full coverage)
lake build && ./run_tests.sh --full
Targeted workflows
./run_tests.sh --filter struct_loop # iterate on one area
./run_tests.sh --stdlib # after touching std/src/
./run_tests.sh --stdlib-module map # iterate on one stdlib module
./run_tests.sh --O2 # after lowering changes
./run_tests.sh --codegen # after backend changes
./run_tests.sh --report # after report changes
./run_tests.sh -j 1 # debug ordering issues
Other suites
bash test_ssa.sh # SSA-specific backend coverage
bash test_parser_fuzz.sh # parser crash/hang fuzzing
bash test_fuzz.sh # structured fuzz: parser + typecheck + valid (1500+ programs)
bash test_fuzz.sh --valid 1000 # valid program generation only, custom iteration count
bash test_perf.sh --save # save performance baseline
bash test_perf.sh --compare # compare against baseline (requires prior --save)
bash test_mutation.sh --list # list 18 mutations without running
bash test_mutation.sh # run all mutations (~15 min, rebuilds compiler per mutation)
bash test_mutation.sh --mutation 5 # run a single mutation
Testing Phases
The test suite has evolved in three phases with distinct goals:
Phase 1: Feature and Bug Regression Tests (complete)
Goal: one test per feature, one test per bug. Catch regressions in individual language features and compiler passes.
Covers: type system, codegen, capabilities, trusted boundaries, linearity, modules, stdlib, golden tests, pass-level Lean tests. Phase-end snapshot: ~686 tests.
Phase 2: Edge Cases and Contract-Boundary Tests (complete)
Goal: push individual features toward their edges and test cross-boundary contracts.
Covers: ABI/FFI runtime tests (repr(C), fn pointers, sizeof, arrays), proof boundary assertions (--report proof with exact eligibility marking and exclusion reasons), optimization-sensitive codegen tests with O2 variants, cross-module resolution edge cases, parser/type-system edge cases. Phase-end snapshot: ~766 tests.
Phase 3: System-Level Validation and Regression Discipline (complete)
Goal: prove the compiler holds up under composed, realistic pressure — not just isolated feature tests.
3.1 Large mixed-feature programs (done)
6 programs in the 200-340 line range, each exercising 4+ features:
phase3_expression_evaluator.con— enums (expression tree), match, modules, traits, function pointersphase3_task_scheduler.con— enums (task state), structs, Vec, while loops, match, capabilities, deferphase3_data_pipeline.con— modules, generics with trait bounds, function pointers, Vec, linearityphase3_type_checker.con— enums (types/expressions), match, structs, Vec, modulesphase3_state_machine.con— enums (5 states, 5 events), structs, nested match, modules, trustedphase3_report_consistency.con— pure functions, capabilities, trusted, repr(C), generics, allocations
All return 42 and are registered as run_ok + O2 differential tests.
3.2 Differential testing (done)
~75 O2 differential tests covering all major program categories: computation, struct/enum, linearity, borrows, generics, traits, result/option, string, break/defer, heap, vec, complex programs, integration programs, bug regressions, hardening, and phase3 programs. Each test compiles with -O2 and verifies identical exit code to -O0.
3.3 Property / fuzz testing (done)
test_fuzz.sh — three modes, 500 iterations each (1500 total):
- Parser fuzz: random tokens, structured programs, deep nesting, many params, long expressions — must not crash
- Typechecker stress: type mismatches, undefined variables, missing returns — must not crash
- Valid program generation: arithmetic, conditional, loop, struct programs with computed expected values — must compile and produce correct output
3.4 Artifact / report consistency (done)
20 cross-check assertions in run_tests.sh verifying consistency across report modes:
- Proof-eligible functions have no capabilities in caps report
- Trusted functions appear in unsafe report AND excluded from proof
- Functions with capabilities appear in authority report, confirmed by caps report
- Generic functions appear in mono report with specializations
- Allocating functions appear in alloc report with correct patterns
- repr(C) structs appear in layout report with correct sizes
- Proof eligible count matches number of pure functions
3.5 Cross-target / ABI validation (done)
phase3_abi_interop.con + phase3_abi_interop.c — verifies sizeof/offsetof agreement and by-value struct passing between Concrete and C for repr(C) structs (Point, Rect, Aligned64). Linked and run as a custom test in run_tests.sh.
Cross-target IR verification: 25 representative programs compiled to LLVM IR, then verified to compile for x86_64 via clang -S --target=x86_64-unknown-linux-gnu. Runs in --full mode.
By-value struct FFI: small repr(C) structs (≤ 16 bytes) are flattened to integer registers per the ARM64 C ABI. Point (8 bytes) → one i64, Rect (16 bytes) → two i64s.
3.6 Performance regression gates (done)
test_perf.sh — measures compile time, runtime (avg of 3), IR line count, and binary size for 7 representative programs. Supports --save (save baseline to .perf-baseline) and --compare (compare against baseline, warn on >20% regression). Requires a .perf-baseline file created by --save; --compare without a baseline is a no-op. Integrated into run_tests.sh --full as an informational section (warnings printed but do not fail the suite).
test_mutation.sh — mutation testing for the compiler. Applies targeted source mutations one at a time, rebuilds the compiler (lake build), and runs the test suite (run_tests.sh --fast). If all tests pass after a mutation, the mutation survived — indicating a test gap. If any test fails, the mutation was killed — tests caught the change. Use --list to see all mutations without running, --mutation N to run one.
18 mutations across 7 compiler files:
- Layout.lean (6): tySize i32 4→8, tyAlign i32 4→1, unit size 0→4, string size 24→16, isPassByPtr string→false, isFFISafe rejects integers
- Shared.lean (2): isNumeric rejects floats, isInteger excludes i32
- Check.lean (3): disable use-after-move, disable loop-depth linearity, disable scope-exit linearity
- CoreCheck.lean (3): disable match exhaustiveness, disable capability check, allow break outside loop
- Lower.lean (1): arrayIndex GEP uses wrong type
- EmitSSA.lean (1): isReprCStruct always false
- SSAVerify.lean (2): disable aggregate phi check, disable phi predecessor check
Each mutation takes 30-60s (rebuild + test). Full run: ~15 minutes. Not part of run_tests.sh — run separately as a CI-occasional job.
3.7 Failure-quality testing (done)
5 diagnostic quality tests:
phase3_diag_multi_error.con— 3 independent type errors, all reported (bounded error recovery)phase3_diag_specific_location.con— error in deeply nested code, correct line reportedphase3_diag_no_cascade.con— single undeclared variable produces <5 errorsphase3_diag_hint_quality.con— capability error includes “hint:” textphase3_diag_type_mismatch.con— error includes “expected” and “got”
Phase 4: Memory/Ownership Pressure Tests (complete)
Goal: exercise the checker against the hardest memory/ownership cases — not just isolated error tests but composed patterns that force the checker, docs, and proof boundaries to agree.
4.1 Memory-model pressure (5 tests)
pressure_borrow_in_loop.con— fresh borrow block per loop iterationpressure_interleaved_linear.con— two linear vars with interleaved borrow/consumptionpressure_nested_linear_struct.con— nested linear struct consumed wholepressure_branch_create_consume.con— linear vars created/consumed within if/else branchespressure_match_linear_arms.con— all match arms consuming same pre-existing linear var
4.2 Borrow/aliasing pressure (3 tests)
pressure_sequential_mut_ref.con— sequential deref write, deref read, deref write, then call on borrow-block&mut Tpressure_param_ref_multiuse.con— parameter&mut Tref passed through multi-function chainpressure_borrow_then_consume.con— borrow linear struct for inspection, unfreeze, then consume
4.3 Cleanup/leak-boundary pressure (11 tests: 6 positive + 5 negative)
Positive:
pressure_defer_nested.con— two defers in LIFO orderpressure_defer_in_loop.con— defer in function called from looppressure_defer_with_borrow.con— defer combined with borrow blockpressure_destroy_wrapper.con— Destroy trait satisfies linearitypressure_linear_helper_consume.con— multi-hop linear consumption chainpressure_heap_defer_free.con— heap alloc with deferred free (arrow access while reserved)
Negative (error cases):
pressure_err_defer_then_move.con— move reserved-by-defer variablepressure_err_heap_leak.con— heap pointer never freedpressure_err_linear_no_destroy.con— linear struct goes out of scopepressure_err_destroy_then_use.con— use after destroy consumptionpressure_err_branch_leak.con— consumed in one branch, leaked in other
Phase 5: Self-Consistency Checks (complete)
Goal: verify that the compiler’s internal data structures agree with each other — proof obligations match re-derivation, diagnostics agree with obligation status, extraction results are consistent with eligibility, and fingerprints are coherent across entries and obligations.
--report consistency runs ProofCore.selfCheck with 15 cross-family invariants:
- OBL-KNOWN: every obligation references a known function (entry or excluded)
- OBL-STATUS: obligation status agrees with re-derivation via
deriveObligationStatus - PROVED-EXTRACTED: proved status requires extraction=Some
- PROVED-FP: proved status requires matching fingerprint
- PROVED-SPEC: proved status requires spec attachment
- STALE-FP: stale status requires mismatched fingerprint
- STALE-SPEC: stale status requires spec attachment
- ENTRY-FP: entry fingerprint matches obligation fingerprint
- EXTRACT-UNSUP: extracted=Some implies empty unsupported list
- BLOCKED-UNSUP: eligible + extracted=None implies non-empty unsupported list
- DEP-PROVED: dependencies only reference proved obligations
- DUP-NAME: no duplicate function names across entries and excluded
- DIAG-STATUS: diagnostic kinds agree with obligation status
- ENTRY-OBL: every entry has a corresponding obligation (no dropped obligations)
- EXCL-OBL: every excluded function has a corresponding obligation
All 468 compilable test programs pass with zero violations. Integrated into --full mode.
Phase 6: Verifier Passes (complete)
Goal: catch internal compiler invariant violations before bad state leaks downstream. Three verifier passes implemented in Concrete/Verify.lean:
- Post-Elab verifier (
verifyNoPlaceholders): detectsTy.placeholdersurviving elaboration. Available only via--report verify(not wired into pipeline). 14 programs with try/defer expressions retain placeholder types — documented intentional exception, resolved during lowering. - Post-Mono verifier (
verifyPostMono): hard gate — detectsTy.typeVarsurviving monomorphization. Blocks compilation. Skips generic definitions (only checks monomorphized copies). Wired intoPipeline.monomorphize. - LLVM IR validation (
validateLLVMIR): runsllvm-ason emitted.llfiles before clang. Gracefully skips if llvm-as not on PATH. Wired into all four compilation paths.
--report verify runs both post-Elab and post-Mono verifiers and reports results. All 385 non-error test programs pass with zero verifier errors. Integrated into --full mode.
Phase 7: Debug Bundle (complete)
Goal: capture everything needed to reproduce a compilation failure in a stable directory layout.
concrete debug-bundle <file.con> [-o dir] runs the full pipeline, accumulating artifacts at each stage. On failure or success, it writes a bundle containing: manifest.json (compiler version, source path, failure stage, artifact flags), source/ (original source files), diagnostics.txt, core.txt (Core IR), ssa.txt (SSA IR), llvm.ll (LLVM IR), consistency.txt (ProofCore self-check), and verify.txt (post-Elab verifier results). Each artifact is only present if the pipeline reached the stage that produces it.
The capture pipeline tracks 9 stages: parse, resolve, check, elaborate, coreCheck, mono, lower, emit, complete.
Phase 8: CI Trust Gate (complete)
Goal: run the four correctness contracts automatically in CI so regressions in determinism, self-consistency, terminology, and verifier invariants are caught before merge.
./scripts/tests/run_tests.sh --trust-gate runs only these sections:
- Determinism —
test_determinism.sh --quickacross 18 report modes, 6 query kinds, 3 IR emit modes, and snapshot comparison - Self-consistency —
--report consistencyon all 468 compilable programs (15 ProofCore invariants) - Terminology —
test_terminology_gate.shenforcing canonical proof/obligation status terms - Verifier passes —
--report verifyon all 385 non-error programs
The trust-gate CI job runs in parallel with the main test suite and SSA tests. Also available locally as make test-trust-gate.
Phase 9: Testcase Reducer (complete)
Goal: automatically shrink failing programs to minimal reproduction cases.
concrete reduce <file.con> --predicate <pred> [-o output] [--verbose] applies syntax-aware shrinking passes in a fixpoint loop: remove top-level items → remove statements → remove match arms → remove else branches. Repeats until no pass makes progress.
10 predicates covering all pipeline stages: parse-error, resolve-error, check-error, elab-error, core-check-error, mono-error, lower-error, consistency-violation, verify-warning, crash. Substring matching via colon syntax (e.g., check-error:expected Int). For un-parseable programs, falls back to line-based reduction.
Phase 10: Uniform Diagnostic Engine (complete)
Goal: every compiler phase emits the same structured Diagnostic shape.
All pipeline phases now use ExceptT Diagnostics instead of ExceptT String: Parser (throwParse), Mono (MonoM), Lower (throwLower), alongside Resolve, Check, Elab, CoreCheck, and SSAVerify which already used structured diagnostics. liftStringError eliminated entirely from the codebase. All build warnings fixed (zero-warning build enforced).
Phase 11: CI/CD Evidence Gates (complete)
Goal: verify proof, predictable, and report correctness in CI.
10 evidence gates added to the trust-gate CI section:
- Predictable check:
crypto_verifymust pass,thesis_demomust fail (has I/O) - Stale-proof check: no proof-bearing example (
crypto_verify,elf_header) has stale proofs - Proof-obligation status:
thesis_demohas at least one proved obligation - Report artifact generation: all 18
--reportmodes produce non-empty output - Trust-drift check: consistency and fingerprints pass on all proof-bearing examples
Trust-gate now covers 5 contract sections (determinism, consistency, terminology, verify, evidence) with 952 total checks.
Phase 12: Error Context Chains (complete)
Goal: compiler diagnostics accumulate human-readable context as errors propagate up.
Diagnostic now carries a context : List String field. Helpers: Diagnostic.addContext, Diagnostics.addContext, withContext (monadic), Except.addContext (pure). Rendering appends = while ... lines. Annotated at: Check (per-module, per-function), Elab (per-module, per-function), Lower (per-function).