@anantham
Owner

Summary

  • Convert mock-based tests to true integration tests to prevent Goodharting
  • Add full-field assertions for round-trip fidelity testing
  • Add error handling tests for HTTP failures, timeouts, and malformed responses
  • Add resilience tests for incomplete/null data in responses

Changes by Priority

P0 - Critical (tests were testing mocks, not code)

  • export-import.test.ts: Replaced mocked import test with true integration test
  • export-import.test.ts: Added full-field assertions for diffResults round-trip (all fields including hashes, markers, timestamps)

P1 - High (incomplete coverage)

  • export-import.test.ts: Converted diffResults/amendmentLogs export tests to real integration tests
  • registry.test.ts: Added HTTP error response tests (404, 500), timeout test, malformed JSON test
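The failure modes listed for registry.test.ts (HTTP 404/500, timeout, malformed JSON) can be sketched as follows. This is a minimal illustration, not the project's registry code: `RegistryResult`, `interpretRegistryResponse`, and `withTimeout` are hypothetical names, and the assumption that a valid registry payload is a JSON array is made only for the example.

```typescript
// Hypothetical result type for illustration; not the project's actual API.
type RegistryResult =
  | { ok: true; entries: unknown[] }
  | { ok: false; reason: "http-error" | "malformed-json" | "unexpected-shape"; detail: string };

// Turns a raw HTTP status and body into a typed result instead of throwing.
// Assumption: a valid registry payload is a JSON array.
function interpretRegistryResponse(status: number, body: string): RegistryResult {
  if (status < 200 || status >= 300) {
    return { ok: false, reason: "http-error", detail: `HTTP ${status}` };
  }
  let parsed: unknown;
  try {
    parsed = JSON.parse(body);
  } catch (err) {
    // Covers empty bodies and truncated or otherwise invalid JSON.
    return { ok: false, reason: "malformed-json", detail: String(err) };
  }
  if (!Array.isArray(parsed)) {
    return { ok: false, reason: "unexpected-shape", detail: "expected a JSON array" };
  }
  return { ok: true, entries: parsed };
}

// Rejects if `work` has not settled within `ms` milliseconds.
// The timer is cleared on either outcome so it cannot keep the process alive.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
    work.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (error) => {
        clearTimeout(timer);
        reject(error);
      },
    );
  });
}
```

A test for each failure mode then asserts on the `reason` field rather than on whether some mock was invoked, which keeps the assertion tied to observable behavior.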

P2 - Medium (edge cases)

  • registry.test.ts: Added incomplete data handling tests (missing fields, null values, empty URLs)
  • comparisonService.test.ts: Added malformed response tests (empty choices, null content, truncated JSON)
  • comparisonService.test.ts: Improved provider config verification (full client config instead of just URL)
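The malformed-response cases named for comparisonService.test.ts (empty choices, null content, truncated JSON) might be guarded along these lines. This is a sketch under assumptions: `ChatCompletionLike` is a minimal, assumed subset of an OpenAI-style chat completion response, and `parseComparisonPayload` and `ParseOutcome` are hypothetical names, not the service's actual functions.

```typescript
// Assumed minimal subset of an OpenAI-style chat completion response.
interface ChatCompletionLike {
  choices?: Array<{ message?: { content?: string | null } }> | null;
}

// Hypothetical outcome type for illustration.
type ParseOutcome =
  | { ok: true; data: unknown }
  | { ok: false; error: "empty-choices" | "null-content" | "invalid-json" };

// Extracts and parses the JSON payload from the first choice,
// returning a typed error for each malformed-response case.
function parseComparisonPayload(response: ChatCompletionLike): ParseOutcome {
  const first = response.choices?.[0];
  if (!first) {
    // choices is missing, null, or an empty array.
    return { ok: false, error: "empty-choices" };
  }
  const content = first.message?.content;
  if (typeof content !== "string" || content.trim() === "") {
    // content is null, undefined, or blank.
    return { ok: false, error: "null-content" };
  }
  try {
    return { ok: true, data: JSON.parse(content) };
  } catch {
    // Truncated or otherwise invalid JSON in the model output.
    return { ok: false, error: "invalid-json" };
  }
}
```

Returning a tagged result rather than throwing is a design choice made for the sketch; it lets each malformed case be asserted with a single equality check.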

Test Metrics

| File | Before | After | Delta |
| --- | --- | --- | --- |
| export-import.test.ts | 11 tests | 11 tests | Same count, much stronger assertions |
| registry.test.ts | 5 tests | 14 tests | +9 tests |
| comparisonService.test.ts | 9 tests | 17 tests | +8 tests |

Test plan

  • All 552 tests pass
  • Each test file verified independently
  • No regressions in existing tests

Why this matters

These tests were "Goodharting": they optimized for a passing suite rather than validating actual behavior:

  • Mocking ImportOps.importFullSessionData then checking the mock was called proves nothing about import logic
  • Checking only 3 fields of a 15-field object misses serialization bugs
  • Without error handling tests, production failures are first discovered by users

🤖 Generated with Claude Code

MOTIVATION:
- Several tests were mocking the very functions they were supposed to test
- Tests only verified partial fields, missing serialization bugs
- Registry tests lacked error handling coverage for real-world failures
- ComparisonService tests didn't cover malformed LLM responses

APPROACH:
- Converted mock-based tests to true integration tests using fake-indexeddb
- Added full-field assertions for round-trip fidelity (all DiffResult fields)
- Added error handling tests (HTTP 404/500, timeouts, malformed JSON)
- Added resilience tests for incomplete/null data in responses

CHANGES:
- tests/current-system/export-import.test.ts:
  - P0: Replaced mocked import test with true integration test
  - P0: Added full-field assertions for diffResults round-trip
  - P1: Converted diffResults export test to integration
  - P1: Converted amendmentLogs tests to integration with round-trip
- tests/integration/registry.test.ts:
  - P1: Added HTTP error response tests (404, 500)
  - P1: Added network timeout test
  - P1: Added malformed JSON response test
  - P2: Added incomplete data handling tests (missing fields, null values)
- tests/services/comparisonService.test.ts:
  - P2: Added malformed response tests (empty choices, null content, etc.)
  - P2: Improved provider config verification (full client config)

IMPACT:
- Tests now catch real serialization bugs, not just "mock was called"
- Coverage for edge cases and error paths
- Higher confidence in export/import fidelity

TESTING:
- All 552 tests pass
- Verified each test file independently before full suite run

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@vercel

vercel bot commented Dec 26, 2025


| Project | Deployment | Review | Updated (UTC) |
| --- | --- | --- | --- |
| lexicon-forge | Ready | Preview, Comment | Dec 26, 2025 2:11am |

@anantham anantham merged commit dad57f9 into main Dec 27, 2025
3 checks passed
@anantham anantham deleted the fix/test-steel-manning branch January 1, 2026 07:19