Task: Create Test Repository for Evaluating Documentation Suggestions
Goal: Build a realistic test repository with documentation and code that can be used to systematically evaluate JanusDoc's suggestion quality across different scenarios.
Success Criteria:
- Repository contains realistic documentation (API docs, guides, architecture)
- Multiple test PRs covering different change types
- Clear baseline for what suggestions should be made
- Can be used for regression testing as JanusDoc evolves
Repository Contents Needed:
- Documentation Structure (/docs):
  - API reference (functions, classes, types)
  - Getting started guide
  - Architecture overview
  - Configuration guide
  - At least 10-15 markdown files
- Codebase (TypeScript/JavaScript):
  - Simple but realistic project (e.g., task manager, blog API, or CLI tool)
  - Code that initially matches the documentation
  - Multiple modules/files that the docs reference
- Test Scenarios via PRs (at least 6-8):
  - Obvious update needed: add a new API endpoint that is not documented
  - Subtle change: rename a function parameter that is referenced in the docs
  - Breaking change: change a function signature documented in the API reference (illustrated in the sketch below)
  - New feature: add a feature that needs its own guide section
  - Deprecation: mark an existing API as deprecated
  - No update needed: internal refactoring that doesn't affect the public API
  - Configuration change: add a new config option
  - Behavior change: modify how an existing feature works
- Expected Results Document:
  - For each PR, document what suggestions JanusDoc should make
  - Include specific doc files and sections that should be flagged
  - Note edge cases or tricky scenarios
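
As an illustration of the breaking-change scenario above, a hedged sketch of what the test PR could contain; the task-manager domain and the `createTask`/`Task` names are hypothetical, not prescribed code:

```ts
// Hypothetical example of the "breaking change" test PR; names are illustrative.

type Task = {
  id: string;
  title: string;
  dueDate?: Date;
  priority?: "low" | "medium" | "high";
};

// Before the PR, as documented in the API reference:
//   export function createTask(title: string, dueDate?: Date): Task
//
// After the PR: positional parameters are replaced by an options object,
// so the documented signature and every usage example citing it go stale.
export interface CreateTaskOptions {
  title: string;
  dueDate?: Date;
  priority?: "low" | "medium" | "high";
}

export function createTask(options: CreateTaskOptions): Task {
  return {
    id: Math.random().toString(36).slice(2),
    title: options.title,
    dueDate: options.dueDate,
    priority: options.priority,
  };
}
```

For this PR, the expected result would be a suggestion against the API reference entry for `createTask` plus any guide examples that call it positionally.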
Deliverables:
- GitHub repository (use janusdoc-evals)
- README explaining the test structure
- Branches and PRs set up for each test scenario
- EXPECTED_RESULTS.md with baseline expectations
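
If the baseline should also be machine-readable for the automation in the Bonus section, one possible companion shape to EXPECTED_RESULTS.md is sketched below; every field, file path, and branch name is illustrative, not a required schema:

```ts
// One possible machine-readable companion to EXPECTED_RESULTS.md.

export interface ExpectedSuggestion {
  file: string;    // doc file that should be flagged, e.g. "docs/api/tasks.md"
  section: string; // heading or anchor within that file
  reason: string;  // why a suggestion is expected here
}

export interface ExpectedResult {
  pr: string; // test branch or PR title (hypothetical names below)
  expectedSuggestions: ExpectedSuggestion[]; // empty for "no update needed" PRs
  notes?: string; // edge cases or tricky aspects
}

export const expectedResults: ExpectedResult[] = [
  {
    pr: "pr/add-endpoint",
    expectedSuggestions: [
      {
        file: "docs/api/tasks.md",
        section: "Endpoints",
        reason: "New endpoint is not documented",
      },
    ],
  },
  {
    pr: "pr/internal-refactor",
    expectedSuggestions: [], // any suggestion here is a false positive
    notes: "Pure refactor; public API unchanged",
  },
];
```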
Bonus:
- Integration with https://www.evalite.dev/ for automated evaluation
- Evalite test suite that runs JanusDoc on all test PRs and scores the output against the expected results (see the sketch after this list)
- Metrics tracking (precision, recall, false positives)
- Different documentation styles to test (API-first vs guide-first)
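
A minimal sketch of what the Evalite integration could look like, assuming evalite's `evalite()` and `createScorer()` entry points (exact signatures and generics may differ by version); `runJanusDocOnPr` is a hypothetical wrapper around JanusDoc, and `expectedResults` is the baseline sketched earlier:

```ts
// janusdoc.eval.ts — scores JanusDoc's suggestions against the baseline.
import { evalite, createScorer } from "evalite";
import { expectedResults } from "./expected-results";

// Hypothetical: run JanusDoc against one test PR and return the
// "file#section" identifiers it suggests updating.
declare function runJanusDocOnPr(prBranch: string): Promise<string[]>;

const keyOf = (s: { file: string; section: string }) => `${s.file}#${s.section}`;

// Precision: fraction of emitted suggestions that were expected.
// Its complement tracks the false-positive rate called out above.
const precision = createScorer<string, string[]>({
  name: "Precision",
  scorer: ({ output, expected }) => {
    const exp = expected ?? [];
    if (output.length === 0) return exp.length === 0 ? 1 : 0;
    return output.filter((s) => exp.includes(s)).length / output.length;
  },
});

// Recall: fraction of expected suggestions that were actually emitted.
const recall = createScorer<string, string[]>({
  name: "Recall",
  scorer: ({ output, expected }) => {
    const exp = expected ?? [];
    if (exp.length === 0) return output.length === 0 ? 1 : 0;
    return exp.filter((e) => output.includes(e)).length / exp.length;
  },
});

evalite("JanusDoc suggestion quality", {
  data: async () =>
    expectedResults.map((r) => ({
      input: r.pr,
      expected: r.expectedSuggestions.map(keyOf),
    })),
  task: async (prBranch) => runJanusDocOnPr(prBranch),
  scorers: [precision, recall],
});
```

Note the empty-expectation handling: for "no update needed" PRs, both scorers award full marks only when JanusDoc stays silent, which directly exercises the false-positive cases.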