WIP: proposal for self-learning from translation edits #430

ChrisPriebe · 2025-12-05T20:11:33Z

No description provided.

- Add root CLAUDE.md with reference to AGENTS.md
- Add AGENTS.md with comprehensive project guidelines including:
  - Build/test/lint commands
  - Architecture overview and key files
  - VS Code extension patterns
  - Coding standards and common mistakes to avoid
  - Review standards from PR history
- Add subdirectory CLAUDE.md files for major directories:
  - src/providers/ - Provider patterns
  - src/projectManager/ - Migrations and sync
  - src/tsServer/ - Language server
  - src/utils/ - Utility modules
  - webviews/codex-webviews/ - React UI patterns
- Add Claude skills in .claude/skills/:
  - add-ui-component - Adding UI elements
  - improve-llm-copilot - LLM/AI modifications
  - add-vscode-command - Command registration
- Remove CLAUDE.md and AGENTS.md from .gitignore

- Add Self-Learning Protocol section to CLAUDE.md:
  - On errors/struggles: update relevant CLAUDE.md
  - On user corrections: fix docs to prevent recurrence
  - On skill usage: review and update SKILL.md
- Update AGENTS.md session-end requirements:
  - Self-learning triggers (file struggles, lookups, errors)
  - Guidelines (concise, generic, target specific docs)
- Add file organization rule: plans/summaries to /plan/{slug}
- Add Post-Session Review section to all skills

Comprehensive analysis of AgentDB's core systems and proposal for adapting them to Codex's Bible translation use case.

Key components:
- Episode recording for translation attempts
- Pattern extraction from user edits
- Enhanced example ranking based on historical effectiveness
- Prompt enhancement with learned patterns
- 6 validation/testing approaches

Addresses unique challenges of ultra-low-resource languages where standard embeddings don't work.
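To make "episode recording" concrete, here is a minimal sketch of what one recorded translation attempt could look like and how a user edit could be turned into a numeric signal. The `Episode` shape, the field names, and `acceptanceScore` are all hypothetical and do not come from the proposal documents; token-level edit distance is just one plausible way to measure how much of a suggestion survived.

```typescript
// Hypothetical shape of one recorded translation attempt; every field name is an assumption.
interface Episode {
    cellId: string;
    sourceText: string;
    predicted: string; // what the LLM suggested
    finalText: string; // what the translator kept after editing
    exampleIds: string[]; // few-shot examples that were in the prompt
}

// Whitespace tokenization keeps the sketch independent of the target language.
function tokenize(text: string): string[] {
    return text.trim().split(/\s+/).filter((t) => t.length > 0);
}

// Token-level Levenshtein distance between two token sequences.
function tokenEditDistance(a: string[], b: string[]): number {
    let prev: number[] = [];
    for (let j = 0; j <= b.length; j++) prev.push(j);
    for (let i = 1; i <= a.length; i++) {
        const curr: number[] = [i];
        for (let j = 1; j <= b.length; j++) {
            const cost = a[i - 1] === b[j - 1] ? 0 : 1;
            curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
        }
        prev = curr;
    }
    return prev[b.length];
}

// 1.0 means the suggestion was kept unchanged, 0.0 means it was fully rewritten.
function acceptanceScore(episode: Episode): number {
    const predicted = tokenize(episode.predicted);
    const final = tokenize(episode.finalText);
    const longest = Math.max(predicted.length, final.length);
    return longest === 0 ? 1 : 1 - tokenEditDistance(predicted, final) / longest;
}
```

A score like this could feed the "historical effectiveness" ranking described above, since each episode also remembers which examples were in the prompt.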

…uiRn9H4kzXxjSjof' into claude/codex-self-learning-loop-01SfKRshCdcc4h6RZZYmtkWm

…nalysis

- Move self-learning-loop-architecture.md to plan/self-learning-loop-architecture/
- Add detailed AgentDB reuse analysis with code evidence
- Compare current token-overlap system with proposed self-learning approach
- Identify reusable architecture patterns, code, and dependencies
- Analyze RuVector Rust layer (34 crates) - recommend NOT using due to complexity
- Provide concrete implementation recommendations with SQL schemas

…improvements

Key findings:
- Current system uses token overlap (fast, language-agnostic, but no semantics)
- Target language can't be embedded (unknown), but SOURCE CAN
- Available assets: Strong's numbers, Macula morphology, multiple source versions

Proposed improvements (in order of implementation):
1. Source text embeddings (semantic similarity via known languages)
2. Strong's number integration (concept-level matching)
3. Historical effectiveness tracking (learning from user edits)
4. Hybrid ranking formula combining all signals

Includes detailed analysis of current implementation, strengths/weaknesses, data flow diagrams, and risk mitigation strategies.
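One way to read "hybrid ranking formula combining all signals" is as a normalized weighted sum over the four signals listed above. The sketch below assumes each signal has already been scaled to the range 0 to 1. The `RankingSignals` interface, the weight values, and both function names are illustrative assumptions, not the formula from the proposal documents.

```typescript
// Hypothetical per-example signals, each assumed to be pre-scaled to [0, 1].
interface RankingSignals {
    tokenOverlap: number; // the existing token-overlap score
    sourceEmbeddingSimilarity: number; // similarity of source-language embeddings
    strongsOverlap: number; // shared Strong's numbers
    historicalEffectiveness: number; // how often this example led to accepted output
}

// Assumed weights for illustration only; real values would need tuning.
const ASSUMED_WEIGHTS: RankingSignals = {
    tokenOverlap: 4,
    sourceEmbeddingSimilarity: 3,
    strongsOverlap: 2,
    historicalEffectiveness: 1,
};

// Jaccard overlap between two lists of Strong's numbers such as "H7225".
function strongsOverlap(query: string[], example: string[]): number {
    const querySet = new Set(query);
    const exampleSet = new Set(example);
    let shared = 0;
    querySet.forEach((n) => {
        if (exampleSet.has(n)) shared++;
    });
    const union = querySet.size + exampleSet.size - shared;
    return union === 0 ? 0 : shared / union;
}

// Weighted average of the signals, so the result also stays in [0, 1].
function hybridScore(signals: RankingSignals, weights: RankingSignals = ASSUMED_WEIGHTS): number {
    const keys = Object.keys(weights) as Array<keyof RankingSignals>;
    let total = 0;
    let weightSum = 0;
    for (const key of keys) {
        total += weights[key] * signals[key];
        weightSum += weights[key];
    }
    return weightSum === 0 ? 0 : total / weightSum;
}
```

Dividing by the weight sum means weights can be changed independently without rescaling the others, which is convenient if signals are rolled out one at a time in the order listed.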

Deep code analysis of ruvector Rust monorepo (34 crates, 125,014 lines) and agentic-flow TypeScript packages to identify reusable components.

Key findings:
- SONA crate has MicroLoRA for instant per-verse adaptation
- ReasoningBank provides K-means++ pattern discovery from trajectories
- EWC (Elastic Weight Consolidation) prevents catastrophic forgetting
- MMR provides diversity in few-shot example selection
- Conformal prediction gives uncertainty quantification

Novel features explained with code snippets and direct application to Codex translation memory:
- MicroLoRA: rank 1-2 adaptation in <100μs
- ReasoningBank: cluster translation attempts to discover patterns
- EWC: don't forget Genesis patterns when learning Psalms

Implementation recommendations with priority ordering (P0-P3).
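For readers unfamiliar with MMR (maximal marginal relevance), the sketch below shows the general greedy technique applied to few-shot example selection. It is not the ruvector or agentic-flow implementation: the `Candidate` type, the use of Jaccard token overlap as the similarity measure, and the default `lambda` of 0.7 are all assumptions made for illustration.

```typescript
// Hypothetical candidate example; `relevance` is assumed to be in [0, 1].
interface Candidate {
    id: string;
    tokens: string[];
    relevance: number;
}

// Jaccard similarity over token sets (assumed similarity measure for this sketch).
function jaccard(a: string[], b: string[]): number {
    const setA = new Set(a);
    const setB = new Set(b);
    let shared = 0;
    setA.forEach((t) => {
        if (setB.has(t)) shared++;
    });
    const union = setA.size + setB.size - shared;
    return union === 0 ? 0 : shared / union;
}

// Greedy MMR: repeatedly pick the candidate with the best trade-off between
// relevance to the query and similarity to what has already been selected.
function selectWithMmr(candidates: Candidate[], k: number, lambda = 0.7): Candidate[] {
    const remaining = candidates.slice();
    const selected: Candidate[] = [];
    while (selected.length < k && remaining.length > 0) {
        let bestIndex = 0;
        let bestScore = -Infinity;
        remaining.forEach((candidate, index) => {
            let maxSimilarity = 0;
            for (const chosen of selected) {
                maxSimilarity = Math.max(maxSimilarity, jaccard(candidate.tokens, chosen.tokens));
            }
            const score = lambda * candidate.relevance - (1 - lambda) * maxSimilarity;
            if (score > bestScore) {
                bestScore = score;
                bestIndex = index;
            }
        });
        selected.push(remaining.splice(bestIndex, 1)[0]);
    }
    return selected;
}
```

With `lambda = 1` this reduces to plain relevance ranking; lower values increasingly penalize near-duplicate examples, which is the diversity effect the commit message refers to.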

Proposes a counting-based system instead of ML:
- 3 tables: prediction_outcomes, example_effectiveness, word_mappings
- 3 functions: recordOutcome, rankExamples, getLearnedPatterns
- No neural networks, embeddings, or WASM dependencies
- ~3 hours implementation vs weeks for complex approach

Key insight: For Bible translation with consistent domain and human-in-the-loop review, simple counting captures learning better than neural approaches.
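The three functions named above could look roughly like the following in-memory sketch, with a plain array and two maps standing in for the three tables. Only the function and table names come from the commit message; the signatures, the `Outcome` fields, the Laplace-smoothed acceptance rate, and the reading of word_mappings as "predicted word to corrected word" counts are assumptions.

```typescript
// Hypothetical record of one prediction and what the translator finally kept.
interface Outcome {
    cellId: string;
    exampleIds: string[];
    predicted: string;
    finalText: string;
}
interface LearnedPattern {
    from: string;
    to: string;
    count: number;
}

// In-memory stand-ins for prediction_outcomes, example_effectiveness, word_mappings.
const predictionOutcomes: Outcome[] = [];
const exampleEffectiveness = new Map<string, { used: number; accepted: number }>();
const wordMappings = new Map<string, Map<string, number>>();

function recordOutcome(outcome: Outcome): void {
    predictionOutcomes.push(outcome);
    const accepted = outcome.predicted.trim() === outcome.finalText.trim();
    for (const id of outcome.exampleIds) {
        const stats = exampleEffectiveness.get(id) ?? { used: 0, accepted: 0 };
        stats.used += 1;
        if (accepted) stats.accepted += 1;
        exampleEffectiveness.set(id, stats);
    }
    // Assumption: only count substitutions when token counts match, so positions align.
    const before = outcome.predicted.trim().split(/\s+/);
    const after = outcome.finalText.trim().split(/\s+/);
    if (before.length !== after.length) return;
    before.forEach((word, i) => {
        if (word === after[i]) return;
        const targets = wordMappings.get(word) ?? new Map<string, number>();
        targets.set(after[i], (targets.get(after[i]) ?? 0) + 1);
        wordMappings.set(word, targets);
    });
}

// Sort by Laplace-smoothed acceptance rate; unseen examples get a neutral 0.5.
function rankExamples(ids: string[]): string[] {
    const rate = (id: string): number => {
        const stats = exampleEffectiveness.get(id);
        return stats ? (stats.accepted + 1) / (stats.used + 2) : 0.5;
    };
    return ids.slice().sort((a, b) => rate(b) - rate(a));
}

// Substitutions seen at least `minCount` times, most frequent first.
function getLearnedPatterns(minCount = 2): LearnedPattern[] {
    const patterns: LearnedPattern[] = [];
    wordMappings.forEach((targets, from) => {
        targets.forEach((count, to) => {
            if (count >= minCount) patterns.push({ from, to, count });
        });
    });
    return patterns.sort((a, b) => b.count - a.count);
}
```

The smoothing keeps an example that was used once and rejected from being ranked far below one that has never been tried, which matters early on when counts are small.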

Complete design for self-learning translation assistance:

1. state-of-the-art-architecture.md
   - 6-component system: Semantic Index, Episode Memory, Pattern Engine, Adaptive Ranker, Prompt Composer, Feedback Loop
   - Source-anchored embeddings (Strong's + multilingual)
   - EWC consolidation on ranking weights
   - K-means++ clustering for pattern discovery
   - MMR for example diversity
   - 4-phase implementation roadmap
2. phase1-implementation-spec.md
   - Detailed 10-day implementation plan
   - Exact file locations and code changes
   - Database schema (SQLite)
   - Episode recording, effectiveness tracking, pattern extraction
   - Integration points with existing codebase
3. HANDOFF-SUMMARY.md
   - Complete project context and goals
   - All approaches considered with rationale
   - Links to all documentation
   - Success metrics and next steps

Recommended approach: Hybrid design taking best ideas from RuVector/AgentDB but adapted for Bible translation domain and TypeScript implementation.
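As an illustration of what "EWC consolidation on ranking weights" could mean in practice, the sketch below applies one gradient step with the standard EWC quadratic penalty, (lambda / 2) * sum_i F_i * (w_i - anchor_i)^2, which pulls each weight back toward a previously learned value in proportion to its importance. The function name, the parameter names, and the idea that the ranker's weights are a flat numeric vector are assumptions; the design documents may structure this differently.

```typescript
// One EWC-regularized gradient step on a hypothetical vector of ranking weights.
// weights:      current ranking weights
// taskGradient: gradient of the loss on the new data (e.g. Psalms edits)
// anchor:       weights consolidated after the earlier data (e.g. Genesis edits)
// importance:   per-weight importance estimate (Fisher information in standard EWC)
function ewcUpdate(
    weights: number[],
    taskGradient: number[],
    anchor: number[],
    importance: number[],
    lambda: number,
    learningRate: number
): number[] {
    return weights.map((w, i) => {
        // Gradient of the penalty term (lambda / 2) * F_i * (w_i - anchor_i)^2.
        const penaltyGradient = lambda * importance[i] * (w - anchor[i]);
        return w - learningRate * (taskGradient[i] + penaltyGradient);
    });
}
```

Weights with high importance resist drifting away from their anchored values, while unimportant weights remain free to adapt to the new book.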

Copilot

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

claude added 9 commits December 5, 2025 18:28

Merge remote-tracking branch 'origin/claude/create-claude-md-016GEzQM…

573d789

…uiRn9H4kzXxjSjof' into claude/codex-self-learning-loop-01SfKRshCdcc4h6RZZYmtkWm

Copilot AI review requested due to automatic review settings December 5, 2025 20:11

Copilot AI reviewed Dec 6, 2025


2 participants
