Conversation

twaugh (Owner) commented Nov 24, 2025

No description provided.

Page-level chunks (__PAGE__) are synthetic entries in the RAG index
used for semantic search by page name/title/frontmatter. They don't
correspond to real blocks in the file structure, so they can't be
used as integration targets for actions like add_under or replace.

When the LLM selects a __PAGE__ chunk as a target, it means "this
knowledge belongs on this page" without specifying a particular block.
The correct interpretation is add_section (new top-level section).
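
A minimal sketch of the normalization step, applied after LLM ID
translation (function and variable names here are illustrative, not
the repo's actual code):

```python
import logging

logger = logging.getLogger(__name__)

PAGE_CHUNK_SUFFIX = "::__PAGE__"  # marker for synthetic page-level chunks

def normalize_page_level_target(action: str, target_block_id):
    """Reinterpret a __PAGE__ target as add_section: the knowledge
    belongs on the page, not under any particular block."""
    if target_block_id and target_block_id.endswith(PAGE_CHUNK_SUFFIX):
        logger.debug(
            "Normalizing page-level target %s: %s -> add_section",
            target_block_id,
            action,
        )
        # add_section creates a new top-level section, so no target id
        return "add_section", None
    return action, target_block_id
```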

Changes:
- Detect targets ending with "::__PAGE__" after LLM ID translation
- Normalize action to "add_section" regardless of LLM suggestion
- Clear target_block_id (add_section has no specific target)
- Add debug logging for normalization events

This eliminates the "Target block not found: page::__PAGE__" errors
and lets the LLM suggest integration into pages that have no blocks
yet (only page-level metadata exists in the RAG index).

Impact:
- Before: Integration fails with "target block not found" error
- After: Page-level chunks correctly interpreted as add_section
- Enables: Adding knowledge to empty but relevant pages

Tests:
- Added test_plan_integration_for_block_normalizes_page_level_chunks
- Added test_plan_integration_for_block_preserves_regular_block_targets
- All existing llm_wrappers tests pass

Assisted-by: Claude Code

Page-level chunks (__PAGE__) store frontmatter in their context for
semantic search quality during embedding. However, when formatting
these chunks for LLM prompts, the frontmatter was duplicated:
- Once in the <properties> section (parsed from page outline)
- Again in the <block> content (from stored RAG chunk context)

This wasted ~50-200 tokens per page depending on property count.

Solution: reuse the existing _clean_context_for_llm() function from
page_indexer.py to strip frontmatter from page-level chunks during
prompt formatting, relying on tested code instead of duplicating the
stripping logic.
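
A sketch of how this might look inside format_chunks_for_llm() in
llm_helpers.py; _clean_context_for_llm() and the "::__PAGE__" suffix
come from this commit, while the chunk structure and loop are
illustrative assumptions:

```python
from page_indexer import _clean_context_for_llm  # existing, tested helper

def format_chunks_for_llm(chunks: list[dict]) -> str:
    """Render RAG chunks as <block> entries for the LLM prompt."""
    parts = []
    for chunk in chunks:
        content = chunk["content"]
        if chunk["id"].endswith("::__PAGE__"):
            # Page-level chunks keep frontmatter for embedding quality;
            # strip it here so it appears only in <properties>.
            content = _clean_context_for_llm(content)
        parts.append(f'<block id="{chunk["id"]}">\n{content}\n</block>')
    return "\n".join(parts)
```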

Changes:
- Import _clean_context_for_llm() in llm_helpers.py
- Detect page-level chunks (::__PAGE__) in format_chunks_for_llm()
- Apply _clean_context_for_llm() to page-level chunks only
- Regular blocks unchanged (already cleaned during indexing)
- Frontmatter remains in <properties> section for LLM context

Impact:
- Before: "tags:: foo, bar" appears twice in the prompt (properties + block); see the excerpt after this list
- After: "tags:: foo, bar" appears once (properties only)
- Token savings: ~50-200 per page with properties
- No impact on semantic search quality (frontmatter still embedded)
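
For illustration only (the page body and property values are made
up), the duplication in the rendered prompt looked roughly like:

```
Before:
  <properties>
  tags:: foo, bar
  </properties>
  <block id="page::__PAGE__">
  tags:: foo, bar        <- duplicated frontmatter
  Page body text...
  </block>

After:
  <properties>
  tags:: foo, bar
  </properties>
  <block id="page::__PAGE__">
  Page body text...
  </block>
```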

Tests:
- Added test_llm_helpers.py with 3 integration tests
- Tests cover page-level chunk stripping and regular block preservation
- All existing llm_wrappers tests pass (no regressions)

Assisted-by: Claude Code
@codecov-commenter

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 84.67%. Comparing base (f2f7048) to head (da0af47).

Additional details and impacted files
```diff
@@            Coverage Diff             @@
##             main      #40      +/-   ##
==========================================
+ Coverage   84.63%   84.67%   +0.04%
==========================================
  Files          48       48
  Lines        5128     5136       +8
==========================================
+ Hits         4340     4349       +9
+ Misses        788      787       -1
```
☔ View full report in Codecov by Sentry.

twaugh merged commit f9def7f into main Nov 24, 2025
1 check passed
twaugh deleted the fix/page-level-chunk-integration branch November 24, 2025 12:43