@twaugh twaugh commented Nov 23, 2025

Page-level chunks (__PAGE__) were stored with block_id="__PAGE__"
in ChromaDB metadata, causing all pages to map to the same short ID
("1") in LLM prompts. This meant the LLM couldn't distinguish between
different page-level chunks when making integration decisions.

Changes:
- Store full hybrid ID (e.g., "Andrew McNamara::__PAGE__") instead
  of just "__PAGE__" in metadata block_id field
- Bump INDEX_SCHEMA_VERSION to 5 to trigger automatic reindex
- Each page-level chunk now gets a unique short ID (1, 2, 3, ...)

Impact:
- Before: <block id="1"> for all pages (collision)
- After: <block id="1">, <block id="2">, <block id="3"> (unique)
- LLM can now reference specific page-level chunks correctly
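
The before/after behaviour can be illustrated with a minimal sketch. It assumes short IDs are handed out sequentially per distinct block_id value; the helper names (assign_short_ids, hybrid_id) and the page names other than "Andrew McNamara" are hypothetical and are not taken from the logsqueak code.

```python
def assign_short_ids(block_ids):
    """Map each distinct block_id to a sequential short ID ("1", "2", ...).

    Hypothetical helper: the real short-ID logic may differ.
    """
    short_ids = {}
    for block_id in block_ids:
        if block_id not in short_ids:
            short_ids[block_id] = str(len(short_ids) + 1)
    return short_ids


def hybrid_id(page_name, block_id="__PAGE__"):
    """Build a page-qualified ID such as "Andrew McNamara::__PAGE__"."""
    return f"{page_name}::{block_id}"


# Illustrative page names; only the first appears in the PR description.
pages = ["Andrew McNamara", "Project Notes", "Reading List"]

# Before: every page-level chunk carries the same block_id,
# so all of them collapse onto a single short ID.
before = assign_short_ids(["__PAGE__" for _ in pages])

# After: page-qualified IDs are distinct, so each chunk gets its own short ID.
after = assign_short_ids([hybrid_id(page) for page in pages])

print(before)
print(after)
```

With the bare "__PAGE__" value there is only one key to number, which is the collision described above. Qualifying the ID with the page name gives one key per page.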

The schema version bump ensures the index is rebuilt automatically on the
next logsqueak extract/search command, with no user intervention needed.
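
A version-gated rebuild of this kind can be sketched as follows. This is an assumption about the general pattern, not the project's actual code: only the INDEX_SCHEMA_VERSION name and the value 5 come from this change, while needs_reindex and the idea of a stored version recorded alongside the index are hypothetical.

```python
INDEX_SCHEMA_VERSION = 5  # value introduced by this change


def needs_reindex(stored_version):
    """Return True when the existing index was built under another schema.

    Hypothetical helper: stored_version stands for whatever version the
    index recorded when it was last built (None if nothing was recorded).
    """
    return stored_version != INDEX_SCHEMA_VERSION


# An index built under schema 4 is stale; one built under schema 5 is current.
print(needs_reindex(4))
print(needs_reindex(5))
```

Comparing for inequality rather than "less than" also forces a rebuild when no version was recorded at all.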

Assisted-by: Claude Code
@codecov-commenter

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 84.61%. Comparing base (3681a2d) to head (f9765df).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main      #38   +/-   ##
=======================================
  Coverage   84.61%   84.61%           
=======================================
  Files          48       48           
  Lines        5088     5088           
=======================================
  Hits         4305     4305           
  Misses        783      783           

@twaugh twaugh merged commit d9cf8cc into main Nov 23, 2025
1 check passed
@twaugh twaugh deleted the fix/page-level-chunk-id-collision branch November 23, 2025 10:38