Skip to content

Conversation

@twaugh
Copy link
Owner

@twaugh twaugh commented Sep 9, 2025

No description provided.

@codecov
Copy link

codecov bot commented Sep 12, 2025

Codecov Report

❌ Patch coverage is 86.66835% with 527 lines in your changes missing coverage. Please review.
✅ Project coverage is 85.78%. Comparing base (f03e11e) to head (03e2b59).
⚠️ Report is 85 commits behind head on master.

Files with missing lines Patch % Lines
src/patch_scanner.c 85.68% 149 Missing ⚠️
src/scanner_debug.c 69.34% 99 Missing ⚠️
src/grep.c 81.87% 89 Missing ⚠️
src/patchfilter.c 74.07% 56 Missing ⚠️
src/util.c 65.62% 44 Missing ⚠️
tests/scanner/test_basic.c 96.74% 37 Missing ⚠️
src/ls.c 93.81% 17 Missing ⚠️
tests/scanner/test_accumulated_headers.c 83.65% 17 Missing ⚠️
src/patch_common.c 89.13% 15 Missing ⚠️
src/filter.c 0.00% 2 Missing ⚠️
... and 1 more
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #116      +/-   ##
==========================================
+ Coverage   83.50%   85.78%   +2.28%     
==========================================
  Files           5       15      +10     
  Lines        4159     8112    +3953     
  Branches      985     1633     +648     
==========================================
+ Hits         3473     6959    +3486     
- Misses        686     1153     +467     
Flag Coverage Δ
unittests 85.78% <86.66%> (+2.28%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@twaugh twaugh force-pushed the new-parser branch 7 times, most recently from c0914b2 to c12fb43 Compare September 19, 2025 13:42
@twaugh twaugh force-pushed the new-parser branch 5 times, most recently from a164cb8 to 0795a09 Compare September 23, 2025 14:43
@twaugh twaugh force-pushed the new-parser branch 5 times, most recently from 47a6cb6 to a8ba0c3 Compare October 17, 2025 12:07
twaugh added 19 commits October 17, 2025 16:10
Assisted-by: Cursor
Assisted-by: Cursor
Assisted-by: Cursor
Replace the redundant content/raw_line fields in patch_hunk_line with a
single 'line' field containing the full original line including prefix.

This eliminates confusion between content (without prefix) and raw_line
(with prefix) while providing consumers flexibility to extract content
as needed using (line + 1, length - 1).

Benefits:
- Single source of truth for line content
- Simpler API with fewer fields
- Exact preservation of original formatting
- More efficient memory usage

Updated scanner implementation, scanner_debug tool, and scanner tests
to use the new API. Fixed missing header dependencies in scanner test
Makefile to ensure proper rebuilds when API changes.

Also fix scanner test conditional in Makefile.am - move tests/scanner/run-test
into the USE_SCANNER_PATCHFILTER conditional block where it belongs.
The scanner test was incorrectly included unconditionally, causing
'make distcheck' failures when scanner-patchfilter is disabled.

Assisted-by: Cursor
Context diff format uses 'start,end' notation where the second number
is the ending line number, not a count. For example, '*** 6,9 ****'
means lines 6 through 9 (count = 4), not line 6 with count 9.

The scanner was incorrectly treating the second number as a count,
leading to wrong orig_count and new_count values. This caused issues
in tools that rely on accurate line counts.

Fix:
- Parse second number as end line number
- Calculate count as: end_line - start_line + 1
- Update comments to clarify the format

Test fix:
- Update test_context_diff_hunk_separator_handling to expect correct
  count (4) instead of buggy count (9) for '*** 6,9 ****'
- Remove outdated comment acknowledging the bug

This fixes context diff processing in grepdiff and other scanner-based
tools that depend on accurate hunk line counts.

Assisted-by: Cursor
Fix race condition where scanner tests tried to build object files
simultaneously with the main build system, causing intermittent
test failures.

Changes:
- Add scanner test programs to noinst_PROGRAMS in Makefile.am
- Define proper sources and dependencies for each test program
- Remove separate scanner/Makefile to eliminate build conflicts
- Simplify scanner test script to use integrated build
- Clean up distclean and EXTRA_DIST references

This ensures scanner tests are built by the main build system with
proper dependency ordering, eliminating the build system timing
issues that caused test failures after clean builds.

Assisted-by: Cursor
… handling

Add enum patch_line_context and context field to struct patch_hunk_line
to explicitly represent which file version a line belongs to:
- PATCH_CONTEXT_BOTH: Normal lines (applies to both old and new versions)
- PATCH_CONTEXT_OLD: Lines representing the old file state
- PATCH_CONTEXT_NEW: Lines representing the new file state

For context diffs, this eliminates ambiguity about changed lines ('!'):
- Old section lines are emitted with PATCH_CONTEXT_OLD
- New section lines are emitted with PATCH_CONTEXT_NEW
- Each line is emitted exactly once with appropriate context

This simplifies consumer logic by providing explicit context information
instead of requiring manual handling of context diff dual-nature semantics.

Updated scanner_debug utility and documentation to reflect the new API.
Fixed test expectations to match the corrected emission behavior.

Assisted-by: Cursor
Add content and content_length fields to struct patch_hunk_line to provide
format-agnostic access to clean line content without prefixes or spaces.

The new fields complement the existing line/length fields:
- line/length: Full original line including prefix (for debugging/display)
- content/content_length: Clean content without prefix or format-specific spaces

Implementation details:
- Unified diffs: content = line + 1 (skip prefix)
- Context diffs: content = line + 2 (skip prefix + space)
- Memory efficient: content points into existing line buffer
- Handles buffer copying correctly for context diff buffering

This eliminates the need for consumers to handle format-specific quirks
when extracting line content, making the API much cleaner and more
intuitive.

Updated scanner_debug to use the new clean content field for display.

Assisted-by: Cursor
Add comprehensive tests for the context field in patch_hunk_line:

- test_context_field_unified_diff(): Verifies that all line types in
  unified diffs are marked as PATCH_CONTEXT_BOTH (context, removed, added)

- test_context_field_context_diff(): Verifies correct context assignment
  in context diffs:
  * Context/removed/added lines: PATCH_CONTEXT_BOTH
  * Changed lines from old section: PATCH_CONTEXT_OLD
  * Changed lines from new section: PATCH_CONTEXT_NEW

These tests ensure the context field correctly distinguishes between
old and new file versions for context diff changed lines while properly
marking other lines as applying to both versions.

Assisted-by: Cursor
Add comprehensive tests for the clean content field in patch_hunk_line:

- test_content_field_unified_diff(): Verifies clean content extraction
  for unified diffs, ensuring content field excludes prefixes while
  line field preserves the full original line

- test_content_field_context_diff(): Verifies clean content extraction
  for context diffs, ensuring content field excludes both prefix AND
  the format-specific space while handling buffered lines correctly

Key test coverage:
* Raw line vs clean content comparison for all line types
* Unified diff: content = line + 1 (skip prefix only)
* Context diff: content = line + 2 (skip prefix + space)
* Buffered line handling (context diff old section)
* Direct line handling (context diff new section)

These tests ensure the content field provides format-agnostic clean
content extraction, eliminating consumer complexity.

Assisted-by: Cursor
Replace the stub grepdiff implementation with a full scanner-based version
that supports all core grepdiff features:

- Pattern matching with POSIX/PCRE regex support
- Output modes: list filenames, full files, or matching hunks only
- Match filtering: --only-match=rem|add|mod|all
- Numbered line modes: --as-numbered-lines=before|after
- File filtering: include/exclude patterns, strip/add prefixes
- Git diff support with --git-prefixes option
- Context and unified diff format support

Test Results: 9/10 grepdiff tests passing
- grepdiff-original-line-numbers marked as expected failure
  (requires --as-numbered-lines=original-* options not implemented)

This implementation leverages the scanner's clean content API for
robust and format-agnostic diff processing.

Assisted-by: Cursor
Extract duplicated match filter logic into a centralized line_passes_filter()
helper function, eliminating ~80 lines of code duplication across 5 locations.

Fix incorrect interaction between --output-matching and --only-match options:
- --output-matching=file: Now correctly shows entire files that contain matches
- --output-matching=hunk: Now correctly shows entire hunks that contain matches
- --only-match: Now only affects which lines are considered for matching,
  not which lines are output within matching files/hunks

The --only-match option was incorrectly being applied as an output filter,
causing incomplete hunks/files to be displayed. Match filters should only
determine whether a file/hunk "has matches", not filter individual lines
during output.

All grepdiff tests continue to pass with the corrected behavior.

Assisted-by: Cursor
Extract common functionality from grepdiff and lsdiff into a new patch_common
module to eliminate code duplication and provide a foundation for future tools.

New shared infrastructure:
- patch_common.h/c: Centralized module with 15+ shared global variables
- Extended function API with callback support for tool-specific filtering
- Unified option parsing infrastructure ready for expansion

Code reduction achieved:
- ~70 lines eliminated from duplicated code across grep.c and ls.c
- 15+ global variables (show_line_numbers, verbose, git_prefix_mode, etc.)
- Core functions (should_display_file, display_filename) now shared
- File counter and line offset tracking centralized

Architecture improvements:
- should_display_file_extended(): Callback-based filtering for tool-specific logic
- display_filename_extended(): Optional status display support for lsdiff
- Backward compatibility maintained - existing simple functions still work
- Clean separation between shared and tool-specific functionality

Benefits:
- Future tools (interdiff, combinediff) can leverage shared infrastructure
- Consistent behavior across all tools (option handling, pattern matching)
- Bug fixes in shared code automatically benefit all tools
- Reduced maintenance burden and improved code organization

All 205 tests continue to pass. Both grepdiff and lsdiff functionality
preserved with no behavioral changes.

Assisted-by: Cursor
- Add comprehensive artifact collection for all test logs and test-arena contents
- Replace truncated failure output with structured, focused summaries
- Add automatic verbose re-run of failed tests for immediate diagnostics
- Upload test artifacts with 30-day retention for detailed investigation
- Create structured failure reports with build configuration context
- Enhance both regular test and distcheck failure handling
- Eliminate the 20-file limit that was causing important details to be lost

This addresses the issue where CI test failures were providing truncated
output that made debugging difficult. Now failures provide both immediate
focused summaries in CI logs and complete diagnostic information via
downloadable artifacts.
…r-based implementation

Add support for --as-numbered-lines=original-before and --as-numbered-lines=original-after
options in the scanner-based grepdiff implementation. These options display line numbers
from the original diff headers rather than adjusted line numbers.

Features implemented:
- original-before: Shows original line numbers from the "before" side of diff headers
- original-after: Shows original line numbers from the "after" side of diff headers
- Proper header filtering for both unified and context diff formats
- Support for both hunk and file output modes

The implementation uses hunk->orig_offset and hunk->new_offset from diff headers
(e.g., line 8 and 12 from "@@ -8 +12 @@") instead of calculated line numbers.

Tests: Removes tests/grepdiff-original-line-numbers/run-test from XFAIL_TESTS
as the functionality is now fully implemented. All 205 tests pass.
This change eliminates code duplication by centralizing common option
parsing logic in patch_common.c, making both tools use the shared
infrastructure that was previously unused.

Changes:
- Enhanced patch_common.c to support -I/-X options (include/exclude from file)
- Refactored ls.c to use init_common_options(), parse_common_option(),
  cleanup_common_options(), add_common_long_options(), and get_common_short_options()
- Refactored grep.c to use the same common option parsing functions
- Removed ~70 lines of duplicate option parsing code between the two files
- Added proper constants for option array sizes (MAX_COMMON_OPTIONS,
  MAX_TOOL_OPTIONS, MAX_TOTAL_OPTIONS) with runtime safety checks
- Fixed unused variable warnings

Benefits:
- Common options are now handled consistently in one place
- Easier to maintain and extend common functionality
- Reduced code duplication and improved organization
- Runtime protection against accidentally exceeding option array limits
- All existing functionality preserved and tested

Assisted-by: Cursor
- Fix event type descriptions to match actual scanner behavior
- Correct context diff changed line behavior: different content, not same content
- Add concrete example patch content directly in documentation
- Update all examples to use consistent --verbose flag format
- Include both unified and context diff examples with actual scanner_debug output
- Fix line numbers, event counts, and output format to match real binary behavior

The documentation now accurately reflects the actual scanner_debug utility
behavior, making it reliable for debugging and learning purposes.

Assisted-by: Cursor
- Update copyright year from 2024 to 2025 in all new scanner-related files:
  - src/ls.c, src/patch_scanner.c, src/patch_scanner.h
  - src/scanner_debug.c, tests/scanner/test_basic.c

- Remove tests/scanner/README.md which incorrectly claimed most scanner
  functionality was "TODO" when it's actually fully implemented with
  comprehensive test coverage including:
  - Complete unified/context diff parsing
  - Full Git extended header support
  - Binary patch handling and error conditions
  - Line number tracking and edge case handling

The scanner implementation is mature and well-tested, making the outdated
documentation misleading for developers.

Assisted-by: Cursor
@twaugh twaugh merged commit 7b7dcca into master Oct 17, 2025
8 checks passed
@twaugh twaugh deleted the new-parser branch October 17, 2025 16:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants