New scanning parser #116

twaugh · 2025-09-09T09:51:12Z

No description provided.

codecov · 2025-09-12T12:24:40Z

Codecov Report

❌ Patch coverage is 86.66835% with 527 lines in your changes missing coverage. Please review.
✅ Project coverage is 85.78%. Comparing base (f03e11e) to head (03e2b59).
⚠️ Report is 85 commits behind head on master.

Files with missing lines	Patch %	Lines
src/patch_scanner.c	85.68%	149 Missing ⚠️
src/scanner_debug.c	69.34%	99 Missing ⚠️
src/grep.c	81.87%	89 Missing ⚠️
src/patchfilter.c	74.07%	56 Missing ⚠️
src/util.c	65.62%	44 Missing ⚠️
tests/scanner/test_basic.c	96.74%	37 Missing ⚠️
src/ls.c	93.81%	17 Missing ⚠️
tests/scanner/test_accumulated_headers.c	83.65%	17 Missing ⚠️
src/patch_common.c	89.13%	15 Missing ⚠️
src/filter.c	0.00%	2 Missing ⚠️
... and 1 more

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #116      +/-   ##
==========================================
+ Coverage   83.50%   85.78%   +2.28%     
==========================================
  Files           5       15      +10     
  Lines        4159     8112    +3953     
  Branches      985     1633     +648     
==========================================
+ Hits         3473     6959    +3486     
- Misses        686     1153     +467

Flag	Coverage Δ
unittests	`85.78% <86.66%> (+2.28%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Assisted-by: Cursor

…cating it Assisted-by: Cursor

Assisted-by: Cursor

Assisted-by: Claude Code

Replace the redundant content/raw_line fields in patch_hunk_line with a single 'line' field containing the full original line including prefix. This eliminates confusion between content (without prefix) and raw_line (with prefix) while providing consumers flexibility to extract content as needed using (line + 1, length - 1). Benefits: - Single source of truth for line content - Simpler API with fewer fields - Exact preservation of original formatting - More efficient memory usage Updated scanner implementation, scanner_debug tool, and scanner tests to use the new API. Fixed missing header dependencies in scanner test Makefile to ensure proper rebuilds when API changes. Also fix scanner test conditional in Makefile.am - move tests/scanner/run-test into the USE_SCANNER_PATCHFILTER conditional block where it belongs. The scanner test was incorrectly included unconditionally, causing 'make distcheck' failures when scanner-patchfilter is disabled. Assisted-by: Cursor

Context diff format uses 'start,end' notation where the second number is the ending line number, not a count. For example, '*** 6,9 ****' means lines 6 through 9 (count = 4), not line 6 with count 9. The scanner was incorrectly treating the second number as a count, leading to wrong orig_count and new_count values. This caused issues in tools that rely on accurate line counts. Fix: - Parse second number as end line number - Calculate count as: end_line - start_line + 1 - Update comments to clarify the format Test fix: - Update test_context_diff_hunk_separator_handling to expect correct count (4) instead of buggy count (9) for '*** 6,9 ****' - Remove outdated comment acknowledging the bug This fixes context diff processing in grepdiff and other scanner-based tools that depend on accurate hunk line counts. Assisted-by: Cursor

Fix race condition where scanner tests tried to build object files simultaneously with the main build system, causing intermittent test failures. Changes: - Add scanner test programs to noinst_PROGRAMS in Makefile.am - Define proper sources and dependencies for each test program - Remove separate scanner/Makefile to eliminate build conflicts - Simplify scanner test script to use integrated build - Clean up distclean and EXTRA_DIST references This ensures scanner tests are built by the main build system with proper dependency ordering, eliminating the build system timing issues that caused test failures after clean builds. Assisted-by: Cursor

… handling Add enum patch_line_context and context field to struct patch_hunk_line to explicitly represent which file version a line belongs to: - PATCH_CONTEXT_BOTH: Normal lines (applies to both old and new versions) - PATCH_CONTEXT_OLD: Lines representing the old file state - PATCH_CONTEXT_NEW: Lines representing the new file state For context diffs, this eliminates ambiguity about changed lines ('!'): - Old section lines are emitted with PATCH_CONTEXT_OLD - New section lines are emitted with PATCH_CONTEXT_NEW - Each line is emitted exactly once with appropriate context This simplifies consumer logic by providing explicit context information instead of requiring manual handling of context diff dual-nature semantics. Updated scanner_debug utility and documentation to reflect the new API. Fixed test expectations to match the corrected emission behavior. Assisted-by: Cursor

Add content and content_length fields to struct patch_hunk_line to provide format-agnostic access to clean line content without prefixes or spaces. The new fields complement the existing line/length fields: - line/length: Full original line including prefix (for debugging/display) - content/content_length: Clean content without prefix or format-specific spaces Implementation details: - Unified diffs: content = line + 1 (skip prefix) - Context diffs: content = line + 2 (skip prefix + space) - Memory efficient: content points into existing line buffer - Handles buffer copying correctly for context diff buffering This eliminates the need for consumers to handle format-specific quirks when extracting line content, making the API much cleaner and more intuitive. Updated scanner_debug to use the new clean content field for display. Assisted-by: Cursor

Add comprehensive tests for the context field in patch_hunk_line: - test_context_field_unified_diff(): Verifies that all line types in unified diffs are marked as PATCH_CONTEXT_BOTH (context, removed, added) - test_context_field_context_diff(): Verifies correct context assignment in context diffs: * Context/removed/added lines: PATCH_CONTEXT_BOTH * Changed lines from old section: PATCH_CONTEXT_OLD * Changed lines from new section: PATCH_CONTEXT_NEW These tests ensure the context field correctly distinguishes between old and new file versions for context diff changed lines while properly marking other lines as applying to both versions. Assisted-by: Cursor

Add comprehensive tests for the clean content field in patch_hunk_line: - test_content_field_unified_diff(): Verifies clean content extraction for unified diffs, ensuring content field excludes prefixes while line field preserves the full original line - test_content_field_context_diff(): Verifies clean content extraction for context diffs, ensuring content field excludes both prefix AND the format-specific space while handling buffered lines correctly Key test coverage: * Raw line vs clean content comparison for all line types * Unified diff: content = line + 1 (skip prefix only) * Context diff: content = line + 2 (skip prefix + space) * Buffered line handling (context diff old section) * Direct line handling (context diff new section) These tests ensure the content field provides format-agnostic clean content extraction, eliminating consumer complexity. Assisted-by: Cursor

Replace the stub grepdiff implementation with a full scanner-based version that supports all core grepdiff features: - Pattern matching with POSIX/PCRE regex support - Output modes: list filenames, full files, or matching hunks only - Match filtering: --only-match=rem|add|mod|all - Numbered line modes: --as-numbered-lines=before|after - File filtering: include/exclude patterns, strip/add prefixes - Git diff support with --git-prefixes option - Context and unified diff format support Test Results: 9/10 grepdiff tests passing - grepdiff-original-line-numbers marked as expected failure (requires --as-numbered-lines=original-* options not implemented) This implementation leverages the scanner's clean content API for robust and format-agnostic diff processing. Assisted-by: Cursor

Extract duplicated match filter logic into a centralized line_passes_filter() helper function, eliminating ~80 lines of code duplication across 5 locations. Fix incorrect interaction between --output-matching and --only-match options: - --output-matching=file: Now correctly shows entire files that contain matches - --output-matching=hunk: Now correctly shows entire hunks that contain matches - --only-match: Now only affects which lines are considered for matching, not which lines are output within matching files/hunks The --only-match option was incorrectly being applied as an output filter, causing incomplete hunks/files to be displayed. Match filters should only determine whether a file/hunk "has matches", not filter individual lines during output. All grepdiff tests continue to pass with the corrected behavior. Assisted-by: Cursor

Extract common functionality from grepdiff and lsdiff into a new patch_common module to eliminate code duplication and provide a foundation for future tools. New shared infrastructure: - patch_common.h/c: Centralized module with 15+ shared global variables - Extended function API with callback support for tool-specific filtering - Unified option parsing infrastructure ready for expansion Code reduction achieved: - ~70 lines eliminated from duplicated code across grep.c and ls.c - 15+ global variables (show_line_numbers, verbose, git_prefix_mode, etc.) - Core functions (should_display_file, display_filename) now shared - File counter and line offset tracking centralized Architecture improvements: - should_display_file_extended(): Callback-based filtering for tool-specific logic - display_filename_extended(): Optional status display support for lsdiff - Backward compatibility maintained - existing simple functions still work - Clean separation between shared and tool-specific functionality Benefits: - Future tools (interdiff, combinediff) can leverage shared infrastructure - Consistent behavior across all tools (option handling, pattern matching) - Bug fixes in shared code automatically benefit all tools - Reduced maintenance burden and improved code organization All 205 tests continue to pass. Both grepdiff and lsdiff functionality preserved with no behavioral changes. Assisted-by: Cursor

- Add comprehensive artifact collection for all test logs and test-arena contents - Replace truncated failure output with structured, focused summaries - Add automatic verbose re-run of failed tests for immediate diagnostics - Upload test artifacts with 30-day retention for detailed investigation - Create structured failure reports with build configuration context - Enhance both regular test and distcheck failure handling - Eliminate the 20-file limit that was causing important details to be lost This addresses the issue where CI test failures were providing truncated output that made debugging difficult. Now failures provide both immediate focused summaries in CI logs and complete diagnostic information via downloadable artifacts.

…r-based implementation Add support for --as-numbered-lines=original-before and --as-numbered-lines=original-after options in the scanner-based grepdiff implementation. These options display line numbers from the original diff headers rather than adjusted line numbers. Features implemented: - original-before: Shows original line numbers from the "before" side of diff headers - original-after: Shows original line numbers from the "after" side of diff headers - Proper header filtering for both unified and context diff formats - Support for both hunk and file output modes The implementation uses hunk->orig_offset and hunk->new_offset from diff headers (e.g., line 8 and 12 from "@@ -8 +12 @@") instead of calculated line numbers. Tests: Removes tests/grepdiff-original-line-numbers/run-test from XFAIL_TESTS as the functionality is now fully implemented. All 205 tests pass.

This change eliminates code duplication by centralizing common option parsing logic in patch_common.c, making both tools use the shared infrastructure that was previously unused. Changes: - Enhanced patch_common.c to support -I/-X options (include/exclude from file) - Refactored ls.c to use init_common_options(), parse_common_option(), cleanup_common_options(), add_common_long_options(), and get_common_short_options() - Refactored grep.c to use the same common option parsing functions - Removed ~70 lines of duplicate option parsing code between the two files - Added proper constants for option array sizes (MAX_COMMON_OPTIONS, MAX_TOOL_OPTIONS, MAX_TOTAL_OPTIONS) with runtime safety checks - Fixed unused variable warnings Benefits: - Common options are now handled consistently in one place - Easier to maintain and extend common functionality - Reduced code duplication and improved organization - Runtime protection against accidentally exceeding option array limits - All existing functionality preserved and tested Assisted-by: Cursor

- Fix event type descriptions to match actual scanner behavior - Correct context diff changed line behavior: different content, not same content - Add concrete example patch content directly in documentation - Update all examples to use consistent --verbose flag format - Include both unified and context diff examples with actual scanner_debug output - Fix line numbers, event counts, and output format to match real binary behavior The documentation now accurately reflects the actual scanner_debug utility behavior, making it reliable for debugging and learning purposes. Assisted-by: Cursor

- Update copyright year from 2024 to 2025 in all new scanner-related files: - src/ls.c, src/patch_scanner.c, src/patch_scanner.h - src/scanner_debug.c, tests/scanner/test_basic.c - Remove tests/scanner/README.md which incorrectly claimed most scanner functionality was "TODO" when it's actually fully implemented with comprehensive test coverage including: - Complete unified/context diff parsing - Full Git extended header support - Binary patch handling and error conditions - Line number tracking and edge case handling The scanner implementation is mature and well-tested, making the outdated documentation misleading for developers. Assisted-by: Cursor

twaugh force-pushed the new-parser branch from 423a548 to f82cf0e Compare September 12, 2025 12:21

twaugh force-pushed the new-parser branch 7 times, most recently from c0914b2 to c12fb43 Compare September 19, 2025 13:42

twaugh force-pushed the new-parser branch 5 times, most recently from a164cb8 to 0795a09 Compare September 23, 2025 14:43

twaugh force-pushed the new-parser branch 5 times, most recently from 47a6cb6 to a8ba0c3 Compare October 17, 2025 12:07

twaugh added 11 commits October 17, 2025 16:10

Initial minimal implementation of new scanner

104677d

Assisted-by: Cursor

New scanner: add header parsing

6a77ea1

Assisted-by: Cursor

New scanner: code clean-up

b630c27

Assisted-by: Cursor

New scanner: proper hunk parsing

03ba791

Assisted-by: Cursor

New scanner: some code dedupiclication

7fde4a4

Assisted-by: Cursor

New scanner: additional testing, and integrate into 'make check'

6bab015

Assisted-by: Cursor

New scanner: better recognition of git diff format

4e3bb69

Assisted-by: Cursor

New scanner: context diff support

f5835d5

Assisted-by: Cursor

New scanner: line number tracking fix

b23a9bb

Assisted-by: Cursor

New scanner: fix for git diff without hunks

fb03717

Assisted-by: Cursor

Implement some utility functions for file existence/status

2cfb1ec

Assisted-by: Cursor

twaugh added 19 commits October 17, 2025 16:10

New lsdiff: remove code duplication relating to candidate name gathering

0278e1b

Assisted-by: Cursor

New parser: additional scanner testing for binary format

24ddef2

Assisted-by: Cursor

More lsdiff tests

f5429ee

Assisted-by: Cursor

New parser: fix some memory leaks

07ad102

Assisted-by: Cursor

New lsdiff: fix a memory leak

48a933d

Assisted-by: Cursor

New parser: add scanner-debug tests

8799825

Assisted-by: Cursor

More lsdiff tests

35423f1

Assisted-by: Cursor

New lsdiff: use diff.c git-prefix-stripping function instead of dupli…

53174e9

…cating it Assisted-by: Cursor

New lsdiff: skeleton support for different modes like filterdiff has

5382314

Assisted-by: Cursor

Move some common functions from ls.c to patchfilter.c

a3dabfd

Assisted-by: Claude Code

twaugh force-pushed the new-parser branch from ec03a6b to 8a98272 Compare October 17, 2025 15:11

twaugh added 5 commits October 17, 2025 16:30

twaugh force-pushed the new-parser branch from 8a98272 to 7be3e62 Compare October 17, 2025 15:31

twaugh force-pushed the new-parser branch from a825276 to 03e2b59 Compare October 17, 2025 15:56

twaugh merged commit 7b7dcca into master Oct 17, 2025
8 checks passed

twaugh deleted the new-parser branch October 17, 2025 16:03

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

New scanning parser #116

New scanning parser #116

Uh oh!

twaugh commented Sep 9, 2025

Uh oh!

codecov bot commented Sep 12, 2025 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

New scanning parser #116

New scanning parser #116

Uh oh!

Conversation

twaugh commented Sep 9, 2025

Uh oh!

codecov bot commented Sep 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

codecov bot commented Sep 12, 2025 •

edited

Loading