-
Notifications
You must be signed in to change notification settings - Fork 28
New scanning parser #116
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
New scanning parser #116
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #116 +/- ##
==========================================
+ Coverage 83.50% 85.78% +2.28%
==========================================
Files 5 15 +10
Lines 4159 8112 +3953
Branches 985 1633 +648
==========================================
+ Hits 3473 6959 +3486
- Misses 686 1153 +467
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
c0914b2 to
c12fb43
Compare
a164cb8 to
0795a09
Compare
47a6cb6 to
a8ba0c3
Compare
Assisted-by: Cursor
Assisted-by: Cursor
Assisted-by: Cursor
Assisted-by: Cursor
Assisted-by: Cursor
Assisted-by: Cursor
Assisted-by: Cursor
Assisted-by: Cursor
Assisted-by: Cursor
Assisted-by: Cursor
Assisted-by: Cursor
Assisted-by: Cursor
Assisted-by: Cursor
Assisted-by: Cursor
Assisted-by: Cursor
Assisted-by: Cursor
Assisted-by: Cursor
Assisted-by: Cursor
…cating it Assisted-by: Cursor
Assisted-by: Cursor
Assisted-by: Claude Code
Replace the redundant content/raw_line fields in patch_hunk_line with a single 'line' field containing the full original line including prefix. This eliminates confusion between content (without prefix) and raw_line (with prefix) while providing consumers flexibility to extract content as needed using (line + 1, length - 1). Benefits: - Single source of truth for line content - Simpler API with fewer fields - Exact preservation of original formatting - More efficient memory usage Updated scanner implementation, scanner_debug tool, and scanner tests to use the new API. Fixed missing header dependencies in scanner test Makefile to ensure proper rebuilds when API changes. Also fix scanner test conditional in Makefile.am - move tests/scanner/run-test into the USE_SCANNER_PATCHFILTER conditional block where it belongs. The scanner test was incorrectly included unconditionally, causing 'make distcheck' failures when scanner-patchfilter is disabled. Assisted-by: Cursor
Context diff format uses 'start,end' notation where the second number is the ending line number, not a count. For example, '*** 6,9 ****' means lines 6 through 9 (count = 4), not line 6 with count 9. The scanner was incorrectly treating the second number as a count, leading to wrong orig_count and new_count values. This caused issues in tools that rely on accurate line counts. Fix: - Parse second number as end line number - Calculate count as: end_line - start_line + 1 - Update comments to clarify the format Test fix: - Update test_context_diff_hunk_separator_handling to expect correct count (4) instead of buggy count (9) for '*** 6,9 ****' - Remove outdated comment acknowledging the bug This fixes context diff processing in grepdiff and other scanner-based tools that depend on accurate hunk line counts. Assisted-by: Cursor
Fix race condition where scanner tests tried to build object files simultaneously with the main build system, causing intermittent test failures. Changes: - Add scanner test programs to noinst_PROGRAMS in Makefile.am - Define proper sources and dependencies for each test program - Remove separate scanner/Makefile to eliminate build conflicts - Simplify scanner test script to use integrated build - Clean up distclean and EXTRA_DIST references This ensures scanner tests are built by the main build system with proper dependency ordering, eliminating the build system timing issues that caused test failures after clean builds. Assisted-by: Cursor
… handling
Add enum patch_line_context and context field to struct patch_hunk_line
to explicitly represent which file version a line belongs to:
- PATCH_CONTEXT_BOTH: Normal lines (applies to both old and new versions)
- PATCH_CONTEXT_OLD: Lines representing the old file state
- PATCH_CONTEXT_NEW: Lines representing the new file state
For context diffs, this eliminates ambiguity about changed lines ('!'):
- Old section lines are emitted with PATCH_CONTEXT_OLD
- New section lines are emitted with PATCH_CONTEXT_NEW
- Each line is emitted exactly once with appropriate context
This simplifies consumer logic by providing explicit context information
instead of requiring manual handling of context diff dual-nature semantics.
Updated scanner_debug utility and documentation to reflect the new API.
Fixed test expectations to match the corrected emission behavior.
Assisted-by: Cursor
Add content and content_length fields to struct patch_hunk_line to provide format-agnostic access to clean line content without prefixes or spaces. The new fields complement the existing line/length fields: - line/length: Full original line including prefix (for debugging/display) - content/content_length: Clean content without prefix or format-specific spaces Implementation details: - Unified diffs: content = line + 1 (skip prefix) - Context diffs: content = line + 2 (skip prefix + space) - Memory efficient: content points into existing line buffer - Handles buffer copying correctly for context diff buffering This eliminates the need for consumers to handle format-specific quirks when extracting line content, making the API much cleaner and more intuitive. Updated scanner_debug to use the new clean content field for display. Assisted-by: Cursor
Add comprehensive tests for the context field in patch_hunk_line: - test_context_field_unified_diff(): Verifies that all line types in unified diffs are marked as PATCH_CONTEXT_BOTH (context, removed, added) - test_context_field_context_diff(): Verifies correct context assignment in context diffs: * Context/removed/added lines: PATCH_CONTEXT_BOTH * Changed lines from old section: PATCH_CONTEXT_OLD * Changed lines from new section: PATCH_CONTEXT_NEW These tests ensure the context field correctly distinguishes between old and new file versions for context diff changed lines while properly marking other lines as applying to both versions. Assisted-by: Cursor
Add comprehensive tests for the clean content field in patch_hunk_line: - test_content_field_unified_diff(): Verifies clean content extraction for unified diffs, ensuring content field excludes prefixes while line field preserves the full original line - test_content_field_context_diff(): Verifies clean content extraction for context diffs, ensuring content field excludes both prefix AND the format-specific space while handling buffered lines correctly Key test coverage: * Raw line vs clean content comparison for all line types * Unified diff: content = line + 1 (skip prefix only) * Context diff: content = line + 2 (skip prefix + space) * Buffered line handling (context diff old section) * Direct line handling (context diff new section) These tests ensure the content field provides format-agnostic clean content extraction, eliminating consumer complexity. Assisted-by: Cursor
Replace the stub grepdiff implementation with a full scanner-based version that supports all core grepdiff features: - Pattern matching with POSIX/PCRE regex support - Output modes: list filenames, full files, or matching hunks only - Match filtering: --only-match=rem|add|mod|all - Numbered line modes: --as-numbered-lines=before|after - File filtering: include/exclude patterns, strip/add prefixes - Git diff support with --git-prefixes option - Context and unified diff format support Test Results: 9/10 grepdiff tests passing - grepdiff-original-line-numbers marked as expected failure (requires --as-numbered-lines=original-* options not implemented) This implementation leverages the scanner's clean content API for robust and format-agnostic diff processing. Assisted-by: Cursor
Extract duplicated match filter logic into a centralized line_passes_filter() helper function, eliminating ~80 lines of code duplication across 5 locations. Fix incorrect interaction between --output-matching and --only-match options: - --output-matching=file: Now correctly shows entire files that contain matches - --output-matching=hunk: Now correctly shows entire hunks that contain matches - --only-match: Now only affects which lines are considered for matching, not which lines are output within matching files/hunks The --only-match option was incorrectly being applied as an output filter, causing incomplete hunks/files to be displayed. Match filters should only determine whether a file/hunk "has matches", not filter individual lines during output. All grepdiff tests continue to pass with the corrected behavior. Assisted-by: Cursor
Extract common functionality from grepdiff and lsdiff into a new patch_common module to eliminate code duplication and provide a foundation for future tools. New shared infrastructure: - patch_common.h/c: Centralized module with 15+ shared global variables - Extended function API with callback support for tool-specific filtering - Unified option parsing infrastructure ready for expansion Code reduction achieved: - ~70 lines eliminated from duplicated code across grep.c and ls.c - 15+ global variables (show_line_numbers, verbose, git_prefix_mode, etc.) - Core functions (should_display_file, display_filename) now shared - File counter and line offset tracking centralized Architecture improvements: - should_display_file_extended(): Callback-based filtering for tool-specific logic - display_filename_extended(): Optional status display support for lsdiff - Backward compatibility maintained - existing simple functions still work - Clean separation between shared and tool-specific functionality Benefits: - Future tools (interdiff, combinediff) can leverage shared infrastructure - Consistent behavior across all tools (option handling, pattern matching) - Bug fixes in shared code automatically benefit all tools - Reduced maintenance burden and improved code organization All 205 tests continue to pass. Both grepdiff and lsdiff functionality preserved with no behavioral changes. Assisted-by: Cursor
- Add comprehensive artifact collection for all test logs and test-arena contents - Replace truncated failure output with structured, focused summaries - Add automatic verbose re-run of failed tests for immediate diagnostics - Upload test artifacts with 30-day retention for detailed investigation - Create structured failure reports with build configuration context - Enhance both regular test and distcheck failure handling - Eliminate the 20-file limit that was causing important details to be lost This addresses the issue where CI test failures were providing truncated output that made debugging difficult. Now failures provide both immediate focused summaries in CI logs and complete diagnostic information via downloadable artifacts.
…r-based implementation Add support for --as-numbered-lines=original-before and --as-numbered-lines=original-after options in the scanner-based grepdiff implementation. These options display line numbers from the original diff headers rather than adjusted line numbers. Features implemented: - original-before: Shows original line numbers from the "before" side of diff headers - original-after: Shows original line numbers from the "after" side of diff headers - Proper header filtering for both unified and context diff formats - Support for both hunk and file output modes The implementation uses hunk->orig_offset and hunk->new_offset from diff headers (e.g., line 8 and 12 from "@@ -8 +12 @@") instead of calculated line numbers. Tests: Removes tests/grepdiff-original-line-numbers/run-test from XFAIL_TESTS as the functionality is now fully implemented. All 205 tests pass.
This change eliminates code duplication by centralizing common option parsing logic in patch_common.c, making both tools use the shared infrastructure that was previously unused. Changes: - Enhanced patch_common.c to support -I/-X options (include/exclude from file) - Refactored ls.c to use init_common_options(), parse_common_option(), cleanup_common_options(), add_common_long_options(), and get_common_short_options() - Refactored grep.c to use the same common option parsing functions - Removed ~70 lines of duplicate option parsing code between the two files - Added proper constants for option array sizes (MAX_COMMON_OPTIONS, MAX_TOOL_OPTIONS, MAX_TOTAL_OPTIONS) with runtime safety checks - Fixed unused variable warnings Benefits: - Common options are now handled consistently in one place - Easier to maintain and extend common functionality - Reduced code duplication and improved organization - Runtime protection against accidentally exceeding option array limits - All existing functionality preserved and tested Assisted-by: Cursor
- Fix event type descriptions to match actual scanner behavior - Correct context diff changed line behavior: different content, not same content - Add concrete example patch content directly in documentation - Update all examples to use consistent --verbose flag format - Include both unified and context diff examples with actual scanner_debug output - Fix line numbers, event counts, and output format to match real binary behavior The documentation now accurately reflects the actual scanner_debug utility behavior, making it reliable for debugging and learning purposes. Assisted-by: Cursor
- Update copyright year from 2024 to 2025 in all new scanner-related files: - src/ls.c, src/patch_scanner.c, src/patch_scanner.h - src/scanner_debug.c, tests/scanner/test_basic.c - Remove tests/scanner/README.md which incorrectly claimed most scanner functionality was "TODO" when it's actually fully implemented with comprehensive test coverage including: - Complete unified/context diff parsing - Full Git extended header support - Binary patch handling and error conditions - Line number tracking and edge case handling The scanner implementation is mature and well-tested, making the outdated documentation misleading for developers. Assisted-by: Cursor
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
No description provided.