Skip to content

Conversation

@pan93412
Copy link
Collaborator

@pan93412 pan93412 commented Oct 31, 2025

This pull request introduces enhancements to the spelling and case rule detection logic, as well as improvements to documentation and database interfaces. The main focus is to support alternative case forms for terms, refine the exclusion logic for spellchecking, and clarify setup instructions.

Documentation improvements:

  • Updated README.md to use correct repository URLs, add shell/env block formatting, and clarify installation/startup steps. Redundant setup instructions were removed for clarity. [1] [2] [3]
  • Added documentation for "alternative case forms" in the case rule section, explaining that terms like App can also be written as app.

Database and model changes:

  • The CaseRule interface and schema now support an optional alternatives field (array of strings) for alternative case forms.
  • Updated the ICaseDatabase interface and its implementations (CaseDatabase, MockCaseDatabase) so addRule accepts an optional alternatives parameter. [1] [2] [3]

Spellchecking logic improvements:

  • Refactored the Determiner class in determiner.ts to modularize and improve exclusion logic for URLs, paths, code blocks, and mentions. Added helper methods for building/merging exclusion ranges and for checking matches outside excluded ranges.
  • Enhanced case rule checking to support alternative forms, ensuring that only incorrect case usages outside excluded ranges are flagged.

I also resolved the case-checking issues evident in the CI. There should be no spelling or casing problems in this codebase now.

Minor dependency and import updates:

  • Adjusted imports in determiner.ts to include the new CaseRule type.

Fixed #26
Fixed https://discord.com/channels/1259032762422136902/1259032897373999144/1433821827364425758

Summary by CodeRabbit

  • New Features

    • Added support for alternative terms/casings when defining case rules.
    • Improved handling of special content areas—rules now exclude URLs, code blocks, inline code, and Markdown links from checks.
  • Documentation

    • Updated setup instructions with corrected repository references and better formatting.
  • Tests

    • Added comprehensive test coverage for edge cases including URLs, code blocks, and Markdown scenarios.

@pan93412 pan93412 self-assigned this Oct 31, 2025
@coderabbitai
Copy link

coderabbitai bot commented Oct 31, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

This PR extends the case-rule system to support alternative term variations by adding an alternatives field to the CaseRule model and database layer. It simultaneously refactors the Determiner's spelling-check logic to use a range-based exclusion system that skips URLs, code blocks, and other contexts where style rules should not apply.

Changes

Cohort / File(s) Summary
Configuration
.gitignore
Added package-lock.json to ignore list with note on pnpm usage.
Documentation
README.md
Updated repository URLs (Edit-Mr/WinstonBot), adjusted clone and directory references, added language hints to code blocks, documented CaseRule casing alternatives, and removed trailing setup content.
Data Model
models.ts
Extended CaseRule interface and schema with optional alternatives: string[] | null field and corresponding validator.
Database Interface
database-interfaces.ts
Updated ICaseDatabase.addRule() signature to accept optional alternatives?: string[] | null parameter alongside term.
Database Implementation
database.ts, database-test-utils.ts
Updated CaseDatabase.addRule() and MockCaseDatabase.addRule() to accept and persist the optional alternatives parameter; inserted documents now include { term, alternatives, _id } structure.
Spelling Checker Refactoring
determiner.ts
Introduced URL_REGEX field and range-based exclusion system via new private methods (#buildExcludedRanges, #addMatchedRanges, #mergeRanges, #createIsExcludedChecker, #getSanitizedContent); refactored checkSpelling, #checkGeneralRules, and #checkCaseRules to skip URLs, code blocks, paths, and other excluded ranges; imported CaseRule type from models.
Test Coverage
determiner.test.ts
Added comprehensive tests covering HTTP-rule exclusions in code blocks, URL/Markdown handling, inline code blocks, mixed URL/Markdown scenarios, and edge cases around casing and alternatives.

Sequence Diagram

sequenceDiagram
    participant Caller
    participant Determiner
    participant RangeBuilder
    participant RuleChecker
    participant Validator

    Caller->>Determiner: checkSpelling(content)
    activate Determiner
    Determiner->>RangeBuilder: `#buildExcludedRanges`(content)
    activate RangeBuilder
    RangeBuilder->>RangeBuilder: `#addMatchedRanges` (URLs)
    RangeBuilder->>RangeBuilder: `#addPathRanges`
    RangeBuilder->>RangeBuilder: `#addCodeBlockRanges`
    RangeBuilder->>RangeBuilder: `#mergeRanges`
    RangeBuilder-->>Determiner: excludedRanges[]
    deactivate RangeBuilder
    
    Determiner->>Determiner: `#getSanitizedContent` (remove excluded)
    Determiner->>Determiner: `#createIsExcludedChecker`
    
    rect rgb(220, 240, 255)
    Note over Determiner,RuleChecker: Check General Rules
    Determiner->>RuleChecker: `#checkGeneralRules`(sanitized, isExcluded)
    RuleChecker->>Validator: validate matches not in excluded ranges
    Validator-->>RuleChecker: mistakes[]
    RuleChecker-->>Determiner: generalMistakes[]
    end
    
    rect rgb(220, 240, 255)
    Note over Determiner,RuleChecker: Check Case Rules
    Determiner->>RuleChecker: `#checkCaseRules`(sanitized, isExcluded)
    RuleChecker->>Validator: validate term + alternatives in excluded ranges
    Validator-->>RuleChecker: mistakes[]
    RuleChecker-->>Determiner: caseMistakes[]
    end
    
    Determiner->>Determiner: merge all mistakes
    Determiner-->>Caller: mistakes[]
    deactivate Determiner
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • determiner.ts: Core refactoring introduces five new private methods with range-based exclusion logic; requires careful verification of merge logic, boundary condition handling, and integration with existing rule-checking routines.
  • determiner.test.ts: Extensive test additions demand validation that all code-block, URL, and Markdown scenarios are correctly handled; edge cases around alternatives and casing merit specific attention.
  • Database layer changes (database-interfaces.ts, database.ts, database-test-utils.ts, models.ts): While more straightforward, the cascading signature changes across multiple layers and the schema validator logic should be verified for consistency.

Possibly related PRs

Suggested reviewers

  • Edit-Mr

Poem

🐰 Hop hop, the rules now know their way,
Through URLs and code they'll gently stray,
Alternative terms in harmony rest,
Ranges excluded—the speller's test—
No more false alarms where they don't belong,
The checker now dances to a smarter song! 🎵

Pre-merge checks and finishing touches

✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'URL and case rules fix' directly addresses the main changes: URL exclusion logic and case rule enhancements with alternatives support.
Linked Issues check ✅ Passed PR addresses bug #26 by refactoring case rule detection to support alternatives and improved exclusion logic for URLs and code blocks.
Out of Scope Changes check ✅ Passed Changes include README updates (repository URLs, documentation), database interface refactoring for alternatives support, and improved exclusion logic—all aligned with supporting alternative case forms and fixing URL/code block handling.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch alternatives-for-caserules

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0842106 and 190cb33.

📒 Files selected for processing (8)
  • .gitignore (1 hunks)
  • README.md (3 hunks)
  • database-interfaces.ts (1 hunks)
  • database-test-utils.ts (1 hunks)
  • database.ts (1 hunks)
  • determiner.test.ts (2 hunks)
  • determiner.ts (3 hunks)
  • models.ts (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-31T12:45:58.617Z
Learnt from: pan93412
Repo: Edit-Mr/WinstonBot PR: 28
File: determiner.ts:167-178
Timestamp: 2025-10-31T12:45:58.617Z
Learning: In the WinstonBot project's determiner.ts file, the `searchWord` variable in spelling rules intentionally supports regex patterns. The rules are stored in MongoDB as controlled input, so constructing RegExp from these patterns is by design and not a security concern.

Applied to files:

  • database-test-utils.ts
  • determiner.test.ts
  • models.ts
  • determiner.ts
🧬 Code graph analysis (2)
determiner.test.ts (2)
database-test-utils.ts (2)
  • MockSpellingDatabase (16-56)
  • MockCaseDatabase (61-91)
determiner.ts (1)
  • Determiner (15-275)
determiner.ts (2)
zhhanttw.ts (1)
  • detectChineseType (9-23)
models.ts (2)
  • SpellingRuleType (20-20)
  • CaseRule (47-51)
🪛 ast-grep (0.39.7)
determiner.ts

[warning] 56-56: Regular expression constructed from variable input detected. This can lead to Regular Expression Denial of Service (ReDoS) attacks if the variable contains malicious patterns. Use libraries like 'recheck' to validate regex safety or use static patterns.
Context: new RegExp(this.#URL_REGEX, "g")
Note: [CWE-1333] Inefficient Regular Expression Complexity [REFERENCES]
- https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS
- https://cwe.mitre.org/data/definitions/1333.html

(regexp-from-variable)


[warning] 211-211: Regular expression constructed from variable input detected. This can lead to Regular Expression Denial of Service (ReDoS) attacks if the variable contains malicious patterns. Use libraries like 'recheck' to validate regex safety or use static patterns.
Context: new RegExp(pattern, "i")
Note: [CWE-1333] Inefficient Regular Expression Complexity [REFERENCES]
- https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS
- https://cwe.mitre.org/data/definitions/1333.html

(regexp-from-variable)


[warning] 216-216: Regular expression constructed from variable input detected. This can lead to Regular Expression Denial of Service (ReDoS) attacks if the variable contains malicious patterns. Use libraries like 'recheck' to validate regex safety or use static patterns.
Context: new RegExp(pattern, "gi")
Note: [CWE-1333] Inefficient Regular Expression Complexity [REFERENCES]
- https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS
- https://cwe.mitre.org/data/definitions/1333.html

(regexp-from-variable)


[warning] 247-247: Regular expression constructed from variable input detected. This can lead to Regular Expression Denial of Service (ReDoS) attacks if the variable contains malicious patterns. Use libraries like 'recheck' to validate regex safety or use static patterns.
Context: new RegExp(lowerTerm, "gi")
Note: [CWE-1333] Inefficient Regular Expression Complexity [REFERENCES]
- https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS
- https://cwe.mitre.org/data/definitions/1333.html

(regexp-from-variable)

🔇 Additional comments (21)
.gitignore (1)

54-55: LGTM!

The addition of package-lock.json to .gitignore is appropriate for projects using pnpm as their package manager, since pnpm generates its own lockfile (pnpm-lock.yaml).

README.md (2)

27-52: LGTM!

The addition of language identifiers (shell, env) to code blocks improves syntax highlighting and readability in rendered Markdown.


86-86: LGTM!

The documentation now clearly mentions the alternative case forms feature (替代大小寫形式), which aligns well with the alternatives field added to the CaseRule model.

determiner.test.ts (4)

265-310: LGTM!

Excellent test coverage for HTTP-related case rules within code blocks and URLs. The tests properly validate that:

  • HTTP terms inside inline and multi-line code blocks are excluded from case checking
  • HTTP terms outside code blocks are still checked
  • URLs containing HTTP-like terms are properly excluded

417-518: LGTM!

Comprehensive test coverage for URL and path exclusion logic, covering multiple URL schemes (http, https, rtmp) and different path formats (relative, parent directory, absolute). The test validating that text outside URLs is still checked (lines 484-495) is particularly important for preventing false negatives.


520-622: LGTM!

Thorough test coverage for @ mention exclusion logic. The tests properly validate that @mentions with alphanumeric characters and underscores are excluded while text outside @mentions is still checked.


624-802: LGTM!

Excellent test coverage for the new alternatives feature. The tests comprehensively validate:

  • Valid alternatives are accepted (lines 625-639)
  • Invalid forms are flagged (lines 641-658)
  • Edge cases with null, undefined, and empty arrays (lines 660-711)
  • Multiple alternatives (lines 713-745)
  • Interaction with code block exclusions (lines 747-779)
  • Multiple rules with different alternatives (lines 781-801)

This thorough coverage gives confidence that the alternatives feature works correctly.

models.ts (2)

50-50: Type definition accepts both null and undefined.

The field accepts both null and undefined for optional values. While this was noted in a previous review, the maintainer appears to have chosen to keep both. This works correctly across the codebase (database layer, mocks, and tests all handle both values).


59-70: LGTM!

The validator correctly allows undefined, null, or an array of strings. The validation logic is sound and aligns with the TypeScript interface definition.

database-test-utils.ts (1)

76-77: LGTM!

The mock implementation correctly accepts and stores the alternatives parameter, maintaining consistency with the updated interface and the real database implementation.

database-interfaces.ts (1)

83-85: LGTM!

The interface extension to support alternatives is clean and well-documented. The optional parameter maintains backward compatibility.

database.ts (1)

140-144: LGTM!

The implementation correctly extends the database layer to persist alternatives alongside the term. The cache invalidation logic appropriately follows the insertion.

determiner.ts (9)

19-21: LGTM!

The URL_REGEX is now defined without the global flag, and a new instance with the 'g' flag is created when needed (line 57). This addresses potential statefulness issues with shared regex instances.


48-67: LGTM!

The range-based exclusion system is well-architected. Breaking down the logic into building, merging, and checking exclusion ranges makes the code modular and testable. The comprehensive test coverage validates this approach.


69-81: LGTM!

The #addMatchedRanges helper method is clean and reusable. The optional overlap checker parameter provides flexibility for different matching strategies.


83-114: Consider improving URL-in-path detection logic.

The current logic tests beforePath substring for URL patterns, but this may not correctly detect if the path is actually part of a URL (as noted in a previous review). The :// delimiter might span the boundary between beforePath and the current match.

However, since you previously indicated this approach is acceptable ("應該是還好"), and comprehensive tests are passing, the current implementation appears to work for the project's use cases.

If you'd like to make the detection more robust, consider checking if there's an overlapping URL match that contains the path range, rather than testing a substring:

// Alternative approach: check for overlapping URL ranges
let isPartOfURL = false;
const urlRegex = new RegExp(this.#URL_REGEX.source, 'g');
let urlMatch: RegExpExecArray | null;
while ((urlMatch = urlRegex.exec(content)) !== null) {
	const urlStart = urlMatch.index;
	const urlEnd = urlStart + urlMatch[0].length;
	if (start < urlEnd && end > urlStart) {
		isPartOfURL = true;
		break;
	}
}
if (isPartOfURL) {
	continue;
}

116-137: LGTM!

The range merging logic is correct and efficient. Sorting ranges by start position and merging overlapping/adjacent ranges in a single pass is the standard algorithm for this operation.


139-171: LGTM!

Excellent optimization! Automatically selecting between linear search (for ≤10 ranges) and binary search (for >10 ranges) provides good performance characteristics for both small and large exclusion lists. The binary search implementation is correct.


173-188: LGTM!

The sanitization logic correctly removes excluded ranges while preserving the remaining content.


190-232: LGTM!

The refactored general rules checking is well-structured. First checking the sanitized content for efficiency, then validating matches in the original content while skipping excluded positions is a sound approach.

Based on learnings, the RegExp construction from rule.wrong is intentional and safe since rules come from controlled MongoDB storage.


234-274: LGTM!

The case rules checking correctly implements the alternatives feature. Line 242 properly handles null/undefined alternatives with the nullish coalescing operator (??), and the validation logic correctly checks if a matched form is in the valid set before flagging it as an error.

Based on learnings, the RegExp construction from lowerTerm is safe as it comes from the controlled CaseRule database.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@pan93412 pan93412 requested a review from Copilot October 31, 2025 15:17
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds support for alternative case forms in case rules and refactors the spell-checking logic. The main changes include:

  • Adding an alternatives field to CaseRule to allow multiple valid case forms (e.g., "JavaScript" can also be "javascript")
  • Refactoring the Determiner class to use a more modular approach with separate methods for building excluded ranges, checking general rules, and checking case rules
  • Improving URL detection to use a more comprehensive regex pattern that covers various URL schemes (not just http/https)
  • Adding comprehensive test coverage for the new alternatives feature and URL/path exclusion logic

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
models.ts Adds alternatives field to CaseRule interface and schema with validation
determiner.ts Refactors spell-checking logic into modular methods and improves URL regex to support multiple schemes
determiner.test.ts Adds comprehensive test coverage for alternatives feature and URL/path exclusion scenarios
database.ts Updates addRule method to accept alternatives parameter
database-test-utils.ts Updates mock database to support alternatives parameter
database-interfaces.ts Updates interface definition to include alternatives parameter
README.md Updates documentation with correct repository URLs, adds shell syntax highlighting, documents alternatives feature, removes duplicate setup instructions

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

determiner.ts Outdated

// https://datatracker.ietf.org/doc/html/rfc1738#section-2.1
// https://datatracker.ietf.org/doc/html/rfc1738#section-3.1
#URL_REGEX = /[a-zA-Z0-9+.-]*:\/\/[^\s]+/g;
Copy link

Copilot AI Oct 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The URL regex is defined as a class property with the g flag, which makes it stateful. When reused multiple times (as in lines 57 and 98), the regex's lastIndex property won't reset between different strings, causing unpredictable matching behavior. This is a common bug with shared regex instances. Either remove the g flag and create new regex instances when needed, or reset lastIndex before each use.

Copilot uses AI. Check for mistakes.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot open a new pull request to apply changes based on this feedback

Copy link
Contributor

Copilot AI commented Nov 5, 2025

@pan93412 I've opened a new pull request, #30, to work on those changes. Once the pull request is ready, I'll request review from you.

Copilot AI and others added 2 commits November 5, 2025 20:02
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Yi-Jyun Pan <me@pan93.com>
@pan93412 pan93412 requested a review from Copilot November 5, 2025 12:13
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 7 out of 8 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@pan93412 pan93412 marked this pull request as ready for review November 5, 2025 12:15
@pan93412 pan93412 requested a review from Edit-Mr November 5, 2025 12:15
@Edit-Mr Edit-Mr merged commit 2356530 into main Nov 5, 2025
10 of 13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] app

3 participants