Fix two out-of-bounds read issues when handling truncated UTF-8 input #1005
+110
−4
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Two independent out-of-bounds read issues were identified in OpenCC's UTF-8 processing logic when handling malformed or truncated UTF-8 sequences.
MaxMatchSegmentation:
NextCharLength() could return a value larger than the remaining input size.
The previous logic subtracted this value from a size_t length counter,
potentially causing underflow and subsequent out-of-bounds reads.
Conversion:
Similar length handling could allow reads past the end of the input buffer
during dictionary matching, potentially propagating unintended bytes to the
conversion output.
This patch fixes both issues by:
The changes preserve existing behavior for valid UTF-8 input and add test coverage for truncated UTF-8 sequences.
These issues may have security implications when processing untrusted input and are classified as heap out-of-bounds reads (CWE-125).