feat: Enhance normalization rules for converting Chinese numerals to Arabic numerals #320
Open
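Changes
This PR enhances the normalization of numbers in Chinese text. The main changes are:
1. Consecutive Chinese numeral handling
   - 11、12、13
   - 22、23
   - 15、16 (separated by 、)
2. New range recognition rule
   - 21-22 (a range formed by two consecutive complete numbers)
3. Mixed digit + English handling
   - 4a级景区
4. Improved unit word handling
   - 6万 (the digit and the unit are separated correctly)
5. Date keyword expansion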
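As a rough illustration of how rules like these might fit together, the following Python sketch applies a short ordered list of regex substitutions. It is not the code from this PR: the function `normalize`, the helper names, the `DATE_KEYWORDS` and `UNITS` values, and the rule order are all assumptions made for illustration, and the sketch only handles two-digit numerals built on 十.

```python
import re

DIGITS = {c: i for i, c in enumerate("零一二三四五六七八九")}
D = "一二三四五六七八九"        # non-zero digit characters
DATE_KEYWORDS = ("财年", "年")   # hypothetical keyword list, for illustration only
UNITS = "万亿"                   # hypothetical unit list, for illustration only


def _num(tens: str, ones: str) -> int:
    """Value of a two-digit numeral such as 十一 (tens='') or 二十二 (tens='二')."""
    return (DIGITS[tens] if tens else 1) * 10 + DIGITS[ones]


def _year(m: re.Match) -> str:
    # Read digit by digit: 二零二三 -> 2023.
    return "".join(str(DIGITS[c]) for c in m[0])


def _range(m: re.Match) -> str:
    # Two adjacent complete numbers: 二十一二十二 -> 21-22.
    return f"{_num(m[1], m[2])}-{_num(m[3], m[4])}"


def _enumeration(m: re.Match) -> str:
    # One complete number followed by bare ones digits: 十一二三 -> 11、12、13.
    return "、".join(str(_num(m[1], ones)) for ones in m[2] + m[3])


def _single(m: re.Match) -> str:
    # A lone digit before a Latin letter or a unit word: 四a -> 4a, 六万 -> 6万.
    return str(DIGITS[m[0]])


_KEYWORDS = "|".join(DATE_KEYWORDS)

# Order matters: each rule sees the output of the rules before it.
RULES = [
    (re.compile(rf"[零{D}]{{4}}(?={_KEYWORDS})"), _year),
    (re.compile(rf"([{D}]?)十([{D}])([{D}]?)十([{D}])"), _range),
    (re.compile(rf"([{D}]?)十([{D}])([{D}]+)"), _enumeration),
    (re.compile(rf"(?<![零{D}十百千])[{D}](?=[A-Za-z{UNITS}])"), _single),
]


def normalize(text: str) -> str:
    """Apply every rule in order and return the normalized text."""
    for pattern, repl in RULES:
        text = pattern.sub(repl, text)
    return text


if __name__ == "__main__":
    for sample in ("十一二三月份", "四a级景区和六万游客", "二零二三财年", "从二十一二十二章"):
        print(sample, "→", normalize(sample))
```

In this sketch the range rule runs before the enumeration rule. With the order reversed, 二十一二十二 would be read as 21、22 followed by a stray 十二. The negative lookbehind in the last rule keeps it from rewriting only the final digit of a longer numeral such as 三十六万.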
Test case examples
- "十一二三月份" → output: "11、12、13月份"
- "四a级景区和六万游客" → output: "4a级景区和6万游客"
- "二零二三财年" → output: "2023财年" (requires date recognition)
- "从二十一二十二章" → output: "从21-22章"
Problems solved
Scope of impact