feat(content_safety): add support to auto select multilingual refusal bot messages #1530
base: develop
…age support Detect user input language and return refusal messages in the same language when content safety rails block unsafe content. Supports 9 languages: English, Spanish, Chinese, German, French, Hindi, Japanese, Arabic, and Thai.
DEFAULT_REFUSAL_MESSAGES: Dict[str, str] = {
    "en": "I'm sorry, I can't respond to that.",
    "es": "Lo siento, no puedo responder a eso.",
    "zh": "抱歉,我无法回应。",
    "de": "Es tut mir leid, darauf kann ich nicht antworten.",
    "fr": "Je suis désolé, je ne peux pas répondre à cela.",
    "hi": "मुझे खेद है, मैं इसका जवाब नहीं दे सकता।",
    "ja": "申し訳ありませんが、それには回答できません。",
    "ar": "عذراً، لا أستطيع الرد على ذلك.",
    "th": "ขออภัย ฉันไม่สามารถตอบได้",
}
If we later had other multilingual rails, would we be repeating this mechanism in each rail? Or just the set of supported languages per rail? I don't think we need to do it now (since we don't have other multilingual rails to test it), but we should be aware of what refactoring would be needed to move the below language detection to a shared level.
try:
    from fast_langdetect import detect

    result = detect(text, k=1)
Does fast-langdetect ever return a full locale with dialect, like en-US versus en? I don't see it in the docs, but I do see some upper/lowercase inconsistency.
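One defensive option, regardless of what the detector actually returns, is to normalize the code before looking it up in the refusal-message table. The sketch below is illustrative only: `normalize_lang_code` is a hypothetical helper, not part of this PR or of fast-langdetect, and it assumes that a tag such as `en-US` or `ZH_cn` should collapse to its lowercase primary subtag.

```python
def normalize_lang_code(code: str) -> str:
    """Reduce a language tag to its lowercase primary subtag.

    Hypothetical helper: handles both '-' and '_' separators and
    mixed casing, e.g. 'en-US' -> 'en', 'ZH_cn' -> 'zh'.
    """
    # Unify separators, keep only the part before the first one.
    primary = code.strip().replace("_", "-").split("-", 1)[0]
    return primary.lower()
```

With this in place, the dictionary lookup only ever sees keys shaped like `en` or `zh`, so casing or dialect suffixes cannot cause an accidental fallback.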
This looks really good @Pouyanpi! I have a few comments:
Not needed in this PR, but I'm thinking of RAG prompts where the LLM instructions, user query, and relevant context chunks are all flattened into a single prompt. These prompts can be pretty long (up to 7k tokens in some cases). I would be interested in a follow-on where we sample part of the prompt (e.g. 200 chars) and run classification on the sample only. This would be an optional config field, giving customers a knob to trade off accuracy vs. latency for language detection.
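A rough sketch of what such sampling could look like follows. Everything here is an assumption for discussion: `sample_for_detection` and its `max_chars` parameter are hypothetical names, and taking the leading characters is only one possible strategy (for a flattened RAG prompt, sampling the user query portion may be more representative).

```python
from typing import Optional


def sample_for_detection(text: str, max_chars: Optional[int] = 200) -> str:
    """Return a short prefix of `text` to feed the language detector.

    Hypothetical helper: `max_chars=None` disables sampling and
    returns the full text unchanged.
    """
    if max_chars is None or len(text) <= max_chars:
        return text
    sample = text[:max_chars]
    # Prefer to cut at the last space so a word is not split in half;
    # scripts without spaces (e.g. Chinese, Thai) keep the hard cut.
    cut = sample.rfind(" ")
    return sample[:cut] if cut > 0 else sample
```

The detector would then be called on `sample_for_detection(text, max_chars)` instead of `text`, with `max_chars` read from the optional config field.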
Description
Detect user input language and return refusal messages in the same language when content safety rails block unsafe content. Supports 9 languages: English, Spanish, Chinese, German, French, Hindi, Japanese, Arabic, and Thai.
TODO:
Language Detection Benchmark Results
Datasets Used
Chinese samples in Nemotron are all REDACTED; Chinese coverage validated via papluca dataset.
Overall Accuracy comparison
Latency comparison (μs)
Per-Language Accuracy (fast-langdetect)
Per-Language Accuracy (lingua)
Why fast-langdetect?
https://github.com/LlmKira/fast-langdetect
Error analysis
Most errors occur with:
The action correctly falls back to English (en) for unsupported detected languages.
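The fallback behaviour can be illustrated with a minimal sketch. This is not the PR's implementation: `pick_refusal` and the shortened `REFUSALS` table are hypothetical, and the sketch assumes the detector yields either a two-letter code or `None`.

```python
from typing import Dict, Optional

# Shortened, illustrative table; the PR defines nine languages.
REFUSALS: Dict[str, str] = {
    "en": "I'm sorry, I can't respond to that.",
    "es": "Lo siento, no puedo responder a eso.",
    "fr": "Je suis désolé, je ne peux pas répondre à cela.",
}


def pick_refusal(
    detected: Optional[str],
    messages: Dict[str, str] = REFUSALS,
    default_lang: str = "en",
) -> str:
    """Return the refusal for `detected`, or the default-language one."""
    if detected and detected in messages:
        return messages[detected]
    # Unsupported or undetected language: fall back to the default.
    return messages[default_lang]
```

Passing the message table as a parameter also means a different rail could supply its own set of supported languages while reusing the same lookup.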
Benchmark Scripts
Check out the temp/lang-detect-benchmark branch.
Located in eval/language_detection/:
Make sure datasets and pandas are installed: