feat(content_safety): add support to auto select multilingual refusal bot messages #1530

Pouyanpi · 2025-12-02T16:42:22Z

Description

Detect user input language and return refusal messages in the same language when content safety rails block unsafe content. Supports 9 languages: English, Spanish, Chinese, German, French, Hindi, Japanese, Arabic, and Thai.

TODO:

add tests
complete and report benchmarking

Language Detection Benchmark Results

Datasets Used

Dataset	Description	Samples	Languages
papluca	language-identification	40,500	9 (all supported)
nemotron	NVIDIA Nemotron-Safety-Guard-Dataset-v3	336,283	8 (missing zh*)

* Chinese samples in Nemotron are all REDACTED; Chinese coverage is validated via the papluca dataset.

Overall Accuracy comparison

Dataset	Samples	fast-langdetect	lingua	detect_language action
papluca	40,500	99.71%	99.79%	99.71%
nemotron	336,283	99.35%	99.46%	99.42%

Latency comparison (μs)

Dataset	fast-langdetect Avg	fast-langdetect P95	lingua Avg	lingua P95	Action Avg	Action P95
papluca	12.12	15.54	116.21	205.29	25.77	28.75
nemotron	11.53	15.50	162.59	377.92	26.25	28.71
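
For readers who want to reproduce numbers of this shape, here is a minimal sketch of how an average and a P95 latency in microseconds could be collected. It is not the benchmark script from this PR: `measure_latency_us` is a hypothetical helper, the P95 uses a simple nearest-rank rule, and the lambda in the usage note is a stand-in for a real detector.

```python
import time
from typing import Callable, Dict, Sequence


def measure_latency_us(fn: Callable[[str], object], texts: Sequence[str]) -> Dict[str, float]:
    """Time fn once per text and return average and P95 latency in microseconds."""
    if not texts:
        raise ValueError("texts must not be empty")
    samples = []
    for text in texts:
        start = time.perf_counter_ns()
        fn(text)
        samples.append((time.perf_counter_ns() - start) / 1_000)  # ns -> μs
    samples.sort()
    # Nearest-rank P95: smallest sample with at least 95% of samples at or below it.
    rank = max(1, (95 * len(samples) + 99) // 100)
    return {"avg": sum(samples) / len(samples), "p95": samples[rank - 1]}


# Stand-in detector (assumption); swap in the real detection call to benchmark it.
stats = measure_latency_us(lambda text: "en", ["hello world"] * 1000)
print(f"avg={stats['avg']:.2f}μs p95={stats['p95']:.2f}μs")
```

Timing each call separately keeps the percentile meaningful; timing the whole loop would only give an average.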

Per-Language Accuracy (fast-langdetect)

Language	papluca	nemotron
ar (Arabic)	98.87%	99.63%
de (German)	99.93%	99.39%
en (English)	100.00%	99.03%
es (Spanish)	100.00%	99.04%
fr (French)	99.98%	99.25%
hi (Hindi)	98.76%	99.60%
ja (Japanese)	100.00%	99.61%
th (Thai)	99.93%	99.29%
zh (Chinese)	99.93%	N/A

Per-Language Accuracy (lingua)

Language	papluca	nemotron
ar (Arabic)	99.84%	99.75%
de (German)	100.00%	99.55%
en (English)	99.93%	99.00%
es (Spanish)	99.98%	99.43%
fr (French)	99.82%	99.35%
hi (Hindi)	98.80%	99.81%
ja (Japanese)	100.00%	99.69%
th (Thai)	99.78%	99.12%
zh (Chinese)	99.93%	N/A
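
As an illustration of how per-language figures like those in the tables above can be derived, the sketch below computes accuracy per true language from (expected, predicted) pairs. The function name and the sample pairs are hypothetical and are not taken from the benchmark scripts.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def per_language_accuracy(pairs: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Return the fraction of correct predictions for each expected language."""
    total: Counter = Counter()
    correct: Counter = Counter()
    for expected, predicted in pairs:
        total[expected] += 1
        if predicted == expected:
            correct[expected] += 1
    return {lang: correct[lang] / total[lang] for lang in total}


# Made-up pairs for illustration only: (expected, predicted).
pairs = [("hi", "hi"), ("hi", "mr"), ("es", "es"), ("es", "es")]
print(per_language_accuracy(pairs))
```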

Why fast-langdetect?

https://github.com/LlmKira/fast-langdetect

licensing: MIT license and Creative Commons Attribution-Share-Alike License 3.0
comparable accuracy: within about 0.1 percentage points of lingua overall on both datasets (99.71% vs 99.79% on papluca, 99.35% vs 99.46% on the 336k nemotron samples)
roughly 10-14x faster: average latency ~12 μs vs ~116-163 μs for lingua
simpler integration: single lightweight dependency
no cold start issues: unlike lingua, which requires model building
no expected dependency issues in the future

Error analysis

Most errors occur with:

short text (single words): insufficient context for detection
mixed language content: text containing English within non-English context
similar language confusion: Spanish vs Galician, Hindi vs Marathi, Arabic vs Persian

The action correctly falls back to English (en) for unsupported detected languages.
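
The fallback behaviour described above can be sketched as follows. This is a simplified illustration, not the code in actions.py: `select_refusal_message` is a hypothetical helper, and the message table is a three-language subset of the one defined in the PR.

```python
from typing import Dict, Optional

# Illustrative subset; the PR defines messages for all nine supported languages.
REFUSAL_MESSAGES: Dict[str, str] = {
    "en": "I'm sorry, I can't respond to that.",
    "es": "Lo siento, no puedo responder a eso.",
    "de": "Es tut mir leid, darauf kann ich nicht antworten.",
}

DEFAULT_LANG = "en"


def select_refusal_message(
    detected_lang: Optional[str],
    messages: Dict[str, str] = REFUSAL_MESSAGES,
) -> str:
    """Return the refusal message for detected_lang, falling back to English."""
    if detected_lang is None or detected_lang not in messages:
        # Unsupported or failed detection (e.g. "gl" for Galician): use English.
        return messages[DEFAULT_LANG]
    return messages[detected_lang]


print(select_refusal_message("es"))
print(select_refusal_message("gl"))
```

With this shape, a Spanish input misdetected as Galician receives the English refusal rather than an error.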

Benchmark Scripts

Check out the temp/lang-detect-benchmark branch.

The scripts are located in eval/language_detection/. Make sure datasets and pandas are installed:

poetry run pip install pandas datasets

# run all benchmarks
poetry run python eval/language_detection/run_benchmarks.py

# Or run individually
poetry run python eval/language_detection/benchmark.py --dataset papluca --mode action --report eval/language_detection/reports/


codecov · 2025-12-02T16:49:30Z

Codecov Report

❌ Patch coverage is 44.44444% with 25 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
nemoguardrails/library/content_safety/actions.py	35.89%	25 Missing ⚠️


fix

cparisien · 2025-12-03T14:25:34Z

nemoguardrails/library/content_safety/actions.py

+DEFAULT_REFUSAL_MESSAGES: Dict[str, str] = {
+    "en": "I'm sorry, I can't respond to that.",
+    "es": "Lo siento, no puedo responder a eso.",
+    "zh": "抱歉，我无法回应。",
+    "de": "Es tut mir leid, darauf kann ich nicht antworten.",
+    "fr": "Je suis désolé, je ne peux pas répondre à cela.",
+    "hi": "मुझे खेद है, मैं इसका जवाब नहीं दे सकता।",
+    "ja": "申し訳ありませんが、それには回答できません。",
+    "ar": "عذراً، لا أستطيع الرد على ذلك.",
+    "th": "ขออภัย ฉันไม่สามารถตอบได้",
+}


If we later had other multilingual rails, would we be repeating this mechanism in each rail? Or just the set of supported languages per rail? I don't think we need to do it now (since we don't have other multilingual rails to test it), but we should be aware of what refactoring would be needed to move the below language detection to a shared level.

cparisien · 2025-12-03T14:40:42Z

nemoguardrails/library/content_safety/actions.py

+    try:
+        from fast_langdetect import detect
+
+        result = detect(text, k=1)


Does fast-langdetect ever return a full locale with dialect, like en-US versus en? I don't see it in the docs, but I do see some upper/lowercase inconsistency.
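
Whether the library ever returns such tags is left open here. One defensive option, sketched below under that uncertainty, is to normalize whatever code comes back before looking it up in the refusal message table. `normalize_lang_code` is a hypothetical helper, not part of fast-langdetect or of this PR.

```python
def normalize_lang_code(raw: str) -> str:
    """Reduce a tag such as 'en-US', 'EN', or 'zh_Hans' to a lowercase primary subtag."""
    tag = raw.strip().replace("_", "-")
    primary = tag.split("-", 1)[0]
    return primary.lower()


for raw in ("en-US", "EN", "zh_Hans", "th"):
    print(raw, "->", normalize_lang_code(raw))
```

Dropping the region or script subtag is a deliberate simplification: it fits a table keyed by two-letter codes, but it would not distinguish variants such as Simplified and Traditional Chinese.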

tgasser-nv · 2025-12-03T14:59:47Z

This looks really good @Pouyanpi ! I have a few comments:

Could you commit the evaluation scripts as well in the final PR (for reproducibility)?
What does the "Action" column in the latency report refer to? Is this the end-to-end latency when fast-langdetect is embedded in a Guardrails action? It approximately doubles the mean and p95.
Is it possible to customize refusal texts in Colang-only, or does it need a Python change? Just saw this is in the RailsConfig, that's perfect.
Could you calculate percentiles of prompt-length (ideally in tokens but characters is fine too) for each of the datasets?
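
For the prompt-length request, a small sketch of character-length percentiles is shown below. It uses a nearest-rank definition and a hypothetical helper name; the sample texts are made up, and a real run would pass the text column of each dataset instead.

```python
from typing import Dict, Iterable, Sequence


def length_percentiles(
    texts: Iterable[str],
    percentiles: Sequence[int] = (50, 90, 95, 99),
) -> Dict[int, int]:
    """Return nearest-rank percentiles of prompt length, measured in characters."""
    lengths = sorted(len(text) for text in texts)
    if not lengths:
        raise ValueError("texts must not be empty")
    result: Dict[int, int] = {}
    for p in percentiles:
        # Integer ceiling of p% of the sample count, clamped to at least rank 1.
        rank = max(1, (p * len(lengths) + 99) // 100)
        result[p] = lengths[rank - 1]
    return result


# Made-up prompts for illustration only.
print(length_percentiles(["hola", "guten Tag", "how are you today?", "こんにちは"]))
```

Swapping `len(text)` for a tokenizer call would give the token-based variant mentioned above.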

Not needed in this PR, but I'm thinking of RAG prompts where the LLM instructions, user query, and relevant context chunks are all flattened into a single prompt. These prompts can be pretty long (up to 7k tokens in some cases). I would be interested in a follow-on where we sample part of a prompt (e.g. 200 chars) and run classification on the sample only. This would be an optional config field. Customers would then have a knob to trade off accuracy vs latency for language detection.
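
A rough sketch of the sampling idea, assuming the simplest strategy of taking the head of the prompt. `sample_for_detection` and its `max_chars` parameter are hypothetical names, not an existing config field.

```python
from typing import Optional


def sample_for_detection(text: str, max_chars: Optional[int] = 200) -> str:
    """Return at most max_chars leading characters of text; None disables sampling."""
    if max_chars is None or len(text) <= max_chars:
        return text
    head = text[:max_chars]
    # Avoid ending on a partial word when there is a space to cut at.
    cut = head.rfind(" ")
    return head[:cut] if cut > 0 else head


long_prompt = "¿Cuál es la capital de Francia? " * 50
print(len(long_prompt), len(sample_for_detection(long_prompt)))
```

Taking the head is only one choice. In a flattened RAG prompt the first characters may be the LLM instructions rather than the user query, so a real implementation might sample from a different region.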

Pouyanpi added 2 commits December 3, 2025 12:29

add tests

8b18bb5

fix

fix

1a6ea08

cparisien reviewed Dec 3, 2025
