❌ This issue is not open for contribution. Visit Contributing guidelines to learn about the contributing process and how to find suitable issues.
Overview
Migrate le_utils/constants/languages.py from the legacy JSON-as-data approach to the modern spec + code generation system. This is the most complex module with 1,141 lines of language data, custom namedtuple properties, and multiple helper functions.
Context
Currently, le_utils/constants/languages.py uses the legacy approach:
- Loads resources/languagelookup.json (22,820 bytes, 1,141 lines)
- Custom Language namedtuple with code, id, and first_native_name properties
- Multiple helper functions: getlang(), getlang_by_name(), getlang_by_native_name(), getlang_by_alpha2()
- RTL language list: RTL_LANG_CODES
- No JavaScript export available
Current Structure
File: le_utils/resources/languagelookup.json (1,141 lines, one entry per language)
{
  "aa": {
    "name": "Afar",
    "native_name": "Afaraf"
  },
  "en": {
    "name": "English",
    "native_name": "English"
  },
  "es-MX": {
    "name": "Spanish (Mexico)",
    "native_name": "Español (México)"
  },
  ...
}

Python module has:
- Custom Language namedtuple with properties:
  - code property: combines primary_code and subcode (e.g., "en-US")
  - id property: alias for code
  - first_native_name property: first name from a comma-separated list
- Helper functions for lookups by various criteria
- RTL language codes list
Target Spec Format
Create spec/constants-languages.json with all language data:
{
  "namedtuple": {
    "name": "Language",
    "fields": ["native_name", "primary_code", "subcode", "name"],
    "properties": {
      "code": "return '{}-{}'.format(self.primary_code, self.subcode) if self.subcode else self.primary_code",
      "id": "return self.code",
      "first_native_name": "return self.native_name.split(',')[0]"
    }
  },
  "rtl_codes": ["ar", "arq", "dv", "he", "fa", "ps", "ur", "yi"],
  "constants": {
    "aa": {
      "name": "Afar",
      "native_name": "Afaraf"
    },
    "en": {
      "name": "English",
      "native_name": "English"
    },
    "es-MX": {
      "name": "Spanish (Mexico)",
      "native_name": "Español (México)"
    }
  }
}

Copy every entry from languagelookup.json (the file is 1,141 lines long). The generation script will parse language codes (e.g., "es-MX") into primary_code="es" and subcode="MX".
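The key-parsing step could look like the following minimal sketch. It assumes each key is either a bare primary code or a primary code and subcode joined by a hyphen, and it splits on the first hyphen only. split_language_code and spec_to_rows are hypothetical helper names, not existing functions in scripts/generate_from_specs.py.

```python
def split_language_code(code):
    """Split a spec key such as "es-MX" into (primary_code, subcode).

    Assumption: only the first "-" separates the two parts; a key with no
    hyphen (or an empty remainder) gets subcode None.
    """
    primary_code, _, subcode = code.partition("-")
    return primary_code, subcode or None


def spec_to_rows(spec):
    """Yield the namedtuple field values for each entry in spec["constants"]."""
    for key, entry in spec["constants"].items():
        primary_code, subcode = split_language_code(key)
        yield {
            "native_name": entry["native_name"],
            "primary_code": primary_code,
            "subcode": subcode,
            "name": entry["name"],
        }
```

Splitting on the first hyphen means the code property rebuilds the original key exactly, even for a key with more than one hyphen: a hypothetical "sr-Latn-RS" would become primary_code="sr", subcode="Latn-RS", and code would give back "sr-Latn-RS".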
Note: The properties metadata tells the generation script to add @property methods to the namedtuple class.
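One way for the generation script to turn the properties metadata into @property methods is to render the class source as text. This is a sketch, assuming each property value is a method body written as Python source (as in the spec above) and that the generated module imports namedtuple from collections; render_namedtuple is a hypothetical name.

```python
import textwrap


def render_namedtuple(nt_spec):
    """Render Python source for a namedtuple subclass with @property methods.

    nt_spec has the shape of spec["namedtuple"]: a class name, a list of
    field names, and a mapping of property name -> method body (Python source).
    The generated module must already import namedtuple from collections.
    """
    name = nt_spec["name"]
    fields = list(nt_spec["fields"])
    properties = nt_spec.get("properties", {})

    lines = ["class {0}(namedtuple({0!r}, {1!r})):".format(name, fields)]
    for prop_name, body in properties.items():
        lines.append("    @property")
        lines.append("    def {}(self):".format(prop_name))
        # Indent every line of the body under the def (multi-line bodies keep
        # their relative indentation).
        lines.append(textwrap.indent(body, " " * 8))
    if not properties:
        lines.append("    pass")  # an empty class body would be a SyntaxError

    source = "\n".join(lines) + "\n"
    # Fail at generation time, not at import time, if a body is malformed.
    compile(source, "<generated {} class>".format(name), "exec")
    return source
```

Because the bodies are pasted in verbatim, the compile() call is what catches a typo in the spec before it ever reaches le_utils/constants/languages.py.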
Generation Script Enhancement
Update scripts/generate_from_specs.py to handle:
- Namedtuple properties from properties metadata
- RTL codes list from rtl_codes metadata
- Helper functions for language lookups:
  - getlang(code) - lookup by code
  - getlang_by_name(name) - case-insensitive lookup by English name
  - getlang_by_native_name(native_name) - case-insensitive lookup by native name
  - getlang_by_alpha2(alpha2) - lookup by 2-letter code
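The helper functions do not depend on the data, so they can presumably be emitted from a fixed template; the data-dependent part is the RTL_LANG_CODES and language-list literals. A minimal sketch of emitting those for both outputs, assuming rows are dicts keyed by the four namedtuple fields; render_python_data and render_js_data are hypothetical names. Since the spec's property bodies are Python source that the JavaScript output cannot execute, this sketch computes code and first_native_name in Python at generation time, matching the precomputed fields in the JS example in this issue.

```python
import json


def render_python_data(rows, rtl_codes):
    """Render RTL_LANG_CODES and LANGUAGELIST as Python source.

    repr() produces a valid Python literal for every value, so quotes,
    backslashes and non-ASCII names such as "Español (México)" need no
    hand-written escaping.
    """
    lines = ["RTL_LANG_CODES = {!r}".format(list(rtl_codes)), "", "LANGUAGELIST = ["]
    for row in rows:
        args = ", ".join("{}={!r}".format(field, value) for field, value in row.items())
        lines.append("    Language({}),".format(args))
    lines.append("]")
    return "\n".join(lines)


def render_js_data(rows, rtl_codes):
    """Render the JS exports with code and first_native_name precomputed.

    The spec's property bodies are Python source, so the JS output cannot
    reuse them; the same logic is evaluated here, at generation time.
    """
    lines = [
        "export const RTL_LANG_CODES = {};".format(json.dumps(list(rtl_codes))),
        "",
        "export const LanguagesList = [",
    ]
    for row in rows:
        obj = dict(row)
        if row["subcode"]:
            obj["code"] = "{}-{}".format(row["primary_code"], row["subcode"])
        else:
            obj["code"] = row["primary_code"]
        obj["first_native_name"] = row["native_name"].split(",")[0]
        # A JSON object literal is valid JavaScript; None becomes null.
        lines.append("  {},".format(json.dumps(obj, ensure_ascii=False)))
    lines.append("];")
    return "\n".join(lines)
```

Using repr() and json.dumps() rather than hand-built string formatting keeps escaping correct for each target language. The emitted JS keys are quoted, which differs cosmetically from the JS example but is equivalent.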
Generated Output Example
Python (le_utils/constants/languages.py):
# Generated by scripts/generate_from_specs.py
from collections import namedtuple


class Language(namedtuple("Language", ["native_name", "primary_code", "subcode", "name"])):
    @property
    def code(self):
        return "{}-{}".format(self.primary_code, self.subcode) if self.subcode else self.primary_code

    @property
    def id(self):
        return self.code

    @property
    def first_native_name(self):
        return self.native_name.split(",")[0]


RTL_LANG_CODES = ["ar", "arq", "dv", "he", "fa", "ps", "ur", "yi"]

LANGUAGELIST = [
    Language(native_name="Afaraf", primary_code="aa", subcode=None, name="Afar"),
    Language(native_name="English", primary_code="en", subcode=None, name="English"),
    Language(native_name="Español (México)", primary_code="es", subcode="MX", name="Spanish (Mexico)"),
    # ... (one Language per entry in the spec)
]

# Lookup tables; name keys are lowercased so the name lookups are case-insensitive.
_LANGUAGELOOKUP = {lang.code: lang for lang in LANGUAGELIST}
_LANGUAGELOOKUP_BY_NAME = {lang.name.lower(): lang for lang in LANGUAGELIST}
_LANGUAGELOOKUP_BY_NATIVE_NAME = {lang.native_name.lower(): lang for lang in LANGUAGELIST}
# Only base languages (no subcode) are keyed by their primary code.
_LANGUAGELOOKUP_BY_ALPHA2 = {lang.primary_code: lang for lang in LANGUAGELIST if not lang.subcode}


def getlang(code, default=None):
    return _LANGUAGELOOKUP.get(code) or default


def getlang_by_name(name, default=None):
    return _LANGUAGELOOKUP_BY_NAME.get(name.lower()) or default


def getlang_by_native_name(native_name, default=None):
    return _LANGUAGELOOKUP_BY_NATIVE_NAME.get(native_name.lower()) or default


def getlang_by_alpha2(alpha2, default=None):
    return _LANGUAGELOOKUP_BY_ALPHA2.get(alpha2) or default

JavaScript (js/Languages.js):
// Generated by scripts/generate_from_specs.py
export const RTL_LANG_CODES = ["ar", "arq", "dv", "he", "fa", "ps", "ur", "yi"];

export const LanguagesList = [
  { native_name: "Afaraf", primary_code: "aa", subcode: null, name: "Afar", code: "aa", first_native_name: "Afaraf" },
  { native_name: "English", primary_code: "en", subcode: null, name: "English", code: "en", first_native_name: "English" },
  { native_name: "Español (México)", primary_code: "es", subcode: "MX", name: "Spanish (Mexico)", code: "es-MX", first_native_name: "Español (México)" },
  // ...
];

export const LanguagesMap = new Map(
  LanguagesList.map(lang => [lang.code, lang])
);

export function getLanguage(code) {
  return LanguagesMap.get(code) || null;
}

export function getLanguageByName(name) {
  return LanguagesList.find(lang => lang.name.toLowerCase() === name.toLowerCase()) || null;
}

export function getLanguageByNativeName(nativeName) {
  return LanguagesList.find(lang => lang.native_name.toLowerCase() === nativeName.toLowerCase()) || null;
}

export function getLanguageByAlpha2(alpha2) {
  return LanguagesList.find(lang => lang.primary_code === alpha2 && !lang.subcode) || null;
}

Testing Updates
Files: tests/test_languages.py and tests/test_getlangs.py
Update to test against spec:
import json
import os

spec_path = os.path.join(os.path.dirname(__file__), "..", "spec", "constants-languages.json")
# Native names contain non-ASCII characters, so do not rely on the locale's default encoding.
with open(spec_path, encoding="utf-8") as f:
    spec = json.load(f)
languagelookup = spec["constants"]
# Verify every language entry in the spec
# Test helper functions
# Test Language properties (code, id, first_native_name)
# Test RTL_LANG_CODES list

How to Run Tests
pytest tests/test_languages.py -v
pytest tests/test_getlangs.py -v
pytest tests/ -v

Acceptance Criteria
- spec/constants-languages.json created with every language entry from languagelookup.json
- Added properties metadata for code, id, first_native_name
- Added rtl_codes metadata
- scripts/generate_from_specs.py enhanced to generate namedtuple properties
- make build successfully generates Python and JavaScript files
- Generated le_utils/constants/languages.py has:
  - Language namedtuple with 4 fields and 3 properties
  - RTL_LANG_CODES list
  - LANGUAGELIST with every language from the spec
  - Helper functions (getlang, getlang_by_name, etc.)
  - Lookup dicts
- Generated js/Languages.js has:
  - RTL_LANG_CODES export
  - LanguagesList with computed properties (code, first_native_name)
  - LanguagesMap for lookups
  - Helper functions (getLanguage, getLanguageByName, etc.)
- Tests updated to test against spec
- All tests pass
- resources/languagelookup.json deleted
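For the "Tests updated to test against spec" criterion, a spec-parity check might look like this sketch. check_languages_against_spec and load_spec are hypothetical names; getlang and RTL_LANG_CODES are passed in as parameters so the check stays self-contained, and it assumes keys split into primary_code and subcode at the first hyphen.

```python
import json


def split_code(code):
    # Assumption: primary code and subcode are separated by the first "-".
    primary_code, _, subcode = code.partition("-")
    return primary_code, subcode or None


def check_languages_against_spec(spec, getlang, rtl_lang_codes):
    """Assert that a generated languages module agrees with the spec."""
    for code, entry in spec["constants"].items():
        lang = getlang(code)
        assert lang is not None, "missing language {}".format(code)
        assert (lang.primary_code, lang.subcode) == split_code(code)
        assert lang.name == entry["name"]
        assert lang.native_name == entry["native_name"]
        # The code and id properties must round-trip to the spec key.
        assert lang.code == code and lang.id == code
        assert lang.first_native_name == entry["native_name"].split(",")[0]
    assert list(rtl_lang_codes) == spec["rtl_codes"]


def load_spec(path):
    # Explicit encoding: native names contain non-ASCII characters.
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

In tests/test_languages.py this would be called with le_utils.constants.languages.getlang and le_utils.constants.languages.RTL_LANG_CODES, so every spec entry is checked without hard-coding a count.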
Disclosure
🤖 This issue was written by Claude Code, under supervision, review and final edits by @rtibbles 🤖
