Description
Hello, I am trying to have MetaMap process some translated German texts that include words containing the letter 'ß'.
After analyzing why the JSON output breaks, I found that the character 'ß' seems to cause an error when it appears inside a word (not as a standalone character).
Example request:
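from skr_web_api import Submission, METAMAP_INTERACTIVE_URL

# email and apikey hold my account credentials and are defined elsewhere
args = "-AI -R SNOMEDCT_US_2022_03_01 --JSONf 2 -V USAbase -Z 2022AA"
inst = Submission(email, apikey)
# the input text contains the non-ASCII character 'ß' inside a word
inst.init_mm_interactive('This is a test with Straße', args=args)
response = inst.submit()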
When I decode the content of the response via response.content.decode(), it returns a broken JSON string (broken because it is never closed at the end and seems to be cut off):
/dmzfiler/II_Group/MetaMap2020/public_mm/bin/SKRrun.20 /dmzfiler/II_Group/MetaMap2020/public_mm/bin/metamap20.BINARY.Linux --lexicon db -Z 2022AA --silent -AI -R SNOMEDCT_US_2022_03_01 --JSONf 2 -V USAbase
{"AllDocuments":[
{
"Document": {
"CmdLine": {
"Command": "metamap --lexicon db -Z 2022AA --silent -AI -R SNOMEDCT_US_2022_03_01 --JSONf 2 -V USAbase",
"Options": [
{
"OptName": "lexicon",
"OptValue": "db"
},
{
"OptName": "mm_data_year",
"OptValue": "2022AA"
},
{
"OptName": "silent"
},
{
"OptName": "strict_model"
},
{
"OptName": "show_cuis"
},
{
"OptName": "restrict_to_sources",
"OptValue": ["SNOMEDCT_US_2022_03_01"]
},
{
"OptName": "JSONf",
"OptValue": "2"
},
{
"OptName": "mm_data_version",
"OptValue": "USAbase"
},
{
"OptName": "infile",
"OptValue": "user_input"
},
{
"OptName": "outfile",
"OptValue": "user_output"
}]
},
"AAs": [],
"Negations": [],
"Utterances": [
{
"PMID": "USER",
"UttSection": "tx",
"UttNum": "1",
"UttText": [
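To check that the payload is really cut off, the decoded response can be run through a JSON parser. The sketch below is only an illustration: the helper names `extract_json_payload` and `is_complete_json` are my own, and the sample string is a shortened stand-in for the output above, not a real response. It assumes the first line of the response is the echoed command line and that the JSON starts at the first '{'.

```python
import json


def extract_json_payload(raw: str) -> str:
    """Drop everything before the first '{' (e.g. the echoed command line)."""
    start = raw.find("{")
    return raw[start:] if start != -1 else ""


def is_complete_json(text: str) -> bool:
    """Return True if text parses as JSON, False if it is empty or malformed."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# Shortened stand-in for the truncated response shown above (hypothetical sample).
truncated = (
    "/path/to/SKRrun /path/to/metamap --JSONf 2\n"
    '{"AllDocuments":[\n{\n"Document": {\n"Utterances": [\n{\n"UttText": ['
)

print(is_complete_json(extract_json_payload(truncated)))
```

With the real response, the same check would be `is_complete_json(extract_json_payload(response.content.decode()))`.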
A partial workaround would be to replace the character 'ß' with 'ss' before submitting, but I am not sure whether the results would be the same as with the online version of MetaMap, where words containing 'ß' are not a problem:
Request:
User Information: fu-sung.kim-benjamin.tang@rwth-aachen.de
Run Time: 12/06/2022 06:12:29
MetaMap Version Used: metamap20
MetaMap Options: -A+ -R SNOMEDCT_US_2022_03_01 --JSONf 2 -V USAbase
Knowledge Source Used: 2022AA
Input Text:
This is a test with Straße
Output:
{
"Document": {
"CmdLine": {
"Command": "metamap --lexicon db -Z 2022AA -A+ -R SNOMEDCT_US_2022_03_01 --JSONf 2 -V USAbase /usr/local/apache/htdocs/II/Scheduler/foo/inter_12062022_06:12:29_95743_fu-sung.kim-benjamin.tang@rwth-aachen.de_124752701.tmp /usr/local/apache/htdocs/II/Scheduler/foo/inter_12062022_06:12:29_95743_fu-sung.kim-benjamin.tang@rwth-aachen.de_124752701.out",
"Options": [
{
"OptName": "lexicon",
"OptValue": "db"
},
{
"OptName": "mm_data_year",
"OptValue": "2022AA"
},
{
"OptName": "strict_model"
},
{
"OptName": "bracketed_output"
},
{
"OptName": "restrict_to_sources",
"OptValue": ["SNOMEDCT_US_2022_03_01"]
},
{
"OptName": "JSONf",
"OptValue": "2"
},
{
"OptName": "mm_data_version",
"OptValue": "USAbase"
},
{
"OptName": "infile",
"OptValue": "/usr/local/apache/htdocs/II/Scheduler/foo/inter_12062022_06:12:29_95743_fu-sung.kim-benjamin.tang@rwth-aachen.de_124752701.tmp"
},
{
"OptName": "outfile",
"OptValue": "/usr/local/apache/htdocs/II/Scheduler/foo/inter_12062022_06:12:29_95743_fu-sung.kim-benjamin.tang@rwth-aachen.de_124752701.out"
}]
},
"AAs": [],
"Negations": [],
"Utterances": [
{
"PMID": "inter_12062022_06:12:29_95743_fu-sung.kim-benjamin.tang@rwth-aachen.de_124752701.tmp",
"UttSection": "tx",
"UttNum": "1",
"UttText": "This is a test with Straße",
"UttStartPos": "0",
"UttLength": "26",
"Phrases": [
{
"PhraseText": "This",
"SyntaxUnits": [
{
"SyntaxType": "pron",
"LexMatch": "this",
"InputMatch": "This",
"LexCat": "pron",
"Tokens": ["this"]
}],
"PhraseStartPos": "0",
"PhraseLength": "4",
"Candidates": [],
"Mappings": []
},
{
"PhraseText": "is",
"SyntaxUnits": [
{
"SyntaxType": "aux",
"LexMatch": "is",
"InputMatch": "is",
"LexCat": "aux",
"Tokens": ["is"]
}],
"PhraseStartPos": "5",
"PhraseLength": "2",
"Candidates": [],
"Mappings": []
},
{
"PhraseText": "a test with Straße",
"SyntaxUnits": [
{
"SyntaxType": "det",
"LexMatch": "a",
"InputMatch": "a",
"LexCat": "det",
"Tokens": ["a"]
},
{
"SyntaxType": "head",
"LexMatch": "test",
"InputMatch": "test",
"LexCat": "noun",
"Tokens": ["test"]
},
{
"SyntaxType": "prep",
"LexMatch": "with",
"InputMatch": "with",
"LexCat": "prep",
"Tokens": ["with"]
},
{
"SyntaxType": "mod",
"InputMatch": "Straße",
"LexCat": "noun",
"Tokens": ["straße"]
}],
"PhraseStartPos": "8",
"PhraseLength": "18",
"Candidates": [],
"Mappings": [
{
"MappingScore": "-770",
"MappingCandidates": [
{
"CandidateScore": "-770",
"CandidateCUI": "C0022885",
"CandidateMatched": "Laboratory procedures",
"CandidatePreferred": "Laboratory Procedures",
"MatchedWords": ["test"],
"SemTypes": ["lbpr"],
"MatchMaps": [
{
"TextMatchStart": "2",
"TextMatchEnd": "2",
"ConcMatchStart": "1",
"ConcMatchEnd": "1",
"LexVariation": "0"
}],
"IsHead": "yes",
"IsOverMatch": "no",
"Sources": ["SNOMEDCT_US"],
"ConceptPIs": [
{
"StartPos": "10",
"Length": "4"
}],
"Status": "0",
"Negated": "0"
}]
},
{
"MappingScore": "-770",
"MappingCandidates": [
{
"CandidateScore": "-770",
"CandidateCUI": "C0392366",
"CandidateMatched": "Tests (qualifier value)",
"CandidatePreferred": "Tests (qualifier value)",
"MatchedWords": ["test"],
"SemTypes": ["inpr"],
"MatchMaps": [
{
"TextMatchStart": "2",
"TextMatchEnd": "2",
"ConcMatchStart": "1",
"ConcMatchEnd": "1",
"LexVariation": "0"
}],
"IsHead": "yes",
"IsOverMatch": "no",
"Sources": ["SNOMEDCT_US"],
"ConceptPIs": [
{
"StartPos": "10",
"Length": "4"
}],
"Status": "0",
"Negated": "0"
}]
},
{
"MappingScore": "-770",
"MappingCandidates": [
{
"CandidateScore": "-770",
"CandidateCUI": "C0456984",
"CandidateMatched": "Test finding",
"CandidatePreferred": "Test Result",
"MatchedWords": ["test"],
"SemTypes": ["lbtr"],
"MatchMaps": [
{
"TextMatchStart": "2",
"TextMatchEnd": "2",
"ConcMatchStart": "1",
"ConcMatchEnd": "1",
"LexVariation": "0"
}],
"IsHead": "yes",
"IsOverMatch": "no",
"Sources": ["SNOMEDCT_US"],
"ConceptPIs": [
{
"StartPos": "10",
"Length": "4"
}],
"Status": "0",
"Negated": "0"
}]
}]
}]
}]
}
}
]}
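For reference, the 'ß' to 'ss' replacement mentioned earlier could look roughly like this. It is a minimal sketch, not a confirmed fix: `replace_eszett` is a hypothetical helper name, and I have not verified that the API returns the same mappings for the modified text as the online version does for the original.

```python
def replace_eszett(text: str) -> str:
    """Replace 'ß' with 'ss' and the capital form 'ẞ' with 'SS'."""
    return text.replace("ß", "ss").replace("ẞ", "SS")


original = "This is a test with Straße"
cleaned = replace_eszett(original)

print(cleaned)
# The replacement makes the text one character longer per 'ß'.
print(len(original), len(cleaned))
```

The cleaned string would then be passed to inst.init_mm_interactive(cleaned, args=args). One caveat: because 'ss' is two characters, position fields such as UttLength and PhraseLength would refer to the modified text and no longer line up with the original input.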
Can this be fixed by adjusting the MetaMap API to match the procedure of the MetaMap Online version?