CDISC CORE exits when it encounters non-ASCII characters #936

@MkJagg

I have several xpt datasets that use the non-ASCII mu character to represent a unit, e.g. μm.
The controlled term for this unit is "um".
If I run the datasets through CDISC CORE 0.8.1, then instead of reporting a non-controlled term, CDISC CORE aborts early (it doesn't check the remaining xpt files in the folder) with the following error:
File "pyreadstat\_readstat_parser.pyx", line 300, in pyreadstat._readstat_parser.convert_readstat_to_python_value
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb5 in position 0: invalid start byte
[26824] Failed to execute script 'core' due to unhandled exception!
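
The decode error itself can be reproduced without any xpt file. The following is a minimal sketch, assuming the unit was stored in a single-byte encoding such as Latin-1 or Windows-1252, where byte 0xb5 is the micro sign (U+00B5). That assumption is inferred from the byte in the traceback, not confirmed from the datasets themselves.

```python
# Assumption: the xpt file stores the unit as single-byte Latin-1 / Windows-1252 text,
# so the unit "µm" is the two bytes 0xb5 0x6d.
raw = b"\xb5m"

# Decoding as UTF-8 fails because 0xb5 is a continuation byte and cannot start a character.
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    message = str(exc)
    print(message)

# Decoding with the assumed single-byte encoding succeeds and yields the micro sign.
text = raw.decode("latin-1")
print(text)
```

If the files were indeed written in a single-byte encoding, reading them with that encoding may avoid the exception (pyreadstat's read functions accept an `encoding` argument). Whether CORE exposes such an option is not established here.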

If I run the same xpt file with the μ removed, CDISC CORE proceeds and produces output as expected.

Expected result: CDISC CORE would detect the non-controlled term and report a rule failure for the value. All remaining xpt files in the folder would also be processed. CDISC CORE would not abort early.
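
The "keep going" part of the expected behaviour can be sketched as below. This is only an illustration of the pattern, not CORE's actual code: `validate_all`, `fake_load`, and the file names are hypothetical, and `fake_load` stands in for whatever reads a dataset.

```python
from typing import Callable, Dict, Iterable, Tuple


def validate_all(
    paths: Iterable[str], load: Callable[[str], object]
) -> Tuple[Dict[str, object], Dict[str, str]]:
    """Call load() on every path, recording decode failures instead of aborting."""
    loaded: Dict[str, object] = {}
    failures: Dict[str, str] = {}
    for path in paths:
        try:
            loaded[path] = load(path)
        except UnicodeDecodeError as exc:
            # Record the problem for the report and move on to the next file.
            failures[path] = f"could not decode text: {exc}"
    return loaded, failures


def fake_load(path: str) -> str:
    """Hypothetical loader: one file contains a byte that is not valid UTF-8."""
    if path == "bad.xpt":
        return b"\xb5m".decode("utf-8")  # raises UnicodeDecodeError
    return "ok"


loaded, failures = validate_all(["a.xpt", "bad.xpt", "c.xpt"], fake_load)
print(sorted(loaded))
print(failures)
```

With this structure the undecodable file ends up in `failures` while the files before and after it are still processed.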

I was using the SENDIG standard, version 3.1, but I get the same result no matter which version I check against. The output format (xlsx or json) has no effect, and neither does running with or without a CT version.

Labels: question (Further information is requested), reporting (Output reports and error handling)