Hello,
I have been using xml2map for a while now, and it has been very robust and stable!
From the very beginning I have sanitized the string containing the XML with strings.ToValidUTF8 before decoding it:
Function ToValidUTF8
	// ToValidUTF8 returns a copy of the string s with each run of invalid UTF-8 byte sequences
	// replaced by the replacement string, which may be empty.
	func ToValidUTF8(s, replacement string) string {
		var b Builder

		for i, c := range s {
			if c != utf8.RuneError {
				continue
			}

			_, wid := utf8.DecodeRuneInString(s[i:])
			if wid == 1 {
				b.Grow(len(s) + len(replacement))
				b.WriteString(s[:i])
				s = s[i:]
				break
			}
		}

		// Fast path for unchanged input
		if b.Cap() == 0 { // didn't call b.Grow above
			return s
		}

		invalid := false // previous byte was from an invalid UTF-8 sequence
		for i := 0; i < len(s); {
			c := s[i]
			if c < utf8.RuneSelf {
				i++
				invalid = false
				b.WriteByte(c)
				continue
			}
			_, wid := utf8.DecodeRuneInString(s[i:])
			if wid == 1 {
				i++
				if !invalid {
					invalid = true
					b.WriteString(replacement)
				}
				continue
			}
			invalid = false
			b.WriteString(s[i : i+wid])
			i += wid
		}

		return b.String()
	}

I am executing xml2map like this:
	// Convert the bytes to a string
	str := string(*b)
	// Strip invalid UTF-8
	str = strings.ToValidUTF8(str, "")
	decoder := xml2map.NewDecoder(strings.NewReader(str))
	result, err := decoder.Decode()
	if err != nil {
		zap.L().Error("Could not unmarshal XML", zap.Error(err), zap.String("XML", str))
		return err
	}

For the past few days there seems to be an uncaught case that I am having a hard time tracking down. It seems to be related to the illegal character code U+000B.
Do you have a robust way to strip out every character xml2map has problems with?
Thanks a lot in advance.
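
For reference, here is a minimal sketch of the kind of filter I have in mind. U+000B (vertical tab) is valid UTF-8, so strings.ToValidUTF8 leaves it in place, but it is outside the character range that XML 1.0 allows (U+0009, U+000A, U+000D, U+0020 to U+D7FF, U+E000 to U+FFFD, U+10000 to U+10FFFF). The sketch assumes that xml2map rejects exactly the characters outside that range, which I have not verified. The names isXMLChar and stripInvalidXMLChars are hypothetical helpers of my own and not part of xml2map.

```go
package main

import (
	"fmt"
	"strings"
)

// isXMLChar reports whether r falls inside the XML 1.0 "Char" production.
// Hypothetical helper, not part of xml2map.
func isXMLChar(r rune) bool {
	return r == 0x09 || r == 0x0A || r == 0x0D ||
		(r >= 0x20 && r <= 0xD7FF) ||
		(r >= 0xE000 && r <= 0xFFFD) ||
		(r >= 0x10000 && r <= 0x10FFFF)
}

// stripInvalidXMLChars drops invalid UTF-8 bytes first, then drops every
// rune that XML 1.0 does not allow. Hypothetical helper; it assumes the
// decoder only objects to characters outside the XML 1.0 range.
func stripInvalidXMLChars(s string) string {
	// Remove invalid byte sequences so they are not turned into U+FFFD below.
	s = strings.ToValidUTF8(s, "")
	return strings.Map(func(r rune) rune {
		if isXMLChar(r) {
			return r
		}
		return -1 // a negative return value tells strings.Map to drop the rune
	}, s)
}

func main() {
	in := "<a>x\x0by\xffz</a>" // contains U+000B and an invalid UTF-8 byte
	fmt.Printf("%q\n", stripInvalidXMLChars(in)) // prints "<a>xyz</a>"
}
```

In my snippet above this would replace the strings.ToValidUTF8(str, "") call, i.e. str = stripInvalidXMLChars(str), before the string is handed to xml2map.NewDecoder. Note that it removes the offending characters silently, so it is only appropriate if losing them is acceptable.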