
XML syntax error on line 213: illegal character code U+000B #11

@SHU-red

Hello,
I have been using xml2map for a while now, and it has been very robust and stable!

From the very beginning, I have been preparing the string containing the XML with the standard library function strings.ToValidUTF8.

Function ToValidUTF8 (Go standard library, package strings):
// ToValidUTF8 returns a copy of the string s with each run of invalid UTF-8 byte sequences
// replaced by the replacement string, which may be empty.
func ToValidUTF8(s, replacement string) string {
	var b Builder

	for i, c := range s {
		if c != utf8.RuneError {
			continue
		}

		_, wid := utf8.DecodeRuneInString(s[i:])
		if wid == 1 {
			b.Grow(len(s) + len(replacement))
			b.WriteString(s[:i])
			s = s[i:]
			break
		}
	}

	// Fast path for unchanged input
	if b.Cap() == 0 { // didn't call b.Grow above
		return s
	}

	invalid := false // previous byte was from an invalid UTF-8 sequence
	for i := 0; i < len(s); {
		c := s[i]
		if c < utf8.RuneSelf {
			i++
			invalid = false
			b.WriteByte(c)
			continue
		}
		_, wid := utf8.DecodeRuneInString(s[i:])
		if wid == 1 {
			i++
			if !invalid {
				invalid = true
				b.WriteString(replacement)
			}
			continue
		}
		invalid = false
		b.WriteString(s[i : i+wid])
		i += wid
	}

	return b.String()
}

I am calling xml2map like this:

// Convert the byte slice to a string
str := string(*b)

// Strip invalid UTF-8 byte sequences
str = strings.ToValidUTF8(str, "")

decoder := xml2map.NewDecoder(strings.NewReader(str))
result, err := decoder.Decode()
if err != nil {
	zap.L().Error("Could not unmarshal XML", zap.Error(err), zap.String("XML", str))
	return err
}

For the past few days, though, there seems to be an uncaught case, which I am having a hard time tracking down, that causes the error "illegal character code U+000B".

Do you have a robust way to strip out every character that xml2map has problems with?

Thanks a lot in advance
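One possible approach is sketched below. It assumes that dropping the offending characters, rather than escaping or replacing them, is acceptable for this data. First run strings.ToValidUTF8, then run strings.Map with a predicate for the XML 1.0 Char range. The names isXMLChar and stripInvalidXMLChars are hypothetical and are not part of xml2map.

```go
package main

import (
	"fmt"
	"strings"
)

// isXMLChar (hypothetical) reports whether r matches the XML 1.0 "Char"
// production: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF].
func isXMLChar(r rune) bool {
	return r == 0x09 || r == 0x0A || r == 0x0D ||
		(r >= 0x20 && r <= 0xD7FF) ||
		(r >= 0xE000 && r <= 0xFFFD) ||
		(r >= 0x10000 && r <= 0x10FFFF)
}

// stripInvalidXMLChars (hypothetical) drops invalid UTF-8 sequences first,
// then drops every rune that XML 1.0 does not allow.
func stripInvalidXMLChars(s string) string {
	// Without this step, strings.Map would turn each invalid byte into
	// U+FFFD, which XML allows, so the garbage would silently survive.
	s = strings.ToValidUTF8(s, "")
	return strings.Map(func(r rune) rune {
		if isXMLChar(r) {
			return r
		}
		return -1 // a negative value tells strings.Map to drop the rune
	}, s)
}

func main() {
	fmt.Println(stripInvalidXMLChars("<a>line\vbreak</a>")) // <a>linebreak</a>
}
```

In the snippet above, stripInvalidXMLChars(str) would take the place of the strings.ToValidUTF8 call before xml2map.NewDecoder. Because the filter works on the raw string, it also removes such characters inside tags and attributes, not only in text content.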
