Infinite loop when inline code contains two or more </ #1578

@btucker

Description

Text containing two or more `</` sequences causes an infinite loop.

import markdown
markdown.markdown("`</` and `</`")  # Hangs

The same hang can be reproduced by opening a second HTML comment without closing the first, e.g.:

import markdown
markdown.markdown("<!-- <!--")  # Hangs
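Because both snippets above hang the interpreter, it can be convenient to run them in a child process with a timeout. The following is a minimal sketch using only the standard library; the helper name `hangs` and the default timeout are hypothetical choices, not part of Python-Markdown.

```python
import subprocess
import sys


def hangs(snippet: str, timeout: float = 5.0) -> bool:
    """Run `snippet` in a fresh interpreter; return True if it exceeds `timeout` seconds.

    Hypothetical helper for this issue: a crash or normal exit counts as "did not hang".
    """
    try:
        subprocess.run(
            [sys.executable, "-c", snippet],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,  # the child is killed when this expires
        )
    except subprocess.TimeoutExpired:
        return True
    return False


# Example calls (these require the `markdown` package to be installed):
#   hangs('import markdown; markdown.markdown("`</` and `</`")')
#   hangs('import markdown; markdown.markdown("<!-- <!--")')
```

A timeout is only a heuristic: a very slow but terminating parse would also be reported as a hang, so the limit should be generous relative to normal conversion time.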

Versions

  • Affected: markdown 3.8.2+ (introduced in 9980cb5)
  • Tested on Python 3.10, 3.11, 3.12, 3.13, 3.14

Cause

Commit 9980cb5 added handling for unclosed comments to fix Python 3.14 compatibility. The intent was: when an unclosed comment is detected, emit < as literal text and restart parsing. The implementation adds:

def handle_comment(self, data: str):
    # Excerpt: `i` (the index where the closing '-->' is expected) is computed earlier in the method.
    if self.rawdata[i:i + 3] != '-->':  # unclosed comment?
        self.handle_data('<')
        self.override_comment_update = True
        return

def updatepos(self, i: int, j: int) -> int:
    if self.override_comment_update:
        self.override_comment_update = False
        i = 0  # reset to start
        j = 1
    return super().updatepos(i, j)

The problem: resetting to position 0 restarts parsing from the beginning. With two `</` sequences, the parser:

  1. Hits the first `</` → resets to position 1 (after emitting `<`)
  2. Continues, hits the second `</` at position 9
  3. Resets to position 1 again
  4. The second `</` is still at position 9 → infinite loop
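The steps above can be illustrated with a toy scanner. This is a simplified model, not the library's code: the function name, the `marker` parameter, and the `max_steps` cap (added so the demonstration terminates) are all assumptions made for illustration. It uses the unclosed-comment input from the second reproduction.

```python
def scan_with_global_rewind(text: str, marker: str = "<!--", max_steps: int = 6) -> list[int]:
    """Toy model of the bug: after every trigger, resume scanning at position 1.

    Returns the index of each trigger that was hit. `max_steps` caps the loop,
    which would otherwise never end when a second marker is present.
    """
    hits: list[int] = []
    pos = 0
    steps = 0
    while steps < max_steps:
        steps += 1
        hit = text.find(marker, pos)
        if hit == -1:
            break  # nothing left to trigger on; the scan finishes normally
        hits.append(hit)
        pos = 1  # modeled bug: always rewind to position 1, ignoring `hit`
    return hits


print(scan_with_global_rewind("<!--"))       # one marker: scan terminates on its own
print(scan_with_global_rewind("<!-- <!--"))  # two markers: same index hit until the cap
```

With a single marker the rewind lands just past it, so the scan moves on. With two markers the rewind lands before the second one every time, so the same index is reported repeatedly until the cap stops the loop.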

Why `</` triggers this: per HTML5, `</` followed by a non-letter is a "bogus comment" (Python's html.parser uses endtagopen = re.compile('</[a-zA-Z]')). The stdlib HTMLParser.goahead() flow is:

  1. endtagopen.match() fails for </ not followed by a letter, so parse_endtag() is never called.
  2. The fallback "bogus comment" path calls handle_comment(rawdata[i+2:]) (i.e., data after </).
  3. The custom handle_comment() interprets this as "unclosed comment" (no -->) and sets override_comment_update = True.
  4. updatepos() then resets to (0, 1), which rewinds the scan to the start of the buffer.
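Step 1 of the flow above can be checked in isolation. The sketch below redefines the pattern quoted in this issue locally rather than importing it from the standard library; the helper name `is_end_tag_open` is hypothetical.

```python
import re

# Pattern as quoted in this issue from Python's html.parser, redefined locally.
endtagopen = re.compile('</[a-zA-Z]')


def is_end_tag_open(rawdata: str, i: int) -> bool:
    """Return True if an end tag (</ followed by a letter) starts at index i."""
    return endtagopen.match(rawdata, i) is not None


sample = "`</` and `</`"
first = sample.find("</")  # index of the first "</" in the reproduction string

print(is_end_tag_open(sample, first))  # "</" is followed by a backtick here
print(is_end_tag_open("</p>", 0))      # "</" is followed by a letter here
```

The first call reports no match, which is why the parser falls through to the bogus-comment path for this input, while an ordinary closing tag such as `</p>` does match.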

So the loop is caused by the global rewind in updatepos(), not by bogus comments themselves. The fix should preserve the "emit < and retry" intent but only rewind to just after the < that triggered the bogus comment (e.g., store the triggering index and advance to index + 1), instead of resetting to the start of the buffer.
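The proposed direction (remember the triggering index and continue from index + 1) can be sketched on a toy scanner. This is an illustration of the idea only, not a patch for Python-Markdown: the function name and the `marker` parameter are hypothetical, and the real parser tracks position through `updatepos()` rather than a plain integer.

```python
def scan_with_local_advance(text: str, marker: str = "<!--") -> tuple[str, list[int]]:
    """Toy model of the fix: emit '<' as literal text, then resume just after it.

    Returns the reassembled text and the index of each trigger that was hit.
    """
    out: list[str] = []
    hits: list[int] = []
    pos = 0
    while True:
        hit = text.find(marker, pos)
        if hit == -1:
            out.append(text[pos:])  # remainder contains no more triggers
            break
        hits.append(hit)
        out.append(text[pos:hit])   # plain text before the trigger
        out.append("<")             # emit the triggering '<' literally
        pos = hit + 1               # advance past that '<' only, never back to the start
    return "".join(out), hits


print(scan_with_local_advance("<!-- <!--"))
```

Because `pos` only ever moves forward, each trigger is visited exactly once and the scan terminates for any number of unclosed markers, while the emitted text still matches the input.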
