Description
Text containing two or more `</` sequences causes an infinite loop.
import markdown
markdown.markdown("`</` and `</`")  # Hangs
The same can also be reproduced by opening a second HTML comment without closing the first, e.g.:
import markdown
markdown.markdown("<!-- <!--")  # Hangs
Versions
- Affected: markdown 3.8.2+ (introduced in 9980cb5)
- Tested on Python 3.10, 3.11, 3.12, 3.13, 3.14
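Because the reproduction never returns, it is awkward to run directly in a test suite. A minimal sketch, assuming it is acceptable to run the snippet in a child interpreter: the helper `hangs` below is a hypothetical name, not part of `markdown`, and the 5 second default is an arbitrary choice.

```python
import subprocess
import sys


def hangs(snippet: str, timeout_s: float = 5.0) -> bool:
    """Return True if `snippet` is still running after `timeout_s` seconds."""
    try:
        # Run in a fresh interpreter so a hang cannot block the caller.
        subprocess.run(
            [sys.executable, "-c", snippet],
            timeout=timeout_s,
            capture_output=True,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising this exception.
        return True
    return False


# Usage (assumes the `markdown` package is installed in this interpreter):
#   hangs('import markdown; markdown.markdown("`</` and `</`")')
```

Note that the helper only detects a timeout. If the child exits early for another reason, such as `markdown` not being installed, it also returns `False`, so a real regression test should check the return code as well.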
Cause
Commit 9980cb5 added handling for unclosed comments to fix Python 3.14 compatibility. The intent was: when an unclosed comment is detected, emit `<` as literal text and restart parsing. The implementation adds:
def handle_comment(self, data: str):
    # Excerpt: `i` is defined earlier in the method (omitted here).
    if self.rawdata[i:i + 3] != '-->':  # unclosed comment?
        self.handle_data('<')
        self.override_comment_update = True
        return

def updatepos(self, i: int, j: int) -> int:
    if self.override_comment_update:
        self.override_comment_update = False
        i = 0  # reset to start
        j = 1
    return super().updatepos(i, j)
The problem: resetting to position 0 restarts parsing from the beginning. With two `</` sequences, the parser:
- Hits first `</` → resets to position 1 (after emitting `<`)
- Continues, hits second `</` at position 9
- Resets to position 1 again
- Second `</` is still at position 9 → infinite loop
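The steps above can be illustrated with a toy scanner. This is a simplified model, not the `markdown` parser: `trace_rewind` is a hypothetical helper, the input omits the backticks so the indices differ from those in the list, and `max_steps` exists only so that the demonstration terminates.

```python
def trace_rewind(text: str, max_steps: int = 6) -> list[tuple[int, int]]:
    """Record (hit_index, resume_position) pairs for a scanner that
    always rewinds to position 1 after seeing '</'."""
    trace = []
    pos = 0
    while len(trace) < max_steps:
        hit = text.find("</", pos)
        if hit == -1:
            break  # no more '</' ahead of the cursor: scanning finishes
        pos = 1  # fixed rewind target, modelled on the (0, 1) reset
        trace.append((hit, pos))
    return trace


print(trace_rewind("</ only"))    # [(0, 1)]
print(trace_rewind("</ and </"))  # [(0, 1), (7, 1), (7, 1), (7, 1), (7, 1), (7, 1)]
```

With a single `</` the cursor moves past the hit and the scan ends. With two, the second hit always lies ahead of the rewind target, so it is found again on every pass until the step cap stops the loop.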
Why `</` triggers this: per HTML5, `</` followed by a non-letter is a "bogus comment" (Python's `html.parser` uses `endtagopen = re.compile('</[a-zA-Z]')`). The stdlib `HTMLParser.goahead()` flow is:
- `endtagopen.match()` fails for `</` not followed by a letter, so `parse_endtag()` is never called.
- The fallback "bogus comment" path calls `handle_comment(rawdata[i+2:])` (i.e., data after `</`).
- The custom `handle_comment()` interprets this as "unclosed comment" (no `-->`) and sets `override_comment_update = True`.
- `updatepos()` then resets to `(0, 1)`, which rewinds the scan to the start of the buffer.
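The first step of that flow can be checked in isolation. A small sketch, assuming the pattern quoted above; it is redefined locally rather than imported from the stdlib, and `looks_like_end_tag` is a hypothetical helper name.

```python
import re

# Pattern as quoted in this report; redefined here, not imported from html.parser.
endtagopen = re.compile('</[a-zA-Z]')


def looks_like_end_tag(s: str) -> bool:
    """True if `s` starts with '</' followed by an ASCII letter."""
    return endtagopen.match(s) is not None


print(looks_like_end_tag("</div>"))  # True
print(looks_like_end_tag("</`"))     # False
print(looks_like_end_tag("</ "))     # False
```

Any input for which this returns `False` is a candidate for the bogus-comment path described above.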
So the loop is caused by the global rewind in `updatepos()`, not by bogus comments themselves. The fix should preserve the "emit `<` and retry" intent but only rewind to just after the `<` that triggered the bogus comment (e.g., store the triggering index and advance to `index + 1`), instead of resetting to the start of the buffer.
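The proposed direction can be sketched with a toy scanner that resumes just past the triggering `<`. This is an illustration of the idea only, not a patch for `markdown`: `scan_with_local_rewind` is a hypothetical name, and a real fix would need to work in the parser's own line and offset bookkeeping.

```python
def scan_with_local_rewind(text: str) -> list[tuple[int, int]]:
    """Record (hit_index, resume_position) pairs for a scanner that
    resumes at hit_index + 1 instead of a fixed position."""
    trace = []
    pos = 0
    while True:
        hit = text.find("</", pos)
        if hit == -1:
            return trace
        # A real parser would emit '<' as literal text here (omitted).
        pos = hit + 1  # strictly greater than the previous pos
        trace.append((hit, pos))


print(scan_with_local_rewind("</ and </"))  # [(0, 1), (7, 8)]
```

Since `find` never returns an index below `pos`, each pass moves the cursor forward by at least one character, so the scan terminates however many `</` sequences the input contains.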