Bug: Incorrect date_published when parsing valid <abbr class="published">
Description:
When parsing this page:
👉 https://www.progressive-charlestown.com/2011/04/peeps-wrap-up-for-2011.html
The parser returns:
"date_published": "2025-05-15T00:23:00.000Z"
However, the HTML clearly includes:
<abbr class='published' title='2011-04-25T00:23:00-04:00'>12:23:00 AM</abbr>
This means the correct UTC datetime would be:
"date_published": "2011-04-25T04:23:00.000Z"
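It seems the parser extracts the time from the <abbr> element but incorrectly replaces the date with the current system date. The returned time (00:23 UTC) also ignores the -04:00 offset given in the title attribute.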
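The expected UTC value can be checked with plain Node.js, independent of the parser. This is only a sanity check of the offset arithmetic using the built-in Date object; the variable names are illustrative.

```javascript
// The title value from the page, including its -04:00 offset.
const title = '2011-04-25T00:23:00-04:00';

// Date accepts ISO 8601 strings with an explicit offset,
// and toISOString() always reports the instant in UTC.
const expectedUtc = new Date(title).toISOString();

console.log(expectedUtc); // 2011-04-25T04:23:00.000Z
```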
Expected behavior
The parser should parse the full datetime (date, time, and UTC offset) from the title attribute of the <abbr> element, not just the time part.
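A minimal sketch of the expected behavior, assuming the markup shown in this report. The helper name publishedDateFromAbbr is hypothetical and is not part of the parser; it uses regular expressions only to stay dependency-free, whereas a real fix would presumably go through the parser's own DOM handling.

```javascript
// Hypothetical helper for illustration; not the parser's actual code.
function publishedDateFromAbbr(html) {
  // Find the opening tag of an <abbr> whose class list contains "published".
  const tag = html.match(/<abbr\b[^>]*\bclass=(['"])[^'"]*\bpublished\b[^'"]*\1[^>]*>/i);
  if (!tag) return null;

  // Read the title attribute from that tag rather than the visible text.
  const title = tag[0].match(/\btitle=(['"])(.*?)\1/i);
  if (!title) return null;

  // Convert the full datetime, offset included, to a UTC ISO string.
  const date = new Date(title[2]);
  return Number.isNaN(date.getTime()) ? null : date.toISOString();
}

const sample = "<abbr class='published' title='2011-04-25T00:23:00-04:00'>12:23:00 AM</abbr>";
console.log(publishedDateFromAbbr(sample)); // 2011-04-25T04:23:00.000Z
```

The visible text (12:23:00 AM) carries no date or offset, so the sketch ignores it whenever a title attribute is present.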
Steps to reproduce
Use the latest version of the parser (npm or hosted) and parse the provided URL.
Environment:
Parser version: latest (GitHub)
Runtime: Node.js
Used via: Node script / Web API