This repository was archived by the owner on Sep 24, 2023. It is now read-only.
Releases: bmjcode/pywebarchive
Version 0.5.2
Version 0.5.1
Stable release.
Fixed
- Document the function of the `WebResource.frame_name` property.
Version 0.5.0
Stable release.
Added
- More complete documentation for the `WebArchive` and `WebResource` classes.
- Documentation on pywebarchive's internals.
- Unit test for subresource URLs occurring as literal text.
Changed
- Massively overhaul the README.
- Improved the documentation for the `webarchive` module.
- Expanded and clarified various code comments.
- Use a `with` clause for proper cleanup in test/extracted_archive_display.py.
- Rename `WebArchive.extract()`'s `single_file` argument to the more descriptive `embed_subresources` (potentially backwards-incompatible change).
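Because the rename can break callers that still pass the old keyword, code that has to run against both older and newer releases might check which keyword the installed `extract()` accepts. The following is only a sketch under stated assumptions: `extract_compat` is a hypothetical helper that is not part of pywebarchive, the positional `output_path` parameter is assumed, and the two stand-in classes merely mimic the argument names mentioned above.

```python
import inspect


def extract_compat(archive, output_path, embed=False):
    """Call archive.extract() with whichever keyword this version accepts.

    Hypothetical helper, not part of pywebarchive.
    """
    params = inspect.signature(archive.extract).parameters
    if "embed_subresources" in params:
        return archive.extract(output_path, embed_subresources=embed)
    if "single_file" in params:
        return archive.extract(output_path, single_file=embed)
    # Neither keyword is known; fall back to the default behavior
    return archive.extract(output_path)


# Stand-ins that only mimic the two argument names (assumptions for the demo)
class OldStyleArchive:
    def extract(self, output_path, single_file=False):
        return ("old", output_path, single_file)


class NewStyleArchive:
    def extract(self, output_path, embed_subresources=False):
        return ("new", output_path, embed_subresources)
```

Inspecting the signature, rather than catching `TypeError`, avoids hiding unrelated errors raised inside `extract()` itself.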
Fixed
- Raise a `WebArchiveError` when attempting to extract a webarchive with no main resource.
- Raise a `WebArchiveError` when attempting to convert a webarchive with no main resource to HTML.
- Return the correct value for `WebArchive.resource_count()` if no main resource is present.
Removed
- The unnecessary `<!-- Processed by pywebarchive -->` tag previously added to extracted pages.
Version 0.4.1
Beta "I can't believe I missed that!" release.
In keeping with this project's long tradition of sloppiness, this release was rushed out to fix a single missing line of code that does nothing now, but whose absence would cause trouble if the function it calls is ever implemented.
Some more interesting changes happened in version 0.4.0, including the addition of context manager (with statement) support.
Fixed
- Call `close()` in `WebArchive.__exit__()`.
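To illustrate why the missing call matters, here is a minimal sketch of the context manager protocol in which `__exit__()` delegates to `close()`. `ArchiveSketch` is a hypothetical stand-in written for this example; it is not pywebarchive's `WebArchive` class and does not reflect its internals.

```python
class ArchiveSketch:
    """Hypothetical stand-in; not pywebarchive's WebArchive class."""

    def __init__(self):
        self.closed = False

    def close(self):
        # A real implementation would release file handles here
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Always clean up, whether or not the block raised
        self.close()
        # Returning False lets any exception propagate
        return False


with ArchiveSketch() as archive:
    inside = archive.closed

after = archive.closed
```

If `__exit__()` skipped the `close()` call, `after` would stay `False`, and any cleanup later added to `close()` would silently never run at the end of a `with` block.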
Version 0.4.0
Beta release.
Added
- Context manager (`with` statement) support in the `WebArchive` class.
- The `WebArchive.close()` method.
- The `WebArchive.parent` property.
- Support for the `mode` argument in `webarchive.open()` (though only read mode remains implemented).
Changed
- Further cleaned up internal APIs.
- Improved module documentation.
Fixed
- Ensure an encoding is always specified when creating a text `WebResource`.
- Removed duplicated code in test/extracted_archive_display.py.
Version 0.3.3
Beta bugfix release.
Added
- Unit tests for HTML- and CSS-rewriting logic.
- Build script for the Windows version of Webarchive Extractor.
Changed
- Clean up the `WebResource` class's internal API.
- Do not force a newline after the doctype in `HTMLRewriter.handle_decl()`.
- Moved `test_extracted_archive_display` from the unit tests to a separate script.
- Removed `test_extracted_archive_display`'s dependency on Tkinter.
Fixed
- Rewrite URLs in inline CSS code when extracting.
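As a rough illustration of what rewriting URLs in CSS involves, the sketch below replaces `url(...)` targets using a lookup table. It is not pywebarchive's actual rewriting logic: `rewrite_css_urls`, the `local_paths` mapping, and the regular expression are all assumptions made for this example, and the pattern does not attempt to cover every CSS edge case (escaped characters or `@import` strings, for instance).

```python
import re

# Matches url(...) with optional single or double quotes around the target
CSS_URL = re.compile(r"""url\(\s*(['"]?)(.*?)\1\s*\)""")


def rewrite_css_urls(css, local_paths):
    """Replace archived URLs in CSS text with local paths.

    local_paths maps an original URL to its replacement; URLs that are
    not in the mapping are left unchanged.
    """
    def replace(match):
        quote, url = match.group(1), match.group(2)
        return "url({0}{1}{0})".format(quote, local_paths.get(url, url))

    return CSS_URL.sub(replace, css)


inline_css = "background: url('http://example.com/bg.png') no-repeat"
rewritten = rewrite_css_urls(inline_css, {"http://example.com/bg.png": "bg.png"})
```

The same function can be applied to the contents of a `style` attribute or a `<style>` block, since both contain plain CSS text.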
Version 0.3.2
Beta bugfix release.
Added
- The module version number in `webarchive.__version__`.
- Initial support for command-line arguments in `extractor-gui.py`.
- The `--version` argument in `extractor.py` and `extractor-gui.py`.
Changed
- Further code cleanup.
- Give more descriptive names to various internals.
Fixed
- Support HTML subresources.
- Handle non-HTML subresources incorrectly served as `text/html`.
- Update the module description in `setup.py` to match its documentation.
- Specify a text encoding in `WebArchiveTest.test_webarchive_to_html()` so the test will pass on Windows.
- Make `webbrowser` an optional dependency in `extractor.py` to match `extractor-gui.py`.
Version 0.3.1
Beta bugfix release.
Added
- Unit test for `WebArchive.to_html()`.
Changed
- Massively expanded module documentation.
- Don't delete the `srcset` attribute from `<img>`.
- Embed style sheets in single-file mode using data URIs rather than `<style>`.
- Cleaned up various internals.
Fixed
- Handle `srcset` entries without a width or pixel density descriptor.
- Embed subresources recursively when calling `WebResource.to_data_uri()` on an archive's main resource.
- Don't escape HTML entities in a `<script>` or `<style>` block.
- Correctly handle non-HTML main resources.
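The `srcset` fix concerns entries that carry only a URL. The sketch below shows one way such entries could be tolerated when parsing; `parse_srcset` is a hypothetical function written for this example and is not taken from pywebarchive.

```python
def parse_srcset(value):
    """Split a srcset value into (url, descriptor) pairs.

    The descriptor is None when an entry has no width ("480w") or
    pixel density ("2x") descriptor. Splitting on commas is a
    simplification: URLs that themselves contain commas (such as
    data URIs) need a stricter parser.
    """
    entries = []
    for candidate in value.split(","):
        parts = candidate.split()
        if not parts:
            continue  # skip empty entries, e.g. after a trailing comma
        url = parts[0]
        descriptor = parts[1] if len(parts) > 1 else None
        entries.append((url, descriptor))
    return entries


parsed = parse_srcset("small.jpg, medium.jpg 2x, large.jpg 1024w")
```

Treating the descriptor as optional, instead of always indexing the second token, is what keeps an entry like `small.jpg` from raising an `IndexError`.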
Version 0.3.0
Beta release.
Added
- Experimental support for extracting webarchives to single-file HTML documents.
  - External scripts and style sheets are replaced with inline content.
  - External images are embedded using data URIs.
- New command-line options for `extractor.py`:
  - `-s` / `--single-file` to extract archive contents to a single HTML file.
  - `-o` / `--open-page` to open the extracted webpage when finished.
- New `WebArchive` class methods:
  - `get_local_path()` returns the basename of the file created when a specified subresource is extracted.
  - `get_subframe_archive()` returns the subframe archive corresponding to a specified URL.
  - `get_subresource()` returns the subresource corresponding to a specified URL.
  - `to_html()` returns the archive's contents as a single-file HTML document.
- The `WebResource.archive` property, which identifies a given resource's parent `WebArchive`.
- The `WebArchiveError` exception.
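Single-file mode depends on embedding binary content such as images directly in the page as data URIs. The following sketch shows the general technique using only the standard library; `bytes_to_data_uri` is a hypothetical helper for illustration and is not pywebarchive's implementation.

```python
import base64


def bytes_to_data_uri(data, mime_type):
    """Encode raw bytes as a base64 data URI (RFC 2397)."""
    encoded = base64.b64encode(data).decode("ascii")
    return "data:{0};base64,{1}".format(mime_type, encoded)


uri = bytes_to_data_uri(b"hello", "text/plain")
```

The resulting string can be used anywhere a URL is accepted, for example as the `src` of an `<img>` tag, at the cost of roughly a one-third size increase from base64 encoding.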
Changed
- Moved the development status up to beta.
Fixed
- Correctly handle "empty" tags like `<img />` in XHTML documents.
- Fixed local resource paths for extracted subframe archives.
Removed
- The `Extractor` class, included only for backwards compatibility with the poorly thought-out 0.1.0 API.
Version 0.2.4
Alpha-quality code cleanup release.
- Added unit tests.
- Use `webbrowser.open()` in `extractor-gui.py` for improved portability.