lxml_html_clean changelog
0.4.5 (2026-05-20)
Bugs fixed
Fixed a security vulnerability where javascript: URLs in xlink:href attributes were not sanitized when ``safe_attrs_only=False``, allowing cross-site scripting (XSS) attacks. The fix requires lxml>=6.1.1, which adds xlink:href to the set of link attributes iterated by rewrite_links(). Reported by Guillem Lefait (@glefait).
0.4.4 (2026-02-26)
Bugs fixed
Fixed a bug where Unicode escapes in CSS were not properly decoded before security checks. This prevents attackers from bypassing filters using escape sequences. (CVE-2026-28348)
Fixed a security issue where <base> tags could be used for URL hijacking attacks. The <base> tag is now automatically removed whenever the <head> tag is removed (via page_structure=True or manual configuration), as <base> must be inside <head> according to HTML specifications. (CVE-2026-28350)
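As an illustration of the Unicode-escape fix in this release (CVE-2026-28348), the sketch below decodes CSS escapes before running a keyword check. It is a simplified stand-in, not the library's actual code: decode_css_escapes and contains_blocked_keyword are hypothetical names, the blocked keywords are placeholders, and corner cases of the CSS syntax rules (such as a CR LF pair after a hex escape) are ignored.

```python
import re

# A CSS escape is a backslash followed by 1-6 hex digits (optionally
# terminated by one whitespace character), or a backslash followed by
# any other single character that is not a newline.
_CSS_ESCAPE = re.compile(
    r"\\([0-9a-fA-F]{1,6})[ \t\n\r\f]?|\\([^\n\r\f0-9a-fA-F])"
)


def decode_css_escapes(css: str) -> str:
    """Return css with escape sequences replaced by the characters they encode."""

    def replace(match):
        hex_digits, literal = match.groups()
        if literal is not None:
            return literal
        code_point = int(hex_digits, 16)
        # Zero, surrogates and out-of-range values become U+FFFD.
        if code_point == 0 or code_point > 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
            return "\ufffd"
        return chr(code_point)

    return _CSS_ESCAPE.sub(replace, css)


def contains_blocked_keyword(css, keywords=("expression(", "@import", "javascript:")):
    """Check the decoded, lowercased CSS against placeholder keywords."""
    decoded = decode_css_escapes(css).lower()
    return any(keyword in decoded for keyword in keywords)
```

A check that runs on the raw text would miss an input such as width: \65 xpression(alert(1)), because the literal substring expression( only appears after decoding.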
0.4.3 (2025-10-02)
Maintenance
Tests updated to work correctly with new lxml and libxml2 releases.
Python 3.6 and 3.7 are no longer tested.
Improved documentation about CSS removal behavior.
0.4.2 (2025-04-09)
Bugs fixed
lxml_html_clean now correctly handles HTML input as bytes as it did before the 0.2.0 release.
0.4.1 (2024-11-15)
Bugs fixed
Removed superfluous debug prints.
0.4.0 (2024-11-12)
Bugs fixed
The Cleaner() now scans for hidden JavaScript code embedded within CSS comments. In certain contexts, such as within <svg> or <math> tags, <style> tags may lose their intended function, allowing comments like /* foo */ to potentially be executed by the browser. If suspicious content is detected, only the comment is removed. (CVE-2024-52595)
0.3.1 (2024-10-09)
Features added
Do not parse URL addresses when it is not necessary.
0.3.0 (2024-10-09)
Features added
Parsing of URL addresses has been enhanced and Cleaner removes ambiguous URLs.
0.2.2 (2024-08-30)
Bugs fixed
sdist now includes all test files and changelog.
0.2.1 (2024-08-29)
Bugs fixed
Memory efficiency is now much better for HTML pages where the cleaner removes a lot of elements. (#14)
0.2.0 (2024-07-29)
Features added
ASCII control characters (except HT, VT, CR and LF) are now removed from string inputs before they’re parsed by lxml/libxml2.
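The filtering described above can be sketched with a single regular expression. This is a minimal illustration, not the library's implementation: strip_control_chars is a hypothetical helper, and treating DEL (0x7F) as a control character is an assumption.

```python
import re

# Assumption: "ASCII control characters" means 0x00-0x1F plus DEL (0x7F).
# HT (0x09), LF (0x0A), VT (0x0B) and CR (0x0D) are kept.
_CONTROL_CHARS = re.compile(r"[\x00-\x08\x0c\x0e-\x1f\x7f]")


def strip_control_chars(text: str) -> str:
    """Return text with ASCII control characters removed (hypothetical helper)."""
    return _CONTROL_CHARS.sub("", text)
```

Removing these characters before parsing means that an input such as "java\x00script:alert(1)" is seen by later checks as "javascript:alert(1)" rather than as two unrelated fragments.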
0.1.1 (2024-04-05)
Bugs fixed
Regular expression for image data URLs now supports multiple data URLs on a single line.
0.1.0 (2024-02-26)
First official release of the split project.
Relevant changes from the lxml project before the split
This part lists lxml releases that contain important changes to the HTML Cleaner functionality.
5.1.0 (2024-01-05)
Bugs fixed
The HTML Cleaner() interpreted an accidentally provided string parameter for the host_whitelist as a list of characters and silently failed to reject any hosts. Passing a non-collection is now rejected.
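To show why a bare string is a hazard here, the sketch below validates a whitelist argument before use. It is a hedged illustration rather than lxml's actual check: validate_host_whitelist is a hypothetical function, and the exception type and message are assumptions.

```python
from collections.abc import Collection


def validate_host_whitelist(host_whitelist):
    """Return the whitelist as a frozenset, rejecting strings and non-collections.

    Hypothetical helper: a str is technically a collection of characters,
    so it has to be rejected explicitly.
    """
    if isinstance(host_whitelist, (str, bytes)) or not isinstance(
        host_whitelist, Collection
    ):
        raise TypeError(
            f"host_whitelist must be a collection of host names, "
            f"got {type(host_whitelist).__name__}"
        )
    return frozenset(host_whitelist)


# Membership tests on a string are substring checks, which is the pitfall:
# "ample.c" is "in" the string but is not a host anyone meant to allow.
assert "ample.c" in "example.com"
assert "ample.c" not in validate_host_whitelist(["example.com"])
```

Passing ["example.com"] or ("example.com",) is accepted; passing "example.com" raises TypeError in this sketch.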
4.9.3 (2023-07-05)
Bugs fixed
A memory leak in lxml.html.clean was resolved by switching to Cython 0.29.34+.
URL checking in the HTML cleaner was improved. Patch by Tim McCormack.
4.6.5 (2021-12-12)
Bugs fixed
A vulnerability (GHSL-2021-1038) in the HTML cleaner allowed sneaking script content through SVG images (CVE-2021-43818).
A vulnerability (GHSL-2021-1037) in the HTML cleaner allowed sneaking script content through CSS imports and other crafted constructs (CVE-2021-43818).
4.6.3 (2021-03-21)
Bugs fixed
A vulnerability (CVE-2021-28957) was discovered in the HTML Cleaner by Kevin Chung, which allowed JavaScript to pass through. The cleaner now removes the HTML5 formaction attribute.
4.6.2 (2020-11-26)
Bugs fixed
A vulnerability (CVE-2020-27783) was discovered in the HTML Cleaner by Yaniv Nizry, which allowed JavaScript to pass through. The cleaner now removes more sneaky “style” content.
4.6.1 (2020-10-18)
Bugs fixed
A vulnerability was discovered in the HTML Cleaner by Yaniv Nizry, which allowed JavaScript to pass through. The cleaner now removes more sneaky “style” content.
4.5.2 (2020-07-09)
Bugs fixed
Cleaner() now validates that only known configuration options can be set.
Cleaner.clean_html() discarded comments and PIs regardless of the corresponding configuration option, if remove_unknown_tags was set.
4.2.5 (2018-09-09)
Bugs fixed
JavaScript URLs that used URL escaping were not removed by the HTML cleaner. Security problem found by Omar Eissa. (CVE-2018-19787)
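The class of bypass described in this entry can be illustrated with a small standard-library sketch that percent-decodes a URL before inspecting its scheme. It is not the cleaner's real logic: looks_like_script_url is a hypothetical helper, and the list of blocked schemes is an assumption.

```python
import re
from urllib.parse import unquote

_WHITESPACE = re.compile(r"\s+")


def looks_like_script_url(url: str) -> bool:
    """Return True if the percent-decoded URL starts with a script scheme."""
    decoded = unquote(url)
    # Drop whitespace so that an encoded newline inside the scheme
    # cannot split the keyword.
    compact = _WHITESPACE.sub("", decoded).lower()
    return compact.startswith(("javascript:", "vbscript:"))
```

With this check, "javascript%3Aalert(1)" and "j%0Aavascript:alert(1)" are both flagged, while an ordinary URL such as "https://example.com/?q=javascript%3A" is not.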
4.0.0 (2017-09-17)
Features added
The modules lxml.builder, lxml.html.diff and lxml.html.clean are also compiled using Cython in order to speed them up.