All notable changes to Nokogumbo will be documented in this file.
The format is based on Keep a Changelog and this project adheres to Semantic Versioning.
- Support Mageia distros when libxml2/libxslt system libraries are installed. #165 (Thank you, @pterjan!)
- Forward-looking support for a version of Nokogiri that will provide HTML5 parsing. #171
- Update extconf.rb to use Nokogiri v1.11's CPPFLAGS for more reliable installation. #163
- Fixed a bug where Nokogiri::HTML5.fragment(nil) would raise an error. Now it returns an empty DocumentFragment like it did in v2.0.2.
- Fixed assertion failure when a tag immediately followed the UTF-8 BOM.
- Limit enforced on number of attributes per element, defaulting to 400 and configurable with the :max_attributes argument.
- Ignore UTF-8 byte order mark at the beginning of the input.
- Fix content sniffing for Unicode strings.
- Fixed crash where Ruby objects constructed in C can be garbage collected.
- Support Ruby 2.6
- Fix assertion failures with nonstandard HTML tags.
- Fix the handling of mis-nested formatting tags (the adoption agency algorithm).
- Fix crash with zero-length HTML tags.
- Prevent 1-byte buffer over-read when constructing an error message about an unexpected EOF.
- Fix line numbers on elements from #line.
- Experimental support for errors (it was supported in 1.5.0 but undocumented).
- Added proper HTML5 serialization.
- Added option :max_errors to control the maximum number of errors reported by #errors.
- Added option :max_tree_depth to control the maximum parse tree depth.
- Line number support via Nokogiri::XML::Node#line as long as Nokogumbo has been compiled with libxml2 support.
- Integrated Gumbo parser into Nokogumbo. A system version will not be used.
- The undocumented (but publicly mentioned) :max_parse_errors renamed to :max_errors; :max_parse_errors is deprecated and will go away.
- The various #parse and #fragment (and Nokogiri.HTML5) methods return Nokogiri::HTML5::Document and Nokogiri::HTML5::DocumentFragment classes rather than Nokogiri::HTML::Document and Nokogiri::HTML::DocumentFragment.
- Changed the top-level API to more closely match Nokogiri's while maintaining backwards compatibility. The new APIs are:
  - Nokogiri::HTML5(html, url = nil, encoding = nil, **options, &block)
  - Nokogiri::HTML5.parse(html, url = nil, encoding = nil, **options, &block)
  - Nokogiri::HTML5::Document.parse(html, url = nil, encoding = nil, **options, &block)
  - Nokogiri::HTML5.fragment(html, encoding = nil, **options)
  - Nokogiri::HTML5::DocumentFragment.parse(html, encoding = nil, **options)
  - Nokogiri::HTML5::DocumentFragment.new(document, html = nil, ctx = nil)
  - Nokogiri::HTML5::Document#fragment(html = nil)
  - Nokogiri::XML::Node#fragment(html = nil)
  In all cases, html can be a string or an IO object (something that responds to #read). The url parameter is entirely for error reporting, as in Nokogiri. The encoding parameter only signals what encoding html should have on input; the output Document or DocumentFragment will be in UTF-8. Currently, the only option supported is :max_errors, which controls the maximum number of errors reported by #errors.
- Minimum supported version of Ruby changed to 2.1.
- Minimum supported version of Nokogiri changed to 1.8.0.
- Nokogiri::HTML5::DocumentFragment#errors returns errors for the document fragment itself, not the underlying document.
- The five XML namespaces described in the HTML spec, MathML, SVG, XLink, XML, and XMLNS, are now supported. Thus <svg> will create an svg element in the SVG namespace and <math> will create a math element in the MathML namespace. An attribute xml:lang=en, for example, will create a lang attribute in the XML namespace, but only in foreign elements (i.e., those in the SVG or MathML namespaces). On HTML elements, this creates an attribute with the name xml:lang. This changes the #xpath and related APIs.
- Calling #to_xml on a Nokogiri::HTML5::Document will produce XML output rather than HTML.
- :max_parse_errors is deprecated; use :max_errors instead.
- Fixed documents failing to serialize (via to_html) if they contain certain meta elements that set the charset.
- Documents are now properly marked as UTF-8 after parsing.
- Fixed Nokogiri::HTML5.fragment reporting an error due to a missing <!DOCTYPE html>.
- Fixed crash when input contains U+0000 NULL bytes and error reporting is enabled.
- The most recent, released version of Gumbo has a potential security issue that could result in a cross-site scripting vulnerability. This has been fixed by integrating Gumbo into Nokogumbo.
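
The entry above on the new top-level APIs says that html may be a string or an IO object (anything that responds to #read), and that encoding only declares the input encoding while the output is always UTF-8. The sketch below illustrates that contract using only the Ruby standard library. It is not Nokogumbo's implementation: the helper name read_html_input is hypothetical, and treating a nil encoding as "trust the string's own encoding" is an assumption made for the example.

```ruby
require 'stringio'

# Hypothetical helper (not part of Nokogumbo): normalizes parser input in the
# way the changelog describes the html and encoding parameters.
def read_html_input(html, encoding = nil)
  # Anything that responds to #read is treated as an IO-like object;
  # everything else is converted to a String.
  text = html.respond_to?(:read) ? html.read : html.to_s

  # Work on a copy so the caller's string is never mutated.
  text = text.dup

  # The encoding argument only declares what the input bytes are.
  # Assumption: nil means "trust the string's current encoding".
  text.force_encoding(encoding) if encoding

  # Whatever came in, the result is handed on as UTF-8.
  text.encode(Encoding::UTF_8)
end

# A plain string and an IO-like object are both accepted.
from_string = read_html_input("<p>caf\u00e9</p>")
from_io     = read_html_input(StringIO.new("<p>hi</p>"))

# Raw ISO-8859-1 bytes, declared as such, come out as UTF-8.
latin1_bytes = "<p>caf\xE9</p>".b
from_latin1  = read_html_input(latin1_bytes, 'ISO-8859-1')
```

With the real library, the same three kinds of input would be passed directly as the html argument of Nokogiri::HTML5.parse or Nokogiri::HTML5.fragment, with the encoding argument in the position shown in the signatures above.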