Description
For XHTML output, it is an error to generate entity references (except the standard XML ones) as they are not guaranteed to work. Indeed https://html.spec.whatwg.org/multipage/xhtml.html#writing-xhtml-documents says:
According to XML, XML processors are not guaranteed to process the external DTD subset referenced in the DOCTYPE. This means, for example, that using entity references for characters in XML documents is unsafe if they are defined in an external file (except for &lt;, &gt;, &amp;, &quot;, and &apos;).
Numeric character references should be generated instead, i.e. when XHTML output is generated, the numeric-entities option should be ignored and numeric references always used (or at least, the option should default to "yes").
An example of the problem:
echo "é" | tidy -ascii -asxhtml
outputs
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="generator" content=
"HTML Tidy for HTML5 for Linux version 5.8.0" />
<title></title>
</head>
<body>
&eacute;
</body>
</html>
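Until the default changes, the existing numeric-entities option can be set explicitly on the command line. The following is a minimal sketch of that workaround, reusing the command above; the function name tidy_xhtml_numeric is only illustrative, and the exact references emitted depend on how tidy decodes the input in your locale.

```shell
#!/bin/sh
# Workaround sketch: ask tidy for numeric character references explicitly.
# tidy_xhtml_numeric is a hypothetical helper name, not part of tidy.
tidy_xhtml_numeric() {
    # --numeric-entities yes: emit numeric rather than named entities
    # -quiet: suppress the informational summary on stderr
    tidy -quiet -ascii -asxhtml --numeric-entities yes
}

if command -v tidy >/dev/null 2>&1; then
    # tidy exits with status 1 when it only emitted warnings
    # (for example a missing title), so that status is tolerated here.
    echo "é" | tidy_xhtml_numeric || [ $? -eq 1 ]
else
    echo "tidy not found; skipping" >&2
fi
```

With this option set, the body is expected to contain a numeric reference instead of a named one, which is what this report asks to be the default for XHTML output.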
Note: web browsers should have no problems with entity references, but remember that one of the purposes of XHTML (over HTML) is to manipulate the files with XML tools.
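To illustrate the point about XML tools, here is a small sketch that feeds two otherwise identical documents to an XML parser. It assumes xmllint (from libxml2) is installed; the helper name xml_accepts and the two sample documents are made up for this example. An XML parser that does not load an external DTD should reject the named reference as an undefined entity and accept the numeric one.

```shell
#!/bin/sh
# Sketch: does an XML parser accept the document given on stdin?
# xml_accepts is a hypothetical helper; xmllint is assumed to be available.
xml_accepts() {
    # --noout: parse only, print nothing; "-" reads the document from stdin
    xmllint --noout - >/dev/null 2>&1
}

# Same document twice: once with a named entity, once with a numeric reference.
named='<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml"><head><title></title></head><body>&eacute;</body></html>'
numeric='<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml"><head><title></title></head><body>&#233;</body></html>'

if command -v xmllint >/dev/null 2>&1; then
    if printf '%s\n' "$named" | xml_accepts; then
        echo "named entity: accepted"
    else
        echo "named entity: rejected"
    fi
    if printf '%s\n' "$numeric" | xml_accepts; then
        echo "numeric reference: accepted"
    else
        echo "numeric reference: rejected"
    fi
else
    echo "xmllint not found; skipping" >&2
fi
```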
My Debian bug report: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=607066 (-ascii now needs to be added in the example, as done above).
Reporting this bug, following #1146.