tidy should not generate entity references (except standard XML ones) for XHTML output (-asxhtml) #1148

@vinc17fr


For XHTML output, it is an error to generate entity references (except the standard XML ones) as they are not guaranteed to work. Indeed https://html.spec.whatwg.org/multipage/xhtml.html#writing-xhtml-documents says:

According to XML, XML processors are not guaranteed to process the external DTD subset referenced in the DOCTYPE. This means, for example, that using entity references for characters in XML documents is unsafe if they are defined in an external file (except for &lt;, &gt;, &amp;, &quot;, and &apos;).

Numeric character references should be generated instead: when XHTML output is generated, the numeric-entities option should be ignored (or, at the very least, default to "yes").

An example of the problem:

echo "é" | tidy -ascii -asxhtml

outputs

<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="generator" content=
"HTML Tidy for HTML5 for Linux version 5.8.0" />
<title></title>
</head>
<body>
&eacute;
</body>
</html>

Note: web browsers should have no problems with entity references, but remember that one of the purposes of XHTML (over HTML) is to manipulate the files with XML tools.
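To illustrate the point about XML tools, here is a minimal sketch using Python's standard xml.etree.ElementTree as a stand-in for "an XML tool". The helper name body_text and the trimmed-down template are only for illustration and are not part of tidy. The sketch parses the same document twice, once with the named reference and once with the numeric one:

```python
import xml.etree.ElementTree as ET

# Namespace prefix ElementTree uses for XHTML element names.
XHTML_NS = "{http://www.w3.org/1999/xhtml}"

# Trimmed-down version of the tidy output shown above (hypothetical template).
TEMPLATE = (
    "<!DOCTYPE html>\n"
    '<html xmlns="http://www.w3.org/1999/xhtml">\n'
    "<head><title></title></head>\n"
    "<body>\n{body}\n</body>\n"
    "</html>\n"
)


def body_text(document):
    """Return the stripped text of <body>, or None if the parser rejects the input."""
    try:
        root = ET.fromstring(document)
    except ET.ParseError:
        # The parser has no definition for the entity, so parsing fails.
        return None
    return root.find(XHTML_NS + "body").text.strip()


named = body_text(TEMPLATE.format(body="&eacute;"))   # named entity reference
numeric = body_text(TEMPLATE.format(body="&#233;"))   # numeric character reference

print("named reference parsed:", named is not None)
print("numeric reference parsed:", numeric is not None)
```

With this parser, the document containing &eacute; is rejected as having an undefined entity, while the one containing &#233; parses and yields "é". Other XML tools may behave differently if they choose to load an external DTD, which is exactly the "not guaranteed" part.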

My Debian bug report: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=607066 (-ascii now needs to be added in the example, as done above).

Reporting this bug, following #1146.
