XML: valid XML output

CDATA sections are used to escape blocks of text containing characters that would otherwise be recognized as markup. Inside a CDATA section, an XML processor does not interpret tags or entity references; it treats them as ordinary character data. CDATA sections are a convenience for including large blocks of special characters as character data without having to write entity references throughout. For example, a tutorial about XML might contain the sentence:

In XML you need elements which have a starting tag <song> and end tag </song>

The markup for this sentence would be:

<div>In XML you need elements which have a starting tag &lt;song&gt; and end tag &lt;/song&gt;</div>

To avoid this inconvenience, XML provides a way to treat markup as plain text (CDATA). Simply enclose the text whose markup should be displayed, not interpreted, in a CDATA section:

<![CDATA[In XML you need elements which have a starting tag <song> and end tag </song>]]>

Between the start of the section, <![CDATA[, and the end of the section, ]]>, all character data is passed directly to the application. Comments are not recognized in a CDATA section; if present, their literal text is passed directly to the application. The only string that cannot occur in a CDATA section is ]]>, because it would signal the end of the section. CDATA sections do not work in HTML.
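Because ]]> is the one string a CDATA section cannot contain, text that may include it has to be split across two sections. The following is a minimal sketch of that common workaround; the function name xml_cdata is hypothetical and not part of PHP or of the original text.

```php
<?php
// Hypothetical helper: wrap arbitrary text in a CDATA section.
// Every "]]>" in the input is split so that "]]" closes one section
// and ">" opens the next, which keeps the terminator out of the data.
function xml_cdata(string $text): string {
    return '<![CDATA[' . str_replace(']]>', ']]]]><![CDATA[>', $text) . ']]>';
}

echo xml_cdata('a ]]> b'), "\n";
// prints: <![CDATA[a ]]]]><![CDATA[> b]]>
```

An XML parser reads the result as two adjacent CDATA sections whose contents, "a ]]" and "> b", join back into the original text.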

The XML specification does not use the terms character entity or character entity reference. It defines five predefined entities representing special characters and requires all XML processors to honor them. The table below lists the five XML predefined entities. The Name column gives the entity’s name, and the Character column shows the character it represents. To produce the character, write &name;; for example, &amp; renders as &.

Name 	Character
quot 	"
amp 	&
apos 	'
lt 	<
gt 	>
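The table above can be expressed directly as a lookup map. This is only an illustrative sketch: the constant and function names are made up for this example, and it assumes the input is plain text that has not been escaped already.

```php
<?php
// The five predefined XML entities, keyed by the character they represent.
const XML_PREDEFINED_ENTITIES = [
    '&'  => '&amp;',
    '<'  => '&lt;',
    '>'  => '&gt;',
    '"'  => '&quot;',
    "'"  => '&apos;',
];

// Hypothetical helper: replace each special character with its entity.
// strtr() with an array never rescans its own replacements, so the "&"
// in "&lt;" is not escaped a second time.
function xml_escape_predefined(string $text): string {
    return strtr($text, XML_PREDEFINED_ENTITIES);
}

echo xml_escape_predefined('<a href="x">Tom & Jerry\'s</a>'), "\n";
// prints: &lt;a href=&quot;x&quot;&gt;Tom &amp; Jerry&apos;s&lt;/a&gt;
```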

To produce valid XML output, you need to escape all special characters as XML entities and write the text in the same encoding that the XML declaration states (the “encoding” in the

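// Escape a string for use as XML character data.
function xml_escape($s) {
    // Decode any existing HTML entities first so they are not escaped twice.
    $s = html_entity_decode($s, ENT_QUOTES, 'UTF-8');
    // Escape &, <, >, " and '; the final `false` disables double encoding.
    $s = htmlspecialchars($s, ENT_QUOTES, 'UTF-8', false);
    return $s;
}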
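To illustrate how escaping and the declared encoding fit together, here is a rough sketch that builds a one-element document. The helper name xml_document is hypothetical, and the sketch assumes the target encoding is UTF-8 and that the tag name passed in is already a valid XML name. It uses the ENT_XML1 flag so that a single quote is written as &apos;.

```php
<?php
// Hypothetical helper: wrap text in a single element of a UTF-8 document.
function xml_document(string $tag, string $text): string {
    // preg_match with the /u modifier fails on invalid UTF-8, so this
    // rejects text that does not match the encoding declared below.
    if (preg_match('//u', $text) !== 1) {
        throw new InvalidArgumentException('Text is not valid UTF-8');
    }
    // Escape the five special characters using XML rules.
    $escaped = htmlspecialchars($text, ENT_QUOTES | ENT_XML1, 'UTF-8');
    return '<?xml version="1.0" encoding="UTF-8"?>' . "\n"
         . "<$tag>$escaped</$tag>\n";
}

echo xml_document('div', 'a starting tag <song> and end tag </song>');
```

If the document has to use a different encoding, both the declaration and the bytes of the text need to change together; this sketch does not attempt that conversion.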