
Checking If Xml Declaration Is Present

I am trying to check whether an xml file contains the necessary xml declaration ('header'), let's say: ...rest of xml file... I am us

Solution 1:

tl;dr

from xml.dom.minidom import parseString

def has_xml_declaration(xml):
    # version is None (falsy) when the document has no XML declaration
    return parseString(xml).version
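The helper above returns the version string (or None) rather than a strict boolean. A minimal sketch of a boolean variant follows, assuming minidom leaves `version` as None when no declaration was parsed; the name `xml_declaration_present` is a hypothetical helper, not part of any library.

```python
from xml.dom.minidom import parseString


def xml_declaration_present(xml):
    """Return True if minidom saw an XML declaration, else False.

    Hypothetical helper: it relies on minidom leaving ``version`` as
    None when the document has no declaration.
    """
    return parseString(xml).version is not None


print(xml_declaration_present(b'<?xml version="1.0" encoding="UTF-8"?><root/>'))
print(xml_declaration_present(b'<root/>'))
```

Note that parseString raises an exception for input that is not well-formed, so this only answers the question for documents that parse at all.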

From Wikipedia's article on the XML declaration:

If an XML document lacks encoding specification, an XML parser assumes that the encoding is UTF-8 or UTF-16, unless the encoding has already been determined by a higher protocol.

...

The declaration may be optionally omitted because it declares as its encoding the default encoding. However, if the document instead makes use of XML 1.1 or another character encoding, a declaration is necessary. Internet Explorer prior to version 7 enters quirks mode if it encounters an XML declaration in a document served as text/html.

So even if the XML declaration is omitted in an XML document, the code-snippet:

if re.match(r"^<\?xml\s*version=\'1\.0\' encoding=\'utf8\'\s*\?>", xmlFile.decode('utf-8')) is None:

will find "the" default XML declaration in this XML document. Please note that I used xmlFile.decode('utf-8') instead of xmlFile. If you don't mind using minidom, you can use the following code snippet:

from xml.dom.minidom import parse

dom = parse('bookstore-003.xml')
print('<?xml version="{}" encoding="{}"?>'.format(dom.version, dom.encoding))

Here is a working fiddle. In bookstore-001.xml an XML declaration is present, in bookstore-002.xml no XML declaration is present, and in bookstore-003.xml a different XML declaration than in the first example is present. The print statement outputs the version and the encoding accordingly:

<?xml version="1.0" encoding="UTF-8"?>
<?xml version="None" encoding="None"?>
<?xml version="1.0" encoding="ISO-8859-1"?>
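Since the bookstore files themselves are not reproduced here, the same three cases can be approximated with in-memory documents. This is only a sketch: `SAMPLES` and `describe_declaration` are hypothetical names, and the tiny `<bookstore/>` documents are stand-ins for the real files.

```python
from xml.dom.minidom import parseString

# Hypothetical stand-ins for bookstore-001.xml, -002.xml and -003.xml.
SAMPLES = {
    "with declaration": b'<?xml version="1.0" encoding="UTF-8"?><bookstore/>',
    "without declaration": b'<bookstore/>',
    "other encoding": b'<?xml version="1.0" encoding="ISO-8859-1"?><bookstore/>',
}


def describe_declaration(xml_bytes):
    """Return (version, encoding) exactly as minidom reports them."""
    dom = parseString(xml_bytes)
    return dom.version, dom.encoding


for label, xml_bytes in SAMPLES.items():
    version, encoding = describe_declaration(xml_bytes)
    print('{}: version={!r} encoding={!r}'.format(label, version, encoding))
```

Both values come back as None only when the declaration is missing, which is what makes `dom.version` usable as a presence check.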
