Checking If Xml Declaration Is Present
Solution 1:
tl;dr
from xml.dom.minidom import parseString
def has_xml_declaration(xml):
    # version is None when the document carries no XML declaration
    return parseString(xml).version
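Since the function above returns the version string (or None) rather than a boolean, a stricter variant may be handy. The following is only a minimal sketch: the name declaration_present and the two sample documents are made up for illustration, and it assumes that minidom leaves Document.version as None when the parser saw no declaration.

```python
from xml.dom.minidom import parseString


def declaration_present(xml):
    """Return True if the parsed document carried an XML declaration.

    Hypothetical helper name; assumes minidom leaves Document.version
    as None when no declaration was parsed.
    """
    return parseString(xml).version is not None


# Illustrative sample documents (not taken from the original answer)
with_decl = b'<?xml version="1.0" encoding="UTF-8"?><bookstore/>'
without_decl = b'<bookstore/>'

print(declaration_present(with_decl))     # True
print(declaration_present(without_decl))  # False
```

Note that this parses the whole document, so malformed XML raises an exception instead of returning False.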
From Wikipedia's section on the XML declaration:
If an XML document lacks encoding specification, an XML parser assumes that the encoding is UTF-8 or UTF-16, unless the encoding has already been determined by a higher protocol.
...
The declaration may be optionally omitted because it declares as its encoding the default encoding. However, if the document instead makes use of XML 1.1 or another character encoding, a declaration is necessary. Internet Explorer prior to version 7 enters quirks mode, if it encounters an XML declaration in a document served as text/html
So even if the XML declaration is omitted in an XML document, the code-snippet:
if re.match(r"^<\?xml\s*version=\'1\.0\' encoding=\'utf8\'\s*\?>", xmlFile.decode('utf-8')) is None:
will find "the" default XML declaration in this XML document. Please note that I have used xmlFile.decode('utf-8') instead of xmlFile.
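The pattern above only matches one exact spelling (single quotes, version 1.0, encoding utf8). If the raw text should be checked for other common spellings as well, a more tolerant pattern could look roughly like this. It is a sketch, not a full implementation of the XML grammar: the names XML_DECL and starts_with_xml_declaration are hypothetical, the input is assumed to be UTF-8 when given as bytes, and a leading byte order mark or whitespace is not handled.

```python
import re

# Hypothetical pattern: "<?xml", a quoted 1.x version, then anything up to "?>"
XML_DECL = re.compile(r"""^<\?xml\s+version\s*=\s*(["'])1\.[0-9]+\1[^?]*\?>""")


def starts_with_xml_declaration(data):
    """Return True if the text begins with something shaped like an XML declaration."""
    if isinstance(data, bytes):
        # Assumption: the raw bytes are UTF-8, as in the snippet above
        data = data.decode('utf-8')
    return XML_DECL.match(data) is not None


print(starts_with_xml_declaration(b"<?xml version='1.0' encoding='utf8'?>\n<bookstore/>"))
print(starts_with_xml_declaration('<?xml version="1.0" encoding="UTF-8"?><bookstore/>'))
print(starts_with_xml_declaration('<bookstore/>'))
```

Unlike the minidom approach, this only looks at the first characters of the text and never validates the rest of the document.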
If you don't mind using minidom, you can use the following code snippet:
from xml.dom.minidom import parse
dom = parse('bookstore-003.xml')
print('<?xml version="{}" encoding="{}"?>'.format(dom.version, dom.encoding))
Here is a working fiddle
In bookstore-001.xml an XML declaration is present, in bookstore-002.xml no XML declaration is present, and in bookstore-003.xml a different XML declaration than in the first example is present. The print call prints the version and the encoding accordingly:
<?xml version="1.0" encoding="UTF-8"?>
<?xml version="None" encoding="None"?>
<?xml version="1.0" encoding="ISO-8859-1"?>