Include one XML within another XML and parse it with Python

I want to include an XML file in another XML file and parse the result with Python. I am trying to achieve this through XInclude. There is a file1.xml, which looks like this:

<?xml version="1.0"?>
<root>
  <document xmlns:xi="http://www.w3.org/2001/XInclude">
     <xi:include href="file2.xml" parse="xml" />
  </document>
  <test>some text</test>
</root>

and a file2.xml, which looks like this:

<para>This is a paragraph.</para>

In my Python code I tried to access it like this:

from xml.etree import ElementTree, ElementInclude

tree = ElementTree.parse("file1.xml")
root = tree.getroot()
for child in root:
    print(child.tag)

It prints the tags of all child elements of root:

document
test

But when I try to print the child objects directly, like this:

print(root.document)
print(root.test)

Python reports that root does not have children named test or document. How, then, am I supposed to access the content of file2.xml?

I know that I can access XML elements from Python using a schema, like this:

    schema=etree.XMLSchema(objectify.fromstring(configSchema))
    xmlParser = objectify.makeparser(schema = schema)
    cfg = objectify.fromstring(xmlContents, xmlParser)
    print(cfg.elementName)  # access an element by name

But since one XML file is included in another here, I am confused about how to write the schema. How can I solve this?



Solution 1:[1]

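Not sure why you want to use XInclude, but including an XML file in another one is a basic mechanism of SGML and XML, and can be achieved without XInclude as simply as shown below. The parser has to resolve external entities for this to work, which not every parser does by default: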

<!DOCTYPE root [
  <!ENTITY externaldoc SYSTEM "file2.xml">
]>
<root>
  <document>
    &externaldoc;
  </document>
  <test>some text</test>
</root>

Solution 2:[2]

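The example below parses the two documents separately and inserts the second one into the first with ElementTree: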

import xml.etree.ElementTree as ET


xml1 = '''<?xml version="1.0"?>
<root>
  <test>some text</test>
</root>'''

xml2 = '''<para>This is a paragraph.</para>'''

root1 = ET.fromstring(xml1)
root2 = ET.fromstring(xml2)

root1.insert(0,root2)

para_value = root1.find('.//para').text
print(para_value)

Output:

This is a paragraph.
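
The snippet above places para directly under root. A minimal variant, assuming you would rather keep the layout from the question (para inside document), could append the parsed fragment to the document element instead. The variable names here are only illustrative:

```python
import xml.etree.ElementTree as ET

# Outer document with an empty <document> placeholder (illustrative content).
outer_xml = """<root>
  <document/>
  <test>some text</test>
</root>"""
inner_xml = "<para>This is a paragraph.</para>"

outer_root = ET.fromstring(outer_xml)
inner_root = ET.fromstring(inner_xml)

# Append the parsed fragment under <document> instead of under <root>.
document = outer_root.find("document")
document.append(inner_root)

# Serialize the merged tree and read the paragraph back by its path.
merged = ET.tostring(outer_root, encoding="unicode")
para_text = outer_root.find("document/para").text
print(para_text)
```

Note that this merges the trees in memory only; nothing in the XML itself records where the fragment came from.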

Solution 3:[3]

You need to tell xml.etree to actually include the files referenced by xi:include. I have added the key line to your original example:

from xml.etree import ElementTree, ElementInclude

tree = ElementTree.parse("file1.xml")
root = tree.getroot()

# Expand every xi:include element in place with the file it references
ElementInclude.include(root)

# Now the tree contains the included content
for child in root:
    print(child.tag)

For a detailed reference on includes in Python, see the "XInclude support" section of the official xml.etree.ElementTree documentation: https://docs.python.org/3/library/xml.etree.elementtree.html
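
As a self-contained sketch of this approach, the script below writes the two files from the question into a temporary directory, expands the include, and then reads the included element with find(). ElementTree elements do not expose their children as attributes, which is why root.document fails in the question. The load_and_expand helper and its loader are hypothetical names. The custom loader is an assumption, made so that relative href values are resolved against the directory of file1.xml rather than the current working directory.

```python
import os
import tempfile
from xml.etree import ElementTree, ElementInclude

FILE1_XML = """<?xml version="1.0"?>
<root>
  <document xmlns:xi="http://www.w3.org/2001/XInclude">
     <xi:include href="file2.xml" parse="xml" />
  </document>
  <test>some text</test>
</root>
"""
FILE2_XML = "<para>This is a paragraph.</para>"


def load_and_expand(path):
    """Parse path and replace each xi:include with the referenced file (hypothetical helper)."""
    base_dir = os.path.dirname(os.path.abspath(path))

    def loader(href, parse, encoding=None):
        # Assumption: hrefs are plain relative file paths next to the including file.
        return ElementInclude.default_loader(os.path.join(base_dir, href), parse, encoding)

    root = ElementTree.parse(path).getroot()
    ElementInclude.include(root, loader=loader)
    return root


with tempfile.TemporaryDirectory() as tmp_dir:
    # Recreate the two files from the question.
    for name, content in (("file1.xml", FILE1_XML), ("file2.xml", FILE2_XML)):
        with open(os.path.join(tmp_dir, name), "w", encoding="utf-8") as handle:
            handle.write(content)
    root = load_and_expand(os.path.join(tmp_dir, "file1.xml"))

# Children are reached with find()/findall(), not with attribute access.
child_tags = [child.tag for child in root]
para_text = root.find("document/para").text
test_text = root.find("test").text
print(child_tags)
print(para_text)
```

If the script is always run from the directory that contains file1.xml, the custom loader is probably unnecessary and a plain ElementInclude.include(root) should be enough.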

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 imhotap
Solution 2
Solution 3 Manolo Conesa