Extracting data from XML with Python 3
I am trying to extract figures from a series of xml data.
The xml data looks like:
<commentinfo>
<note>This file contains the sample data for testing</note>
<comments>
<comment>
<name>Romina</name>
<count>97</count>
</comment>
And so on, with a new name and count in each comment.
My code is:
import urllib.request, urllib.parse, urllib.error
import xml.etree.ElementTree as ET
url = 'http://py4e-data.dr-chuck.net/comments_42.xml'
uh = urllib.request.urlopen(url)
data = uh.read()
# print(data)
tree = ET.fromstring(data)
# print('Name:',tree.find('count').text)
lst = tree.findall('comments/comment/count')
# print(len(lst))
# print(lst)
# x1 = result[1].find('comment')
# for item in lst:
# print('Count', item.find('count').text)
counts = tree.findall('.//count')
print(counts)
When I print counts, I get a longer version of:
<Element 'count' at 0x000000000A09FB88>, <Element 'count' at 0x000000000A09FC78>, <Element 'count' at 0x000000000A09FD68>, <Element 'count' at 0x000000000A09FE58>, <Element 'count' at 0x000000000A09FF48>, <Element 'count' at 0x000000000A0A3098>]
I am quite new to this, so I don't understand why I am getting these hex numbers, nor do I know how to extract the actual figures.
I am hoping someone can help.
Solution 1:[1]
Just loop through the list and print the text of each element.
import urllib.request, urllib.parse, urllib.error
import xml.etree.ElementTree as ET
url = 'http://py4e-data.dr-chuck.net/comments_42.xml'
uh = urllib.request.urlopen(url)
data = uh.read()
tree = ET.fromstring(data)
lst = tree.findall('comments/comment/count')
counts = tree.findall('.//count')
for each in counts:
    print(each.text)
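To see why the hex numbers show up, here is a small sketch that runs without a network connection. It assumes an inline sample shaped like the data in the question; the second comment (name "Example", count 3) is made up for illustration. findall returns Element objects, and printing a list of them shows each object's default representation, which includes its memory address in hex. The value between the tags is in each element's .text attribute.

```python
import xml.etree.ElementTree as ET

# Inline sample shaped like the question's data.
# The second <comment> is invented for illustration only.
SAMPLE_XML = """<commentinfo>
  <note>This file contains the sample data for testing</note>
  <comments>
    <comment>
      <name>Romina</name>
      <count>97</count>
    </comment>
    <comment>
      <name>Example</name>
      <count>3</count>
    </comment>
  </comments>
</commentinfo>"""

tree = ET.fromstring(SAMPLE_XML)
counts = tree.findall('.//count')

# Each item is an Element object, so this prints the objects' reprs
# (tag name plus a memory address), not the numbers inside the tags.
print(counts)

# The content between <count> and </count> is stored as a string in .text.
texts = [element.text for element in counts]
print(texts)  # ['97', '3']

# Convert to int before doing any arithmetic.
numbers = [int(text) for text in texts]
print(numbers)  # [97, 3]
```

The addresses will differ on every run, which is why they look like noise; they are just where Python happens to keep each element in memory.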
Solution 2:[2]
import xml.etree.ElementTree as ET
import urllib.request
url= "http://py4e-data.dr-chuck.net/comments_42.xml"
html = urllib.request.urlopen(url)
data=html.read()
#print(data)
tags=ET.fromstring(data)
lst = tags.findall('comments/comment')
x = 0
# add the <count> value of every <comment> to the running total
for item in lst:
    element = int(item.find('count').text)
    x = element + x
print(x)
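Selecting the comment elements first, as above, also lets you keep each name together with its count. A minimal sketch, assuming an inline sample shaped like the question's data (the second comment is made up, and count_by_name is just an illustrative variable name):

```python
import xml.etree.ElementTree as ET

# Inline sample; the second <comment> is invented for illustration only.
SAMPLE_XML = """<commentinfo>
  <comments>
    <comment>
      <name>Romina</name>
      <count>97</count>
    </comment>
    <comment>
      <name>Example</name>
      <count>3</count>
    </comment>
  </comments>
</commentinfo>"""

root = ET.fromstring(SAMPLE_XML)

count_by_name = {}
for comment in root.findall('comments/comment'):
    # findtext returns the text of the first matching child element
    name = comment.findtext('name')
    count = comment.findtext('count')
    count_by_name[name] = int(count)

print(count_by_name)  # {'Romina': 97, 'Example': 3}
print(sum(count_by_name.values()))  # 100
```

Note that a dictionary keeps only one entry per name, so if the real data repeats a name you would probably want a list of (name, count) tuples instead.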
Solution 3:[3]
This was kind of tricky because you need to start over; the sample suggestion just confuses you even more.
from urllib.request import urlopen
import xml.etree.ElementTree as ET

url = input('Enter location: ')
print('Retrieving...', url)
accumulative = 0
XML = urlopen(url).read()  # read the raw response from the URL
print('Retrieved:', len(XML), 'characters')
tree = ET.fromstring(XML)
print(tree)
counts = tree.findall('.//count')
print('Count', len(counts))
values = []  # avoid naming this 'list', which would shadow the built-in
for i in counts:
    # option 1: keep a running total
    accumulative = accumulative + int(i.text)
    # option 2: collect the values and sum them afterwards
    values.append(int(i.text))
print(accumulative)
print(sum(values))
Solution 4:[4]
This is a slight modification which sums up the count.
import urllib.request, urllib.parse, urllib.error
import xml.etree.ElementTree as ET
url = 'http://py4e-data.dr-chuck.net/comments_42.xml'
uh = urllib.request.urlopen(url)
data = uh.read()
tree = ET.fromstring(data)
# find every <count> element anywhere in the tree
counts = tree.findall('.//count')
total = 0
for count in counts:
    total += int(count.text)
print('total: ', total)
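The same sum can probably be written more compactly with a generator expression. This is only a sketch: sum_counts is a hypothetical helper name, not something from the answers above, and the sample bytes at the bottom are made up so the example runs without fetching the URL.

```python
import xml.etree.ElementTree as ET


def sum_counts(xml_data):
    """Return the sum of all <count> values in the given XML (str or bytes)."""
    root = ET.fromstring(xml_data)
    # .//count matches <count> elements at any depth below the root
    return sum(int(element.text) for element in root.findall('.//count'))


# Made-up sample standing in for the bytes returned by urlopen(url).read()
sample = (
    b"<commentinfo><comments>"
    b"<comment><name>Romina</name><count>97</count></comment>"
    b"<comment><name>Example</name><count>3</count></comment>"
    b"</comments></commentinfo>"
)
print(sum_counts(sample))  # 100
```

With the real data you would pass the result of uh.read() straight to sum_counts, since ET.fromstring accepts bytes as well as strings.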
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | kyle |
Solution 2 | Shounak Kshirsagar |
Solution 3 | Giancarlo Meléndez |
Solution 4 | analystee |