Extracting data from XML with Python 3

I am trying to extract figures from a set of XML data.

The XML data looks like:

<commentinfo>
  <note>This file contains the sample data for testing</note>
  <comments>
    <comment>
      <name>Romina</name>
      <count>97</count>
    </comment>

And so on, with a new name and count in each comment.

My code is:

import urllib.request, urllib.parse, urllib.error
import xml.etree.ElementTree as ET

url = 'http://py4e-data.dr-chuck.net/comments_42.xml'

uh = urllib.request.urlopen(url)
data = uh.read()
# print(data)

tree = ET.fromstring(data)
# print('Name:',tree.find('count').text)
lst = tree.findall('comments/comment/count')
# print(len(lst))
# print(lst)
# x1 = result[1].find('comment')

# for item in lst:
#     print('Count', item.find('count').text)

counts = tree.findall('.//count')
print(counts)

When I print counts I get a longer version of:

<Element 'count' at 0x000000000A09FB88>, <Element 'count' at 0x000000000A09FC78>, <Element 'count' at 0x000000000A09FD68>, <Element 'count' at 0x000000000A09FE58>, <Element 'count' at 0x000000000A09FF48>, <Element 'count' at 0x000000000A0A3098>]

I am quite new to this, so I don't understand why I am getting these hex numbers, nor do I know how to extract the actual figures.

I am hoping someone can help.



Solution 1:[1]

Just loop through the list and print the text of each element.

import urllib.request, urllib.parse, urllib.error
import xml.etree.ElementTree as ET

url = 'http://py4e-data.dr-chuck.net/comments_42.xml'

uh = urllib.request.urlopen(url)
data = uh.read()

tree = ET.fromstring(data)

# Find every <count> element anywhere in the tree.
counts = tree.findall('.//count')

# Each item is an Element; its .text attribute holds the figure as a string.
for each in counts:
    print(each.text)
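
The hex numbers in the question are not data. Printing a list of Element objects shows each object's default representation, which includes the tag name and a memory address. Below is a minimal sketch of the difference. It assumes an inline sample shaped like the XML in the question (the second comment is invented for illustration) so that it runs without a network request:

```python
import xml.etree.ElementTree as ET

# Hypothetical inline sample shaped like the XML in the question;
# the second <comment> is made up for illustration.
SAMPLE = """<commentinfo>
  <note>This file contains the sample data for testing</note>
  <comments>
    <comment>
      <name>Romina</name>
      <count>97</count>
    </comment>
    <comment>
      <name>Example</name>
      <count>3</count>
    </comment>
  </comments>
</commentinfo>"""

tree = ET.fromstring(SAMPLE)
counts = tree.findall('.//count')

# Each item is an Element object, so its repr shows the tag and an address.
print(repr(counts[0]))

# The figure itself is stored as a string in .text; convert it to use it as a number.
values = [int(c.text) for c in counts]
print(values)
```

The address differs from run to run, which is why it is of no use as data.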

Solution 2:[2]

import xml.etree.ElementTree as ET
import urllib.request

url = "http://py4e-data.dr-chuck.net/comments_42.xml"
html = urllib.request.urlopen(url)
data = html.read()
# print(data)
tags = ET.fromstring(data)
# Each <comment> element holds one <name> and one <count>.
lst = tags.findall('comments/comment')
x = 0
for item in lst:
    element = int(item.find('count').text)
    x = element + x
print(x)
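
Solution 2 calls find('count') on each comment element, not on the root. Paths passed to find and findall are relative to the element they are called on, which is also why the commented-out tree.find('count') in the question would return None. A small sketch of the three lookups follows, again assuming a hypothetical inline sample in place of the downloaded file:

```python
import xml.etree.ElementTree as ET

# Hypothetical inline sample shaped like the XML in the question.
SAMPLE = """<commentinfo>
  <comments>
    <comment><name>Romina</name><count>97</count></comment>
    <comment><name>Example</name><count>3</count></comment>
  </comments>
</commentinfo>"""

root = ET.fromstring(SAMPLE)

# <count> is not a direct child of <commentinfo>, so nothing is found.
direct = root.find('count')
print(direct)  # None

# A path relative to the root reaches the nested elements.
by_path = root.findall('comments/comment/count')
print([c.text for c in by_path])  # ['97', '3']

# './/count' searches all descendants, whatever their depth.
anywhere = root.findall('.//count')
print([c.text for c in anywhere])  # ['97', '3']
```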

Solution 3:[3]

This was kind of tricky, because you need to start over; the sample suggestion just confuses you even more.

from urllib.request import urlopen
import xml.etree.ElementTree as ET

url = input('Enter location:')
print('Retrieving...', url)
accumulative = 0
xml_data = urlopen(url).read()  # read the raw response
print('Retrieved:', len(xml_data), 'characters')
tree = ET.fromstring(xml_data)
print(tree)
counts = tree.findall('.//count')
print('Count', len(counts))
values = []  # named "values" to avoid shadowing the built-in list
for i in counts:
    x = int(i.text)
    # option 1: running total
    accumulative = accumulative + x
    # option 2: collect the values and sum them afterwards
    values.append(x)

print(accumulative)
print(sum(values))

Solution 4:[4]

This is a slight modification that sums the counts.

import urllib.request
import xml.etree.ElementTree as ET

url = 'http://py4e-data.dr-chuck.net/comments_42.xml'
uh = urllib.request.urlopen(url)
data = uh.read()

tree = ET.fromstring(data)

# Find every <count> element anywhere in the tree.
counts = tree.findall('.//count')

total = 0

for count in counts:
    total += int(count.text)

print('total: ', total)
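
The same summing step can be separated from the download so it is easy to try on a small string. The sketch below assumes a hypothetical helper named sum_counts and an invented inline sample. It is one possible arrangement, not the method used by any of the answers:

```python
import xml.etree.ElementTree as ET


def sum_counts(xml_text):
    """Return the sum of all <count> values in an XML string (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    # .text is a string such as '97', so convert before adding.
    return sum(int(el.text) for el in root.findall('.//count'))


# Hypothetical inline sample shaped like the XML in the question.
sample = """<commentinfo>
  <comments>
    <comment><name>Romina</name><count>97</count></comment>
    <comment><name>Example</name><count>3</count></comment>
  </comments>
</commentinfo>"""

print('total:', sum_counts(sample))
```

With the real data, the bytes returned by uh.read() can be passed straight to sum_counts, since ET.fromstring accepts both str and bytes.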

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 kyle
Solution 2 Shounak Kshirsagar
Solution 3 Giancarlo Meléndez
Solution 4 analystee