Python Duplicated attributes in XML file
I am new to programming in Python, and I can't figure out how to solve my issue.
I would like to know if there is a way to list all the duplicated information in an XML file. I used iter to display every GroupMap element, but now I need to list only the entries that are duplicated.
for dup in root.iter('GroupMap'): print(dup.attrib)
As a result, I get the following list of every GroupMap entry in my XML:
<?xml version="1.0"?>
<GroupMapping>
<GroupMap groupN="Q123/Gr01">False</GroupMap>
<GroupMap groupN="Q123/Gr02">False</GroupMap>
<GroupMap groupN="Q123/Gr03">False</GroupMap>
<GroupMap groupN="Q123/Gr04">False</GroupMap>
<GroupMap groupN="Q123/Gr05">False</GroupMap>
<GroupMap groupN="Q123/Gr06">False</GroupMap>
<GroupMap groupN="Q123/Gr01">False</GroupMap>
<GroupMap groupN="Q123/Gr02">False</GroupMap>
<GroupMap groupN="Q123/Gr03">False</GroupMap>
<GroupMap groupN="Q123/Gr04">False</GroupMap>
<GroupMap groupN="Q123/Gr05">False</GroupMap>
<GroupMap groupN="Q123/Gr06">False</GroupMap>
<GroupMap groupN="Q123/Gr01">False</GroupMap>
<GroupMap groupN="Q123/Gr02">False</GroupMap>
<GroupMap groupN="Q123/Gr03">False</GroupMap>
<GroupMap groupN="Q123/Gr04">False</GroupMap>
<GroupMap groupN="Q123/Gr05">False</GroupMap>
<GroupMap groupN="Q123/Gr06">False</GroupMap>
<GroupMap groupN="Q123/Gr01">False</GroupMap>
<GroupMap groupN="Q123/Gr02">False</GroupMap>
<GroupMap groupN="Q123/Gr03">False</GroupMap>
<GroupMap groupN="Q123/Gr04">False</GroupMap>
<GroupMap groupN="Q123/Gr05">False</GroupMap>
<GroupMap groupN="Q123/Gr06">False</GroupMap>
</GroupMapping>
My attempt:
import xml.etree.ElementTree as ET
from tkinter import filedialog
from tkinter import *
root1=Tk()
root1.filename = filedialog.askopenfilename(
    initialdir="C:/Users/Administrator/Downloads/Python-XML-Parser-master/Python-XML-Parser-master/Test",
    title="Select XML File",
    filetypes=(("XML files", ".xml"),("all files", ".*"))
)
tree=ET.parse(root1.filename)
root=tree.getroot()
tag=root.tag
for neighbor in root.iter('GroupMapping'):
    print(neighbor.attrib)
Solution 1:[1]
Assuming that the XML file is named file.xml:
with open("file.xml") as xml_file:
    lines = xml_file.readlines()

processed_lines = set()  # every distinct line seen so far
for line in lines:
    if line in processed_lines:
        # this exact line already appeared earlier in the file
        print(line)
    processed_lines.add(line)
This prints each line that has already appeared earlier in the file, so a line repeated several times is printed once per repeat.
If you need only the unique lines, the set processed_lines contains them.
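The line comparison above is exact, so two otherwise identical lines with different indentation would not match. As a rough sketch (not part of the original answer), the lines could be stripped first and counted with collections.Counter, so that each duplicated line is reported once together with how often it occurs. The helper name duplicate_lines and the in-memory sample list are assumptions made for illustration.

```python
from collections import Counter


def duplicate_lines(lines):
    """Return {line: count} for every non-empty line seen more than once.

    Lines are stripped before counting, so indentation and trailing
    newlines do not hide a duplicate. Hypothetical helper name.
    """
    counts = Counter(line.strip() for line in lines if line.strip())
    return {line: n for line, n in counts.items() if n > 1}


# Small in-memory sample standing in for xml_file.readlines()
sample = [
    '<GroupMap groupN="Q123/Gr01">False</GroupMap>\n',
    '  <GroupMap groupN="Q123/Gr01">False</GroupMap>\n',
    '<GroupMap groupN="Q123/Gr02">False</GroupMap>\n',
]

for line, n in duplicate_lines(sample).items():
    print(n, line)
```

With a real file, the result of xml_file.readlines() could be passed in place of sample.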
Solution 2:[2]
I don't understand exactly what the tkinter code is doing (I'm not too familiar with that package), so I will ignore it and show you the raw XML parsing:
from xml.etree import ElementTree as ET

et = ET.parse('path/to/xml_file.xml')
group_set = set()
for group in et.findall('GroupMap'):
    group_attr = group.attrib['groupN']
    if group_attr in group_set:
        print(group_attr)
    else:
        group_set.add(group_attr)
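The loop above prints a repeated groupN once for every extra occurrence. If a single summary per name is preferred, one possible variant counts the attribute values with collections.Counter. This is only a sketch: the function name duplicate_group_names and the inline SAMPLE_XML string are assumptions for illustration, and Element.get is used so that elements without a groupN attribute are skipped rather than raising KeyError.

```python
from collections import Counter
from xml.etree import ElementTree as ET

# Inline sample so the sketch runs without a file on disk (assumed data)
SAMPLE_XML = """<GroupMapping>
<GroupMap groupN="Q123/Gr01">False</GroupMap>
<GroupMap groupN="Q123/Gr02">False</GroupMap>
<GroupMap groupN="Q123/Gr01">False</GroupMap>
<GroupMap groupN="Q123/Gr03">False</GroupMap>
<GroupMap groupN="Q123/Gr01">False</GroupMap>
</GroupMapping>"""


def duplicate_group_names(root):
    """Return {groupN: count} for names used by more than one GroupMap."""
    counts = Counter(
        group.get("groupN")
        for group in root.findall("GroupMap")
        if group.get("groupN") is not None  # skip elements without the attribute
    )
    return {name: n for name, n in counts.items() if n > 1}


root = ET.fromstring(SAMPLE_XML)
print(duplicate_group_names(root))
```

For a file on disk, ET.parse('path/to/xml_file.xml').getroot() could be passed instead of the parsed sample string.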
You can also combine the element's groupN attribute with the corresponding inner text, if you only want to treat entries as duplicates when both the attribute and the text match:
from xml.etree import ElementTree as ET

et = ET.parse('path/to/xml_file.xml')
group_set = set()
for group in et.findall('GroupMap'):
    group_obj = (group.attrib['groupN'], group.text)
    if group_obj in group_set:
        print(group_obj)
    else:
        group_set.add(group_obj)
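The tuple-based check above reports an entry only when both groupN and the text repeat. If the goal is instead to find groupN values that appear with different inner text, one way to sketch it is to collect the distinct texts per name and keep the names that have more than one. The function name conflicting_groups and the sample data (including the True value) are hypothetical and only serve the illustration.

```python
from collections import defaultdict
from xml.etree import ElementTree as ET

# Assumed sample: Gr01 appears with two different text values
SAMPLE_XML = """<GroupMapping>
<GroupMap groupN="Q123/Gr01">False</GroupMap>
<GroupMap groupN="Q123/Gr01">True</GroupMap>
<GroupMap groupN="Q123/Gr02">False</GroupMap>
<GroupMap groupN="Q123/Gr02">False</GroupMap>
</GroupMapping>"""


def conflicting_groups(root):
    """Return {groupN: sorted distinct texts} for names with differing text."""
    texts_by_name = defaultdict(set)
    for group in root.findall("GroupMap"):
        # Treat missing text as an empty string so the values stay sortable
        text = (group.text or "").strip()
        texts_by_name[group.get("groupN")].add(text)
    return {
        name: sorted(texts)
        for name, texts in texts_by_name.items()
        if len(texts) > 1
    }


root = ET.fromstring(SAMPLE_XML)
print(conflicting_groups(root))
```

Gr02 is duplicated in the sample but always carries the same text, so it is left out of the result.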
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | theoctober19th |
Solution 2 | Lord Elrond |