Python Duplicated attributes in XML file

I am new to programming in Python, and I can't figure out how to solve my issue :(

I would like to know if there is a way to list all the duplicated information in an XML file. I used iter to display every GroupMap element, but now I need to list only the entries that are duplicated.

for dup in root.iter('GroupMap'):
    print(dup.attrib)

As a result I get the following list, with every part of my XML that has GroupMap:

<?xml version="1.0"?>
<GroupMapping>
  <GroupMap groupN="Q123/Gr01">False</GroupMap>
  <GroupMap groupN="Q123/Gr02">False</GroupMap>
  <GroupMap groupN="Q123/Gr03">False</GroupMap>
  <GroupMap groupN="Q123/Gr04">False</GroupMap>
  <GroupMap groupN="Q123/Gr05">False</GroupMap>
  <GroupMap groupN="Q123/Gr06">False</GroupMap>
  <GroupMap groupN="Q123/Gr01">False</GroupMap>
  <GroupMap groupN="Q123/Gr02">False</GroupMap>
  <GroupMap groupN="Q123/Gr03">False</GroupMap>
  <GroupMap groupN="Q123/Gr04">False</GroupMap>
  <GroupMap groupN="Q123/Gr05">False</GroupMap>
  <GroupMap groupN="Q123/Gr06">False</GroupMap>
  <GroupMap groupN="Q123/Gr01">False</GroupMap>
  <GroupMap groupN="Q123/Gr02">False</GroupMap>
  <GroupMap groupN="Q123/Gr03">False</GroupMap>
  <GroupMap groupN="Q123/Gr04">False</GroupMap>
  <GroupMap groupN="Q123/Gr05">False</GroupMap>
  <GroupMap groupN="Q123/Gr06">False</GroupMap>
  <GroupMap groupN="Q123/Gr01">False</GroupMap>
  <GroupMap groupN="Q123/Gr02">False</GroupMap>
  <GroupMap groupN="Q123/Gr03">False</GroupMap>
  <GroupMap groupN="Q123/Gr04">False</GroupMap>
  <GroupMap groupN="Q123/Gr05">False</GroupMap>
  <GroupMap groupN="Q123/Gr06">False</GroupMap>
</GroupMapping>

My attempt:

import xml.etree.ElementTree as ET 
from tkinter import filedialog 
from tkinter import * 

root1=Tk() 
root1.filename = filedialog.askopenfilename(
    initialdir="C:/Users/Administrator/Downloads/Python-XML-Parser-master/Python-XML-Parser-master/Test", 
    title="Select XML File", 
    filetypes=(("XML files", ".xml"),("all files", ".*"))
) 
tree=ET.parse(root1.filename) 
root=tree.getroot() 
tag=root.tag 
for neighbor in root.iter('GroupMapping'): 
    print(neighbor.attrib) 


Solution 1:[1]

Assuming the XML file is named file.xml:

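with open("file.xml") as xml_file:
    processed_lines = set()
    for line in xml_file:
        line = line.strip()  # ignore indentation and the trailing newline
        if not line:
            continue  # skip blank lines
        if line in processed_lines:
            print(line)  # this line has already been seen
        processed_lines.add(line)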

This prints a line each time it repeats one seen earlier in the file, so a line that occurs four times is printed three times.

If you only need the unique lines, the set processed_lines holds them.
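
If you would rather see each duplicated line once, together with how often it occurs, counting the lines is one option. The following is only a minimal sketch using collections.Counter; the function name duplicated_lines and the sample list are made up for illustration and are not part of the original answer.

```python
from collections import Counter

def duplicated_lines(lines):
    """Return {line: count} for every non-blank line that occurs more than once.

    `duplicated_lines` is a hypothetical helper name used for this sketch.
    """
    # Strip surrounding whitespace so indentation and newlines do not matter.
    counts = Counter(line.strip() for line in lines if line.strip())
    return {line: n for line, n in counts.items() if n > 1}

# Illustrative sample; with a real file, pass the open file object instead.
sample = [
    '  <GroupMap groupN="Q123/Gr01">False</GroupMap>\n',
    '  <GroupMap groupN="Q123/Gr02">False</GroupMap>\n',
    '  <GroupMap groupN="Q123/Gr01">False</GroupMap>\n',
]

for line, n in duplicated_lines(sample).items():
    print(f"{n}x {line}")
```

With a file on disk, the same helper can be called as duplicated_lines(xml_file) inside the with open("file.xml") as xml_file: block, because a file object yields its lines when iterated. Note that this compares raw text lines, so it only works when each element sits on its own line and is written identically each time.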

Solution 2:[2]

I'm not too familiar with tkinter and don't understand exactly what that code is doing, so I will ignore it and show only the raw XML parsing:

from xml.etree import ElementTree as ET

et = ET.parse('path/to/xml_file.xml')
group_set = set()

for group in et.findall('GroupMap'):
    group_attr = group.attrib['groupN']
    if group_attr in group_set:
        print(group_attr)
    else:
        group_set.add(group_attr)

You can also combine the element's groupN attribute with its text, if you only want to treat elements as duplicates when both the attribute and the inner text value match:

from xml.etree import ElementTree as ET

et = ET.parse('path/to/xml_file.xml')
group_set = set()

for group in et.findall('GroupMap'):
    group_obj = (group.attrib['groupN'], group.text)
    if group_obj in group_set:
        print(group_obj)
    else:
        group_set.add(group_obj)
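
Both loops above print a value every time it reappears, so a groupN that occurs four times is printed three times. If a single summary per duplicated value is more useful, the occurrences can be counted instead. This is a hedged sketch rather than part of the original answer: the function name find_duplicate_groups and its include_text parameter are hypothetical, and the inline XML is a shortened, altered sample (one text value is changed to True) so the two modes give different results.

```python
from collections import Counter
from xml.etree import ElementTree as ET

def find_duplicate_groups(root, include_text=False):
    """Return {key: count} for GroupMap keys that occur more than once.

    The key is the groupN attribute, or (groupN, text) when include_text is True.
    Function and parameter names are hypothetical, chosen for this sketch.
    """
    counts = Counter()
    for group in root.iter('GroupMap'):
        # .get() returns None instead of raising KeyError if groupN is missing.
        name = group.get('groupN')
        key = (name, group.text) if include_text else name
        counts[key] += 1
    return {key: n for key, n in counts.items() if n > 1}

# Illustrative data only, not the file from the question.
xml_text = """<GroupMapping>
  <GroupMap groupN="Q123/Gr01">False</GroupMap>
  <GroupMap groupN="Q123/Gr02">False</GroupMap>
  <GroupMap groupN="Q123/Gr01">True</GroupMap>
  <GroupMap groupN="Q123/Gr02">False</GroupMap>
</GroupMapping>"""

root = ET.fromstring(xml_text)
print(find_duplicate_groups(root))
print(find_duplicate_groups(root, include_text=True))
```

To run this against a file, replace ET.fromstring(xml_text) with ET.parse('path/to/xml_file.xml').getroot(). Using root.iter('GroupMap') also finds GroupMap elements nested deeper in the tree, whereas findall('GroupMap') only looks at direct children of the root.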

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 theoctober19th
Solution 2 Lord Elrond