How can I compare the contents of two text files in Python?
Hello, I need to write a script that does what the title says. Here is an example of what I want:
file1.txt's content: New York Los Angeles Miami
file2.txt's content: New York Orlando Miami Dc
I just want to compare two different text files and print the elements that were added or are missing.
In case it is unclear what I mean, here is my code so far:
from difflib import Differ

myfile1 = input("Enter First File's name for compare : ")
myfile2 = input("Enter Second File's name for compare : ")
# endswith() also works for names with no dot or with several dots
if myfile1.endswith(".txt") and myfile2.endswith(".txt"):
    with open(myfile1) as file_1, open(myfile2) as file_2:
        differ = Differ()
        for line in differ.compare(file_1.readlines(), file_2.readlines()):
            print(line)
else:
    print("File format error!")
Solution 1:[1]
If you want to compare single characters, you can iterate over them:
with open("file1.txt", 'r') as file:  # Same thing with file2
    content1 = file.read()
...
Like this:
min_len = min(map(len, (content1, content2)))
for i in range(min_len):  # use the smaller length
    if content1[i] != content2[i]:
        # You found a difference between these two characters
        # Do something
        pass
# If content1 is longer, its extra characters are content1[min_len:], so handle them here
If you want to compare words instead of single characters, split the input first:
content1 = file.read().split(' ')
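Once both inputs are split into words, one possible way to list the missing and added elements is a membership check against sets. This is only a sketch: the helper name `word_differences` is made up for illustration, and it ignores word order and repeated words, which may or may not suit your files.

```python
def word_differences(text1, text2):
    """Return (missing, added): words only in text1, then words only in text2."""
    words1 = text1.split()
    words2 = text2.split()
    set1, set2 = set(words1), set(words2)
    # Keep the original order of appearance within each text
    missing = [word for word in words1 if word not in set2]
    added = [word for word in words2 if word not in set1]
    return missing, added


missing, added = word_differences(
    "New York Los Angeles Miami",
    "New York Orlando Miami Dc",
)
print("Missing:", missing)  # Missing: ['Los', 'Angeles']
print("Added:", added)      # Added: ['Orlando', 'Dc']
```

Note that `split()` with no argument splits on any whitespace, so a trailing newline from `file.read()` does not end up attached to the last word.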
Solution 2:[2]
First, read all the lines of both files:
with open('file1.txt') as f1:
    a = f1.readlines()
with open('file2.txt') as f2:
    b = f2.readlines()
In Python 3.10 or higher, both files can be opened in a single parenthesized with statement:
with (
    open('file1.txt') as f1,
    open('file2.txt') as f2,
):
    a = f1.readlines()
    b = f2.readlines()
Now print the differences between a and b, word by word:
import difflib
a_sample = a[0]  # 'New York Los Angeles Miami'
b_sample = b[0]  # 'New York Orlando Miami Dc'
# a and b are lists of lines, so call replace() on the individual lines
diff = difflib.ndiff(a_sample.replace(' ', '\n').splitlines(keepends=True), b_sample.replace(' ', '\n').splitlines(keepends=True))
print(''.join(diff), end="")
  New
  York
+ Orlando
- Los
- Angeles
- Miami+ Miami
?      +
+ Dc
To compare every line of both files:
for file1_line, file2_line in zip(a, b):
    diff = difflib.ndiff(
        file1_line.replace(' ', '\n').splitlines(keepends=True),
        file2_line.replace(' ', '\n').splitlines(keepends=True)
    )
    print(''.join(diff), end="")
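One thing to keep in mind: zip() stops at the shorter input, so if one file has more lines than the other, the extra lines are never compared. A possible workaround, sketched below, is itertools.zip_longest with an empty string as the fill value. The function name `diff_lines_by_word` is hypothetical, and it splits on whitespace with split() rather than replacing spaces with newlines.

```python
import difflib
from itertools import zip_longest


def diff_lines_by_word(lines1, lines2):
    """Yield (line number, word-level ndiff entries) for each pair of lines."""
    # fillvalue="" makes a missing line behave like an empty line
    pairs = zip_longest(lines1, lines2, fillvalue="")
    for number, (line1, line2) in enumerate(pairs, start=1):
        yield number, list(difflib.ndiff(line1.split(), line2.split()))


# Stand-ins for the lists returned by readlines()
a = ["New York\n"]
b = ["New York\n", "Orlando\n"]
for number, entries in diff_lines_by_word(a, b):
    print(number, entries)
```

With this approach, the words of the unmatched second line in b show up as '+ ' entries instead of being silently skipped.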
Meaning of the difflib symbols:
code | meaning |
---|---|
'- ' | line unique to sequence 1 |
'+ ' | line unique to sequence 2 |
'  ' | line common to both sequences |
'? ' | line not present in either input sequence |
Note: you can iterate over the diff output and print only the + or - words.
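A minimal sketch of that filtering, assuming each line is compared word by word. The helper name `changed_words` is invented for this example; it relies only on the two-character prefixes that ndiff puts in front of every entry.

```python
import difflib


def changed_words(line1, line2):
    """Return (added, removed) words according to difflib.ndiff."""
    added, removed = [], []
    for entry in difflib.ndiff(line1.split(), line2.split()):
        if entry.startswith('+ '):
            added.append(entry[2:])    # drop the '+ ' prefix
        elif entry.startswith('- '):
            removed.append(entry[2:])  # drop the '- ' prefix
        # entries starting with '  ' or '? ' are ignored
    return added, removed


added, removed = changed_words(
    "New York Los Angeles Miami",
    "New York Orlando Miami Dc",
)
print("Added:", added)      # Added: ['Orlando', 'Dc']
print("Removed:", removed)  # Removed: ['Los', 'Angeles']
```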
Python documentation: https://docs.python.org/3/library/difflib.html
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Cresht |
Solution 2 |