Best way to compare two large files on multiple columns
I am working on a feature that will allow users to upload two CSV files, write rules to compare the rows, and output the result to a file.
Both files can have any number of columns, and the column names are not fixed either.
Currently, I read the files into two separate arrays and compare the rows based on the condition given in the rule.
This works for smaller files, but for large ones the comparison takes a lot of time and memory.
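For context, the sketch below shows roughly the kind of in-memory comparison described above (the file names, the `load_rows` helper, and the hard-coded rule are illustrative, not the actual code, which is rule-driven):

```python
import csv

def load_rows(path):
    # Read a CSV into a list of dicts keyed by the header row,
    # so the column names do not need to be known in advance.
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

file1 = load_rows("file1.csv")  # illustrative file names
file2 = load_rows("file2.csv")

# Pairwise comparison: every row of file1 against every row of file2,
# so the cost grows as O(len(file1) * len(file2)) and both files are
# held fully in memory.
matches = [
    (r1, r2)
    for r1 in file1
    for r2 in file2
    if r1["type"] == r2["type"] and r1["amount"] == r2["amount"]
]
```

With n and m rows respectively, this is n * m rule evaluations, which is why the approach degrades quickly on large files.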
Is there a better way, perhaps one where a database can be used to store and query this schema-less data?
Example Data:
File1
type  id  date        amount
A     1   12/10/2005  500
B     2   12/10/2005  500

File2
type  id  date        amount
A     1   12/10/2005  500
B     2   12/10/2005  500
A     1   12/10/2005  500
Rule1: File1.type == File2.type && File1.amount == File2.amount
Rule2: File1.id == GroupBy(File2.id) && File1.amount == File2.TotalAmount

The match condition is: Rule1 OR Rule2
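On the question of using a database: one lightweight option is SQLite, since a table's columns can be created dynamically from each file's header row, so no fixed schema is required. Below is a minimal sketch under the assumption of the example headers above; the file names, table names, and index choices are illustrative, and a real implementation would derive the compared columns from the user's rule.

```python
import csv
import sqlite3

def load_csv(conn, path, table):
    """Create a table whose columns come from the CSV header, then bulk-insert the rows."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        cols = ", ".join(f'"{c}"' for c in header)
        conn.execute(f'CREATE TABLE "{table}" ({cols})')
        marks = ", ".join("?" for _ in header)
        conn.executemany(f'INSERT INTO "{table}" VALUES ({marks})', reader)

conn = sqlite3.connect(":memory:")  # use a file path for datasets larger than RAM
load_csv(conn, "file1.csv", "file1")
load_csv(conn, "file2.csv", "file2")

# Indexes on the compared columns avoid a full scan of file2 per file1 row.
conn.execute("CREATE INDEX ix2_type_amount ON file2(type, amount)")
conn.execute("CREATE INDEX ix2_id ON file2(id)")

# Rule1 OR Rule2 as one query. CSV values load as text, so the grouped
# total in Rule2 is compared numerically via CAST.
matches = conn.execute("""
    SELECT f1.rowid, f1.*
    FROM file1 AS f1
    WHERE EXISTS (                         -- Rule1: same type and amount
        SELECT 1 FROM file2 AS f2
        WHERE f2.type = f1.type AND f2.amount = f1.amount
    )
    OR EXISTS (                            -- Rule2: amount equals the per-id total
        SELECT 1 FROM file2 AS f2
        WHERE f2.id = f1.id
        GROUP BY f2.id
        HAVING SUM(CAST(f2.amount AS REAL)) = CAST(f1.amount AS REAL)
    )
""").fetchall()
```

The same pattern would extend to user-written rules: translate each rule into a WHERE/HAVING fragment and let the query planner and indexes do the matching, instead of materializing both files in application memory.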
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow