Grouping duplicates in a CSV file and ranking data based on certain values

I have a CSV file like this:

"user_id","age","liked_ad","location"
2145,34,true,USA
6786,25,true,UK
9025,21,false,USA
1145,40,false,UK

The CSV file goes on. I worked out that there are duplicate user_ids within the file, so what I am trying to do is find out which users have the most 'true' answers in the 'liked_ad' column. I am super stuck on how to do this in Java and would appreciate any help.

This is what I have so far, which literally just parses the file:

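    import java.io.File;
    import java.io.FileNotFoundException;
    import java.util.Scanner;

    public class Main {
        public static void main(String[] args) throws FileNotFoundException
        {
            Scanner scanner = new Scanner(new File("src/main/resources/advert-data.csv"));

            // Split on commas and on line breaks, so the last field of one row
            // is not glued to the first field of the next row
            scanner.useDelimiter(",|\\R");

            while (scanner.hasNext()) {
                System.out.print(scanner.next() + " | ");
            }

            scanner.close();
        }
    }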

I'm stuck on where to go from here to achieve this.

java csv

Solution 1:^[1]

You can count how often liked_ad is true for each user_id in a Map<String, Integer>, and then sort the map's entries by value.

import java.io.File;
import java.io.IOException;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;

public class Main {
    public static void main(String[] args) throws IOException {
        // Store the number of true liked_ad answers for each user_id
        Map<String, Integer> map = new HashMap<>();

        // try-with-resources closes the Scanner (and the file) when done
        try (Scanner scanner = new Scanner(new File("file.txt"))) {
            // Ignore the header line
            if (scanner.hasNextLine()) {
                scanner.nextLine();
            }

            while (scanner.hasNextLine()) {
                String[] data = scanner.nextLine().split(",");
                if (data.length >= 3 && Boolean.parseBoolean(data[2])) {
                    map.merge(data[0], 1, Integer::sum);
                }
            }
        }

        // Sort the Map on values (highest count first) and display each entry
        map.entrySet().stream().sorted(Collections.reverseOrder(Map.Entry.comparingByValue()))
                .forEach(System.out::println);
    }
}

Given the following data in the file:

"user_id","age","liked_ad","location"
1145,40,true,UK
2145,34,true,USA
6786,25,true,UK
6786,25,true,UK
1145,40,true,UK
2145,34,true,USA
9025,21,false,USA
1145,40,false,UK
1145,40,true,UK

the output will be

1145=3
6786=2
2145=2
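In the output above, 6786 and 2145 both have a count of 2, and their relative order comes from HashMap iteration order, which Java does not guarantee. A minimal sketch that makes the ranking deterministic by breaking ties on user_id (ascending, compared as strings), assuming the same simple comma-split parsing as above; the class name RankWithTiebreak and the in-memory sample lines are illustrative only:

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class RankWithTiebreak {
    // Counts true liked_ad answers per user_id and ranks them; ties are ordered by user_id
    static List<String> rank(List<String> lines) {
        Map<String, Integer> map = new HashMap<>();
        // Start at index 1 to skip the header line, then count true answers per user_id
        for (int i = 1; i < lines.size(); i++) {
            String[] data = lines.get(i).split(",");
            if (data.length >= 3 && Boolean.parseBoolean(data[2])) {
                map.merge(data[0], 1, Integer::sum);
            }
        }

        // Highest count first; equal counts fall back to ascending user_id (string order)
        Comparator<Map.Entry<String, Integer>> byCountDesc =
                Map.Entry.<String, Integer>comparingByValue().reversed();
        return map.entrySet().stream()
                .sorted(byCountDesc.thenComparing(Map.Entry.<String, Integer>comparingByKey()))
                .map(e -> e.getKey() + "=" + e.getValue())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Same rows as the sample file above (hypothetical in-memory copy)
        List<String> lines = Arrays.asList(
                "\"user_id\",\"age\",\"liked_ad\",\"location\"",
                "1145,40,true,UK",
                "2145,34,true,USA",
                "6786,25,true,UK",
                "6786,25,true,UK",
                "1145,40,true,UK",
                "2145,34,true,USA",
                "9025,21,false,USA",
                "1145,40,false,UK",
                "1145,40,true,UK");
        rank(lines).forEach(System.out::println);
    }
}
```

With the sample data above this prints 1145=3, 2145=2 and 6786=2, in that order.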

Solution 2:^[2]

The following code should do what you want to achieve:

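import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

public class Main {
    public static void main(String[] args) throws IOException {

        // Count the rows where liked_ad is true, per user_id
        Map<String, Integer> stats = new HashMap<>();

        Files.readAllLines(Paths.get(args[0])).forEach((line) -> {
            String[] columns = line.split(",");
            // The quoted header value "liked_ad" parses as false, so the header is skipped
            if (columns.length > 2 && Boolean.parseBoolean(columns[2])) {
                stats.compute(columns[0], (key, value) -> value == null ? 1 : value + 1);
            }
        });

        // Sort by count, highest first (a TreeMap would only sort by user_id)
        stats.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
                .forEach(entry -> System.out.println(entry.getKey() + ": " + entry.getValue()));
    }
}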
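A sorted map such as a TreeMap built with Collections.reverseOrder() orders its entries by key, that is by user_id compared as strings, not by the count, so on its own it does not rank users. A small sketch contrasting the two orderings on the counts from the sample data in Solution 1; the class and method names are hypothetical:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.stream.Collectors;

class KeyVsValueOrder {
    // A reverse-ordered TreeMap sorts by its keys (user_id strings), highest first
    static String byKey(Map<String, Integer> counts) {
        SortedMap<String, Integer> sorted = new TreeMap<>(Collections.reverseOrder());
        sorted.putAll(counts);
        return sorted.toString();
    }

    // Ranking needs the entries sorted by their values (the counts), highest first
    static String byValue(Map<String, Integer> counts) {
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
                .map(e -> e.getKey() + "=" + e.getValue())
                .collect(Collectors.joining(", ", "{", "}"));
    }

    public static void main(String[] args) {
        // Counts of true liked_ad answers taken from the sample data in Solution 1
        Map<String, Integer> counts = new HashMap<>();
        counts.put("1145", 3);
        counts.put("2145", 2);
        counts.put("6786", 2);
        System.out.println(byKey(counts));   // {6786=2, 2145=2, 1145=3}
        System.out.println(byValue(counts)); // 1145=3 first; tie order follows HashMap iteration
    }
}
```

The key-ordered map puts 1145, the user with the most true answers, last, which is why the entries are sorted by value for the ranking.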

Solution 3:^[3]

Retrieve the CSV file, group it by user_id, count the records in each group whose third column is true, keep the groups where the count is greater than 0, and then sort them by count in descending order. Coding this process by hand in Java takes quite a lot of code.

I suggest using SPL, an open-source Java package, to do this. It is simple, and one line of code is enough:

	A
1	=file("advert-data.csv").import@cqt().groups(user_id;count(#3==true):count).select(#2>0).sort(-#2)

SPL offers a JDBC driver that can be invoked from Java. Just store the above SPL script as rank.splx and invoke it from Java as you would call a stored procedure:

…
Class.forName("com.esproc.jdbc.InternalDriver");
con= DriverManager.getConnection("jdbc:esproc:local://");
st = con.prepareCall("call rank()");
st.execute();
…
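For comparison, the steps listed at the start of this answer (group by user_id, count the rows whose third column is true, keep groups with a count above 0, sort by count descending) can be sketched with Java streams as below. This is plain JDK code rather than SPL; it assumes the simple comma-separated format shown in the question (no commas or quotes inside data fields), and the class name RankPipeline is hypothetical:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class RankPipeline {
    // Group by user_id, count true liked_ad values, keep counts > 0, sort by count descending
    static List<Map.Entry<String, Long>> rank(List<String> lines) {
        Map<String, Long> counts = lines.stream()
                .skip(1)                                   // drop the header row
                .map(line -> line.split(","))
                .filter(cols -> cols.length >= 3)          // ignore malformed rows
                .collect(Collectors.groupingBy(
                        (String[] cols) -> cols[0],        // group by user_id
                        Collectors.summingLong((String[] cols) -> Boolean.parseBoolean(cols[2]) ? 1L : 0L)));

        return counts.entrySet().stream()
                .filter(e -> e.getValue() > 0)             // keep users with at least one true
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical in-memory copy of the sample data from Solution 1
        List<String> lines = Arrays.asList(
                "\"user_id\",\"age\",\"liked_ad\",\"location\"",
                "1145,40,true,UK",
                "2145,34,true,USA",
                "6786,25,true,UK",
                "6786,25,true,UK",
                "1145,40,true,UK",
                "2145,34,true,USA",
                "9025,21,false,USA",
                "1145,40,false,UK",
                "1145,40,true,UK");
        rank(lines).forEach(e -> System.out.println(e.getKey() + "=" + e.getValue()));
    }
}
```

Users with no true answers, such as 9025 in the sample data, are removed by the filter step; tied users keep HashMap order unless a tiebreak comparator is added.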

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	Basil Bourque
Solution 2	rmunge
Solution 3
