Java streams: collect an Excel CSV to a list, filtering based on the sum of a column

Suppose we have an Excel spreadsheet that looks like this:

StatusCount  FirstName  LastName  ID
1            Tod        Mahones   122145
0            Tod        Mahones   122145
1            Tod        Mahones   122145
-1           Tod        Mahones   122145
1            Ronny      Jackson   149333
1            Eliza      Cho       351995
-1           Eliza      Cho       351995
1            James      Potter    884214
1            James      Potter    884214
-1           Peter      Walker    900248
1            Zaid       Grits     993213

How can I gather into a list only the IDs of the people whose status counts sum to a value greater than 0, discarding any ID whose sum is 0 or less? So for the Excel spreadsheet above, the list in Java should look like:

List<Integer> = [122145, 149333, 884214, 993213]

Update (adding in what I tried so far):

List<Integer> ids = csvFile.stream()
                           .map(Arrays::asList)
                           .filter(column -> column.get(0).equalsIgnoreCase("1"))
                           .map(column -> column.get(3))
                           .map(Integer::parseInt)
                           .sorted()
                           .collect(Collectors.toList());

I collected them just by status counts of 1, but that isn't the right process: it should sum the status counts for each person or ID (which also takes care of duplicates), and collect the ID to the list only if the sum is > 0; otherwise discard it.

Update 2: I forgot to mention that the CSV file is brought into Java as a List<String[]>, where the List contains the rows of the CSV and each String[] holds the contents of one row, like so:

[[1, Tod, Mahones, 122145],[0, Tod, Mahones, 122145], [1, Tod, Mahones, 122145], ...]
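
For context, here is a minimal sketch of how such a List<String[]> might be produced; the file name data.csv, skipping a header row, and the absence of quoted fields are all assumptions made only for illustration:

// Hypothetical loader: read data.csv, skip the header row, split each line on commas.
// (Assumes no quoted fields containing commas; propagates IOException to the caller.)
List<String[]> csvFile = Files.readAllLines(Paths.get("data.csv")).stream()
        .skip(1)
        .map(line -> line.split(","))
        .collect(Collectors.toList());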



Solution 1:[1]

The following should work:

  1. Build an intermediate Map<Integer, Integer> that sums the StatusCount values per ID, using Collectors.groupingBy + Collectors.summingInt.
  2. Filter the entries of that map and collect the keys (the IDs) into the result list.

If the order of the IDs should be maintained as in the input file, LinkedHashMap::new can be passed as the map-factory argument when building the map.

public static List<Integer> getIDs(List<String[]> csvFile) {
    return csvFile.stream()
        .map(Arrays::asList) 
        .collect(Collectors.groupingBy(
            column -> Integer.parseInt(column.get(3)),
            LinkedHashMap::new, // optional argument to maintain insertion order
            Collectors.summingInt(
                column -> Integer.parseInt(column.get(0))
        )))
        .entrySet()
        .stream()
        .filter(e -> e.getValue() > 0)
        .map(Map.Entry::getKey)
        .collect(Collectors.toList());
}
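
The method relies only on the standard library; the imports it needs are:

import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;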

Test

List<String[]> csvFile = Arrays.asList(
    new String[] {"1", "Tod", "Mahones", "122145"},
    new String[] {"0", "Tod", "Mahones", "122145"},
    new String[] {"1", "Tod", "Mahones", "122145"},
    new String[] {"-1", "Tod", "Mahones", "122145"},
    new String[] {"1", "Ronny", "Jackson", "149333"},
    new String[] {"1", "Eliza", "Cho", "351995"},
    new String[] {"-1", "Eliza", "Cho", "351995"},
    new String[] {"1", "James", "Potter", "884214"},
    new String[] {"1", "James", "Potter", "884214"},
    new String[] {"-1", "Peter", "Walker", "900248"},
    new String[] {"1", "Zaid", " Grits", "993213"}
);

System.out.println(getIDs(csvFile));

Output

[122145, 149333, 884214, 993213]

Solution 2:[2]

The following code should work quite well:

  1. Group the rows by ID and sum the StatusCount values per ID
  2. From that Map, remove the entries with a non-positive sum
  3. Retrieve the keySet, which contains the remaining unique IDs

List<String[]> csvFile = new ArrayList<>();
csvFile.add(new String[]{"1", "Tod", "Mahones", "122145"});
csvFile.add(new String[]{"0", "Tod", "Mahones", "122145"});
csvFile.add(new String[]{"1", "Tod", "Mahones", "122145"});
csvFile.add(new String[]{"-1", "Tod", "Mahones", "122145"});
csvFile.add(new String[]{"1", "Ronny", "Jackson", "149333"});
csvFile.add(new String[]{"1", "Eliza", "Cho", "351995"});
csvFile.add(new String[]{"-1", "Eliza", "Cho", "351995"});
csvFile.add(new String[]{"1", "James", "Potter", "884214"});
csvFile.add(new String[]{"1", "James", "Potter", "884214"});
csvFile.add(new String[]{"-1", "Peter", "Walker", "900248"});
csvFile.add(new String[]{"1", "Zaid", "Grits", "993213"});

Map<String, Integer> intermediateMap = csvFile.stream()
        .map(Arrays::asList)
        .collect(Collectors.groupingBy(
                x -> x.get(3),
                Collectors.summingInt(
                        x -> Integer.parseInt(x.get(0))
                )
        ));

intermediateMap.entrySet().removeIf(e -> e.getValue() <= 0);

Set<String> ids = intermediateMap.keySet();

System.out.println(ids); // [122145, 149333, 884214, 993213]
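
If you need the result as the List<Integer> asked for in the question, the String keys can be converted afterwards (note that a HashMap does not guarantee file order; pass LinkedHashMap::new to groupingBy, as in Solution 1, if the order matters). A minimal follow-up:

// Convert the remaining String IDs to the List<Integer> the question asks for
List<Integer> idList = ids.stream()
        .map(Integer::valueOf)
        .collect(Collectors.toList());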

Solution 3:[3]

Since the CSV file is ordered by the ID column, you can group rows by ID, sum the StatusCount values in each group, and keep the IDs whose sums are greater than 0. Doing this in plain Java requires some familiarity with stream Collectors; otherwise the algorithm is hard to implement.

Alternatively, the process is easy to express in SPL, an open-source package for Java; one line of script is enough:

SPL script (cell A1):

A1  =file("data.csv").import@ct().groups@o(ID;sum(StatusCount)).select(#2>0).(#1)

The script (here named filter) is then invoked from Java through the esProc JDBC driver:

Class.forName("com.esproc.jdbc.InternalDriver");
Connection con = DriverManager.getConnection("jdbc:esproc:local://");
CallableStatement st = con.prepareCall("call filter()");
st.execute();
…
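
To read the IDs back on the Java side, the result can presumably be consumed like any other JDBC result; this is an assumption, namely that the esProc driver exposes the script's return value as a standard ResultSet:

ResultSet rs = st.getResultSet(); // assumption: script result is returned as a ResultSet
while (rs.next()) {
    System.out.println(rs.getInt(1)); // one ID per row (assumed single-column result)
}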

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

[1] Solution 1
[2] Solution 2
[3] Solution 3