'How to collapse multiple rows of cell values in excel using powerquery

How can one collapse multiple rows of data such that it matches with its unique id. The data set is dynamically loaded from web. It is sorted by date so that each new entries update preceding entries.

Note:
I neither want to delete certain null values, nor ignore them by pivoting, because some of this null values are data yet to be filled.

I have a sample file with scattered rows of data on sheet 1. The solution I want to be provided is on sheet 2. How can one achieve this? This is the most complicated task. I will be happy if this challenge is resolved.

This is the link: https://ibb.co/3kVkcBL

or somethiing similar

https://docs.google.com/spreadsheets/d/1DzOmjU0izheVfizGk7XZhPRCCb5VfPLO/edit?usp=drivesdk&ouid=100839365308659312055&rtpof=true&sd=true



Solution 1:[1]

Here is one method:

Read the code comments to better understand the algorithm

Edit: to fix problem of missing grades

let
    Source = Excel.CurrentWorkbook(){[Name="Table2"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{
        {"ID", Int64.Type}, {"NAME", type text}, {"CLASS", type text}, 
        {"CACHEMISTRY", Int64.Type}, {"EXAM CHEMISTRY", Int64.Type}, 
        {"TOTALCHEM", Int64.Type}, {"GRADECHEM", type text}, {"CABIOLOGY", Int64.Type}, 
        {"EXAMBIOLOGY", Int64.Type}, {"TOTALBIOLOGY", Int64.Type}, {"GRADEBIOLOGY", type text}, 
        {"Date", type datetimezone}}),

//There some "string" nulls. Will replace them with real nulls so we can "fill up"
    replaceNulls = Table.ReplaceValue(#"Changed Type","null",null,Replacer.ReplaceValue,Table.ColumnNames(#"Changed Type")),

//Group by Name and ID
// then fill up all the columns and return the last row only
    group = Table.Group(replaceNulls,{"ID","NAME"},{
        {"newTable", each Table.Last(Table.FillDown(Table.FillUp(_,Table.ColumnNames(_)),Table.ColumnNames(_)))}
    }),

//Expand the results
    columnsToExpand = List.RemoveFirstN(Table.ColumnNames(replaceNulls),2),
    #"Expanded newTable" = Table.ExpandRecordColumn(group, "newTable",
        columnsToExpand,columnsToExpand),

//Set Data Types
// Note that Date column is typed as text
//    This is so it will display near properly in Excel (it will be missing the "T"
//    You may need to modify this depending on whether this is satisfactory or not
    typeIt = Table.TransformColumnTypes(#"Expanded newTable",{
        {"CLASS", type text}, 
        {"CACHEMISTRY", Int64.Type}, {"EXAM CHEMISTRY", Int64.Type}, 
        {"TOTALCHEM", Int64.Type}, {"GRADECHEM", type text}, {"CABIOLOGY", Int64.Type}, 
        {"EXAMBIOLOGY", Int64.Type}, {"TOTALBIOLOGY", Int64.Type}, {"GRADEBIOLOGY", type text}, 
        {"Date", type text}})
    
in
    typeIt

Note:
There are some inconsistent differences. Some of them may be typos, others I'm not sure. These can be resolved, but you need to create consistent rules

  • There seem to be a mix of string "nulls" and actual null values. I converted them all to actual null values, but that could be easily changed.
  • Ben Victor EXAMBIOLOGY is 40 on Sheet1 but 60 on Sheet2
  • Ben Victor Datetime is 2022-03-16T15:14:00.000+01:00 and 2022-04-14T11:47:00.000+01:00 on sheet 1 but 2022-03-16T11:47:00.000+01:00 on sheet 2. Which should it be?

Original Table from Google Drive
enter image description here

Result after running above query
Note that the date is not formatted as you show, but that is a simple fix

enter image description here

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1