'How to collapse multiple rows of cell values in excel using powerquery
How can one collapse multiple rows of data such that it matches with its unique id. The data set is dynamically loaded from web. It is sorted by date so that each new entries update preceding entries.
Note:
I neither want to delete certain null values, nor ignore them by pivoting, because some of this null values are data yet to be filled.
I have a sample file with scattered rows of data on sheet 1. The solution I want to be provided is on sheet 2. How can one achieve this? This is the most complicated task. I will be happy if this challenge is resolved.
This is the link: https://ibb.co/3kVkcBL
or somethiing similar
Solution 1:[1]
Here is one method:
Read the code comments to better understand the algorithm
Edit: to fix problem of missing grades
let
Source = Excel.CurrentWorkbook(){[Name="Table2"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{
{"ID", Int64.Type}, {"NAME", type text}, {"CLASS", type text},
{"CACHEMISTRY", Int64.Type}, {"EXAM CHEMISTRY", Int64.Type},
{"TOTALCHEM", Int64.Type}, {"GRADECHEM", type text}, {"CABIOLOGY", Int64.Type},
{"EXAMBIOLOGY", Int64.Type}, {"TOTALBIOLOGY", Int64.Type}, {"GRADEBIOLOGY", type text},
{"Date", type datetimezone}}),
//There some "string" nulls. Will replace them with real nulls so we can "fill up"
replaceNulls = Table.ReplaceValue(#"Changed Type","null",null,Replacer.ReplaceValue,Table.ColumnNames(#"Changed Type")),
//Group by Name and ID
// then fill up all the columns and return the last row only
group = Table.Group(replaceNulls,{"ID","NAME"},{
{"newTable", each Table.Last(Table.FillDown(Table.FillUp(_,Table.ColumnNames(_)),Table.ColumnNames(_)))}
}),
//Expand the results
columnsToExpand = List.RemoveFirstN(Table.ColumnNames(replaceNulls),2),
#"Expanded newTable" = Table.ExpandRecordColumn(group, "newTable",
columnsToExpand,columnsToExpand),
//Set Data Types
// Note that Date column is typed as text
// This is so it will display near properly in Excel (it will be missing the "T"
// You may need to modify this depending on whether this is satisfactory or not
typeIt = Table.TransformColumnTypes(#"Expanded newTable",{
{"CLASS", type text},
{"CACHEMISTRY", Int64.Type}, {"EXAM CHEMISTRY", Int64.Type},
{"TOTALCHEM", Int64.Type}, {"GRADECHEM", type text}, {"CABIOLOGY", Int64.Type},
{"EXAMBIOLOGY", Int64.Type}, {"TOTALBIOLOGY", Int64.Type}, {"GRADEBIOLOGY", type text},
{"Date", type text}})
in
typeIt
Note:
There are some inconsistent differences. Some of them may be typos, others I'm not sure. These can be resolved, but you need to create consistent rules
- There seem to be a mix of string "nulls" and actual null values. I converted them all to actual null values, but that could be easily changed.
- Ben Victor EXAMBIOLOGY is 40 on Sheet1 but 60 on Sheet2
- Ben Victor Datetime is
2022-03-16T15:14:00.000+01:00
and2022-04-14T11:47:00.000+01:00
on sheet 1 but2022-03-16T11:47:00.000+01:00
on sheet 2. Which should it be?
Original Table from Google Drive
Result after running above query
Note that the date is not formatted as you show, but that is a simple fix
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 |