'Complex row manipulation based on column value in SQL or Power Query

I have a call dataset. Looks like this

enter image description here

If a call about a certain member comes in within 30 days of an "original call", that call is considered a callback. I need some logic or Power Query magic to handle this dataset using this logic. So the end result should look like this

enter image description here

Right now, I have the table left joined to itself which gives me every possible combination. I thought I could do something with that but it's proven difficult and when I have over 2 million unique case keys, the duplicates kill run time and overload memory. Any suggestions? I'd prefer to do the manipulation in Power Query editor but can do it in SQL. Plz and thank you.



Solution 1:[1]

I think you can do this in Power Query, but I have no idea how it will run with two million records.

It may be able to be sped up with judicious use of the Table.Buffer function. But give it a try as written first.

The code should be reasonably self-documenting

  • Group by Member ID
  • For each Member ID, create a table from a list of records which is created using the stated logic.
  • expand the tables
  • Mark the rows to be deleted by shifting up the Datediff column by one and applying appropriate logic to the Datediff and shifted columns.
  • Code assumes that the dates for each Member ID are in ascending order. If not, an extra sorting step would need to be added

Try this M code. (Change the Source line to be congruent with your own data source).

Edit:
Code edited to allow for multiple call backs from an initial call

let

//Change next line to be congruent with your actual data source
    Source = Excel.CurrentWorkbook(){[Name="Table3"]}[Content],
    
    #"Changed Type" = Table.TransformColumnTypes(Source,{
        {"Case key", type text}, {"Member ID", Int64.Type}, {"Call Date", type date}}),

//Group by Member ID
//  then create tables with call back date using the stated logic
       #"Grouped Rows" = Table.Group(#"Changed Type", {"Member ID"}, {
        {"Call Backs",(t)=>Table.FromRecords(
            List.Generate(
            ()=>[ck=t[Case key]{0}, cd=t[Call Date]{0}, cb = null, df=null, idx=0],
            each [idx] < Table.RowCount(t),
            each [ck=if Duration.Days(t[Call Date]{[idx]+1} - [cd]) < 30 
                        then [ck] else t[Case key]{[idx]+1},
                cd=if Duration.Days(t[Call Date]{[idx]+1} - [cd]) < 30 
                        then [cd] else t[Call Date]{[idx]+1},
                    cb = if Duration.Days(t[Call Date]{[idx]+1} - [cd]) < 30  
                            then t[Call Date]{[idx]+1} else null,
                    df = if Duration.Days(t[Call Date]{[idx]+1} - [cd]) < 30
                            then Duration.Days(t[Call Date]{[idx]+1} - [cd]) else null,
                    idx = [idx]+1],
            each Record.FromList({[ck],[cd],[cb],[df]},{"Case key","Call Date","Call Back Date", "Datediff"}))
        )}
        }),
    #"Expanded Call Backs" = Table.ExpandTableColumn(#"Grouped Rows", "Call Backs", 
        {"Case key", "Call Date", "Call Back Date", "Datediff"}, 
        {"Case key", "Call Date", "Call Back Date", "Datediff"}),

    #"Shifted Datediff" = Table.FromColumns(
        Table.ToColumns(#"Expanded Call Backs") & {
        List.RemoveFirstN(#"Expanded Call Backs"[Datediff]) & {null}},
            type table[Member ID=Int64.Type, Case key=text, Call Date=date, Call Back Date=date, Datediff=Int64.Type, shifted=Int64.Type ]),

    #"Filter" = Table.SelectRows(#"Shifted Datediff", each [shifted]=null or [Datediff]<>null),
    #"Removed Columns" = Table.RemoveColumns(Filter,{"shifted"})
in
    #"Removed Columns"

enter image description here

Example with multiple callbacks
enter image description here

Solution 2:[2]

Think you can do this with Lead function.

here is the fiddle https://dbfiddle.uk/?rdbms=oracle_11.2&fiddle=f7cabdbe4d1193e5f0da6bd6a4571b96

select 
  a.*, 
  LEAD(CallDate, 1) OVER (
    Partition by memberId 
    ORDER BY 
      CallDate
  ) AS "CallbackDate", 
  LEAD(CallDate, 1) OVER (
    Partition by memberId 
    ORDER BY 
      CallDate
  ) - a.calldate AS DateDiff 
from 
  mytable a

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 Bryan Dellinger