'Convert query to KQL
I have an application that contains an object. It has 2 guid values, guid1 and guid2. At any given time only 1 is populated. Firstly guid1 and at a later point in time guid2. State changes are recorded in a table below. I'd like to group the records so that all guids for an object are grouped together.
So this table ...
timestamp guid1 guid2 text
2022-05-06T10:00:31.5767324Z | cb73c58e-e36b-4fe3-8663-33027ba2afc7 | null | abc1
2022-05-06T10:00:31.5767324Z | ec5d1b9395444a06a36130a9d62ae2c5 | null | abc2
2022-05-06T10:01:31.5767324Z | cb73c58e-e36b-4fe3-8663-33027ba2afc7 | b7ef78cde158437fb65a6878ca908751| def1
2022-05-06T10:01:31.5767324Z | ec5d1b9395444a06a36130a9d62ae2c5 | 206eb977459c4f91bafb9b798f5d60c4| def2
2022-05-06T10:02:31.5767324Z | null | b7ef78cde158437fb65a6878ca908751| ghi1
2022-05-06T10:02:31.5767324Z | null | 206eb977459c4f91bafb9b798f5d60c4| ghi2
... becomes this set of query results
timestamp guid1 guid2 text
2022-05-06T10:00:31.5767324Z | cb73c58e-e36b-4fe3-8663-33027ba2afc7 | null | abc1
2022-05-06T10:01:31.5767324Z | cb73c58e-e36b-4fe3-8663-33027ba2afc7 | b7ef78cde158437fb65a6878ca908751| def1
2022-05-06T10:02:31.5767324Z | null | b7ef78cde158437fb65a6878ca908751| ghi1
2022-05-06T10:00:31.5767324Z | ec5d1b9395444a06a36130a9d62ae2c5 | null | abc2
2022-05-06T10:01:31.5767324Z | ec5d1b9395444a06a36130a9d62ae2c5 | 206eb977459c4f91bafb9b798f5d60c4| def2
2022-05-06T10:02:31.5767324Z | null | 206eb977459c4f91bafb9b798f5d60c4| ghi2
From the desired output above, just taking the first 3 records for example, they are now grouped to show the full state change history. First record shows guid1 had a value of cb73c58e-e36b-4fe3-8663-33027ba2afc7, then a record to show guid2 activation and then finally guid1 is null and only guid2 exists. The same correct grouping can be seen in the last 3 records for guid1 ec5d1b9395444a06a36130a9d62ae2c5
I was having difficulty attempting this in sql never mind kql, I asked the question on a seperate sql thread and the suggested solution was below which I was having difficulty converting to kql.
select *
from t
order by Row_Number() over(partition by [timestamp] order by [timestamp]),
guid1 desc, guid2;
row_number and partition by are supported but I've not seen reference to over so I'm unsure how to achieve this.
Perhaps there is a more kql friendly way of achieving this?
[COMMENTED SOLUTION]
let t = datatable(timestamp:datetime,guid1:string,guid2:string,text:string)
[
'2022-05-06T10:00:31.5767324Z' ,'cb73c58e-e36b-4fe3-8663-33027ba2afc7' ,'' ,'abc1'
,'2022-05-06T10:00:31.5767324Z' ,'ec5d1b9395444a06a36130a9d62ae2c5' ,'' ,'abc2'
,'2022-05-06T10:01:31.5767324Z' ,'cb73c58e-e36b-4fe3-8663-33027ba2afc7' ,'b7ef78cde158437fb65a6878ca908751' ,'def1'
,'2022-05-06T10:01:31.5767324Z' ,'ec5d1b9395444a06a36130a9d62ae2c5' ,'206eb977459c4f91bafb9b798f5d60c4' ,'def2'
,'2022-05-06T10:02:31.5767324Z' ,'' ,'b7ef78cde158437fb65a6878ca908751' ,'ghi1'
,'2022-05-06T10:02:31.5767324Z' ,'' ,'206eb977459c4f91bafb9b798f5d60c4' ,'ghi2'
,'2022-05-06T10:03:31.5767324Z' ,'fee3d3522a3942a69802774f8a5128ff' ,'' ,'xxx1'
,'2022-05-06T10:04:31.5767324Z' ,'48b04d074cd141dba6eb9a354d26be0a' ,'' ,'yyy1'
,'2022-05-06T10:04:31.5767324Z' ,'48b04d074cd141dba6eb9a354d26be0a' ,'0d2ac92589634b27a171be005375b1b5' ,'yyy2'
];
t
| where isnotempty(guid1)
// select records where guid1 is not empty
| summarize take_any(guid2) by guid1
// of that recordset, reduce to 2 columns of interest and select records where the accompanying guid2 is not empty. That's default take_any behaviour
| serialize
// serialize the recordset to enable the use of window functions later in query
| extend gid = row_number()
// mark this recordset with a parent guid row identifier
| mv-expand g = pack_array(guid1, guid2) to typeof(string)
// push recordset into an array
| where isnotempty(g)
// continue in execution if the array is populated
| project g, gid
// reduce recordset to array of related guid1/guid2 and associated parent guid row identifier
| join kind=inner (t | extend g = coalesce(guid1, guid2)) on g
// inner join on original recordset
| project-away g, g1
// exclude g, g1 columns from recordset
| partition hint.strategy=native by gid
(
order by gid asc, iff(isnotempty(guid1), 1, 2) asc, iff(isempty(guid2), 1, 2) asc
| extend rid = row_number()
)
// - partition the recordset by gid (to group related parent guid records),
// - order by gid with an order preference of non empty guid1Ids/empty guid2Ids over empty guid1Ids/non empty guid2Ids
// - mark each record with a row id
| order by gid asc, rid asc
// order recordset
| project-reorder gid, rid
// reorder gid column to appear before rid
Solution 1:[1]
The assumption is that if we have a record with a guid2 and an empty guid1, we also have a record with that guid2 and a non-empty guid1.
Part 1
Create a dictionary of guid
We take all records that have guid1.
From those records, we keep only 1 record per guid1, preferably with guid2.
We number those records arbitrarily. These numbers will later represent groups of records (gid = group id).
We now duplicate all records that have both guid1 & guid2, for one record we keep guid1 and for the other - guid2. Note that those records will have the same record number.
Part 2
Match each original table record with a guid from the dictionary
If a record has guid1 (and maybe also guid2) we connect it to guid1 from the dictionary.
If a record has no guid1, we connect it to guid2.
At this point, each record as a gid. records that their guid1 & guid2 are connected have the same gid.
Part 3
Optional. Number the records within each group, first records with guid1, followed by records with both guid, and ending with records with only guid2
let t = datatable(timestamp:datetime,guid1:string,guid2:string,text:string)
[
'2022-05-06T10:00:31.5767324Z' ,'cb73c58e-e36b-4fe3-8663-33027ba2afc7' ,'' ,'abc1'
,'2022-05-06T10:00:31.5767324Z' ,'ec5d1b9395444a06a36130a9d62ae2c5' ,'' ,'abc2'
,'2022-05-06T10:01:31.5767324Z' ,'cb73c58e-e36b-4fe3-8663-33027ba2afc7' ,'b7ef78cde158437fb65a6878ca908751' ,'def1'
,'2022-05-06T10:01:31.5767324Z' ,'ec5d1b9395444a06a36130a9d62ae2c5' ,'206eb977459c4f91bafb9b798f5d60c4' ,'def2'
,'2022-05-06T10:02:31.5767324Z' ,'' ,'b7ef78cde158437fb65a6878ca908751' ,'ghi1'
,'2022-05-06T10:02:31.5767324Z' ,'' ,'206eb977459c4f91bafb9b798f5d60c4' ,'ghi2'
,'2022-05-06T10:03:31.5767324Z' ,'fee3d3522a3942a69802774f8a5128ff' ,'' ,'xxx1'
,'2022-05-06T10:04:31.5767324Z' ,'48b04d074cd141dba6eb9a354d26be0a' ,'' ,'yyy1'
,'2022-05-06T10:04:31.5767324Z' ,'48b04d074cd141dba6eb9a354d26be0a' ,'0d2ac92589634b27a171be005375b1b5' ,'yyy2'
];
t
| where isnotempty(guid1)
| summarize take_any(guid2) by guid1
| serialize
| extend gid = row_number()
| mv-expand g = pack_array(guid1, guid2) to typeof(string)
| where isnotempty(g)
| project g, gid
| join kind=inner (t | extend g = coalesce(guid1, guid2)) on g
| project-away g, g1
| partition hint.strategy=native by gid
(
order by gid asc, iff(isnotempty(guid1), 1, 2) asc, iff(isempty(guid2), 1, 2) asc
| extend rid = row_number()
)
| order by gid asc, rid asc
| project-reorder gid, rid
gid | rid | timestamp | guid1 | guid2 | text |
---|---|---|---|---|---|
1 | 1 | 2022-05-06T10:00:31.5767324Z | cb73c58e-e36b-4fe3-8663-33027ba2afc7 | abc1 | |
1 | 2 | 2022-05-06T10:01:31.5767324Z | cb73c58e-e36b-4fe3-8663-33027ba2afc7 | b7ef78cde158437fb65a6878ca908751 | def1 |
1 | 3 | 2022-05-06T10:02:31.5767324Z | b7ef78cde158437fb65a6878ca908751 | ghi1 | |
2 | 1 | 2022-05-06T10:00:31.5767324Z | ec5d1b9395444a06a36130a9d62ae2c5 | abc2 | |
2 | 2 | 2022-05-06T10:01:31.5767324Z | ec5d1b9395444a06a36130a9d62ae2c5 | 206eb977459c4f91bafb9b798f5d60c4 | def2 |
2 | 3 | 2022-05-06T10:02:31.5767324Z | 206eb977459c4f91bafb9b798f5d60c4 | ghi2 | |
3 | 1 | 2022-05-06T10:03:31.5767324Z | fee3d3522a3942a69802774f8a5128ff | xxx1 | |
4 | 1 | 2022-05-06T10:04:31.5767324Z | 48b04d074cd141dba6eb9a354d26be0a | yyy1 | |
4 | 2 | 2022-05-06T10:04:31.5767324Z | 48b04d074cd141dba6eb9a354d26be0a | 0d2ac92589634b27a171be005375b1b5 | yyy2 |
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 |