'SQL OLAP vs KQL || row_number() over partition by in SQL vs KQL
I have OLAP Query in SQL and requirement is same query result need to fetch in ADX(Azure Data explorer) using KQL.
I'm new to KQL and trying to convert same SQL into KQL but not able to achieve row_number() over partition by part in KQL.
Tried below article from MS but no luck using EXPLAIN key word.
https://docs.microsoft.com/en-us/azure/data-explorer/kusto/query/sqlcheatsheet
Below is SQL, Need same in KQL. Can you please help me on this?
Select logdate, adlspathL, count as totalRequests ,repeated as repeatedInLast7Days
from (select logdate, adlspathL, count ,SUM(rnk1) over(partition by adlspathL) repeated, rnk2
from(Select logdate, adlspathL, count, rnk,
rank() over(partition by adlspathL order by adlspathL) rnk1,
row_number() over(partition by adlspathL order by count desc) rnk2
from (Select logdate, adlspathL, count,
rank() over(partition by logdate order by count desc) rnk from ( Select logdate, adlspath as adlspathL, SUM(cnt) count
from `LogAnalyticsADLS`.`LogAnalyticsProdADLSAgg_OpsName`
where logdate between '2021-08-23' and '2021-08-29'
group by logdate, adlspathL
)a
)b
where rnk between 1 and 10
order by logdate , rnk
)a
)b
where rnk2 =1
Thanks, Brahma
Solution 1:[1]
Here is a KQL solution.
Currently there isn't a built-in function for rank, only for dense rank (row_rank), however a rank functionallity could be achieved by the scan operator.
Retrieval of max value + additional values from the same row, could be easily achieved using the arg_max function.
// Generation of a data sample. Not a prat of the solution
let loganalyticsprodadlsagg_opsname = range i from 1 to 1000 step 1 | project logdate = bin(datetime(2021-09-01) - 10d*rand(), 1d), adlspath = strcat("adlspath_", tostring(toint(rand(15)))), cnt = toint(rand(100));
// Solution starts here.
loganalyticsprodadlsagg_opsname
| where logdate between (datetime('2021-08-23') .. datetime('2021-08-29'))
| summarize cnt = sum(cnt) by logdate, adlspath
| partition by logdate
(
order by cnt desc
| scan declare(rank:int = 1, rn:int = 0)
with
(
step s : True => rn = s.rn + 1, rank = iff(cnt == s.cnt, s.rank, s.rn+1);
)
| where rank <= 10
)
| summarize totalrequests = arg_max(cnt, logdate), repeatedInLast7Days = count() by adlspath
adlspath | totalrequests | logdate | repeatedInLast7Days |
---|---|---|---|
adlspath_7 | 644 | 2021-08-23T00:00:00Z | 4 |
adlspath_8 | 472 | 2021-08-23T00:00:00Z | 6 |
adlspath_1 | 760 | 2021-08-24T00:00:00Z | 6 |
adlspath_12 | 400 | 2021-08-24T00:00:00Z | 5 |
adlspath_5 | 599 | 2021-08-24T00:00:00Z | 3 |
adlspath_11 | 410 | 2021-08-25T00:00:00Z | 4 |
adlspath_14 | 469 | 2021-08-25T00:00:00Z | 5 |
adlspath_13 | 565 | 2021-08-25T00:00:00Z | 6 |
adlspath_9 | 472 | 2021-08-25T00:00:00Z | 2 |
adlspath_2 | 514 | 2021-08-26T00:00:00Z | 6 |
adlspath_10 | 391 | 2021-08-26T00:00:00Z | 4 |
adlspath_3 | 414 | 2021-08-27T00:00:00Z | 4 |
adlspath_6 | 594 | 2021-08-28T00:00:00Z | 3 |
adlspath_0 | 499 | 2021-08-28T00:00:00Z | 6 |
adlspath_4 | 552 | 2021-08-29T00:00:00Z | 6 |
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 |