'What are Azure Data Explorer external table partitions good for?

Adding pertition to the external table definition does not help with a query on the partition.

Blob path example

  • /data/1234/2021/12/02/9483D.parquet
  • /data/1235/2021/12/02/12345.parquet

Partition (pseudo syntax not the real one) : '/data/'uniqueid'/yyyy/MM/dd/'

So only two uniqueids values are in the storage path. Total files count ~ 1 million for different dates in the path

So I defined 2 partitions as virtual columns:

  1. uniqueid
  2. datetime

Executing a query on the uniqueid like: table | summarize by uniqueid goes over all files in the blob storage for some reason.

As the uniqueid is a partition and as virtual column, shouldn't the query be super fast as we have only 2 values in the path for it? Am I totally missing the point of partitioning?

EDIT add smaple:

.create external table ['sensordata'] (['timestamp']:long,['value']:real)
    kind = adl
partition by (['uniqueid']:string ,['datecreated']:datetime )
pathformat = (['uniqueid']  '/' datetime_pattern("yyyy/MM/dd", ['daterecorded']))
    dataformat = parquet
    (
        h@'abfss://[email protected]/histdata;impersonate'
    )
    with (FileExtension='.parquet')

Query sample:

sensordata
| summarize by uniqueid


Solution 1:[1]

Thanks for your input, @user998888.

We have many optimizations for partitioned external tables, and we invest significant effort in adding more and more optimizations. But we still haven't optimized the type of query like the one you provided. It's on our list.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Slavik N