'Azure Data Explorer (ADX) vs Polybase vs Databricks

Question

Today I discovered another Azure service called Azure Data Explorer (ADX). Sorry for such comparison of services, I have good understanding of all except ADX. I feel like there is a big functionality overlay, so want to know the exact role of ADX in Azure infrastructure.

What is the use case when ADX is significantly better than Synapse/Databricks?

My understanding of ADX

AFAIK, ADX is a cluster (with per hour billing, like Databricks or Synapse, not like ADLA) that is handling database for you and is optimized for streaming ingestion and ad-hoc queries at scale. It also supports external tables, that has worse performance but cheaper (you pay for Blob/ADLS storage).

Details

I don't understand why do we need ADX if:

  1. Azure Synapse has similar pricing model (cluster, per-hour), also it supports streaming ingestion and ad-hoc querying at scale. Azure Synapse support querying BlobStorage/ADLS through Polybase external tables.
  2. Databricks is another service that is capable of doing it. Using Databricks Ingest and Delta Lake - you can ingest streaming data and consume them in both: streaming and batching way. Actually you can have interactive cluster that will handle ad-hoc queries for you.
  3. Also if you want a real-time analytics - use Azure Stream Analytics. If you want Athena-like experience - use ADLA (still it doesn't support ADLS gen2).


Solution 1:[1]

Azure Data Explorer is focused on high velocity, high volume high variance (the 3 Vs of big data). It provides super fast interactive queries over such data that is streaming in. It supports json and text natively, including full text search and indexing.

It is used in a broad set of scenarios associated with sensing activity and time series in a large set of verticals: IoT, API logs, transaction monitoring and ad hoc data exploration.

Microsoft is offering ADX as a service as it is the major service that Microsoft is using for its own telemetry and all the analytical solutions as a service that we offer in Security, operational monitoring, game analytics, product insights usage analytics, Iot, Connected vehicles is built on ADX. You can find a full list in our docs. For clarity, SQL, Synapse, CosmosDB is storing its telemetry in Azure Data explorer...

SQL DW (AKA Synapse SQL pool) is an excellent data warehouse and implements the modern data warehouse pattern. ETL->Curated data model-> Load and serve via analysis services or power BI.
ADX is for real time analytics, enabling applying schema on read (SOR) on data as fresh as seconds old.

Consider ADX as a fully managed platform when replacing SOLR/Lucine based variants used for logs, time series databases and more.

Try it out in large workloads and you will see it is dramatically cheaper than the alternatives and much more powerful and performant.

Reach out to me if you need help.

Solution 2:[2]

Azure Data Explorer alias Kusto is focused on high volume data ingestion and almost real-time query and analytics. It is invented at Microsoft for log and telemetry analytics, but can be used for other purposes e.g. Iot, sensor data or web analytics. Same technology is used in Azure internal services like Azure Monitor and Log Analytics.

Similar capabilities could be build on Synapse or Databricks or HDInsight, but I see these as tools that fit much more broad use-cases. ADX has quite narrow focus. ADX does support queries (”KQL”) but has very limited SQL support. It is good for append only data, not for updates. It is not a data warehouse, database or data lake.

Microsoft material refers to the technology behind ADX with name Kusto. More info on this at https://docs.microsoft.com/en-us/azure/data-explorer/kusto/concepts/. A good comparison of services can be found in this blog post: https://vincentlauzon.com/2020/02/19/azure-data-explorer-kusto

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Uri Barash
Solution 2 Veikko