'DISTINCT with HASH MATCH (Flow Distinct) in SQL Server
Recently, while working in SQL Server, I got an interesting thing that removing DISTINCT
keyword actually decreased my query performance and increased my search time.
I read that DISTINCT
can make queries slow so I removed it to make them faster, but that made my query even slower.
Further experimenting, I got that when I add DISTINCT
, SQL Server actually do something HASH MATCH (Flow Distinct) and this reduces the time, and even parallelism is added with HASH MATCH.
My query looks like this:
SELECT DISTINCT TOP 5000
[A].[Row Id], [A].[Account], ... other columns
FROM
[Archival System].[dbo].[Activity] A
WHERE
([A].[Row Id] LIKE N'%{search_term}%'
OR [A].[Account] LIKE '%{search_term}%'
OR ... others conditions)
When I remove DISTINCT
from this query, it becomes slower.
Here I have added TOP without ORDER BY as I a search string is not going to repeat again and if it occurs it would be rare. Hence I am avoiding ORDER BY to get a little more performance.
Below are my query execution plans with the query code of both my queries (previous and later) and the only change in query is just the DISTINCT
keyword
Previous : https://www.brentozar.com/pastetheplan/?id=HyV7FKiU9
Later: https://www.brentozar.com/pastetheplan/?id=S1bFV9jUq
Can anybody tell me what this HASH MATCH (Flow Distinct) does, and why it just works when I add distinct
?
And is it reliable to depend on that? As I am using DISTINCT
now does it will continue to return results with the same time as it is returning now.
OR is there a better way to improve my query search time.
I am using SQL Server 2012 Enterprise edition.
Thanks in advance.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|