'How can I select rows where keeping only those that meet this criteria? sql/hive
I have a table like the following:
+-------+------+
|ID |lang |
+-------+------+
|1 |eng |
|1 |pol |
|2 |eng |
|3 |gro |
|3 |eng |
+-------+------+
I'd like to keep only those rows where IF an ID is repeated i keep the non 'eng' row, so e.g. i would like:
+-------+------+
|ID |lang |
+-------+------+
|1 |pol |
|2 |eng |
|3 |gro |
+-------+------+
is there a quick neat way i can achieve this?
Unsure how to go about this in a nice way to achieve result above! I am using hive
Solution 1:[1]
If you need single line per id, then use row_number(), partition by id, order by case statement in which you can have some custom ordering logic.
For example row_number below will mark any first not eng (randomly) row (per id) with rn=1, any other rows for the same id will be marked >1: 2, 3, 4... And you can filter only that single row. If you want to pick some lang preferably, add more cases to the case expression to order depending on lang, or you can add additional column or expression to the order by.
select id, lang
from ( select id, lang,
row_number() over(partition by id
order by case when lang != 'eng' then 1
else 2
end
) rn
from mytable
) s
where rn=1
If you need to keep all rows for the same id which are not 'eng', use dense_rank()
or rank()
instead of row_number()
with the same over() as above, it will assign 1 to all rows with lang!='eng' per id.
Solution 2:[2]
WITH cte_temp (Id, Lang, Rank) AS
(
SELECT
Id, Lang,
DENSE_RANK() OVER (PARTITION BY Id, LANG ORDER BY LANG DESC) AS Rank
FROM
YourTable
)
SELECT *
FROM cte_temp
WHERE rank = 1
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | marc_s |
Solution 2 | marc_s |