'How can I select rows where keeping only those that meet this criteria? sql/hive

I have a table like the following:

+-------+------+
|ID     |lang  |
+-------+------+
|1      |eng   |  
|1      |pol   | 
|2      |eng   |  
|3      |gro   |
|3      |eng   |
+-------+------+

I'd like to keep only those rows where IF an ID is repeated i keep the non 'eng' row, so e.g. i would like:

+-------+------+
|ID     |lang  |
+-------+------+
|1      |pol   | 
|2      |eng   |  
|3      |gro   |
+-------+------+

is there a quick neat way i can achieve this?

Unsure how to go about this in a nice way to achieve result above! I am using hive



Solution 1:[1]

If you need single line per id, then use row_number(), partition by id, order by case statement in which you can have some custom ordering logic.

For example row_number below will mark any first not eng (randomly) row (per id) with rn=1, any other rows for the same id will be marked >1: 2, 3, 4... And you can filter only that single row. If you want to pick some lang preferably, add more cases to the case expression to order depending on lang, or you can add additional column or expression to the order by.

select id, lang
  from ( select id, lang,
                row_number() over(partition by id 
                                  order by case when lang != 'eng' then 1 
                                                else 2 
                                            end
                                 ) rn
        from mytable
       ) s
  where rn=1

If you need to keep all rows for the same id which are not 'eng', use dense_rank() or rank() instead of row_number() with the same over() as above, it will assign 1 to all rows with lang!='eng' per id.

Solution 2:[2]

WITH cte_temp (Id, Lang, Rank) AS
(    
    SELECT 
        Id, Lang,
        DENSE_RANK() OVER (PARTITION BY Id, LANG ORDER BY LANG DESC) AS Rank 
    FROM 
        YourTable
)
SELECT * 
FROM cte_temp 
WHERE rank = 1

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 marc_s
Solution 2 marc_s