Remove duplicate entries from string array column in Postgres

I have a PostgreSQL table with a column that holds an array of strings. Some rows contain only unique strings, while others also contain duplicates. I want to remove the duplicate strings from each row where they exist.

I have tried some queries but couldn't make it work.

Here is the table:

  veh_id |             vehicle_types              
 --------+----------------------------------------
      1  | {"byd_tang","volt","viper","laferrari"} 
      2  | {"volt","viper"}                        
      3  | {"byd_tang","sonata","jaguarxf"}        
      4  | {"swift","teslax","mirai"}              
      5  | {"volt","viper"}                        
      6  | {"viper","ferrariff","bmwi8","viper"}   
      7  | {"ferrariff","viper","viper","volt"}    

I expect the following output:

  veh_id |             vehicle_types              
 --------+----------------------------------------
      1  | {"byd_tang","volt","viper","laferrari"} 
      2  | {"volt","viper"}                        
      3  | {"byd_tang","sonata","jaguarxf"}        
      4  | {"swift","teslax","mirai"}              
      5  | {"volt","viper"}                        
      6  | {"viper","ferrariff","bmwi8"}           
      7  | {"ferrariff","viper","volt"}            


Solution 1:[1]

Since each row's array is independent, a plain correlated subquery with an ARRAY constructor would do the job:

SELECT *, ARRAY(SELECT DISTINCT unnest (vehicle_types)) AS vehicle_types_uni
FROM   vehicle;

Note that a NULL array is converted to an empty array ('{}'). We'd need to special-case that, but NULL is excluded in the UPDATE below anyway.

Fast and simple, but don't use this. You didn't say so, but typically you'd want to preserve the original order of array elements, and your rudimentary sample suggests as much. SELECT DISTINCT makes no such promise. Use WITH ORDINALITY in the correlated subquery instead, which makes it a bit more sophisticated:

SELECT *, ARRAY (SELECT v
                 FROM   unnest(vehicle_types) WITH ORDINALITY t(v,ord)
                 GROUP  BY 1
                 ORDER  BY min(ord)
                ) AS vehicle_types_uni
FROM   vehicle;
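
To make the grouping trick concrete, here is a small Python sketch of what the correlated subquery computes for a single array. It is only an illustration of the logic, not something Postgres runs; the function name `dedupe_keep_order` is hypothetical. It numbers the elements like WITH ORDINALITY, keeps the smallest position per value like `min(ord)`, and orders the values by that position.

```python
def dedupe_keep_order(arr):
    """Mimic ARRAY(SELECT v FROM unnest(arr) WITH ORDINALITY t(v, ord)
    GROUP BY 1 ORDER BY min(ord)) for a Python list."""
    first_pos = {}
    # enumerate(..., start=1) plays the role of WITH ORDINALITY (ord is 1-based)
    for ord_, v in enumerate(arr, start=1):
        # GROUP BY v, keeping min(ord) per group
        first_pos[v] = min(first_pos.get(v, ord_), ord_)
    # ORDER BY min(ord): values sorted by the position of their first occurrence
    return sorted(first_pos, key=first_pos.get)


print(dedupe_keep_order(["viper", "ferrariff", "bmwi8", "viper"]))
# ['viper', 'ferrariff', 'bmwi8']
print(dedupe_keep_order(["ferrariff", "viper", "viper", "volt"]))
# ['ferrariff', 'viper', 'volt']
```

Both results match rows 6 and 7 of the expected output in the question.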

UPDATE to actually remove dupes:

UPDATE vehicle
SET    vehicle_types = ARRAY (
                 SELECT v
                 FROM   unnest(vehicle_types) WITH ORDINALITY t(v,ord)
                 GROUP  BY 1
                 ORDER  BY min(ord)
                )
WHERE  cardinality(vehicle_types) > 1  -- optional
AND    vehicle_types <> ARRAY (
                 SELECT v
                 FROM   unnest(vehicle_types) WITH ORDINALITY t(v,ord)
                 GROUP  BY 1
                 ORDER  BY min(ord)
                ); -- suppress empty updates (optional)

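Both added WHERE conditions are optional and only serve to improve performance. The first one is logically redundant, because the second already excludes arrays with fewer than two elements (those can't contain dupes). Each condition also excludes the NULL case. The second one suppresses all empty updates, i.e. updates that would write back an unchanged array.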
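
A Python sketch of how the two optional conditions filter rows, as an illustration only. `None` stands in for a SQL NULL array, and rows 8 to 10 are hypothetical additions (single-element, empty and NULL arrays) that the question's sample doesn't contain; the helper names are also hypothetical.

```python
def dedupe_keep_order(arr):
    # first occurrence wins, original order kept (dicts preserve insertion order)
    return list(dict.fromkeys(arr))


def needs_update(arr):
    """Mirror of the two optional WHERE conditions; None plays SQL NULL."""
    if arr is None:
        # cardinality(NULL) > 1 and NULL <> ... both yield NULL -> row skipped
        return False
    cond1 = len(arr) > 1                   # cardinality(vehicle_types) > 1
    cond2 = arr != dedupe_keep_order(arr)  # suppress empty updates
    return cond1 and cond2


rows = {
    2: ["volt", "viper"],
    6: ["viper", "ferrariff", "bmwi8", "viper"],
    7: ["ferrariff", "viper", "viper", "volt"],
    8: ["volt"],  # hypothetical single-element array
    9: [],        # hypothetical empty array
    10: None,     # hypothetical NULL array
}
print([veh_id for veh_id, arr in rows.items() if needs_update(arr)])  # [6, 7]
```

Dropping `cond1` from `needs_update` would not change the result, which is the sense in which the first condition is redundant.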

If you tried to do that without preserving the original order, you'd likely update most rows needlessly, just because the order of elements changed even where there were no dupes.
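
A quick Python illustration of that point, using the sample rows from the question. As an assumption, `sorted(set(...))` stands in for a de-duplication that ignores the original order (SELECT DISTINCT may hash or sort, so the real order is unspecified); the function names are hypothetical.

```python
vehicles = {
    1: ["byd_tang", "volt", "viper", "laferrari"],
    2: ["volt", "viper"],
    3: ["byd_tang", "sonata", "jaguarxf"],
    4: ["swift", "teslax", "mirai"],
    5: ["volt", "viper"],
    6: ["viper", "ferrariff", "bmwi8", "viper"],
    7: ["ferrariff", "viper", "viper", "volt"],
}


def dedupe_keep_order(arr):
    # order-preserving: first occurrence wins
    return list(dict.fromkeys(arr))


def dedupe_sorted(arr):
    # stand-in for an order-losing dedupe; sorting is just one possible outcome
    return sorted(set(arr))


changed_keep = [i for i, a in vehicles.items() if a != dedupe_keep_order(a)]
changed_sorted = [i for i, a in vehicles.items() if a != dedupe_sorted(a)]
print(changed_keep)    # [6, 7]
print(changed_sorted)  # [1, 2, 3, 4, 5, 6, 7]
```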

Requires Postgres 9.4 or later.

Solution 2:[2]

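I don't claim it's efficient, but something like this might work. Note that it does not preserve the original element order, and rows whose array is NULL or empty drop out of the result, because unnest() returns no rows for them: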

with expanded as (
  select veh_id, unnest (vehicle_types) as vehicle_type
  from vehicles
)
select veh_id, array_agg (distinct vehicle_type)
from expanded
group by veh_id
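
A Python sketch of the explode-and-regroup idea, for illustration only. The function name and rows 8 and 9 are hypothetical. Sorting the distinct values is an assumption used to model `array_agg(distinct ...)`, whose output order is not guaranteed without an ORDER BY inside the aggregate.

```python
from collections import defaultdict


def explode_and_regroup(rows):
    """Mimic: WITH expanded AS (SELECT veh_id, unnest(vehicle_types) ...)
    SELECT veh_id, array_agg(DISTINCT vehicle_type) ... GROUP BY veh_id."""
    groups = defaultdict(set)
    for veh_id, types in rows:
        # unnest(NULL) and unnest('{}') produce no rows, so such a veh_id
        # never reaches the GROUP BY
        for t in types or []:
            groups[veh_id].add(t)
    # assumption: model array_agg(DISTINCT ...) output as sorted
    return {veh_id: sorted(vals) for veh_id, vals in groups.items()}


rows = [
    (6, ["viper", "ferrariff", "bmwi8", "viper"]),
    (7, ["ferrariff", "viper", "viper", "volt"]),
    (8, []),    # hypothetical row with an empty array
    (9, None),  # hypothetical row with a NULL array
]
result = explode_and_regroup(rows)
print(result)
# {6: ['bmwi8', 'ferrariff', 'viper'], 7: ['ferrariff', 'viper', 'volt']}
```

Row 6 comes back de-duplicated but no longer in its original order, and rows 8 and 9 are missing entirely.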

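If you really want to get fancy and preserve the original element order, you can write a custom function. Be aware that it is not O(n): it checks each element against the array built so far, so the worst case (all elements distinct) is O(n²):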

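create or replace function unique_array(input_array text[])
returns text[] as $$
DECLARE
  output_array text[] := array[]::text[];
  elem text;
BEGIN
  -- FOREACH handles empty arrays and arrays with any lower bound
  foreach elem in array input_array loop
    -- array_position() compares with IS NOT DISTINCT FROM, so NULL
    -- elements are de-duplicated too (requires Postgres 9.5 or later)
    if array_position(output_array, elem) is null then
      output_array := output_array || elem;
    end if;
  end loop;

  return output_array;
END;
$$
language plpgsql immutable strict;  -- strict: a NULL input returns NULL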

Usage example:

select veh_id, unique_array(vehicle_types)
from vehicles
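
To make the cost of that loop concrete, here is a Python sketch (function names are hypothetical) that counts comparisons for the same "scan what has been kept so far" strategy the function uses, next to a hash-set variant. It is an illustration of the complexity only, not a drop-in replacement for the PL/pgSQL function.

```python
def unique_scan(arr):
    """Like the PL/pgSQL loop: scan the output so far for every input element."""
    out, comparisons = [], 0
    for x in arr:
        found = False
        for y in out:  # linear scan of the kept elements
            comparisons += 1
            if y == x:
                found = True
                break
        if not found:
            out.append(x)
    return out, comparisons


def unique_hashed(arr):
    """Expected O(n): a hash set answers 'seen before?' in O(1) on average."""
    seen, out = set(), []
    for x in arr:
        if x not in seen:
            seen.add(x)
            out.append(x)
    return out


data = [f"model_{i}" for i in range(1000)]  # all distinct: the worst case
out, comparisons = unique_scan(data)
print(comparisons)                 # 499500 == 1000 * 999 / 2
print(out == unique_hashed(data))  # True
```

With n distinct elements the scan performs n(n-1)/2 comparisons, which is where the quadratic worst case comes from.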

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2