'DataFrame challenge: mapping ID to value in different row. Preferably with Polars

Consider this example:

import polars as pl

df = pl.DataFrame({
    'ID': ['0', '1', '2', '3', '4', '5','6', '7', '8', '9', '10'],
    'Name' : ['A','','','','B','','C','','','D', ''], 
    'Element' : ['', '4', '4', '0', '', '4', '', '0', '9', '', '6']
})

The 'Name' is linked to an 'ID'. This ID is used as a value in the 'Element' column. How do I map the correct 'Name' to the elements? Also I want to group the elements by 'Name' ('Name_list'), count them and sort by counted values ('E_count').

The resulting df would be:

Name_list Element E_count
-------------------------
'B'       '4'     3
'A'       '0'     2
'C'       '6'     1
'D'       '9'     1

Feedback very much appreciated; even a Pandas solution.



Solution 1:[1]

Here's a Polars solution. We'll use a join to link the ID and Element columns (after some filtering and summarizing).

import polars as pl
(
    df.select(["Name", "ID"])
    .filter(pl.col("Name") != "")
    .join(
        df.groupby("Element").agg(pl.count().alias("E_count")),
        left_on="ID",
        right_on="Element",
        how="left",
    )
    .sort('E_count', reverse=True)
    .rename({"Name":"Name_list", "ID":"Element"})
)

Note: this differs from the solution listed in your answer. The Name D is associated with ID 9 (not 10).

shape: (4, 3)
?????????????????????????????????
? Name_list ? Element ? E_count ?
? ---       ? ---     ? ---     ?
? str       ? str     ? u32     ?
?????????????????????????????????
? B         ? 4       ? 3       ?
?????????????????????????????????
? A         ? 0       ? 2       ?
?????????????????????????????????
? C         ? 6       ? 1       ?
?????????????????????????????????
? D         ? 9       ? 1       ?
?????????????????????????????????

You can also use the polars.Series.value_counts method, which looks somewhat cleaner:

import polars as pl
(
    df.select(["Name", "ID"])
    .filter(pl.col("Name") != "")
    .join(
        df.get_column("Element").value_counts(),
        left_on="ID",
        right_on="Element",
        how="left",
    )
    .sort("counts", reverse=True)
    .rename({"Name": "Name_list", "ID": "Element", "counts": "E_count"})
)

Solution 2:[2]

If I understood your problem correctly, then you could use pandas and do the following:

countdf = pd.merge(df,df[['ID','Name']],left_on='Element',right_on='ID',how='inner')
countdf = pd.DataFrame(countdf.groupby('Name_y')['Element'].count())

result = pd.merge(countdf,df[['Name','ID']],left_on='Name_y',right_on='Name',how='left')
result[['Name','ID','Element']]

Solution 3:[3]

using pandas We can use map to map the values and using the where condition to keep from making name as null. lastly, its a groupby

df['Name'] = df['Name'].where(cond=df['Element']=="", 
                       other=df[df['Element']!=""]['Element'].map(lambda x: df[df['ID'] == x]['Name'].tolist()[0]), 
                       axis=0)
df[df['Element'] != ""].groupby(['Name','Element']).count().reset_index()
    Name    Element     ID
0   A   0   2
1   B   4   3
2   C   6   1
3   D   9   1

Solution 4:[4]

Try this, you don't need groupby nor joins, just map and value_counts:

df.drop('Element', axis=1)\
  .query('Name != "" ')\
  .assign(E_count = df['ID'].map(df['Element'].value_counts()))

Output:

  ID Name  E_count
0  0    A      2.0
4  4    B      3.0
6  6    C      1.0
9  9    D      1.0

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 Pedro de Sá
Solution 3 Naveed
Solution 4 Scott Boston