'How can I create a cross-tab of two columns in a dataframe in Python and generate a total row and column in the output?

I have created a dataframe from a CSV file and now I'm trying to create a cross-tab of two columns ("Personal_Status" and "Gender"). The output should look like this: Crosstab of Gender and Personal Status including the frequencies of each gender/personal status combination and the totals of each generated row and column.

I tried creditData[["Personal_Status", "Gender"]].value_counts() but it's not quite where I want it. The output includes a column of each "Personal_Status" value, a "Gender" column, and the frequency of each combination, i.e row 1 = "Single, M, 232"

Any insight is greatly appreciated.



Solution 1:[1]

Something like this?


import pandas as pd


df = pd.DataFrame({'Name':['Kathy', 'Linda', 'Peter'],
                   'Gender': ['F','F','M'],
                   'Personal_Status':['Divorced','Married','Married']})

df2 = pd.crosstab(df.Personal_Status, df.Gender)

df2.loc['Grand Total']= df2.sum(numeric_only=True, axis=0)
df2.loc[:,'Grand Total'] = df2.sum(numeric_only=True, axis=1)

print(df2)

Output

Gender           F  M  Grand Total
Personal_Status                   
Divorced         1  0            1
Married          1  1            2
Grand Total      2  1            3

Solution 2:[2]

Well it depends highly on current shape of your dataset. You should group your data by their personal status and Gender and then summarize/aggregate count as you need the member of each category. In this case if you have only status and gender try something like this.

df = pd.DataFrame({'Name':['Kathy', 'Linda', 'Peter'],
               'Gender': ['F','F','M'],
               'Personal_Status':['Divorced','Married','Married']})

df2 = df.groupby(by=['Personal_Status','Gender'],as_index=False).count()
df3 = df2.pivot(index='Personal_Status',columns='Gender')

df3
Gender             F    M
Personal_Status          
Divorced         1.0  NaN
Married          1.0  1.0

Solution 3:[3]

Assuming the dataframe is produced as:

import pandas as pd
df = pd.DataFrame({'Name':['Kathy', 'Linda', 'Peter'],
                   'Gender': ['F','F','M'],
                   'Personal_Status':['Divorced','Married','Married']})

This one liner will produce the requested output:

pd.crosstab(df.Personal_Status, df.Gender, margins=True, margins_name='Grand Total')

Output

Gender           F  M  Grand Total
Personal_Status                   
Divorced         1  0            1
Married          1  1            2
Grand Total      2  1            3

This is slightly simpler than the answer given by @iamericfletcher.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 DeepBlue
Solution 3