How to create ratios using value counts and separate fields in Python?

Using the data frame shown below, I'd like to create manager-to-assistant and manager-to-associate ratios (or percentages) for each location.

[Image: the data frame]

I'm looking for the most efficient way to do this. I could do it the long way, creating a plethora of variables, but I'd rather not.

The output should be something like:

Manager to Assistant

Let me know if you need any additional input or guidance.

Also, please explain why you do what you do, so I and others can understand the logic.

I appreciate the help.



Solution 1:[1]

Here is what I came up with. Why? Because that is what I am used to doing:)

You can see that I did not recreate your entire dataframe. Grouping the df creates a new object and leaves your original intact, so I only needed the Location and Job_Type columns to demonstrate.
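A quick way to check that claim: the minimal sketch below uses a made-up three-row frame (the rows are hypothetical, not from the question). It uses `groupby(...).size()` as a simpler stand-in for the `.agg({'Job_Type': 'count'})` call in the code that follows. The grouped counts come back as a new Series, and `df` still compares equal to a copy taken beforehand.

```python
import pandas as pd

# Hypothetical three-row frame with the same two columns used in this answer.
df = pd.DataFrame({
    'Location': ['Lakewood', 'Lakewood', 'Broomfield'],
    'Job_Type': ['Manager', 'Assistant', 'Manager'],
})
snapshot = df.copy()

# groupby(...).size() builds a brand-new Series of row counts,
# indexed by (Location, Job_Type) pairs.
counts = df.groupby(['Location', 'Job_Type']).size()

# The original frame is untouched: same shape, same values.
print(df.equals(snapshot))
print(counts)
```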

import random

import pandas as pd

# Create data to choose from.
locations = ['Lakewood', 'Chatfield', 'Broomfield']
types = ['Associate', 'Assistant', 'Manager']
# Create lists for df (20 employees with a random location and job type).
emp_id = list(range(1, 21))
rand_locs = [random.choice(locations) for _ in emp_id]
rand_types = [random.choice(types) for _ in emp_id]
# Create data and cols for df.
data = list(zip(rand_locs, rand_types))
cols = ['Location', 'Job_Type']
df = pd.DataFrame(data, index=emp_id, columns=cols)
# Groupby to isolate locations and count the job types.
gdf = df.groupby(['Location', 'Job_Type']).agg({'Job_Type': 'count'})
# Create dicts of ratios for new df.
mgr_asst_dict = {}
mgr_assoc_dict = {}
# Try/except because I am using random data and sometimes a job type is
# missing at a location, in which case the .loc lookup raises KeyError.
for loc in locations:
    try:
        mgr_asst_dict[loc] = (
            gdf.loc[(loc, 'Manager'), 'Job_Type'] /
            gdf.loc[(loc, 'Assistant'), 'Job_Type']
        )
    except KeyError:
        # Skip it: the location shows up as NaN in ratio_df, or is left
        # out entirely if it is missing from both dicts.
        pass
    try:
        mgr_assoc_dict[loc] = (
            gdf.loc[(loc, 'Manager'), 'Job_Type'] /
            gdf.loc[(loc, 'Associate'), 'Job_Type']
        )
    except KeyError:
        pass
# Create the new df: each dict becomes a row, each location a column.
ratio_df = pd.DataFrame(
    [mgr_assoc_dict, mgr_asst_dict],
    index=['Mgr to Assoc', 'Mgr to Asst']
)

>>> ratio_df
              Lakewood  Broomfield
Mgr to Assoc       2.0    0.333333
Mgr to Asst        4.0    0.333333
>>> gdf
                      Job_Type
Location   Job_Type
Broomfield Assistant         3
           Associate         3
           Manager           1
Chatfield  Assistant         1
           Associate         5
Lakewood   Assistant         1
           Associate         2
           Manager           4
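Since the question asks for the most efficient approach without lots of intermediate variables, a loop-free alternative may be worth sketching. It assumes the same two columns, `Location` and `Job_Type`, and rebuilds a 20-row frame whose counts match the `gdf` printout above so the numbers can be compared. `unstack(fill_value=0)` turns the grouped counts into one row per location and one column per job type, so each ratio becomes a single column division.

```python
import numpy as np
import pandas as pd

# Rebuild a 20-row frame whose counts match the gdf printout above.
counts_by_loc = {
    'Broomfield': {'Assistant': 3, 'Associate': 3, 'Manager': 1},
    'Chatfield': {'Assistant': 1, 'Associate': 5},
    'Lakewood': {'Assistant': 1, 'Associate': 2, 'Manager': 4},
}
rows = [
    (loc, job)
    for loc, jobs in counts_by_loc.items()
    for job, n in jobs.items()
    for _ in range(n)
]
df = pd.DataFrame(rows, columns=['Location', 'Job_Type'])

# One row per location, one column per job type; absent combinations become 0.
counts = df.groupby(['Location', 'Job_Type']).size().unstack(fill_value=0)

# Treat a zero denominator as undefined (NaN) rather than letting it become inf.
assistants = counts['Assistant'].replace(0, np.nan)
associates = counts['Associate'].replace(0, np.nan)

# Build one column per ratio, then transpose so ratios are rows, locations columns.
ratio_df = pd.DataFrame({
    'Mgr to Assoc': counts['Manager'] / associates,
    'Mgr to Asst': counts['Manager'] / assistants,
}).T
print(ratio_df)
```

The Lakewood and Broomfield values come out the same as in `ratio_df` above (2.0 and 4.0; 0.333333 and 0.333333). One difference from the loop: Chatfield has no managers, so it gets 0.0 here instead of dropping out. If you'd rather exclude such locations, filter with `counts[counts['Manager'] > 0]` before dividing. For percentages instead of ratios, multiply `ratio_df` by 100.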

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Eric M