How to create ratios using value counts and separate fields in Python?
Using the data frame shown below, I'd like to create manager-to-assistant and manager-to-associate ratios (or percentages) per location.
I'm looking for the most efficient way to do this. I could do it the long way, creating a plethora of variables, but I'd rather not.
The output should be something like:
Let me know if you need any additional input or guidance.
Also, please explain why you do what you do, so I and others can understand the logic.
I appreciate the help.
Solution 1:[1]
Here is what I came up with. Why? Because that is what I am used to doing :)
Note that I did not recreate your entire dataframe. Calling groupby on the df returns a new object and leaves the original intact, so it didn't matter how much of it I recreated.
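Before the full answer code, here is a quick check of the point about groupby. This is only a minimal sketch: the three-row frame is made up for illustration and is not the asker's data.

```python
import pandas as pd

# Hypothetical three-row frame, used only to illustrate the point.
df = pd.DataFrame({
    'Location': ['Lakewood', 'Lakewood', 'Chatfield'],
    'Job_Type': ['Manager', 'Associate', 'Manager'],
})
before = df.copy()

# groupby + size returns a new Series; df itself is not modified.
counts = df.groupby(['Location', 'Job_Type']).size()

print(df.equals(before))  # True
print(counts is df)       # False
```

The grouped result is a separate object indexed by (Location, Job_Type), so any later lookups or divisions happen on it rather than on the original frame.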
import random
import pandas as pd
# Create data to choose from.
locations = ['Lakewood', 'Chatfield', 'Broomfield']
types = ['Associate', 'Assistant', 'Manager']
# Create lists for df.
emp_id = [n for n in range(1,21)]
rand_locs = [random.choice(locations) for _ in range(1,21)]
rand_types = [random.choice(types) for _ in range(1,21)]
# Create data and cols for df.
data = list(zip(rand_locs, rand_types))
cols = ['Location', 'Job_Type']
df = pd.DataFrame(data, index=emp_id, columns=cols)
# Groupby to isolate locations and count the job types.
gdf = df.groupby(['Location', 'Job_Type']).agg({'Job_Type': 'count'})
# Create dicts of ratios for new df.
mgr_asst_dict = {}
# Loop over the locations and look up each (location, job type) count.
# Try/except because I am using random data and sometimes some will be missing.
for loc in locations:
    try:
        ratio = (
            gdf.loc[(loc, 'Manager'), 'Job_Type'] /
            gdf.loc[(loc, 'Assistant'), 'Job_Type']
        )
        mgr_asst_dict[loc] = ratio
    except KeyError:
        # Passing because I want NaN for missing data.
        pass
mgr_assoc_dict = {}
for loc in locations:
    try:
        ratio = (
            gdf.loc[(loc, 'Manager'), 'Job_Type'] /
            gdf.loc[(loc, 'Associate'), 'Job_Type']
        )
        mgr_assoc_dict[loc] = ratio
    except KeyError:
        # Same as above: skip locations with a missing job type.
        pass
# Create the new df.
ratio_df = pd.DataFrame(
    [mgr_assoc_dict, mgr_asst_dict],
    index=['Mgr to Assoc', 'Mgr to Asst']
)
>>> ratio_df
              Lakewood  Broomfield
Mgr to Assoc       2.0    0.333333
Mgr to Asst        4.0    0.333333
>>> gdf
                      Job_Type
Location   Job_Type
Broomfield Assistant         3
           Associate         3
           Manager           1
Chatfield  Assistant         1
           Associate         5
Lakewood   Assistant         1
           Associate         2
           Manager           4
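Since the question asks for a way to avoid a plethora of variables, the same ratios can probably also be computed without loops. The sketch below is an alternative, not the method used above: it swaps in `pd.crosstab` to build a location-by-job-type count table and divides whole columns at once. The sample frame is an assumption, rebuilt by hand to match the counts in the `gdf` output above, and the names `counts` and `ratios` are only illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical sample matching the gdf counts shown above (20 employees).
df = pd.DataFrame({
    'Location': ['Lakewood'] * 7 + ['Broomfield'] * 7 + ['Chatfield'] * 6,
    'Job_Type': (['Manager'] * 4 + ['Associate'] * 2 + ['Assistant'] * 1
                 + ['Manager'] * 1 + ['Associate'] * 3 + ['Assistant'] * 3
                 + ['Associate'] * 5 + ['Assistant'] * 1),
})

# One row per location, one column per job type; absent combinations count as 0.
counts = pd.crosstab(df['Location'], df['Job_Type'])
# Make sure all three columns exist even if a job type never appears.
counts = counts.reindex(columns=['Assistant', 'Associate', 'Manager'], fill_value=0)

# Treat a zero denominator as missing so the ratio becomes NaN instead of inf.
ratios = pd.DataFrame({
    'Mgr to Assoc': counts['Manager'] / counts['Associate'].replace(0, np.nan),
    'Mgr to Asst': counts['Manager'] / counts['Assistant'].replace(0, np.nan),
})
print(ratios)
```

One difference from the loop version: a location with no managers (Chatfield here) is not dropped but gets a ratio of 0.0, and locations appear as rows rather than columns. Use `ratios.T` if the original layout is preferred.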
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Eric M |