Sampling based on column values

I have a data frame which looks something like this:

    week  Name     State     resolution_version  resolution_status
    19    smahend  RESOLVED  1                   FIXED
    19    tcvian   RESOLVED  1                   FIXED
    19    velag    RESOLVED  1                   FIXED
    19    benhi    RESOLVED  1                   FIXED
    19    ysaik    RESOLVED  1                   FIXED
    19    saenta   RESOLVED  1                   FIXED
    19    moucb    RESOLVED  1                   FIXED
    19    namees   RESOLVED  1                   FIXED
    19    namees   RESOLVED  1                   FIXED
    19    vijgra   RESOLVED  1                   FIXED

and has more columns.

I am trying to draw the same sample fraction for each Name, for example 25% of all cases by smahend and 25% of all cases by tcvian. I tried .sample(frac=), but it samples the given fraction from the dataset as a whole, not from each name separately.

More info: in the raw data each name can have multiple rows, and I am trying to get a certain percentage (sample) of the rows for each name. For example, smahend has 1000 entries and ysaik has 500.

So if I want 50% of each name, the input is a CSV with the full population data and the output is a CSV with the defined percentage of rows for each name.
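
One way to read this requirement is as sampling within each name group rather than across the whole frame. The following is a minimal sketch, assuming pandas 1.1 or later (for `DataFrameGroupBy.sample`). The small frame, `FRACTION`, and `SEED` are made-up stand-ins for the real CSV and settings, not values from the actual data.

```python
import pandas as pd

# Hypothetical population: 8 rows for smahend, 4 for ysaik.
# With real data this frame would come from pd.read_csv(...) instead.
df = pd.DataFrame({
    "week": [19] * 12,
    "Name": ["smahend"] * 8 + ["ysaik"] * 4,
    "State": ["RESOLVED"] * 12,
})

FRACTION = 0.5  # assumed share of rows to keep per name
SEED = 42       # assumed seed, only so the draw is reproducible

# Draw FRACTION of the rows inside every Name group, without replacement.
sampled = df.groupby("Name").sample(frac=FRACTION, random_state=SEED)

# Rows kept per name; each group contributes its own share.
counts = sampled["Name"].value_counts().to_dict()
print(counts)
```

The sampled frame keeps all original columns, so it could be written back out with `sampled.to_csv(...)` to produce the output file.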

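Code I tried:

    # str1 = percentage, str2 = row count, str3 = random seed; gf1 and f3 are built earlier (not shown)
    # sample str1 % of every group in gf1
    f4 = gf1.apply(lambda x: x.sample(frac=str1 / 100, random_state=str3, replace=False))
    # groups whose str1 % share is below str2 rows get exactly str2 rows instead
    gf2 = f3[(str1 * f3['count']) / 100 < str2].groupby('auditor')
    f5 = gf2.apply(lambda x: x.sample(n=str2, replace=False))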
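
The attempt above seems to combine a percentage sample with a fixed row count for groups where the percentage would yield too few rows. A rough sketch of that idea in a single pass is shown here. The helper name `sample_with_floor`, its parameters, and the example frame are hypothetical, and capping the draw at the group size is an added assumption, since sampling without replacement cannot return more rows than a group has.

```python
import pandas as pd

def sample_with_floor(df, key, pct, min_rows, seed=None):
    """Sample pct % of each group, but at least min_rows rows (capped at group size)."""
    parts = []
    for _, group in df.groupby(key):
        # Percentage-based size, raised to the floor when it is too small.
        n = max(round(len(group) * pct / 100), min_rows)
        # Without replacement we cannot draw more rows than the group has.
        n = min(n, len(group))
        parts.append(group.sample(n=n, random_state=seed, replace=False))
    return pd.concat(parts)

# Hypothetical data: auditor "a" has 20 rows, "b" has 4, "c" has 2.
df = pd.DataFrame({
    "auditor": ["a"] * 20 + ["b"] * 4 + ["c"] * 2,
    "case_id": range(26),
})

result = sample_with_floor(df, key="auditor", pct=25, min_rows=3, seed=0)
counts = result["auditor"].value_counts().to_dict()
print(counts)
```

Here group "a" is sampled at 25%, group "b" is raised to the floor of 3 rows, and group "c" contributes all of its 2 rows because it has fewer rows than the floor.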

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
