Sampling based on column values
I have a data frame that looks something like this:
week Name State resolution_version resolution_status
19 smahend RESOLVED 1 FIXED
19 tcvian RESOLVED 1 FIXED
19 velag RESOLVED 1 FIXED
19 benhi RESOLVED 1 FIXED
19 ysaik RESOLVED 1 FIXED
19 saenta RESOLVED 1 FIXED
19 moucb RESOLVED 1 FIXED
19 namees RESOLVED 1 FIXED
19 namees RESOLVED 1 FIXED
19 vijgra RESOLVED 1 FIXED
It also has more columns.
I am trying to get the same sample fraction for each Name, e.g. 25% of all cases by smahend, 25% of all cases by tcvian, and so on. I tried .sample(frac=), but it samples the given fraction from the dataset as a whole, not from each name separately.
More info: in the raw data each name can have multiple rows, and I am trying to get a certain percentage (sample) of each name's rows. For example, smahend has 1000 entries and ysaik has 500, and I want 50% of each.
So the input is a CSV with all the population data, and the output is a CSV with the defined percentage of each name.
Code I tried:
f4 = gf1.apply(lambda x: x.sample(frac=str1 / 100, random_state=str3, replace=False))
gf2 = f3[(str1 * f3['count']) / 100 < str2].groupby('auditor')
f5 = gf2.apply(lambda x: x.sample(n=str2, replace=False))
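For reference, here is a minimal sketch of per-name sampling, not a definitive solution. It assumes the grouping column is called Name (the code above uses 'auditor', so pass whichever applies), and the helper name sample_per_name, its min_rows parameter, and the toy data are made up for illustration. On pandas 1.1 or later, groupby(...).sample(frac=...) should cover the plain "same fraction of every name" case directly; the helper is only needed if small groups must also get a minimum number of rows.

```python
import pandas as pd

def sample_per_name(df, frac, key="Name", min_rows=0, seed=None):
    """Sample about `frac` of each `key` group's rows, but never fewer than
    `min_rows` (capped at the group's size). Hypothetical helper."""
    parts = []
    for _, group in df.groupby(key):
        n = max(int(len(group) * frac), min_rows)  # fraction of the group, or the minimum
        n = min(n, len(group))                     # cannot exceed group size without replacement
        parts.append(group.sample(n=n, replace=False, random_state=seed))
    return pd.concat(parts)

# Toy data standing in for the real CSV (assumed shape: one row per case)
df = pd.DataFrame({
    "Name": ["smahend"] * 1000 + ["ysaik"] * 500,
    "week": 19,
})

# Built-in per-group sampling (pandas >= 1.1): 50% of each name
simple = df.groupby("Name").sample(frac=0.5, random_state=42)

# Same fraction via the helper, and a variant with a 20-row minimum per name
half = sample_per_name(df, frac=0.5, seed=42)
small = sample_per_name(df, frac=0.01, min_rows=20, seed=42)

counts = half["Name"].value_counts().to_dict()
```

With a real file, df would come from pd.read_csv and the result would be written back with to_csv(..., index=False); the file names are up to you.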
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow