Stratified sampling in Python
I would like to sample a dataframe in Python. The sample has to be stratified by specific variables. I tried sklearn.cross_validation, but the problem is that you can stratify on only one variable, and I need to stratify my population according to several variables.
So what I am looking for is the equivalent of proc survey (strata statement in SAS) or svydesign (in R). Does such a function exist in Python?
I found the function stratified_samples on this page, https://gist.github.com/spacelis/6088623, but there is no documentation or usage example, and it is very hard to understand how to pass in the stratification variables.
Thanks for your help
Solution 1:[1]
This is an old question, but for the benefit of those who arrived here from search:
There is a relatively new package in Python called samplics. It is an equivalent to the survey library in R. I am not experienced in SAS, though I imagine it should cover that too.
samplics is built to cover many aspects of complex survey design, including sampling, weighting and estimation. There is an example of sampling by locality on the GitHub page.
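If a full survey package is more than you need, a plain proportional stratified sample on several variables can also be sketched with pandas alone. This is not the samplics API, just a minimal illustration that assumes pandas 1.1 or later (for `DataFrameGroupBy.sample`). The dataframe, the column names `region`, `sex` and `income`, and the helper `stratified_sample` are all made up for the example.

```python
import pandas as pd

# Hypothetical population; 'region' and 'sex' are illustrative strata variables.
df = pd.DataFrame({
    "region": ["north"] * 40 + ["south"] * 60,
    "sex": ["f"] * 20 + ["m"] * 20 + ["f"] * 30 + ["m"] * 30,
    "income": range(100),
})


def stratified_sample(frame, strata, frac, seed=None):
    """Draw the same fraction of rows from every combination of the strata columns."""
    # Each unique combination of values in `strata` forms one stratum;
    # group_keys=False keeps the original row index on the result.
    return frame.groupby(strata, group_keys=False).sample(frac=frac, random_state=seed)


sample = stratified_sample(df, ["region", "sex"], frac=0.1, seed=0)

# Number of sampled rows per stratum.
print(sample.groupby(["region", "sex"]).size())
```

Passing a list of columns to `groupby` is what makes the stratification multi-variable. This only covers the selection step; design weights and variance estimation are what a dedicated survey package would add.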
Other packages of interest (though documentation is slightly sparse):
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | TangibleTech |