R: How to randomly sample one value from each column and bootstrap
I have 18 columns and 100 rows, where the columns are 18 students and the rows are their grades in 100 exams. Here is what I want: for each student, randomly sample/select just one grade out of all 100. In other words, I want a sample with 18 columns and just 1 row. I have tried the apply and sample functions, but none of my attempts work, and I don't know why. Any help would be greatly appreciated! Thank you so much! Here is what I tried:
bs = data.frame(matrix(nrow=1,ncol=18))
for (i in colnames(high)){
bs[,i]=sample(high[,i],1,replace=TRUE)
}
as.data.frame(lapply(high[,i],sample,18,replace=TRUE))
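For reference, both attempts are close. Assuming high has the student names as its column names, the loop builds bs with default column names (X1 ... X18), so assigning bs[, i] with a student name appends a new column instead of filling an existing one. The lapply() call iterates over the individual grades of a single column, not over the columns. A minimal sketch of corrected versions, assuming high is a data frame of grades with one named column per student (the toy high built here with runif() is illustrative only):

```r
# Toy stand-in for the asker's data (illustrative only):
# 100 exams (rows) x 18 students (columns)
set.seed(1)
high <- as.data.frame(matrix(round(runif(100 * 18, 0, 100)), ncol = 18))
names(high) <- paste0("student", 1:18)

# Corrected loop: give bs the same column names as high, so each
# assignment fills an existing column instead of appending a new one
bs <- data.frame(matrix(NA_real_, nrow = 1, ncol = ncol(high)))
names(bs) <- names(high)
for (i in names(high)) {
  bs[, i] <- sample(high[, i], 1)
}

# Corrected lapply: iterate over the columns of high (not the grades of
# one column) and draw one value from each; the result is a 1 x 18 data frame
bs2 <- as.data.frame(lapply(high, sample, size = 1))
```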
Solution 1:[1]
Try this:
apply(data, 2, sample, size = 1)
Using @StupidWolf's data (the high matrix) as a test:
set.seed(101)
apply(high, 2, sample, size = 1)
# student1 student2 student3 student4 student5 student6 student7 student8 student9 student10 student11 student12 student13 student14 student15 student16 student17 student18
# 0.57256477 0.84338121 0.71225050 0.56432392 0.23865929 0.23563641 0.51903694 0.36692427 0.51577410 0.45780908 0.19434773 0.70247028 0.60383059 0.25451088 0.78583242 0.86241707 0.05360842 0.61892604
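Note that apply() returns a named numeric vector rather than the 18-column, 1-row data frame the question asks for. A minimal sketch of converting it, assuming high is a numeric matrix (or an all-numeric data frame, which apply() coerces to a matrix) with the students as named columns; the toy data mirror the high matrix used here:

```r
# Toy data: 100 exams x 18 students
set.seed(100)
high <- matrix(runif(100 * 18), ncol = 18)
colnames(high) <- paste0("student", 1:18)

# apply() over columns (MARGIN = 2) returns a named numeric vector ...
draw <- apply(high, 2, sample, size = 1)

# ... so transpose it into a 1 x 18 matrix and convert to a data frame;
# the names of draw become the column names
one_row <- as.data.frame(t(draw))
```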
Solution 2:[2]
Let's say your data looks like this:
set.seed(100)
high = matrix(runif(100*18),ncol=18)
colnames(high) = paste0("student",1:18)
rownames(high) = paste0("exam",1:100)
head(high)
student1 student2 student3 student4 student5 student6 student7
exam1 0.30776611 0.32741508 0.3695961 0.8495923 0.5112374 0.2202326 0.03176634
exam2 0.25767250 0.38947869 0.9563228 0.6532260 0.2777107 0.7431595 0.57970549
exam3 0.55232243 0.04105275 0.9135767 0.9508858 0.3606569 0.3059573 0.15420484
exam4 0.05638315 0.36139663 0.8233363 0.6172230 0.4375279 0.4022088 0.12527050
What you want is to sample row numbers from 1 to 100, 18 times with replacement (as in a bootstrap; thanks to @H1 for pointing this out):
set.seed(101)
take=sample(1:100,18,replace=TRUE)
take
[1] 73 57 46 95 81 58 95 61 60 59 99 3 32 9 96 99 99 98
As you can see above, 99 is drawn several times because replace=TRUE. We then take the 73rd entry of column 1, the 57th entry of column 2, and so on. Indexing the matrix with a two-column (row, column) matrix does exactly this:
high[cbind(take,1:18)]
[1] 0.57256477 0.84338121 0.71225050 0.56432392 0.23865929 0.23563641
[7] 0.51903694 0.36692427 0.51577410 0.45780908 0.19434773 0.70247028
[13] 0.60383059 0.25451088 0.78583242 0.86241707 0.05360842 0.61892604
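Since the question also mentions bootstrapping, the same step can be repeated to get many such 1 x 18 draws. A minimal sketch, assuming high is the 100 x 18 matrix above; the number of replicates B = 1000 is an arbitrary, illustrative choice:

```r
set.seed(100)
high <- matrix(runif(100 * 18), ncol = 18)
colnames(high) <- paste0("student", 1:18)

B <- 1000  # number of replicates (illustrative choice)

# Each replicate draws one row index per column (with replacement) and
# picks those cells via a two-column (row, col) index matrix.
# replicate() returns an 18 x B matrix, so transpose it to B x 18.
boot <- t(replicate(B, {
  take <- sample(nrow(high), ncol(high), replace = TRUE)
  high[cbind(take, seq_len(ncol(high)))]
}))
colnames(boot) <- colnames(high)
```

Each row of boot is one sample of the kind the question describes, with one grade per student.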
Solution 3:[3]
You can use sample() to randomly select one value from each column.
I have created a small example dataset here. In future, including sample data in the question makes the problem much easier to understand.
# sample data
df <- data.frame(
student1 = c(50, 45, 86, 30),
student2 = c(56, 78, 63, 58),
student3 = c(88, 60, 75, 93),
student4 = c(87, 33, 49, 11),
student5 = c(85, 96, 55, 64)
)
Then loop through each student (column), randomly choose one of that student's grades, and store it in a vector. As a final step, since you want a data frame, convert the vector to a one-row data frame.
# column names
students <- colnames(df)
# empty vector
vals <- c()
for(s in students) {
grade <- sample(df[[s]], 1)
vals <- c(vals, grade)
}
finalDF <- as.data.frame(t(vals))
names(finalDF) <- students
finalDF
The output from two runs:
student1 student2 student3 student4 student5
1 45 78 93 87 64
student1 student2 student3 student4 student5
1 45 63 93 87 96
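The loop can also be written without growing vals one element at a time. A minimal sketch using vapply() on the same toy df; the helper name pick_one is hypothetical. It indexes with sample.int() because, as documented in ?sample, sample(x, 1) on a numeric x of length 1 (and value at least 1) samples from 1:x rather than returning x, which would bite if a column ever had a single grade:

```r
# Toy data from the answer above: 4 exams x 5 students
df <- data.frame(
  student1 = c(50, 45, 86, 30),
  student2 = c(56, 78, 63, 58),
  student3 = c(88, 60, 75, 93),
  student4 = c(87, 33, 49, 11),
  student5 = c(85, 96, 55, 64)
)

# Hypothetical helper: pick one element of col at random by index
pick_one <- function(col) col[sample.int(length(col), 1)]

# vapply() applies pick_one to each column and checks each result is a
# single number; the column names carry over as the vector's names
vals <- vapply(df, pick_one, numeric(1))
finalDF <- as.data.frame(t(vals))
finalDF
```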
The other answers are really smart, but nonetheless, I hope this helps!
Solution 4:[4]
This assumes your data are in long format, with one row per grade and a group column identifying the student. First shuffle the rows of the data frame:
df <- df[sample(1:nrow(df)),]
then take the first observation of each group:
df.pick <- df[!duplicated(df$group) , ]
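The question's data are in wide format (one column per student), so they need reshaping before this approach applies. A minimal sketch, assuming a small wide data frame called wide (hypothetical name and toy values) and using base R's stack() to build the long format, with the columns renamed to grade and group:

```r
# Toy wide data (illustrative): 4 exams x 3 students
set.seed(42)
wide <- data.frame(
  student1 = c(50, 45, 86, 30),
  student2 = c(56, 78, 63, 58),
  student3 = c(88, 60, 75, 93)
)

# Reshape to long format: stack() returns columns 'values' and 'ind'
# (ind is a factor naming the source column), renamed here
df <- stack(wide)
names(df) <- c("grade", "group")

# Shuffle the rows, then keep the first row seen for each student
df <- df[sample(1:nrow(df)), ]
df.pick <- df[!duplicated(df$group), ]
df.pick
```

The result has one row per student rather than one row in total; setNames(df.pick$grade, df.pick$group) turns it into a named vector if the wide layout is preferred.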
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | |
Solution 2 | |
Solution 3 | |
Solution 4 | Martin Gal |