How to get the remaining sample after using random.sample() in Python?
I have a large list of elements (in this example I'll assume it's filled with numbers). For example: l = [1,2,3,4,5,6,7,8,9,10]
Now I want to take two samples from that list: one with 80% of the elements (randomly chosen, of course) and the other with the remaining 20%, so I can use the bigger one to train a machine-learning tool and the rest to test that training. The function I used is sample() from the random module, and I used it this way:
import random
sz = len(l) # Size of the original list
per = int((80 * sz) / 100) # Length of the sample list holding 80% of the elements (I guess)
random.seed(1) # I want to obtain the same results every time I run it
l2 = random.sample(l, per)
I'm not totally sure, but I believe that with this code I'm getting a random sample containing 80% of the numbers.
l2 = [3,4,7,2,9,5,1,8]
Nonetheless, I can't seem to find a way to get the other sample list with the remaining elements, l3 = [6,10] (the sample() function does not remove the elements it takes from the original list). Can you please help me? Thank you in advance.
Solution 1:[1]
The following code worked for me to randomly split a list into two (training/testing) sets, even though most machine-learning libraries include easy-to-use splitting functions, as mentioned before:
import random
l = [1,2,3,4,5,6,7,8,9,10]
sz = len(l)
cut = int(0.8 * sz) # 80% of the list
random.shuffle(l) # shuffles l in place and returns None
l2 = l[:cut] # first 80% of shuffled list
l3 = l[cut:] # last 20% of shuffled list
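Because random.shuffle() reorders its argument in place and returns None, a variant that leaves the original list untouched may be handy. The following is only a minimal sketch: the helper name split_shuffled and its fraction and seed parameters are assumptions made for illustration, not part of the original answer. It uses a private random.Random generator so the global random state is not affected, and sample(items, len(items)) to obtain a shuffled copy.

```python
import random

def split_shuffled(items, fraction=0.8, seed=1):
    """Return (train, test) lists without modifying items (hypothetical helper)."""
    rng = random.Random(seed)                 # private generator, reproducible via seed
    shuffled = rng.sample(items, len(items))  # shuffled copy; items stays as it was
    cut = int(fraction * len(items))          # number of elements in the first part
    return shuffled[:cut], shuffled[cut:]

l = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
l2, l3 = split_shuffled(l)
```

Here l2 holds 8 elements and l3 the remaining 2, and calling the function again with the same seed should give the same split.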
Solution 2:[2]
You can simply do:
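from random import sample
data = [1, 2, 3, 4, 5]
cut = 0.8 # fraction of the elements used for training
training = sample(data, int(len(data) * cut))
testing = [value for value in data if value not in training] # assumes the values in data are unique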
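Filtering with "not in training" compares values, so it can drop too many elements when the list contains duplicates, and it rescans the training list for every element. One possible alternative, sketched here under the assumption that sampling positions rather than values is acceptable, is to draw random indices and split on those. The helper name split_by_index and its parameters are hypothetical.

```python
import random

def split_by_index(data, fraction=0.8, seed=1):
    """Split data into (training, testing) by sampling positions (hypothetical helper)."""
    rng = random.Random(seed)                    # private generator, reproducible via seed
    n_train = int(len(data) * fraction)          # size of the training part
    train_idx = set(rng.sample(range(len(data)), n_train))  # chosen positions
    training = [v for i, v in enumerate(data) if i in train_idx]
    testing = [v for i, v in enumerate(data) if i not in train_idx]
    return training, testing

data = [1, 1, 2, 2, 3]  # duplicates on purpose
training, testing = split_by_index(data)
```

Note that both parts keep the original order of data; if a random order is needed inside each part, they can be shuffled afterwards.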
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | L. Brasi |
Solution 2 | cigien |