How to generate random correlated uniform data from a correlation matrix?
I have a very specific problem, and researching a solution is hard because I lack the requisite math skills.
My goal: given a covariance/correlation matrix and variable ranges, generate random data that meets three conditions:
1. The covariance/correlation of the data should be similar to the provided covariance/correlation matrix.
2. The range of each variable (column) should be bounded by the provided ranges.
3. Each variable has a uniform distribution.
Is there an R package or function that can generate data meeting these conditions from those arguments? Or code in some other language that I could then rewrite in R?
EDIT1:
In the case that uniformity (condition 3) cannot be met, is there perhaps an R package or function that can generate data that meets just conditions 1 and 2? In other words, I don't care what distribution the variables take.
EDIT2:
Here is my first very terrible attempt at this problem. All it does so far is create positively correlated and uniform data. Tests are at the bottom:
generate_correlated_variables <- function(variable_ranges, numPoints = 100, nbins = 10) {
  df <- matrix(0, nrow = numPoints, ncol = length(variable_ranges))
  colnames(df) <- names(variable_ranges)
  for (i in 1:length(variable_ranges)) {
    df[,i] <- runif(numPoints, min = as.numeric(variable_ranges[[i]][1]), max = as.numeric(variable_ranges[[i]][2]))
  }
  # Sample one variable and determine how many points fall in each bin
  # These amounts will be used to sample the rest of the variables
  df[,1] <- runif(numPoints, min = as.numeric(variable_ranges[[1]][1]), max = as.numeric(variable_ranges[[1]][2]))
  bin_width <- (variable_ranges[[1]][2] - variable_ranges[[1]][1])/nbins
  breaks_vec <- seq(variable_ranges[[1]][1], variable_ranges[[1]][2], by = bin_width)
  table <- table(cut(df[,1], breaks = breaks_vec, include.lowest = TRUE))
  binned_ranges_list <- vector(mode = "list", length = length(variable_ranges))
  names(binned_ranges_list) <- names(variable_ranges)
  temp <- vector(mode = "list", length = nbins)
  for (i in 1:length(variable_ranges)) {
    bin_width <- (variable_ranges[[i]][2] - variable_ranges[[i]][1])/nbins
    breaks_vec <- seq(variable_ranges[[i]][1], variable_ranges[[i]][2], by = bin_width)
    for (j in 1:nbins) {
      temp[[j]][1] <- breaks_vec[j]
      temp[[j]][2] <- breaks_vec[j+1]
    }
    binned_ranges_list[[i]] <- temp
  }
  print(binned_ranges_list)
  # Sample ranges
  for (i in 1:length(variable_ranges)) {
    sampled_values_vec <- c()
    for (j in 1:nbins) {
      sample <- runif(n = table[j], min = binned_ranges_list[[i]][[j]][1], max = binned_ranges_list[[i]][[j]][2])
      sampled_values_vec <- c(sampled_values_vec, sample)
    }
    df[,i] <- sampled_values_vec
  }
  return(df)
}
#Tests
variable_ranges = list(A = c(1, 100), B = c(50, 100), C = c(1, 10))
a <- generate_correlated_variables(variable_ranges = variable_ranges, numPoints = 100, nbins = 2)
cor(a)
b <- generate_correlated_variables(variable_ranges = variable_ranges, numPoints = 100, nbins = 50)
cor(b)
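For reference, one standard technique that targets all three conditions at once is a Gaussian copula. It is not part of the original post, so treat the following as a minimal sketch under stated assumptions: `correlated_uniforms` is a hypothetical helper name, the target matrix is made up, and the adjustment `2 * sin(pi * r / 6)` is the usual conversion from a desired correlation between uniforms to the correlation of the underlying normals. The adjusted matrix is assumed to be positive definite, which does not hold for every input.

```python
import math
import numpy as np

def correlated_uniforms(corr, ranges, n, rng=None):
    """Hypothetical helper: n rows of bounded, uniform, correlated columns."""
    rng = np.random.default_rng() if rng is None else rng
    corr = np.asarray(corr, dtype=float)
    ranges = np.asarray(ranges, dtype=float)  # shape (d, 2): (low, high) per column
    # Pre-adjust the target so that the uniforms, not the underlying normals,
    # end up with Pearson correlation close to `corr`.
    normal_corr = 2.0 * np.sin(np.pi * corr / 6.0)
    np.fill_diagonal(normal_corr, 1.0)
    # Raises LinAlgError if the adjusted matrix is not positive definite.
    chol = np.linalg.cholesky(normal_corr)
    z = rng.standard_normal((n, corr.shape[0])) @ chol.T
    # The standard normal CDF maps each column to Uniform(0, 1).
    normal_cdf = np.vectorize(lambda v: 0.5 * (1.0 + math.erf(v / math.sqrt(2.0))))
    u = normal_cdf(z)
    low, high = ranges[:, 0], ranges[:, 1]
    return low + u * (high - low)

# Ranges taken from the question; the target matrix is an illustrative assumption.
target = [[1.0, 0.5, -0.3], [0.5, 1.0, 0.2], [-0.3, 0.2, 1.0]]
ranges = [(1, 100), (50, 100), (1, 10)]
x = correlated_uniforms(target, ranges, n=20_000, rng=np.random.default_rng(1))
print(np.round(np.corrcoef(x, rowvar=False), 2))
```

Rescaling each column to its range is a linear map, so it leaves the correlations unchanged. The same steps should translate to R with `chol`, `rnorm`, and `pnorm`.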
Solution 1:[1]
Here is one idea for generating correlated uniform random numbers.
Suppose you have a source of independent random bits.
1. Generate an array of values with X random bits each (say, 2 bits).
2. Generate another random array, with its upper (or middle, lower, or some other position) bits replaced by the bits from step 1.
3. Generate a third random array, again with the bits at that position replaced by the bits from step 1.
The arrays from steps 2 and 3 are each uniform, but correlated with each other.
Code for illustration (sorry, Python)
import numpy as np

N = 1000000
rng = np.random.default_rng()
# Mask selecting the lower 31 bits (2^31 - 1)
m = np.full(N, 2**31 - 1, dtype=np.uint32)
# Shared random bits: clear the lower 31 bits, keeping only the top bit
f = rng.integers(low=0, high=4294967295, size=N, dtype=np.uint32, endpoint=True)
f = f - np.bitwise_and(f, m)
q = rng.integers(low=0, high=4294967295, size=N, dtype=np.uint32, endpoint=True)
z = rng.integers(low=0, high=4294967295, size=N, dtype=np.uint32, endpoint=True)
print("Uncorrelated")
print(np.corrcoef([q, z]))
# Replace the top bit of q and z with the shared bit from f
q = f + np.bitwise_and(m, q)
z = f + np.bitwise_and(m, z)
print("Correlated")
print(np.corrcoef([q, z]))
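The bit-sharing idea above can be parameterized by the number of shared upper bits and combined with a rescaling step for the bounded ranges in the question. The following is a minimal sketch, not the answer author's code: `shared_bit_uniforms` and `scale_to_range` are hypothetical names. By a short variance calculation, sharing the top k bits should give a correlation of roughly 1 - 4^(-k), so only a few positive values (0.75, 0.9375, ...) are reachable this way.

```python
import numpy as np

def shared_bit_uniforms(n, shared_bits, rng=None):
    """Hypothetical helper: two uniform uint32 arrays sharing their top bits."""
    rng = np.random.default_rng() if rng is None else rng
    # Mask selecting the low (32 - shared_bits) bits of each value.
    low_mask = np.uint32((1 << (32 - shared_bits)) - 1)

    def draw():
        return rng.integers(low=0, high=4294967295, size=n,
                            dtype=np.uint32, endpoint=True)

    common = draw() & ~low_mask        # shared upper bits, lower bits cleared
    q = common | (draw() & low_mask)   # independent lower bits for q
    z = common | (draw() & low_mask)   # independent lower bits for z
    return q, z

def scale_to_range(values, low, high):
    """Map uint32 values linearly onto the interval [low, high)."""
    return low + (high - low) * (values.astype(np.float64) / 2.0**32)

rng = np.random.default_rng(0)
for k in (1, 2, 3):
    q, z = shared_bit_uniforms(200_000, k, rng)
    a = scale_to_range(q, 1, 100)    # range of variable A in the question
    b = scale_to_range(z, 50, 100)   # range of variable B in the question
    print(k, round(np.corrcoef(a, b)[0, 1], 3), "expected about", 1 - 4.0**-k)
```

Because the rescaling is linear, the correlation of `a` and `b` equals that of `q` and `z`.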
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 |