How to force PyStan to recompile a Stan model?
I have a weighted Bayesian logistic regression model:
weighted_stan_representation = """
data {
int<lower=0> n; // number of observations
int<lower=0> d; // number of predictors
array[n] int<lower=0,upper=1> y; // outputs
matrix[n,d] x; // inputs
vector<lower=0>[n] w; // coreset weights
}
parameters {
vector[d] theta; // auxiliary parameter
}
model {
theta ~ normal(0, 1);
target += w*bernoulli_logit_lpmf(y| x*theta);
}
"""
with data as such:
{'x': array([[-1.92220908, -0.86248914],
[-0.64517094, 0.40222473],
[-0.71675321, -1.2782317 ],
...,
[-2.0448459 , -0.11735602],
[-0.9622542 , -2.27172399],
[-1.09545494, -0.83435958]]),
'y': array([0, 0, 0, ..., 0, 0, 0]),
'w': array([1., 1., 1., ..., 1., 1., 1.]),
'd': 2,
'n': 10000}
I can get samples from the full posterior, i.e. with all weights equal to 1, by running:
posterior = stan.build(model.weighted_stan_representation, data = full_data, random_seed = 100000)
fit = posterior.sample(num_chains = num_chains, num_samples = num_samples, num_warmup = num_warmup)
And I then want to use a sparse weight vector, and sample from the approximate sparse posterior using
coreset_posterior = stan.build(model.weighted_stan_representation, data = sparse_data)
coreset_samples = coreset_posterior.sample(num_chains = num_chains, num_samples = num_samples, num_warmup = num_warmup)
However, when I access the samples, they are identical in the two cases. I suspect the model is cached when stan.build is first called, so that no new samples are actually being drawn. I think this because I get this output:
Building: found in cache, done.
when I build the second model. This is the first time I've used PyStan and I don't know how to get around this. As far as I can tell, there is no option to force PyStan to recompile.
Any help would be appreciated!
I've got the latest version of Python and PyStan installed.
Solution 1:[1]
There might be more elegant ways to do this, but you can delete the cache folder in which your model is saved. After that you should be able to rebuild your model. You can use httpstan.models.calculate_model_name to get the model's name in the cache; it takes the Stan program code as its argument (weighted_stan_representation in your case). You can also list the names of all models stored in the cache with httpstan.cache.list_model_names().
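To illustrate why a second stan.build call with different data can still report "found in cache", here is a minimal sketch of a cache key that depends only on the program text. The function toy_cache_key and the marker comment are hypothetical and are not httpstan's actual implementation; the real name may also depend on other inputs such as library versions.

```python
import hashlib


def toy_cache_key(program_code: str) -> str:
    """Hypothetical stand-in: derive a cache name from the program text only."""
    digest = hashlib.blake2b(program_code.encode("utf-8"), digest_size=8).hexdigest()
    return f"models/{digest}"


program = "parameters { real theta; } model { theta ~ normal(0, 1); }"

# The data never enters the key, so building with full or sparse data
# maps to the same cache entry.
key_full = toy_cache_key(program)
key_sparse = toy_cache_key(program)

# Any change to the program text, even a trailing comment, gives a new key.
key_changed = toy_cache_key(program + "\n// rebuild marker")

print(key_full == key_sparse)   # same text, same key
print(key_full == key_changed)  # different text, different key
```

If the real cache name is computed along these lines, editing the program text would presumably also lead to a fresh build, but that is an assumption and has not been checked against PyStan here.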
Assuming you only want to delete the cache of your model weighted_stan_representation, here is what to do:
Be careful before copy-pasting this code: it will delete a non-empty folder!
# Import the relevant modules
import shutil
import httpstan.cache
import httpstan.models
# Get the name under which your model is saved in the cache
model_name = httpstan.models.calculate_model_name(weighted_stan_representation)
# Then get the path to the corresponding folder
model_path_in_cache = httpstan.cache.model_directory(model_name)
# Finally, delete the folder and all the files it contains with shutil
shutil.rmtree(model_path_in_cache)
See the httpstan.cache module for more cache-related functions.
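The steps above could be wrapped in a small helper that does not fail when the folder is already gone. This is only a sketch: remove_directory_if_present and clear_cached_model are hypothetical names, and the httpstan calls are the ones used in this answer, imported lazily so the generic part runs without httpstan installed.

```python
import os
import shutil
import tempfile


def remove_directory_if_present(path) -> bool:
    """Delete `path` recursively; return True if a directory was removed."""
    if os.path.isdir(path):
        shutil.rmtree(path)
        return True
    return False


def clear_cached_model(program_code: str) -> bool:
    """Remove the cache folder of one Stan program (requires httpstan)."""
    # Lazy imports: only needed when this function is actually called.
    import httpstan.cache
    import httpstan.models

    model_name = httpstan.models.calculate_model_name(program_code)
    return remove_directory_if_present(httpstan.cache.model_directory(model_name))


# Demonstration of the generic part on a throwaway directory.
scratch = tempfile.mkdtemp()
with open(os.path.join(scratch, "dummy.txt"), "w") as fh:
    fh.write("placeholder")

removed_first = remove_directory_if_present(scratch)  # folder existed
removed_again = remove_directory_if_present(scratch)  # nothing left to delete
```

In the question's setting, one would call clear_cached_model(weighted_stan_representation) before the second stan.build, assuming the model was cached under that exact program text.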
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Luc M |