How to force PyStan to recompile a Stan model?

I have a weighted Bayesian logistic regression model:

weighted_stan_representation = """
data {
  int<lower=0> n; // number of observations
  int<lower=0> d; // number of predictors
  array[n] int<lower=0,upper=1> y; // outputs
  matrix[n,d] x; // inputs
  vector<lower=0>[n] w; // coreset weights
}
parameters {
  vector[d] theta; // auxiliary parameter
}
model {
  theta ~ normal(0, 1);
  target += w*bernoulli_logit_lpmf(y| x*theta);
  
  
}
"""

with data like this:

{'x': array([[-1.92220908, -0.86248914],
        [-0.64517094,  0.40222473],
        [-0.71675321, -1.2782317 ],
        ...,
        [-2.0448459 , -0.11735602],
        [-0.9622542 , -2.27172399],
        [-1.09545494, -0.83435958]]),
 'y': array([0, 0, 0, ..., 0, 0, 0]),
 'w': array([1., 1., 1., ..., 1., 1., 1.]),
 'd': 2,
 'n': 10000}
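
A side note on the model block above, offered as a sketch rather than a diagnosis: in Stan, a vectorized bernoulli_logit_lpmf call returns a single summed log-probability, and target += applied to a vector adds the sum of its elements. Under that reading, w*bernoulli_logit_lpmf(y| x*theta) contributes sum(w) times the total log-likelihood, so only the sum of the weights matters. The pure-Python comparison below illustrates this. The function names and toy values are made up for illustration, and eta stands in for x*theta.

```python
import math


def log_bernoulli_logit(y, eta):
    """Log-probability of y in {0, 1} when the success probability is sigmoid(eta)."""
    # log(sigmoid(eta)) = -log1p(exp(-eta)); log(1 - sigmoid(eta)) = -log1p(exp(eta))
    return -math.log1p(math.exp(-eta)) if y == 1 else -math.log1p(math.exp(eta))


def target_as_written(w, y, eta):
    """Mimic `target += w * bernoulli_logit_lpmf(y | eta)`: vector w times one scalar."""
    total = sum(log_bernoulli_logit(yi, ei) for yi, ei in zip(y, eta))
    return sum(wi * total for wi in w)


def target_per_observation(w, y, eta):
    """Weight each observation's log-likelihood term separately."""
    return sum(wi * log_bernoulli_logit(yi, ei) for wi, yi, ei in zip(w, y, eta))


# Toy values (hypothetical): two weight vectors with the same sum.
y = [0, 1, 0]
eta = [0.5, -1.0, 2.0]
w_a = [3.0, 0.0, 0.0]
w_b = [0.0, 0.0, 3.0]

as_written_a = target_as_written(w_a, y, eta)
as_written_b = target_as_written(w_b, y, eta)
per_obs_a = target_per_observation(w_a, y, eta)
per_obs_b = target_per_observation(w_b, y, eta)

print(as_written_a == as_written_b)  # same sum of weights, same contribution
print(per_obs_a, per_obs_b)          # differ, because different observations are kept
```

If per-observation weighting is what is intended, the weighted terms would have to be accumulated observation by observation, for example in a loop over the n observations.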

I can draw samples from the full posterior (i.e. with all weights equal to 1) by running

posterior = stan.build(model.weighted_stan_representation, data = full_data, random_seed = 100000)
fit = posterior.sample(num_chains = num_chains, num_samples = num_samples, num_warmup = num_warmup)

I then want to use a sparse weight vector and sample from the approximate sparse posterior using

coreset_posterior = stan.build(model.weighted_stan_representation, data = sparse_data)
coreset_samples = coreset_posterior.sample(num_chains = num_chains, num_samples = num_samples, num_warmup = num_warmup)
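
Before blaming the cache, one thing worth ruling out is that full_data and sparse_data end up sharing the same weights (for instance, if the weight array was modified in place). The sketch below shows one way to derive the sparse dictionary without touching the original. It assumes the data layout shown above, uses plain lists for brevity, and with_weights is a hypothetical helper name, not part of PyStan.

```python
def with_weights(data, weights):
    """Return a shallow copy of a Stan data dict with the 'w' entry replaced."""
    if len(weights) != data["n"]:
        raise ValueError(f"expected {data['n']} weights, got {len(weights)}")
    new_data = dict(data)      # copy the dict so the original keeps its own 'w'
    new_data["w"] = weights    # only the weights change; x, y, n, d are shared
    return new_data


# Toy stand-in for the real data (hypothetical values).
full_data = {
    "n": 3,
    "d": 1,
    "x": [[0.1], [0.2], [0.3]],
    "y": [0, 1, 0],
    "w": [1.0, 1.0, 1.0],
}
sparse_data = with_weights(full_data, [0.0, 3.0, 0.0])

print(full_data["w"])    # [1.0, 1.0, 1.0]
print(sparse_data["w"])  # [0.0, 3.0, 0.0]
```

Printing both weight vectors just before each stan.build call is a cheap way to confirm the two runs really receive different data.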

However, when I access the samples, they are exactly identical in the two cases. I suspect the model is being cached when stan.build is first called, so no new samples are ever actually drawn. I think so because I get this output

Building: found in cache, done.

when I build the second model. This is the first time I've used PyStan, and I don't know how to get around this. As far as I can tell, there is no option to force PyStan to recompile.

Any help would be appreciated!

I've got the latest version of Python and PyStan installed.



Solution 1:[1]

There might be more elegant ways to do this, but you can delete the cache folder in which your model is saved. After that, you should be able to rebuild your model. You can use httpstan.models.calculate_model_name to get the model's name in the cache; it takes the Stan program code as its argument (weighted_stan_representation in your case). You can also get the list of all model names stored in the cache with httpstan.cache.list_model_names().

Assuming you only want to delete the cache of your model weighted_stan_representation, here is what to do:

Be careful before copy-pasting this code: it will delete a non-empty folder!

# Import the relevant modules
import shutil
import httpstan.cache
import httpstan.models

# Get the name under which the model is stored in the cache
model_name = httpstan.models.calculate_model_name(weighted_stan_representation)

# Then get the path to the model's cache folder
model_path_in_cache = httpstan.cache.model_directory(model_name)

# Finally, delete the folder and all the files it contains with shutil
shutil.rmtree(model_path_in_cache)
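
Because rmtree is unforgiving, it can help to wrap the deletion in a guard that refuses to remove anything outside an expected parent directory. The following is a minimal sketch using only the standard library. remove_dir_inside is a hypothetical helper, and the demonstration runs on a throwaway temporary directory rather than the real cache.

```python
import shutil
import tempfile
from pathlib import Path


def remove_dir_inside(root, target):
    """Delete `target` only if it is a directory strictly inside `root`.

    Returns True if something was deleted, False if `target` did not exist.
    """
    root = Path(root).resolve()
    target = Path(target).resolve()
    if root not in target.parents:
        raise ValueError(f"{target} is not inside {root}; refusing to delete")
    if not target.is_dir():
        return False
    shutil.rmtree(target)
    return True


# Demonstration on a temporary directory (all names below are placeholders).
demo_root = Path(tempfile.mkdtemp())
demo_model_dir = demo_root / "models" / "abc123"
demo_model_dir.mkdir(parents=True)
(demo_model_dir / "model.bin").write_text("placeholder")

removed = remove_dir_inside(demo_root, demo_model_dir)        # deletes the folder
removed_again = remove_dir_inside(demo_root, demo_model_dir)  # nothing left to delete

shutil.rmtree(demo_root)  # clean up the temporary directory
```

In the setting above, target would be model_path_in_cache and root whichever directory you expect the cache to live in.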

See the httpstan.cache module for more cache-related functions.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution    Source
Solution 1  Luc M