Using Scikit's StandardScaler correctly across multiple programs
I have a question very similar to this topic, but I want to reuse a StandardScaler
instead of a LabelEncoder
. Here's what I have done:
# in one program
params = {"mean": scaler.mean_, "var": scaler.var_}
# ... and save params to disk ...
# in another program
# ... load params first ...
new_scaler = StandardScaler()
new_scaler.mean_ = params['mean']  # However, it doesn't work
new_scaler.var_ = params['var']    # Doesn't work either...
I also tried set_params
, but it can only change these parameters: copy
, with_mean
, and with_std
.
So, how can I re-use the scaler I got in program one? Thanks!
Solution 1:[1]
Just pickle the whole thing.
Follow the official docs on model persistence.
You can either use Python's standard pickle from the first link or the specialized joblib pickle mentioned in the second link (which I recommend; it is often more efficient, although that hardly matters for a simple object like a scaler):
import joblib
import sklearn.preprocessing as skp
new_scaler = skp.StandardScaler()
# ... fit it ... do something ...
joblib.dump(new_scaler, 'my_scaler.pkl')      # save to disk
loaded_scaler = joblib.load('my_scaler.pkl')  # load from disk
If you by any chance want to store your sklearn objects in a database like MySQL, MongoDB, or Redis, the file-based example above won't work, of course.
The easy approach then: use Python pickle's dumps, which serializes to a bytes object (ready for most DB wrappers).
For the more efficient joblib, you have to wrap a BytesIO to use it in a similar way (the method itself is file-based, but it accepts file-like objects).
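A minimal sketch of both byte-level approaches, assuming a scaler fitted on illustrative toy data (the variable names here are made up for the example; the bytes objects are what you would hand to your DB wrapper):

```python
import io
import pickle

import joblib
import numpy as np
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler().fit(np.array([[1.0], [2.0], [3.0]]))

# Approach 1: plain pickle straight to a bytes object
blob = pickle.dumps(scaler)
restored = pickle.loads(blob)

# Approach 2: joblib through a file-like BytesIO buffer
buf = io.BytesIO()
joblib.dump(scaler, buf)
buf.seek(0)               # rewind before reading back
restored2 = joblib.load(buf)
raw_bytes = buf.getvalue()  # the bytes to store in the DB
```

Both restored scalers transform exactly like the original, since pickling preserves all fitted attributes (mean_, var_, scale_, etc.).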
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Robin Seerig |