H2O Flow AutoML temporary sample frame
I have a large frame and used H2O Flow to run AutoML with a deep learning algorithm. However, the training metrics are calculated on a “temporary sample frame”. I could not find any information about this. I am not sure whether AutoML was run on the full frame or just this temporary frame. Can someone help me understand this or give me a pointer? BTW, I don’t find this feature convenient.
Solution 1:[1]
This is a special case for Deep Learning models and is not the case for any other models produced by the AutoML process. For efficiency reasons (and since H2O is designed for very large datasets), the training metrics in Deep Learning models are calculated on a subset of the original training frame.
There is a parameter in the H2O Deep Learning algorithm called score_training_samples that defaults to 10,000 rows. Since we do approximate sampling (also for efficiency reasons), it makes sense that the actual subset size is 9,993 rather than exactly 10,000.
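To see why an approximate sample rarely lands on exactly 10,000 rows, here is a minimal sketch. It assumes each row is kept independently with probability target / n_rows, which is one common way to sample approximately and is not confirmed to be H2O's actual implementation. The function name approximate_sample_size and the seed are made up for illustration.

```python
import random


def approximate_sample_size(n_rows, target=10_000, seed=42):
    """Count rows kept when each row is kept independently with
    probability target / n_rows (an assumed sampling scheme, not H2O's code)."""
    rng = random.Random(seed)
    keep_probability = min(1.0, target / n_rows)
    # No exact row count is enforced, so the result only lands near the target.
    return sum(1 for _ in range(n_rows) if rng.random() < keep_probability)


if __name__ == "__main__":
    # For a large frame the count is close to, but usually not exactly, 10,000.
    print(approximate_sample_size(1_000_000))
```

The benefit of this kind of scheme is that it needs only one pass over the data and no coordination to hit an exact count, at the cost of a sample size that varies slightly.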
This should be a good approximation for training error. The only way to change this in Flow would be to train a Deep Learning model manually (outside the AutoML process).
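For anyone using the Python client rather than Flow, training a Deep Learning model manually might look like the sketch below. It assumes the h2o Python package and a reachable H2O cluster. The helper names, the CSV path and the response column are hypothetical. Setting score_training_samples to 0 asks H2O to score on all training rows instead of a sample.

```python
def deep_learning_params(score_all_rows=True):
    """Build keyword arguments for H2ODeepLearningEstimator.

    score_training_samples=0 means "score on every training row";
    10,000 is the default subset size mentioned above.
    """
    return {
        "score_training_samples": 0 if score_all_rows else 10_000,
        "seed": 1,  # illustrative choice, only for reproducibility
    }


def train_deep_learning(csv_path, response_column):
    """Train one Deep Learning model outside AutoML and return its training metrics."""
    # Imported here so the parameter helper above works without an H2O installation.
    import h2o
    from h2o.estimators import H2ODeepLearningEstimator

    h2o.init()
    frame = h2o.import_file(csv_path)
    predictors = [c for c in frame.columns if c != response_column]

    model = H2ODeepLearningEstimator(**deep_learning_params())
    model.train(x=predictors, y=response_column, training_frame=frame)
    # Training metrics, now computed on the full training frame.
    return model.model_performance(train=True)
```

Scoring on the full frame can be noticeably slower on very large data, which is the trade-off the default sample is meant to avoid.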
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Erin LeDell |