sklearn lda gridsearchcv with pipeline
from sklearn.pipeline import Pipeline
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

pipe = Pipeline([('reduce_dim', LinearDiscriminantAnalysis()), ('classify', LogisticRegression(solver='liblinear'))])  # liblinear supports both 'l1' and 'l2' penalties
param_grid = [{'classify__penalty': ['l1', 'l2'],
               'classify__C': [0.05, 0.1, 0.3, 0.6, 0.8, 1.0]}]
gs = GridSearchCV(pipe, param_grid=param_grid, cv=5, scoring='roc_auc', n_jobs=3)
gs.fit(data, label)
I have a question about using Pipeline and GridSearchCV together. I first use LDA to reduce the dimensionality, and I want to know the order of operations when GridSearchCV is combined with a Pipeline: is it split train/test -> LDA -> fit & predict, or LDA -> split train/test -> fit & predict?
Solution 1:[1]
Part 1
First of all, the Pipeline defines the steps that you are going to perform. In your case, you first apply LinearDiscriminantAnalysis and then LogisticRegression.
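For intuition, a rough hand-rolled equivalent of what pipe.fit followed by pipe.predict does might look like the sketch below; the synthetic data, the train/test split, and all variable names here are only for illustration and are not part of the question:

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression

# synthetic data and a single split, purely for illustration
X, y = make_classification(n_samples=200, n_features=10, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# roughly what pipe.fit(X_train, y_train) does
lda = LinearDiscriminantAnalysis().fit(X_train, y_train)                            # fit the reducer on training data only
clf = LogisticRegression(solver='liblinear').fit(lda.transform(X_train), y_train)   # the classifier sees only the reduced features

# roughly what pipe.predict(X_test) does
preds = clf.predict(lda.transform(X_test))                                          # test data is only transformed, never refitted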
Part 2
In
gs = GridSearchCV(pipe, param_grid=param_grid, cv=5, scoring='roc_auc', n_jobs=3)
you have set cross-validation (cv) to 5. This number defines the number of folds ((Stratified)KFold), so your data is automatically split into 5 train/test folds, and for every fold the analysis that the Pipeline defines is carried out: the pipeline is fitted on the training fold and evaluated on the test fold.
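A rough sketch of this fold-wise procedure for a single parameter combination could look like the following, assuming data and label are NumPy arrays and pipe is the pipeline defined in the question; the real GridSearchCV implementation differs in details such as parallelism and the final refit:

import numpy as np
from sklearn.base import clone
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import roc_auc_score

skf = StratifiedKFold(n_splits=5)
fold_scores = []
for train_idx, test_idx in skf.split(data, label):                  # 1. split into a train fold and a test fold first
    fold_pipe = clone(pipe).set_params(classify__penalty='l2', classify__C=0.1)
    fold_pipe.fit(data[train_idx], label[train_idx])                # 2. LDA and LogisticRegression are fitted on the train fold only
    probs = fold_pipe.predict_proba(data[test_idx])[:, 1]
    fold_scores.append(roc_auc_score(label[test_idx], probs))       # 3. ROC AUC is computed on the held-out fold
print(np.mean(fold_scores))                                         # mean score for this parameter combination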
Bottom line: the first case (split train/test -> LDA -> fit & predict) is what GridSearchCV with a Pipeline does, and it is the methodologically sound order: LDA is fitted only on the training part of each split, so no information from the test fold leaks into the dimensionality reduction.
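Once gs.fit(data, label) has run, the outcome of this fold-wise search can be inspected through the standard GridSearchCV attributes, for example:

print(gs.best_params_)            # parameter combination with the highest mean ROC AUC across the 5 folds
print(gs.best_score_)             # the corresponding mean ROC AUC
best_model = gs.best_estimator_   # the pipeline (LDA + LogisticRegression) refitted on all of data, label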
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | seralouk |