Category "scikit-learn"

Reading ARFF from ZIP with zipfile and scipy.io.arff

I want to process quite big ARFF files in scikit-learn. The files are in a zip archive and I do not want to unpack the archive to a folder before processing. He

Create Bayesian Network and learn parameters with Python3.x [closed]

I'm searching for the most appropriate tool for python3.x on Windows to create a Bayesian Network, learn its parameters from data and perform

UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 in labels with no predicted samples

I'm getting this weird error: classification.py:1113: UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 in labels with no predicted samples.

Classification metrics can't handle a mix of binary and continuous targets [duplicate]

I try to train and test several scikit-learn models and attempt to print off the accuracy. Only some of these models work, others fail with th

CatBoost precision imbalanced classes

I use a CatBoostClassifier and my classes are highly imbalanced. I applied a scale_pos_weight parameter to account for that. While training with an evaluation d

Create ngrams only for words on the same line (disregarding line breaks) with Scikit-learn CountVectorizer

When using the scikit-learn library in Python, I can use the CountVectorizer to create ngrams of a desired length (e.g. 2 words) like so: from sklearn.metrics.

Get the coefficients of my sklearn polynomial regression model in Python

I want to get the coefficients of my sklearn polynomial regression model in Python so I can write the equation elsewhere, i.e. ax1^2 + ax + bx2^2 + bx2 + c I'

How can I implement SMOTE inside a ColumnTransformer?

I'm trying to implement SMOTENC inside a column transformer. However, I'm getting an error. The code and the error are provided below. #Create a mask for categorical

K-means clustering labels in Python

I have a dataset with 7 labels in the target variable. X = data.drop('target', axis=1) Y = data['target'] Y.unique() array(['Normal_Weight', 'Overweight_Level_

Proper inputs for Scikit Learn roc_auc_score and ROC Plot

I am trying to determine roc_auc_score for a fit model on a validation set. I am seeing some conflicting information on function inputs. Documentation says: "y_

How to apply cross_val_score to cross-validate our own model

Usually, we apply cross_val_score to sklearn models in the following way: scores = cross_val_score(clf, X, y, cv=5, scoring='f1_macro') Now I have my

Get U, Sigma, V* matrix from Truncated SVD in scikit-learn

I am using truncated SVD from the scikit-learn package. In the definition of SVD, an original matrix A is approximated as a product A ≈ UΣV* where

Getting ValueError: y contains new labels when using scikit learn's LabelEncoder

I have a series like: df['ID'] = ['ABC123', 'IDF345', ...] I'm using scikit's LabelEncoder to convert it to numerical values to be fed into the RandomForestC

Scikit-Learn: How to retrieve prediction probabilities for a KFold CV?

I have a dataset that consists of images and associated descriptions. I've split these into two separate datasets with their own classifiers (visual and textual

sklearn train test split by year

I have a dataset that goes from 2016 to 2020 with a 'Year' column. I would like to use 2016-2017 as train data and 2018-2020 as test data. Is there any easy met

What are the arguments for scipy.stats.uniform?

I'm trying to create a uniform distribution between two numbers (lower bound and upper bound) in order to feed it to sklearn's ParameterSampler. I am using scip

Sklearn-GMM on large datasets

I have a large dataset (I can't fit the entire dataset in memory). I want to fit a GMM on this dataset. Can I use GMM.fit() (sklearn.mixture.GMM) repeatedly on min

pandas create Cross-Validation based on specific columns

I have a dataframe of a few hundred rows that can be grouped by ids as follows: df = Val1 Val2 Val3 Id 2 2 8 b 1 2 3 a 5

What is the difference between SVC and SVM in scikit-learn?

According to the documentation, scikit-learn implements SVC, NuSVC and LinearSVC, which are classes capable of performing multi-class classification on a dataset. By the

"Input contains NaN, infinity or a value too large for dtype('float64')"

I am trying to train a model, but I am getting this error: Input contains NaN, infinity or a value too large for dtype('float64'). Here's part of my code, how