Category "scikit-learn"

Looping through each row in array to calculate cosine similarity

I have a subset of a dataframe that looks like: <OUT> PageNumber english_only_tags 175 flower architecture people 162 hair red bobbles

Polynomial Expansion without sklearn

I want to try to recreate this function from scratch (without using sklearn): # The matrix is M which is 1000x10 matrix. from sklearn.preprocessing import Po

Pass information between pipeline steps in sklearn

I am working on a simple text generation problem with LSTMs. To make the preprocessing more compact and reproducible, I decided to implement everything in sklea

Cosine similarity and SVC using scikit-learn

I am trying to use the cosine similarity kernel for text classification with SVM with a raw dataset of 1000 words: # Libraries import numpy as np from sklear

Is this a valid approach to scale your target in machine learning without leaking information? [closed]

Consider a housing price dataset, where the goal is to predict the sale price. I would like to do this by predicting the "Sale price per Squar

TypeError: 'module' object is not iterable in django 4

TypeError: 'module' object is not iterable in django 4 I am getting the above error; it has persisted long enough that at this point I really need help. I am u

XGBoost model quantization - Sklearn model quantization

I am looking for solutions to quantize sklearn models. I am specifically looking for XGBoost models. I did find solutions to quantize pytorch and tensorflow mod

How to slice a XGBClassifier/XGBRegressor model into sub-models?

This document shows that a XGBoost API trained model can be sliced by following code: from sklearn.datasets import make_classification import xgboost as xgb bo

No module name 'sklearn.ensemble.forest'

I am using this code to detect face_spoofing import numpy as np import cv2 import joblib from face_detector import get_face_detector, find_faces def calc_hist(

Replace entire pandas dataframe after scaling without warning

I have tried this according to this answer x = df[feature_collums] y = df[[label_column]][label_column] from sklearn.preprocessing import MinMaxScaler scaler =

How to set AUC as scoring method while searching for hyperparameters?

I want to perform a random search, in a classification problem, where the scoring method will be chosen as AUC instead of accuracy score. Have a look at my code f
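A minimal sketch of the setup this question describes, assuming scikit-learn's `RandomizedSearchCV` and a binary target. The synthetic data, the random forest estimator, and the parameter ranges are illustrative assumptions, not taken from the question's truncated code. Passing `scoring="roc_auc"` makes the search rank candidates by AUC rather than accuracy.

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic binary classification data (an assumption; the question's data is not shown).
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Hypothetical search space, chosen only for illustration.
param_distributions = {
    "n_estimators": randint(10, 50),
    "max_depth": randint(2, 6),
}

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions=param_distributions,
    n_iter=5,
    scoring="roc_auc",  # rank candidates by area under the ROC curve
    cv=3,
    random_state=0,
)
search.fit(X, y)

# Mean cross-validated AUC of the best candidate.
best_auc = search.best_score_
```

For multiclass targets, scikit-learn also accepts scorer names such as `"roc_auc_ovr"`.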

Using StandardScaler for multiple columns

I want to use StandardScaler only on certain columns, however my code resulted in an error. Here is my code: from sklearn.preprocessing import StandardScaler num_c
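One common way to scale only some columns is sketched below, assuming a pandas DataFrame. The column names and the `numeric_columns` list are hypothetical, since the question's own code is truncated. The scaler is fitted on the selected columns and the result is assigned back, leaving the other columns untouched.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical frame: two numeric columns and one column that should not be scaled.
df = pd.DataFrame(
    {
        "age": [20.0, 30.0, 40.0],
        "income": [1000.0, 2000.0, 3000.0],
        "city": ["a", "b", "c"],
    }
)
numeric_columns = ["age", "income"]  # assumed selection of columns to scale

df_scaled = df.copy()
# fit_transform returns a NumPy array with one column per selected input column.
df_scaled[numeric_columns] = StandardScaler().fit_transform(df[numeric_columns])
```

`sklearn.compose.ColumnTransformer` is an alternative when the scaling needs to live inside a pipeline.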

sklearn.model_selection.train_test_split random state

I am training a computer vision model. I divide the images into 3 datasets: training, validation and testing. So that I always get the same images in training, va

Is it possible to average the output of multiple classification models using pipeline in sklearn?

As an example, suppose there is a random forest and a logistic regression model that accept the same input data, and I want the inference result to be the avera
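A hedged sketch of one way to average the two models' outputs, assuming scikit-learn's `VotingClassifier` with `voting="soft"`, which averages the predicted class probabilities of its fitted estimators. The synthetic data and hyperparameters are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression

# Synthetic data (an assumption; the question gives no dataset).
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

ensemble = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=25, random_state=0)),
        ("lr", LogisticRegression(max_iter=1000)),
    ],
    voting="soft",  # average predict_proba outputs instead of majority vote
)
ensemble.fit(X, y)

# Averaged class probabilities from the ensemble.
avg_proba = ensemble.predict_proba(X)

# The same average computed by hand from the fitted sub-models.
manual = np.mean([est.predict_proba(X) for est in ensemble.estimators_], axis=0)
```

Because `VotingClassifier` is itself an estimator, it can be used as the final step of a `Pipeline` after shared preprocessing.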

Issue fitting a SGD Classifier

I'm following the book Hands-On Machine Learning by Aurélien Géron, more specifically, where it begins to go into classifiers. I'm following the code from th

Sklearn Pipeline with KernelExplainer and data to predict as DataFrame leads to error

I want to calculate shap values from a sklearn pipeline with a preprocessor and a model. When I do it with the code below I get 0 for all shape_values def creat

Cannot install Scipy in FreeBSD 13 with Python 3.10

I am trying to install scipy in FreeBSD 13. I have built python 3.10 on FreeBSD 13 and managed to install pandas, matplotlib and numpy on a virtual environment whi

Negative BIC values for GaussianMixture in scikit-learn (sklearn)

In scikit-learn, the GaussianMixture object has the method bic(X) that implements the Bayesian Information Criterion to choose the number of components that bet

Why the sum "value" isn't equal to the number of "samples" in scikit-learn RandomForestClassifier?

I built a random forest with RandomForestClassifier and plotted the decision trees. What does the parameter "value" (pointed to by red arrows) mean? And why the sum of
