Category "pandas"

How to duplicate each row with only one column different from the previous row in a pandas data frame?

I have a large dataset and I want to duplicate each row just below the original one, changing just one column value. I want to copy the previous row value in pl

Degeneracy given a graph

An exercise requires determining the degeneracy level of a graph. To do that, I found the following code useful (source: https://www.geeksforgeeks.org/f

Getting a ValueError: how to use string data type in model.fit in Jupyter using DecisionTreeClassifier?

This is the code: import pandas as pd from sklearn.tree import DecisionTreeClassifier dataset = pd.read_csv("emotion.csv") X = dataset.drop(columns = ["mood"]) y

Upgrade from pandas 1.1.5 to the latest version

I am simply not able to upgrade pandas. I tried the following: python --version Python 3.6.8 pip3 install --upgrade pandas Defaulting to user installation because normal site-p

Calculate the difference in days between two date fields

I have a problem. I have two date fields fromDate and toDate. The toDate also contains a timestamp, e.g. 2021-03-22T18:59:59Z. The problem is that I want to cal
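A minimal sketch of one way to approach this, under assumptions: the excerpt shows only the toDate format, so the fromDate values and the `days` column name below are hypothetical. Both columns are parsed as UTC timestamps, the time of day is dropped, and the dates are subtracted.

```python
import pandas as pd

# Hypothetical sample data; only the toDate format comes from the question.
df = pd.DataFrame({
    "fromDate": ["2021-03-20", "2021-03-01"],
    "toDate": ["2021-03-22T18:59:59Z", "2021-03-01T23:59:59Z"],
})

# Parse both columns as timezone-aware UTC timestamps.
from_dt = pd.to_datetime(df["fromDate"], utc=True)
to_dt = pd.to_datetime(df["toDate"], utc=True)

# normalize() resets the time to midnight, so only whole calendar days count.
df["days"] = (to_dt.dt.normalize() - from_dt.dt.normalize()).dt.days
print(df)
```

Dropping the time component first is a design choice: without it, a partial day would simply be truncated by `.dt.days`.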

ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject

Importing from pyxdameraulevenshtein gives the following error, I have pyxdameraulevenshtein==1.5.3, pandas==1.1.4 and scikit-learn==0.20.2. Numpy is 1.16.1.

'CRF' object has no attribute 'keep_tempfiles'

I have imported: from itertools import chain import nltk import sklearn import scipy.stats import sklearn_crfsuite from sklearn_crfsuite import scorers,CR

Minimal decimal number in pandas dataframe

I'm trying to make a dataframe in pandas where all columns have at least 6 decimals. I've tried splitting the decimal numbers on their . and looking at the decimal

How to get the ISO week from a DatetimeIndex

I have a simple pandas dataframe with a date as index: import pandas as pd data = {'date': ['2010-01-04','2014-03-15','2017-07-15','2019-12-28','2005-01-03'],
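A possible sketch, assuming pandas 1.1 or later, where `DatetimeIndex.isocalendar()` is available. Only the dates visible in the excerpt are used; the rest of the original frame is not shown, and the `isoweek` column name is an assumption.

```python
import pandas as pd

# Dates copied from the excerpt; the remaining columns are not shown there.
dates = ['2010-01-04', '2014-03-15', '2017-07-15', '2019-12-28', '2005-01-03']
df = pd.DataFrame(index=pd.DatetimeIndex(dates, name='date'))

# isocalendar() returns a frame with 'year', 'week' and 'day' columns,
# indexed by the original DatetimeIndex.
iso = df.index.isocalendar()
df['isoweek'] = iso['week'].astype(int)
print(df)
```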

Select previous row every hour in pandas

I am trying to obtain the closest previous data point every hour in a pandas data frame. For example: time value 0 14:59:58 15 1 15:00:10 2

How to match DatetimeIndex for all but the year?

I have a dataset with missing values and a DatetimeIndex. I would like to fill these values with the mean of other values reported on the same month, day

pandas equivalent to mutate across

I would like to perform the following operation in Pandas: library(tidyverse) df <- tibble(mtcars) df %>% select(ends_with('t')) %>% head(3) # A

Pyinstaller - app without needed library on macOS

I've prepared a Python script (using PyCharm on both OSes, projects with venv, pyinstaller command run in the PyCharm terminal) which begins with 'import pandas' and wa

importing data from csv - could not convert string to float

I am having difficulties importing some data from a csv file. Input from csv file (extract): Speed;A [rpm];[N.m] 700;-72,556 800;-58,9103 900;-73,1678
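A hedged sketch of the usual cause: the file uses `;` as the field separator and `,` as the decimal mark, which `pd.read_csv` can be told about via `sep` and `decimal`. The header names below are hypothetical, since the header in the excerpt is only partly legible; the numeric rows follow the excerpt.

```python
import io
import pandas as pd

# Hypothetical two-column sample; header names are made up for illustration.
raw = "speed_rpm;torque_Nm\n700;-72,556\n800;-58,9103\n900;-73,1678\n"

# sep=";" splits the fields, decimal="," parses "-72,556" as -72.556.
df = pd.read_csv(io.StringIO(raw), sep=";", decimal=",")
print(df.dtypes)
```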

convert float64 (from excel import) to str using pandas

Although the same question has been asked multiple times, I don't seem to be able to make it work. I use Python 3.8 and I read an Excel file like this: df = pd.read_excel(r"

Filter Pandas Dataframe based on List of substrings

I have a Pandas DataFrame containing multiple columns of strings. I would now like to check a certain column against a list of allowed substrings and then get a new su
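One way this could be sketched, assuming the goal is to keep rows whose column contains any of the allowed substrings. The data, the `name` column and the `allowed` list are hypothetical.

```python
import re
import pandas as pd

# Hypothetical data and column name.
df = pd.DataFrame({"name": ["apple pie", "banana split", "cherry tart", None]})
allowed = ["apple", "cherry", "a+b"]

# Escape each substring so characters like "+" are matched literally,
# then join them into a single alternation pattern.
pattern = "|".join(re.escape(s) for s in allowed)

# na=False treats missing values as non-matching instead of propagating NaN.
subset = df[df["name"].str.contains(pattern, na=False)]
print(subset)
```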

How to extract the query result from a Hive job output logs using DataprocHiveOperator?

I am trying to build a data migration pipeline using Airflow, source being a Hive table on a Dataproc cluster and the destination is BigQuery. I'm using Datapro

Deleting multiple rows under same App Name but with different number of reviews

I have a dataframe having many columns, 2 of them being 'App' and 'Reviews'. I discovered that for the same app there are multiple rows because they differ in t