Category "pandas"

How do you compare columns 'a' and 'b' to return 'c' or 'd'?

I am trying to compare two columns and then return a third value from one of the two adjacent columns. I have read that using iterrows is not the correct way to

Issue with pandas_ta adx indicator

When I run this code I get the error below, apparently because the close value is missing. df['ADX'] = ta.adx(df['High'], df['Low'],length = 14) df output: TypeError

pd.read_csv - dates in pandas multiindex column names

I import a csv file into a pandas dataframe. df=pd.read_csv('data.csv',index_col=[0],header=[0,1]) My data has a column multiindex with two levels. Level(0) co

Verify that a column name is a unique identifier

I have a dataset called df_authors and in that dataset I have a column called author. I have to verify that df_authors.author is a unique identifier. What I tri
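One possible way to run such a check is sketched below. The sample `df_authors` contents and the helper name `is_unique_identifier` are assumptions made for illustration, since the original data and attempt are not shown. The sketch treats a column as a unique identifier when it has no missing values and no repeated values.

```python
import pandas as pd

# Hypothetical sample data; the real df_authors is not shown in the question.
df_authors = pd.DataFrame({"author": ["Ann", "Bob", "Cy"], "books": [3, 1, 2]})


def is_unique_identifier(df, column):
    """Return True if `column` has no missing values and no repeated values."""
    col = df[column]
    # Series.is_unique alone would accept a single NaN, so check for gaps too.
    return bool(col.notna().all() and col.is_unique)


author_is_key = is_unique_identifier(df_authors, "author")

# To inspect offending rows when the check fails (empty here):
repeated = df_authors[df_authors["author"].duplicated(keep=False)]
```

If the check fails, `repeated` lists every row whose `author` value occurs more than once.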

Clustering between two sets of data points - Python

I'm hoping to use k-means clustering to plot and return the position of each cluster's centroid. The following groups two sets of xy scatter points into 6 clust

Sklearn Pipeline with KernelExplainer and data to predict as DataFrame leads to error

I want to calculate shap values from a sklearn pipeline with a preprocessor and a model. When I do it with the code below I get 0 for all shap_values def creat

Remove rows in dataframe which are not all 1 or all 0

I need to retain the rows in the dataframe whose values are all 0 or all 1. a = np.repeat(0,10) b = np.repeat(1,10) ab = pd.DataFrame({'col1':a,'col2':b}).tr
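A minimal sketch of one approach, using a small made-up frame rather than the truncated one above (the column names `x`, `y`, `z` are hypothetical): build one boolean mask for rows that are entirely 0, another for rows that are entirely 1, and keep rows matching either.

```python
import pandas as pd

# Hypothetical sample frame; rows 0 and 1 are uniform, rows 2 and 3 are mixed.
df = pd.DataFrame(
    [[0, 0, 0], [1, 1, 1], [0, 1, 0], [1, 0, 1]],
    columns=["x", "y", "z"],
)

all_zero = df.eq(0).all(axis=1)  # True where every value in the row is 0
all_one = df.eq(1).all(axis=1)   # True where every value in the row is 1

# Keep only the rows that are uniformly 0 or uniformly 1.
kept = df[all_zero | all_one]
```

Comparing against 0 and 1 explicitly, rather than checking that a row has a single distinct value, also drops rows that are uniform but hold some other value.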

Sort column values based on floats inside a string, then concat

I'm working on a pretty messy DF. Looking like this, but with 30 columns: a b some text (other text) : 56.3% (text again: 40%) again text (not same text) : 33%

How to save my first dataframe value with Pandas?

I just don't get it. I'm trying to save two different values (to different positions) to an Excel file, but the first one gets overwritten every time. Why? @classme

How do I get a conditional total in pandas dataframe

I have a 32000 row 20 column dataframe consisting of data around many securities. Eg of target columns is as follows: The output that I want is like this: Eff

Using a variable within str.contains()

Pretty much the title. Is there any way to use a variable to filter in str.contains()? I have been unsuccessful in using a str+@variable
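As a hedged illustration: `Series.str.contains` takes its pattern as an ordinary argument, so a Python variable can be passed directly. The series `names` and the variable `needle` below are made up for the example. The `@variable` syntax belongs to `DataFrame.query`, not to `str.contains`.

```python
import re

import pandas as pd

# Hypothetical data and search term.
names = pd.Series(["apple pie", "banana", "pineapple", None])
needle = "apple"

# regex=False treats the variable as a literal substring;
# na=False makes missing values count as "no match".
mask = names.str.contains(needle, regex=False, na=False)
matches = names[mask]

# If the variable is used as part of a regular expression instead,
# re.escape keeps characters such as "." or "(" literal.
pattern = "^" + re.escape(needle)
starts_with = names[names.str.contains(pattern, na=False)]
```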

JSON input to multiple excel file outputs

I have a JSON file that looks like this: { "Person A": { "Company A": { "Doctor": { "Morning": "2000", "Afternoon": "1200" },

String column conversion to float in Pandas DataFrame

I want to get the left (LD) value of the pipe-separated values in the DataFrame column 'CA Distance Nominal (LD | au)'. Here is the code. When I convert the string to flo

What is the method used by Pandas profiling tool to identify duplicate rows?

I'm looking for the rationale behind the method used by the pandas profiling tool to identify duplicate rows (in a dataframe with multiple columns). I couldn't find

How to completely reorganise a table using aggregate data from qualitative information

I have a pandas dataframe which has the following layout: Column data type 'Water-Binder' float 'Fly Ash' float 'Age' int 'Strength %' float The age column i

How do I find all the polygons of a GeoDataframe that contain any point of another GeoDataframe in GeoPandas?

I have a GeoDataframe of about 3200 polygons, and another GeoDataframe of about 26,000 points. I want to get a third GeoDataframe of only the polygons that cont

Pandas: Create new column based on text values of other columns

My dataframe looks like this: id text labels 0 447 glutamine synthetase [protein] 1 447 GS

How to pivot a dataframe to a wide format?

Suppose I have a pandas DataFrame like this: import pandas as pd data = pd.DataFrame({'header': ['age', 'height', 'weight', 'country', 'age', 'height', 'weight

Create a Tensorflow Dataset from a Pandas data frame with numerous labels?

I am trying to load a pandas dataframe into a tensor Dataset. The columns are text[string] and labels[a list in string format] A row would look something like: