Category "pandas"

How to control the color of a specific column in a bar plot depending on its xtick label?

I have a number of plots that show transcribed text from a speech to text engine in which I want to show the bars where the S2T engine transcribed correctly. I

Pandas - Compare each row with one another across dataframe and list the amount of duplicate values

I would like to add a column to an existing dataframe that compares every row in the dataframe against every other row and lists the number of duplicate values. (I do

Filter dataframe with multiple conditions including OR

I wrote a little script that loops through constraints to filter a dataframe. Example and follow up explaining the issue are below. constraints = [['stand','=='
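The constraint list in the excerpt is cut off, so the following is only a minimal sketch of one way to combine AND and OR conditions. The `stand` column name comes from the excerpt; the `height` column, the sample data, and the "list of AND-groups that are OR-ed together" layout are assumptions made for illustration.

```python
import operator
from functools import reduce

import pandas as pd

# Hypothetical sample data; only the "stand" column name appears in the excerpt.
df = pd.DataFrame({"stand": ["A", "B", "C", "A"], "height": [5, 12, 8, 20]})

# Map operator strings to functions that return boolean masks.
OPS = {"==": operator.eq, ">": operator.gt, "<": operator.lt}

# Assumed layout: conditions inside a group are AND-ed, groups are OR-ed.
groups = [
    [("stand", "==", "A"), ("height", ">", 10)],
    [("stand", "==", "C")],
]

def mask_for(frame, conds):
    """AND together the boolean masks of one group of conditions."""
    return reduce(operator.and_, (OPS[op](frame[col], val) for col, op, val in conds))

# OR the group masks, then filter once.
mask = reduce(operator.or_, (mask_for(df, g) for g in groups))
filtered = df[mask]
```

Building one boolean mask and indexing once avoids filtering the dataframe repeatedly inside a loop, which is where OR conditions tend to get lost.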

How to read a .log file in Python

Can you please help me with code that I can use to read a .log file and then split the '-'-separated values into different columns? The content of the file is: Config

Unable to parse using pd.json_normalize, it throws null with index values

Sample of my data: ID target 1 {"abc":"xyz"} 2 {"abc":"adf"} This data was a CSV output that I imported as below in Python: data=pd.read_csv('location',convert
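A possible cause, assuming the `target` column holds JSON strings rather than dicts after the CSV import: `pd.json_normalize` expects dicts (or a list of dicts). The sketch below rebuilds the two sample rows from the excerpt in memory and parses the strings first; the `read_csv` arguments are truncated in the excerpt and are not reproduced here.

```python
import json

import pandas as pd

# The two sample rows from the excerpt, with "target" stored as JSON strings
# (an assumption about what the CSV import produced).
data = pd.DataFrame({"ID": [1, 2], "target": ['{"abc":"xyz"}', '{"abc":"adf"}']})

# Parse each JSON string into a dict before normalizing.
parsed = data["target"].apply(json.loads)

# json_normalize on a list of dicts yields one column per key.
flat = pd.json_normalize(parsed.tolist())

# Put the flattened columns next to the ID column.
result = pd.concat([data[["ID"]], flat], axis=1)
```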

How to generate a map with clusters in Python

I have this dataframe below and I would like to know how I can make a graph similar to the one I inserted in the attachment. Can you help with some material or

Drop duplicate IDs, keeping the row where value = certain value, otherwise keep first duplicate

>>> df = pd.DataFrame({'id': ['1', '1', '2', '2', '3', '4', '4', '5', '5'], ... 'value': ['keep', 'y', 'x', 'keep', 'x', 'Keep', 'x'
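The dataframe in the excerpt is cut off, so this is a minimal sketch on made-up data with the same `id` and `value` column names. It assumes the value to prefer is the exact string `'keep'` (case-sensitive) and uses a temporary helper column, `_prio`, which is a name introduced here for illustration.

```python
import pandas as pd

# Hypothetical data; the original frame is truncated in the excerpt.
df = pd.DataFrame({
    "id": ["1", "1", "2", "2", "3", "4", "4"],
    "value": ["keep", "y", "x", "keep", "x", "x", "y"],
})

# False for rows to prefer, so they sort first.
prio = ~df["value"].eq("keep")

result = (
    df.assign(_prio=prio)
      .sort_values("_prio", kind="stable")   # "keep" rows first, original order otherwise
      .drop_duplicates("id", keep="first")   # first row per id is "keep" if one exists
      .drop(columns="_prio")
      .sort_index()                          # restore the original row order
)
```

The stable sort matters: it guarantees that, for ids without a `'keep'` row, the first duplicate in the original order is the one that survives.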

How to align text inside a cell in pandas

I have a cell containing 2 characters and sometimes 3. I need to format the cell like: <2spaces>XX<2spaces> and, if it contains 3 characters: <2s

Calculate MAPE and apply to PySpark grouped Dataframe [@pandas_udf]

Goal: Calculate mean_absolute_percentage_error (MAPE) for each unique ID. y - real value yhat - predicted value Sample PySpark Dataframe: join_df +----------+--

ValueError: X has 19 features, but LinearRegression is expecting 20 features as input

I'm trying to do polynomial regression using this code here: x_train,x_test,y_train,y_test = train_test_split(self.X, self.y, test_size=split, random_state=rand

Matplotlib plots in the wrong data format even though it is a datetime object [duplicate]

I have a problem when trying to plot a timeseries with matplotlib: df = pd.read_csv('myfile.dat', skiprows=1) #Change data type to datetime d

Batch conversion of xlsx files to txt in Python

I am trying to convert files with the extension xlsx to txt files. All items have the same name and are marked with a number. The problem is that there are no n

How to deal with SettingWithCopyWarning in Pandas

Background: I just upgraded my Pandas from 0.11 to 0.13.0rc1. Now, the application is popping out many new warnings. One of them is like this: E:\FinReporter\FM_EXT
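The warning text is cut off in the excerpt, so the following is only a general sketch of the two usual ways to avoid chained assignment, on a made-up dataframe: assign through a single `.loc` call when the original should change, or take an explicit `.copy()` when an independent subset is wanted.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# Chained indexing such as df[df["a"] > 1]["b"] = 0 may write to a temporary
# copy, which is what the warning is about. A single .loc call writes to df.
df.loc[df["a"] > 1, "b"] = 0

# If an independent subset is wanted, make the copy explicit.
subset = df[df["a"] > 1].copy()
subset["b"] = 99  # changes subset only, df is untouched
```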

pandas.core.indexing.IndexingError: Too many indexers in scikit-learn agglomerative clustering

I have this data set: col_index Sample FID SNP1 SNP2 SNP3 SNP4 SNP5 LiverCysts ESRD_Aug2020 Renal_Survival_Aug2020 Group 1 23 0 1

How to fill a row in a subcolumn inside a multi column dataframe?

I have a multicolumn dataframe called full_week in which the first column is the employees' names and the other columns are columns with each weekday name starting f

How to use ODBC connection for pyspark.pandas

In the following Python code I can successfully connect to an MS Azure SQL DB using an ODBC connection, and can load data into an Azure SQL table using pandas' datafra

Combine two rows with negligible threshold on a groupby dataframe

I have a raw dataframe (simplified) as below: ColumnA startime endtime A 2022-02-23 08:22:32.113000+00:00 2022-02-23 10:54:04.163000+00:00 A 2022-02-23 10:54:04

Calculate day of the year from 15-minute timeseries data

I want a column with day of year. How do I calculate day of the year with 15-minute interval data which are resampled to daily entries? The following code produ
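The asker's code is cut off in the excerpt, so this is a minimal sketch on synthetic data, assuming the dataframe has a `DatetimeIndex`. The `value` column, the date range, and the use of `mean` as the daily aggregation are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Three days of synthetic 15-minute data (96 samples per day).
idx = pd.date_range("2021-01-01", periods=4 * 24 * 3, freq="15min")
df = pd.DataFrame({"value": np.arange(len(idx))}, index=idx)

# Resample to one row per day; mean is an assumed aggregation.
daily = df.resample("D").mean()

# DatetimeIndex exposes the day of the year directly.
daily["day_of_year"] = daily.index.dayofyear
```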

Convert text file into dataframe with multiple custom delimiters in Python

I am new to Python. I have one txt file. It contains some data like 0: 480x640 2 persons, 1 cat, 1 clock, 1: 480x640 2 persons, 1 chair, Done. date (0.635s) Tue

Apply a function to multiple rows in pandas

Suppose I have a dataframe like this 0 5 10 15 20 25 ... action_0_Q0 0.299098 0.093973 0.761735 0.0