Category "pandas"

Filter dataframe with multiple conditions including OR

I wrote a little script that loops through constraints to filter a dataframe. Example and follow up explaining the issue are below. constraints = [['stand','=='

How to read a .log file in Python

Can you please help me with code that I can use to read a .log file and then split the '-'-separated values into different columns? The content in the file is: Config

unable to parse using pd.json_normalize, it throws null with index values

Sample of my data: ID target 1 {"abc":"xyz"} 2 {"abc":"adf"} This data was a CSV output that I imported in Python as below: data=pd.read_csv('location',convert

How to generate a map with clusters in Python

I have this dataframe below and I would like to know how I can make a graph similar to the one I inserted in the attachment. Can you help with some material or

Drop duplicate IDs, keeping the row if value = certain value, otherwise keep first duplicate

>>> df = pd.DataFrame({'id': ['1', '1', '2', '2', '3', '4', '4', '5', '5'], ... 'value': ['keep', 'y', 'x', 'keep', 'x', 'Keep', 'x'
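One possible approach to the rule in the title, shown as a minimal sketch rather than a definitive answer. The sample frame below is hypothetical (the question's own data is truncated above), and the sketch assumes the "certain value" is the exact, case-sensitive string `"keep"`. The helper column `_priority` is a name introduced only for illustration.

```python
import pandas as pd

# Hypothetical sample data; not the frame from the question.
df = pd.DataFrame({
    "id": ["1", "1", "2", "2", "3"],
    "value": ["keep", "y", "x", "keep", "x"],
})

# False for rows whose value is exactly "keep", True otherwise,
# so that "keep" rows sort to the front.
priority = ~df["value"].eq("keep")

result = (
    df.assign(_priority=priority)
      # A stable sort preserves the original order within each priority level.
      .sort_values("_priority", kind="stable")
      # The first row seen per id is now the "keep" row if one exists,
      # otherwise the first duplicate in the original order.
      .drop_duplicates("id", keep="first")
      .drop(columns="_priority")
      # Restore the original row order.
      .sort_index()
)

print(result)
```

If the match should ignore case (the sample in the question also contains `'Keep'`), the comparison could be made on `df["value"].str.lower()` instead; whether that is wanted is not clear from the excerpt.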

How to align text inside a cell in pandas

I have a cell containing 2 characters and sometimes 3. I need to format the cell like: <2spaces>XX<2spaces> and if it contains 3 characters: <2s

Calculate MAPE and apply to PySpark grouped Dataframe [@pandas_udf]

Goal: Calculate mean_absolute_percentage_error (MAPE) for each unique ID. y - real value yhat - predicted value Sample PySpark Dataframe: join_df +----------+--

ValueError: X has 19 features, but LinearRegression is expecting 20 features as input

I'm trying to do polynomial regression using this code here: x_train,x_test,y_train,y_test = train_test_split(self.X, self.y, test_size=split, random_state=rand

Matplotlib plots in the wrong data format even though it is a datetime object [duplicate]

I have a problem when trying to plot a timeseries with matplotlib: df = pd.read_csv('myfile.dat', skiprows=1) #Change data type to datetime d

Batch conversion of xlsx files to txt in Python

I am trying to convert files with the extension xlsx to txt files. All items have the same name and are marked with a number. The problem is that there are no n

How to deal with SettingWithCopyWarning in Pandas

Background I just upgraded my Pandas from 0.11 to 0.13.0rc1. Now, the application is popping out many new warnings. One of them like this: E:\FinReporter\FM_EXT

pandas.core.indexing.IndexingError: Too many indexers in scikit-learn agglomerative clustering

I have this data set: col_index Sample FID SNP1 SNP2 SNP3 SNP4 SNP5 LiverCysts ESRD_Aug2020 Renal_Survival_Aug2020 Group 1 23 0 1

how to fill a row in a subcolumn inside a multi column dataframe?

I have a multicolumn dataframe called full_week in which the first column is the employees' names and the other columns are columns with each weekday name starting f

How to use ODBC connection for pyspark.pandas

In my following Python code I can successfully connect to MS Azure SQL DB using an ODBC connection, and can load data into an Azure SQL table using pandas' datafra

combine two rows with negligible threshold on a groupby dataframe

I have a raw dataframe (simplified) as below: ColumnA startime endtime A 2022-02-23 08:22:32.113000+00:00 2022-02-23 10:54:04.163000+00:00 A 2022-02-23 10:54:04

calculate day of the year from 15minute timeseries data

I want a column with day of year. How do I calculate day of the year with 15-minute interval data which are resampled to daily entries. The following code produ
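A minimal sketch of one way to get a day-of-year column after daily resampling, assuming the data has a `DatetimeIndex`. The frame, the column name `value`, and the use of `mean()` as the daily aggregation are all hypothetical; the question's own code is truncated above.

```python
import pandas as pd

# Hypothetical 15-minute series covering two full days (2 * 96 samples).
idx = pd.date_range("2022-01-01", periods=192, freq="15min")
df = pd.DataFrame({"value": range(192)}, index=idx)

# Resample to one row per calendar day (mean is an assumed aggregation).
daily = df.resample("D").mean()

# DatetimeIndex.dayofyear gives the ordinal day within the year (1-based).
daily["day_of_year"] = daily.index.dayofyear

print(daily)
```

If the timestamps live in a regular column rather than the index, the same attribute is available through the `.dt` accessor, for example `df["timestamp"].dt.dayofyear` (the column name here is again an assumption).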

Convert text file into dataframe with custom multiple delimiter in python

I am new to Python. I have one txt file. It contains some data like 0: 480x640 2 persons, 1 cat, 1 clock, 1: 480x640 2 persons, 1 chair, Done. date (0.635s) Tue

Apply function to multiple rows in pandas

Suppose I have a dataframe like this 0 5 10 15 20 25 ... action_0_Q0 0.299098 0.093973 0.761735 0.0

How to get this single column data into data frame with appropriate columns

I am learning pandas and data science and am a beginner. I have data as follows: Rahul 1 2 5 Suresh 4 2 1 Dharm 1 3 4 I would like it in my dataframe as Rah

How can I group the table below by Customer ID and Product Code and get them into one row?

How can I group the table below by Customer ID and Product Code and combine them into one row, as shown below, using Python? Customer ID Product Code Days since the last