Category "pandas"

Merge two dfs with multiple entries of same value in joining column

I have two data frames. The first is input which looks like the following: Merchant SKU Quantity Per Box NOB Shipment Status id_using_regex prepped_by_in
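
A minimal sketch of one common reading of this problem: a plain merge on a key that repeats in both frames pairs every matching left row with every matching right row. Numbering each occurrence with `groupby().cumcount()` pairs the n-th duplicate on one side with the n-th on the other. The frames and the `SKU`/`Status` columns are hypothetical stand-ins for the question's truncated data.

```python
import pandas as pd

# Hypothetical frames: 'SKU' repeats in both, like the question's join column.
left = pd.DataFrame({"SKU": ["A", "A", "B"], "Quantity": [10, 20, 5]})
right = pd.DataFrame({"SKU": ["A", "A", "B"], "Status": ["shipped", "pending", "shipped"]})

# A plain merge on a duplicated key is many-to-many: 'A' yields 2 x 2 = 4 rows.
naive = left.merge(right, on="SKU")

# Number each occurrence within its key so the n-th 'A' on the left
# only matches the n-th 'A' on the right.
left["occurrence"] = left.groupby("SKU").cumcount()
right["occurrence"] = right.groupby("SKU").cumcount()
paired = left.merge(right, on=["SKU", "occurrence"]).drop(columns="occurrence")
print(paired)
```

Whether order-based pairing is right depends on the data; if the duplicates carry another distinguishing column, merging on that column is the safer choice.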

How to convert the dummy variable columns into several columns?

I know how to unstack rows into columns, but how to deal with the following dataframe? date dummy avg lable 1-19 1 20 l1 1-19 0 40 l1 1-27 1 100 l2 1-27 0 140
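
A sketch using the first three rows from the excerpt (the fourth row is truncated there): pushing `dummy` into the index and unstacking it gives one row per (`date`, `lable`) pair and one `avg` column per dummy value. The `avg_` prefix is an illustrative choice.

```python
import pandas as pd

# First three rows from the question; the fourth row is cut off there.
df = pd.DataFrame({
    "date": ["1-19", "1-19", "1-27"],
    "dummy": [1, 0, 1],
    "avg": [20, 40, 100],
    "lable": ["l1", "l1", "l2"],
})

# Move 'dummy' into the columns: one row per (date, lable), one column per dummy value.
wide = (
    df.set_index(["date", "lable", "dummy"])["avg"]
    .unstack("dummy")
    .add_prefix("avg_")
    .reset_index()
)
wide.columns.name = None  # drop the leftover 'dummy' axis name
print(wide)
```

Missing combinations (here `l2` with dummy 0) come out as NaN.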

How to insert nulls into a SQL Server table

I have the following dataframe: data = [['Alex', 182.2],['Bob', 183.2],['Clarke', 188.4], ['Kelly', np.nan]] df = pd.DataFrame(data, columns = ['Name', 'Height'])
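
A sketch, assuming the missing height is a NaN (the excerpt's bare `NA` is written as `np.nan` here). DB-API drivers write Python `None` as SQL NULL, but NaN is a float, so each missing value is converted to `None` before insertion. An in-memory `sqlite3` database stands in for SQL Server; the table name `people` is hypothetical. With `pyodbc` the `executemany` call and `?` placeholders should look much the same.

```python
import sqlite3

import numpy as np
import pandas as pd

data = [["Alex", 182.2], ["Bob", 183.2], ["Clarke", 188.4], ["Kelly", np.nan]]
df = pd.DataFrame(data, columns=["Name", "Height"])

# NaN is not None, so convert every missing value to None before building rows.
rows = [
    tuple(None if pd.isna(value) else value for value in row)
    for row in df.itertuples(index=False, name=None)
]

# sqlite3 stands in for SQL Server here (hypothetical table 'people').
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (Name TEXT, Height REAL)")
conn.executemany("INSERT INTO people (Name, Height) VALUES (?, ?)", rows)
null_count = conn.execute("SELECT COUNT(*) FROM people WHERE Height IS NULL").fetchone()[0]
print(null_count)
```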

Convert txt file, with variable categories, to dictionary and pandas df

I've converted a txt file that has a fixed number of variables for every entry into a dict and a df. For example, if every entry in the txt file has a Date entry

Iterating through rows in a dataframe

I have a dataframe of 12 different teams with their own statistics. My objective is to repeat an entire series of steps for one team, and so on, until the last
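
Rather than iterating row by row, `groupby` hands over one sub-frame per team, so the same series of steps can run once per team. A sketch with a hypothetical `team`/`points` table; `summarize` is a hypothetical stand-in for the real steps.

```python
import pandas as pd

# Hypothetical stats table; the real frame has 12 teams.
stats = pd.DataFrame({
    "team": ["A", "A", "B", "B", "C"],
    "points": [10, 14, 7, 9, 12],
})

def summarize(team_df: pd.DataFrame) -> pd.Series:
    """Hypothetical stand-in for the series of steps run on one team's rows."""
    return pd.Series({"games": len(team_df), "avg_points": team_df["points"].mean()})

# groupby yields one (team, sub-frame) pair per team.
results = {team: summarize(team_df) for team, team_df in stats.groupby("team")}
summary = pd.DataFrame(results).T  # one row per team
print(summary)
```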

Fill in uneven sized lists in Python

I have a 2D list containing sublists of unequal lengths, like this: lst = [[1,2,3],[-1,2,4],[0,2],[2,-3,6]] I use this code to insert a 0 if a sublist has fewer than 3 elements: newlis
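
A sketch, assuming the zeros belong at the end of each short sublist. The list comprehension pads to the longest length; the pandas variant relies on `DataFrame` filling ragged rows with NaN.

```python
import pandas as pd

lst = [[1, 2, 3], [-1, 2, 4], [0, 2], [2, -3, 6]]
target = max(len(sub) for sub in lst)  # 3 for this data

# Pure Python: append zeros until each sublist reaches the target length.
padded = [sub + [0] * (target - len(sub)) for sub in lst]

# pandas: ragged rows are filled with NaN, which fillna turns into 0.
padded_pd = pd.DataFrame(lst).fillna(0).astype(int).values.tolist()
print(padded)
```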

How to import data with dates as index from excel with pandas

I am importing the data with this command df = pd.read_excel('C:/Users/Me/Data.xlsx', sheet_name='Prices') and this is the result: The date is a common column
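
`read_excel` accepts `index_col` and `parse_dates`, so `pd.read_excel('C:/Users/Me/Data.xlsx', sheet_name='Prices', index_col=0, parse_dates=True)` may do this in one step if the dates are in the first column. The sketch below shows the equivalent after-the-fact fix on an in-memory frame; the `Date`/`Price` columns and values are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for what read_excel returns with dates as a plain column.
df = pd.DataFrame({
    "Date": ["2021-01-04", "2021-01-05", "2021-01-06"],
    "Price": [101.5, 102.0, 100.75],
})

# Convert the column to datetimes, then promote it to the index.
df["Date"] = pd.to_datetime(df["Date"])
df = df.set_index("Date")
print(df.index)
```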

Unnest JSON dict to rows in pandas

I have the following dataset from a json file: mydf = pd.DataFrame({ 'load': { 0: {'id': '100','name': 'Joe'}, 1: {'id': '101','name': 'Ann'}, 2: {'id': '1
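
A sketch using the two complete records from the excerpt: `pd.json_normalize` turns each dict into one row with a column per key, and reusing the original index keeps the rows aligned when joining back.

```python
import pandas as pd

# Only the first two records are complete in the question.
mydf = pd.DataFrame({
    "load": {0: {"id": "100", "name": "Joe"}, 1: {"id": "101", "name": "Ann"}},
})

# One row per dict, one column per key; keep mydf's index for alignment.
flat = pd.json_normalize(mydf["load"].tolist())
flat.index = mydf.index
result = mydf.drop(columns="load").join(flat)
print(result)
```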

Modifying overlapping time period to include 1 day difference

I am trying to modify the overlapping time period problem so that if there is a 1-day difference between dates, it should still be counted as an overlap. As long
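
A sketch of the usual sort-and-sweep approach with a tolerance: after sorting by start, a period joins the current group if it starts no more than one day after the latest end seen so far. The `start`/`end` column names and dates are hypothetical.

```python
import pandas as pd

# Hypothetical periods; 'start' and 'end' are assumed column names.
periods = pd.DataFrame({
    "start": pd.to_datetime(["2021-01-01", "2021-01-06", "2021-01-20"]),
    "end": pd.to_datetime(["2021-01-05", "2021-01-10", "2021-01-25"]),
}).sort_values("start")

tolerance = pd.Timedelta(days=1)

# Latest end among all earlier periods (NaT for the first row).
running_end = periods["end"].cummax().shift()
# A new group starts only when the gap exceeds the tolerance.
new_group = periods["start"] > running_end + tolerance
periods["group"] = new_group.cumsum()

merged = periods.groupby("group").agg(start=("start", "min"), end=("end", "max"))
print(merged)
```

Here 2021-01-05 and 2021-01-06 are one day apart, so the first two periods collapse into one.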

How to locate print output, or convert it into a JPEG?

I'm trying to show more than one dataframe using tkinter. There are 2 options for me: showing the dataframe directly with print() and saving the dataframe as j

How to control the color of a specific column in a bar plot depending on its xtick label?

I have a number of plots that show transcribed text from a speech-to-text engine in which I want to show the bars where the S2T engine transcribed correctly. I

Pandas - Compare each row with one another across dataframe and list the amount of duplicate values

I would like to add a column to an existing dataframe that compares every row in the dataframe against every other row and lists the number of duplicate values. (I do

Filter dataframe with multiple conditions including OR

I wrote a little script that loops through constraints to filter a dataframe. An example and a follow-up explaining the issue are below. constraints = [['stand','=='
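
A sketch, assuming a hypothetical layout where each inner list is a group of `[column, op, value]` triples combined with OR, and the groups are combined with AND. The `operator` module maps the string operators to functions that work element-wise on Series.

```python
import operator

import pandas as pd

OPS = {"==": operator.eq, "!=": operator.ne, "<": operator.lt,
       "<=": operator.le, ">": operator.gt, ">=": operator.ge}

df = pd.DataFrame({"stand": ["L", "R", "L", "R"], "speed": [90, 95, 72, 70]})

# Hypothetical layout: OR within a group, AND across groups.
constraint_groups = [
    [["stand", "==", "L"], ["speed", ">", 92]],  # stand == L OR speed > 92
    [["speed", ">=", 75]],                       # AND speed >= 75
]

mask = pd.Series(True, index=df.index)
for group in constraint_groups:
    group_mask = pd.Series(False, index=df.index)
    for column, op, value in group:
        group_mask |= OPS[op](df[column], value)
    mask &= group_mask

print(df[mask])
```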

How to read a .log file in Python

Can you please help me with code that I can use to read a .log file and then split the '-'-separated values into different columns? The content of the file is: Config
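
Since the question's file content is truncated, this sketch uses hypothetical lines and column names. `pd.read_csv` accepts any separator, so `sep="-"` splits each line into columns; with a real file, pass the path instead of the `StringIO` buffer.

```python
import io

import pandas as pd

# Hypothetical log content (the question's lines are truncated).
log_text = "server01-2048-ok\nserver02-1024-down\n"

# '-' as the separator; no header row in the file, so name the columns here.
df = pd.read_csv(io.StringIO(log_text), sep="-", header=None,
                 names=["host", "memory", "status"])
print(df)
```

If the fields are separated by `" - "` with spaces, a regex separator such as `sep=r"\s*-\s*"` with `engine="python"` avoids leftover whitespace. Values that themselves contain `-` (dates, for example) would need a different split rule.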

Unable to parse using pd.json_normalize; it returns nulls with index values

Sample of my data: ID target 1 {"abc":"xyz"} 2 {"abc":"adf"} This data was CSV output that I imported in Python as below: data=pd.read_csv('location',convert
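
A sketch built from the two sample rows. After `read_csv`, the `target` cells are JSON strings, not dicts, so they are parsed with `json.loads` before `pd.json_normalize`. Because `json_normalize` returns a fresh 0..n-1 index, copying the source index keeps a join aligned instead of producing NaN for mismatched labels.

```python
import io
import json

import pandas as pd

# The question's sample rows, with the JSON stored as quoted CSV text.
csv_text = 'ID,target\n1,"{""abc"":""xyz""}"\n2,"{""abc"":""adf""}"\n'
data = pd.read_csv(io.StringIO(csv_text))

# json_normalize expects dicts, so parse each JSON string first.
parsed = pd.json_normalize(data["target"].apply(json.loads).tolist())

# Reuse the source index so the join lines rows up even after filtering.
parsed.index = data.index
result = data.drop(columns="target").join(parsed)
print(result)
```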

How to generate a map with clusters in Python

I have this dataframe below and I would like to know how I can make a graph similar to the one I inserted in the attachment. Can you help with some material or

Drop duplicate IDs, keeping the row whose value equals a certain value, otherwise keeping the first duplicate

>>> df = pd.DataFrame({'id': ['1', '1', '2', '2', '3', '4', '4', '5', '5'], ... 'value': ['keep', 'y', 'x', 'keep', 'x', 'Keep', 'x'
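
A sketch using the first five rows of the example (the rest is truncated). Each row gets a rank of 0 if it holds the preferred value and 1 otherwise. A stable sort then puts each id's preferred row first without disturbing the original order of the others, so `drop_duplicates` falls back to the first occurrence. Matching case-insensitively is an assumption, prompted by the `'Keep'` in the data.

```python
import pandas as pd

# First five rows of the question's example.
df = pd.DataFrame({"id": ["1", "1", "2", "2", "3"],
                   "value": ["keep", "y", "x", "keep", "x"]})

# 0 for the preferred value, 1 otherwise; drop .str.lower() for an exact match.
rank = (~df["value"].str.lower().eq("keep")).astype(int)

# mergesort is stable, so non-preferred rows keep their original order.
out = (df.assign(_rank=rank)
         .sort_values("_rank", kind="mergesort")
         .drop_duplicates("id", keep="first")
         .sort_index()
         .drop(columns="_rank"))
print(out)
```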

How to align text inside a cell in pandas

I have a cell that contains 2 characters and sometimes 3. I need to format the cell like this: &lt;2spaces&gt;XX&lt;2spaces&gt;, and if it contains 3 characters: &lt;2s
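
A sketch for the 2-character case only, since the rule for 3-character values is cut off in the question. `Series.str.center(6)` pads both sides with spaces to width 6, which turns `XX` into two spaces, the value, and two spaces. The column name `code` is hypothetical. For a 3-character value, `center(6)` puts one space on the left and two on the right, which may or may not match the intended rule.

```python
import pandas as pd

df = pd.DataFrame({"code": ["AB", "CD"]})  # hypothetical column

# Pad both sides with spaces up to a total width of 6.
df["code"] = df["code"].str.center(6)
print(df["code"].tolist())
```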

Calculate MAPE and apply to PySpark grouped Dataframe [@pandas_udf]

Goal: Calculate mean_absolute_percentage_error (MAPE) for each unique ID. y - real value yhat - predicted value Sample PySpark Dataframe: join_df +----------+--
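
A plain-pandas sketch of the per-group computation, MAPE = mean(|y - yhat| / |y|), with hypothetical rows and an assumed `ID` column name. In PySpark 3.x the same per-group logic could run through `groupBy("ID").applyInPandas(func, schema)`, where `func` returns a pandas DataFrame; that wiring is not shown here.

```python
import numpy as np
import pandas as pd

# Hypothetical rows; 'ID', 'y' (actual) and 'yhat' (predicted) follow the question.
join_df = pd.DataFrame({
    "ID": ["a", "a", "b", "b"],
    "y": [100.0, 200.0, 50.0, 40.0],
    "yhat": [110.0, 180.0, 50.0, 50.0],
})

def mape(group: pd.DataFrame) -> float:
    """Mean absolute percentage error as a fraction: mean(|y - yhat| / |y|)."""
    return float(np.mean(np.abs(group["y"] - group["yhat"]) / np.abs(group["y"])))

per_id = join_df.groupby("ID")[["y", "yhat"]].apply(mape)
print(per_id)
```

Rows with `y == 0` would divide by zero and need filtering or a different metric.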

ValueError: X has 19 features, but LinearRegression is expecting 20 features as input

I'm trying to do polynomial regression using this code here: x_train,x_test,y_train,y_test = train_test_split(self.X, self.y, test_size=split, random_state=rand
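
One common cause of this error is predicting on `x_test` without passing it through the same `PolynomialFeatures` transform, or a transform with different settings, that was fitted on `x_train`. A sketch with hypothetical data: wrapping the expansion and the regressor in a scikit-learn pipeline makes `predict()` and `score()` expand the test set with the same fitted transformer.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical data: 2 raw features with a quadratic relationship.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 2))
y = 1.5 * X[:, 0] ** 2 - 2.0 * X[:, 1] + rng.normal(0, 0.1, size=200)

x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# The pipeline applies the fitted PolynomialFeatures to any data passed to
# predict()/score(), so train and test always have the same feature count.
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(x_train, y_train)
print(model.score(x_test, y_test))
```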