Category "dataframe"

condition should be a Column dataframe PySpark

When using df_hdr_join.count() > 0 in a when statement, it gives the error 'condition should be a Column'. I tried the following: df_result = df.withColumn('NUM', w

How to transform a list of dictionaries into a table

How to transform a list of dictionaries into a table. Here is the data: [{'wow': 1, 'item': 1, 'money': 1}, {'best': 1, 'sock': 1, 'saved': 1, 'found'
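A minimal pandas sketch, using only the keys visible in the truncated snippet: passing a list of dictionaries to `pd.DataFrame` makes each dictionary a row and each key a column, with `NaN` wherever a dictionary lacks a key. Filling those gaps with 0 is an assumption about what the table should show.

```python
import pandas as pd

# Keys taken from the snippet; the second dictionary is cut off in the question.
records = [
    {"wow": 1, "item": 1, "money": 1},
    {"best": 1, "sock": 1, "saved": 1},
]

# Each dict becomes a row; the union of keys becomes the columns.
# Missing keys show up as NaN, replaced here with 0 (an assumption).
table = pd.DataFrame(records).fillna(0).astype(int)
print(table)
```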

Azure Synapse Serverless SQL Pools - how to optimize transformations using notebooks and load tables into ADLSG2

We use Synapse Notebooks to perform data transformations and load the data into fact and dimension tables within our ADLSG2 data lake. We are disappointed with

pandas groupby dropping columns

I'm doing a simple group by operation, trying to compare group means. As you can see below, I have selected specific columns from a larger dataframe, from which
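The question is cut off, so this is only a guess at one common cause: non-numeric columns. Older pandas versions dropped non-numeric ("nuisance") columns from a groupby `mean()` automatically, while pandas 2.x raises `TypeError` instead. The column names below are made up for the sketch.

```python
import pandas as pd

df = pd.DataFrame({
    "group": ["a", "a", "b", "b"],
    "score": [1.0, 3.0, 5.0, 7.0],
    "label": ["x", "y", "x", "y"],  # a non-numeric column
})

# Say explicitly that only numeric columns should be averaged.
means = df.groupby("group").mean(numeric_only=True)

# Alternatively, select exactly the column(s) to aggregate.
score_means = df.groupby("group")["score"].mean()

# The grouping key becomes the index; as_index=False keeps it as a column.
flat = df.groupby("group", as_index=False)[["score"]].mean()
```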

How to convert data in Polars?

I used .write_ipc from Polars to store as a feather file. It turns out that the numerical strings have been saved as integers. So I need to convert the columns

Pandas Groupby with Aggregates

I am working with pandas and I was wondering if there is a difference based on which statistical functions are applied as shown in the below examples and if the
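The examples in the question are not shown, so this is only a sketch of two common ways to apply several statistics in one `groupby`, with hypothetical columns `team` and `points`. Both compute the same numbers; they differ in how the output columns are labelled.

```python
import pandas as pd

df = pd.DataFrame({"team": ["a", "a", "b"], "points": [10, 20, 5]})

# A list of functions on the whole grouped frame gives MultiIndex columns
# of the form (column, function).
by_list = df.groupby("team").agg(["mean", "sum"])

# Named aggregation gives flat, explicitly named output columns.
named = df.groupby("team").agg(
    avg_points=("points", "mean"),
    total_points=("points", "sum"),
)
```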

Combine multiple dataframes with pandas

I use the following script to measure the average RGB color of the picture in a selected path. I tried to make one dataframe with pd.concat, but it doesn't work ou
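A sketch of the usual pattern, assuming the per-image result is three numbers: collect one record per image in a list and build the DataFrame once after the loop, rather than concatenating onto a growing frame inside it. `average_rgb` and the file names are hypothetical stand-ins for the question's script.

```python
import pandas as pd


def average_rgb(path):
    """Hypothetical stand-in for the question's per-image RGB measurement."""
    fake_results = {"img1.png": (120, 80, 40), "img2.png": (10, 20, 30)}
    return fake_results[path]


rows = []
for path in ["img1.png", "img2.png"]:
    r, g, b = average_rgb(path)
    rows.append({"file": path, "R": r, "G": g, "B": b})

# Build the frame once, after the loop.
result = pd.DataFrame(rows)

# If you already have one small DataFrame per image, collect them in a list
# and call pd.concat once on the whole list.
frames = [pd.DataFrame([row]) for row in rows]
combined = pd.concat(frames, ignore_index=True)
```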

group time stamps based on intervals

I have a dataset that looks like this: main_id time_stamp aaa 2019-05-29 08:16:05+05
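Assuming "intervals" means fixed-width bins (30 minutes here, a hypothetical choice), a pandas sketch: round each timestamp down to the start of its bin with `dt.floor`, then group on the id and the bin. The example uses timezone-naive timestamps; tz-aware ones such as the question's `+05` offset should floor the same way.

```python
import pandas as pd

# Hypothetical rows modelled on the question's main_id / time_stamp columns.
df = pd.DataFrame({
    "main_id": ["aaa", "aaa", "aaa", "bbb"],
    "time_stamp": pd.to_datetime([
        "2019-05-29 08:16:05",
        "2019-05-29 08:25:00",
        "2019-05-29 09:05:00",
        "2019-05-29 08:20:00",
    ]),
})

# Round each timestamp down to the start of its 30-minute bin.
df["bin"] = df["time_stamp"].dt.floor("30min")

# Count rows per id and bin.
counts = df.groupby(["main_id", "bin"]).size().rename("n").reset_index()
print(counts)
```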

Date interval average Python pandas

This is my dataframe: ID number Date purchase 1 2022-05-01 1 2021-03-03 1 2020-01-03 2 2019-01-03 2 2018-01-03 I want to get a horizontal dataframe with alle
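Reading the title as "average interval between each ID's purchase dates" (the requested horizontal layout is cut off, so this is an assumption), a pandas sketch: sort, take the per-ID `diff()`, and average it. The column name `purchase_date` is hypothetical; the dates are those in the question.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2],
    "purchase_date": pd.to_datetime([
        "2022-05-01", "2021-03-03", "2020-01-03", "2019-01-03", "2018-01-03",
    ]),
})

# Sort so diff() measures the gap from each purchase to the previous one.
df = df.sort_values(["ID", "purchase_date"])
df["gap_days"] = df.groupby("ID")["purchase_date"].diff().dt.days

# The first purchase of each ID has no previous one (NaN); mean() skips it.
avg_gap = df.groupby("ID")["gap_days"].mean()
print(avg_gap)
```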

How to reduce the size of my dataframe in Python?

Working on an NLP problem, I ended up with a big features dataset, dfMethod: c0000167 c0000294 c0000545 ... c4721555 c4759703 c4759772 0
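A sketch of one common fix, assuming the features are small non-negative integers such as 0/1 flags: downcast each column to the smallest unsigned integer type that fits. The random data and the three column names are stand-ins. If most entries are zero, `pd.SparseDtype` is another option worth measuring.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical 0/1 feature matrix with column names styled like the question's.
df = pd.DataFrame(
    rng.integers(0, 2, size=(1000, 3)),
    columns=["c0000167", "c0000294", "c0000545"],
)

before = df.memory_usage(deep=True).sum()

# Downcast each column to the smallest unsigned integer dtype (uint8 for 0/1).
small = df.apply(pd.to_numeric, downcast="unsigned")

after = small.memory_usage(deep=True).sum()
print(before, after)
```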

Flatten list of dictionaries in dataframe

I'm pulling data with the Facebook Insights API, and there are nested columns in the data I pull. I tried separating them by index but failed. Column I want to split
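A sketch, assuming each cell of the nested column holds a list of dictionaries. The names `actions`, `action_type` and `value` are hypothetical, not taken from the Facebook Insights API. `explode` gives one row per dictionary, and `pd.json_normalize` turns the dictionaries into columns.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2],
    "actions": [
        [{"action_type": "like", "value": 3}, {"action_type": "share", "value": 1}],
        [{"action_type": "like", "value": 5}],
    ],
})

# One row per dictionary in each list.
exploded = df.explode("actions", ignore_index=True)

# Expand the dictionaries into columns and put them next to the other columns.
flat = pd.concat(
    [exploded.drop(columns="actions"), pd.json_normalize(exploded["actions"].tolist())],
    axis=1,
)
print(flat)
```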

Working with a multiindex dataframe, to get summation results over a boolean column, based on a condition from another column

We have a multiindex dataframe that looks like: date condition_1 condition_2 item1 0 2021-06-10 06:30:00+00:00

How to select elements from json column except unwanted columns in spark

I have various columns in a Spark DataFrame; they are nested JSON columns. In the configuration I will provide a list of columns and fields to remove from the JSON. For e

How to combine two dataframes into one like this, using pandas and python?

Please see the picture here. I have two data frames and I need to combine them into a single one using the merge or concat method, but I am unable to do so. Can our com

Splitting a record into 12 months based on the date in pandas dataframe

I have the data in the below format stored in a pandas dataframe PolicyNumber InceptionDate 1 2017-12-28 00:00:00.0 https://i.stack.imgur.com/pE
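The desired output is only in the linked image, so this sketch assumes one row per month for the 12 months starting at `InceptionDate`, using `pd.DateOffset(months=i)` to keep the day of the month. `MonthDate` is a hypothetical column name.

```python
import pandas as pd

df = pd.DataFrame({
    "PolicyNumber": [1],
    "InceptionDate": pd.to_datetime(["2017-12-28"]),
})

# Build a list of 12 monthly dates per policy...
df["MonthDate"] = df["InceptionDate"].apply(
    lambda d: [d + pd.DateOffset(months=i) for i in range(12)]
)

# ...then explode it into one row per month.
monthly = df.explode("MonthDate", ignore_index=True)
monthly["MonthDate"] = pd.to_datetime(monthly["MonthDate"])
print(monthly)
```

For inception dates late in the month, `DateOffset` clips to the last valid day (January 31 plus one month gives the end of February), which may or may not be the wanted behaviour.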

Merge two dfs with multiple entries of same value in joining column

I have two data frames. The first is input which looks like the following: Merchant SKU Quantity Per Box NOB Shipment Status id_using_regex prepped_by_in
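When the join key repeats on both sides, `merge` pairs every left row with every right row for that key (2 × 2 = 4 rows for SKU "A" below). If rows should instead be paired in order of appearance, which is an assumption about the question's intent, a `cumcount` occurrence number can serve as a second join key. `Box ID` and all values are hypothetical.

```python
import pandas as pd

left = pd.DataFrame({"Merchant SKU": ["A", "A", "B"], "Quantity Per Box": [10, 20, 5]})
right = pd.DataFrame({"Merchant SKU": ["A", "A", "B"], "Box ID": ["b1", "b2", "b3"]})

# Plain merge: every "A" on the left meets every "A" on the right.
naive = left.merge(right, on="Merchant SKU")

# Number each repeat of a key (0, 1, ...) and join on key + occurrence.
left_k = left.assign(occurrence=left.groupby("Merchant SKU").cumcount())
right_k = right.assign(occurrence=right.groupby("Merchant SKU").cumcount())
paired = left_k.merge(right_k, on=["Merchant SKU", "occurrence"]).drop(columns="occurrence")
print(paired)
```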

How to convert the dummy variable columns in to several columns?

I know how to unstack rows into columns, but how to deal with the following dataframe? date dummy avg lable 1-19 1 20 l1 1-19 0 40 l1 1-27 1 100 l2 1-27 0 140
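A sketch using `pivot`, with rows taken from the question. The last row's `lable` value is cut off in the snippet, so that row is omitted and its cell comes out as `NaN`. The `avg_dummy0` / `avg_dummy1` column names are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "date": ["1-19", "1-19", "1-27"],
    "dummy": [1, 0, 1],
    "avg": [20, 40, 100],
    "lable": ["l1", "l1", "l2"],
})

# One row per (date, lable); one column per dummy value, filled from "avg".
wide = df.pivot(index=["date", "lable"], columns="dummy", values="avg")
wide.columns = [f"avg_dummy{c}" for c in wide.columns]
wide = wide.reset_index()
print(wide)
```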

changing dtype in polars

I created a data frame using Polars. When data is inserted, the dtype of the column automatically changes to match what was inserted (I think it's a feature of Polars?), but

How to import data with dates as index from excel with pandas

I am importing the data with this command df = pd.read_excel('C:/Users/Me/Data.xlsx', sheet_name='Prices') and this is the result: The date is a common column
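A sketch assuming the dates sit in a column named `Date` (a hypothetical name) that should become a `DatetimeIndex`. `read_excel` accepts `index_col` and `parse_dates`, so if the dates are in the first column the read itself can do it (shown commented out); the runnable part uses a stand-in frame instead of the workbook.

```python
import pandas as pd

# With the real workbook, assuming the dates are in the first column:
# df = pd.read_excel("C:/Users/Me/Data.xlsx", sheet_name="Prices",
#                    index_col=0, parse_dates=True)

# Stand-in for what read_excel returned (hypothetical "Date" and "Price" columns).
df = pd.DataFrame({"Date": ["2021-01-04", "2021-01-05"], "Price": [100.0, 101.5]})

# Convert the column to datetime, then move it into the index.
df["Date"] = pd.to_datetime(df["Date"])
df = df.set_index("Date")
print(df)
```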