Category "pandas"

Error while converting csv to parquet file using pandas

I would like to upload csv as parquet file to S3 bucket. Below is the code snippet. df = pd.read_csv('right_csv.csv') csv_buffer = BytesIO() df.to_parquet(csv_b

Multiple aggregations of the same column using pandas GroupBy.agg()

Is there a pandas built-in way to apply two different aggregating functions f1, f2 to the same column df["returns"], without having to call agg() multiple times
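
One possible approach, sketched here under assumptions: named aggregation in `GroupBy.agg()` applies several functions to the same column in a single call. The grouping column `dummy`, the sample values, and the use of `"sum"` and `"mean"` as stand-ins for f1 and f2 are all hypothetical; only the `returns` column comes from the question.

```python
import pandas as pd

# Hypothetical sample data; only the "returns" column name is from the question.
df = pd.DataFrame({
    "dummy": [0, 0, 1, 1],
    "returns": [0.1, 0.3, 0.2, 0.6],
})

# Named aggregation: each keyword becomes an output column,
# and each value is a (source column, function) pair.
# "sum" and "mean" stand in for the question's f1 and f2.
result = df.groupby("dummy").agg(
    returns_sum=("returns", "sum"),
    returns_mean=("returns", "mean"),
)
print(result)
```

Passing a list instead, as in `df.groupby("dummy")["returns"].agg(["sum", "mean"])`, should give the same numbers with the function names as column labels.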

Add a column in a dataframe based on an existing value in another dataframe

I have a dataframe DF3: zone_id combine 0 ABD 10 BCD 20 ABC 30 ABE and a second dataframe, combinaison_df: zone_id combine 0

How can I create a cross-tab of two columns in a dataframe in Python and generate a total row and column in the output?

I have created a dataframe from a CSV file and now I'm trying to create a cross-tab of two columns ("Personal_Status" and "Gender"). The output should look like
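
A minimal sketch of how this might be done, assuming `pd.crosstab` with `margins=True` fits the desired layout (the expected output in the question is cut off, so this is a guess at it). The sample rows and the label `"Total"` are made up for illustration; only the two column names come from the question.

```python
import pandas as pd

# Hypothetical sample rows; the real data comes from the asker's CSV file.
df = pd.DataFrame({
    "Personal_Status": ["single", "married", "single", "divorced", "married"],
    "Gender": ["M", "F", "F", "M", "M"],
})

# margins=True appends a total row and a total column;
# margins_name sets the label used for both.
table = pd.crosstab(
    df["Personal_Status"],
    df["Gender"],
    margins=True,
    margins_name="Total",
)
print(table)
```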

Missing data error on adfuller test although I cleaned for inf and nans

Currently I am working on a data set which has many time-dependent variables. I ran adfuller for all and changed the non-stationary ones to percentage change (t

Pandas approximating/rounding large numbers from csv

I am reading numbers from a csv file into a pandas dataframe. When the numbers I am reading are approximately >1E12, pandas will approximate the number to 3

How to create ratios using value counts and separate fields in Python?

Using the data frame shown below, I'd like to create manager-to-assistant and manager-to-associate percentages/ratios per location. I'm looking for the

Searching a value within range between columns in pandas (not date columns and no sql)

Thanks in advance for the help. I have two dataframes as given below. I need to create a column category in the sold frame based on information in the size frame. It should c

Truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()

I want to filter my dataframe with an or condition to keep rows with a particular column's values that are outside the range [-0.25, 0.25]. I tried: df = df[(df
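
This error typically appears when Python's `or`/`and` are applied to whole Series, or when the comparisons are not parenthesized. A hedged sketch of an element-wise filter follows; the column name `col` and the sample values are hypothetical, since the original snippet is cut off.

```python
import pandas as pd

# Hypothetical column name and values; the question's own code is truncated.
df = pd.DataFrame({"col": [-0.5, -0.25, 0.0, 0.1, 0.25, 0.3]})

# Use the element-wise operator | (not `or`), and wrap each comparison in
# parentheses because | binds more tightly than < and >.
outside = df[(df["col"] < -0.25) | (df["col"] > 0.25)]

# Equivalent form: between() is inclusive on both ends by default,
# so negating it keeps values strictly outside [-0.25, 0.25].
outside_alt = df[~df["col"].between(-0.25, 0.25)]
print(outside)
```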

Sort multiIndex table based on other table

I have a multiIndex data frame like this probe_names PLAGL1 GRB10 MEST H19 KCNQ1OT1 MEG3 MEG8 SNRPN \ Patient_1 0 0.55 0.53 0.53

Compare two excel files for the difference using pandas with multiple tabs

I found this nice script online which does a great job comparing the differences between 2 excel sheets but there's an issue - it doesn't work if the excel file

I have a dataframe with a JSON substring in one of the columns. I want to extract variables and make columns for them

import json df = pd.read_json("C:/xampp/htdocs/PHP code/APItest.json", orient='records') print(df) I would like to create three extra columns: ['name','l

How to "transpose" data from one date to another in Python

Sorry, I had a lot of trouble explaining my problem in the title, but I hope this example makes it clearer: I have a data source that tells me

Pandas rolling window cumsum, with incomplete series

I have a pandas df as follows: YEAR MONTH USERID TRX_COUNT 2020 1 1 1 2020 2 1 2 2020 3 1 1 2020 12

When one of my dataframe columns is a nested list, how should I transform it into a multi-dimensional np.array?

I have the following data frame. test = { "a": [[[1,2],[3,4]],[[1,2],[3,4]]], "b": [[[1,2],[3,6]],[[1,2],[3,4]]] } df = pd.DataFrame(test) df a b 0

Filter rows in dataframe based on value counts [duplicate]

I have a large dataframe/Questionnaire df (871 x 24) containing a column named "Identifier" which stores a unique ID for each of the participa

Save multiple/distinct .CSV files after for loop execution

I have 65 xml files that I need to convert to .CSV, and save each converted file as a separate .CSV file. I have tried using a for loop but am not having any lu

Testing in pandas library: Why is function style chosen over class based testing?

Why does function-style testing facilitate testing compared to class-based testing? Is this just additional library-specific functionality or are there any ge

Groupby and create a dummy =1 if column values do not contain 0, =0 otherwise

My df id var1 A 9 A 0 A 2 A 1 B 2 B 5 B 2 B 1 C 1 C 9 D 7 D 2 D 0 .. desired output will ha
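
A rough sketch of one way to build such a dummy, under a stated assumption: the desired output is cut off, so this assumes a per-row column that is 1 when the row's `id` group has no zero in `var1` and 0 otherwise. The new column name `dummy` is hypothetical; the data is the visible part of the question's sample.

```python
import pandas as pd

# Rows taken from the visible part of the question's sample data.
df = pd.DataFrame({
    "id":   ["A", "A", "A", "A", "B", "B", "B", "B", "C", "C", "D", "D", "D"],
    "var1": [9, 0, 2, 1, 2, 5, 2, 1, 1, 9, 7, 2, 0],
})

# Flag non-zero values, check per group whether all of them are non-zero,
# and broadcast that group-level result back to every row.
df["dummy"] = (
    df["var1"].ne(0)
    .groupby(df["id"])
    .transform("all")
    .astype(int)
)
print(df)
```

If one row per group is wanted instead, swapping `transform("all")` for `all()` should return a single value per `id`.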

Importing pandas_profiling

I am working on automating EDA. When I try to import pandas_profiling, I get an error: ImportError: cannot import name 'soft_unicode' from 'mark