Efficiency of multiple chained str transformations and alternatives

I want to change a DataFrame column so that its values are lowercase and have their whitespace stripped. For this I used chained str transformations. df.l
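
A minimal sketch comparing the chained form with one alternative, assuming a hypothetical column `name` that holds only strings (no missing values). Each chained `.str` call is vectorized but builds an intermediate Series; the list comprehension does both operations in a single Python-level pass. Which one is faster depends on data size and pandas version, so it is worth timing both on the real data (e.g. with `timeit`).

```python
import pandas as pd

# Hypothetical column name and values; assumes no missing values.
df = pd.DataFrame({"name": ["  Alice ", "BOB", " Carol"]})

# Chained .str accessor calls: each step returns a new intermediate Series.
chained = df["name"].str.strip().str.lower()

# Alternative: one Python-level pass doing both operations per value.
single_pass = pd.Series([s.strip().lower() for s in df["name"]], index=df.index)

print(chained.tolist())  # ['alice', 'bob', 'carol']
```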

Change NaN to None in Pandas dataframe

I'm trying to replace NaN with None in a pandas DataFrame. df.where(df.notnull(), None) was working. Here is the thread for this method. Use None instead of np.
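
A sketch of why `None` can turn back into `NaN`: a float64 column cannot hold the Python `None` object, so pandas coerces it to `NaN`. Casting to `object` first lets `where` keep `None`. The frame below is a made-up example.

```python
import numpy as np
import pandas as pd

# Made-up frame with a numeric and a text column, each with one missing value.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": ["x", None, "z"]})

# object columns can store None; float64 columns would coerce it back to NaN.
result = df.astype(object).where(df.notna(), None)
print(result)
```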

Python Pandas - Lookup a variable column depending on another column's value

I'm trying to use the value of one cell to find the value of a cell in another column. The first cell's value ('source') dictates which column to look up. import p
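
`DataFrame.lookup` was deprecated in pandas 1.2 and removed in 2.0; the replacement suggested in the pandas docs combines `factorize` with `reindex`. A minimal sketch, with hypothetical columns `A`, `B`, and `source`:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: 'source' names the column to read on each row.
df = pd.DataFrame({
    "A": [1, 2, 3],
    "B": [10, 20, 30],
    "source": ["A", "B", "A"],
})

# codes[i] is the position of row i's column label within `cols`.
codes, cols = pd.factorize(df["source"])

# Reorder the columns to match `cols`, then pick one cell per row.
df["value"] = df.reindex(cols, axis=1).to_numpy()[np.arange(len(df)), codes]
print(df["value"].tolist())  # [1, 20, 3]
```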

Predicting with SMOTE

If I have a training dataset that has 1083 samples and a testing dataset that has 79871 samples, how do I go about making the samples equal? I have been using S

pandas read_csv throwing ValueError: Invalid file path or buffer object type: <class 'list'>

I want to read a CSV file passed as a command-line argument. I thought I could directly use argparse's FileType object, but I'm getting errors. from argparse impor
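
The question's code is cut off, but one common cause of this exact error is declaring the argument with `nargs=1` (or `'+'`), which makes argparse store a list of file objects. A minimal sketch assuming that is the cause; `build_parser` and `load` are hypothetical names:

```python
import argparse

import pandas as pd


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Load a CSV file given on the command line.")
    # No nargs: argparse stores a single open file object. With nargs=1 it
    # would store a list, and read_csv rejects a list as the input.
    parser.add_argument("csvfile", type=argparse.FileType("r"))
    return parser


def load(argv=None) -> pd.DataFrame:
    args = build_parser().parse_args(argv)
    # read_csv accepts an open text-mode file handle directly.
    with args.csvfile as fh:
        return pd.read_csv(fh)
```

From a script, `load()` with no arguments parses `sys.argv`. If `nargs` is genuinely needed, index the list instead, e.g. `pd.read_csv(args.csvfile[0])`.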

Repeat rows in a pandas DataFrame based on column value

I have the following df:
code  role     persons
123   Janitor  3
123   Analyst  2
321   Vallet   2
321   Auditor  5
The first line means that I hav
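
A sketch using the df from the question, assuming the goal is one output row per person: `Index.repeat` duplicates each row label `persons` times, and `.loc` gathers the rows in that order.

```python
import pandas as pd

df = pd.DataFrame({
    "code": [123, 123, 321, 321],
    "role": ["Janitor", "Analyst", "Vallet", "Auditor"],
    "persons": [3, 2, 2, 5],
})

# Repeat each index label `persons` times, then select those rows.
expanded = df.loc[df.index.repeat(df["persons"])].reset_index(drop=True)
print(len(expanded))  # 12
```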

Extracting a .7z File into a Pandas Data Frame

I am using a Jupyter notebook (Google Colab) to try to extract data from a .7z file into a pandas DataFrame using Linux commands. The data is from http://untr

Calculate Decay Rate in Python

I have a dataset that roughly follows an exponential decay:
df_A
Period  Count
0       1600
1       894
2       959
3       773
4       509
5       206
I want
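
One way to estimate a decay rate, assuming a single-exponential model Count ≈ A·exp(−k·Period): take logs and fit a straight line, so the slope is −k. This is a log-linear least-squares fit, which weights small counts more heavily than a direct nonlinear fit (e.g. `scipy.optimize.curve_fit`) would. A minimal sketch on the data above:

```python
import numpy as np
import pandas as pd

df_A = pd.DataFrame({"Period": [0, 1, 2, 3, 4, 5],
                     "Count": [1600, 894, 959, 773, 509, 206]})

# ln(Count) ≈ ln(A) - k * Period, so a degree-1 fit gives slope = -k.
slope, intercept = np.polyfit(df_A["Period"], np.log(df_A["Count"]), 1)
k = -slope
A = np.exp(intercept)
half_life = np.log(2) / k

print(f"k = {k:.3f} per period, A = {A:.0f}, half-life = {half_life:.2f} periods")
```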

Create numpy array from function applied to (multiple) pandas columns

I have a pd.DataFrame containing rows of values:
import pandas as pd
df = pd.DataFrame({"col1": [1, 2, 3, 4, 5, 6], "col2": [6, 5, 4, 3, 2, 1]})
I now want to f
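
If the function only uses NumPy-compatible arithmetic, it can take whole columns as arrays and return an array directly; otherwise a row-by-row fallback works. `f` below is a hypothetical stand-in, since the real function is cut off.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": [1, 2, 3, 4, 5, 6], "col2": [6, 5, 4, 3, 2, 1]})

def f(a, b):
    # Hypothetical function; plain arithmetic also works element-wise on arrays.
    return a * b + 1

# Vectorized: pass whole columns as NumPy arrays, get an array back.
fast = f(df["col1"].to_numpy(), df["col2"].to_numpy())

# Fallback for functions that only accept scalars: call once per row.
slow = np.array([f(a, b) for a, b in zip(df["col1"], df["col2"])])

print(fast)  # [ 7 11 13 13 11  7]
```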

pandas does not read my Excel file and even breaks the existing file

I want to stack a series of DataFrames in one Excel file, and I wrote the code below. if os.path.isfile(result) is False: with pd.ExcelWriter(result, engine='

Functional Programming: How does one create a new column in a multi-index data frame that is a function of another column?

Suppose we have the simplified DataFrame below (the actual df is much, much bigger). How does one assign values to a new column f such that f is a function of another
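
A minimal sketch, assuming a hypothetical MultiIndex frame with a column `x` and a hypothetical function `g`, since the question's frame is not shown: `DataFrame.assign` returns a new frame instead of mutating the original, which fits a functional style, and the lambda receives the frame being built.

```python
import pandas as pd

# Hypothetical MultiIndex frame.
index = pd.MultiIndex.from_product([["a", "b"], [1, 2]], names=["group", "step"])
df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0]}, index=index)

def g(x):
    # Hypothetical transformation of column x.
    return x ** 2 + 1

# assign leaves df untouched and returns a new frame with column f.
df2 = df.assign(f=lambda d: g(d["x"]))
print(df2)
```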

Pandas+Uncertainties producing AttributeError: type object 'dtype' has no attribute 'kind'

I want to use pandas with Uncertainties. I am getting a strange error; below is an MWE:
from uncertainties import ufloat
import pandas
number_with_uncertainty = ufloa

Error while converting csv to parquet file using pandas

I would like to upload a CSV as a Parquet file to an S3 bucket. Below is the code snippet.
df = pd.read_csv('right_csv.csv')
csv_buffer = BytesIO()
df.to_parquet(csv_b

Multiple aggregations of the same column using pandas GroupBy.agg()

Is there a pandas built-in way to apply two different aggregating functions f1, f2 to the same column df["returns"], without having to call agg() multiple times
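
`agg` accepts a list of functions for a single column, and named aggregation (pandas 0.25+) lets you choose the output column names. A sketch with a hypothetical grouping column `ticker`; custom callables such as `f1` and `f2` could be passed in place of the string names.

```python
import pandas as pd

df = pd.DataFrame({
    "ticker": ["A", "A", "B", "B", "B"],
    "returns": [0.01, 0.03, -0.02, 0.00, 0.05],
})

# A list of functions gives one output column per function.
by_list = df.groupby("ticker")["returns"].agg(["mean", "max"])

# Named aggregation: output_name=(column, function).
named = df.groupby("ticker").agg(
    mean_return=("returns", "mean"),
    max_return=("returns", "max"),
)
print(named)
```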

Add a column to a DataFrame based on existing values in another DataFrame

I have a DataFrame DF3:
zone_id  combine
0        ABD
10       BCD
20       ABC
30       ABE
and a second DataFrame, combinaison_df:
zone_id  combine
0
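
The second frame is cut off, so this is only a sketch: it assumes a hypothetical `other` frame keyed by `combine` with a hypothetical `weight` column, and shows a left merge plus the equivalent `map` for adding a single column.

```python
import pandas as pd

# DF3 as shown in the question.
DF3 = pd.DataFrame({"zone_id": [0, 10, 20, 30],
                    "combine": ["ABD", "BCD", "ABC", "ABE"]})

# Hypothetical lookup frame; its real contents were not shown.
other = pd.DataFrame({"combine": ["ABC", "ABD"], "weight": [1.5, 2.0]})

# Left merge keeps every DF3 row; unmatched rows get NaN in 'weight'.
merged = DF3.merge(other, on="combine", how="left")

# Same result for one column: map through a Series indexed by 'combine'.
DF3["weight"] = DF3["combine"].map(other.set_index("combine")["weight"])
print(merged)
```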

How can I create a cross-tab of two columns in a dataframe in Python and generate a total row and column in the output?

I have created a dataframe from a CSV file and now I'm trying to create a cross-tab of two columns ("Personal_Status" and "Gender"). The output should look like
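
`pd.crosstab` can add the totals itself via `margins=True`, with `margins_name` setting the label. A sketch with made-up rows for the two columns named in the question:

```python
import pandas as pd

# Made-up rows; only the column names come from the question.
df = pd.DataFrame({
    "Personal_Status": ["single", "married", "single", "divorced", "married"],
    "Gender": ["M", "F", "F", "M", "M"],
})

# margins=True appends a total row and a total column.
table = pd.crosstab(df["Personal_Status"], df["Gender"],
                    margins=True, margins_name="Total")
print(table)
```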

Missing data error on adfuller test although I cleaned out infs and NaNs

Currently I am working on a data set which has many time-dependent variables. I ran adfuller for all and changed the non-stationary ones to percentage change (t
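
A possible explanation, hedged because the rest of the question is cut off: `pct_change` creates new missing values after any earlier cleaning (the first row is always `NaN`, and a zero previous value gives `inf`). Cleaning after the transformation avoids that. A minimal sketch with a hypothetical series:

```python
import numpy as np
import pandas as pd

# Hypothetical series; the zeros make pct_change produce inf.
s = pd.Series([0.0, 5.0, 10.0, 0.0, 4.0])

# Clean *after* pct_change: drop the leading NaN and any +/-inf values.
changes = s.pct_change().replace([np.inf, -np.inf], np.nan).dropna()
print(changes.tolist())  # [1.0, -1.0]

# `changes` could then be passed to statsmodels' adfuller (not imported here).
```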

Pandas approximating/rounding large numbers from csv

I am reading numbers from a CSV file into a pandas DataFrame. When the numbers I am reading are greater than approximately 1E12, pandas will approximate the number to 3
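
Two things can look like this, and the truncated question doesn't say which: a column with missing values is parsed as float64, which cannot represent every integer above 2**53 exactly, and large floats are displayed in scientific notation. For the precision case, reading the column as text keeps every digit. A sketch with made-up data:

```python
import io

import pandas as pd

# Made-up CSV: an 18-digit ID plus a missing value in the same column.
csv_text = "id,value\n123456789012345678,1\n,2\n"

# Default parsing: the missing value forces float64, so the ID gets rounded.
default = pd.read_csv(io.StringIO(csv_text))

# Reading the column as text preserves every digit.
as_text = pd.read_csv(io.StringIO(csv_text), dtype={"id": str})

print(default["id"].iloc[0])
print(as_text["id"].iloc[0])  # 123456789012345678
```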

How to create ratios using value counts and separate fields in Python?

Using the DataFrame shown below, I'd like to create manager-to-assistant and manager-to-associate percentages/ratios per location. I'm looking for the
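
A sketch assuming the frame has one row per employee with hypothetical `location` and `role` columns (the real frame is cut off): count roles per location with `pd.crosstab`, then divide the count columns. Multiply by 100 for percentages; a location with no assistants yields `inf`.

```python
import pandas as pd

# Hypothetical staff table.
df = pd.DataFrame({
    "location": ["NY", "NY", "NY", "NY", "LA", "LA", "LA"],
    "role": ["manager", "assistant", "assistant", "associate",
             "manager", "manager", "associate"],
})

# Count each role per location; missing combinations become 0.
counts = pd.crosstab(df["location"], df["role"])

ratios = pd.DataFrame({
    "manager_to_assistant": counts["manager"] / counts["assistant"],
    "manager_to_associate": counts["manager"] / counts["associate"],
})
print(ratios)
```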