Category "pandas"

Efficiency of multiple chained str transformations and alternatives

I want to change a dataframe column so the values are lowercase and also have their whitespace stripped. For this I used chained str transformations: df.l
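
A minimal sketch of the chained approach next to a single-pass alternative, assuming a column named "name" (the real column name is cut off above):

    import pandas as pd

    df = pd.DataFrame({"name": ["  Alice ", "BOB", " Carol  "]})  # toy data

    # Chained .str accessors: each call builds an intermediate Series
    df["name"] = df["name"].str.lower().str.strip()

    # Alternative: one pass per value with map (the .str methods loop in Python
    # anyway, so this is usually comparable in speed)
    df["name"] = df["name"].map(lambda s: s.lower().strip())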

Change NaN to None in Pandas dataframe

I am trying to replace NaN with None in a pandas dataframe. Using df.where(df.notnull(), None) was working. Here is the thread for this method: Use None instead of np.
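
A minimal sketch of the method mentioned in the question, with the object-dtype cast that keeps the None values from being coerced back to NaN (the column name is an assumption):

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({"a": [1.0, np.nan, 3.0]})

    # Cast to object first; a float column would silently turn None back into NaN
    out = df.astype(object).where(df.notnull(), None)
    print(out)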

Python Pandas - Lookup a variable column depending on another column's value

I'm trying to use the value of one cell to find the value of a cell in another column. The first cell value ('source') dictates which column to look up. import p
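
One hedged way to do the row-wise lookup, assuming the 'source' column holds the name of the column to read in that row:

    import pandas as pd

    df = pd.DataFrame({
        "source": ["a", "b", "a"],
        "a": [10, 20, 30],
        "b": [100, 200, 300],
    })

    # For each row, pick the value from the column named in 'source'
    df["value"] = df.apply(lambda row: row[row["source"]], axis=1)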

Predicting with SMOTE

If I have a training dataset that has 1083 samples and a testing dataset that has 79871 samples, how do I go about making the samples equal? I have been using S
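
A hedged sketch of the usual pattern with imbalanced-learn: oversample only the training split, then predict on the untouched test set (the estimator and the synthetic data are placeholders):

    from imblearn.over_sampling import SMOTE
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

    # Resample only the training data; the test set stays imbalanced, as in production
    X_res, y_res = SMOTE(random_state=0).fit_resample(X_train, y_train)

    clf = LogisticRegression(max_iter=1000).fit(X_res, y_res)
    pred = clf.predict(X_test)  # predictions on the original test samples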

pandas read_csv throwing ValueError: Invalid file path or buffer object type: <class 'list'>

I want to read a CSV file sent as a command line argument. I thought I could directly use the FileType object of argparse, but I'm getting errors. from argparse impor
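
A hedged sketch; the ValueError in the title usually appears when nargs turns the argument into a list, so a single FileType argument (or a plain path string) passed straight to read_csv avoids it:

    import argparse

    import pandas as pd

    parser = argparse.ArgumentParser()
    # One positional file argument; avoid nargs here, or read_csv receives a list
    parser.add_argument("csvfile", type=argparse.FileType("r"))
    args = parser.parse_args()

    df = pd.read_csv(args.csvfile)  # read_csv accepts the open file object directly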

Repeat rows in a pandas DataFrame based on column value

I have the following df:

    code  role     persons
    123   Janitor  3
    123   Analyst  2
    321   Vallet   2
    321   Auditor  5

The first line means that I hav
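
A minimal sketch using Index.repeat on the 'persons' column, which is the usual idiom for this:

    import pandas as pd

    df = pd.DataFrame({
        "code": [123, 123, 321, 321],
        "role": ["Janitor", "Analyst", "Vallet", "Auditor"],
        "persons": [3, 2, 2, 5],
    })

    # Repeat each row 'persons' times, then rebuild a clean index
    out = df.loc[df.index.repeat(df["persons"])].reset_index(drop=True)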

Extracting a .7z File into a Pandas Data Frame

I am using a Jupyter notebook (Google Colab) to try to extract data from a .7z file into a pandas dataframe, using Linux commands. The data is from http://untr
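
A hedged alternative sketch that stays in Python via the py7zr package instead of shelling out (the archive and member names are placeholders):

    import pandas as pd
    import py7zr

    # Extract the archive into a local folder, then read the CSV it contains
    with py7zr.SevenZipFile("data.7z", mode="r") as archive:
        archive.extractall(path="extracted")

    df = pd.read_csv("extracted/data.csv")  # adjust to the real file inside the archive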

Calculate Decay Rate in Python

I have a dataset which somewhat follows an exponential decay:

    df_A
    Period  Count
    0       1600
    1       894
    2       959
    3       773
    4       509
    5       206

I want
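
A hedged sketch fitting an exponential decay to the counts above with scipy.optimize.curve_fit; the fitted lam is the decay rate per period:

    import numpy as np
    from scipy.optimize import curve_fit

    period = np.array([0, 1, 2, 3, 4, 5])
    count = np.array([1600, 894, 959, 773, 509, 206])

    def decay(t, n0, lam):
        return n0 * np.exp(-lam * t)

    # p0 gives the optimizer a rough starting point
    (n0, lam), _ = curve_fit(decay, period, count, p0=(count[0], 0.1))
    print(f"decay rate ~ {lam:.3f} per period")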

Create numpy array from function applied to (multiple) pandas columns

I have a pd.DataFrame containing rows of values:

    import pandas as pd
    df = pd.DataFrame({"col1": [1, 2, 3, 4, 5, 6], "col2": [6, 5, 4, 3, 2, 1]})

I now want to f
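
Two hedged options, depending on whether the function accepts whole arrays or only scalars (f here is a placeholder):

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({"col1": [1, 2, 3, 4, 5, 6], "col2": [6, 5, 4, 3, 2, 1]})

    def f(a, b):
        return a * 2 + b  # placeholder

    # If f works element-wise on arrays, pass the columns directly (fast)
    arr = f(df["col1"].to_numpy(), df["col2"].to_numpy())

    # If f only accepts scalars, fall back to a row-wise comprehension
    arr = np.array([f(a, b) for a, b in zip(df["col1"], df["col2"])])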

pandas ExcelWriter.book does not read my Excel file and even breaks the existing file

I want to stack a series of dataframes in one Excel file, and I wrote the code below. if os.path.isfile(result) is False: with pd.ExcelWriter(result, engine='
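
A hedged sketch of appending sheets to an existing workbook, which needs the openpyxl engine and mode='a' (file and sheet names are placeholders):

    import os

    import pandas as pd

    result = "result.xlsx"
    df = pd.DataFrame({"x": [1, 2, 3]})

    if not os.path.isfile(result):
        # First run creates the workbook
        df.to_excel(result, sheet_name="run_1", index=False)
    else:
        # Later runs append a new sheet instead of overwriting the file
        with pd.ExcelWriter(result, engine="openpyxl", mode="a") as writer:
            df.to_excel(writer, sheet_name="run_2", index=False)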

Functional Programming: How does one create a new column in a multi-index data frame that is a function of another column?

Suppose the below simplified dataframe. (The actual df is much, much bigger.) How does one assign values to a new column f such that f is a function of another
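
A minimal sketch on an assumed two-level index, using assign so the new column is produced functionally rather than by in-place mutation:

    import pandas as pd

    idx = pd.MultiIndex.from_product([["A", "B"], [1, 2]], names=["group", "step"])
    df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0]}, index=idx)

    # f as an element-wise function of column x; the MultiIndex is carried along
    df2 = df.assign(f=lambda d: d["x"] ** 2)

    # equivalent in-place version
    df["f"] = df["x"].map(lambda v: v ** 2)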

Pandas+Uncertainties producing AttributeError: type object 'dtype' has no attribute 'kind'

I want to use Pandas + Uncertainties. I am getting a strange error; below is a MWE: from uncertainties import ufloat import pandas number_with_uncertainty = ufloa
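
A hedged workaround sketch (not necessarily the root cause of that traceback): keep the ufloat values in an object-dtype column so pandas does not try to treat them as a numeric dtype:

    import pandas as pd
    from uncertainties import ufloat

    number_with_uncertainty = ufloat(2.0, 0.1)

    # ufloat instances are plain Python objects, so store them with dtype=object
    df = pd.DataFrame({"value": pd.Series([number_with_uncertainty], dtype=object)})
    print(df["value"].iloc[0].nominal_value, df["value"].iloc[0].std_dev)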

Error while converting csv to parquet file using pandas

I would like to upload a CSV as a Parquet file to an S3 bucket. Below is the code snippet. df = pd.read_csv('right_csv.csv') csv_buffer = BytesIO() df.to_parquet(csv_b
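
A hedged sketch using an in-memory buffer plus boto3 (the bucket and key are placeholders, and to_parquet needs pyarrow or fastparquet installed):

    from io import BytesIO

    import boto3
    import pandas as pd

    df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})  # stand-in for pd.read_csv('right_csv.csv')

    parquet_buffer = BytesIO()
    df.to_parquet(parquet_buffer, engine="pyarrow", index=False)

    s3 = boto3.client("s3")
    s3.put_object(Bucket="my-bucket", Key="right_csv.parquet", Body=parquet_buffer.getvalue())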

Multiple aggregations of the same column using pandas GroupBy.agg()

Is there a pandas built-in way to apply two different aggregating functions f1, f2 to the same column df["returns"], without having to call agg() multiple times
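
A minimal sketch of both spellings: a list of functions in a single agg() call, or named aggregation for custom output names (the grouping column is an assumption):

    import pandas as pd

    df = pd.DataFrame({"ticker": ["A", "A", "B", "B"], "returns": [0.1, 0.2, -0.1, 0.3]})

    # One agg() call, two functions on the same column
    out = df.groupby("ticker")["returns"].agg(["mean", "std"])

    # Named aggregation controls the resulting column names
    out = df.groupby("ticker").agg(mean_ret=("returns", "mean"), std_ret=("returns", "std"))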

add a column in dataframe based on existing value in another dataframe

I have a dataframe DF3:

    zone_id  combine
    0        ABD
    10       BCD
    20       ABC
    30       ABE

and a second dataframe, combinaison_df: zone_id combine 0
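
A hedged sketch of the two usual approaches, with a made-up extra column on the second frame since its contents are cut off above:

    import pandas as pd

    df3 = pd.DataFrame({"zone_id": [0, 10, 20, 30], "combine": ["ABD", "BCD", "ABC", "ABE"]})
    combinaison_df = pd.DataFrame({"combine": ["ABD", "ABC"], "extra": [1, 2]})  # placeholder

    # Left join: keep every DF3 row and pull the matching value; unmatched rows get NaN
    out = df3.merge(combinaison_df, on="combine", how="left")

    # If only a membership flag is needed:
    df3["in_other"] = df3["combine"].isin(combinaison_df["combine"])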

How can I create a cross-tab of two columns in a dataframe in Python and generate a total row and column in the output?

I have created a dataframe from a CSV file and now I'm trying to create a cross-tab of two columns ("Personal_Status" and "Gender"). The output should look like
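
A minimal sketch with pd.crosstab; margins=True adds the total row and column (the sample values are made up):

    import pandas as pd

    df = pd.DataFrame({
        "Personal_Status": ["Single", "Married", "Single", "Married"],
        "Gender": ["F", "M", "M", "F"],
    })

    # margins=True appends a "Total" row and column with the sums
    table = pd.crosstab(df["Personal_Status"], df["Gender"], margins=True, margins_name="Total")
    print(table)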

Missing data error on adfuller test although I cleaned for inf and nans

Currently I am working on a data set which has many time-dependent variables. I ran adfuller for all and changed the non-stationary ones to percentage change (t
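
A hedged sketch of the cleaning usually needed right before adfuller: pct_change leaves a leading NaN and can produce inf when dividing by zero, so both have to be dropped from the exact series passed in:

    import numpy as np
    import pandas as pd
    from statsmodels.tsa.stattools import adfuller

    rng = np.random.default_rng(0)
    vals = rng.normal(size=200).cumsum() + 100
    vals[50] = 0.0  # a zero makes the next pct_change value inf
    s = pd.Series(vals).pct_change()

    clean = s.replace([np.inf, -np.inf], np.nan).dropna()

    stat, pvalue, *_ = adfuller(clean)
    print("ADF statistic:", stat, "p-value:", pvalue)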

Pandas approximating/rounding large numbers from csv

I am reading numbers from a csv file into a pandas dataframe. When the numbers I am reading are approximately >1E12, pandas will approximate the number to 3
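
A hedged sketch: what looks like rounding is usually the column being parsed as float64, so forcing int64 (or str) via dtype keeps the exact digits (the file contents here are a stand-in):

    from io import StringIO

    import pandas as pd

    csv_text = "id,amount\n1,1234567890123456\n2,9876543210987654\n"

    # Parse the big numbers as int64 instead of letting them become float64
    df = pd.read_csv(StringIO(csv_text), dtype={"amount": "int64"})
    print(df["amount"])

    # If the column may contain blanks, reading it as strings is the safe fallback
    df = pd.read_csv(StringIO(csv_text), dtype={"amount": str})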

How to create ratios using value counts and separate fields in Python?

Using the data frame shown below, I'd like to create manager-to-assistant and manager-to-associate percentages/ratios per location. I'm looking for the
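
A hedged sketch: count roles per location with crosstab, then derive the ratios (the column names and sample data are assumptions, since the question's frame is cut off):

    import pandas as pd

    df = pd.DataFrame({
        "location": ["NY", "NY", "NY", "LA", "LA", "LA", "LA"],
        "role": ["manager", "assistant", "associate",
                 "manager", "manager", "assistant", "associate"],
    })

    counts = pd.crosstab(df["location"], df["role"])

    # Manager-to-assistant and manager-to-associate ratios per location
    counts["mgr_to_assistant"] = counts["manager"] / counts["assistant"]
    counts["mgr_to_associate"] = counts["manager"] / counts["associate"]
    print(counts)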