Category "pandas"

JSON input to multiple Excel file outputs

I have a JSON file that looks like this: { "Person A": { "Company A": { "Doctor": { "Morning": "2000", "Afternoon": "1200" },
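
A minimal sketch of one way to fan a nested JSON like this out to Excel, assuming one file per person and one sheet per company (the file/sheet layout and input path are assumptions, not from the question):

    import json
    import pandas as pd

    # Nested JSON: person -> company -> role -> {"Morning": ..., "Afternoon": ...}
    with open("input.json") as f:          # hypothetical input path
        data = json.load(f)

    for person, companies in data.items():
        # One Excel file per person, one sheet per company (an assumed layout)
        with pd.ExcelWriter(f"{person}.xlsx") as writer:
            for company, roles in companies.items():
                df = pd.DataFrame(roles)   # roles become columns, Morning/Afternoon the index
                df.to_excel(writer, sheet_name=company)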

string column conversion to float in Pandas DataFrame

I want to get the left (LD) value of the pipe-separated column "CA Distance Nominal (LD | au)" from the DataFrame; here is the code. When I convert the string to float …
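
A sketch, assuming the column holds strings like "12.3 | 0.08" and the goal is the left (LD) part as a float:

    import pandas as pd

    df = pd.DataFrame({"CA Distance Nominal (LD | au)": ["12.3 | 0.08", "4.5 | 0.01"]})  # example values

    # Keep the text before the pipe, strip whitespace, then convert to float
    df["LD"] = (
        df["CA Distance Nominal (LD | au)"]
        .str.split("|").str[0]
        .str.strip()
        .astype(float)
    )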

What is the method used by the pandas profiling tool to identify duplicate rows?

I'm looking for the rationale behind the method used by the pandas profiling tool to identify duplicate rows (in a dataframe with multiple columns). I couldn't find …
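
For reference, a sketch of the pandas primitive such profiling tools typically build on, DataFrame.duplicated, which flags a row when every column value matches an earlier row (whether pandas-profiling adds anything on top of this is exactly what the question asks):

    import pandas as pd

    df = pd.DataFrame({"a": [1, 1, 2], "b": ["x", "x", "y"]})

    # A row counts as a duplicate when all of its column values match a previous row
    dupes = df[df.duplicated(keep="first")]
    print(len(dupes))   # 1 duplicate row in this toy frame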

How to completely reorganise a table using aggregate data from qualitative information

I have a pandas dataframe with the following layout: 'Water-Binder' (float), 'Fly Ash' (float), 'Age' (int), 'Strength %' (float). The Age column is …
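
The excerpt is cut off, but a groupby/pivot_table pass is the usual pandas route for this kind of reorganisation; a sketch under the assumption that each (Water-Binder, Fly Ash) mix should become one row with the mean Strength % per Age:

    import pandas as pd

    df = pd.DataFrame({
        "Water-Binder": [0.4, 0.4, 0.5, 0.5],
        "Fly Ash": [0.0, 0.1, 0.0, 0.1],
        "Age": [7, 28, 7, 28],
        "Strength %": [65.0, 99.0, 60.0, 95.0],
    })   # invented example values

    # One row per mix, one column per Age, mean Strength % as the aggregate
    wide = df.pivot_table(index=["Water-Binder", "Fly Ash"],
                          columns="Age",
                          values="Strength %",
                          aggfunc="mean")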

How do I find all the polygons of a GeoDataframe that contain any point of another GeoDataframe in GeoPandas?

I have a GeoDataframe of about 3200 polygons, and another GeoDataframe of about 26,000 points. I want to get a third GeoDataframe of only the polygons that contain …
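
A sketch using a spatial join, assuming both GeoDataFrames have valid geometries in the same CRS (older GeoPandas versions spell the predicate argument op= instead of predicate=):

    import geopandas as gpd
    from shapely.geometry import Point, Polygon

    # Tiny stand-ins for the ~3200 polygons and ~26,000 points
    polygons = gpd.GeoDataFrame(
        {"poly_id": [1, 2]},
        geometry=[Polygon([(0, 0), (2, 0), (2, 2), (0, 2)]),
                  Polygon([(5, 5), (7, 5), (7, 7), (5, 7)])],
        crs="EPSG:4326",
    )
    points = gpd.GeoDataFrame({"pt_id": [1]}, geometry=[Point(1, 1)], crs="EPSG:4326")

    # The join yields one row per polygon/point match, so deduplicate the polygon index
    joined = gpd.sjoin(polygons, points, how="inner", predicate="contains")
    polygons_with_points = polygons.loc[joined.index.unique()]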

Pandas: Create new column based on text values of other columns

My dataframe has columns id, text, labels, with rows like (447, 'glutamine synthetase', [protein]) and (447, 'GS', …
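
The excerpt is truncated, but a common pattern for deriving a column from the text of other columns is np.select over vectorised string tests; a sketch whose conditions and labels are purely illustrative:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({
        "id": [447, 447],
        "text": ["glutamine synthetase", "GS"],
        "labels": ["[protein]", "[protein]"],   # second label invented for the example
    })

    # New column driven by the text column (conditions here are made up)
    conditions = [
        df["text"].str.contains("synthetase", case=False),
        df["text"].str.len() <= 3,
    ]
    choices = ["full name", "abbreviation"]
    df["text_type"] = np.select(conditions, choices, default="other")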

How to pivot a dataframe to a wide format?

Suppose I have a pandas DataFrame like this: import pandas as pd data = pd.DataFrame({'header': ['age', 'height', 'weight', 'country', 'age', 'height', 'weight
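
A sketch of the usual long-to-wide recipe when the same header repeats once per record: number the repeats with groupby().cumcount(), then pivot (the "value" column and its contents are assumed, since the excerpt cuts off before them):

    import pandas as pd

    data = pd.DataFrame({
        "header": ["age", "height", "weight", "country", "age", "height", "weight", "country"],
        "value":  ["30", "178", "70", "DE", "25", "165", "60", "FR"],   # assumed values
    })

    # Each run of age/height/weight/country is one record: label the records,
    # then pivot so every header becomes its own column
    data["record"] = data.groupby("header").cumcount()
    wide = data.pivot(index="record", columns="header", values="value").reset_index(drop=True)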

Create a Tensorflow Dataset from a Pandas data frame with numerous labels?

I am trying to load a pandas dataframe into a TensorFlow Dataset. The columns are text (a string) and labels (a list in string format). A row would look something like: …
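
A sketch, assuming the labels column holds stringified Python lists that need parsing and multi-hot encoding before building the tf.data.Dataset (the example rows and label names are invented):

    import ast

    import pandas as pd
    import tensorflow as tf
    from sklearn.preprocessing import MultiLabelBinarizer

    df = pd.DataFrame({
        "text": ["some abstract ...", "another abstract ..."],
        "labels": ["['biology', 'protein']", "['chemistry']"],   # stringified lists
    })

    # Parse the stringified lists, then turn them into a fixed-width multi-hot matrix
    parsed = df["labels"].apply(ast.literal_eval)
    label_matrix = MultiLabelBinarizer().fit_transform(parsed)

    dataset = tf.data.Dataset.from_tensor_slices((df["text"].tolist(), label_matrix))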

Dividing values in columns based on their previous marker

I have the following dataframe: df = {'id': [1,2,3,4], '1': ['Green', 'Green', 'Green', 'Green'], '2': ['34','67', 'Blue', '77'], '3': ['Blue', '45', '9

Map range from 2 columns based on overlapping range in another Pandas dataframe and sum values for same range

I have two datasets (df1 and df2) of values with a certain range (Start and End) in both of them. I would like to annotate the first one (df1) with values from
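
A small-data sketch of one approach: cross-join the two tables, keep only the pairs whose ranges overlap, and sum df2's value per df1 row (the Start/End/Value column names and all values are assumptions; a cross merge needs pandas >= 1.2):

    import pandas as pd

    df1 = pd.DataFrame({"Start": [1, 10], "End": [5, 20]})
    df2 = pd.DataFrame({"Start": [2, 4, 15], "End": [3, 12, 18], "Value": [10, 20, 30]})

    # Two ranges overlap when each starts no later than the other ends
    pairs = df1.reset_index().merge(df2, how="cross", suffixes=("_1", "_2"))
    overlap = pairs[(pairs["Start_1"] <= pairs["End_2"]) & (pairs["End_1"] >= pairs["Start_2"])]

    # Sum df2's Value over every overlapping range, aligned back onto df1
    df1["Value_sum"] = overlap.groupby("index")["Value"].sum().reindex(df1.index, fill_value=0)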

Read timeout in pd.read_parquet from S3, and understanding configs

I'm trying to simplify access to datasets in various file formats (csv, pickle, feather, partitioned parquet, ...) stored as S3 objects. Since some users I supp
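
A sketch of the knobs involved, assuming s3fs/fsspec sits underneath pandas: storage_options is forwarded to s3fs.S3FileSystem, whose config_kwargs feed botocore's Config, which owns the timeouts (treat the path and values as placeholders):

    import pandas as pd

    # storage_options -> fsspec -> s3fs.S3FileSystem(config_kwargs=...) -> botocore.client.Config
    df = pd.read_parquet(
        "s3://my-bucket/dataset/",          # hypothetical partitioned-parquet prefix
        storage_options={
            "config_kwargs": {"read_timeout": 300, "connect_timeout": 60},
        },
    )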

create dataframe from dictionary of datetime and int

I have a dictionary of datetime keys and int values like below. end_date = datetime.datetime.strptime("01-12-2020", "%d-%m-%Y") details = { datetime.datetime.strptime
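
A sketch: a dict keyed by datetime maps naturally onto a Series, which can then be promoted to a two-column DataFrame (the entries and column names below are assumed):

    import datetime

    import pandas as pd

    details = {
        datetime.datetime.strptime("01-10-2020", "%d-%m-%Y"): 10,
        datetime.datetime.strptime("01-11-2020", "%d-%m-%Y"): 25,
    }

    # Dict keys become the index; reset_index turns them into a proper column
    df = pd.Series(details, name="value").rename_axis("date").reset_index()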

Adding new column based on combined criteria in Pandas Groupby

Following on from my previous question (thanks to those responding) I'm stuck again in achieving what I suspect is possible using a groupby in Pandas. Here's wh

How to create a new column showing when a change to an observation occurred?

I have a dataframe with columns Contract, Agreement_Date, Date and rows like (A, 2017-02-10, 2020-02-03), (A, 2017-02-10, 2020-02-04), (A, 2017-02-11, 2020-02-09), (A, 2017-02-11, 2020-0…
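
A sketch of the shift-and-compare idiom, per contract, to flag the rows where Agreement_Date changes (the data below approximates the truncated excerpt):

    import pandas as pd

    df = pd.DataFrame({
        "Contract": ["A", "A", "A", "A"],
        "Agreement_Date": ["2017-02-10", "2017-02-10", "2017-02-11", "2017-02-11"],
        "Date": ["2020-02-03", "2020-02-04", "2020-02-09", "2020-02-10"],
    })

    # True on the first row of each contract and wherever Agreement_Date differs
    # from the previous row within the same contract
    df["changed"] = df.groupby("Contract")["Agreement_Date"].transform(lambda s: s.ne(s.shift()))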

Inserting Data to SQL Server from a Python Dataframe Quickly

I have been trying to insert data from a Python dataframe into a table already created in SQL Server. The dataframe has 90K rows, and I wanted the best possible …
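
One commonly recommended route is SQLAlchemy's fast_executemany flag for pyodbc combined with DataFrame.to_sql; a sketch with a placeholder connection string:

    import pandas as pd
    from sqlalchemy import create_engine

    df = pd.DataFrame({"col_a": range(3), "col_b": ["x", "y", "z"]})   # stand-in for the 90K-row frame

    # fast_executemany batches the parameterised INSERTs on the pyodbc side,
    # usually the single biggest speed-up for loads of this size
    engine = create_engine(
        "mssql+pyodbc://user:password@my_dsn",   # placeholder DSN / connection string
        fast_executemany=True,
    )
    df.to_sql("target_table", engine, if_exists="append", index=False, chunksize=1000)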

What's the computational complexity of .iloc[] in pandas dataframes?

I'm trying to understand what's the execution complexity of the iloc function in pandas. I read the following Stack Exchange thread (Pandas DataFrame search is
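
One way to get an empirical feel, as opposed to a theoretical answer, is to time a single positional lookup at different frame sizes; a rough benchmarking sketch:

    import timeit

    import numpy as np
    import pandas as pd

    # Roughly flat timings as n grows would suggest the lookup itself
    # does not scale with the number of rows
    for n in (10_000, 100_000, 1_000_000):
        df = pd.DataFrame({"x": np.arange(n)})
        t = timeit.timeit(lambda: df.iloc[n // 2], number=1_000)
        print(f"{n:>9} rows: {t:.4f} s per 1000 lookups")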

How do I force a blank for rows in a dataframe that have any string or character apart from numerics?

I have a dataframe temp with columns Age, Rank, PhoneNumber, State, City and rows like (10, 1, '99-22344-1', 'Ga', 'abc') and (15, 12, 'No', 'Ma', 'xyz'). For the column PhoneNumber …
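
A sketch that treats "numeric" as digits plus the dashes used in the excerpt's phone numbers (that allowance is an assumption) and blanks everything else:

    import pandas as pd

    temp = pd.DataFrame({
        "Age": [10, 15],
        "Rank": [1, 12],
        "PhoneNumber": ["99-22344-1", "No"],
        "State": ["Ga", "Ma"],
        "City": ["abc", "xyz"],
    })

    # Keep entries made up only of digits and dashes; force everything else, e.g. "No", to blank
    valid = temp["PhoneNumber"].astype(str).str.fullmatch(r"[\d-]+")
    temp["PhoneNumber"] = temp["PhoneNumber"].where(valid, "")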

How to reshape my dataset in specific way?

I have a dataset with columns name and val, with rows (a, a1), (a, a2), (b, b1), (b, b2), (b, b3), (c, c1). I want to make all possible permutations of "names" which are not …

How to create new columns based on the values of other columns which could contain numbers or NaN?

I have a few dataframes that I'm merging based on known, populated fields. The resulting dataframe will always contain a set of columns, but may or may not have