Category "pandas"

How to get pandas to return the row index on which a CSV read error occurs

I have a CSV: '1\n2\na'. If I read it with something like pd.read_csv(io.StringIO('1\n2\na'), names=['A'], dtype={'A': 'float'}) specifying that the first colum
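A minimal sketch of one workaround, assuming the goal is simply to locate the offending rows. It reads the column as strings, converts it with pd.to_numeric(errors='coerce'), and looks for values that failed to parse. This does not make read_csv itself report the row; it avoids the exception and finds the bad rows afterwards.

```python
import io

import pandas as pd

raw = "1\n2\na"

# Read the column as strings first so parsing cannot fail on bad values.
df = pd.read_csv(io.StringIO(raw), names=["A"], dtype={"A": str})

# Convert to float; unparseable values become NaN instead of raising.
converted = pd.to_numeric(df["A"], errors="coerce")

# Rows that had a value but failed conversion are the problem rows.
bad_rows = df.index[converted.isna() & df["A"].notna()].tolist()
print(bad_rows)  # [2], i.e. the third line of the file (no header row)
```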

Setting Data Frame Column Names with Data Frame includes extra characters: ('ColumnName',)

I've got a python script set to pull data and column names from a Pervasive PSQL database, and it then creates the table and records in MS SQL. I'm creating dat
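The question is cut off here, so this is only a guess at the cause: names such as ('ColumnName',) typically appear when each column name comes back from the database as a one-element row tuple. A minimal sketch with hypothetical data; with a DB-API cursor, [d[0] for d in cursor.description] gives the same kind of plain list of names.

```python
import pandas as pd

# Hypothetical query results: each column name arrives as a 1-tuple.
column_rows = [("ColumnName",), ("OtherColumn",)]
data_rows = [(1, "x"), (2, "y")]

# Unpack the first element of each tuple so the labels are plain strings.
columns = [row[0] for row in column_rows]
df = pd.DataFrame(data_rows, columns=columns)
print(df.columns.tolist())  # ['ColumnName', 'OtherColumn']
```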

Python API Call: JSON to Pandas DF

I'm working on pulling data from a public API and converting the response JSON file to a Pandas Dataframe. I've written the code to pull the data and gotten a s
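The API and its response format are not shown, so the payload below is hypothetical. With requests, response.json() would supply the parsed object; pd.json_normalize then flattens a list of records, turning nested keys into dotted column names.

```python
import json

import pandas as pd

# Hypothetical API response body; the real API's structure will differ.
payload = json.loads("""
{"results": [
  {"id": 1, "name": "a", "stats": {"count": 3}},
  {"id": 2, "name": "b", "stats": {"count": 5}}
]}
""")

# Flatten the list of records; nested keys become dotted column names.
df = pd.json_normalize(payload["results"])
print(df.columns.tolist())  # ['id', 'name', 'stats.count']
```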

How to transform columns with method chaining?

What's the most fluent (or easy to read) method chaining solution for transforming columns in Pandas? (“method chaining” or “fluent” is
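One common answer, sketched here with made-up columns, is DataFrame.assign with a lambda. The lambda receives the intermediate frame, so each step can use columns created earlier in the chain.

```python
import pandas as pd

df = pd.DataFrame({"price": [10.0, 5.0], "qty": [3, 1], "name": [" a ", "b "]})

result = (
    df
    # Each lambda receives the frame as it exists at that point in the chain,
    # so later steps can use columns created by earlier ones.
    .assign(name=lambda d: d["name"].str.strip())
    .assign(total=lambda d: d["price"] * d["qty"])
    .query("total > 15")
)
print(result)
```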

Python/Pandas Add string to rows in a column that contain a character a specific number of times

I have a Pandas DataFrame (data) with a ['Duration'] column of 'object' type that holds time durations in the format '%H:%M:%S', such as '1:47:54' with 7 characters, bu
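The excerpt is truncated, so this assumes the problem rows are durations missing their hour part, such as '47:54', which can be identified because they contain ':' only once. A minimal sketch that prepends '0:' to those rows:

```python
import pandas as pd

data = pd.DataFrame({"Duration": ["1:47:54", "47:54", "2:03:10"]})

# Assumption: values containing ':' only once lack the hour part.
one_colon = data["Duration"].str.count(":") == 1
data.loc[one_colon, "Duration"] = "0:" + data.loc[one_colon, "Duration"]
print(data)
```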

Adding new DataFrame columns using information extracted from the URL in the url column, but the URL could be missing information

Given: A pandas dataframe that contains a user_url column among other columns. Expectation: New columns added to the original dataframe where the columns are co
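The actual URL layout is not shown, so the pattern below assumes a hypothetical https://example.com/u/<username>?country=<code> format. Series.str.extract with named, optional groups returns NaN for parts that are missing instead of raising, which covers the incomplete-URL case.

```python
import pandas as pd

df = pd.DataFrame({"user_url": [
    "https://example.com/u/alice?country=fr",
    "https://example.com/u/bob",
    None,
]})

# Named groups become column names; the country group is optional,
# so URLs without it (or missing URLs) yield NaN rather than an error.
pattern = r"/u/(?P<username>[^/?]+)(?:\?country=(?P<country>\w+))?"
parts = df["user_url"].str.extract(pattern)
df = df.join(parts)
print(df)
```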

How do I remove hours and seconds from my DataFrame column in python? [duplicate]

I have a DataFrame:
Age  Gender  Address  Date
15   M       172 ST   2022-02-07 00:00:00
I want to remove the hh:mm:ss part. I tried: import datetime
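Two common options, sketched with the row from the question. dt.normalize() keeps a datetime64 column with the time set to midnight, while dt.date produces plain datetime.date objects (object dtype) that have no time component at all.

```python
import pandas as pd

df = pd.DataFrame({
    "Age": [15],
    "Gender": ["M"],
    "Address": ["172 ST"],
    "Date": ["2022-02-07 00:00:00"],
})
df["Date"] = pd.to_datetime(df["Date"])

# Option 1: stay datetime64, with the time set to midnight.
df["Date_normalized"] = df["Date"].dt.normalize()

# Option 2: plain datetime.date objects (object dtype), no time component.
df["Date_only"] = df["Date"].dt.date
print(df)
```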

Getting `A value is trying to be set on a copy of a slice from a DataFrame.` when setting a column

I know a value should not be set on a view of a pandas dataframe and I'm not doing that but I'm getting this error. I have a function like this: def do_somethin
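The function in the question is cut off, so this uses a hypothetical add_doubled. A frequent trigger for this warning is assigning a column on a frame that was produced by filtering another one; calling .copy() on the filtered result makes it an independent object. If the goal is instead to modify the original frame, df.loc[mask, 'col'] = ... on that frame is the usual route.

```python
import pandas as pd


def add_doubled(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical example: filter rows, then add a derived column."""
    # Boolean filtering can return an object pandas tracks as derived from df;
    # .copy() makes the subset independent, so the assignment below is not
    # treated as a possible write into df.
    subset = df[df["value"] > 0].copy()
    subset["doubled"] = subset["value"] * 2
    return subset


result = add_doubled(pd.DataFrame({"value": [-1, 2, 3]}))
print(result)
```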

Pandas groupby feature question for output CSV

I have the following code, df.groupby('AccountNumber')[['TotalStake','TotalPayout']].sum(), which displays as I would like it to in pandas. The issue is when I ou
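The excerpt stops before describing what goes wrong in the CSV, so this assumes the issue is the AccountNumber group key being written as the index. reset_index() moves it back into an ordinary column (groupby(..., as_index=False) has the same effect), giving a single flat header row. The data is made up.

```python
import io

import pandas as pd

df = pd.DataFrame({
    "AccountNumber": [101, 101, 202],
    "TotalStake": [10.0, 5.0, 7.5],
    "TotalPayout": [0.0, 12.0, 3.0],
})

summary = df.groupby("AccountNumber")[["TotalStake", "TotalPayout"]].sum()

# reset_index turns the group keys back into a regular column,
# so index=False can drop the positional index from the file.
buf = io.StringIO()
summary.reset_index().to_csv(buf, index=False)
print(buf.getvalue())
```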

Alternative way to append a dataframe to itself N times and populate new column

Is there an alternative way to append a dataframe to itself N times where N is based on a list length, and the list contents are added as a new column to the da
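Two sketches with made-up data: pd.concat over one df.assign(...) per list element, or a cross merge (how='cross', available since pandas 1.2). Both give len(df) * N rows; they differ only in row order.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2]})
labels = ["a", "b", "c"]  # hypothetical list; N = len(labels)

# Option 1: one tagged copy of df per list element, stacked in list order.
stacked = pd.concat([df.assign(label=v) for v in labels], ignore_index=True)

# Option 2: cross join; rows come out grouped by df row instead of by label.
crossed = df.merge(pd.DataFrame({"label": labels}), how="cross")
print(stacked)
print(crossed)
```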

Create multiple DataFrames using data from an api

I'm using the world bank API to analyze data and I want to create multiple data frames with the same indicators for different countries. import wbgapi as wb imp

Is there a way to control which vertices connect in a plotly.express.line_geo map?

I'm trying to make a connection map that has the option to use an animation_frame to show different months/years. Plotly.express has this option, but the plotly

Is there a way to validate data type lengths in Pandas when using the read_csv function?

I'm trying to put some sort of length validation for columns using Pandas. For example, let's say I have a csv named test.csv that has the following data within
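As far as I know, read_csv has no option for maximum field lengths, so one approach is to read the columns as strings and validate afterwards. The file contents and limits below are hypothetical.

```python
import io

import pandas as pd

# Hypothetical stand-in for test.csv and hypothetical per-column limits.
csv_text = "code,name\nAB,Alice\nABCD,Bob\n"
max_len = {"code": 3, "name": 10}

# Read everything as text so lengths are measured on the raw values.
df = pd.read_csv(io.StringIO(csv_text), dtype=str)

# One boolean column per checked field: True where the value is too long.
too_long = pd.DataFrame({col: df[col].str.len() > limit for col, limit in max_len.items()})
bad = df[too_long.any(axis=1)]
print(bad)
```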

Why am I getting NaNs when concatenating a DataFrame with a Series?

I have a Pandas Dataframe ('a') and a Series ('b') both with timeseries index (weekends excluded). I am trying to concatenate them. Both of them start with the
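pd.concat(axis=1) aligns on index labels, so NaNs appear for any timestamp present in only one of the two objects; even a differing time-of-day component makes two labels unequal. A minimal sketch with made-up dates; join='inner' keeps only the shared dates.

```python
import pandas as pd

a = pd.DataFrame(
    {"a": [1.0, 2.0, 3.0]},
    index=pd.to_datetime(["2022-01-03", "2022-01-04", "2022-01-05"]),
)
b = pd.Series(
    [10.0, 20.0, 30.0],
    index=pd.to_datetime(["2022-01-03", "2022-01-04", "2022-01-06"]),
    name="b",
)

# Default join="outer": the result index is the union of both indexes,
# and each side gets NaN on dates it does not have.
combined = pd.concat([a, b], axis=1)
missing = combined.index[combined.isna().any(axis=1)]
print(missing)

# join="inner" keeps only the dates present in both.
shared = pd.concat([a, b], axis=1, join="inner")
```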

Apply loc to the entire dataframe but one column (keep the one column as it was and not remove it)

I am trying to divide the entire dataframe by a fixed number, but I want to keep the 'Year' column as is. I tried dividing the entire df by 100 and then multiply
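A minimal sketch with made-up columns: select every column except 'Year' with columns.difference and divide only those, so 'Year' is never touched.

```python
import pandas as pd

df = pd.DataFrame({"Year": [2020, 2021], "Sales": [1500, 2500], "Cost": [300, 700]})

# Every column except 'Year'; only these are divided.
cols = df.columns.difference(["Year"])
df[cols] = df[cols] / 100
print(df)
```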

Pandas - Cross referencing with DatetimeIndex - Groupby

I have end-of-month data for many companies. I want to create a new column with groupby for each company where: new_col from Jul of this year to Jun
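The excerpt is cut off, so this assumes new_col should be a per-company aggregate (a sum, as an example) over each July to June window. The sketch labels each month with the calendar year in which its July to June window starts and groups on company plus that label. Column names and data are hypothetical.

```python
import pandas as pd

# Hypothetical month-end data for one company.
df = pd.DataFrame(
    {"company": ["X", "X", "X", "X"], "value": [1, 2, 3, 4]},
    index=pd.to_datetime(["2021-05-31", "2021-06-30", "2021-07-31", "2021-08-31"]),
)

# Jul..Dec belong to the window starting this calendar year; Jan..Jun
# belong to the window that started the previous July.
window_start = df.index.year - (df.index.month < 7).astype(int)

# transform broadcasts each (company, window) total back onto its rows.
df["new_col"] = df.groupby(["company", window_start])["value"].transform("sum")
print(df)
```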

Compare 2 csv files and remove the common lines from 1st file | python

I want to compare 2 CSV files, master.csv and exclude.csv, remove all the matching lines based on column1, and write the final output to the master.csv file. master
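A minimal pandas sketch of this anti-join, with the two files replaced by inline stand-ins: keep the master rows whose column1 value does not appear in exclude. Writing result back with to_csv('master.csv', index=False) would overwrite the original file.

```python
import io

import pandas as pd

# Inline stand-ins for master.csv and exclude.csv.
master = pd.read_csv(io.StringIO("column1,val\na,1\nb,2\nc,3\n"))
exclude = pd.read_csv(io.StringIO("column1\nb\n"))

# Keep only master rows whose column1 value is absent from exclude.
result = master[~master["column1"].isin(exclude["column1"])]
print(result)
```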

Apply a weighted decay that changes over time in Python

I have a dataframe in Python that looks like the one below: I want to calculate the dnf_rate_weighted so that there's a 0.95 decay for each stage going back th
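The table in the question did not survive, so the column name dnf below is hypothetical. If the intent is a weighted average where an observation k stages back gets weight 0.95**k, that is what ewm(alpha=0.05, adjust=True).mean() computes, since its weights are (1 - alpha)**k normalized by their sum.

```python
import pandas as pd

df = pd.DataFrame({"dnf": [1.0, 0.0, 0.0, 1.0]})  # hypothetical per-stage values

# Weight for the value k stages back is (1 - 0.05) ** k = 0.95 ** k;
# adjust=True divides by the sum of the weights used so far.
df["dnf_rate_weighted"] = df["dnf"].ewm(alpha=0.05, adjust=True).mean()
print(df)
```

To exclude the current stage from its own rate, shift the result by one; a per-group version can wrap the same call in groupby(...).transform(...).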

How to create a Dataframe from multiple dictionaries

I have a small issue with the data I have (multiple dictionaries) that I need to process into a DataFrame. This is what the data looks like: print(data) 0
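The printed data is cut off, so the records below are hypothetical. If the data is a list of dictionaries, passing the list to pd.DataFrame makes one row per dictionary and fills missing keys with NaN; a dict of dicts can instead go through pd.DataFrame.from_dict(data, orient='index').

```python
import pandas as pd

# Hypothetical records; the real data's keys will differ.
records = [
    {"name": "a", "score": 1},
    {"name": "b", "score": 2, "extra": True},
]

# Each dict becomes a row; keys missing from a dict become NaN.
df = pd.DataFrame(records)
print(df)
```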

Easiest way to ignore or drop one header row from first page, when parsing table spanning several pages

I am parsing a PDF with tabula-py, and I need to ignore the first two tables, but then parse the rest of the tables as one, and export to a CSV. On the first re
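A pandas-only sketch of the post-processing step, assuming tabula.read_pdf(..., pages='all') has already returned a list of DataFrames. The tables below are hypothetical stand-ins. The sketch skips the first two tables, stacks the rest, and drops any row that just repeats the column names.

```python
import pandas as pd

# Hypothetical stand-ins for the list returned by tabula.read_pdf(..., pages="all").
tables = [
    pd.DataFrame({"x": ["intro"]}),
    pd.DataFrame({"x": ["summary"]}),
    pd.DataFrame({"Item": ["Item", "apple"], "Qty": ["Qty", "1"]}),
    pd.DataFrame({"Item": ["pear"], "Qty": ["2"]}),
]

# Skip the first two tables and stack the remaining ones.
combined = pd.concat(tables[2:], ignore_index=True)

# A row is a repeated header if every cell equals its column's name.
header_row = pd.Series(combined.columns, index=combined.columns)
is_header = combined.eq(header_row).all(axis=1)
combined = combined[~is_header].reset_index(drop=True)
print(combined)
```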