Category "pandas"

Why is there an extra row of zeros in the histogram of images in a folder?

I have a folder comprising 20 images (.jpg format). I am trying to obtain the histogram of each of the images and store it as a Pandas data frame. My code is sh
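A minimal sketch of the setup described above, assuming grayscale histograms via Pillow and NumPy (the folder path and the 256-bin choice are placeholders). Building the rows in a list and creating the DataFrame once at the end avoids the stray pre-allocated zero row that can appear when a frame is initialized before appending:

```python
import numpy as np
import pandas as pd
from pathlib import Path
from PIL import Image

def folder_histograms(folder, bins=256):
    """One histogram row per image; the frame is built once at the end."""
    rows = []
    for path in sorted(Path(folder).glob("*.jpg")):
        pixels = np.asarray(Image.open(path).convert("L"))        # grayscale values 0..255
        hist, _ = np.histogram(pixels, bins=bins, range=(0, 256))
        rows.append(pd.Series(hist, name=path.name))
    return pd.DataFrame(rows)

# df = folder_histograms("images/")   # 20 rows x 256 columns for the folder described
```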

Python pandas - series to dataframe

Given a Series whose index is country names, how do I print only the country names that also exist in the dataframe?
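A hedged sketch of one way to do this, assuming the dataframe has a 'country' column (the column name is an assumption):

```python
import pandas as pd

# Hypothetical inputs: a Series indexed by country name and a dataframe
# with a 'country' column.
s = pd.Series([1.2, 3.4, 5.6], index=["France", "Brazil", "Japan"])
df = pd.DataFrame({"country": ["Brazil", "Japan", "Kenya"]})

# Keep only the index labels that also appear in the dataframe column.
present = s.index[s.index.isin(df["country"])]
print(list(present))          # ['Brazil', 'Japan']
```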

Extract nested values from data frame using python

I've extracted the data from an API response and created a dictionary function: def data_from_api(a): dictionary = dict(data=a['number'], created_b
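The function above is cut off; a sketch of one common way to flatten nested API records, using pd.json_normalize with placeholder field names that mirror the snippet:

```python
import pandas as pd

# Hypothetical API payload; the field names ('number', 'created_by', ...)
# are placeholders, not the asker's actual schema.
records = [
    {"number": 101, "created_by": {"name": "alice", "id": 1}},
    {"number": 102, "created_by": {"name": "bob", "id": 2}},
]

# json_normalize flattens the nested dicts into dotted column names.
df = pd.json_normalize(records)
print(df.columns.tolist())    # ['number', 'created_by.name', 'created_by.id']
```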

Replace null values by the mean of each group

I have a dataset similar to the one below with several columns that contain NaN values. I would like to group the dataset by location and fill the NaN in Iso code and
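A minimal sketch of the usual groupby/transform pattern, with placeholder column names 'location' and 'value':

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "location": ["A", "A", "A", "B", "B"],
    "value":    [1.0, np.nan, 3.0, 10.0, np.nan],
})

# transform('mean') broadcasts each group's mean back to the original rows,
# so fillna only touches the NaNs within their own group.
df["value"] = df["value"].fillna(df.groupby("location")["value"].transform("mean"))
print(df)   # the A-group NaN becomes 2.0, the B-group NaN becomes 10.0
```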

Pandas: store a JSON object in a dataframe column

I have a pandas dataframe like this data = {"Name": ["Tom", "nick", "kish", "jack"], "Age": [20, 21, 19, 18]}
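Two hedged options for attaching JSON to a frame like this, either as raw dicts or as serialized strings (the 'meta' column is invented for illustration):

```python
import json
import pandas as pd

data = {"Name": ["Tom", "nick", "kish", "jack"], "Age": [20, 21, 19, 18]}
df = pd.DataFrame(data)

# Option 1: keep Python dicts in an object column.
df["meta"] = [{"verified": True}, {"verified": False}, {}, {}]

# Option 2: serialize to a JSON string, which survives CSV round-trips better.
df["meta_json"] = df["meta"].apply(json.dumps)
print(df.loc[0, "meta_json"])   # {"verified": true}
```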

Remove zeros from Dataframe of lists

I have such a DataFrame: index B 0 [0,1,2,0,4] 1 [1,0,2,0,0,1,7] I want to count the non-zero values of each list for each row. Result: index B 0 3 1 4
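A short sketch of the per-row count, using apply over the object column:

```python
import pandas as pd

df = pd.DataFrame({"B": [[0, 1, 2, 0, 4], [1, 0, 2, 0, 0, 1, 7]]})

# Count the non-zero entries of each list; apply works row by row on the
# object column.
df["B"] = df["B"].apply(lambda lst: sum(x != 0 for x in lst))
print(df)   # row 0 -> 3, row 1 -> 4
```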

LabelEncoding a permutation of column combinations

I'd like to create class labels for a permutation of two columns using sklearn's LabelEncoder(). How do I achieve the following behavior? import pandas as pd im
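One hedged approach: join the two columns into a single key and encode that (column names are placeholders; if the order of the pair should not matter, sort the pair before joining):

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical two-column frame.
df = pd.DataFrame({"col1": ["a", "a", "b", "b"], "col2": ["x", "y", "x", "y"]})

# Encode the ordered pair (col1, col2): each distinct combination gets a label.
key = df["col1"].astype(str) + "_" + df["col2"].astype(str)
df["label"] = LabelEncoder().fit_transform(key)
print(df)   # four distinct pairs -> labels 0..3
```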

(Pandas, Python) Selecting indices of a parent DF based on shared column values with a child DF

(I recently asked this question on r/learnpython (here), but didn't get any feedback, so am re-posting it verbatim here. Hope that is okay!) Suppose I have a D
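A sketch of the usual isin-based selection, assuming the shared column is called 'key' (an assumption, since the frames in the question are cut off):

```python
import pandas as pd

# Hypothetical parent/child frames sharing a 'key' column.
parent = pd.DataFrame({"key": ["a", "b", "c", "d"], "val": [1, 2, 3, 4]})
child = pd.DataFrame({"key": ["b", "d"]})

# isin gives a boolean mask over the parent; .index recovers the row labels.
shared_idx = parent.index[parent["key"].isin(child["key"])]
print(list(shared_idx))   # [1, 3]
```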

Python rank: give negative rank to negative numbers

I have a basic set of data like: ID Value A 0.1 B 0.2 C -0.1 D -0.01 E 0.15 If we use data.rank() we get the result: ID Value A 3 B 5 C 1 D 2 E 4 Bu
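One reading of the requested behavior, sketched below: rank the positives among themselves and give the negatives a negative rank by magnitude (so the value closest to zero gets -1):

```python
import pandas as pd

df = pd.DataFrame({"ID": list("ABCDE"), "Value": [0.1, 0.2, -0.1, -0.01, 0.15]})

pos = df["Value"] > 0
rank = pd.Series(index=df.index, dtype=float)
rank[pos] = df.loc[pos, "Value"].rank()            # positives ranked 1, 2, ...
rank[~pos] = -df.loc[~pos, "Value"].abs().rank()   # negatives ranked -1, -2, ... by magnitude
df["Rank"] = rank
print(df)   # A:1, B:3, C:-2, D:-1, E:2
```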

Filter column list based on another column in Python

In Python, I have a dataset like this below, where column1 and column2 are objects and not strings: data = {'id': ['first_value', 'first_value', 'second_value'

Xarray: grouping by contiguous identical values

In Pandas, it is simple to slice a series (or array) such as [1,1,1,1,2,2,1,1,1,1] to return groups of [1,1,1,1], [2,2], [1,1,1,1]. To do this, I use the syntax:
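The pandas idiom alluded to is most likely the shift/cumsum trick; a minimal sketch (the xarray half of the question would need its own answer):

```python
import pandas as pd

s = pd.Series([1, 1, 1, 1, 2, 2, 1, 1, 1, 1])

# A new group id starts wherever the value differs from the previous one;
# cumsum turns those change points into consecutive group labels.
group_id = (s != s.shift()).cumsum()
for _, chunk in s.groupby(group_id):
    print(chunk.tolist())   # [1, 1, 1, 1] / [2, 2] / [1, 1, 1, 1]
```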

What could be wrong with a Pandas DataFrame?

I couldn't make head or tail of this: I have a function that reads a bunch of csv files from an S3 bucket, concats them and returns the DataFrame: def create_df(
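The create_df shown above is cut off; a hedged sketch of the read-and-concat step it describes (the bucket paths are placeholders, and s3:// URLs require s3fs to be installed):

```python
import pandas as pd

def create_df(paths):
    """Read each CSV and stack them into one frame with a fresh integer index."""
    frames = [pd.read_csv(p) for p in paths]
    return pd.concat(frames, ignore_index=True)

# df = create_df(["s3://my-bucket/part-0.csv", "s3://my-bucket/part-1.csv"])
```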

I want to make URLs

1. The link is "https://www.xyz.{country}/dp/{asin}" 2. I have to pick two things from the csv file: country and asin. The CSV file contains: Asin Country 0
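A minimal sketch, assuming the CSV yields 'Asin' and 'Country' columns as listed:

```python
import pandas as pd

# Hypothetical CSV contents; in practice this would come from pd.read_csv(...).
df = pd.DataFrame({"Asin": ["B0001", "B0002"], "Country": ["de", "fr"]})

# Vectorized string concatenation; an f-string per row via apply also works.
df["url"] = "https://www.xyz." + df["Country"] + "/dp/" + df["Asin"]
print(df["url"].tolist())
# ['https://www.xyz.de/dp/B0001', 'https://www.xyz.fr/dp/B0002']
```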

Most efficient way to transform this data using Pandas?

I currently have several hundred .csv files in the format shown on the left below, and I need to transform them all into the format shown on the right. I tried

Retrieving values based on other values (dataframe) - how to make my code more efficient?

So after much trying I've managed to get something a bit closer to what I intend to do. The scenario is as follows: a dataframe with many columns, one of which conta

How can I plot specific Excel data from two columns with conditions?

I have a huge spreadsheet of data that looks something like this: Date IDNumber Item 2021-05-10 1 Apple 2021-05-10 1 Orange 2021-05-10 2 Apple 2021-05-10 2 Gra
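A hedged sketch of one possible condition (counting distinct IDs per date for a single item); the file name and the chosen filter are placeholders, not the asker's actual requirement:

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_excel("data.xlsx")              # assumed columns: Date, IDNumber, Item

# Example condition: how many distinct IDs bought apples on each date.
apples = df[df["Item"] == "Apple"]
counts = apples.groupby("Date")["IDNumber"].nunique()

ax = counts.plot(kind="bar")
ax.set_ylabel("customers buying Apple")
plt.show()
```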

Sum of list values in a df, new column, values are objects

I have a df made of values from a dictionary. I can get rid of the [] and ',' and split it all into different cols (one col per number), but I can't make the transfer to f
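Assuming the cells are string-encoded lists of numbers (an assumption, since the frame is cut off), a sketch of parsing and summing them without any manual bracket stripping:

```python
import ast
import pandas as pd

# Hypothetical frame where each cell is a string that looks like a list.
df = pd.DataFrame({"values": ["[1, 2, 3]", "[4.5, 0.5]"]})

# ast.literal_eval parses the string back into a real list, then the
# elements can be cast to float and summed into a new column.
df["total"] = df["values"].apply(lambda s: sum(float(x) for x in ast.literal_eval(s)))
print(df)   # totals 6.0 and 5.0
```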

Take the mean of several yearly dataframes, hour by hour

I have several dataframes of some value taken every hour, over several years, like this: df1 Out[6]: time P G(i) H_sun T2m WS10m Int
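A sketch of one way to average the same calendar hour across years, assuming a datetime 'time' column and a value column 'P' as in the excerpt (both names taken from the snippet):

```python
import pandas as pd

def hourly_climatology(frames, value_col="P"):
    """Average the same calendar hour (month, day, hour) across all years."""
    df = pd.concat(frames, ignore_index=True)
    t = pd.to_datetime(df["time"])
    return df.groupby([t.dt.month, t.dt.day, t.dt.hour])[value_col].mean()

# mean_by_hour = hourly_climatology([df1, df2, df3])
```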

How to compute the mean value of each column variable and fill this mean value into the corresponding variable in the dataframe? [duplicate]

I have a mining dataset with the following features: Rock_type and Gold in grams (AU). Rock_type has 8 different rock types and Gold (AU) has pr
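A minimal sketch of the groupby/transform pattern for this, using only the two features named in the question:

```python
import pandas as pd

df = pd.DataFrame({
    "Rock_type": ["granite", "granite", "basalt", "basalt"],
    "AU":        [1.2, 1.8, 0.4, 0.6],
})

# transform('mean') returns one value per original row, so the group mean
# can be stored alongside (or in place of) the raw measurement.
df["AU_mean"] = df.groupby("Rock_type")["AU"].transform("mean")
print(df)   # granite rows get 1.5, basalt rows get 0.5
```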

Iterating through XMLs, making dataframes from nodes and merging them with a master dataframe. How should I optimize this code?

I'm trying to iterate through a lot of xml files that have ~1000 individual nodes that I want to iterate through to extract specific attributes (each node has 1
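A hedged sketch of the usual optimization for this pattern: collect plain dicts in a list and build the DataFrame once at the end, instead of appending or merging inside the loop (the tag name and glob pattern are placeholders):

```python
import glob
import xml.etree.ElementTree as ET
import pandas as pd

def extract_attributes(pattern="data/*.xml", tag="record"):
    """Gather the attributes of every matching node into dicts, then
    construct the frame once; far cheaper than per-node merges."""
    rows = []
    for path in glob.glob(pattern):
        root = ET.parse(path).getroot()
        for node in root.iter(tag):
            rows.append(dict(node.attrib))   # one dict per node
    return pd.DataFrame(rows)

# master_df = extract_attributes()
```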