Category "pandas"

pandas - replace a key's value of a dictionary column with another column

In pandas, I have 2 columns, one of which is a dictionary and the other is a numerical column. When the dictionary column is not null, is there a time efficient

how to use Google Cloud Translate API for translating bulk data?

I have a CSV file of several thousand rows in multiple languages and I am thinking of using the Google Cloud Translate API to translate foreign language text in

Seaborn boxplot for classification with pandas wide to long [duplicate]

I have data that I would like to train an ML classifier on. The data is in wide format. I'd like to do a boxplot with seaborn sns.boxplot(x='

python random_sample to generate values

I am currently using random_sample to generate weightage allocation for 3 stocks where each row values add up to 1 and I rounded them to 2dp. weightage=[] n = 0

Create new column using key-value pairs from a dataframe column

I have a data frame with many columns. One of the columns is named 'attributes' and contains a list of dictionaries with keys and values. I want to extract each ke

Remove element from a list based on condition in pandas dataframe

a= {'A' : [1, 2,3,4], 'B' : ['FOOTBALL','BASKETBALL','HANDBALL','VOLLEYBALL'], 'C' : [[5,10,15,40],[1,4],[20,10,40],[10,40]] } How can I remove the element 40

Applying own functions

I am trying to apply my own function. Below you can see the data and function. import pandas as pd import numpy as np data_test = { 'sales_201

Within a pandas DF, how can I snag last two parts of a list as a single string for conditional output?

I'm doing some modification to a CSV via pandas. For one of the situations, I want to parse a URL into a list, grab the last two items of that list, and out

Python Pandas Geopy AttributeError 'NoneType' object has no attribute 'raw', getting city, state and country from long/lat

I've looked around for a solution and tried filtering my df to where the longitude and latitude are not null but to no avail. This is my first time using geopy

String-join pandas dataframe columns and skip NaN values

I'm trying to join column values into new column but I want to skip nan values: df['col'] = 'df['col1'].map(str) + ',' + df['col2'].map(str) + ',' + df['col3'].

How can I pivot a dataframe?

What is pivot? How do I pivot? Is this a pivot? Long format to wide format? I've seen a lot of questions that ask about pivot tables. Even if they don't know t

Output 2D array to a Matrix as a CSV - Python

I have a 2D array with vectorised rows with each row representing a document in the corpus: array[[ 0.0 0.0 0.4583 0.6584 0.0] ...

How to bring data frame into single column from multiple columns in python

I have data in the following multiple columns. So I want to bring all 4 columns of data into a single column. YEAR Month pcp1 pcp2 pcp3 pcp4 1984

Exclude Japanese Stopwords from File

I am trying to remove Japanese stopwords from a text corpus from twitter. Unfortunately the frequently used nltk does not contain Japanese, so I had to figure o

How to color only parts of a pandas dataframe's columns?

I have a pandas dataframe with a continuous sequence of ones and zeroes, as follows: import numpy as np import pandas as pd m = np.array([[1, 1, 1, 1], [1, 1, 1, 0

Date and time conversion using Pandas & Python to create Ts

I'm trying to get Ts using my existing data of date and time, which looks like (Pdb) df[0][:7] 0 [Data & Time] 1 Jan 01 08:00:01.193 2 Jan 01 08

pandas - how to access the value of next 16 rows as a list of 16 numbers

Say I have just 2 columns in pandas. Column 1 has all numerical values and column 2 has values only at every 16th position (so column 2 has value at index 0

Extracting rows from list in data frame where at max numbers [duplicate]

So I've been given a pandas data frame and created a definition for the maximum variable in one column. max_energy = D202['USAGE'].max() max_e

How can I plot a pandas dataframe where x = month and y = frequency of text?

I have the following dataset: Date ID Fruit 2021-2-2 1 Apple 2021-2-2 1 Pear 2021-2-2 1 Apple 2021-2-2 2 Pear 2021-2-2 2 Pear 2021-2-2 2 Apple 2021-3-2 3 Apple

openpyxl ImportError in Airflow docker when using pd.read_excel()

When using pandas pd.read_excel() in an Airflow task inside a container I get the openpyxl error below. I tried installing openpyxl using poetry and even using