Category "pandas"

importing data from csv - could not convert string to float

I am having difficulty importing some data from a CSV file. Input from the CSV file (extract): Speed;A [rpm];[N.m] 700;-72,556 800;-58,9103 900;-73,1678
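
A minimal sketch. It assumes the extract is a header row (`Speed;A`), then a units row (`[rpm];[N.m]`), then semicolon-separated data rows that use a comma as the decimal mark. `read_csv` accepts `sep` and `decimal` for this case, and skipping the units row lets both columns parse as numbers. The inline string stands in for the real file.

```python
import io

import pandas as pd

# Assumed layout: header row, units row, then ";"-separated rows with "," decimals.
raw = "Speed;A\n[rpm];[N.m]\n700;-72,556\n800;-58,9103\n900;-73,1678\n"

df = pd.read_csv(
    io.StringIO(raw),  # replace with the path to the real file
    sep=";",           # fields are separated by semicolons
    decimal=",",       # "-72,556" is parsed as -72.556
    skiprows=[1],      # skip the units row (0-based line 1) so columns stay numeric
)
print(df.dtypes)
```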

convert float64 (from excel import) to str using pandas

Although the same question has been asked multiple times, I don't seem to be able to make it work. I use Python 3.8 and I read an Excel file like this: df = pd.read_excel(r"
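
A possible sketch. It assumes the goal is a text column without the trailing `.0` that float formatting adds. The column name `code` and the sample values are hypothetical, since the original snippet is cut off.

```python
import pandas as pd

# Hypothetical float column as read_excel might return it, including an empty cell.
df = pd.DataFrame({"code": [12345.0, 678.5, None]})

# df["code"].astype(str) would keep float formatting ("12345.0") and may turn the
# empty cell into the string "nan". Formatting each value explicitly drops the ".0"
# from whole numbers, and na_action="ignore" leaves missing cells missing.
df["code_str"] = df["code"].map(
    lambda v: str(int(v)) if v.is_integer() else str(v),
    na_action="ignore",
)
```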

Filter Pandas Dataframe based on List of substrings

I have a pandas DataFrame containing multiple columns of strings. I would now like to check a certain column against a list of allowed substrings and then get a new su
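
One common approach, sketched with a hypothetical column `name` and list `allowed`: join the substrings into a single regular expression and pass it to `Series.str.contains`. `na=False` treats missing cells as non-matching.

```python
import re

import pandas as pd

df = pd.DataFrame({"name": ["apple pie", "banana split", "cherry tart", "plain bread"]})
allowed = ["apple", "cherry"]  # hypothetical list of allowed substrings

# One alternation pattern; re.escape stops characters such as "." or "+"
# in the substrings from being read as regex syntax.
pattern = "|".join(map(re.escape, allowed))
subset = df[df["name"].str.contains(pattern, na=False)]
```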

How to extract the query result from a Hive job output logs using DataprocHiveOperator?

I am trying to build a data migration pipeline using Airflow, source being a Hive table on a Dataproc cluster and the destination is BigQuery. I'm using Datapro

Deleting multiple rows under same App Name but with different number of reviews

I have a DataFrame with many columns, two of which are 'App' and 'Reviews'. I discovered that the same app appears in multiple rows because they differ in t
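
A sketch that assumes the row to keep for each app is the one with the most reviews, since the question is cut off before saying which. Sorting by `Reviews` in descending order and then calling `drop_duplicates` on `App` keeps the first row per app, which is the largest.

```python
import pandas as pd

df = pd.DataFrame({
    "App": ["Maps", "Maps", "Chat", "Chat", "Notes"],
    "Reviews": [100, 250, 40, 30, 7],
})

# Assumption: keep, for each app, the row with the highest review count.
deduped = (
    df.sort_values("Reviews", ascending=False)
      .drop_duplicates(subset="App", keep="first")
      .sort_index()  # restore the original row order
)
```

`df.loc[df.groupby("App")["Reviews"].idxmax()]` selects the same rows. When two rows tie on `Reviews`, each approach keeps only one of them.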

How to create dummy variable for specific values in a column?

I want to create a dummy variable for a specific value in a column. Let's say my database looks like this: I want a dummy variable just for the museums. pd.ge
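
A minimal sketch with a hypothetical column `place`. Comparing the column against the one value of interest gives a boolean Series, and `astype(int)` turns it into a 0/1 dummy. This avoids creating a column for every category.

```python
import pandas as pd

# Hypothetical data; the real column names are not shown in the question.
df = pd.DataFrame({"place": ["museum", "park", "museum", "cafe"]})

# A single 0/1 indicator for one category only.
df["is_museum"] = (df["place"] == "museum").astype(int)
```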

Pandas combining slices and list to select columns

Let us assume that a DataFrame df has the following columns: ['c1', 'c2', 'c3', 'c4', 'c5', 'c6', 'c7'] We can use a slice or a list to select some columns: Wit
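
Two sketches using the column names from the question. For positional selection, `np.r_` builds one index array out of slices and scalars. For label-based selection, the label slice can be expanded to a list and then extended.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([list(range(7))], columns=["c1", "c2", "c3", "c4", "c5", "c6", "c7"])

# Positional: np.r_[0:3, 5] is the array [0, 1, 2, 5].
by_position = df.iloc[:, np.r_[0:3, 5]]

# Label-based: expand the label slice to a list, then append the extra label.
cols = df.loc[:, "c1":"c3"].columns.tolist() + ["c6"]
by_label = df[cols]
```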

Perform a merge by date field without creating an auxiliary column in the DataFrame

Given the following DataFrames in Python pandas: | date | counter | |-----------------------------|------------------| | 2022-01-0

iterating different length arrays and replace values

I have a dataframe that looks like this: df = pd.DataFrame({'col1': [[[1,5,3],[0,0,0]], [[1,2,3],[0,0,0], [1,2,3]]]}) # which looks like this: col1 0 [[1

How to plot distribution of missing values in a dataframe

I have a DataFrame with hundreds of columns and would like to investigate the proportion of missing values by plotting a graph. I'm able to get the proportion using
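
A sketch using a small made-up frame. `isna().mean()` gives the fraction of missing values per column. With hundreds of columns, one bar per column gets crowded, so a histogram of those fractions is shown as a second option. The `Agg` backend is only there so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1, np.nan, 3, 4],
    "b": [np.nan, np.nan, 3, 4],
    "c": [1, 2, 3, 4],
})

# Fraction of missing values per column: isna() gives booleans, mean() averages them.
missing = df.isna().mean().sort_values(ascending=False)

# Option 1: one bar per column, readable for a few dozen columns.
ax = missing.plot.bar(figsize=(8, 3))
ax.set_ylabel("fraction missing")
plt.tight_layout()

# Option 2: for hundreds of columns, show how the fractions are distributed.
fig, ax2 = plt.subplots()
ax2.hist(missing, bins=20)
ax2.set_xlabel("fraction of values missing")
ax2.set_ylabel("number of columns")
```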

removing columns with pandas from csv - not found in axis

I'm trying to remove one column from a .csv file but I'm receiving an error. import pandas as pd df.drop("First Invoice #", axis = 1, inplace= True) KeyError: "['First
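
A `KeyError` from `drop` means that no column carries exactly that label. This sketch assumes one frequent cause: stray whitespace in the header. Another is a wrong separator, which collapses the whole header into a single column. Printing `df.columns` shows which case applies. The CSV text below is made up.

```python
import io

import pandas as pd

# Made-up stand-in for the real file; note the space before "First Invoice #".
raw = "Customer, First Invoice #,Total\nA,1001,50\nB,1002,75\n"
df = pd.read_csv(io.StringIO(raw))

# Inspect the exact labels; here the middle one is " First Invoice #".
print(df.columns.tolist())

# Normalise the labels, then drop the column by name.
df.columns = df.columns.str.strip()
df = df.drop(columns="First Invoice #")
```

For spaces that follow the separator, `pd.read_csv(..., skipinitialspace=True)` is an alternative to stripping the labels afterwards.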

Concat null columns data with actual data in pandas?

I have a set of columns that need to be merged into a single column. Some columns have data and some don't, and the data should be joined into a single co
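
A sketch that assumes the goal is to join the non-null values of each row with a space. The separator and the column names are hypothetical. Dropping nulls row by row before joining keeps `nan` out of the result.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "part1": ["a", np.nan, "c"],
    "part2": [np.nan, "y", "z"],
    "part3": ["x", np.nan, np.nan],
})
cols = ["part1", "part2", "part3"]  # hypothetical column names

# Join only the non-null values of each row, so missing cells leave no trace.
df["merged"] = df[cols].apply(lambda row: " ".join(row.dropna().astype(str)), axis=1)
```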

pandas, creating dataframes based on tuple

I have a tuple that has data for several categories. Now I want to extract small dataframes from this tuple for each category based on a list I created. I want
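
The structure of the tuple isn't shown, so this sketch assumes it holds `(category, item, value)` records. Loading it into one DataFrame and filtering by category yields a dict of small DataFrames keyed by the names in the list.

```python
import pandas as pd

# Hypothetical layout: each tuple element is a (category, item, value) record.
data = (
    ("fruit", "apple", 3),
    ("veg", "carrot", 5),
    ("fruit", "pear", 2),
    ("dairy", "milk", 1),
)
wanted = ["fruit", "veg"]  # the list of categories to extract

# Load everything once, then slice out one DataFrame per wanted category.
df = pd.DataFrame(list(data), columns=["category", "item", "value"])
frames = {
    cat: df[df["category"] == cat].reset_index(drop=True)
    for cat in wanted
}
```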

How to plot correlation matrix/heatmap with categorical and numerical variables

I have 4 variables of which 2 variables are nominal (dtype=object) and 2 are numeric(dtypes=int and float). df.head(1) OUT: OS_type|Week_day|clicks|avg_app_s
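
One option, sketched with hypothetical rows for `OS_type`, `Week_day` and `clicks`. The nominal columns are one-hot encoded with `pd.get_dummies` so that `DataFrame.corr` can handle every column. Pearson correlation on 0/1 columns is only one choice. Measures such as Cramér's V are often used for pairs of nominal variables.

```python
import pandas as pd

# Hypothetical rows using three of the column names from the question.
df = pd.DataFrame({
    "OS_type": ["iOS", "Android", "iOS", "Android", "iOS"],
    "Week_day": ["Mon", "Tue", "Mon", "Wed", "Tue"],
    "clicks": [10, 3, 12, 4, 7],
})

# One-hot encode the nominal columns as floats, then correlate all columns.
encoded = pd.get_dummies(df, columns=["OS_type", "Week_day"], dtype=float)
corr = encoded.corr()
```

The resulting `corr` matrix can then be drawn as a heatmap, for example with seaborn's `heatmap` or matplotlib's `imshow`.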

Sum of different slices rows and column

I have a pandas DataFrame df and three arrays, columns_list, lower_boarder and upper_boarder, all of the same shape. I want to find an array with the same shape as the input arra
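
A sketch that assumes each position in the three arrays describes one slice: sum column `columns_list[i]` over the rows from `lower_boarder[i]` up to, but not including, `upper_boarder[i]`. Both the exclusive upper bound and the sample arrays are assumptions. `np.ndindex` lets the same loop handle arrays of any shape.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [10, 20, 30, 40]})

# Hypothetical inputs of identical shape: column name, start row, stop row.
columns_list = np.array([["a", "b"], ["b", "a"]])
lower_boarder = np.array([[0, 1], [2, 0]])
upper_boarder = np.array([[2, 3], [4, 4]])

# Assumption: bounds are positional and the upper bound is exclusive, like a Python slice.
result = np.empty(columns_list.shape, dtype=float)
for idx in np.ndindex(columns_list.shape):
    col = columns_list[idx]
    result[idx] = df[col].iloc[lower_boarder[idx]:upper_boarder[idx]].sum()
```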

FastAPI - Dataframe updated change lost between route

I'm trying to make a simple FastAPI API. Let's suppose these routes: The POST Route @api.post('/user', name='Get list of users') def get_user(user: User):
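
The route code is cut off, so this sketch is only a guess at one common cause. Plain functions stand in for the route handlers, and no FastAPI calls are used. `pd.concat` returns a new DataFrame, so assigning it to a local name loses the change, and the module-level name has to be rebound explicitly. Separately, state kept in process memory is not shared between multiple worker processes.

```python
import pandas as pd

# Module-level table shared by the (hypothetical) handlers.
users = pd.DataFrame({"name": ["alice"]})


def add_user_lost(df: pd.DataFrame, name: str) -> None:
    # pd.concat builds a new object; rebinding the parameter leaves the caller's df unchanged.
    df = pd.concat([df, pd.DataFrame({"name": [name]})], ignore_index=True)


def add_user(name: str) -> None:
    # Rebind the module-level name so later calls see the new row.
    global users
    users = pd.concat([users, pd.DataFrame({"name": [name]})], ignore_index=True)


def list_users() -> list:
    return users["name"].tolist()


add_user_lost(users, "bob")  # this change is lost
add_user("carol")            # this change persists
```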

Pandas cannot open an Excel (.xlsx) file

Please see my code below: import pandas df = pandas.read_excel('cat.xlsx') After running that, it gives me the following error: Traceback (most recent call las

Count occurrences within a specific range

I have a data frame that looks like this: Tag 0 skip_1 1 run 2 skip_1 3 run 4 skip_1 5