Category "dataframe"

Selecting a subset of a dataframe based on a list - pandas

I am working with a large dataframe (ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/bacteria/assembly_summary.txt) with pandas in Python 3, using PyCharm. The column
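A minimal sketch of list-based row selection with `Series.isin`. The excerpt is cut off before it names the column, so the column names and the small stand-in frame below are assumptions, not the asker's actual data.

```python
import pandas as pd

# Hypothetical stand-in for the assembly summary table; the real file has many more columns.
df = pd.DataFrame({
    "organism_name": ["Escherichia coli", "Bacillus subtilis", "Vibrio cholerae"],
    "assembly_level": ["Complete Genome", "Scaffold", "Contig"],
})

# The list of values to keep (assumed for illustration).
wanted = ["Escherichia coli", "Vibrio cholerae"]

# isin builds a boolean mask that is True where the column value is in the list.
subset = df[df["organism_name"].isin(wanted)]
print(subset)
```

Prefixing the mask with `~` would select the rows whose value is not in the list.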

Iterate over a df and multiply the values by the values of another df

My df1 looks like this: It contains 3 unique project ids. The date starts on 01-01-22 and ends on 01-12-28. id date p50 p90 apv1 01-01-22 1000 1000 apv2 01-01-22 1

How to use pivot_longer in this case?

I have the following data frame: df =structure(list(Country = c("DE", "DE", "DE", "DE", "DE", "DE", "DE", "DE", "DE", "DE", "DE", "DE", "DE", "DE", "DE", "DE",

ValueError: shapes and not aligned: (dim 2) != 4 (dim 0)

I am currently working on a script that does some array manipulation and calculations for modeling. I am running into an error and am unsure how to solve it. from

Find differences between a set of csv files in folder 1 against a set of csv files in folder 2?

There are a number of files that need to be compared for differences in their rows; difference not as in subtraction but as in what values are different for each

Dropping NA-values with a MAX threshold and not MIN threshold

Is there any way to remove columns from a dataframe that have FEWER NA values than, for instance, 200? So instead of df.dropna(thresh=200) we want the opposite
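One way to express this "opposite" filter is to count the NA values per column and keep only the columns that reach the cutoff. This is a sketch under assumptions: the tiny frame and the `max_na` cutoff of 2 (standing in for 200) are made up for illustration. Note that the `thresh` argument of `dropna` counts non-NA values, so it does not invert directly.

```python
import numpy as np
import pandas as pd

# Hypothetical example data; column "a" has 3 NAs, "b" has 1, "c" has 2.
df = pd.DataFrame({
    "a": [1.0, np.nan, np.nan, np.nan],
    "b": [1.0, 2.0, 3.0, np.nan],
    "c": [np.nan, np.nan, 3.0, 4.0],
})

max_na = 2  # assumed cutoff; the question uses 200

# Number of NA values in each column.
na_counts = df.isna().sum()

# Drop the columns with fewer than max_na NAs, i.e. keep those with at least max_na.
result = df.loc[:, na_counts >= max_na]
print(list(result.columns))
```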

Why can one column of the pandas DataFrame not be filled?

I'm having some problems iteratively filling a pandas DataFrame with two different types of values. As a simple example, please consider the following initializ

How to fill na values of a column by checking another column

This image would help better: The column titled passengerId describes the group number and person number, people in the same group are usually families, hence

Trouble doing a VLOOKUP-like lookup with two pandas dataframes

I've read a lot of questions regarding this matter, but none of them solved my problem. I have 2 dataframes, one containing a list of all students of graduation l

Remove non-increasing rows based on other columns' values

I have a data frame in R and I want to remove all rows that are not increasing in my column 3. Each row has to be greater than or equal to the previous one. But m

Concat multiple dataframes and handle those that don't exist

I am trying to concat some dataframes (30 dataframes of 24h data) that are created automatically from CSV files, but sometimes a CSV doesn't exist, so the dataframe was

How to create a dataframe from a list of dictionary values?

I have a list - elements_listed = [{'data': {'data/2022/04/1': '26-Apr-2022 07:47', 'data/2022/04/2': '24-Apr-2022 17:27', 'data/2022/04/3': '22-Apr-2022 14:20'

Assign multiple columns different values based on conditions in a pandas dataframe

I have a dataframe where new columns need to be added based on conditions on existing column values, and I am looking for an efficient way of doing it. For example: df = pd.D

Finding and comparing unique values Grouped by Datetime Quarters python

I'm working with an extremely large dataset in a Pandas Dataframe. I'm now trying to understand on a quarterly basis: how many UNIQUE sellers have COMMENCED usi

Annotate bars with values on Pandas bar plots

I was looking for a way to annotate my bars in a Pandas bar plot with the rounded numerical values from my DataFrame. >>> df=pd.DataFrame({'A':np.rand

Keep getting "ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()."

data_df.loc[data_df['hotelID'] == sqlIDs[neededId] & to_integer(df.iloc[row, 6]) >= to_integer(MostRecent)] This is the snippet that keeps getting me th
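A likely cause, judging only from the snippet, is operator precedence: `&` binds more tightly than `==` and `>=`, so without parentheses the expression becomes a chained comparison and Python tries to take the truth value of a Series. The sketch below uses made-up data and the hypothetical names `wanted_id`, `most_recent`, and an `updated` column; it compares a whole column rather than the single `df.iloc[row, 6]` cell from the original.

```python
import pandas as pd

# Hypothetical data standing in for data_df.
data_df = pd.DataFrame({"hotelID": [1, 2, 1], "updated": [5, 7, 9]})
wanted_id = 1      # assumed stand-in for sqlIDs[neededId]
most_recent = 6    # assumed stand-in for to_integer(MostRecent)

# Without parentheses, `wanted_id & most_recent` is evaluated first and the
# rest becomes a chained comparison, which calls bool() on a Series.
try:
    data_df.loc[data_df["hotelID"] == wanted_id & most_recent >= most_recent]
    raised = False
except ValueError:
    raised = True

# Wrapping each comparison in parentheses yields two boolean Series
# that `&` can combine element-wise.
mask = (data_df["hotelID"] == wanted_id) & (data_df["updated"] >= most_recent)
selected = data_df.loc[mask]
print(selected)
```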

Best way to create a custom Transformer in Java Spark ML

I am learning big data using Apache Spark and I want to create a custom transformer for Spark ML so that I can execute some aggregate functions or can perform o

Drop same values in different columns by pair (drop connected components)

After applying the Levenshtein distance algorithm, I get a dataframe like this: Elemento_lista Item_ID Score idx ITEM_ID_Coincidencia 4 691776 100 5 691777 4 691776

Selecting data from a pandas DataFrame

I have defined a pandas DataFrame, given the number of rows (index) and columns. I perform a series of operations and store the data in that DataFrame. The code

DataFrame VWAP does not match TradingView

Not sure why I cannot get my DataFrame VWAP calculations to match the TradingView version at this link: https://www.tradingview.com/support/solutions/43000502018-volume-w