Category "dataframe"

Pandas: Create new column based on text values of other columns

My dataframe looks like this: id text labels 0 447 glutamine synthetase [protein] 1 447 GS

Dividing values in columns based on their previous marker

I have the following dataframe: df = {'id': [1,2,3,4], '1': ['Green', 'Green', 'Green', 'Green'], '2': ['34','67', 'Blue', '77'], '3': ['Blue', '45', '9

Map range from 2 columns based on overlapping range in another Pandas dataframe and sum values for same range

I have two datasets (df1 and df2) of values with a certain range (Start and End) in both of them. I would like to annotate the first one (df1) with values from

create dataframe from dictionary of datetime and int

I have a dictionary of datetime and int values like the one below. end_date = datetime.datetime.strptime("01-12-2020", "%d-%m-%Y") details = { datetime.datetime.strptime

How do I populate upper.tri of matrix with matched integers from the lower.tri?

Issue: I have a dataframe of familial relationships coded with integers, where R01 is the relationship of person N to person 1, R02 their relationship to person

How to reshape my dataset in specific way?

I have a dataset: name val a a1 a a2 b b1 b b2 b b3 c c1 I want to make all possible permutations "names" which are not

How to create new columns based on values of other columns that may contain numbers or NaN?

I have a few dataframes that I'm merging based on known, populated fields. The resulting dataframe will always contain a set of columns, but may or may not have

Add a new record for each missing row in a DataFrame with TimeStamp without replacing the original records

Given the following Pandas DataFrame: | date | counter | |-------------------------------------|------------------| | 2

Pandas - Take value n months before

I am working with datetime data. Is there any way to get the value from n months earlier? For example, the data looks like: dft = pd.DataFrame( np.random.randn(100, 1),

R incongruity when copying a column in R with ifelse

After loading lots of xlsx sheets of multiple workbooks, I want to create a double check of the tidiness and cleanliness of the data source. I created a data fr

Remove a specific character at the beginning of each line of a txt file using Python

I'm currently working on a script in python. I want to convert an xls file into a txt file but I also want to clean and manage the data. In the xls files, there

Need to add rows above the header of a DataFrame in pandas

I have a dataframe and need to add 8 rows above its header. I am sharing the dataframe and the desired output. Dataframe: Toll No. Vr.name

Most efficient way to search over a DataFrame in Python [duplicate]

I have a DataFrame with this kind of data: df = pd.DataFrame({ 'id' : ['a', 'a', 'b', 'b', 'c', 'c'], 'alias' : ['value'+str(i) fo

Create dataframe based on matching

I want to create a df in R with two variables that have different numbers of rows. This is an abstract example: I want to match a 3 to "Fail" (without writing i

How to conditionally assign values from another dataframe?

I want to merge 2 dataframes without using the function '.merge', and I am trying to assign a value to a dataframe column based on an interval and an id. intervals = p

PerformanceWarning: DataFrame is highly fragmented. How to convert it to a more efficient form via pd.concat with designated column names

I got the following warning while running under Python 3.8 with the newest pandas: PerformanceWarning: DataFrame is highly fragmented. This is the place where I c

Drop rows of a dataframe if consecutive rows have the same value

I am dealing with metered time series data that should not have the exact same value for more than n steps. I want to build a script that, given a threshold n,

Why do factors get coerced to a number subsetting a data frame?

I was trying to get the diagonal of the iris data set and wrote the following for loop: diagonal_list <- list() for (j in seq_len(ncol(iris))) { diagon

Unmanaged memory jamming cluster during dask's merge_asof method

I am trying to merge large dataframes using dask.dataframe.multi.merge_asof, but I am running into issues with accumulating unmanaged memory on the cluster. I h

How do I plot my datetime on the x axis when this value is used as index?

I have a short question. This is my dataframe: gradient result date 2022-04-15 09:43:20 0.206947 0.10