Category "pandas"

Smart for loop in python for a portfolio performance

This is my first question here, so go easy on me. I've computed a certain portfolio in Python, for which I've gotten a dataframe (or list for that matter) of ar

Groupby id and change values for all rows for the earliest date to NaN

I have the following IDs. I would like to group by ID and then replace the value of X with NaN. My current df: ID Date X other variables.. 1 1/1/18
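One possible approach, as a minimal sketch: the excerpt is cut off, so the data below is hypothetical and only the column names ID, Date and X come from the question. It assumes Date can be parsed as a datetime and that every row falling on a group's earliest date should be blanked.

```python
import numpy as np
import pandas as pd

# Hypothetical data; only the column names ID, Date and X follow the excerpt.
df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2],
    "Date": pd.to_datetime(
        ["2018-01-01", "2018-01-01", "2018-02-01", "2018-03-01", "2018-04-01"]
    ),
    "X": [10.0, 20.0, 30.0, 40.0, 50.0],
})

# Earliest date within each ID, broadcast back to every row of that group.
earliest = df.groupby("ID")["Date"].transform("min")

# Set X to NaN on all rows that fall on their group's earliest date.
df.loc[df["Date"] == earliest, "X"] = np.nan
```

Using transform keeps the result aligned with the original rows, so no merge back onto the dataframe is needed.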

Calculate cosine similarity and output without duplicates?

I have the following vectors in my toy example: data = pd.DataFrame({ 'id': [1, 2, 3, 4, 5], 'a': [55, 2123, -19.3, 9, -8],

Python complex iterating through excel files to concatenate colnames that are not named equal

I have multiple xls files in a directory. Each file's dataframe headers are different, but the data type is the same. 1.xls Location StreetAddress America Pvtld 80

Pandas: Values to columns and then group and merge by same Id [duplicate]

I have a dataframe like this df = DataFrame({'Id':[1,2,3,3,4,5,6,6,6], 'Type': ['T1','T1','T2','T3','T2','T1','T1','T2','T3'],

creating dictionaries from values in pandas columns with repeating values

Considering this sample dataframe: location emp 0 fac_1 emp1 1 fac_2 emp2 2 fac_2 emp3 3 fac_3 emp4 4 fac_4 emp5 It can be recreated by

Latex expressions in pandas dataframes not rendering in vscode

I am trying to set some labels and the caption of my dataframe using mathjax, but it doesn't render in vscode. For example, when I do import pandas as pd test =

TypeError: '<=' not supported between instances of 'str' and 'float'

I want to find the number of rows of clin dataframe where the OS_MONTHS value is <= 12.0. The values in the OS_MONTHS are float. This seems like a trivial qu
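The error message suggests that at least some OS_MONTHS values are stored as strings, even if they look like floats. A minimal sketch under that assumption follows; the sample values, including the placeholder text, are hypothetical and only the names clin and OS_MONTHS come from the question.

```python
import pandas as pd

# Hypothetical stand-in for the `clin` dataframe; OS_MONTHS holds strings here.
clin = pd.DataFrame({"OS_MONTHS": ["3.5", "12.0", "40.1", "[Not Available]"]})

# Convert to numbers; entries that cannot be parsed become NaN.
os_months = pd.to_numeric(clin["OS_MONTHS"], errors="coerce")

# Comparisons with NaN evaluate to False, so unparseable rows are not counted.
n_rows = int((os_months <= 12.0).sum())
print(n_rows)
```

Checking clin["OS_MONTHS"].dtype first would confirm whether the column is really object rather than float.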

how to use list comprehension to subset the dataframe with the valuecounts

make year honda 2011 honda 2011 honda n/a toyota 2011 toyota 2022 I'm trying to get a list of the makes that have a value count of more than 2. Below is
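A minimal sketch of one way to do this, rebuilding the sample rows shown in the excerpt. Since the question is cut off, it assumes "value counts" means the number of rows per make, and that year is stored as text because of the n/a entry.

```python
import pandas as pd

# Sample rows from the excerpt; year kept as strings because of "n/a".
df = pd.DataFrame({
    "make": ["honda", "honda", "honda", "toyota", "toyota"],
    "year": ["2011", "2011", "n/a", "2011", "2022"],
})

# Number of rows per make.
counts = df["make"].value_counts()

# List comprehension over (make, count) pairs, keeping makes seen more than twice.
makes = [make for make, n in counts.items() if n > 2]

# Subset of the dataframe restricted to those makes.
subset = df[df["make"].isin(makes)]
```

The same list can be obtained without a comprehension via counts[counts > 2].index.tolist().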

How to find minimum of some variable with repeating row indexes and preserve all other variables in Python Pandas

Basically, I have multiple repeating dates and the indices (1/2/1990 many times followed by 1/3/1990 many more times, etc.) I want to find the minimum of a give

searching in range between columns using sqlite3 in pandas

I have found a solution to my problem in one question, "Merge pandas dataframes where one value is between two others". I tried to modify it for my situation but it d

What's the equivalent of `pandas.Series.map(json.loads)` in polars?

Based on the document of polars, one can use json_path_match to extract JSON fields into string series. But can we do something like pandas.Series.map(json.load

Python Pandas SUMIF excel equivalent

I can't figure out how to achieve a certain task in my python script. I have a dataframe that contains media coverage for a specific topic. One of my columns na

Iterate through a converted datetime pandas dataframe with an external function

https://rhodesmill.org/skyfield/positions.html#azimuth-and-altitude-from-a-geographic-position Hi, I have a function that generates a sun-shot azimuth on a specifi

How to access specific data in two columns using an if and statement

My Data Frame My Code: a = 10001 b = "01.01.2001" if a == np.any(df["Token_ID"]) and b == np.any(df["Date_of_birth"]): print("yes") else: print("no")
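In the code above, np.any(...) returns a single boolean, so comparing a or b against it does not test membership. A minimal sketch of one alternative follows; the dataframe contents are hypothetical, and it assumes Date_of_birth is stored as strings and that both values must match on the same row.

```python
import pandas as pd

# Hypothetical data; only the column names come from the question.
df = pd.DataFrame({
    "Token_ID": [10001, 10002],
    "Date_of_birth": ["01.01.2001", "02.02.2002"],
})

a = 10001
b = "01.01.2001"

# True if at least one row has both the given token and the given birth date.
match = ((df["Token_ID"] == a) & (df["Date_of_birth"] == b)).any()

print("yes" if match else "no")
```

If the two values may match on different rows, the two comparisons can instead be reduced separately with .any() and combined with and.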

How to use python to merge multiple sheets from an excel file and values from particular cells

I have an excel file with multiple sheets, the actual data I need from each sheet is from cell B7 to F38, how can I merge all the sheets' data into one by using

Convert date format from a 'yfinance' download

I have a yfinance download that is working fine, but I want the Date column to be in YYYY/MM/DD format when I write to disk. The Date column is the Index, so I
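A minimal sketch of one way to reformat a date index before writing to disk. To stay self-contained it uses a hypothetical dataframe with a DatetimeIndex named Date in place of the actual yfinance download, and the Close column is only illustrative.

```python
import pandas as pd

# Hypothetical stand-in for the downloaded data: a DatetimeIndex named "Date".
prices = pd.DataFrame(
    {"Close": [1.0, 2.0]},
    index=pd.DatetimeIndex(["2023-01-03", "2023-01-04"], name="Date"),
)

# Work on a copy so the original keeps its real datetime index.
out = prices.copy()

# Replace the index with YYYY/MM/DD strings.
out.index = out.index.strftime("%Y/%m/%d")

# Text that would be written to disk.
csv_text = out.to_csv()
```

Converting on a copy is a deliberate choice here: the string index is only good for output, while the original datetime index remains usable for resampling or slicing.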

Add missing dates to pandas dataframe

My data can have multiple events on a given date or NO events on a date. I take these events, get a count by date and plot them. However, when I plot them, my
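A minimal sketch of filling in the missing dates with zero counts before plotting. The events dataframe and its date column are hypothetical, and a daily frequency is assumed.

```python
import pandas as pd

# Hypothetical events: two on 1 January, none on 2 and 3 January, one on 4 January.
events = pd.DataFrame(
    {"date": pd.to_datetime(["2023-01-01", "2023-01-01", "2023-01-04"])}
)

# Count of events per date (only dates that have events appear here).
counts = events.groupby("date").size()

# Every calendar day between the first and last event.
full_range = pd.date_range(counts.index.min(), counts.index.max(), freq="D")

# Reindex onto the full range; days without events get a count of 0.
counts = counts.reindex(full_range, fill_value=0)
```

Plotting counts after the reindex shows the empty days as zeros instead of skipping them.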

Find a pattern in middle of multiple sentences

I have a dataframe as below, data = [ [ 1, 'AR-123456' ], [ 1, '123456' ], [ 2, '345678' ], [ 3,'Application-12345678901'], [ 3, '1234567890

Trying to find a graph in matplotlib

I have data that show the difference of temperatures from 1955 to 2020 from an average. I want to make a graph in matplotlib that looks like this: It shows tem