Category "pandas"

Finding and comparing unique values grouped by datetime quarters in Python

I'm working with an extremely large dataset in a pandas DataFrame. I'm now trying to understand, on a quarterly basis, how many UNIQUE sellers have COMMENCED usi
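A minimal sketch of one way to count newly commenced sellers per quarter, assuming hypothetical seller_id and order_date columns (the real column names aren't shown in the excerpt): take each seller's first order date and bucket those first dates by quarter.

```python
import pandas as pd

# Hypothetical data: one row per sale, with a seller and a timestamp
df = pd.DataFrame({
    "seller_id": ["s1", "s1", "s2", "s3", "s2", "s4"],
    "order_date": pd.to_datetime([
        "2021-01-15", "2021-04-02", "2021-02-20",
        "2021-05-11", "2021-07-01", "2021-08-09",
    ]),
})

# A seller "commences" in the quarter of their first recorded order,
# so take each seller's earliest date, then count sellers per quarter.
first_seen = df.groupby("seller_id")["order_date"].min()
new_sellers_per_quarter = first_seen.dt.to_period("Q").value_counts().sort_index()
print(new_sellers_per_quarter)
```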

How to use the np.where function along with apply and lambda

This code: def nearest_independment(target): lst=df[df['CLINE_TYPE'].str.contains('crease') & df['CLINE_TYPE'].isin(['nan']).shift(2)
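The excerpt is cut off, so here is only a hedged, generic illustration of the two approaches the title mentions, reusing the CLINE_TYPE column name from the snippet with made-up values: np.where builds a vectorised conditional column, while apply with a lambda does the same row by row.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"CLINE_TYPE": ["increase", "decrease", np.nan, "flat", "increase"]})

# Vectorised: np.where picks one of two values based on a boolean mask
df["has_crease"] = np.where(
    df["CLINE_TYPE"].str.contains("crease", na=False), "yes", "no"
)

# Equivalent row-by-row version with apply + lambda (slower, sometimes clearer)
df["has_crease_apply"] = df["CLINE_TYPE"].apply(
    lambda v: "yes" if isinstance(v, str) and "crease" in v else "no"
)
print(df)
```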

Annotate bars with values on Pandas bar plots

I was looking for a way to annotate my bars in a Pandas bar plot with the rounded numerical values from my DataFrame. >>> df=pd.DataFrame({'A':np.rand
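A small sketch of one common way to do this with the matplotlib Axes that pandas returns; ax.bar_label needs matplotlib 3.4 or newer, and the DataFrame below is just example data.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({"A": np.random.rand(5), "B": np.random.rand(5)})

ax = df.plot(kind="bar")
# Each container holds one column's bars; label them with rounded values
for container in ax.containers:
    ax.bar_label(container, fmt="%.2f")
plt.show()
```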

Keep getting "ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()."

data_df.loc[data_df['hotelID'] == sqlIDs[neededId] & to_integer(df.iloc[row, 6]) >= to_integer(MostRecent)] This is the snippet that keeps getting me th
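That error usually means a whole Series is being used where Python expects a single True/False, most often because & binds more tightly than == or >=. A self-contained sketch with made-up data showing the parenthesised version:

```python
import pandas as pd

data_df = pd.DataFrame({"hotelID": [1, 2, 2], "value": [10, 20, 30]})

# Wrong: '&' binds more tightly than '==' / '>=', so pandas ends up asking
# for the truth value of a whole Series and raises the ambiguity error.
# mask = data_df["hotelID"] == 2 & data_df["value"] >= 15

# Right: parenthesise each comparison before combining with '&'
mask = (data_df["hotelID"] == 2) & (data_df["value"] >= 15)
print(data_df.loc[mask])
```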

Finding datetime object in pandas df column

I have the following code, where I want to determine if a datetime object exists in a data frame. Here is the code: df_grid['Date'] = pd.to_datetime(df_grid['Da
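A minimal sketch of one way to test membership, assuming the column is converted with pd.to_datetime as in the excerpt and the value being searched for is a Timestamp (the data below is invented):

```python
import pandas as pd

df_grid = pd.DataFrame({"Date": ["2023-01-01", "2023-01-02", "2023-01-03"]})
df_grid["Date"] = pd.to_datetime(df_grid["Date"])

target = pd.Timestamp("2023-01-02")

# Compare as Timestamps, then ask whether any row matches
exists = (df_grid["Date"] == target).any()
# equivalently: exists = df_grid["Date"].isin([target]).any()
print(exists)
```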

Plotly Table does not show in Jupyter Lab in Python?

I am trying to plot a table with Plotly in Python in JupyterLab, but the table does not show up in JupyterLab. My code is as below: df = pd.read_csv('df.csv') fi
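One common cause is the notebook renderer rather than the table itself; a hedged sketch that forces a self-contained renderer before showing the table (the CSV path is taken from the question, the renderer choice is an assumption):

```python
import pandas as pd
import plotly.graph_objects as go
import plotly.io as pio

# If the default renderer can't talk to JupyterLab, forcing a self-contained
# one such as "iframe" (or "notebook"/"jupyterlab") often makes figures appear.
pio.renderers.default = "iframe"

df = pd.read_csv("df.csv")
fig = go.Figure(data=[go.Table(
    header=dict(values=list(df.columns)),
    cells=dict(values=[df[c] for c in df.columns]),
)])
fig.show()
```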

How do I calculate values within a pandas DataFrame itself?

You can see my dataframe below. The x values are different, but the other values are the same as the values to their left; for example, column 15 and column 16 hold the same value. I

How do I replace missing values with NaN

I am using the IMDB dataset for machine learning, and it contains a lot of missing values which are entered as '\N'. Specifically in the StartYear column which
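A short sketch of two ways to do this, assuming a tab-separated IMDb file and the StartYear column mentioned in the question (the file name is a placeholder): either declare '\N' as a missing-value marker at read time, or replace it afterwards.

```python
import numpy as np
import pandas as pd

# Option 1: treat '\N' as missing while reading (file name is a placeholder)
df = pd.read_csv("imdb.tsv", sep="\t", na_values=["\\N"])

# Option 2: replace on an already-loaded DataFrame
df = df.replace("\\N", np.nan)

# The column can then be converted to numbers, with anything unparseable -> NaN
df["StartYear"] = pd.to_numeric(df["StartYear"], errors="coerce")
```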

Drop identical values paired across different columns (drop connected components)

After applying the Levenshtein distance algorithm I get a dataframe like this: Elemento_lista Item_ID Score idx ITEM_ID_Coincidencia 4 691776 100 5 691777 4 691776
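A hedged sketch of one way to "drop connected components": treat each matched pair (Item_ID, ITEM_ID_Coincidencia) as a graph edge and keep one representative per connected component. It uses networkx, which is an extra dependency not mentioned in the excerpt, and invented IDs.

```python
import pandas as pd
import networkx as nx

# Hypothetical pairs of matched IDs produced by the fuzzy-matching step
df = pd.DataFrame({
    "Item_ID":              [691776, 691777, 691780, 691781],
    "ITEM_ID_Coincidencia": [691777, 691778, 691781, 691780],
})

# Each matched pair is an edge; connected components group transitive matches
g = nx.Graph()
g.add_edges_from(df[["Item_ID", "ITEM_ID_Coincidencia"]].itertuples(index=False))

# Keep one representative ID per component (here: the smallest)
keep = {min(component) for component in nx.connected_components(g)}
print(keep)
```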

How to convert rows under a column into a list under a dictionary under a list under a dictionary?

I have an Excel file with the below column named variantID and its corresponding elements. I want the final output as {"filters":[{"tags":[{"k":201,"v":"201"},{"k
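A hedged sketch of building that nested structure from the variantID column, assuming (from the target output shown) that k should be the numeric value and v its string form; the DataFrame literal below stands in for pd.read_excel.

```python
import pandas as pd

# Stand-in for pd.read_excel("file.xlsx"); only the variantID column matters here
df = pd.DataFrame({"variantID": [201, 202, 203]})

result = {
    "filters": [
        {"tags": [{"k": int(v), "v": str(v)} for v in df["variantID"]]}
    ]
}
print(result)
# {'filters': [{'tags': [{'k': 201, 'v': '201'}, {'k': 202, 'v': '202'}, ...]}]}
```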

Selecting data from a pandas DataFrame

I have defined a pandas DataFrame, given the number of rows (index) and columns. I perform a series of operations and store the data in that DataFrame. The code
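The excerpt is cut off, so this is only a generic sketch of pre-allocating a DataFrame of a known size, storing values into it, and selecting from it afterwards; all names and values are invented.

```python
import numpy as np
import pandas as pd

n_rows, columns = 5, ["a", "b", "c"]
df = pd.DataFrame(np.nan, index=range(n_rows), columns=columns)

# Store computed values cell by cell with .at (fast scalar access)
for i in range(n_rows):
    df.at[i, "a"] = i ** 2

# Read slices back with .loc (label based) or .iloc (position based)
subset = df.loc[df["a"] > 4, ["a", "b"]]
print(subset)
```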

How can I group elements in pandas series based on how many times they repeat?

I have this example_series: 0 False 1 False 2 False 3 False 4 False 5 False 6 False 7 False 8 False 9 False 10 False
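A common pattern for this is run-length grouping: label each run of consecutive identical values with a cumulative counter, then group by that label. A small sketch with made-up data:

```python
import pandas as pd

example_series = pd.Series([False] * 5 + [True] * 3 + [False] * 4)

# The label increments every time the value changes,
# so equal labels mean "same consecutive run"
run_id = (example_series != example_series.shift()).cumsum()

# Length of each run, keyed by run id and the value that repeats
runs = example_series.groupby([run_id, example_series]).size()
print(runs)
```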

Assign outcome from SQL query to column

I have a dataframe (test_df) that looks like this: dq_code dq_sql Results ID_24 select 'A' as B, 'B' as
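The excerpt is truncated, so this is just a hedged sketch of one way to run each row's dq_sql and write the outcome back to a Results column, using an in-memory SQLite connection as a stand-in for the real database:

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")  # stand-in for the real database connection

test_df = pd.DataFrame({
    "dq_code": ["ID_24", "ID_25"],
    "dq_sql": ["select 1 as n", "select 2 as n"],
})

# Run each row's SQL and store a scalar outcome back on the same row
def run_check(sql: str):
    result = pd.read_sql_query(sql, conn)
    return result.iloc[0, 0]  # first value of the first column

test_df["Results"] = test_df["dq_sql"].apply(run_check)
print(test_df)
```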

Python pandas read_fwf strips white space

I am facing an issue using the read_fwf command from the Python library pandas, the same as described in this unresolved question. I want to read an ASCII file conta
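read_fwf trims the padding inside each fixed-width field, so when the spaces themselves matter, one workaround is to slice the raw lines with the same column specs by hand; a sketch with invented colspecs and file name:

```python
import pandas as pd

colspecs = [(0, 10), (10, 20)]  # example field boundaries
names = ["code", "text"]

# Slice each line manually so leading/trailing spaces inside fields survive
with open("data.txt") as fh:
    rows = [[line[a:b] for a, b in colspecs] for line in fh.read().splitlines()]

df = pd.DataFrame(rows, columns=names)
print(df["text"].map(len))  # lengths confirm the spaces were kept
```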

Converting a pandas dataframe to multi-index and changing values

I have the following dataframe: d = [{'AX':['Rec=1','POSi=2'], 'AVF1':[], 'HI':['Rec=343', 'POSi=4'], 'version_1':[]}, {'AX':[], 'AVF1':['Rec=4', 'POSi=454'],

Calculate the pair-wise correlation between distinct class pairs over two feature columns and the target variable?

Most similar questions relating to calculating this involve a single correlation value for each feature column, showing how the features in a dataset correlate
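A hedged sketch of one interpretation: assume a class label column plus two feature columns and a target, take every pair of distinct classes with itertools.combinations, restrict the frame to that pair, and read the feature-vs-target correlations off the subset's correlation matrix. All column names and data below are invented.

```python
from itertools import combinations

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "label": rng.choice(["a", "b", "c"], size=300),
    "feat1": rng.normal(size=300),
    "feat2": rng.normal(size=300),
    "target": rng.normal(size=300),
})

rows = []
for cls_a, cls_b in combinations(df["label"].unique(), 2):
    subset = df[df["label"].isin([cls_a, cls_b])]
    corr = subset[["feat1", "feat2", "target"]].corr()
    rows.append({
        "pair": f"{cls_a} vs {cls_b}",
        "feat1_vs_target": corr.loc["feat1", "target"],
        "feat2_vs_target": corr.loc["feat2", "target"],
    })

print(pd.DataFrame(rows))
```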

Having trouble expanding/normalizing a dataframe column of dictionary values into a dataframe/ other columns

I'm trying to expand a dataframe column of dictionaries into its own dataframe/other columns. I have already tried using json_normalize, iteration, and list c
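For reference, a minimal sketch of the json_normalize route on made-up data; it assumes the column actually holds dict objects (not strings):

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2],
    "attrs": [{"colour": "red", "size": 3}, {"colour": "blue", "size": 5}],
})

# Normalise the dict column into its own columns, then glue them back on
expanded = pd.json_normalize(df["attrs"])
df = df.drop(columns="attrs").join(expanded)
print(df)
```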

Split / Explode a column of dictionaries into separate columns with pandas

I have data saved in a PostgreSQL database. I am querying this data using Python 2.7 and turning it into a pandas DataFrame. However, the last column of this dat
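A small sketch of one approach; because values fetched from a database often arrive as strings rather than dicts, it parses them with ast.literal_eval first and then builds one column per key (the column names are invented):

```python
import ast

import pandas as pd

df = pd.DataFrame({
    "a": [1, 2],
    # values pulled out of a database often arrive as strings, not dicts
    "details": ['{"x": 46, "y": 3}', '{"x": 36, "z": 5}'],
})

parsed = df["details"].apply(ast.literal_eval)         # str -> dict if needed
split = pd.DataFrame(parsed.tolist(), index=df.index)  # one column per key
df = pd.concat([df.drop(columns="details"), split], axis=1)
print(df)
```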

Why does Dask take a long time to compute regardless of the size of the dataframe?

What is the reason that a Dask dataframe takes a long time to compute regardless of the size of the dataframe? How can I avoid this from happening? What is the reason beh
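A hedged illustration of the usual explanation: Dask is lazy, so every .compute() replays the whole task graph, including re-reading the input, regardless of how small the final result is; persisting an intermediate result avoids paying that cost repeatedly. The file pattern and column name below are placeholders.

```python
import dask.dataframe as dd

# Nothing is read until .compute(), so each .compute() call replays the whole
# task graph (file reading included), no matter how small the final result is.
ddf = dd.read_csv("big-*.csv")  # placeholder input files

filtered = ddf[ddf["value"] > 0]

# Materialise the intermediate result once and keep it in memory,
# so later computations reuse it instead of re-reading the CSVs.
filtered = filtered.persist()

print(filtered["value"].mean().compute())
print(filtered["value"].max().compute())
```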

DF with values for Time Intervals

I am trying to make a dataframe manually. I would like to have a timestamp with a time interval, for example: df1: Time Interval Price 10:00 - 11:00 $15 11:00
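A short sketch of two ways to build such a frame by hand: plain label strings, or an IntervalIndex if the intervals later need to behave like real ranges (the values are taken loosely from the excerpt):

```python
import pandas as pd

# Simple version: the interval is just a label string
df1 = pd.DataFrame({
    "Time Interval": ["10:00 - 11:00", "11:00 - 12:00", "12:00 - 13:00"],
    "Price": [15, 20, 25],
})

# If the intervals need to behave like real ranges (e.g. to look up which
# bucket a timestamp falls into), an IntervalIndex is an alternative:
idx = pd.IntervalIndex.from_breaks(
    pd.to_datetime(["10:00", "11:00", "12:00", "13:00"]), closed="left"
)
df2 = pd.DataFrame({"Price": [15, 20, 25]}, index=idx)
print(df2.loc[pd.Timestamp("11:30")])  # row whose interval contains 11:30
```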