Category "dataframe"

Python Dataframes - Breaking out single rows with duplicate columns into multiple rows and fewer columns

I have a data frame like this: A B C Date1 Time1 Value1 Date2 Time2 Value2 abc def ghi 01-01-2000 15:00:00 100 01-01-2000 19:00:00 110 There are duplicate col
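
One possible approach, sketched under the assumption that the goal is one output row per numbered (Date, Time, Value) group, with A, B and C repeated on each row. The sample values are the ones visible in the excerpt; the helper column name `set` is hypothetical, and the sketch also assumes that A, B and C together identify each input row uniquely, which `pd.wide_to_long` requires.

```python
import pandas as pd

# Sample row taken from the excerpt above; the target layout is an assumption.
df = pd.DataFrame({
    "A": ["abc"], "B": ["def"], "C": ["ghi"],
    "Date1": ["01-01-2000"], "Time1": ["15:00:00"], "Value1": [100],
    "Date2": ["01-01-2000"], "Time2": ["19:00:00"], "Value2": [110],
})

# wide_to_long strips the numeric suffix from each stub (Date1 -> Date) and
# stacks the numbered groups into rows. "set" is a hypothetical name for the
# column holding the suffix; it is dropped once the reshape is done.
long_df = (
    pd.wide_to_long(df, stubnames=["Date", "Time", "Value"],
                    i=["A", "B", "C"], j="set")
    .reset_index()
    .drop(columns="set")
)

print(long_df)
```

If A, B and C do not identify rows uniquely in the real data, a temporary row id can be added first and included in `i`.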

Pandas - Add a new column extracting value from arrays based on other column value

I am currently stuck trying to extract a value from a list/array depending on values of a dataframe. Imagine I have this array. This array I can manually create

R: slicing over dates in a dataframe using custom time window

I have a dataframe of player rankings over many years (2000-2020), which looks like: Now, I wish to group_by() and summarise() and calculate statistics for di

RStudio: Selecting the column with the latest available data from a dataframe

I am trying to extract data from the World Bank and import it into RStudio for a regression analysis. The data can be found here and as you can see, the online

Slicing a dataframe using matches to build a new dataframe with Pandas?

I am trying to get my code to take in a dataframe, find all occurrences of "START:", then iterate through each occurrence to create 'slices' (Where the first ro

Categorical column after melt in pandas

Is it possible to end up with a categorical variable column after a melt operation in pandas? If I set up the data like this: import pandas as pd import numpy a

Coalesce columns and create another column to specify source

I'm using dplyr::coalesce() to combine several columns into one. Originally, across columns, each row has only one column with actual value while the other colu

R Dataframe By Group Calculation

I have a dataframe like below (the real data has many more people and clubs): Year Player Club 2005 Phelan Chicago Fire 2007 Phelan Boston Pant 2

Is there an optimized way to iterate over Excel and pass queries to pd.read_sql() as strings, one by one?

#here I have to apply the loop which can provide me the queries from excel for respective reports: df1 = pd.read_sql(SQLqueryB2, con=con1) df2 = pd.rea

Pandas Pivot table - How compute the following default ratio?

I am able to compute the default rate as a number (i.e., the percentage of customers who fell into default) with the code below, getting the following output: impor

Pandas dataframe: divide features into groups of high correlation

I have a dataframe with over 280 features. I ran a correlation map to detect groups of features that are highly correlated: Now, I want to divide the features to

Adding a new column in pandas dataframe from another dataframe with differing indices

This is my original dataframe. This is my second dataframe, containing one column. I want to add the column of the second dataframe to the original dataframe at th

How to convert a nested dict to a pandas dataframe

I'm trying to convert a dataframe that has another dataframe nested inside it, like: { 'id': 3241234, 'data': { 'name':'carol', 'lastname': 'netfli

Efficient way to unnest (explode) multiple list columns in a pandas DataFrame

I am reading multiple JSON objects into one DataFrame. The problem is that some of the columns are lists. Also, the data is very big and because of that I canno
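
A minimal sketch of one way to explode several list columns at once, assuming pandas 1.3 or later (where `DataFrame.explode` accepts a list of columns) and assuming the lists in each row have matching lengths. The column names `id`, `tags` and `scores` and their values are hypothetical illustration data, not taken from the question, and the sketch does not address the size constraint the excerpt mentions.

```python
import pandas as pd

# Hypothetical data: two list columns whose lists have the same length per row.
df = pd.DataFrame({
    "id": [1, 2],
    "tags": [["a", "b"], ["c"]],
    "scores": [[0.1, 0.2], [0.3]],
})

# Explode both list columns together so their elements stay aligned row by row.
# pandas raises a ValueError if the list lengths differ within a row.
exploded = df.explode(["tags", "scores"], ignore_index=True)

print(exploded)
```

Exploding the columns together keeps the elements paired; exploding them one after another would instead produce every combination of elements within each row.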

Pandas wide to long bringing empty DataFrame

I was working on a pretty simple task: applying wide_to_long to a DataFrame, but every time I ran it, I got an empty DataFrame. I was almost sure I was doing it

Skipping same entries in different rows of a column in SQL table

For example, I have created a dataframe in R named "Numbers" which has the following output: Numbers 1 2 3 2 4 1 When I tried to insert this dataframe in SQL tab

Merging two dataframes without losing data

I have two dataframes: df_1 = Material TypeOf 4100 N200 4101 M200 4200 M200 4500 N200 .

Retrieve only months with at least 28 sample days - pandas dataframe

Hello to the people of the web, I have a dataframe containing 'DATE' (datetime) as index and TMAX as column with values: tmax dataframe What I'm trying to do is

Py4JJavaError when trying to write pyspark DataFrame to parquet

I wanted to convert a large .csv file into .parquet format using pyspark. I am using Python 3. I tried changing the codec used for compression, as suggested in