Category "dataframe"

Subsetting dataframe with grep

I have the following data Sample_ID<-c("a1_01_01","a2_03_03","a3_07_07","a4_09_09","a5_10_10","a6_21_21") Sex<-c(M, M, F, F, M, NM) DF1<-data.frame(Sample_

ValueError: row index exceeds matrix dimensions sparse coo max

I really have no idea what the root cause is! I have created the matrix below and have tried increasing the (M, N) size, or reducing the data size or the row size or colu
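A minimal sketch of the usual cause, assuming the matrix is built from (row, col, value) triplets with `scipy.sparse.coo_matrix`: valid row indices run from 0 to `shape[0] - 1`, so a row index equal to or larger than `shape[0]` (for example 1-based indices) makes the constructor raise a `ValueError`. The arrays `rows`, `cols` and `vals` are hypothetical.

```python
import numpy as np
from scipy.sparse import coo_matrix

# Hypothetical triplets: three nonzero entries given as (row, col, value).
rows = np.array([0, 2, 5])
cols = np.array([1, 3, 4])
vals = np.array([1.0, 2.0, 3.0])

# shape=(5, 5) only allows row indices 0..4, but the data contains row 5,
# so building the matrix fails with a ValueError.
try:
    coo_matrix((vals, (rows, cols)), shape=(5, 5))
except ValueError as exc:
    print("too small:", exc)

# Deriving the shape from the largest indices guarantees every entry fits.
shape = (int(rows.max()) + 1, int(cols.max()) + 1)
m = coo_matrix((vals, (rows, cols)), shape=shape)
print(m.shape)  # (6, 5)
```

If the indices come from 1-based IDs, subtracting 1 from `rows` and `cols` before building the matrix is the other common fix.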

DataFrame challenge: mapping ID to value in different row. Preferably with Polars

Consider this example: import polars as pl df = pl.DataFrame({ 'ID': ['0', '1', '2', '3', '4', '5','6', '7', '8', '9', '10'], 'Name' : ['A','','','','B

How to evenly spread out date data (pandas)

I'm working on a project and I'm struggling with some formats of dataframes. I have two dataframes, each containing a different number of months. I want all the

How to create a dummy only if a column has non-zero values for certain dates but zero for other dates

Let's say I want to identify traders who only traded during bull runs but did not trade (zero values) during downturns or stable periods. Let's say we have two
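One way to build such a dummy in pandas, sketched under assumptions: the data is in long format with hypothetical columns `trader`, `period` and `volume`, and a "bull" label marks bull-run periods. The flag is 1 only for traders with at least one nonzero bull-run row and no nonzero rows anywhere else.

```python
import pandas as pd

# Hypothetical long-format data: one row per trader per period.
df = pd.DataFrame({
    "trader": ["A", "A", "A", "B", "B", "B", "C", "C"],
    "period": ["bull", "downturn", "stable", "bull", "downturn", "stable", "bull", "downturn"],
    "volume": [100, 0, 0, 50, 20, 0, 0, 0],
})

traded = df["volume"].ne(0)
is_bull = df["period"].eq("bull")

# Per trader: traded in at least one bull period, and never outside one.
traded_in_bull = (traded & is_bull).groupby(df["trader"]).any()
traded_elsewhere = (traded & ~is_bull).groupby(df["trader"]).any()
bull_only = traded_in_bull & ~traded_elsewhere

# Broadcast the per-trader flag back onto every row as a 0/1 dummy.
df["bull_only"] = df["trader"].map(bull_only).astype(int)
print(df)
```

Trader C never traded at all, so it gets 0 as well; drop the `traded_in_bull` condition if "never traded in a downturn" alone should qualify.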

Python, Pandas and intersection - not PIVOT

This isn't a straightforward pivot question. I don't want to create new named columns (or numbered ones). What I am looking for is to find a way to search for

Calculate and return the average of positive, negative, and neutral

I have the following dataframe: I am trying to have three additional columns that return the sum of instances of 0, -1, and 1
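A minimal pandas sketch, assuming each scored cell holds -1 (negative), 0 (neutral) or 1 (positive); the column names `s1`, `s2`, `s3` are hypothetical. Comparing with `.eq()` gives a boolean frame, so `.sum(axis=1)` counts matches per row and `.mean(axis=1)` gives the per-row share.

```python
import pandas as pd

# Hypothetical sentiment scores: -1 negative, 0 neutral, 1 positive.
df = pd.DataFrame({
    "s1": [1, 0, -1],
    "s2": [1, -1, -1],
    "s3": [0, 0, 1],
})
scores = df[["s1", "s2", "s3"]]

# Row-wise counts of each label.
df["n_pos"] = scores.eq(1).sum(axis=1)
df["n_neu"] = scores.eq(0).sum(axis=1)
df["n_neg"] = scores.eq(-1).sum(axis=1)

# Row-wise share of positive scores (the "average" in the title).
df["pct_pos"] = scores.eq(1).mean(axis=1)
print(df)
```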

Transform a dataset from wide to long pandas

I have a small problem transforming a dataset from wide to long. I tried melt but didn't get a good result. I hope that someone can help

Populating a dataframe with the YouTube API

list2=['PewDiePie', 'jacksepticeye', 'iDubbbzTV', 'Markiplier','MarkiplierGAME', 'EminemMusic','EdSheeran', 'TaylorSwift', 'CNN', 'FoxNews', 'CBCNews', 'ABCNew

Combine Columns in Pandas

Let's say I have the following Pandas dataframe. It is what it is and the input can't be changed. df1 = pd.DataFrame(np.array([['a', 1,'e', 5],

How to display dictionary with dataframes on a localhost (using Flask and Python)?

I have the following dict: {'id': 1, 'df': pd.DataFrame({'id': [1,2,3], 'col1': ['kuku', 'dudu', 'lulu'], 'col2': [8,9,10]}), 'df_size': 3} When I am trying to
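One common approach, sketched here as an assumption rather than a fix for the truncated error: render each DataFrame with `DataFrame.to_html()` and escape everything else, producing a single HTML string. The helper name `record_to_html` is hypothetical.

```python
from html import escape

import pandas as pd

record = {
    "id": 1,
    "df": pd.DataFrame({"id": [1, 2, 3], "col1": ["kuku", "dudu", "lulu"], "col2": [8, 9, 10]}),
    "df_size": 3,
}


def record_to_html(rec: dict) -> str:
    """Render a dict as an HTML list, turning DataFrame values into HTML tables."""
    items = []
    for key, value in rec.items():
        if isinstance(value, pd.DataFrame):
            rendered = value.to_html(index=False)  # to_html escapes cell text by default
        else:
            rendered = escape(str(value))
        items.append(f"<li><b>{escape(str(key))}</b>: {rendered}</li>")
    return "<ul>" + "".join(items) + "</ul>"


page = record_to_html(record)
print(page)
```

A Flask view function can return this string directly, so the route body would simply be `return record_to_html(record)`.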

Strange Plotly behaviour with Choropleth Mapbox

I want to create a choropleth map out of a GeoJSON file that looks like this: {"type": "FeatureCollection", "features": [ {'type': 'Feature', 'geometry': {'type

Finding percentage of rejection in pandas dataframe

I have a pandas dataframe like the one below Id1 YEAR CLAIM_STATUS no_of_claims 1 2019-01 4 1 1 2019-01 5 1

How to apply code to dataframe by condition?

I have the following dataframe: library(dplyr) library(tidyverse) library(concordance) Year <- c(2016,2016,2017,2019,2020,2020,2020,2013,2010,2010) Pf <-

Visualization random sample with displaCy

How can I visualize a dataframe column with displaCy? I have a dataframe called taks_output and want to visualize a sample of the column msg_lower. What I did: import p

Reshape wide to long for many columns with a common prefix

My frame has many pairs of identically named columns, with the only difference being the prefix. For example, player1.player.id and player2.player.id. Here's an
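A sketch of one way to do this in pandas, assuming every paired column follows the `<prefix>.<field>` pattern (such as `player1.player.id`) and that there is an ID column to keep; `match` and the `player.name` columns are hypothetical. Melting, splitting each name on its first dot, and pivoting back gives one row per (match, prefix).

```python
import pandas as pd

# Hypothetical wide frame: one row per match, one column per (prefix, field) pair.
df = pd.DataFrame({
    "match": [1, 2],
    "player1.player.id": [10, 30],
    "player1.player.name": ["a", "c"],
    "player2.player.id": [20, 40],
    "player2.player.name": ["b", "d"],
})

# Melt every prefixed column into rows, then split "player1.player.id"
# on the first dot into the prefix ("player1") and the field ("player.id").
long = df.melt(id_vars="match", var_name="column")
long[["slot", "field"]] = long["column"].str.split(".", n=1, expand=True)

# Pivot the fields back out so each (match, slot) pair becomes one row.
tidy = (
    long.pivot(index=["match", "slot"], columns="field", values="value")
    .reset_index()
    .rename_axis(columns=None)
)
print(tidy)
```

Because nothing here lists the field names explicitly, the same code handles any number of prefix pairs.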

Create a dataframe of weeks and their weekly sums from a dictionary of datetime and int

I have a dictionary with datetime keys and int values like the one below. details = { datetime.datetime.strptime("04-01-2021", "%d-%m-%Y") : 15, datetime.datetime.strptime(
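A minimal sketch with pandas, assuming weeks ending on Sunday are acceptable: turning the dict into a Series gives it a `DatetimeIndex`, and `resample("W").sum()` totals each week. Only the first key below comes from the question; the other two entries are hypothetical.

```python
import datetime

import pandas as pd

# First entry from the question; the other two are hypothetical.
details = {
    datetime.datetime.strptime("04-01-2021", "%d-%m-%Y"): 15,
    datetime.datetime.strptime("06-01-2021", "%d-%m-%Y"): 5,
    datetime.datetime.strptime("12-01-2021", "%d-%m-%Y"): 7,
}

s = pd.Series(details).sort_index()
# "W" is an alias of "W-SUN": weeks end on Sunday and are labelled by that Sunday.
weekly = s.resample("W").sum()
weekly_df = weekly.rename_axis("week_ending").reset_index(name="total")
print(weekly_df)
```

Weeks with no entries between the first and last date appear with a total of 0; a different anchor such as `"W-SAT"` changes which day closes each week.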

Dataframe transformation by taking month columns into rows

The original dataframe is as follows: And I would like to reshape it like this:
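The example tables are not included here, so this is only a sketch assuming one column per month that should become rows; the names `item`, `Jan`, `Feb` and `Mar` are hypothetical. `DataFrame.melt` does the reshape, and a stable sort regroups the rows by item.

```python
import pandas as pd

# Hypothetical layout: one row per item, one column per month.
df = pd.DataFrame({
    "item": ["x", "y"],
    "Jan": [1, 4],
    "Feb": [2, 5],
    "Mar": [3, 6],
})

month_cols = ["Jan", "Feb", "Mar"]
long = (
    df.melt(id_vars="item", value_vars=month_cols, var_name="month", value_name="value")
    # melt emits all Jan rows, then all Feb rows, ...; a stable sort keeps month order per item
    .sort_values("item", kind="stable")
    .reset_index(drop=True)
)
print(long)
```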

How to find quantile of a row in PySpark dataframe?

I have the following PySpark dataframe and I want to find percentile row-wise. value col_a col_b col_c row_a 5.0 0.0 11.0 row_b 3394.0 0

How to extract a specific range out of a dataframe and store it in another dataframe and then delete the range out of the original dataframe | pandas

I have some time series of energy consumption, and I can eyeball when someone is on holiday because the consumption falls below a certain range. I have this piece of cod
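A minimal pandas sketch, assuming a `consumption` column and a fixed threshold (both hypothetical): one boolean mask selects the low-consumption rows into a separate DataFrame, and its negation removes those rows from the original.

```python
import pandas as pd

# Hypothetical daily consumption series.
idx = pd.date_range("2021-07-01", periods=6, freq="D")
df = pd.DataFrame({"consumption": [12.0, 11.5, 0.4, 0.3, 0.5, 10.8]}, index=idx)

# Hypothetical threshold below which the household is assumed to be away.
LOW = 1.0
mask = df["consumption"] < LOW

holidays = df.loc[mask].copy()  # the extracted range as its own DataFrame
df = df.loc[~mask]              # the original with that range removed
print(holidays)
print(df)
```

If the range is known by date rather than by value, `sub = df.loc["2021-07-03":"2021-07-05"].copy()` followed by `df = df.drop(sub.index)` does the same with a label slice on the `DatetimeIndex`.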