Category "dataframe"

Python:Pandas - Object to string type conversion in dataframe

I'm trying to convert object to string in my dataframe using pandas. Having following data: particulars NWCLG 545627 ASDASD KJKJKJ ASDASD TGS/ASDWWR42045645010

Word count Matrix of document corpus with Pandas Dataframe

Well, I have a corpus of 2000+ text documents and I'm trying to make a matrix with pandas dataframe in the most elegant way. The matrix would look like this: d

Streamlit Panda Query Function Syntax Error When Finding Column in CSV Dataframe

When Using Streamlit to build a data interface getting a syntax error. My downloaded csv dataframe has a column 'NUMBER OF PERSONS INJURED', after converting i

how to assign an entire list to each row of a pandas dataframe

I have a dataframe and a list df = pd.DataFrame({'A':[1,2,3], 'B':[4,5,6]}) mylist= [10,20,30,40,50] I would like to have a list as element in each row of a

How to lemmatise a dataframe column Python

How can lemmatise a dataframe column. CSV file "train.csv" looks like this id tweet 1 retweet if you agree 2 happy birthday your majesty 3 essential oil

Trigger IF Statement only when two Spark dataframe meet the conditions

I have two identical Spark DataFrame. They have the same columns. I am trying to create a IF-Else statement in one line but couldnt find a better way to do it.

How to generate random correlated uniform data from a correlation matrix?

I have a very specific problem to solve that makes researching a solution quite hard because I lack the requisite math skills. My goal: Given a covariance/corre

pandas how to check dtype for all columns in a dataframe?

It seems that dtype only work for pandas.DataFrame.Series, right? Is there a function to display data types of all columns at once?

suppress Name dtype from python pandas describe

Lets say I have r = pd.DataFrame({'A':1 , 'B':pd.Series(1,index=list(range(4)),dtype='float32')}) And r['B'].describe()[['mean','std','min','m

Color pandas DataFrame value if larger than 1.5*median(column)

Let's say I have a DataFrame that looks like this: df= pd.DataFrame({'A': [1,-2,0,-1,17], 'B': [11,-23,1,-3,132], 'C': [121,

How to extract .zst files into a pandas dataframe

I'm a bit of a beginner when it comes to Python, but one of my projects from school needs me to perform classification algorithms on this reddit popularity data

Python Pandas Group by date using datetime data

I have a column Date_Time that I wish to groupby date time without creating a new column. Is this possible the current code I have does not work. df = pd.group

Compare two dataframes Pyspark

I'm trying to compare two data frames with have same number of columns i.e. 4 columns with id as key column in both data frames df1 = spark.read.csv("/path/to/

Remove repeating column values in Python Pandas

I have a data set that has dates and subtotal of other columns. I want to remove the same recurring dates per subtotal

H2O python - How to let h2oframe to dataframe with correctly character and datetime

I have a csv file, and want to use H2O to do DeepLearning. But it has some Chinese and datetime that when I finish my Deeplearning need to save output to csv, i

How to Remove outlier from DataFrame using IQR?

I Have Dataframe with a lot of columns (Around 100 feature), I want to apply the interquartile method and wanted to remove the outlier from the data frame. I a

Find the Max value of an Array column and find associated value in another Array with in the dataframe

I have a csv file with below data. Id Subject Marks 1 M,P,C 10,8,6 2 M,P,C 5,7,9 3 M,P,C 6,7,4 I Need to find out Max value in the Marks column for each Id an

How can i fill in missing csv file value base on reference csv file

I have a reference file like this Id, Value1, Value2 a, a1, a2 b, b1, b2 c, c1, c2 d, d1, d2 ... n, n1, n2 and the missing file Id, Value1, Value2 d, ,

Spark pivot groupby performance very slow

I am trying to pivot the dataframe of raw data size 6 GB and it used to take 30 minutes time (aggregation function sum): x_pivot = raw_df.groupBy("a", "b", "c"

How to replace the missing values with average of ffill() and bfill() in pandas?

This is a sample dataframe and it containsNA: x y z datetime 0 2 3 4 02-02-2019 1 NA NA NA 03-02-2019 2 3 5 7 04-0