Category "data-cleaning"

Join a large set of CSV files where the header is the timestamp for the file

I have a large set of CSV files, approximately 15,000, and would like to figure out how to join them into one file for data processing. Each file is in a
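
A minimal pandas sketch, assuming each file's first line holds only its timestamp and the remaining lines are headerless comma-separated data rows (the excerpt is cut off before the actual layout is described). The function names `read_stamped_csv` and `combine_stamped_csvs` are hypothetical. The timestamp is added as a column so every row stays traceable to its source file after the merge.

```python
import glob

import pandas as pd


def read_stamped_csv(path):
    """Read one file whose first line is assumed to be its timestamp."""
    with open(path, newline="") as fh:
        # Assumption: the first line contains only the file's timestamp.
        stamp = fh.readline().strip()
    # Assumption: the remaining lines are headerless CSV data rows.
    frame = pd.read_csv(path, skiprows=1, header=None)
    # Record the file's timestamp on every row it contributes.
    frame.insert(0, "timestamp", pd.to_datetime(stamp))
    return frame


def combine_stamped_csvs(pattern, out_path):
    """Concatenate every file matching `pattern` and write one CSV."""
    paths = sorted(glob.glob(pattern))
    # One concat over all frames avoids repeated copying in a loop.
    combined = pd.concat((read_stamped_csv(p) for p in paths), ignore_index=True)
    combined.to_csv(out_path, index=False)
    return combined
```

With roughly 15,000 files, a single `pd.concat` over all frames is generally much faster than appending to a DataFrame inside a loop, because each append copies the accumulated data again.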

How to delete empty spaces from pandas DataFrame rows until the first populated field?

Let's say I imported some really messy data from a PDF and I'm cleaning it. I have something like this: Name Type Date other1 other2 other3 Name1 '' '' Type1
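
One possible reading of the question is that PDF extraction pushed values to the right, leaving blank cells between a populated field and the next value (as in the `Name1 '' '' Type1` row). Under that assumption, the sketch below deletes the run of blank cells starting at a chosen column position and shifts the rest of the row left, padding the end with NaN. The names `close_gap`, `is_blank`, and the `start` parameter are hypothetical.

```python
import numpy as np
import pandas as pd


def is_blank(value):
    """Treat NaN/None and empty or whitespace-only strings as empty cells."""
    if isinstance(value, str):
        return value.strip() == ""
    return pd.isna(value)


def close_gap(df, start=1):
    """For each row, delete the blank cells from column position `start`
    up to the first populated field, shift the rest of the row left,
    and pad the end with NaN so the row keeps its width."""
    rows = []
    for values in df.itertuples(index=False):
        values = list(values)
        end = start
        # Advance past the blank cells beginning at position `start`.
        while end < len(values) and is_blank(values[end]):
            end += 1
        gap = end - start
        rows.append(values[:start] + values[end:] + [np.nan] * gap)
    return pd.DataFrame(rows, columns=df.columns, index=df.index)
```

Calling `close_gap(df, start=0)` would instead remove blanks at the very beginning of each row.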

How to remove quotation marks from an object-dtype column in Python and convert it to float

Customer id ----- object
ValueError: could not convert string to float: "'5769842393258'"
df["Customer id"] = df["Customer id"].replace('"', '',
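
The error message shows the stored value is `'5769842393258'` with literal single quotes, so replacing only the double-quote character cannot remove them. Also note that `Series.replace` with a plain string (and the default `regex=False`) only replaces cells that match that string in full; removing characters inside a value needs `Series.str.replace` or `Series.str.strip`. A minimal sketch on a made-up sample frame:

```python
import pandas as pd

# Hypothetical sample data mimicking the quoted IDs in the question.
df = pd.DataFrame({"Customer id": ["'5769842393258'", '"1234"', "42"]})

# Strip single and double quotes from both ends of each value,
# then convert the cleaned strings to float.
df["Customer id"] = df["Customer id"].str.strip("'\"").astype(float)
```

Since these look like identifiers, converting to an integer type (or keeping them as strings) may suit later processing better than float.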

Remove underscore and number at the end of string

I am working with a dataset that has a column with some underscores. There is a pattern to it, but the values follow different patterns, as shown below: ID Col1 1029
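
The actual patterns in the excerpt are truncated, so this sketch assumes the goal is to drop a single trailing `_<digits>` suffix; the sample values are hypothetical. `Series.str.replace` with `regex=True` applies the pattern, and `$` anchors it at the end of the string so earlier underscores are kept.

```python
import pandas as pd

# Hypothetical values; the real column contents are not shown in the excerpt.
ids = pd.Series(["1029_1", "AB_CD_12", "X_Y", "plain"])

# `_\d+$` matches one underscore followed by one or more digits at the very
# end of the string; values without that suffix are left unchanged.
cleaned = ids.str.replace(r"_\d+$", "", regex=True)
```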

How to sum a COUNT result?

I have a database from which I count the daily total number of customers who do or don't have a transaction. The Customer column is a varchar data type. Here is how
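
The real schema and the rest of the query are cut off, so the following is only a sketch against a hypothetical `daily_customers` table, run through Python's built-in `sqlite3` module so it is self-contained. It counts distinct customers per day with and without a transaction, then sums the daily totals by wrapping the grouped query in a subquery. The SQL dialect of the asker's database may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: one row per customer per day, with a 0/1 flag.
conn.execute(
    "CREATE TABLE daily_customers (day TEXT, customer TEXT, has_transaction INTEGER)"
)
conn.executemany(
    "INSERT INTO daily_customers VALUES (?, ?, ?)",
    [
        ("2023-01-01", "C1", 1),
        ("2023-01-01", "C2", 0),
        ("2023-01-01", "C3", 1),
        ("2023-01-02", "C1", 0),
    ],
)

# CASE without ELSE yields NULL, and COUNT ignores NULLs, so each
# conditional count only sees customers in that group.
query = """
SELECT day,
       COUNT(DISTINCT CASE WHEN has_transaction = 1 THEN customer END) AS with_tx,
       COUNT(DISTINCT CASE WHEN has_transaction = 0 THEN customer END) AS without_tx,
       COUNT(DISTINCT customer) AS total
FROM daily_customers
GROUP BY day
"""
rows = conn.execute(query + " ORDER BY day").fetchall()

# Summing a count: treat the grouped query as a derived table.
grand_total = conn.execute(f"SELECT SUM(total) FROM ({query})").fetchone()[0]
```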

How to check which records are non-numeric in a string column of a Delta table

I am working on a Delta table using Databricks on Azure. The Delta table contains about 100 million records with many columns. One column, the data type of which is S

How do I remove nonsensical or incomplete words from a corpus?

I am using some text for NLP analyses. I have cleaned the text, taking steps to remove non-alphanumeric characters, blanks, duplicate words and stopwords, a
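
A common approach, not necessarily the one the asker needs, is to keep only tokens that appear in a reference vocabulary, such as a set of dictionary words (NLTK's `words` corpus is one possible source). In this sketch, `keep_known_words` is a hypothetical helper, `vocabulary` is a set supplied by the caller, and tokens shorter than `min_len` are dropped as well.

```python
import re


def keep_known_words(text, vocabulary, min_len=2):
    """Return lowercase tokens that are at least `min_len` characters long
    and present in `vocabulary` (a caller-supplied set of valid words)."""
    # Split into alphabetic tokens; digits and punctuation act as separators.
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if len(t) >= min_len and t in vocabulary]
```

Using a `set` for `vocabulary` keeps each membership check roughly constant-time, which matters on a large corpus.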