How to check if a CSV has a header using Python?
I have a CSV file and I want to check whether the first row has only strings in it (i.e., a header). I'm trying to avoid using any extras like pandas. I'm thinking of an if statement along the lines of "if row[0] is a string, print this is a CSV", but I don't really know how to do that :-S Any suggestions?
Solution 1:[1]
Python has a built in CSV module that could help. E.g.
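import csv

# On Python 3 the csv module expects text mode with newline=''
with open('example.csv', newline='') as csvfile:
    sniffer = csv.Sniffer()
    # Guess from the first 2048 characters whether row one is a header
    has_header = sniffer.has_header(csvfile.read(2048))
    csvfile.seek(0)  # rewind so the rows can be read from the start
    # ...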
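To show what can follow the `# ...`, here is a minimal sketch that sniffs a sample and then splits the header from the data rows. The helper name `split_header` and the `sample_size` parameter are assumptions made for illustration; only `csv.Sniffer`, `sniff`, `has_header` and `csv.reader` come from the standard library. Keep in mind that `has_header` is a heuristic and can guess wrong, for example when every column holds text.

```python
import csv
import io


def split_header(text, sample_size=2048):
    """Return (header, rows); header is None if the sniffer detects none.

    Hypothetical helper: works on a string so it can be tried without a file.
    """
    sniffer = csv.Sniffer()
    sample = text[:sample_size]
    dialect = sniffer.sniff(sample)          # guess delimiter and quoting
    has_header = sniffer.has_header(sample)  # heuristic vote on row one
    reader = csv.reader(io.StringIO(text), dialect)
    header = next(reader) if has_header else None
    return header, list(reader)


header, rows = split_header("name,age\nalice,30\nbob,25\ncarol,41\n")
```

For a real file, the same idea applies: read the sample, call `seek(0)`, then pass the file object to `csv.reader`.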
Solution 2:[2]
I'd do something like this:
is_header = not any(cell.isdigit() for cell in csv_table[0])
Given a CSV table csv_table, grab the top (zeroth) row. Iterate through the cells and check whether any of them is a pure digit string. If so, it's not a header. Negate that with a not in front of the whole expression.
Results:
In [1]: not any(cell.isdigit() for cell in ['2','1'])
Out[1]: False
In [2]: not any(cell.isdigit() for cell in ['2','gravy'])
Out[2]: False
In [3]: not any(cell.isdigit() for cell in ['gravy','gravy'])
Out[3]: True
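One limitation of the check above is that `str.isdigit()` returns False for values such as '-1' or '3.14', so a first row of negative or decimal numbers would be taken for a header. A possible variant, sketched here with the hypothetical helpers `is_number` and `looks_like_header`, tries `float()` on each cell instead:

```python
def is_number(cell):
    """True if the cell parses as a float (this also accepts 'nan' and 'inf')."""
    try:
        float(cell)
    except ValueError:
        return False
    return True


def looks_like_header(row):
    """Treat a non-empty row as a header only if no cell parses as a number."""
    return bool(row) and not any(is_number(cell) for cell in row)


print(looks_like_header(['-1.5', '3.14']))
print(looks_like_header(['gravy', 'gravy']))
```

This is still only a heuristic: a header made of numeric labels (years, for instance) would be classified as data.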
Solution 3:[3]
For files that are not necessarily in '.csv' format, this is very useful:
built-in function in Python to check Header in a Text file
def check_header(filename):
    with open(filename) as f:
        first = f.read(1)
    # A file starting with a digit, '.' or '-' is treated as data, not a header
    return first not in '.-0123456789'
Answer by: https://stackoverflow.com/users/908494/abarnert
Post link: https://stackoverflow.com/a/15671103/7763184
Solution 4:[4]
Here is a function I use with pandas to decide whether header should be set to 'infer' or None:
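import pandas as pd

def identify_header(path, n=5, th=0.9):
    # Read the same sample twice: row one as a header, then row one as data
    df1 = pd.read_csv(path, header='infer', nrows=n)
    df2 = pd.read_csv(path, header=None, nrows=n)
    # Fraction of columns whose dtype is identical in both reads
    sim = (df1.dtypes.values == df2.dtypes.values).mean()
    return 'infer' if sim < th else None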
Based on a small sample, the function checks the similarity of dtypes with and without a header row. If the dtypes match for a certain percentage of columns, it is assumed that there is no header present. I found a threshold of 0.9 to work well for my use cases. This function is also fairly fast, as it only reads a small sample of the CSV file.
Solution 5:[5]
I faced exactly the same problem with sniffer.has_header returning the wrong result, and even wrote a very simple checker that worked in my case:
has_header = ''.join(next(some_csv_reader)).isalpha()
I knew it wasn't perfect, but it seemed to work; after all, it was a simple check of whether the result was alphabetic or not. Then I put it in my method and it failed. :( And then I saw the "light".
The trouble was not with has_header but with my code: I also wanted to detect the delimiter before parsing the actual .csv, and every read done for sniffing has a "cost" because it advances the position in the file.
So for has_header to work as it should, make sure you have reset the file position before using it.
In my case my method is:
def _get_data(self, filename):
    sniffer = csv.Sniffer()
    training_data = ''
    with open(filename, 'rt') as csvfile:
        dialect = sniffer.sniff(csvfile.read(2048))
        training_data = csv.reader(csvfile, delimiter=dialect.delimiter)
        csvfile.seek(0)  # rewind after sniffing the dialect
        has_header = sniffer.has_header(csvfile.read(2048))
        #has_header = ''.join(next(training_data)).isalpha()
        csvfile.seek(0)  # rewind again before reading the rows
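The pitfall described in this answer can be reproduced in a few lines. The sketch below uses an in-memory `io.StringIO` buffer and made-up sample data as stand-ins for a real file; it only illustrates that reading a sample moves the position and that `seek(0)` restores it.

```python
import csv
import io

# Made-up sample data standing in for a real file opened in text mode
buf = io.StringIO("id;city\n1;Oslo\n2;Lima\n3;Rome\n")

sample = buf.read(2048)  # the kind of read done before sniffing

# Without a rewind the reader starts where the read stopped (here: the end)
rows_without_rewind = list(csv.reader(buf, delimiter=';'))

buf.seek(0)  # reset the position to the start of the data
rows_after_rewind = list(csv.reader(buf, delimiter=';'))

print(len(rows_without_rewind), len(rows_after_rewind))
```

The same applies to a file object returned by `open()`: every `read()` used for sniffing should be followed by `seek(0)` before the next step.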
Solution 6:[6]
I think the best way to check this is simply to read the first line of the file and compare it against the header string you expect, instead of using any library.
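A minimal sketch of that idea, assuming you already know which column names to expect. The function name `has_expected_header` and its parameters are hypothetical, and `csv.reader` is used only to split the first line correctly when cells are quoted.

```python
import csv


def has_expected_header(path, expected, delimiter=','):
    """Compare the first row of the file with the expected column names.

    Hypothetical helper; the comparison ignores case and surrounding spaces.
    """
    with open(path, newline='') as f:
        first_row = next(csv.reader(f, delimiter=delimiter), None)
    if first_row is None:  # empty file, so there is no header
        return False
    found = [cell.strip().lower() for cell in first_row]
    wanted = [name.strip().lower() for name in expected]
    return found == wanted
```

A call would look like `has_expected_header('example.csv', ['name', 'age'])`, where the file name and column names are placeholders.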
Solution 7:[7]
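Simply use try and except. Note that pd.read_csv does not raise just because a header is missing (by default it uses the first row as column names), so this only catches files that fail to parse at all:
import pandas as pd
try:
    data = pd.read_csv('file.csv', encoding='ISO-8859-1')
    print('csv file has header')
except Exception:
    print('csv file has no header')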
Solution 8:[8]
An updated version of ChrisD's answer with fallback for empty files:
import csv

with open(filename, "r", newline="") as f:
    try:
        has_headings = csv.Sniffer().has_header(f.read(1024))
    except csv.Error:
        # The file seems to be empty
        has_headings = False
https://docs.python.org/3/library/csv.html#csv.Sniffer.has_header
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | ChrisD |
Solution 2 | Joe Bashe |
Solution 3 | Frankthetank |
Solution 4 | pietz |
Solution 5 | |
Solution 6 | Abhijit |
Solution 7 | daniel lugo |
Solution 8 | Freddy Mcloughlan |