How to open a .data file extension

I am working on a side project where the data provided is in a .data file. How do I open a .data file to see what the data looks like, and how do I read from a .data file programmatically in Python? I am on Mac OS X.

NOTE: The data I am working with is from one of the KDD Cup challenges.



Solution 1:[1]

Try opening the file in a plain text editor such as Notepad, Gedit, or TextEdit on macOS to check which delimiter it uses (.data files are often plain text files). Once you have confirmed the delimiter, you can use the read_csv function from the pandas library in Python.

import pandas as pd
file_path = "~/AI/datasets/wine/wine.data"
# above .data file is comma delimited
wine_data = pd.read_csv(file_path, delimiter=",")
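
If you would rather check the delimiter from Python than from an editor, the standard library's csv.Sniffer can make a guess from a sample of the file. This is only a sketch: guess_delimiter, its candidates list, and the demo rows are made up for illustration, and the sniffer can guess wrong or raise csv.Error on irregular files.

```python
import csv
import os
import tempfile


def guess_delimiter(path, sample_chars=4096, candidates=",\t;|"):
    """Guess the field delimiter from the first few KB of a text file.

    Raises csv.Error if csv.Sniffer cannot settle on a delimiter.
    """
    with open(path, "r", newline="") as f:
        sample = f.read(sample_chars)
    return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter


# Demo on a throwaway file with made-up rows (not the real wine dataset).
with tempfile.NamedTemporaryFile("w", suffix=".data", delete=False) as tmp:
    tmp.write("1,14.23,1.71\n1,13.20,1.78\n1,13.16,2.36\n")

print(repr(guess_delimiter(tmp.name)))
os.remove(tmp.name)
```

The returned character can then be passed to pd.read_csv as the delimiter argument.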

Solution 2:[2]

It depends entirely on what is in it. It could be a binary file or it could be a text file.

If it is a text file then you can open it in the same way you open any file (f=open(filename,"r"))

If it is a binary file you can just add a "b" to the open command (open(filename,"rb")). There is an example here:

Reading binary file in Python and looping over each byte

Depending on the type of data in there, you might want to try passing it through a CSV reader (Python's csv module) or an XML parsing library (lxml is one example).
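
If you are not sure which of the two cases you have, a rough check is to read the first bytes in "rb" mode and look for NUL bytes or bytes that are not valid UTF-8. This is a heuristic sketch, not a definitive test: looks_binary and probe_bytes are names invented here, and a text file in another encoding (Latin-1, for example) may be reported as binary.

```python
import codecs
import os
import tempfile


def looks_binary(path, probe_bytes=1024):
    """Heuristic: report True if the start of the file contains a NUL byte
    or bytes that are not valid UTF-8."""
    with open(path, "rb") as f:
        chunk = f.read(probe_bytes)
    if b"\x00" in chunk:
        return True
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        # final=False tolerates a multi-byte character cut off at the
        # end of the probe instead of treating it as an error.
        decoder.decode(chunk, final=False)
    except UnicodeDecodeError:
        return True
    return False


# Demo on a throwaway tab-separated text file with made-up content.
with tempfile.NamedTemporaryFile("wb", suffix=".data", delete=False) as tmp:
    tmp.write(b"col1\tcol2\n1\t2\n")

print(looks_binary(tmp.name))
os.remove(tmp.name)
```

If the check returns False, opening the file with mode "r" is a reasonable next step. Otherwise use "rb".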

After the further info above, and looking at the page, the format is:

Data Format

The datasets use a format similar to that of the text export format from relational databases:

- One header line with the variable names
- One line per instance
- Tab separator between the values
- There are missing values (consecutive tabs)

Therefore see this answer:

parsing a tab-separated file in Python

I would advise processing one line at a time rather than loading the whole file, but if you have the RAM, why not...

I suspect it doesn't open in Sublime because the file is huge, but that is just a guess.
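
As a minimal sketch of the line-at-a-time approach for the format described above (header line, tab separator, consecutive tabs for missing values), the csv module can stream the file without loading it all. The function name iter_rows and the demo column names are hypothetical, and the default csv quoting rules are assumed to suit the file.

```python
import csv
import os
import tempfile


def iter_rows(path):
    """Yield one dict per data line of a tab-separated file with a header.

    Empty fields (two consecutive tabs) are mapped to None.
    """
    with open(path, "r", newline="") as f:
        reader = csv.DictReader(f, delimiter="\t")
        for row in reader:
            yield {name: (value if value != "" else None)
                   for name, value in row.items()}


# Demo on a throwaway file; Var1..Var3 are made-up column names.
with tempfile.NamedTemporaryFile("w", suffix=".data", delete=False) as tmp:
    tmp.write("Var1\tVar2\tVar3\n1\t\t3\n\tx\t\n")

for row in iter_rows(tmp.name):
    print(row)
os.remove(tmp.name)
```

Because iter_rows is a generator, only one line is held in memory at a time, which matters if the file is too large to open in an editor.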

Solution 3:[3]

To get a quick overview of what the file may contain, you could do this within a terminal, using strings or cat, for example:

$ strings file.data

or

$ cat -v file.data

If you forget to pass the -v option to cat and the file is binary, you could garble your terminal and then need to reset it:

$ reset
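
A similar peek can be done from Python without risking the terminal. The sketch below approximates what strings does by pulling runs of printable ASCII out of the raw bytes. It is not the tool's exact behaviour, and printable_runs, peek_strings, and their parameters are names made up for this example.

```python
import re


def printable_runs(data, min_len=4):
    """Return runs of at least min_len printable ASCII characters found
    in a bytes object."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [m.decode("ascii") for m in re.findall(pattern, data)]


def peek_strings(path, limit=65536, min_len=4):
    """Apply printable_runs to the first `limit` bytes of a file."""
    with open(path, "rb") as f:
        return printable_runs(f.read(limit), min_len)


# Demo on made-up bytes mixing binary noise and readable text.
sample = b"\x00\x01hello world\x00\xffab\x00data1234\x02"
print(printable_runs(sample))
```

Calling peek_strings("file.data") would give a comparable overview for the file used in the commands above.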

Solution 4:[4]

I was just dealing with this issue myself, so I thought I would share my answer. I have a .data file and was unable to open it by simply right-clicking it. macOS recommended I open it using Xcode, so I tried that, but it did not work.

Next I tried opening it with a program named "Brackets". It is a text editor primarily used for HTML and CSS. Brackets did work.

I also tried PyCharm, as I am a Python programmer. PyCharm worked as well, and I was also able to read from the file using the following lines of code:

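# the with block closes the file automatically when done
with open("processed-1.cleveland.data", "r") as inf:
    # iterate over the file directly so lines are read one at a time
    for line in inf:
        print(line, end="")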

Solution 5:[5]

This works for me.

import pandas as pd
# define your file path here (placeholder, replace with your own path)
file_path = "your_file.data"
your_data = pd.read_csv(file_path, sep=',')
# show the first few rows
your_data.head()

In other words, just treat it as a CSV file if it is separated with ','. Solution from @mustious.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution      Source
Solution 1    aalbagarcia
Solution 2    Community
Solution 3    nbari
Solution 4    Wizard
Solution 5    laguarage