How to open a .data file extension
I am working on a side project where the data is provided in a .data file. How do I open a .data file to see what the data looks like, and how do I read from a .data file programmatically in Python? I am on Mac OS X.
NOTE: The data I am working with is from one of the KDD Cup challenges.
Solution 1:[1]
Try opening the file in a plain-text editor such as Notepad or Gedit to check which delimiter it uses (.data files are often plain text). Once you have confirmed the delimiter, you can use the read_csv function in the Pandas library in Python.
import pandas as pd
file_path = "~/AI/datasets/wine/wine.data"
# the .data file above is comma-delimited
# (also pass header=None if the file has no header row)
wine_data = pd.read_csv(file_path, delimiter=",")
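If you would rather check the delimiter from Python than in an editor, a rough sketch using only the standard library could look like the following. Note that guess_delimiter is a hypothetical helper, the list of candidate delimiters is an assumption, and the sample file is created only so the snippet runs on its own.

```python
import os
import tempfile
from collections import Counter
from itertools import islice


def guess_delimiter(path, candidates=(",", "\t", ";", "|"), n_lines=5):
    """Return the candidate seen most often in the first n_lines, or None."""
    counts = Counter()
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        for line in islice(f, n_lines):
            for c in candidates:
                counts[c] += line.count(c)
    best, best_count = max(counts.items(), key=lambda kv: kv[1], default=(None, 0))
    return best if best_count > 0 else None


# Build a tiny comma-delimited sample file (demo data, not the real dataset).
with tempfile.NamedTemporaryFile("w", suffix=".data", delete=False) as tmp:
    tmp.write("1,14.23,1.71\n1,13.20,1.78\n2,12.37,0.94\n")
    sample_path = tmp.name

delimiter = guess_delimiter(sample_path)
print(repr(delimiter))
os.remove(sample_path)
```

The result can then be passed as the delimiter argument to read_csv. This is only a counting heuristic, so it is still worth looking at a few lines by eye.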
Solution 2:[2]
It depends heavily on what is in it. It could be a binary file or it could be a text file.
If it is a text file, you can open it the same way you open any file (f=open(filename,"r")).
If it is a binary file, just add a "b" to the mode in the open call (open(filename,"rb")). There is an example here:
Reading binary file in Python and looping over each byte
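If you are not sure which of the two cases applies, one common heuristic (an assumption here, not a guarantee) is to read the first bytes in binary mode and look for a NUL byte, which rarely appears in text files. The helper names looks_binary and make_sample below are made up for this sketch, and the sample files are temporary stand-ins.

```python
import os
import tempfile


def looks_binary(path, sample_size=1024):
    """Heuristic: treat the file as binary if its first bytes contain a NUL byte."""
    with open(path, "rb") as f:
        chunk = f.read(sample_size)
    return b"\x00" in chunk


def make_sample(content):
    """Write bytes to a temporary .data file and return its path (demo only)."""
    with tempfile.NamedTemporaryFile("wb", suffix=".data", delete=False) as tmp:
        tmp.write(content)
        return tmp.name


text_path = make_sample(b"a\tb\tc\n1\t\t3\n")
binary_path = make_sample(bytes([0, 1, 2, 255, 0, 10]))

text_is_binary = looks_binary(text_path)
binary_is_binary = looks_binary(binary_path)
print(text_is_binary, binary_is_binary)

os.remove(text_path)
os.remove(binary_path)
```

If the check returns False, opening the file with "r" is a reasonable next step; otherwise stick with "rb".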
Depending on the type of data in there, you might want to try passing it through a CSV reader (the csv Python module) or an XML parsing library (lxml is one example).
After further info from above and looking at the page, the format is:
Data Format
The datasets use a format similar as that of the text export format from relational databases:
- One header lines with the variables names
- One line per instance
- Separator tabulation between the values
- There are missing values (consecutive tabulations)
Therefore see this answer:
parsing a tab-separated file in Python
I would advise processing one line at a time rather than loading the whole file, but if you have the RAM, why not...
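As a minimal sketch of the line-at-a-time approach for the format described above (one header line, tab separators, empty fields for missing values), the csv module can be used roughly like this. The column names and the sample file are invented for the demo, and mapping empty fields to None is just one possible choice.

```python
import csv
import os
import tempfile

# Small stand-in for the real file: header row, tab-separated, one missing value.
with tempfile.NamedTemporaryFile("w", suffix=".data", delete=False, newline="") as tmp:
    tmp.write("Var1\tVar2\tVar3\n1\t\t3\n4\t5\t6\n")
    sample_path = tmp.name

rows = []
with open(sample_path, "r", newline="") as f:
    reader = csv.reader(f, delimiter="\t")
    header = next(reader)  # first line holds the variable names
    for values in reader:  # one instance per line, read lazily
        # Consecutive tabs produce empty strings; map them to None.
        record = {name: (value if value != "" else None)
                  for name, value in zip(header, values)}
        rows.append(record)  # for a huge file, process the record here instead of storing it

print(rows[0])
os.remove(sample_path)
```

The rows list is kept here only so the result can be inspected; with a very large file you would handle each record inside the loop and discard it.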
I suspect it doesn't open in Sublime because the file is huge, but that is just a guess.
Solution 3:[3]
To get a quick overview of what the file may contain, you could use strings or cat in a terminal, for example:
$ strings file.data
or
$ cat -v file.data
If you forget to pass the -v option to cat and the file is binary, you could mess up your terminal and then need to reset it:
$ reset
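If you want a similar peek from inside Python, without risking the terminal, something along these lines approximates what strings does by pulling out runs of printable ASCII characters. This is a rough sketch: printable_runs is a hypothetical helper, and the sample bytes stand in for what open(path, "rb").read() would return.

```python
import re


def printable_runs(data, min_length=4):
    """Return runs of at least min_length printable ASCII characters found in data."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_length
    return [m.decode("ascii") for m in re.findall(pattern, data)]


# Demo bytes standing in for the contents of a file opened in "rb" mode.
sample = b"\x00\x01HEADER\x00\xff\x02ab\x00value=42\n\x00"
runs = printable_runs(sample)
print(runs)
```

For a large file it would be safer to read only the first few kilobytes before passing them to the helper.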
Solution 4:[4]
I was just dealing with this issue myself, so I thought I would share my answer. I have a .data file and was unable to open it by simply right-clicking it. macOS recommended I open it using Xcode, so I tried that, but it did not work.
Next I tried opening it with a program named "Brackets". It is a text editor primarily used for HTML and CSS. Brackets did work.
I also tried PyCharm, as I am a Python programmer. PyCharm worked as well, and I was also able to read from the file using the following lines of code:
# open the file as text and print it line by line
with open("processed-1.cleveland.data", "r") as inf:
    for line in inf:
        print(line, end="")
Solution 5:[5]
It works for me.
import pandas as pd
# define your file path here
your_data = pd.read_csv(file_path, sep=',')
your_data.head()
In other words, just treat it as a CSV file if it is separated with ','. Solution from @mustious.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | aalbagarcia |
Solution 2 | Community |
Solution 3 | nbari |
Solution 4 | Wizard |
Solution 5 | laguarage |