Databricks DBFS file read issue
I am trying to open a file that I uploaded to a DBFS location. I get an error when opening the file, even though I can see the file when I do an ls, and there is no issue reading the file into an RDD. Can someone explain this DBFS behavior? I tried several times after going through the documentation as well. This is the documentation I followed.
#ls
dbutils.fs.ls("/tmp/sample.txt")
Out[82]: [FileInfo(path='dbfs:/tmp/sample.txt', name='sample.txt', size=46044136)]
#creating an RDD from the txt file
data_file = "/tmp/sample.txt"
raw_data = sc.textFile(data_file)
raw_data.take(1)
Out[99]: ["Oct 12 2009 \tNice trendy hotel location not too bad...........\t"]
#open the txt file with the local-file API
import gensim

with open("/tmp/sample.txt", 'r') as f:
    for i, line in enumerate(f):
        if i % 10000 == 0:
            print("read {0} reviews".format(i))
        print(gensim.utils.simple_preprocess(line))
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/sample.txt'
#as per the documentation, prefix the path with /dbfs
with open("/dbfs/tmp/sample.txt", 'r') as f:
    for i, line in enumerate(f):
        if i % 10000 == 0:
            print("read {0} reviews".format(i))
        print(gensim.utils.simple_preprocess(line))
FileNotFoundError: [Errno 2] No such file or directory: '/dbfs/tmp/sample.txt'
I have been scratching my head on this. Any help will be greatly appreciated.
P.S. I am using the Community Edition of Databricks, if that helps.
Solution 1 [1]
This is a limitation of Community Edition with DBR >= 7.x: on those runtimes the DBFS FUSE mount at /dbfs is not available, so local file APIs such as Python's open() cannot reach DBFS paths directly. If you want to access that DBFS file locally, copy it from DBFS to the local file system with dbutils.fs.cp('dbfs:/file', 'file:/local-path') (or %fs cp dbfs:/file file:/local-path), and then work with the local copy.
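As a minimal sketch of that workaround, using the file from the question and a hypothetical local destination path /tmp/sample_local.txt (the destination name is an assumption for illustration, not part of the original answer):

#copy the file to the driver's local disk, then open the local copy
import gensim

# file:/ tells dbutils.fs.cp to target the driver's local file system,
# not DBFS; the destination path here is a hypothetical example
dbutils.fs.cp("dbfs:/tmp/sample.txt", "file:/tmp/sample_local.txt")

with open("/tmp/sample_local.txt", 'r') as f:
    for i, line in enumerate(f):
        if i % 10000 == 0:
            print("read {0} reviews".format(i))
        print(gensim.utils.simple_preprocess(line))

The file:/ scheme is what makes the copy land on local disk; the subsequent open() then finds the file at the plain POSIX path.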
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Alex Ott |