h2o.ai H2OResponseError: Server error water.exceptions.H2ONotFoundArgumentException: Error: File does not exist
I'm using h2o from Python in a Jupyter notebook and getting this error message:
...
/home/mapr/anaconda2/lib/python2.7/site-packages/h2o/backend/connection.pyc in _process_response(response, save_to)
723 # Client errors (400 = "Bad Request", 404 = "Not Found", 412 = "Precondition Failed")
724 if status_code in {400, 404, 412} and isinstance(data, (H2OErrorV3, H2OModelBuilderErrorV3)):
--> 725 raise H2OResponseError(data)
726
727 # Server errors (notably 500 = "Server Error")
H2OResponseError: Server error water.exceptions.H2ONotFoundArgumentException:
Error: File <path to data file I'm trying to import> does not exist.
when trying to import data with
train = h2o.import_file(path = os.path.realpath("relative path to data file"))
Yet the file does in fact exist on the specified path. Why would this be happening?
Details
I'm following the h2o deeplearning example for accessing the h2o service from Python code in a Jupyter notebook. Everything works fine up until the part where I need to import .csv data, e.g.
spiral = h2o.import_file(path = os.path.realpath("../data/spiral.csv"))
At this point the error above is raised. A comment in the example's source code says:
# In this case, the cluster is running on our laptops. Data files are imported by their relative locations to this notebook.
Yet, when running
os.path.exists(os.path.realpath("./data/<my data csv file>"))
in the notebook, the response is True. So it seems like the relative path is recognized by the Python os package*, but there is some problem with the h2o.import_file() method.
What could be going on here? Thanks.
Note that I am using port forwarding from the machine actually running the h2o and jupyter-notebook services, with something like:
remote machine:
$ jupyter-notebook --no-browser --port=8889
local machine:
$ ssh -N -L localhost:8888:localhost:8889 myuser@mnode01
* The directory structure is:
bin
data
|
|_____ mydata.csv
include
lib
remote-h2o.ipynb
UPDATE
I think I have found the problem. The h2o Python docs specify that
The path to the data must be a valid path for each node in the H2O cluster. If some node in the H2O cluster cannot see the file, then an exception will be thrown by the H2O cluster.
This raises the question: does this mean that all of the cluster nodes need to have the same virtualenv (at the same absolute path) as the one in which I am running the Jupyter notebook and holding data/mydata.csv?
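One possible way around the "valid path for each node" requirement, offered as a sketch rather than a confirmed fix for this setup: h2o.import_file() asks the H2O server to read the path from its own filesystem, whereas h2o.upload_file() pushes the file from the machine running the Python client to the cluster. The helper name load_frame and its client_side flag below are hypothetical, introduced only for illustration.

```python
import os


def load_frame(relative_path, client_side=True):
    """Load a CSV into H2O, checking first that the notebook can see the file.

    load_frame and client_side are illustrative names, not part of h2o.
    """
    abs_path = os.path.realpath(relative_path)
    # This check only proves the file is visible to the Python process,
    # not to the H2O node(s) that would have to read it.
    if not os.path.exists(abs_path):
        raise IOError("File not visible to the Python client: " + abs_path)

    import h2o  # imported lazily; assumes h2o.init() has already been called

    if client_side:
        # Sends the file's contents from the client to the cluster,
        # so the nodes do not need to see the path themselves.
        return h2o.upload_file(abs_path)
    # Asks the server to read the path; it must be valid on every node.
    return h2o.import_file(path=abs_path)
```

With a running cluster this would be called as train = load_frame("data/mydata.csv"). Uploading is typically slower than a server-side import for large files, so it is a workaround, not a replacement for shared storage.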
Solution 1:[1]
I had the same problem and solved it by changing the port in h2o.init(port='XXXXX').
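A minimal sketch of what that could look like. The answer does not say which port was used or why the change helped, so the values here are assumptions: 54321 is H2O's default port, and connect_to_h2o is a hypothetical helper name. Printing the cluster status afterwards is one way to confirm which H2O instance the notebook is actually talking to.

```python
def connect_to_h2o(port=54321, ip="localhost"):
    """Connect to (or start) an H2O instance on an explicit port.

    connect_to_h2o is an illustrative name; port and ip are assumed values.
    """
    import h2o  # imported lazily so defining the helper does not require H2O

    # Connects to an H2O instance at ip:port, starting a local one if none is found.
    h2o.init(ip=ip, port=port)
    cluster = h2o.cluster()
    # Prints details of the connected cluster so the target can be verified.
    cluster.show_status()
    return cluster
```

Calling connect_to_h2o(port=54323), for example, would target a different port from the default.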
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Quentin Moreau |