h2o.ai H2OResponseError: Server error water.exceptions.H2ONotFoundArgumentException: Error: File does not exist

I am using h2o from Python in a Jupyter notebook and getting this error message:

...
/home/mapr/anaconda2/lib/python2.7/site-packages/h2o/backend/connection.pyc in _process_response(response, save_to)
    723         # Client errors (400 = "Bad Request", 404 = "Not Found", 412 = "Precondition Failed")
    724         if status_code in {400, 404, 412} and isinstance(data, (H2OErrorV3, H2OModelBuilderErrorV3)):
--> 725             raise H2OResponseError(data)
    726 
    727         # Server errors (notably 500 = "Server Error")
H2OResponseError: Server error water.exceptions.H2ONotFoundArgumentException: 
Error: File <path to data file I'm trying to import> does not exist.

when trying to import data with

train = h2o.import_file(path = os.path.realpath("relative path to data file"))

Yet the file does in fact exist on the specified path. Why would this be happening?

Details

I am following the h2o deep learning example for accessing the h2o service from Python code in a Jupyter notebook. Everything works until the step that imports the .csv data, e.g.

spiral = h2o.import_file(path = os.path.realpath("../data/spiral.csv")) 

At that point the error above is raised. A comment in the example's source code says:

# In this case, the cluster is running on our laptops. Data files are imported by their relative locations to this notebook.

Yet, when running

os.path.exists(os.path.realpath("./data/<my data csv file>"))

in the notebook, the result is True. So the relative path is recognized by Python's os package*, but something goes wrong inside the h2o.import_file() method.
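
For reference, os.path.exists() only tells you what the notebook's own Python process can see. A minimal sketch of a client-side check, using only the standard library (the helper name describe_client_path is hypothetical, and data/mydata.csv is taken from the directory listing further down):

```python
import os
import socket


def describe_client_path(relative_path):
    """Report how the notebook's Python process (the client) sees a path."""
    abs_path = os.path.realpath(relative_path)  # resolved against the client's cwd
    return {
        "host": socket.gethostname(),            # machine running this Python process
        "cwd": os.getcwd(),                      # directory relative paths start from
        "absolute_path": abs_path,               # the string that would be sent to H2O
        "exists_on_client": os.path.exists(abs_path),
    }


if __name__ == "__main__":
    for key, value in describe_client_path("data/mydata.csv").items():
        print("{}: {}".format(key, value))
```

A True here says nothing about whether the H2O server process, possibly on another host, can open the same absolute path.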

What could be going on here? Thanks.

Note: I am using port forwarding from the machine that actually runs the h2o and jupyter-notebook services, with something like:

remote machine:

$ jupyter-notebook --no-browser --port=8889

local machine:

$ ssh -N -L localhost:8888:localhost:8889 myuser@mnode01

* The directory structure is:

bin  
data
  |
  |_____ mydata.csv  
include  
lib  
remote-h2o.ipynb

UPDATE

I think I have found the problem. The h2o Python docs specify that

The path to the data must be a valid path for each node in the H2O cluster. If some node in the H2O cluster cannot see the file, then an exception will be thrown by the H2O cluster.

This raises a question: does this mean that all of the cluster nodes need to have the same virtualenv (at the same absolute path) as the one in which I am running the Jupyter notebook and holding data/mydata.csv?
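
If the file only lives on the machine running the notebook, one possible workaround is h2o.upload_file(), which has the Python client read the file and push it to the cluster instead of asking the nodes to open the path themselves. A minimal sketch, assuming an H2O connection has already been set up with h2o.init() (the helper name load_frame is hypothetical):

```python
import os


def load_frame(path):
    """Load a file into H2O, trying a server-side read before a client upload."""
    import h2o  # imported lazily so the helper can be defined without a cluster
    from h2o.exceptions import H2OResponseError

    abs_path = os.path.realpath(path)  # resolved on the client machine
    try:
        # import_file: the H2O nodes open abs_path on their own filesystems.
        return h2o.import_file(path=abs_path)
    except H2OResponseError:
        # upload_file: the Python client reads the file and sends its contents.
        # Note: this catches any client-error response, not only "file not found".
        return h2o.upload_file(path=abs_path)
```

Usage would look like train = load_frame("data/mydata.csv"). Uploading is probably only practical for files small enough to send through the client connection.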

Solution 1:[1]

I had the same problem and solved it by changing the port passed to h2o.init(port='XXXXX').
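
One way to read this: the notebook may have been attached to a different H2O instance than the one that can see the file. A minimal sketch of connecting with an explicit address and then printing the cluster status to confirm which instance answered. The ip and port values here are assumptions (54321 is H2O's default port), and the helper name connect_to_cluster is hypothetical:

```python
def connect_to_cluster(ip="localhost", port=54321):
    """Attach to an H2O instance at an explicit address and show its status."""
    import h2o  # imported lazily so the helper can be defined without h2o running

    # The port is coerced to an integer; ip and port are placeholder assumptions.
    h2o.init(ip=ip, port=int(port))

    cluster = h2o.cluster()
    cluster.show_status()  # prints cluster details such as name and node count
    return cluster
```

Keep in mind that h2o.init() can start a new local H2O server when nothing is reachable at the given address, so it is worth checking the printed status to see which instance you ended up on.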

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Quentin Moreau