How to Run a Databricks Notebook From Another Notebook on a Different Cluster
In Databricks, I understand that a notebook can be executed from another notebook, but by default it runs on the current cluster.
For example: I have notebook1 running on cluster1, and I run notebook2 from notebook1 using the command below
dbutils.notebook.run("notebook2", 3600)
but this runs on cluster1. How can I make it run on cluster2?
Solution 1:[1]
After digging through dbutils.py, I found a hidden argument to dbutils.notebook.run() called _NotebookHandler__databricks_internal_cluster_spec that accepts a cluster configuration JSON.
If you want to run "notebook2" on a cluster you've already created, simply pass the JSON for that cluster (a sketch of that variant follows the example below). If you want Databricks to create a new cluster for you, define the cluster's resources under the key "new_cluster". For example:
cluster_config = '''
{
    "new_cluster": {
        "spark_version": "9.1.x-cpu-ml-scala2.12",
        "spark_conf": {
            "spark.databricks.delta.preview.enabled": "true"
        },
        ...
        "enable_elastic_disk": true,
        "num_workers": 4
    }
}
'''
dbutils.notebook.run('notebook2', 36000, _NotebookHandler__databricks_internal_cluster_spec=cluster_config)
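For the first case above, running "notebook2" on a cluster that already exists, the sketch below assumes the hidden argument accepts the same cluster-spec shape as the Databricks Jobs API, where an existing cluster is referenced by "existing_cluster_id". That key name and the cluster ID shown are assumptions and placeholders, not something I verified against dbutils.py:
import json

# Assumption: the internal cluster spec mirrors the Jobs API convention,
# where "existing_cluster_id" references a cluster that is already created.
# "0123-456789-abcde123" is a placeholder; the real ID is on the cluster's
# configuration page or available via the Clusters API.
existing_cluster_config = json.dumps({
    "existing_cluster_id": "0123-456789-abcde123"
})

# dbutils is provided by the Databricks notebook runtime.
dbutils.notebook.run(
    "notebook2",
    3600,
    _NotebookHandler__databricks_internal_cluster_spec=existing_cluster_config
)
Building the spec with json.dumps from a plain dict avoids hand-written JSON mistakes; the same approach works for the "new_cluster" payload above.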
I am only able to test this on Azure Databricks, unfortunately.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |