How to Run a Databricks Notebook From Another Notebook on a Different Cluster

In Databricks, I understand that a notebook can be executed from another notebook, but by default the child notebook runs on the current cluster.

For example: I have notebook1 running on cluster1, and I am running notebook2 from notebook1 using the command below:

dbutils.notebook.run("notebook2", 3600)

But this will run on cluster1. How can I make it run on cluster2?



Solution 1:[1]

After digging through dbutils.py, I found a hidden argument to dbutils.notebook.run() called _NotebookHandler__databricks_internal_cluster_spec that accepts a cluster configuration JSON.

If you want to run "notebook2" on a cluster you've already created, simply pass the JSON for that cluster (see the sketch after this example). If you want Databricks to create a new cluster for you, define the cluster's resources under the key "new_cluster". For example:

cluster_config = '''
{
  "new_cluster": {
      "spark_version": "9.1.x-cpu-ml-scala2.12",
      "spark_conf": {
          "spark.databricks.delta.preview.enabled": "true"
      },
      ...
      "enable_elastic_disk": true,
      "num_workers": 4
  }
}
'''

dbutils.notebook.run('notebook2', 36000, _NotebookHandler__databricks_internal_cluster_spec=cluster_config)
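
For the existing-cluster case mentioned above, the answer doesn't show the exact JSON shape. A minimal sketch, assuming the spec follows the same shape as the Jobs API cluster specification; the "existing_cluster_id" key and the cluster ID value are assumptions, not something confirmed by the original answer:

# Hedged sketch: run notebook2 on a cluster that is already running.
# "existing_cluster_id" is assumed to match the Jobs API cluster spec;
# the cluster ID below is only a placeholder.
existing_cluster_config = '''
{
  "existing_cluster_id": "0123-456789-abcdefgh"
}
'''

dbutils.notebook.run(
    "notebook2",
    3600,
    _NotebookHandler__databricks_internal_cluster_spec=existing_cluster_config
)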

Unfortunately, I was only able to test this on Azure Databricks.
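
If you call this from several places, it can help to wrap the pattern in a small helper. A minimal sketch, assuming the hidden keyword argument behaves as described above and that the child notebook returns a string via dbutils.notebook.exit(); the helper name run_on_cluster is purely illustrative and not part of dbutils:

import json

def run_on_cluster(notebook_path, cluster_spec, timeout_seconds=3600):
    # Hypothetical helper: serializes the cluster spec dict to JSON and
    # forwards it to the hidden dbutils.notebook.run() argument described above.
    return dbutils.notebook.run(
        notebook_path,
        timeout_seconds,
        _NotebookHandler__databricks_internal_cluster_spec=json.dumps(cluster_spec),
    )

# Example: spin up a small new cluster for notebook2 and capture whatever it
# returns via dbutils.notebook.exit().
result = run_on_cluster("notebook2", {
    "new_cluster": {
        "spark_version": "9.1.x-cpu-ml-scala2.12",
        "num_workers": 4,
    },
})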

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
[1] Solution 1