How can I execute and schedule a Databricks notebook from an Azure DevOps pipeline using YAML?
I want to set up CI/CD for my Azure Databricks notebooks using a YAML file. I followed the flow below:
- Pushed my code from the Databricks notebook to Azure Repos.
- Created a build using the YAML script below.
stages:
- stage: Build
  displayName: Build stage
  jobs:
  - job: Build
    displayName: Build
    steps:
    - task: CopyFiles@2
      displayName: 'Copy Files to: $(build.artifactstagingdirectory)'
      inputs:
        SourceFolder: '$(System.DefaultWorkingDirectory)'
        TargetFolder: '$(build.artifactstagingdirectory)'
    - task: PublishBuildArtifacts@1
      displayName: 'Publish Artifact: notebooks'
      inputs:
        ArtifactName: dev_release
    - task: PublishBuildArtifacts@1
      inputs:
        PathtoPublish: '$(Build.ArtifactStagingDirectory)'
        ArtifactName: 'publish build'
        publishLocation: 'Container'
With the above I was able to create an artifact.
Next, I added another stage to deploy that artifact to my Databricks workspace, using the YAML script below.
- stage: Deploy
  displayName: Deploy stage
  jobs:
  - job: Deploy
    displayName: Deploy
    pool:
      vmImage: 'vs2017-win2016'
    steps:
    - task: DownloadBuildArtifacts@0
      inputs:
        buildType: 'current'
        downloadType: 'single'
        artifactName: 'dev_release'
        downloadPath: '$(System.ArtifactsDirectory)'
    - task: databricksDeployScripts@0
      inputs:
        authMethod: 'bearer'
        bearerToken: 'dapj0ee865674cd9tfb583dbad61b78ce9b1-4'
        region: 'Central US'
        localPath: '$(System.DefaultWorkingDirectory)'
        databricksPath: '/Shared'
Now I want to run the deployed notebook from the same pipeline, so I added the "Configure Databricks CLI" task and the "Execute Databricks" task to execute the notebook.
I got the errors below:
##[error]Error: Unable to locate executable file: 'databricks'. Please verify either the file path exists or the file can be found within a directory specified by the PATH environment variable. Also verify the file has a valid extension for an executable file.
##[error]The given notebook does not exist.
How can I execute the notebook from Azure DevOps? My notebooks are written in Scala.
Is there any other way to run them on production servers?
Solution 1:[1]
Since you have already deployed the Databricks notebook using Azure DevOps and are asking for another way to run it, I suggest the Azure Data Factory service.
In Azure Data Factory, you can create a pipeline that executes a Databricks notebook on a Databricks jobs cluster. You can also pass Azure Data Factory parameters to the Databricks notebook during execution.
Follow the official tutorial, Run a Databricks notebook with the Databricks Notebook Activity in Azure Data Factory, to deploy and run the Databricks notebook.
Additionally, you can schedule the pipeline trigger for a specific time or event to make the process fully automatic. Refer to https://docs.microsoft.com/en-us/azure/data-factory/concepts-pipeline-execution-triggers
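If scheduling should stay inside Azure DevOps rather than Data Factory, a YAML pipeline can also declare a scheduled trigger. A minimal sketch, assuming a nightly run at 02:00 UTC on a branch named `main` (the cron expression, display name, and branch are illustrative assumptions, not taken from the question):

```yaml
# Scheduled trigger, placed at the top level of the pipeline YAML file.
# Cron expressions are evaluated in UTC.
schedules:
- cron: "0 2 * * *"                  # assumed schedule: every day at 02:00 UTC
  displayName: Nightly notebook run  # hypothetical display name
  branches:
    include:
    - main                           # assumed branch name
  always: true                       # run even when there are no new commits
```

With `always` left at its default of `false`, a scheduled run is skipped when the source has not changed since the last successful scheduled run, which is usually not what a recurring notebook execution needs.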
Solution 2:[2]
Try this:
- job: job_name
  displayName: test job
  pool:
    name: agent_name # self-hosted agent
  #pool:
  workspace:
    clean: all
  steps:
  - checkout: none
  - task: DownloadBuildArtifacts@0
    displayName: 'Download Build Artifacts'
    inputs:
      artifactName: app
      downloadPath: $(System.DefaultWorkingDirectory)
  - task: riserrad.azdo-databricks.azdo-databricks-configuredatabricks.configuredatabricks@0
    displayName: 'Configure Databricks CLI'
    inputs:
      url: '$(Databricks_URL)'
      token: '$(Databricks_PAT)'
  - task: riserrad.azdo-databricks.azdo-databricks-deploynotebooks.deploynotebooks@0
    displayName: 'Deploy Notebooks to Workspace'
    inputs:
      notebooksFolderPath: '$(System.DefaultWorkingDirectory)/app/path/to/notebook'
      workspaceFolder: /Shared
  - task: riserrad.azdo-databricks.azdo-databricks-executenotebook.executenotebook@0
    displayName: 'Execute /Shared/path/to/notebook'
    inputs:
      notebookPath: '/Shared/path/to/notebook'
      existingClusterId: '$(cluster_id)'
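The `Unable to locate executable file: 'databricks'` error in the question suggests that the Databricks CLI was not on the agent's PATH when the task ran. On an agent where Python and the CLI are not already set up, one possible fix is to select a Python version and install the CLI before the Databricks tasks. This is a sketch, assuming the legacy `databricks-cli` package from PyPI is acceptable. The explicit `pip install` is a defensive assumption, since the Configure Databricks CLI task may handle installation itself once Python is available:

```yaml
steps:
# Put a Python 3 interpreter and its scripts folder on PATH.
- task: UsePythonVersion@0
  displayName: 'Use Python 3.x'
  inputs:
    versionSpec: '3.x'
# Install the Databricks CLI so the 'databricks' executable can be found
# (assumption: the legacy pip package is the CLI the later tasks expect).
- script: |
    python -m pip install --upgrade pip
    pip install databricks-cli
    databricks --version
  displayName: 'Install Databricks CLI'
```

These steps would go before the "Configure Databricks CLI" task in the job above. The final `databricks --version` call is only a quick check that the executable resolves before the notebook tasks run.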
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | UtkarshPal-MT |
Solution 2 | Alex Ott |