Databricks Asset Bundles

Introduction

  • A tool that makes it easier to adopt software engineering best practices, including source control, code review, testing, and continuous integration and delivery (CI/CD), for your data and AI projects

  • A way to include metadata alongside your project's source files and to describe Databricks resources, such as jobs and pipelines, as source files

A bundle includes the following parts:

  • Required cloud infrastructure and workspace configurations

  • Source files, such as notebooks and Python files, that include the business logic

  • Definitions and settings for Databricks resources, such as Lakeflow Jobs, Lakeflow Spark Declarative Pipelines, Dashboards, Model Serving endpoints, MLflow Experiments, and MLflow registered models

  • Unit tests and integration tests

Project Structure

  • Folder Structure

  • Configuration

  • Job Configuration
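
The pieces above come together in a `databricks.yml` file at the project root. As a sketch (the bundle name, workspace host, job key, and notebook path below are hypothetical placeholders, not values from this project):

```yaml
# databricks.yml — top-level bundle configuration (hypothetical example)
bundle:
  name: my_project

# Targets map the same bundle onto different workspaces/environments
targets:
  dev:
    default: true
    workspace:
      host: https://my-workspace.cloud.databricks.com
  prod:
    mode: production
    workspace:
      host: https://my-workspace.cloud.databricks.com

# Job configuration: a job that runs a notebook from the bundle's source files
resources:
  jobs:
    my_job:
      name: my_job
      tasks:
        - task_key: main
          notebook_task:
            notebook_path: ./src/notebook.ipynb
```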

Commands

databricks bundle init

Initializes a new project from a template.

databricks bundle validate

Checks your YAML for syntax errors or missing fields.

databricks bundle deploy

The main command. Uploads code and creates/updates resources in the workspace.

databricks bundle run <job-key>

Immediately triggers a run of a job defined in the bundle.

databricks bundle destroy

Deletes all resources created by the bundle in the workspace.

Standard Workflow:

  1. Validate: databricks bundle validate

  2. Deploy to Dev: databricks bundle deploy (uses the default target).

  3. Deploy to Prod: databricks bundle deploy -t prod
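
The workflow above can be sketched as a shell session (the job key `my_job` and the `prod` target name are assumptions from a typical dev/prod setup, not values from this project):

```
# 1. Check the bundle's YAML for syntax errors or missing fields
databricks bundle validate

# 2. Deploy to the default (dev) target: uploads code, creates/updates resources
databricks bundle deploy

# 3. Trigger a run of a job defined in the bundle, by its job key
databricks bundle run my_job

# 4. Promote to production by selecting the prod target explicitly
databricks bundle deploy -t prod
```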
