Databricks Asset Bundles

Introduction

  • A tool that makes it easier to adopt software engineering best practices, including source control, code review, testing, and continuous integration and delivery (CI/CD), for your data and AI projects

  • A way to include metadata alongside your project's source files and to describe Databricks resources, such as jobs and pipelines, as source files

A bundle includes the following parts:

  • Required cloud infrastructure and workspace configurations

  • Source files, such as notebooks and Python files, that include the business logic

  • Definitions and settings for Databricks resources, such as Lakeflow Jobs, Lakeflow Spark Declarative Pipelines, Dashboards, Model Serving endpoints, MLflow Experiments, and MLflow registered models

  • Unit tests and integration tests

Project Structure

  • Folder Structure

  • Configuration

  • Job Configuration
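
The pieces above come together in a `databricks.yml` file at the project root. As a sketch (the bundle name, workspace host, job key, and notebook path below are hypothetical placeholders, not values from this project):

```yaml
# databricks.yml — top-level bundle configuration (hypothetical example)
bundle:
  name: my_project

# Targets map the same bundle onto different workspaces/environments
targets:
  dev:
    default: true
    workspace:
      host: https://my-workspace.cloud.databricks.com
  prod:
    mode: production
    workspace:
      host: https://my-workspace.cloud.databricks.com

# Job configuration: a job that runs a notebook from the bundle's source files
resources:
  jobs:
    my_job:
      name: my_job
      tasks:
        - task_key: main
          notebook_task:
            notebook_path: ./src/notebook.ipynb
```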

Commands

databricks bundle init

Initializes a new project from a template.

databricks bundle validate

Checks your YAML for syntax errors or missing fields.

databricks bundle deploy

The main command. Uploads code and creates/updates resources in the workspace.

databricks bundle run <job-key>

Immediately triggers a run of a job defined in the bundle.

databricks bundle destroy

Deletes all resources created by the bundle in the workspace.

Standard Workflow:

  1. Validate: databricks bundle validate

  2. Deploy to Dev: databricks bundle deploy (uses the default target).

  3. Deploy to Prod: databricks bundle deploy -t prod
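
The workflow above can be sketched as a shell session (the job key `my_job` and the `prod` target name are assumptions from a typical dev/prod setup, not values from this project):

```
# 1. Check the bundle's YAML for syntax errors or missing fields
databricks bundle validate

# 2. Deploy to the default (dev) target: uploads code, creates/updates resources
databricks bundle deploy

# 3. Trigger a run of a job defined in the bundle, by its job key
databricks bundle run my_job

# 4. Promote to production by selecting the prod target explicitly
databricks bundle deploy -t prod
```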
