Gradient Terraform Integration

Integrating Gradient into your Terraform process typically involves the following steps:

  1. Include Workspace and Job Configuration in your Terraform Plan

  2. Configure Terraform to ignore recommendation fields when detecting drift

  3. Let Gradient “auto-apply” recommendations directly to your Databricks Job via Databricks API

Databricks Workspace and Job Configurations Used by Gradient

Gradient uses Databricks webhook notification destinations to be notified when a managed Databricks Job starts. This notification destination should be incorporated into your infrastructure management process so that the Gradient configuration is maintained within your Databricks workspace definition. See the example below.

Databricks Workspace Webhook Notification Destination

resource "databricks_notification_destination" "sync_webhook" {
  display_name = "Notification Destination"
  config {
    generic_webhook {
      url      = "https://example.com/webhook"
      username = "username" // Optional
      password = "password" // Optional
    }
  }
}

Additionally, each workflow cluster being managed by Gradient should reference this webhook.

Databricks Job Notifications

resource "databricks_job" "example_job" {
  name = "example job"
  ...
  webhook_notifications {
    on_start {
      id = databricks_notification_destination.sync_webhook.id
    }
  }
}

Managing Terraform Drift

If you are using terraform plan to detect configuration drift in resources created by Terraform, we recommend using one of the following methods to omit the Databricks Job cluster configurations generated by Gradient. This prevents Terraform from overwriting the most recent cluster configuration.

Ignore the Entire Cluster Configuration

Specifying "ignore_changes = all" in the "lifecycle" block of the cluster resource causes the entire cluster configuration to be ignored by the drift detection process.

resource "databricks_cluster" "single_node" {
  cluster_name            = "Single Node"
  spark_version           = data.databricks_spark_version.latest_lts.id
  node_type_id            = data.databricks_node_type.smallest.id
  autotermination_minutes = 20

  spark_conf = {
    # Single-node
    "spark.databricks.cluster.profile" : "singleNode"
    "spark.master" : "local[*]"
  }

  custom_tags = {
    "ResourceClass" = "SingleNode"
  }

  lifecycle {
    ignore_changes = all
  }
}

Ignore only the Cluster Configurations Managed by Gradient

Explicitly specifying which configurations to ignore allows configurations not managed by Gradient to still be evaluated by the drift detection process. However, note that the set of fields Gradient modifies may change as new features are added to Gradient.

resource "databricks_cluster" "example_cluster" {
  cluster_name  = "example-cluster"
  spark_version = "7.3.x-scala2.12"
  node_type_id  = "i3.xlarge"
  num_workers   = 2


  custom_tags = {
    "sync:project-id" = "<insert-project-id>" # customer needs to add their project id tag
    # ...other tags
  }
  # other configurations...

  lifecycle {
    ignore_changes = [ # Fields sync modifies
      num_workers,
      node_type_id
      # Note: Gradient also modifies EBS volumes
    ]
  }
}

Apply All Recommendations to Terraform

If you choose not to ignore changes and instead want to reintegrate recommendations back into their Terraform resources, you can retrieve the latest recommendation using the following function in the Sync Python Library:

sync.api.projects.get_latest_project_config_recommendation(project_id: str) → Optional[sync.api.projects.Response[dict]]

Get the latest project configuration recommendation.

Parameters
project_id (str) – project ID

Returns
Project configuration recommendation object

Return type
Response object or None

This function returns a Python dictionary containing the recommended cluster configuration for the project. Parse and persist this data in the format required by your infrastructure management process.
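As a minimal sketch of the "parse and persist" step, the helper below renders a recommendation dictionary as Terraform-style variable assignments that could be written to a .tfvars file. The field names shown (num_workers, node_type_id) are hypothetical examples; the actual keys come from the recommendation object returned by the Sync Python Library.

```python
import json


def recommendation_to_tfvars(rec: dict) -> str:
    """Render a recommended cluster configuration as Terraform tfvars lines.

    json.dumps gives HCL-compatible literals for strings, numbers, and booleans.
    """
    return "\n".join(f"{key} = {json.dumps(value)}" for key, value in rec.items())


# Hypothetical recommendation payload; real field names come from Gradient.
rec = {"num_workers": 4, "node_type_id": "i3.2xlarge"}
print(recommendation_to_tfvars(rec))
```

You could then feed the generated file to Terraform via -var-file, or adapt the output format to whatever your infrastructure pipeline expects.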

Auto-Apply Recommendations

To avoid manually applying recommendations, you can enable Auto-Apply via the "Edit settings" button on the Gradient project page. When this option is enabled, recommendations are automatically applied after each run of your job.

This setting is applicable only to Databricks Workflows. Auto-Apply is not applicable if you're using the DatabricksSubmitRunOperator or Databricks /api/2.1/jobs/runs/submit API.

