
Manual Workspace Setup

Webhooks provide an easy one-click experience to onboard new jobs.


The Databricks workspace setup is a one-time setup for your organization. With the webhook tutorial below, all users within an organization will be able to:

  • Onboard new jobs onto Gradient with a single click through the Gradient UI

  • Onboard jobs at scale

  • Integrate Gradient without any modifications to your Databricks workflow tasks

Before you begin!

  • Ensure that you've created a Sync API Key, since you'll need it here

  • Install the Sync CLI on your dev box using the instructions in Install the Sync-CLI

  • A user with admin access to your Databricks workspace is required to complete the steps below

  • Verify that your workspace allows outbound and inbound traffic from your Databricks clusters. The Gradient integration process makes calls to AWS APIs and Sync services hosted at https://api.synccomputing.com. IP whitelisting may be required.
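A quick way to sanity-check outbound connectivity is to run a shell cell in a notebook attached to any existing cluster in the workspace. This is a minimal sketch, not part of the official setup: it assumes curl is available on the cluster image (it typically is on standard Databricks runtimes), and it only checks reachability of the Sync API endpoint mentioned above.

%sh
# Run in a Databricks notebook cell attached to a cluster in this workspace.
# Any HTTP status back means the Sync endpoint is reachable; a timeout suggests
# egress is blocked and firewall/IP whitelisting changes are needed.
curl -sS -o /dev/null -w "HTTP status: %{http_code}\n" https://api.synccomputing.com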

Step 1: Configure the webhook

Prior to configuring the notification destination in the Databricks workspace, we need to retrieve the webhook URL and credentials from the Gradient API. We can use the Sync CLI to do this.

1.1 Create new webhook credentials:

sync-cli workspaces reset-webhook-creds <workspace-id>

Your <workspace-id> is the "o" parameter in your Databricks URL. For example, if your workspace URL is https://<your-workspace>.cloud.databricks.com/?o=839284039492, the workspace id is 839284039492.

Example output:

sync-cli workspaces reset-webhook-creds 839284039492
{                           
  "username": "290s381e-8ep4-4d6a-84d4-433d84897fsc",
  "password": "jc0dUD8zd44Uwid26jGI",
  "url": "https://api.synccomputing.com/integrations/v1/databricks/notify"
}

The webhook credentials returned by this command cannot be retrieved again, so write them down somewhere safe!
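Since the credentials are shown only once, it can help to capture them straight into a local file when you run the reset command. A minimal sketch, assuming the command prints only the JSON shown above and that jq is installed; the file name is just an example:

# Save the one-time webhook credentials locally (keep this file out of source control).
sync-cli workspaces reset-webhook-creds 839284039492 > gradient-webhook-creds.json

# Pull out the individual fields for use in the next step.
jq -r '.username, .password, .url' gradient-webhook-creds.json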

1.2 Create a new webhook destination.

With the webhook URL and credentials from step 1.1, a workspace admin can now create a webhook notification destination. In your Databricks console, go to admin > Notification destinations > Add destination and set the following parameters in the UI:

  • Name: "Gradient"

  • Username: Use the "username" generated from the previous output

  • Password: Use the "password" generated from the previous output

  • URL: Use the "url" generated from the previous output
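If you prefer scripting this step instead of using the UI, the Databricks notification destinations REST API can create the same destination. The sketch below is illustrative only: the endpoint path /api/2.0/notification-destinations and the generic_webhook payload shape should be confirmed against the Databricks REST API docs for your workspace, and DATABRICKS_HOST / DATABRICKS_TOKEN are placeholder environment variables for your workspace URL and a personal access token.

# Illustrative only: create the "Gradient" destination via the Databricks REST API.
# username, password and url come from the output of step 1.1.
curl -X POST "$DATABRICKS_HOST/api/2.0/notification-destinations" \
  -H "Authorization: Bearer $DATABRICKS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
        "display_name": "Gradient",
        "config": {
          "generic_webhook": {
            "url": "https://api.synccomputing.com/integrations/v1/databricks/notify",
            "username": "<username from step 1.1>",
            "password": "<password from step 1.1>"
          }
        }
      }'

If this call succeeds, the response should include the destination's id, which is the same webhook ID you would otherwise copy from the UI in Step 2.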

Step 2: Create workspace configuration

Next, you need to configure your Databricks workspace with the webhook and Sync credentials:

Run the sync-cli command create-workspace-config

sync-cli workspaces create-workspace-config \
--databricks-plan-type <plan-type> \
--databricks-webhook-id <webhook-id> \
<workspace-id>
  • <plan-type> - Select from Standard, Premium, or Enterprise

  • <webhook-id> - Go back to admin > Notification destinations and edit the "Gradient" webhook. Next to the "Edit destination settings" title, there is a copy button; click it to copy the webhook ID.

Once the command is run, you will need to provide the CLI with the following information:

  1. Databricks host

  2. Databricks token

  3. Sync API key ID

  4. Sync API key secret

  5. AWS instance profile ARN (for Databricks on AWS only. See AWS Instance Profile)

  6. Databricks plan type

  7. Webhook ID (same step as <webhook-id> above)

Example output:

% sync-cli workspaces create-workspace-config \
--instance-profile-arn arn:aws:iam::481126062844:instance-profile/sync-minimum-access \
--databricks-plan-type Enterprise \
--databricks-webhook-id 8bd3b048-e496-4u09-b9de-4e2298e117y6 \
656201176161048

Databricks host (prefix with https://) [https://dbc-d85uga-1d40.cloud.databricks.com]:
Databricks token:
Sync API key ID [SXbT6fduHB8FfPPy5psUdP5g7cS9SPm]:
Sync API key secret:

{
  "workspace_id": "3522015453188848",
  "databricks_host": "**********",
  "databricks_token": "**********",
  "sync_api_key_id": "**********",
  "sync_api_key_secret": "**********",
  "instance_profile_arn": "arn:aws:iam::123123565455:instance-profile/sync-minimum-access",
  "webhook_id": "7465b068-e490-4a87-b9ce-4e8740e123c6",
  "plan_type": "Standard"
}

Step 3: Integrate workspace

The next step is to download the code used to submit the Spark event logs to Gradient. Once again, we will use the CLI, which performs the following tasks:

  1. Adds/updates the init script to the workspace “/Sync Computing” directory

  2. Adds/updates secrets used by the init script and the Sync reporting job

  3. Adds/updates the job run recording/reporting notebook to the workspace in the “/Sync Computing” directory

  4. Adds/updates the Databricks Secrets scope, "Sync Computing | <your Sync tenant id>", used by Gradient to store credentials and configurations

  5. Creates/updates a job with the name “Sync Computing: Record Job Run” that sends up the event log and cluster report for each prediction

  6. Creates/updates and pins an all-purpose cluster with the name “Sync Computing: Job Run Recording” for the prediction job

Run the command sync-cli workspaces apply-workspace-config <workspace-id>

Example output:

% sync-cli workspaces apply-workspace-config 789564875555745                                                                                                                                                                                           
Workspace synchronized

Step 4: Verify permissions to Gradient-generated artifacts

The final step is to ensure that all the newly created artifacts are accessible during job runs. By default, Databricks jobs run with the permissions of the job owner.

Therefore, you should ensure that the owner, directly or through group permissions, can access the following artifacts:

1. The “/Sync Computing” directory

You should be able to see and access the "Sync Computing" directory in your workspace.
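One way to check this from the command line is with the same Databricks CLI used for the secret-scope check below. This is a sketch assuming the legacy CLI; the subcommand may be named slightly differently in the newer unified CLI.

# List the directory created by the setup; quote the path because it contains a space.
databricks workspace ls "/Sync Computing"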

2. The "Sync Computing | <your Sync tenant id>" secret scope

You should be able to see and have access to the "Sync Computing | <your Sync tenant id>" secret scope. Check if you can view the scope with the list-scopes command below:

databricks secrets list-scopes
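If the scope list is long, you can filter the output for the Gradient-managed scope, and listing the keys inside it is a stronger access check. This is a sketch using the legacy CLI's --scope syntax; adjust for your CLI version.

# Filter for the Gradient-managed scope.
databricks secrets list-scopes | grep "Sync Computing"

# Listing the keys in the scope confirms you have read access to it (legacy CLI syntax).
databricks secrets list --scope "Sync Computing | <your Sync tenant id>"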

3. The “Sync Computing: <your Sync tenant id> Job Run Recording” cluster

You should be able to see and run the "Sync Computing: <your Sync tenant id> Job Run Recording" cluster in the Databricks console under Compute > All-purpose Compute.
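You can also confirm the cluster is visible from the CLI, for example:

# List clusters and filter for the Gradient job run recording cluster.
databricks clusters list | grep "Sync Computing"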

Your workspace should now be configured to send logs using Databricks webhook notifications.

The "Sync Computing: Job Run Recording" cluster is created using the configuration below. If your workspace has any policies enabled that would restrict creation of this cluster, the setup process cannot proceed. In this case, please reach out to us at support@synccomputing.com for further assistance.

{
    "cluster_name": "Sync Computing | <sync-tenant-id>: Job Run Recording",
    "spark_version": "13.3.x-scala2.12",
    "aws_attributes": {
        "instance_profile_arn": "<your instance profile ARN>"
    },
    "node_type_id": "m4.large",
    "driver_node_type_id": "m4.large",
    "custom_tags": {
        "sync:tenant-id": "<sync-tenant-id>"
    },
    "spark_env_vars": {
        "DATABRICKS_HOST": "{{secrets/Sync Computing | <sync-tenant-id>/DATABRICKS_HOST}}",
        "DATABRICKS_TOKEN": "{{secrets/Sync Computing | <sync-tenant-id>/DATABRICKS_TOKEN}}",
        "SYNC_API_KEY_ID": "{{secrets/Sync Computing | <sync-tenant-id>/SYNC_API_KEY_ID}}",
        "SYNC_API_KEY_SECRET": "{{secrets/Sync Computing | <sync-tenant-id>/SYNC_API_KEY_SECRET}}",
        "SYNC_API_URL": "https://api.synccomputing.com"
    },
    "autotermination_minutes": 10,
    "enable_elastic_disk": false,
    "disk_spec": {
        "disk_type": {
            "ebs_volume_type": "GENERAL_PURPOSE_SSD"
        },
        "disk_count": 1,
        "disk_size": 32
    },
    "enable_local_disk_encryption": false,
    "data_security_mode": "NONE",
    "runtime_engine": "STANDARD",
    "num_workers": 0
}
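If you are unsure whether cluster policies in your workspace could block this configuration, you can list them for review before running the setup. A sketch using the legacy Databricks CLI; the cluster-policies command group may be named differently in other CLI versions.

# List cluster policies so you can review whether any of them would prevent
# creating an unrestricted all-purpose cluster like the one above.
databricks cluster-policies list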

Step 5: Select your cloud provider below to complete the installation


Gradient requires cloud permissions to access cluster information. An instance profile with the correct permissions is required. Please see "AWS Instance Profile" for instructions on how to create an appropriate instance profile.

  • Additional one-time steps for AWS Databricks users (see AWS Databricks Setup)

  • Additional one-time steps for Azure Databricks users (see Azure Databricks Setup)