Manual Workspace Setup
Webhooks provide an easy one-click experience for onboarding new jobs.
The Databricks workspace setup is a one-time setup for your organization. With the webhook tutorial below, all users within an organization will be able to:
Onboard new jobs onto Gradient with a single click through the Gradient UI
Onboard jobs at scale
Integrate Gradient without any modifications to your Databricks workflow tasks
Before you begin!
Ensure that you've created a Sync API Key; you'll need it in the steps below
A user with admin access to your Databricks workspace is required to complete the steps below
Verify that your workspace allows outbound and inbound traffic to and from your Databricks clusters. The Gradient integration process makes calls to AWS APIs and Sync services hosted at https://api.synccomputing.com. IP whitelisting may be required.
Step 1: Configure the webhook
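The outbound half of this check can be scripted. The sketch below is not part of the Sync tooling; it only tests whether a TCP connection to the Sync API host can be opened from wherever it runs, so it is most meaningful when executed in a notebook attached to a Databricks cluster. The function name can_reach and the injectable connect parameter are illustrative choices, not an existing API.

```python
import socket
from urllib.parse import urlparse

SYNC_API_URL = "https://api.synccomputing.com"  # host named in the text above


def can_reach(url: str, timeout: float = 5.0, connect=socket.create_connection) -> bool:
    """Return True if a TCP connection to the URL's host and port can be opened.

    `connect` is injectable so the function can be exercised without network access.
    """
    parsed = urlparse(url)
    # Fall back to the scheme's default port when the URL does not name one.
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        conn = connect((parsed.hostname, port), timeout)
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
    conn.close()
    return True


# Usage (run on a cluster in the workspace):
#   can_reach(SYNC_API_URL)
```

A False result suggests that a firewall or egress rule is blocking the connection. A True result only shows that the TCP handshake succeeded; it does not verify inbound traffic or access to AWS APIs.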
Prior to configuring the notification destination in the Databricks Workspace, we need to retrieve the webhook URL and credentials from the Gradient API. We can use the Sync CLI to do this.
1.1 Create new webhook credentials:
Your <workspace-id> is the "o" parameter on your Databricks URL
Example output:
The webhook credentials returned by this command cannot be retrieved again, so store them somewhere secure.
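If you are unsure which value to pass as <workspace-id>, the "o" parameter mentioned in step 1.1 can be read out of a workspace URL programmatically. This is a minimal sketch using only the Python standard library; the URL shown is a made-up placeholder, and the helper name workspace_id_from_url is hypothetical.

```python
from urllib.parse import parse_qs, urlparse


def workspace_id_from_url(url: str) -> str:
    """Return the value of the "o" query parameter of a Databricks workspace URL."""
    values = parse_qs(urlparse(url).query).get("o")
    if not values:
        raise ValueError('no "o" parameter found in the URL')
    return values[0]


# Placeholder URL for illustration only; substitute the URL from your browser.
example_url = "https://example.cloud.databricks.com/?o=1234567890123456#job/list"
print(workspace_id_from_url(example_url))  # prints 1234567890123456
```

Some workspace URLs do not carry an "o" parameter, in which case the function raises a ValueError rather than guessing.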
1.2 Create a new webhook destination.
With the webhook URL and credentials, a workspace admin can now create a webhook notification destination. In your Databricks console, go to admin > Notification destinations > Add destination.
Set the following parameters in the UI:
Name: "Gradient"
Username: Use the "username" generated from the previous output
Password: Use the "password" generated from the previous output
URL: Use the "url" generated from the previous output
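To avoid copy-and-paste mistakes, the mapping from the saved credentials to the form fields above can be written down explicitly. The sketch below assumes that you saved the output of step 1.1 as a JSON object with the keys "username", "password", and "url"; the exact shape of the CLI output is an assumption here, and the helper name destination_fields is hypothetical.

```python
import json

REQUIRED_KEYS = ("username", "password", "url")


def destination_fields(credentials_json: str) -> dict:
    """Map saved webhook credentials onto the notification destination form fields."""
    creds = json.loads(credentials_json)
    missing = [key for key in REQUIRED_KEYS if not creds.get(key)]
    if missing:
        raise ValueError(f"missing credential fields: {', '.join(missing)}")
    return {
        "Name": "Gradient",
        "Username": creds["username"],
        "Password": creds["password"],
        "URL": creds["url"],
    }


# Placeholder values for illustration only, not real credentials.
saved = '{"username": "example-user", "password": "example-pass", "url": "https://example.invalid/hook"}'
for field, value in destination_fields(saved).items():
    print(f"{field}: {value}")
```

Avoid printing real credentials in shared notebooks or logs; the loop above is only there to show the mapping.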
Step 2: Create workspace configuration
Next, you need to configure your Databricks workspace with the webhook and Sync credentials:
Run the sync-cli command create-workspace-config
<plan-type> - Select one of Standard, Premium, or Enterprise
<webhook-id> - Go back to admin > Notification destinations and edit the "Gradient" webhook. Next to the "Edit destination settings" title, there's a copy button. Click it to copy the Webhook ID (see image below)
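Two of the values this command needs, the plan type and (on AWS) the instance profile ARN, are easy to mistype. The following sketch checks them before you run the CLI. The plan names come from the list above; whether the CLI is case-sensitive is not confirmed here, so the helper simply normalizes to the spelling shown in this guide. The ARN pattern follows the general AWS format for IAM instance profiles, and both function names are hypothetical.

```python
import re

PLAN_TYPES = ("Standard", "Premium", "Enterprise")  # values listed in this guide

# General shape of an IAM instance profile ARN:
#   arn:<partition>:iam::<12-digit account id>:instance-profile/<name>
_INSTANCE_PROFILE_ARN = re.compile(r"^arn:aws[a-z-]*:iam::\d{12}:instance-profile/.+$")


def normalize_plan_type(value: str) -> str:
    """Return the plan type as spelled in this guide, or raise ValueError."""
    for plan in PLAN_TYPES:
        if value.strip().lower() == plan.lower():
            return plan
    raise ValueError(f"plan type must be one of {', '.join(PLAN_TYPES)}")


def looks_like_instance_profile_arn(value: str) -> bool:
    """Return True if the string has the shape of an instance profile ARN."""
    return bool(_INSTANCE_PROFILE_ARN.match(value.strip()))


print(normalize_plan_type("premium"))  # prints Premium
# The ARN below uses a placeholder account id and profile name.
print(looks_like_instance_profile_arn(
    "arn:aws:iam::123456789012:instance-profile/example-profile"))  # prints True
```

The ARN check is purely syntactic. It does not confirm that the instance profile exists or that it carries the permissions Gradient needs.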
Once the command is run, you will need to provide the CLI with the following information:
Databricks host
Databricks token
Sync API key ID
Sync API key secret
AWS instance profile ARN (for Databricks on AWS only. See AWS Instance Profile)
Databricks plan type
Webhook ID (same step as <webhook-id> above)
Example output:
Step 3: Integrate workspace
The next step is to download the code used to submit the Spark event logs to Gradient. Once again, we will use the CLI to perform the following tasks:
Adds/updates the init script to the workspace “/Sync Computing” directory
Adds/updates secrets used by the init script and the Sync reporting job
Adds/updates the job run recording/reporting notebook to the workspace in the “/Sync Computing” directory
Adds/updates the Databricks Secrets scope, "Sync Computing | <your Sync tenant id>", used by Gradient to store credentials and configurations
Creates/updates a job with the name “Sync Computing: Record Job Run” that sends up the event log and cluster report for each prediction
Creates/updates and pins an all-purpose cluster with the name “Sync Computing: Job Run Recording” for the prediction job
Run the command sync-cli workspaces apply-workspace-config <workspace-id>
Example output:
Step 4: Verify Permissions to Gradient generated artifacts
The final step is to ensure that all the newly created artifacts are accessible during job runs. By default, Databricks jobs run with the permissions of the job owner.
Therefore, you should ensure that the owner, directly or through group permissions, can access the following artifacts:
1. The “/Sync Computing” directory
You should be able to see and access the "Sync Computing" directory in your Workspace. See the screenshot below.
2. The "Sync Computing | <your Sync tenant id>" secret scope
You should be able to see and have access to the "Sync Computing | <your Sync tenant id>" secret scope. Check whether you can view the scope with the Databricks CLI command databricks secrets list-scopes.
3. The “Sync Computing: <your Sync tenant id> Job Run Recording” cluster
You should be able to see and run the "Sync Computing: <your Sync tenant id> Job Run Recording" cluster in the Databricks console under Compute > All-purpose Compute.
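The scope and cluster checks above can also be scripted against the Databricks REST API, which is convenient when verifying access for a service principal or another job owner. This is a sketch under a few assumptions: it calls the secrets scopes list endpoint (/api/2.0/secrets/scopes/list) and the clusters list endpoint (/api/2.0/clusters/list) with a personal access token belonging to the job owner, and it matches clusters by the "Sync Computing" name prefix only. The helper names are hypothetical.

```python
import json
import urllib.request


def databricks_get(host: str, token: str, path: str) -> dict:
    """Send an authenticated GET request to the Databricks REST API and decode the JSON reply."""
    request = urllib.request.Request(
        f"{host.rstrip('/')}{path}",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def has_sync_scope(scopes_response: dict, tenant_id: str) -> bool:
    """Check a secrets/scopes/list response for the Sync secret scope."""
    expected = f"Sync Computing | {tenant_id}"
    return any(scope.get("name") == expected for scope in scopes_response.get("scopes", []))


def sync_cluster_names(clusters_response: dict) -> list:
    """Return the names of visible clusters that start with "Sync Computing"."""
    return [
        cluster["cluster_name"]
        for cluster in clusters_response.get("clusters", [])
        if cluster.get("cluster_name", "").startswith("Sync Computing")
    ]


# Usage (host, token, and tenant id are placeholders you must supply):
#   scopes = databricks_get(host, token, "/api/2.0/secrets/scopes/list")
#   clusters = databricks_get(host, token, "/api/2.0/clusters/list")
#   print(has_sync_scope(scopes, tenant_id), sync_cluster_names(clusters))
```

Both endpoints return only what the token's owner is allowed to see, so an empty result points to a permissions gap rather than a missing artifact. Seeing the cluster in the list does not by itself confirm permission to start it.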
Gradient requires cloud permissions to access cluster information. An instance profile with the correct permissions is required. Please see "AWS additional steps" for instructions on how to create an appropriate instance profile.
Your workspace should now be configured to send logs using Databricks webhook notifications.
The "Sync Computing: Job Run Recording" cluster is created using the configuration below. If your workspace has any policies enabled that would restrict creation of this cluster, the setup process cannot proceed. In this case, please reach out to us at support@synccomputing.com for further assistance.
Step 5: Select your cloud provider below to complete the installation