Import Jobs to Projects

Introduction

Projects is a solution for continuously optimizing and monitoring repeat production Databricks workloads. Implementing projects requires integrating the Sync library into your orchestration system (e.g., Airflow, Databricks Workflows).

Once the integration is in place, the Gradient UI provides high-level metrics and easy-to-use controls for monitoring and managing your Apache Spark clusters.

1. Add a new project

2. Import multiple Databricks jobs

Use the Databricks Auto Import wizard to easily create multiple projects, each linked to a Databricks Job in your workspace.

The Auto Import wizard connects to your specified Databricks workspace using a Databricks Personal Access Token obtained during the Add Databricks Workspace step.

The import wizard requires at least one successful run of your job in Databricks. If you created a new job that has never run, the import wizard won't see it.
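Before you open the wizard, you can confirm that a job meets this requirement by asking Databricks for its completed runs. The sketch below is not part of the Sync library or the wizard. It calls the public Databricks Jobs API 2.1 endpoint `GET /api/2.1/jobs/runs/list` with your Personal Access Token and looks for a run whose `state.result_state` is `SUCCESS`. The workspace host, token, and job ID are placeholders you supply.

```python
import json
import urllib.parse
import urllib.request


def has_successful_run(runs_response: dict) -> bool:
    """Return True if any run in a Jobs API runs/list response succeeded."""
    for run in runs_response.get("runs", []):
        # Completed runs report their outcome in state.result_state.
        if run.get("state", {}).get("result_state") == "SUCCESS":
            return True
    return False


def fetch_completed_runs(host: str, token: str, job_id: int, limit: int = 25) -> dict:
    """Fetch one page of completed runs for a job via the Jobs API 2.1."""
    query = urllib.parse.urlencode(
        {"job_id": job_id, "completed_only": "true", "limit": limit}
    )
    request = urllib.request.Request(
        f"{host.rstrip('/')}/api/2.1/jobs/runs/list?{query}",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # Offline example with a response shaped like the Jobs API output.
    sample = {
        "runs": [
            {"run_id": 1, "state": {"life_cycle_state": "TERMINATED", "result_state": "FAILED"}},
            {"run_id": 2, "state": {"life_cycle_state": "TERMINATED", "result_state": "SUCCESS"}},
        ]
    }
    print(has_successful_run(sample))  # True
    # Live check (placeholder values):
    # runs = fetch_completed_runs("https://<workspace-host>", "<token>", 123)
    # print(has_successful_run(runs))
```

This checks only the first page of runs. If the response sets `has_more`, older successful runs may exist on later pages.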

NOTICE: The import wizard will make the following changes to your selected Databricks Jobs:

  1. Add the webhook notification destination to the job so that Gradient is notified of every successful run.

  2. Update the job cluster with the init script, environment variables, and instance profile needed to collect worker instance and volume information.
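To make these two changes concrete, the sketch below shows roughly how they map onto a job's settings in the Databricks Jobs API 2.1 format (`webhook_notifications.on_success`, and `init_scripts`, `spark_env_vars`, and `aws_attributes.instance_profile_arn` on each job cluster). This is an illustration under assumptions, not the wizard's actual code. The destination ID, init script path, environment variable, and instance profile ARN are hypothetical placeholders.

```python
import copy


def apply_monitoring_changes(settings: dict, destination_id: str,
                             init_script_path: str, env_vars: dict,
                             instance_profile_arn: str) -> dict:
    """Return a copy of Jobs API job settings with both changes applied."""
    updated = copy.deepcopy(settings)  # leave the caller's settings untouched

    # Change 1: notify a webhook destination on every successful run.
    on_success = updated.setdefault("webhook_notifications", {}).setdefault("on_success", [])
    if not any(hook.get("id") == destination_id for hook in on_success):
        on_success.append({"id": destination_id})

    # Change 2: add the init script, env vars, and instance profile to each job cluster.
    for job_cluster in updated.get("job_clusters", []):
        cluster = job_cluster.setdefault("new_cluster", {})
        scripts = cluster.setdefault("init_scripts", [])
        script = {"workspace": {"destination": init_script_path}}
        if script not in scripts:
            scripts.append(script)
        cluster.setdefault("spark_env_vars", {}).update(env_vars)
        cluster.setdefault("aws_attributes", {})["instance_profile_arn"] = instance_profile_arn

    return updated


if __name__ == "__main__":
    job = {"job_clusters": [{"job_cluster_key": "main", "new_cluster": {"num_workers": 4}}]}
    result = apply_monitoring_changes(
        job,
        destination_id="hypothetical-destination-id",
        init_script_path="/Workspace/hypothetical/init.sh",
        env_vars={"HYPOTHETICAL_PROJECT_VAR": "value"},
        instance_profile_arn="arn:aws:iam::123456789012:instance-profile/hypothetical",
    )
    print(result["webhook_notifications"])
```

The function is idempotent, so applying it twice does not duplicate the webhook or init script entries. It only covers shared `job_clusters`; tasks that define their own `new_cluster` would need the same treatment.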

3. Review the candidate jobs

Review the compatible Databricks jobs, select the jobs for which you want to create a Gradient project, and then select Create Projects. A project is created for each selected job, and the following properties are added to each one.

  • If you want to manually import a single job, follow the manual single job import instructions.

  • Community Edition accounts are limited to 3 Projects. To create more Projects, sign up for an Enterprise account.

4. View new projects

You should now see the projects you created on your Projects summary dashboard. New projects have a status of "Pending Setup" until they are configured to receive logs for recommendations.

Sync Project ID: Each project is associated with a project_id parameter. You will need this ID in later steps to link Databricks with Gradient. You can find it at the top of each project's page.
