
Import Jobs to Projects

Introduction

Projects let you continuously optimize and monitor a recurring production Apache Spark workload. To implement a project, you must integrate the Sync library into your orchestration system (e.g. Airflow, Databricks Workflows).
Once integrated, the Gradient UI provides high-level metrics and easy-to-use controls to monitor and manage your Apache Spark clusters.

1. Add a new project

From the Projects tab, click the button to add a new project.

2. Select the Databricks platform

Select the Databricks option. Sync also supports Apache Spark running on EMR. To learn more, contact Sync via the Intercom chat button in the lower right-hand corner.

3. Import multiple Databricks jobs

For Databricks projects, you can use the Databricks Auto Import wizard to create multiple projects at once, each linked to a Databricks Job in your workspace.
The Auto Import wizard connects to the Databricks workspace you specify using a Databricks Personal Access Token. This token is used only for this session and is not stored by Sync.
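For readers unfamiliar with how a Personal Access Token authenticates against a Databricks workspace, the sketch below builds (but does not send) a request to the Databricks Jobs API list endpoint. It is an illustration only and is not the wizard's actual implementation; the workspace URL, the token value, and the helper name `build_jobs_list_request` are hypothetical placeholders.

```python
import urllib.request


def build_jobs_list_request(workspace_url: str, token: str, limit: int = 20) -> urllib.request.Request:
    """Build (without sending) a Jobs API request authenticated with a PAT.

    Hypothetical helper for illustration; not part of the Sync library.
    """
    url = f"{workspace_url.rstrip('/')}/api/2.1/jobs/list?limit={limit}"
    # Databricks REST calls accept a PAT as a bearer token.
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


# Placeholder workspace URL and token; substitute your own values.
request = build_jobs_list_request(
    "https://example.cloud.databricks.com/",
    "dapi-EXAMPLE-TOKEN",
)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would perform the call. Keeping the token in a variable that lives only for the duration of the session mirrors the "not stored" behavior described above.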
NOTICE: The import wizard will make the following changes to your selected Databricks Jobs:
  1. Add the webhook notification destination to the job so that Sync is notified on every successful run.
  2. Update the job cluster with the init script, environment variables, and instance profile needed to collect worker instance and volume information.
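To make these two changes concrete, here is a minimal sketch of how they might appear in a job's settings, using field names from the Databricks Jobs API (`webhook_notifications`, `init_scripts`, `spark_env_vars`, `aws_attributes`). The function `apply_import_changes` is hypothetical, as are the webhook ID, script path, environment variable name, and instance profile ARN; the values the wizard actually writes are not documented here, and the instance profile field shown is specific to AWS workspaces.

```python
import copy


def apply_import_changes(settings, webhook_id, init_script, env_vars, instance_profile_arn):
    """Return a copy of the job settings with the two kinds of changes applied.

    Hypothetical helper for illustration; not the wizard's implementation.
    """
    updated = copy.deepcopy(settings)  # leave the caller's settings untouched

    # Change 1: notify a webhook destination on every successful run.
    on_success = updated.setdefault("webhook_notifications", {}).setdefault("on_success", [])
    if {"id": webhook_id} not in on_success:
        on_success.append({"id": webhook_id})

    # Change 2: add init script, env vars, and instance profile to each job cluster.
    for job_cluster in updated.get("job_clusters", []):
        cluster = job_cluster.setdefault("new_cluster", {})
        script_entry = {"workspace": {"destination": init_script}}
        scripts = cluster.setdefault("init_scripts", [])
        if script_entry not in scripts:
            scripts.append(script_entry)
        cluster.setdefault("spark_env_vars", {}).update(env_vars)
        cluster.setdefault("aws_attributes", {})["instance_profile_arn"] = instance_profile_arn

    return updated


# Example job settings with placeholder values throughout.
job = {
    "name": "nightly-etl",
    "job_clusters": [
        {"job_cluster_key": "main", "new_cluster": {"num_workers": 4}},
    ],
}
updated = apply_import_changes(
    job,
    webhook_id="example-webhook-id",
    init_script="/Shared/example/init.sh",
    env_vars={"EXAMPLE_PROJECT_ID": "example-project-id"},
    instance_profile_arn="arn:aws:iam::123456789012:instance-profile/example",
)
```

Comparing `job` with `updated` before an import shows which fields would change on each job cluster.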

4. Review the candidate jobs

Review the compatible Databricks jobs, select the jobs for which you would like to create a Gradient project, and then choose to create projects for the selected jobs. When a project is created, the following properties are added for each.
  • If you want to manually import a single job, follow the manual single job import instructions.
  • Community Edition accounts are limited to 3 Projects. To create more Projects, sign up for an Enterprise account.

6. View new projects

You should now see the project(s) you created on your Projects summary dashboard. New projects will have a status of "Pending Setup" until the project is configured to receive logs for recommendations.
Sync Project ID: Each project is associated with a project_id parameter. You will need this ID in later steps to link Databricks with Gradient. It can be found at the top of the page within each project.