Import Jobs to Projects
Projects are how Gradient continuously optimizes and monitors a repeat production Apache Spark workload. To implement projects, the Sync library must be integrated into your orchestration system (e.g., Airflow or Databricks Workflows).
Once integrated, the Gradient UI provides high-level metrics and easy-to-use controls to monitor and manage your Apache Spark clusters.
From the Projects tab, click the button to create a new project.
Select the Databricks option. Sync can also support Apache Spark running on EMR; contact Sync to learn more via the Intercom chat button in the lower right-hand corner.
For Databricks projects, you can use the Databricks Auto Import wizard to easily create multiple projects, each linked to a Databricks Job in your workspace.
NOTICE: The import wizard will make the following changes to your selected Databricks Jobs:
1. Add a webhook notification destination to the job so that Gradient is notified of every successful run.
2. Update the job cluster with the init script, environment variables, and instance profile needed to collect worker instance and volume information.
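The changes above map onto standard fields in the Databricks Jobs API. The sketch below is illustrative only: the field names (`webhook_notifications`, `init_scripts`, `spark_env_vars`, `aws_attributes.instance_profile_arn`) are real Databricks job/cluster settings, but the destination ID, script path, environment variable name, and ARN are placeholders, not Gradient's actual values.

```python
# Hedged sketch of the job settings the import wizard modifies.
# Field names follow the Databricks Jobs API; the specific IDs,
# paths, and variable names are illustrative placeholders.
job_settings_patch = {
    "webhook_notifications": {
        # Notification destination called on every successful run.
        "on_success": [{"id": "<notification-destination-id>"}]
    },
    "new_cluster": {
        "init_scripts": [
            # Init script that collects worker instance and volume info.
            {"workspace": {"destination": "/Shared/<init-script>.sh"}}
        ],
        "spark_env_vars": {
            # Environment variable consumed by the init script
            # (placeholder name, not Gradient's actual variable).
            "SYNC_PROJECT_ID": "<project-id>"
        },
        "aws_attributes": {
            # Instance profile granting the access needed to report
            # instance and volume data.
            "instance_profile_arn": "arn:aws:iam::<account-id>:instance-profile/<name>"
        },
    },
}
```

Nothing else on the job is touched; reviewing this patch shape can help when auditing the wizard's changes in your workspace.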
Review the compatible Databricks jobs, select those for which you would like to create a Gradient project, and then select Create Projects. A project will be created for each selected job, with the following properties added to each.
- Community Edition accounts are limited to only 3 Projects. To create more Projects, sign up for an Enterprise account.
You should now see the project(s) you created on your Projects summary dashboard. New projects have a status of "Pending Setup" until the project is configured to receive logs for recommendations.
Sync Project ID: Each project is associated with a project_id parameter. This ID will be needed in later steps to link Databricks with Gradient, and can be found at the top of the page within each project.
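In orchestration code, you will typically carry each job's project_id alongside the job itself. A minimal sketch of one way to do that, assuming a simple name-to-ID mapping (the mapping shape, function name, and sample values here are illustrative, not part of Gradient's API):

```python
# Map each Databricks job name to its Gradient project_id.
# The values are placeholders; use the IDs shown at the top of
# each project's page in the Gradient UI.
PROJECT_IDS = {
    "nightly-etl": "<project-id>",
}

def project_id_for(job_name: str) -> str:
    """Look up the Gradient project_id for a Databricks job name."""
    try:
        return PROJECT_IDS[job_name]
    except KeyError:
        raise ValueError(f"No Gradient project configured for job {job_name!r}")
```

Keeping this lookup in one place makes it easy to pass the correct project_id to each job's cluster configuration in later setup steps.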