
AWS Instance Profile


Last updated 1 year ago


** Only needed for Self-Hosted collection in AWS Databricks **

You'll need to set up an AWS instance profile for Self-Hosted collection in AWS Databricks. The cluster running in your environment uses this instance profile to send logs to Gradient.

AWS Instance profile access

The Gradient Agent needs AWS access to retrieve instance market information during job execution. To access this information, Gradient uses Boto3, which leverages permissions granted through the cluster's instance profile. See the example policy below for the required permissions.

Gradient reads and writes logs at the storage path defined in the cluster's log delivery configuration. If the logs are delivered to an S3 location, the cluster's instance profile must have permission to read and write data at that S3 destination, and it must include the s3:PutObjectAcl permission.
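To see why s3:PutObjectAcl matters, here is a hedged Boto3 sketch of the kind of log write the cluster performs. The function name and the ACL value are illustrative assumptions; log delivery typically sets an ACL on each object it writes, which fails without s3:PutObjectAcl.

```python
# Hedged sketch: illustrates why s3:PutObjectAcl is required for log
# delivery to S3. The function name and ACL value are assumptions, not
# Gradient's actual implementation.
def upload_log(s3_client, bucket: str, key: str, body: bytes) -> None:
    s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=body,
        # Setting an ACL on the object requires s3:PutObjectAcl in
        # addition to s3:PutObject.
        ACL="bucket-owner-full-control",
    )
```

If the profile grants s3:PutObject but not s3:PutObjectAcl, a call like this is denied even though a plain put_object without the ACL argument would succeed.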

Consult your company - The steps below outline a method to create an instance profile from scratch via the AWS console. Please consult with your company to understand the best way to create or modify existing instance profiles in line with your own policies.

Step 1: Create a new Role in your AWS Console

In your AWS console, go to IAM > Roles and click Create role.

Select AWS service as the trusted entity type and EC2 as the service.

Gradient does not need any additional permissions at this point. Attach any default permissions your organization requires; if none are needed, click Next.

Enter a name for the role. Below we use sync-minimum-access. Click Create role once completed.
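If you prefer to script this step instead of clicking through the console, the same role and instance profile can be created with Boto3. This is a sketch under assumptions: the role name sync-minimum-access comes from the example above, and the caller passes in an authenticated IAM client.

```python
import json

# EC2 trust policy -- equivalent to choosing "AWS service" with "EC2"
# as the trusted entity in the console.
def build_ec2_trust_policy() -> dict:
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "ec2.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }

def create_role_with_instance_profile(iam, role_name: str = "sync-minimum-access"):
    """Create the role, then wrap it in an instance profile of the same name.

    The console creates the instance profile for you behind the scenes;
    the API requires creating and linking it explicitly.
    """
    iam.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(build_ec2_trust_policy()),
    )
    iam.create_instance_profile(InstanceProfileName=role_name)
    iam.add_role_to_instance_profile(
        InstanceProfileName=role_name,
        RoleName=role_name,
    )
```

Pass a `boto3.client("iam")` as `iam`; the caller's credentials must be allowed to create IAM roles and instance profiles.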

Step 2: Add an in-line permission to the Role

Click into the role you just created and, under Permissions, click Add permissions > Create inline policy.

Select the JSON policy editor.

Copy and paste the code block below into the JSON policy editor.

Be sure to update <your-s3-bucket-path> to match the S3 bucket path where your Databricks logs are stored (shown in the cluster's logging configuration in Databricks).

If you are using dbfs:// to store your logs, you don't need the S3 permission block in the instance profile policy below.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "ec2:DescribeInstances",
                "ec2:DescribeVolumes"
            ],
            "Resource": "*"
        },
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObject",
                "s3:ListBucket",
                "s3:PutObjectAcl"
            ],
            "Resource": [
                "arn:aws:s3:::<your-s3-bucket-path>",
                "arn:aws:s3:::<your-s3-bucket-path>/*"
            ]
        }
    ]
}

Click Next. On the next page, click Create policy.
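The inline policy above can also be generated programmatically, which makes the dbfs:// case (no S3 statement) explicit. This is a sketch; the function name and the bucket-path parameter are illustrative, not part of any Sync tooling.

```python
from typing import Optional

def build_inline_policy(s3_bucket_path: Optional[str]) -> dict:
    """Build the inline policy document for the instance profile role.

    Pass None for s3_bucket_path when logs are delivered to dbfs://,
    in which case the S3 statement is omitted entirely.
    """
    statements = [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": ["ec2:DescribeInstances", "ec2:DescribeVolumes"],
            "Resource": "*",
        }
    ]
    if s3_bucket_path:
        statements.append(
            {
                "Sid": "VisualEditor1",
                "Effect": "Allow",
                "Action": [
                    "s3:PutObject",
                    "s3:GetObject",
                    "s3:ListBucket",
                    "s3:PutObjectAcl",
                ],
                # s3:ListBucket is evaluated against the bucket ARN itself;
                # the object-level actions need the /* suffix.
                "Resource": [
                    f"arn:aws:s3:::{s3_bucket_path}",
                    f"arn:aws:s3:::{s3_bucket_path}/*",
                ],
            }
        )
    return {"Version": "2012-10-17", "Statement": statements}
```

For the dbfs:// case, `build_inline_policy(None)` returns only the EC2 statement.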

Step 3: Add the instance profile to Databricks

In the Databricks admin page, go to Instance profiles and click "Add instance profile".

On the next page, copy and paste the "Instance profile ARN" and "IAM role ARN" values from the role's page in the AWS console. Click "Add" to complete.

Done! You should now be able to select this instance profile on the cluster page of your jobs.
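This registration step can also be done through the Databricks REST API (POST /api/2.0/instance-profiles/add) instead of the admin UI. A hedged sketch using only the standard library; the host and token are placeholders you must supply, and the helper names are illustrative.

```python
import json
import urllib.request

def build_add_profile_payload(instance_profile_arn: str, iam_role_arn: str) -> dict:
    # iam_role_arn is only needed when the role name differs from the
    # instance profile name; Databricks can otherwise derive it.
    return {
        "instance_profile_arn": instance_profile_arn,
        "iam_role_arn": iam_role_arn,
    }

def add_instance_profile(host: str, token: str, payload: dict) -> None:
    """POST the payload to the workspace's instance-profiles/add endpoint.

    host is your workspace URL (e.g. https://<workspace>.cloud.databricks.com)
    and token is a Databricks personal access token with admin rights.
    """
    req = urllib.request.Request(
        f"{host}/api/2.0/instance-profiles/add",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req):
        pass  # a 200 response means the profile is registered
```

The two ARNs are the same "Instance profile ARN" and "IAM role ARN" values shown on the role's page in the AWS console.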

Screenshot: the Databricks cluster logging path, which should match the S3 bucket location in your instance profile.