
AWS Instance Profile


Last updated 1 year ago


** Only needed for Self-Hosted collection in AWS Databricks **

You'll need to set up an AWS instance profile for Self-Hosted collection in AWS Databricks. The cluster running in your environment uses this instance profile to send logs to Gradient.

AWS Instance profile access

The Gradient Agent needs AWS access to retrieve instance market information during job execution. To access this information, Gradient uses Boto3, which leverages permissions granted through the cluster's instance profile. See the example policy below for the required permissions.

Gradient reads and writes logs at the storage path defined in the cluster's log delivery configuration. If the logs are delivered to an S3 location, the cluster's instance profile must have permission to read and write data at that S3 destination, and it must include the s3:PutObjectAcl permission.
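To see why s3:PutObjectAcl matters, here is a hedged Boto3 sketch of the kind of log write the cluster performs. The function name and the ACL value are illustrative assumptions; log delivery typically sets an ACL on each object it writes, which fails without s3:PutObjectAcl.

```python
# Hedged sketch: illustrates why s3:PutObjectAcl is required for log
# delivery to S3. The function name and ACL value are assumptions, not
# Gradient's actual implementation.
def upload_log(s3_client, bucket: str, key: str, body: bytes) -> None:
    s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=body,
        # Setting an ACL on the object requires s3:PutObjectAcl in
        # addition to s3:PutObject.
        ACL="bucket-owner-full-control",
    )
```

If the profile grants s3:PutObject but not s3:PutObjectAcl, a call like this is denied even though a plain put_object without the ACL argument would succeed.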

Consult your company - The steps below outline a method to create an instance profile from scratch via the AWS console. Please consult with your company to understand the best way to create or modify existing instance profiles in line with your own policies.

Step 1: Create a new Role in your AWS Console

In your AWS console, go to IAM > Roles and click Create role.

Select AWS service as the trusted entity type and EC2 as the service.

Gradient does not need any additional permissions at this point. Attach any default permissions your organization requires; if none are needed, click Next.

Enter a name for the role. Below we use sync-minimum-access. Click Create role once completed.
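If you prefer to script this step instead of clicking through the console, the same role and instance profile can be created with Boto3. This is a sketch under assumptions: the role name sync-minimum-access comes from the example above, and the caller passes in an authenticated IAM client.

```python
import json

# EC2 trust policy -- equivalent to choosing "AWS service" with "EC2"
# as the trusted entity in the console.
def build_ec2_trust_policy() -> dict:
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "ec2.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }

def create_role_with_instance_profile(iam, role_name: str = "sync-minimum-access"):
    """Create the role, then wrap it in an instance profile of the same name.

    The console creates the instance profile for you behind the scenes;
    the API requires creating and linking it explicitly.
    """
    iam.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(build_ec2_trust_policy()),
    )
    iam.create_instance_profile(InstanceProfileName=role_name)
    iam.add_role_to_instance_profile(
        InstanceProfileName=role_name,
        RoleName=role_name,
    )
```

Pass a `boto3.client("iam")` as `iam`; the caller's credentials must be allowed to create IAM roles and instance profiles.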

Step 2: Add an in-line permission to the Role

Click into the role you just created and, under Permissions, click Add permissions > Create inline policy.

Select the JSON policy editor.

Copy and paste the code block below into the JSON policy editor.

Be sure to update <your-s3-bucket-path> to match the S3 bucket path where your Databricks logs are stored (shown in the cluster's logging configuration in Databricks).

If you are using dbfs:// to store your logs, you don't need the S3 permission block in the instance profile policy below.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "ec2:DescribeInstances",
                "ec2:DescribeVolumes"
            ],
            "Resource": "*"
        },
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObject",
                "s3:ListBucket",
                "s3:PutObjectAcl"
            ],
            "Resource": [
                "arn:aws:s3:::<your-s3-bucket-path>",
                "arn:aws:s3:::<your-s3-bucket-path>/*"
            ]
        }
    ]
}

Click Next. On the next page, click Create policy.
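The inline policy above can also be generated programmatically, which makes the dbfs:// case (no S3 statement) explicit. This is a sketch; the function name and the bucket-path parameter are illustrative, not part of any Sync tooling.

```python
from typing import Optional

def build_inline_policy(s3_bucket_path: Optional[str]) -> dict:
    """Build the inline policy document for the instance profile role.

    Pass None for s3_bucket_path when logs are delivered to dbfs://,
    in which case the S3 statement is omitted entirely.
    """
    statements = [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": ["ec2:DescribeInstances", "ec2:DescribeVolumes"],
            "Resource": "*",
        }
    ]
    if s3_bucket_path:
        statements.append(
            {
                "Sid": "VisualEditor1",
                "Effect": "Allow",
                "Action": [
                    "s3:PutObject",
                    "s3:GetObject",
                    "s3:ListBucket",
                    "s3:PutObjectAcl",
                ],
                # s3:ListBucket is evaluated against the bucket ARN itself;
                # the object-level actions need the /* suffix.
                "Resource": [
                    f"arn:aws:s3:::{s3_bucket_path}",
                    f"arn:aws:s3:::{s3_bucket_path}/*",
                ],
            }
        )
    return {"Version": "2012-10-17", "Statement": statements}
```

For the dbfs:// case, `build_inline_policy(None)` returns only the EC2 statement.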

Step 3: Add the instance profile to Databricks

In the Databricks admin page, go to Instance profiles and click "Add instance profile".

On the next page, copy and paste the "Instance profile ARN" and "IAM role ARN" values from the role's page in the AWS console. Click "Add" to complete.

Done! You should now be able to select this instance profile on the cluster page of your jobs.
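This registration step can also be done through the Databricks REST API (POST /api/2.0/instance-profiles/add) instead of the admin UI. A hedged sketch using only the standard library; the host and token are placeholders you must supply, and the helper names are illustrative.

```python
import json
import urllib.request

def build_add_profile_payload(instance_profile_arn: str, iam_role_arn: str) -> dict:
    # iam_role_arn is only needed when the role name differs from the
    # instance profile name; Databricks can otherwise derive it.
    return {
        "instance_profile_arn": instance_profile_arn,
        "iam_role_arn": iam_role_arn,
    }

def add_instance_profile(host: str, token: str, payload: dict) -> None:
    """POST the payload to the workspace's instance-profiles/add endpoint.

    host is your workspace URL (e.g. https://<workspace>.cloud.databricks.com)
    and token is a Databricks personal access token with admin rights.
    """
    req = urllib.request.Request(
        f"{host}/api/2.0/instance-profiles/add",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req):
        pass  # a 200 response means the profile is registered
```

The two ARNs are the same "Instance profile ARN" and "IAM role ARN" values shown on the role's page in the AWS console.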

Screenshot: the Databricks cluster logging path, which should match the S3 bucket location in your instance profile.