resource "aws_emr_cluster" "cluster" {
name = "emr-test-arn"
release_label = "emr-4.6.0"
applications = ["Spark", "Hadoop"]
Mentioning "Spark", "Hadoop" in applications parameter as shown above - will that install Hadoop and Spark on the EMR?
Or does it just "prepare" the cluster in some way to work with Hadoop and Spark (and we should perform additional steps to install Hadoop and Spark on the EMR cluster)?
Yes: the applications argument tells EMR which applications to install when it provisions the cluster, so listing "Spark" and "Hadoop" is enough and no additional installation steps are needed. You can follow the official docs, which include examples like the one below; it works fine:
https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/emr_cluster
resource "aws_emr_cluster" "cluster" {
name = "emr-test-arn"
release_label = "emr-4.6.0"
applications = ["Spark"]
additional_info = <<EOF
{
"instanceAwsClientConfiguration": {
"proxyPort": 8099,
"proxyHost": "myproxy.example.com"
}
}
EOF
termination_protection = false
keep_job_flow_alive_when_no_steps = true
ec2_attributes {
subnet_id = aws_subnet.main.id
emr_managed_master_security_group = aws_security_group.sg.id
emr_managed_slave_security_group = aws_security_group.sg.id
instance_profile = aws_iam_instance_profile.emr_profile.arn
}
master_instance_group {
instance_type = "m4.large"
}
core_instance_group {
instance_type = "c4.large"
instance_count = 1
ebs_config {
size = "40"
type = "gp2"
volumes_per_instance = 1
}
bid_price = "0.30"
autoscaling_policy = <<EOF
{
"Constraints": {
"MinCapacity": 1,
"MaxCapacity": 2
},
"Rules": [
{
"Name": "ScaleOutMemoryPercentage",
"Description": "Scale out if YARNMemoryAvailablePercentage is less than 15",
"Action": {
"SimpleScalingPolicyConfiguration": {
"AdjustmentType": "CHANGE_IN_CAPACITY",
"ScalingAdjustment": 1,
"CoolDown": 300
}
},
"Trigger": {
"CloudWatchAlarmDefinition": {
"ComparisonOperator": "LESS_THAN",
"EvaluationPeriods": 1,
"MetricName": "YARNMemoryAvailablePercentage",
"Namespace": "AWS/ElasticMapReduce",
"Period": 300,
"Statistic": "AVERAGE",
"Threshold": 15.0,
"Unit": "PERCENT"
}
}
}
]
}
EOF
}
ebs_root_volume_size = 100
tags = {
role = "rolename"
env = "env"
}
bootstrap_action {
path = "s3://elasticmapreduce/bootstrap-actions/run-if"
name = "runif"
args = ["instance.isMaster=true", "echo running on master node"]
}
configurations_json = <<EOF
[
{
"Classification": "hadoop-env",
"Configurations": [
{
"Classification": "export",
"Properties": {
"JAVA_HOME": "/usr/lib/jvm/java-1.8.0"
}
}
],
"Properties": {}
},
{
"Classification": "spark-env",
"Configurations": [
{
"Classification": "export",
"Properties": {
"JAVA_HOME": "/usr/lib/jvm/java-1.8.0"
}
}
],
"Properties": {}
}
]
EOF
service_role = aws_iam_role.iam_emr_service_role.arn
}
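To answer the original question directly: nothing beyond the applications list is required for installation; EMR installs whatever is listed there while provisioning the nodes. As a stripped-down sketch (the subnet, instance profile, and service role references below reuse the names from the example above and are assumptions about your setup), this would still come up with Hadoop and Spark installed:
resource "aws_emr_cluster" "minimal" {
  name          = "emr-minimal"
  release_label = "emr-4.6.0"
  applications  = ["Hadoop", "Spark"] # installed by EMR during provisioning
  service_role  = aws_iam_role.iam_emr_service_role.arn

  ec2_attributes {
    subnet_id        = aws_subnet.main.id                       # assumption: same subnet as above
    instance_profile = aws_iam_instance_profile.emr_profile.arn # assumption: same instance profile as above
  }

  master_instance_group {
    instance_type = "m4.large"
  }

  core_instance_group {
    instance_type  = "m4.large"
    instance_count = 1
  }
}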
Hello, I'm using Terraform to spin up Zookeeper for a development environment, and I keep running into the following issue when I apply the Terraform:
Stopped reason Error response from daemon: create
ecs-clearstreet-basis-dev-Zookeeper-46-clearstreet-confluent-c2cf998e98d1afd45900:
VolumeDriver.Create: mounting volume failed: Specified port [2999] is
unavailable. Try selecting a different port.
I don't have this issue when attaching an EFS volume to a Fargate container.
Here is the Terraform for reference:
resource "aws_ecs_task_definition" "ecs-zookeeper" {
family = "${var.project}-Zookeeper"
container_definitions = templatefile("./task-definitions/zookeeper.tftpl",
{
zookeeper_image = "${var.zookeeper_image}:${var.zookeeper_image_version}"
zookeeper_port = "${var.zookeeper_port}"
zookeeper_port_communication = "${var.zookeeper_port_communication}"
zookeeper_port_election = "${var.zookeeper_port_election}"
zookeeper-servers = "server.1=${var.project}1.${var.dns_zone}:2888:3888;2181"
zookeeper-elect-port-retry = "${var.zookeeper-elect-port-retry}"
zookeeper_4lw_commands_whitelist = "${var.zookeeper_4lw_commands_whitelist}"
aws_region = "${var.aws_region}"
}
)
task_role_arn = var.ecs-task-role-arn
network_mode = "awsvpc"
volume {
name = "resolv"
host_path = "/etc/docker_resolv.conf"
}
volume {
name = "Client-confluent"
efs_volume_configuration {
file_system_id = var.efs-fsid
root_directory = "/Platform/confluent"
transit_encryption = "ENABLED"
transit_encryption_port = 2999
authorization_config {
access_point_id = var.efs-confluent-fsap
iam = "ENABLED"
}
}
}
}
resource "aws_ecs_service" "ecs-zookeeper" {
name = "Zookeeper"
cluster = aws_ecs_cluster.ecs.id
task_definition = aws_ecs_task_definition.ecs-zookeeper.arn
enable_ecs_managed_tags = true
enable_execute_command = true
desired_count = 1
propagate_tags = "SERVICE"
launch_type = "EC2"
# only manual task rotation via task stop
deployment_minimum_healthy_percent = 33
deployment_maximum_percent = 100
network_configuration {
subnets = var.vpc_subnets
security_groups = [var.ECS-EC2-SG]
assign_public_ip = false
}
service_registries {
registry_arn = aws_service_discovery_service.discovery_service-zookeeper.arn
}
ordered_placement_strategy {
type = "spread"
field = "host"
}
ordered_placement_strategy {
type = "spread"
field = "attribute:ecs.availability-zone"
}
placement_constraints {
type = "memberOf"
expression = "attribute:program == PLATFORM"
}
lifecycle {
create_before_destroy = true
}
# count = var.zookeeper-instance-number
}
resource "aws_service_discovery_service" "discovery_service-zookeeper" {
name = "${var.project}-zookeeper"
dns_config {
namespace_id = aws_service_discovery_private_dns_namespace.discovery_namespace.id
dns_records {
ttl = 10
type = "A"
}
routing_policy = "MULTIVALUE"
}
health_check_custom_config {
failure_threshold = 1
}
# count = var.zookeeper-instance-number
}
Here is the task definition template (zookeeper.tftpl) for reference:
[
  {
    "name": "zookeeper",
    "image": "${zookeeper_image}",
    "cpu": 256,
    "memory": 512,
    "essential": true,
    "portMappings": [
      {
        "containerPort": ${zookeeper_port},
        "hostPort": ${zookeeper_port}
      },
      {
        "containerPort": ${zookeeper_port_communication},
        "hostPort": ${zookeeper_port_communication}
      },
      {
        "containerPort": ${zookeeper_port_election},
        "hostPort": ${zookeeper_port_election}
      }
    ],
    "environment": [
      {
        "name": "ZOO_SERVERS",
        "value": "${zookeeper-servers}"
      },
      {
        "name": "ZOO_STANDALONE_ENABLED",
        "value": "false"
      },
      {
        "name": "ZOO_ELECT_PORT_RETRY",
        "value": "${zookeeper-elect-port-retry}"
      },
      {
        "name": "ZOO_4LW_COMMANDS_WHITELIST",
        "value": "${zookeeper_4lw_commands_whitelist}"
      }
    ],
    "mountPoints": [
      {
        "sourceVolume": "resolv",
        "containerPath": "/etc/resolv.conf"
      }
    ],
    "logConfiguration": {
      "logDriver": "awslogs",
      "options": {
        "awslogs-region": "${aws_region}",
        "awslogs-group": "/fargate/client/basis/program-zookeeper",
        "awslogs-create-group": "true",
        "awslogs-stream-prefix": "program-zookeeper"
      }
    },
    "workingDir": "/var/lib/zookeeper",
    "mountPoints": [
      {
        "sourceVolume": "client-confluent",
        "containerPath": "/var/lib/zookeeper"
      }
    ]
  }
]
Any help would be greatly appreciated!
I was previously having issues providing access to EFS from an ECS task (Providing access to EFS from ECS task)
This has now been resolved, inasmuch as the task starts, and it all looks fine.
The problem is that running df, ls, or touch on the mount point hangs indefinitely.
The task definition is below:
{
  "taskDefinitionArn": "arn:aws:ecs:eu-west-2:000000000000:task-definition/backend-app-task:53",
  "containerDefinitions": [
    {
      "name": "server",
      "image": "000000000000.dkr.ecr.eu-west-2.amazonaws.com/foo-backend:latest-server",
      "cpu": 512,
      "memory": 1024,
      "portMappings": [
        {
          "containerPort": 8000,
          "hostPort": 8000,
          "protocol": "tcp"
        }
      ],
      "essential": true,
      "environment": [],
      "mountPoints": [
        {
          "sourceVolume": "persistent",
          "containerPath": "/opt/data/",
          "readOnly": false
        }
      ],
      "volumesFrom": [],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/foo",
          "awslogs-region": "eu-west-2",
          "awslogs-stream-prefix": "ecs"
        }
      }
    }
  ],
  "family": "backend-app-task",
  "taskRoleArn": "arn:aws:iam::000000000000:role/ecsTaskRole",
  "executionRoleArn": "arn:aws:iam::000000000000:role/myEcsTaskExecutionRole",
  "networkMode": "awsvpc",
  "revision": 53,
  "volumes": [
    {
      "name": "persistent",
      "efsVolumeConfiguration": {
        "fileSystemId": "fs-00000000000000000",
        "rootDirectory": "/",
        "transitEncryption": "ENABLED",
        "transitEncryptionPort": 2049,
        "authorizationConfig": {
          "accessPointId": "fsap-00000000000000000",
          "iam": "ENABLED"
        }
      }
    }
  ],
  "status": "ACTIVE",
  "requiresAttributes": [
    { "name": "com.amazonaws.ecs.capability.logging-driver.awslogs" },
    { "name": "ecs.capability.execution-role-awslogs" },
    { "name": "ecs.capability.efsAuth" },
    { "name": "com.amazonaws.ecs.capability.ecr-auth" },
    { "name": "com.amazonaws.ecs.capability.docker-remote-api.1.19" },
    { "name": "ecs.capability.efs" },
    { "name": "com.amazonaws.ecs.capability.task-iam-role" },
    { "name": "com.amazonaws.ecs.capability.docker-remote-api.1.25" },
    { "name": "ecs.capability.execution-role-ecr-pull" },
    { "name": "com.amazonaws.ecs.capability.docker-remote-api.1.18" },
    { "name": "ecs.capability.task-eni" }
  ],
  "placementConstraints": [],
  "compatibilities": [
    "EC2",
    "FARGATE"
  ],
  "requiresCompatibilities": [
    "FARGATE"
  ],
  "cpu": "512",
  "memory": "1024",
  "registeredAt": "2022-03-08T14:23:47.391Z",
  "registeredBy": "arn:aws:iam::000000000000:root",
  "tags": []
}
According to the docs, hanging can occur when large amounts of data are being written to the EFS volume. That is not the case here: the EFS volume is new and empty, with a size of 6 KiB. I also tried configuring it with provisioned throughput, but that did not make any difference.
EDIT
IAM role definition:
data "aws_iam_policy_document" "ecs_task_execution_role_base" {
version = "2012-10-17"
statement {
sid = ""
effect = "Allow"
actions = ["sts:AssumeRole"]
principals {
type = "Service"
identifiers = ["ecs-tasks.amazonaws.com"]
}
}
}
# ECS task execution role
resource "aws_iam_role" "ecs_task_execution_role" {
name = var.ecs_task_execution_role_name
assume_role_policy = data.aws_iam_policy_document.ecs_task_execution_role_base.json
}
# ECS task execution role policy attachment
resource "aws_iam_role_policy_attachment" "ecs_task_execution_role" {
role = aws_iam_role.ecs_task_execution_role.name
policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy"
}
resource "aws_iam_role_policy_attachment" "ecs_task_execution_role2" {
role = aws_iam_role.ecs_task_execution_role.name
policy_arn = "arn:aws:iam::aws:policy/AmazonElasticFileSystemClientFullAccess"
}
resource "aws_iam_policy" "ecs_exec_policy" {
name = "ecs_exec_policy"
policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Action = ["ssmmessages:CreateControlChannel",
"ssmmessages:CreateDataChannel",
"ssmmessages:OpenControlChannel",
"ssmmessages:OpenDataChannel",
]
Effect = "Allow"
Resource = "*"
},
]
})
}
resource "aws_iam_role" "ecs_task_role" {
name = "ecsTaskRole"
assume_role_policy = data.aws_iam_policy_document.ecs_task_execution_role_base.json
managed_policy_arns = ["arn:aws:iam::aws:policy/AmazonElasticFileSystemClientFullAccess","arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy", aws_iam_policy.ecs_exec_policy.arn]
}
I'm trying to create a Databricks instance profile for use with a previously provisioned workspace, and I get the following error when running terraform apply:
2022-01-25T09:32:31.063-0800 [DEBUG] provider.terraform-provider-databricks_v0.4.4: 400 Bad Request {
"error_code": "DRY_RUN_FAILED",
"message": "Verification of the instance profile failed. AWS error: You are not authorized to perform this o... (616 more bytes)"
}: timestamp=2022-01-25T09:32:31.062-0800
2022-01-25T09:32:31.063-0800 [WARN] provider.terraform-provider-databricks_v0.4.4: /api/2.0/instance-profiles/add:400 - Verification of the instance profile failed. AWS error: You are not authorized to perform this operation. Encoded authorization failure message: 5AzyUESoYe18kM...
This is what I see when I decode the Encoded authorization failure message:
{
  "allowed": false,
  "explicitDeny": false,
  "matchedStatements": { "items": [] },
  "failures": { "items": [] },
  "context": {
    "principal": {
      "id": "AROA4A2DDDVLP3F64BTD7:databricks",
      "arn": "arn:aws:sts::<AWS Account ID>:assumed-role/<AWS Account alias>-crossaccount/databricks"
    },
    "action": "iam:PassRole",
    "resource": "arn:aws:iam::<AWS Account ID>:role/databricks-shared-ec2-role-for-s3",
    "conditions": {
      "items": [
        {
          "key": "aws:Region",
          "values": { "items": [ { "value": "us-east-1" } ] }
        },
        {
          "key": "aws:Service",
          "values": { "items": [ { "value": "ec2" } ] }
        },
        {
          "key": "aws:Resource",
          "values": { "items": [ { "value": "role/databricks-shared-ec2-role-for-s3" } ] }
        },
        {
          "key": "iam:RoleName",
          "values": { "items": [ { "value": "databricks-shared-ec2-role-for-s3" } ] }
        },
        {
          "key": "aws:Type",
          "values": { "items": [ { "value": "role" } ] }
        },
        {
          "key": "aws:Account",
          "values": { "items": [ { "value": "<AWS Account ID>" } ] }
        },
        {
          "key": "aws:ARN",
          "values": { "items": [ { "value": "arn:aws:iam::<AWS Account ID>:role/databricks-shared-ec2-role-for-s3" } ] }
        }
      ]
    }
  }
}
I'm trying to follow the Databricks documentation.
Here's the relevant Terraform code fragment:
data "aws_iam_policy_document" "instance-assume-role-policy" {
statement {
actions = ["sts:AssumeRole"]
principals {
type = "Service"
identifiers = ["ec2.amazonaws.com"]
}
}
}
resource "aws_iam_role" "role_for_s3_access" {
name = "databricks-shared-ec2-role-for-s3"
description = "Role for shared access for Databricks"
assume_role_policy = data.aws_iam_policy_document.instance-assume-role-policy.json
}
data "aws_iam_policy_document" "pass_role_for_s3_access" {
statement {
effect = "Allow"
actions = ["iam:PassRole"]
resources = [aws_iam_role.role_for_s3_access.arn]
}
}
resource "aws_iam_policy" "pass_role_for_s3_access" {
name = "shared-pass-role-for-s3-access"
path = "/"
policy = data.aws_iam_policy_document.pass_role_for_s3_access.json
}
resource "aws_iam_role_policy_attachment" "pass_role_for_s3_access" {
policy_arn = aws_iam_policy.pass_role_for_s3_access.arn
role = aws_iam_role.role_for_s3_access.id
}
resource "aws_iam_instance_profile" "read" {
name = "sophi-aux_read_instance_profile"
role = aws_iam_role.role_for_s3_access.name
}
resource "time_sleep" "wait" {
depends_on = [aws_iam_instance_profile.read]
create_duration = "10s"
}
resource "databricks_instance_profile" "read" {
instance_profile_arn = aws_iam_instance_profile.read.arn
}
Any inputs will be greatly appreciated.
Your code looks correct to me.
It sounds like the role being used by Databricks doesn't have permission to create an instance profile and/or role.
These are permissions you'll have to explicitly add on the AWS side by allowing the iam:CreateInstanceProfile and iam:CreateRole actions.
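For illustration, here is a minimal sketch of what that could look like in Terraform, assuming the cross-account role from the error is managed in the same configuration (the resource and policy names below are hypothetical; iam:PassRole is included because that is the action shown as denied in the decoded message):
resource "aws_iam_role_policy" "databricks_instance_profile_registration" {
  name = "allow-instance-profile-registration"   # hypothetical policy name
  role = aws_iam_role.databricks_crossaccount.id # assumption: the <alias>-crossaccount role from the error

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Action = [
          "iam:CreateInstanceProfile",
          "iam:CreateRole",
          "iam:PassRole",
        ]
        # In practice, scope this down to the specific role and instance profile ARNs.
        Resource = "*"
      },
    ]
  })
}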
I'm starting a Step Function with the following input:
[
  { "name": "S3_BUCKET", "value": "test-bucket" },
  { "name": "S3_KEY", "value": "key-name.txt" }
]
What is the correct way to pass this to the ECS container? Here is what I have so far under the Step Functions Parameters:
"Overrides": {
"ContainerOverrides": [
{
"Name": "test",
"Environment": [
{ "Name": "S3_BUCKET", "Value.$": "$.S3_BUCKET"}
]
}
]
}
},
Here is the error message I am getting:
The JSONPath '$.S3_BUCKET' specified for the field 'Value.$' could not be found
Hopefully this will help someone else; it took me some time to figure this out.
If you are launching a Step Function via CloudWatch Events and want to pass some environment variables to ECS/Fargate, I ended up using a constant input instead of input_transformer.
Here is the Terraform code for aws_cloudwatch_event_target:
resource "aws_cloudwatch_event_target" "cloudwatch-ecs-target" {
arn = aws_sfn_state_machine.YOUR-STATEMACHINE.arn
rule = aws_cloudwatch_event_rule.CLOUDWATCH_RULE.name
role_arn = aws_iam_role.YOUR-ROLE.arn
input = <<JSON
{
"ENV1": "ABC",
"ENV2": "XYZ",
}
JSON
}
Here is what the state machine Terraform code looks like:
resource "aws_sfn_state_machine" "STATEMACHINE" {
name = "STATEMACHINE-NAME"
role_arn = aws_iam_role.YOUR-ROLE.arn
definition = <<EOF
{
"StartAt": "Run Fargate Task",
"States": {
"Run Fargate Task": {
"Type": "Task",
"Resource": "arn:aws:states:::ecs:runTask.sync",
"Parameters": {
"LaunchType": "FARGATE",
"Cluster": "${aws_ecs_cluster.YOUR-ECS-CLUSTER.arn}",
"TaskDefinition": "${local.task_definition_arn_only}",
"NetworkConfiguration": {
"AwsvpcConfiguration": {
"Subnets": [
"${aws_subnet.subnet-a.id}", "${aws_subnet.subnet-b.id}", "${aws_subnet.subnet-c.id}"
],
"AssignPublicIp": "ENABLED"
}
},
"Overrides": {
"ContainerOverrides": [
{
"Name": "ECS-CLUSTER-NAME",
"Environment": [
{ "Name": "ENV1", "Value.$": "$.ENV1"},
{ "Name": "ENV2", "Value.$": "$.ENV2"},
]
}
]
}
},
"End": true
}
}
}
EOF
}
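With this wiring, the constant input above becomes the execution input of the state machine, so the $.ENV1 and $.ENV2 reference paths in ContainerOverrides resolve against it directly. The CLOUDWATCH_RULE referenced by the event target is not shown in the snippet; a minimal sketch of what it might look like (the name and schedule expression are assumptions, and an event pattern would work just as well):
resource "aws_cloudwatch_event_rule" "CLOUDWATCH_RULE" {
  name                = "run-statemachine-schedule" # hypothetical name
  schedule_expression = "rate(1 day)"               # assumption: any schedule or event pattern
}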
I have an AWS API Gateway (REST API) which is deployed through Terraform like this:
locals {
  api = templatefile("${path.module}/backend-api/api.json", {
    service-user-management = aws_lambda_function.user-management.invoke_arn
  })
}

resource "aws_api_gateway_rest_api" "backend" {
  name        = "backend-api"
  description = "Backend API"
  body        = local.api

  endpoint_configuration {
    types = ["REGIONAL"]
  }
}

resource "aws_api_gateway_deployment" "backend" {
  rest_api_id = aws_api_gateway_rest_api.backend.id
  stage_name  = "default"

  triggers = {
    redeployment = sha1(join(",", list(
      local.api,
      data.archive_file.user-management.output_base64sha256
    )))
  }

  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_api_gateway_method_settings" "backend" {
  rest_api_id = aws_api_gateway_rest_api.backend.id
  stage_name  = aws_api_gateway_deployment.backend.stage_name
  method_path = "*/*"

  settings {
    metrics_enabled = true
    logging_level   = "INFO"
  }
}
For reference, api.json looks like:
{
  "openapi": "3.0.0",
  "info": {
    "version": "",
    "title": ""
  },
  "paths": {
    "/auth": {
      "post": {
        "summary": "User authentication",
        "parameters": [],
        "responses": {
          "400": {
            "description": "Invalid `password`"
          }
        },
        "x-amazon-apigateway-integration": {
          "uri": "${service-user-management}",
          "passthroughBehavior": "when_no_match",
          "httpMethod": "POST",
          "type": "aws_proxy"
        }
      }
    },
It works well, but I want to convert it to API Gateway V2. I have tried this:
resource "aws_apigatewayv2_api" "backend" {
name = "backend-api-2"
description = "Backend API"
protocol_type = "HTTP"
disable_execute_api_endpoint = false
version = "v0.1"
body = local.api
}
resource "aws_apigatewayv2_deployment" "backend-default" {
api_id = aws_apigatewayv2_route.backend.api_id
description = "backend deployment"
lifecycle {
create_before_destroy = true
}
}
resource "aws_apigatewayv2_route" "backend" {
api_id = aws_apigatewayv2_api.backend.id
route_key = "$default"
}
It works, except that none of the endpoints have an integration: no Lambda is bound.
What are the correct OpenAPI attributes?
Actually, the OpenAPI format may have changed slightly: instead of declaring the integration inline, referring to it from the components section worked:
{
  "openapi": "3.0.0",
  "info": {
    "version": "",
    "title": ""
  },
  "paths": {
    "/auth": {
      "post": {
        "summary": "User authentication",
        "parameters": [],
        "responses": {
          "400": {
            "description": "Invalid `password`"
          }
        },
        "x-amazon-apigateway-integration": {
          "$ref": "#/components/x-amazon-apigateway-integrations/user-management"
        }
      }
    }
    ...
  },
  "components": {
    "x-amazon-apigateway-integrations": {
      "user-management": {
        "uri": "${service-user-management}",
        "passthroughBehavior": "when_no_match",
        "httpMethod": "POST",
        "type": "aws_proxy"
      }
    }
  }
}
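One more piece that is easy to miss when moving from the REST setup above: an HTTP API still needs a stage before the imported body is reachable. A minimal sketch, reusing the aws_apigatewayv2_api.backend resource from the question (the $default stage name and the auto_deploy choice are assumptions about how you want to deploy):
resource "aws_apigatewayv2_stage" "backend_default" {
  api_id      = aws_apigatewayv2_api.backend.id
  name        = "$default"
  auto_deploy = true # with auto_deploy, API Gateway creates deployments itself, so the explicit aws_apigatewayv2_deployment becomes optional
}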