Why can't I run my ECS task from AWS Lambda? - amazon-web-services

I am using Amazon Web Services and trying to run an ECS Task Definition on a Cluster triggered from a Lambda.
When I run this task manually in the ECS console and choose all of the same options that I'm passing to run_task, it runs just fine: I see logs in CloudWatch and the effects of the task (updating a database) happen as expected. But when I run the task from a Lambda it does not work, and it also gives me no errors that I can see.
Here's the Lambda definition:
import boto3

def lambda_handler(event, context):
    print("howMuchSnowDoUpdate")
    client = boto3.client('ecs')
    response = client.run_task(
        cluster='HowMuchSnow',
        taskDefinition='HowMuchSnow:2',
        count=1,
        launchType='FARGATE',
        networkConfiguration={
            'awsvpcConfiguration': {
                'subnets': [
                    'subnet-ebce7c8c',
                ],
                'securityGroups': [
                    'sg-03bb63bf7b3389d42',
                ],
                'assignPublicIp': 'DISABLED'
            }
        },
    )
    print(response)
I have given the Lambda's IAM role the ECSFull policy. Before I did, I was getting an expected permission-denied error from run_task. Once I added that policy, the Lambda runs just fine with no errors reported, and this is the response that I get from the print(response) line:
{'tasks': [{'taskArn': 'arn:aws:ecs:us-east-1:221691463461:task/10b2473f-482d-4f75-ab43-3980f6995b17', 'clusterArn': 'arn:aws:ecs:us-east-1:221691463461:cluster/HowMuchSnow', 'taskDefinitionArn': 'arn:aws:ecs:us-east-1:221691463461:task-definition/HowMuchSnow:2', 'overrides': {'containerOverrides': [{'name': 'HowMuchSnow'}]}, 'lastStatus': 'PROVISIONING', 'desiredStatus': 'RUNNING', 'cpu': '256', 'memory': '512', 'containers': [{'containerArn': 'arn:aws:ecs:us-east-1:221691463461:container/9a76562b-1fef-457f-ae04-0f0eb4003e7b', 'taskArn': 'arn:aws:ecs:us-east-1:221691463461:task/10b2473f-482d-4f75-ab43-3980f6995b17', 'name': 'HowMuchSnow', 'lastStatus': 'PENDING', 'networkInterfaces': []}], 'version': 1, 'createdAt': datetime.datetime(2019, 6, 17, 14, 57, 29, 831000, tzinfo=tzlocal()), 'group': 'family:HowMuchSnow', 'launchType': 'FARGATE', 'platformVersion': '1.3.0', 'attachments': [{'id': 'e6ec4941-9e91-47d1-adff-d406f28b1931', 'type': 'ElasticNetworkInterface', 'status': 'PRECREATED', 'details': [{'name': 'subnetId', 'value': 'subnet-ebce7c8c'}]}]}], 'failures': [], 'ResponseMetadata': {'RequestId': '3a2506ef-9110-11e9-b57a-d7e334b6f5f7', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': '3a2506ef-9110-11e9-b57a-d7e334b6f5f7', 'content-type': 'application/x-amz-json-1.1', 'content-length': '1026', 'date': 'Mon, 17 Jun 2019 14:57:29 GMT'}, 'RetryAttempts': 0}}
To my eyes this looks alright, but the task never actually runs. I do briefly see a pending task in the tasks list in the ECS console for my cluster, but it disappears far sooner than the actual task should take to run. It produces no logs in CloudWatch like it does when I run it manually, and I see no errors in the logs either.
One thing I will note is that I have to pick a VPC when running the task manually from the console, but that's not a valid argument to boto3's ECS run_task function, so I don't pass it.
Does anyone know what might be going wrong, or where I might look for more information?
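One place worth checking (a troubleshooting sketch, not part of the original post): ECS keeps stopped tasks visible for a short while, and calling describe_tasks on the task ARN returned by run_task reports a stoppedReason and per-container exit codes, which usually explain why the task died.

import boto3

ecs = boto3.client('ecs')

# Task ARN taken from the run_task response shown above
task_arn = 'arn:aws:ecs:us-east-1:221691463461:task/10b2473f-482d-4f75-ab43-3980f6995b17'

response = ecs.describe_tasks(cluster='HowMuchSnow', tasks=[task_arn])
for task in response['tasks']:
    # stopCode / stoppedReason explain why ECS stopped the task
    print(task.get('lastStatus'), task.get('stopCode'), task.get('stoppedReason'))
    for container in task.get('containers', []):
        # per-container exit codes and reasons (e.g. image pull failures)
        print(container['name'], container.get('exitCode'), container.get('reason'))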

Here's what works for me.
When setting up the Lambda:
The role must have permission to run ECS tasks
Don't specify a VPC in the Lambda function settings itself
Here's the Lambda code (replace the subnets, security groups, etc. with your own).
import boto3

client = boto3.client('ecs')
cluster_name = "demo-cluster"
task_definition = "demo-task:1"

def lambda_handler(event, context):
    try:
        response = client.run_task(
            cluster=cluster_name,
            launchType='FARGATE',
            taskDefinition=task_definition,
            count=1,
            platformVersion='LATEST',
            networkConfiguration={
                'awsvpcConfiguration': {
                    'subnets': [
                        'subnet-0r6gh701',
                        'subnet-a73d7c10'
                    ],
                    'securityGroups': [
                        "sg-54cb123f",
                    ],
                    'assignPublicIp': 'ENABLED'
                }
            })
        print(response)
        return {
            'statusCode': 200,
            'body': "OK"
        }
    except Exception as e:
        print(e)
        return {
            'statusCode': 500,
            'body': str(e)
        }

I had this problem and it turned out that I had commented out the CMD line at the end of my Dockerfile during debugging. As a result the Lambda ran, but no ECS task was logged. Uncommenting the CMD led to the ECS task running and logging again.

Related

AWS Boto3 - Datapoint within client.get_metric_statistics displays on one file but not on the other

I have a function that records a CloudWatch metric via client.get_metric_statistics. When I execute it, the datapoint doesn't show up, but when I extracted the metric call into a file by itself and executed that, it displayed the datapoint with no issues.
The only difference is that the version that worked used an InstanceId, while my main script (which has to be automated) uses an AMIID, as you can see below. I am not sure whether AMIID is allowed to be used, but I don't see why it shouldn't be or what the issue is, so I'm looking for some feedback.
import sys
import time
from datetime import datetime, timedelta

import boto3

ec2 = boto3.resource('ec2')
s3_resource = boto3.resource('s3', region_name='eu-west-1')
s3 = boto3.resource('s3')

# sg and user_data come from earlier in the full script (not shown)
instance = ec2.create_instances(
    ImageId='ami-02ifd1b532b22l6h3',
    MinCount=1,
    MaxCount=1,
    InstanceType='t2.nano',
    KeyName='key1.pem',
    SecurityGroupIds=[sg.group_id],
    UserData=user_data,
)

time.sleep(300)

client = boto3.client("cloudwatch")
response = client.get_metric_statistics(
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "AMIID", "Value": "ami-13add1h575a25e4d6"}],
    StartTime=datetime.utcnow() - timedelta(seconds=200),
    EndTime=datetime.utcnow(),
    Period=300,
    Statistics=["Average"],
    Unit="Percent",
)
print(response)
for cpu in response["Datapoints"]:
    print(cpu)

s3.Bucket(name='buket2')
ec2.SecurityGroup(id='sg-06b84927ae5rd3ad1')
{'Label': 'CPUUtilization', 'Datapoints': [], 'ResponseMetadata': {'RequestId': '', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': '', 'content-type': 'text/xml', 'content-length': '357', 'date': 'Sun, 18 Jul 2021 00:26:57 GMT'}, 'RetryAttempts': 0}}
sg-06b84927ae5rd3ad1
From Amazon EC2 metric dimensions:
ImageId: This dimension filters the data you request for all instances running this Amazon EC2 Amazon Machine Image (AMI). Available for instances with Detailed Monitoring enabled.
You appear to be using AMIID instead of ImageId.
You can always view the available dimensions using:
aws cloudwatch list-metrics --namespace 'AWS/EC2'
The available dimensions for EC2 are listed in Amazon EC2 metric dimensions. There is no dimension called AMIID; I guess you wanted to use ImageId, which is a valid dimension.
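As a sketch of the corrected call (assuming the same metric and time window as the question, and that the instance has detailed monitoring enabled so the ImageId dimension is populated):

import boto3
from datetime import datetime, timedelta

client = boto3.client("cloudwatch")

# Same query as above, but with the documented ImageId dimension
# (or use {"Name": "InstanceId", "Value": instance[0].id} to match a single instance)
response = client.get_metric_statistics(
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "ImageId", "Value": "ami-13add1h575a25e4d6"}],
    StartTime=datetime.utcnow() - timedelta(minutes=10),
    EndTime=datetime.utcnow(),
    Period=300,
    Statistics=["Average"],
    Unit="Percent",
)
for datapoint in response["Datapoints"]:
    print(datapoint)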

Compute environments are not displayed in AWS console

Compute environments created via boto3 are not displayed in the AWS console. I can see them in the batch_client.describe_compute_environments() call response:
{
    'computeEnvironmentName': 'name',
    'computeEnvironmentArn': 'arn:aws:batch:us-east-1:<ID>:compute-environment/ml-retraining-compute-env-second',
    'ecsClusterArn': 'arn:aws:ecs:us-east-1:<ID>:cluster/ml-retraining-compute-env-second_Batch_b18fcd09-8d7e-351b-bc0f-13ffa83a6b15',
    'type': 'MANAGED',
    'state': 'ENABLED',
    'status': 'INVALID',
    'statusReason': "CLIENT_ERROR - The security group 'sg-2436d85c' does not exist",
    'computeResources': {
        'type': 'EC2',
        'minvCpus': 0,
        'maxvCpus': 512,
        'desiredvCpus': 24,
        'instanceTypes': [
            'optimal'
        ],
        'subnets': [
            'subnet-fa22de86'
        ],
        'securityGroupIds': [
            'sg-2436d85c'
        ],
        'instanceRole': 'arn:aws:iam::<ID>:instance-profile/ecsInstanceRole',
        'tags': {
            'component': 'ukai-training-pipeline',
            'product': 'Cormorant',
            'jira_project_team': 'CORPRJ',
            'business_unit': 'Threat Systems Products',
            'created_by': 'ml-pipeline'
        }
    },
    'serviceRole': 'arn:aws:iam::<ID>:role/AWSBatchServiceRole'
}
but the Compute Environments table on the Batch page in the AWS console UI does not show anything. The table is empty. When I try to create a compute environment with the same name again via boto3, I get this response:
ERROR - Error setting compute environment: An error occurred
(ClientException) when calling the CreateComputeEnvironment operation: Object already exists.
Based on the comments, the issue was that the console was set to a different region from the one the boto3 client was using.
The solution was to change the region.
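A minimal sketch for verifying this (assuming us-east-1, the region in the ARNs above): pin the region explicitly on the Batch client and compare it with the region selected in the console.

import boto3

# Pin the region explicitly so the client and the console are looking at the same place;
# us-east-1 is taken from the ARNs in the question.
batch = boto3.client('batch', region_name='us-east-1')

for env in batch.describe_compute_environments()['computeEnvironments']:
    print(env['computeEnvironmentName'], env['status'], env.get('statusReason', ''))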

How to check Spark step status programmatically (submitted on EMR cluster)?

I created a simple step function as follows:
Start -> Start EMR cluster & submit job -> End
I want to find a mechanism to identify whether my Spark step completed successfully or not.
I am able to start an EMR cluster and attach a Spark step to it, which completes successfully and terminates the cluster.
I followed the steps in this link:
Creating AWS EMR cluster with spark step using lambda function fails with "Local file does not exist"
Now I am looking to get the status; the job poller will give me information on whether the EMR cluster was created successfully or not.
I am looking for ways to find out the Spark job status.
from botocore.vendored import requests
import boto3
import json

def lambda_handler(event, context):
    conn = boto3.client("emr")
    cluster_id = conn.run_job_flow(
        Name='xyz',
        ServiceRole='xyz',
        JobFlowRole='asd',
        VisibleToAllUsers=True,
        LogUri='<location>',
        ReleaseLabel='emr-5.16.0',
        Instances={
            'Ec2SubnetId': 'xyz',
            'InstanceGroups': [
                {
                    'Name': 'Master',
                    'Market': 'ON_DEMAND',
                    'InstanceRole': 'MASTER',
                    'InstanceType': 'm4.xlarge',
                    'InstanceCount': 1,
                }
            ],
            'KeepJobFlowAliveWhenNoSteps': False,
            'TerminationProtected': False,
        },
        Applications=[
            {
                'Name': 'Spark'
            },
            {
                'Name': 'Hadoop'
            }
        ],
        Steps=[{
            'Name': "mystep",
            'ActionOnFailure': 'TERMINATE_CLUSTER',
            'HadoopJarStep': {
                'Jar': 'jar',
                'Args': [
                    <insert args>, jar, mainclass
                ]
            }
        }]
    )
    return cluster_id
You can use the CLI or SDK to list all steps for the cluster and then describe a particular step to get its status.
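A minimal boto3 sketch of that approach (the cluster id is the JobFlowId returned by run_job_flow above; the step state ends up as COMPLETED, FAILED, or CANCELLED):

import boto3

emr = boto3.client("emr")

# JobFlowId returned by run_job_flow above (placeholder value)
cluster_id = "j-XXXXXXXXXXXXX"

# List the steps on the cluster, then describe each one to read its state
for step in emr.list_steps(ClusterId=cluster_id)["Steps"]:
    detail = emr.describe_step(ClusterId=cluster_id, StepId=step["Id"])
    status = detail["Step"]["Status"]
    print(step["Name"], status["State"], status.get("FailureDetails", ""))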

How to get the current execution role in a lambda?

I'm having issues with a Lambda that does not seem to have the permissions to perform an action, and I want to get some troubleshooting information.
I can do this to get the current user:
print("Current user: " + boto3.resource('iam').CurrentUser().arn)
Is there a way to get the execution role at runtime? Better yet, is there a way to get the policies that are attached to this role dynamically?
They shouldn't change from when I created the lambda, but I want to verify to be sure.
Check this: list_attached_user_policies
Lists all managed policies that are attached to the specified IAM user.
An IAM user can also have inline policies embedded with it.
If you want just the inline policies: get_user_policy
Retrieves the specified inline policy document that is embedded in the specified IAM user.
I don't know how much relevance this will bring to the OP, but we can get the Lambda function configuration at runtime:
import os

import boto3

lambda_client = boto3.client('lambda')
role_response = lambda_client.get_function_configuration(
    FunctionName=os.environ['AWS_LAMBDA_FUNCTION_NAME']
)
print(role_response)
role_arn = role_response['Role']
role_response will contain the role ARN.
role_response =>
{'ResponseMetadata': {'RequestId': '', 'HTTPStatusCode': 200, 'HTTPHeaders': {'date': 'GMT', 'content-type': 'application/json', 'content-length': '877', 'connection': 'keep-alive', 'x-amzn-requestid': ''}, 'RetryAttempts': 0}, 'FunctionName': 'lambda_name', 'FunctionArn': 'arn:aws:lambda:<region>:<account_id>:function:lambda_arn', 'Runtime': 'python3.8', 'Role': 'arn:aws:iam::<account_id>:role/<role_name>', 'Handler': 'handlers.handle', 'CodeSize': 30772, 'Description': '', 'Timeout': 30, 'MemorySize': 128, 'LastModified': '', 'CodeSha256': '', 'Version': '$LATEST', 'VpcConfig': {'SubnetIds': [], 'SecurityGroupIds': [], 'VpcId': ''}, 'TracingConfig': {'Mode': 'PassThrough'}, 'RevisionId': '', 'State': 'Active', 'LastUpdateStatus': 'Successful'}
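From there, to answer the second half of the question, here is a sketch (assuming the execution role is allowed to make the IAM read calls iam:ListAttachedRolePolicies and iam:ListRolePolicies) that lists the policies attached to that role:

import boto3

# role_arn is the value read from get_function_configuration above
role_arn = 'arn:aws:iam::<account_id>:role/<role_name>'
role_name = role_arn.split('/')[-1]

iam = boto3.client('iam')

# Managed policies attached to the execution role
for policy in iam.list_attached_role_policies(RoleName=role_name)['AttachedPolicies']:
    print('managed:', policy['PolicyName'], policy['PolicyArn'])

# Inline policies embedded in the role
for policy_name in iam.list_role_policies(RoleName=role_name)['PolicyNames']:
    print('inline:', policy_name)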

How to execute spark submit on amazon EMR from Lambda function?

I want to execute a spark-submit job on an AWS EMR cluster based on a file upload event on S3. I am using an AWS Lambda function to capture the event, but I have no idea how to submit a spark-submit job to the EMR cluster from the Lambda function.
Most of the answers that I found talked about adding a step to the EMR cluster, but I do not know whether an added step can fire "spark submit --with args".
You can, I had to do the same thing last week!
Using boto3 for Python (other languages would definitely have a similar solution) you can either start a cluster with the defined step, or attach a step to an already up cluster.
Defining the cluster with the step
import boto3

def lambda_handler(event, context):
    conn = boto3.client("emr")
    cluster_id = conn.run_job_flow(
        Name='ClusterName',
        ServiceRole='EMR_DefaultRole',
        JobFlowRole='EMR_EC2_DefaultRole',
        VisibleToAllUsers=True,
        LogUri='s3n://some-log-uri/elasticmapreduce/',
        ReleaseLabel='emr-5.8.0',
        Instances={
            'InstanceGroups': [
                {
                    'Name': 'Master nodes',
                    'Market': 'ON_DEMAND',
                    'InstanceRole': 'MASTER',
                    'InstanceType': 'm3.xlarge',
                    'InstanceCount': 1,
                },
                {
                    'Name': 'Slave nodes',
                    'Market': 'ON_DEMAND',
                    'InstanceRole': 'CORE',
                    'InstanceType': 'm3.xlarge',
                    'InstanceCount': 2,
                }
            ],
            'Ec2KeyName': 'key-name',
            'KeepJobFlowAliveWhenNoSteps': False,
            'TerminationProtected': False
        },
        Applications=[{
            'Name': 'Spark'
        }],
        Configurations=[{
            "Classification": "spark-env",
            "Properties": {},
            "Configurations": [{
                "Classification": "export",
                "Properties": {
                    "PYSPARK_PYTHON": "python35",
                    "PYSPARK_DRIVER_PYTHON": "python35"
                }
            }]
        }],
        BootstrapActions=[{
            'Name': 'Install',
            'ScriptBootstrapAction': {
                'Path': 's3://path/to/bootstrap.script'
            }
        }],
        Steps=[{
            'Name': 'StepName',
            'ActionOnFailure': 'TERMINATE_CLUSTER',
            'HadoopJarStep': {
                'Jar': 's3n://elasticmapreduce/libs/script-runner/script-runner.jar',
                'Args': [
                    "/usr/bin/spark-submit", "--deploy-mode", "cluster",
                    's3://path/to/code.file', '-i', 'input_arg',
                    '-o', 'output_arg'
                ]
            }
        }],
    )
    return "Started cluster {}".format(cluster_id)
Attaching a step to an already running cluster
As per here
import sys
import time

import boto3

def lambda_handler(event, context):
    conn = boto3.client("emr")
    # chooses the first cluster which is Running or Waiting
    # possibly can also choose by name or already have the cluster id
    clusters = conn.list_clusters()
    # choose the correct cluster
    clusters = [c["Id"] for c in clusters["Clusters"]
                if c["Status"]["State"] in ["RUNNING", "WAITING"]]
    if not clusters:
        sys.stderr.write("No valid clusters\n")
        sys.exit(1)
    # take the first relevant cluster
    cluster_id = clusters[0]
    # code location on your emr master node
    CODE_DIR = "/home/hadoop/code/"
    # spark configuration example ("--conf" is the spark-submit flag for config properties)
    step_args = ["/usr/bin/spark-submit", "--conf", "your-configuration",
                 CODE_DIR + "your_file.py", '--your-parameters', 'parameters']
    step = {"Name": "what_you_do-" + time.strftime("%Y%m%d-%H:%M"),
            'ActionOnFailure': 'CONTINUE',
            'HadoopJarStep': {
                'Jar': 's3n://elasticmapreduce/libs/script-runner/script-runner.jar',
                'Args': step_args
            }}
    action = conn.add_job_flow_steps(JobFlowId=cluster_id, Steps=[step])
    return "Added step: %s" % (action)
AWS Lambda function Python code if you want to execute a Spark jar using the spark-submit command, by POSTing a batch to the Livy endpoint (port 8998) on the EMR master node:
# botocore.vendored.requests is deprecated; on newer runtimes, package and import requests directly
from botocore.vendored import requests
import json

def lambda_handler(event, context):
    headers = {"content-type": "application/json"}
    url = 'http://ip-address.ec2.internal:8998/batches'
    payload = {
        'file': 's3://Bucket/Orchestration/SparkCode.jar',
        # dependency jars go in 'jars' rather than being appended to 'file'
        'jars': [
            's3://Bucket/Orchestration/RedshiftJDBC41.jar',
            's3://Bucket/Orchestration/mysql-connector-java-8.0.12.jar'
        ],
        'className': 'Main Class Name',
        'args': [event.get('rootPath')]
    }
    res = requests.post(url, data=json.dumps(payload), headers=headers, verify=False)
    json_data = json.loads(res.text)
    return json_data.get('id')
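Since the handler returns the Livy batch id, here is a small follow-up sketch (assuming the same Livy endpoint as above) that polls the batch for its state, which is how you would know whether the spark-submit succeeded:

import json
from botocore.vendored import requests  # or a packaged 'requests' on newer runtimes

def get_batch_state(batch_id):
    # Same Livy endpoint as above; GET /batches/{id}/state returns the batch state
    url = 'http://ip-address.ec2.internal:8998/batches/{}/state'.format(batch_id)
    res = requests.get(url, headers={"content-type": "application/json"})
    # Livy reports states such as 'starting', 'running', 'success', or 'dead'
    return json.loads(res.text).get('state')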