Compute environments created via boto3 are not displayed in the AWS console. I can see them in the response to the batch_client.describe_compute_environments() call:
{
    'computeEnvironmentName': 'name',
    'computeEnvironmentArn': 'arn:aws:batch:us-east-1:<ID>:compute-environment/ml-retraining-compute-env-second',
    'ecsClusterArn': 'arn:aws:ecs:us-east-1:<ID>:cluster/ml-retraining-compute-env-second_Batch_b18fcd09-8d7e-351b-bc0f-13ffa83a6b15',
    'type': 'MANAGED',
    'state': 'ENABLED',
    'status': 'INVALID',
    'statusReason': "CLIENT_ERROR - The security group 'sg-2436d85c' does not exist",
    'computeResources': {
        'type': 'EC2',
        'minvCpus': 0,
        'maxvCpus': 512,
        'desiredvCpus': 24,
        'instanceTypes': [
            'optimal'
        ],
        'subnets': [
            'subnet-fa22de86'
        ],
        'securityGroupIds': [
            'sg-2436d85c'
        ],
        'instanceRole': 'arn:aws:iam::<ID>:instance-profile/ecsInstanceRole',
        'tags': {
            'component': 'ukai-training-pipeline',
            'product': 'Cormorant',
            'jira_project_team': 'CORPRJ',
            'business_unit': 'Threat Systems Products',
            'created_by': 'ml-pipeline'
        }
    },
    'serviceRole': 'arn:aws:iam::<ID>:role/AWSBatchServiceRole'
}
but the Compute Environments table on the Batch page of the AWS console UI does not show anything; the table is empty. When I try to create a compute environment with the same name again via a boto3 call, I get this response:
ERROR - Error setting compute environment: An error occurred
(ClientException) when calling the CreateComputeEnvironment operation: Object already exists.
Based on the comments, the issue was that the console was set to a different region than the one the boto3 client was using.
The solution was to switch the console to the matching region.
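To rule out this mismatch up front, the region can be pinned explicitly when creating the client; a minimal sketch (the region name is illustrative):
import boto3

# Pin the region so API calls and the console view line up.
batch_client = boto3.client('batch', region_name='us-east-1')

# The ARN embeds the region, so printing it confirms where the
# environment actually lives before you look in the console.
response = batch_client.describe_compute_environments()
for env in response['computeEnvironments']:
    print(env['computeEnvironmentName'], env['computeEnvironmentArn'])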
Related
I have built a workflow using boto3 that creates a compute environment, creates a job queue, registers a job definition, and finally submits a job. Running the 'ls' command works fine; however, the command 'docker run hello-world' does not work.
Code to create the compute environment:
response = client.create_compute_environment(
    computeEnvironmentName=com_env_name,
    type='MANAGED',
    state='ENABLED',
    computeResources={
        'type': 'EC2',
        'allocationStrategy': 'BEST_FIT',
        'minvCpus': 0,
        'maxvCpus': 5,
        'instanceTypes': [
            'c3.large',
        ],
        'ec2Configuration': [{
            'imageType': 'ECS_AL2',
        }],
        'subnets': [
            subnet_id,
        ],
        'securityGroupIds': [
            sec_gr_id,
        ],
        'instanceRole': 'ecsInstanceRole',
    },
    serviceRole='arn:aws:iam::blabla'
)
The job queue is defined as:
response = batch_client.create_job_queue(
    jobQueueName=queue_name,
    state='ENABLED',
    priority=1,
    computeEnvironmentOrder=[
        {
            'order': 1,
            'computeEnvironment': com_env_name
        },
    ],
)
My goal is to run 'docker run hello-world'. The job definition is defined as follows:
response = batch.register_job_definition(
    jobDefinitionName=job_def_name,
    type='container',
    containerProperties={
        'image': 'custom-image',
        'memory': 2048,
        'vcpus': 2,
        'command': ['ls'],
        'environment': [
            {
                'name': "DOCKER_HOST",
                'value': "unix:///var/run/docker.sock"
            },
        ],
        'volumes': [
            {
                'host': {
                    'sourcePath': '//var/run/docker.sock'
                },
                'name': 'docker'
            }],
        'mountPoints': [
            {
                'containerPath': '/var/run/docker.sock',
                'sourceVolume': 'docker'
            }],
    },
)
Are the volumes and mount points set up properly? What is missing? Does some connection between the two Docker daemons need to be established? The error output after submitting the job is:
CannotStartContainerError: Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: exec: "docker run hello-world": executable file not found in $PATH: unknown
The code for job submission is:
response = batch.submit_job(
    jobDefinition=job_def_name,
    jobName=job_nom,
    jobQueue=job_queue_name,
    containerOverrides={
        'command': ['docker run hello-world',]
    }
)
I notice two things:
You have an extra slash in your sourcePath.
The error message you get indicates that the docker executable doesn't exist in the image you're running. You'll need to use an image that supports Docker-in-Docker, such as the standard docker image. Note also that the container command is an argv list, so 'docker run hello-world' as a single string is looked up as one executable name; it should be split into ['docker', 'run', 'hello-world'].
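A minimal sketch of a job definition along those lines, assuming the host-socket approach from the question is kept (docker here is the official Docker Hub image; memory and vCPU values are carried over from the question):
response = batch.register_job_definition(
    jobDefinitionName=job_def_name,
    type='container',
    containerProperties={
        'image': 'docker',  # official image that ships the docker CLI
        'memory': 2048,
        'vcpus': 2,
        # argv list form: each token is a separate element
        'command': ['docker', 'run', 'hello-world'],
        'volumes': [
            {
                'host': {
                    'sourcePath': '/var/run/docker.sock'  # single leading slash
                },
                'name': 'docker'
            }],
        'mountPoints': [
            {
                'containerPath': '/var/run/docker.sock',
                'sourceVolume': 'docker'
            }],
    },
)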
When I create a record in my hosted zone via the AWS Web Console, I can select the Routing Policy as "Simple".
When I try to create the same record programmatically via boto3, I seem to have no option to set a Routing Policy, and it is "Latency" by default.
What am I missing?
r53.change_resource_record_sets(
    HostedZoneId=hz_id,
    ChangeBatch={
        'Changes': [{
            'Action': 'UPSERT',
            'ResourceRecordSet': {
                'Name': root_domain,
                'Type': 'A',
                'Region': region,
                'AliasTarget': {
                    'DNSName': f's3-website.{region}.amazonaws.com',
                    'EvaluateTargetHealth': False,
                    'HostedZoneId': s3_hz_id,
                },
                'SetIdentifier': str(uuid.uuid4())
            }
        }]
    }
)
Removing Region and SetIdentifier works for me here. It makes sense in hindsight: Route 53 infers the routing policy from the fields present on the record set, so including Region plus SetIdentifier marks the record as latency-based, while a record with neither is a plain (simple) record.
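For reference, a minimal sketch of the same UPSERT as a simple alias record (variable names as in the question):
r53.change_resource_record_sets(
    HostedZoneId=hz_id,
    ChangeBatch={
        'Changes': [{
            'Action': 'UPSERT',
            'ResourceRecordSet': {
                'Name': root_domain,
                'Type': 'A',
                # No Region and no SetIdentifier: Route 53 treats this
                # as a simple routing policy.
                'AliasTarget': {
                    'DNSName': f's3-website.{region}.amazonaws.com',
                    'EvaluateTargetHealth': False,
                    'HostedZoneId': s3_hz_id,
                },
            }
        }]
    }
)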
With minimal experience in Python, I am looking for ideas on building a Lambda that tags AWS resources, especially EC2 instances and their volumes, by using a prefix in the instance Name tag.
Examples:
dev-cassandra-01
dev-cassandra-02
I would like to set up auto-tagging with a predefined set of tags, using the prefix "dev-cassandra", which should apply to any new instances created with this prefix.
Similarly, a different set of tags would go to different application instances.
This can be handled by the provisioning tool for regular instances, but not for the existing ASG instances.
You can attach tags at instance creation with the resource method create_instances via its TagSpecifications argument.
Or you can add tags to an existing instance with the client method create_tags; for this method you need to get the instance ID first.
The prefix logic can be added to the Python script as desired (a sketch of it follows the examples below).
Here are 2 examples of these methods:
create_instances:
# Getting a resource object with AWS credentials
import boto3

s = boto3.Session(
    region_name=<region_name>,
    aws_access_key_id=<aws_access_key>,
    aws_secret_access_key=<aws_secret_access_key>,
)
ec2 = s.resource('ec2')

# Some of these options are optional
instance = ec2.create_instances(
    ImageId=<ami_id>,
    MinCount=1,
    MaxCount=1,
    InstanceType=<instance_type>,
    KeyName=<instance_key_pair>,
    IamInstanceProfile={
        'Name': <instance_profile>
    },
    SecurityGroupIds=[
        <instance_security_group>,
    ],
    TagSpecifications=[
        {
            'ResourceType': 'instance',
            'Tags': [
                {
                    'Key': <key_name>,
                    'Value': <value_data>,
                },
            ]
        },
    ],
)
create_tags:
# Next line gets a client object from the resource object
ec2_client = ec2.meta.client
ec2_client.create_tags(
    Resources=[
        <instance_id>,
    ],
    Tags=[
        {
            'Key': 'Name',
            'Value': <dev-cassandra><_your_name>
        },
        {
            'Key': <key_name>,
            'Value': <value_data>,
        },
    ]
)
It's worth mentioning that the "instance name" is itself just a tag with the key Name. For example:
{
    'Key': 'Name',
    'Value': <dev-cassandra><_your_name>
}
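And a minimal sketch of the prefix logic mentioned above, assuming the goal is to find instances whose Name tag starts with dev-cassandra and tag both them and their attached EBS volumes (the tag set is illustrative):
import boto3

ec2_client = boto3.client('ec2')
PREFIX_TAGS = [{'Key': 'component', 'Value': 'cassandra'}]  # predefined set, illustrative

# Find instances whose Name tag starts with the prefix.
reservations = ec2_client.describe_instances(
    Filters=[{'Name': 'tag:Name', 'Values': ['dev-cassandra*']}]
)['Reservations']

for reservation in reservations:
    for instance in reservation['Instances']:
        # Collect every EBS volume attached to the instance.
        volume_ids = [
            m['Ebs']['VolumeId']
            for m in instance.get('BlockDeviceMappings', [])
            if 'Ebs' in m
        ]
        # Tag the instance itself plus its volumes in one call.
        ec2_client.create_tags(
            Resources=[instance['InstanceId']] + volume_ids,
            Tags=PREFIX_TAGS,
        )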
I created a simple step function as follows:
Start -> Start EMR cluster & submit job -> End
I want to find a mechanism to identify whether my Spark step completed successfully or not.
I am able to start EMR cluster and attach a spark job to it, which successfully completes and terminates the cluster.
Followed steps in this link :
Creating AWS EMR cluster with spark step using lambda function fails with "Local file does not exist"
Now I am looking to get the status; the job poller will give me information on whether the EMR cluster was created successfully or not.
I am looking at ways to find out the Spark job status.
import boto3
import json

def lambda_handler(event, context):
    conn = boto3.client("emr")
    cluster_id = conn.run_job_flow(
        Name='xyz',
        ServiceRole='xyz',
        JobFlowRole='asd',
        VisibleToAllUsers=True,
        LogUri='<location>',
        ReleaseLabel='emr-5.16.0',
        Instances={
            'Ec2SubnetId': 'xyz',
            'InstanceGroups': [
                {
                    'Name': 'Master',
                    'Market': 'ON_DEMAND',
                    'InstanceRole': 'MASTER',
                    'InstanceType': 'm4.xlarge',
                    'InstanceCount': 1,
                }
            ],
            'KeepJobFlowAliveWhenNoSteps': False,
            'TerminationProtected': False,
        },
        Applications=[
            {
                'Name': 'Spark'
            },
            {
                'Name': 'Hadoop'
            }
        ],
        Steps=[{
            'Name': "mystep",
            'ActionOnFailure': 'TERMINATE_CLUSTER',
            'HadoopJarStep': {
                'Jar': 'jar',
                'Args': [
                    <insert args>, jar, mainclass
                ]
            }
        }]
    )
    return cluster_id
You can use the CLI or SDK to list all steps for the cluster and then describe a particular step to get its status.
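A minimal boto3 sketch of that polling (here cluster_id is assumed to be the JobFlowId string returned by run_job_flow, not the full response dict):
import boto3

emr = boto3.client('emr')

# List the cluster's steps, then describe each one for its state.
steps = emr.list_steps(ClusterId=cluster_id)['Steps']
for step in steps:
    detail = emr.describe_step(ClusterId=cluster_id, StepId=step['Id'])
    state = detail['Step']['Status']['State']
    # State is one of PENDING, RUNNING, COMPLETED, CANCELLED,
    # FAILED, or INTERRUPTED.
    print(step['Id'], step['Name'], state)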
I am using Amazon Web Services and trying to run an ECS Task Definition on a Cluster triggered from a Lambda.
When I run this task manually in the ECS console and choose all of the same options as I'm passing to run_task, it runs just fine. I see logs in CloudWatch, and the effects of the task (updating a database) have happened as expected. But when I run the task from a Lambda it does not work, and it also gives me no errors that I can see.
Here's the Lambda definition:
import boto3

def lambda_handler(event, context):
    print("howMuchSnowDoUpdate")
    client = boto3.client('ecs')
    response = client.run_task(
        cluster='HowMuchSnow',
        taskDefinition='HowMuchSnow:2',
        count=1,
        launchType='FARGATE',
        networkConfiguration={
            'awsvpcConfiguration': {
                'subnets': [
                    'subnet-ebce7c8c',
                ],
                'securityGroups': [
                    'sg-03bb63bf7b3389d42',
                ],
                'assignPublicIp': 'DISABLED'
            }
        },
    )
    print(response)
I have given the Lambda's IAM role the ECS full-access policy. Before I did, I was getting an expected permission-denied error when calling run_task. But once I added that policy, the Lambda runs just fine with no errors reported, and this is the response I get from that print(response) line:
{'tasks': [{'taskArn': 'arn:aws:ecs:us-east-1:221691463461:task/10b2473f-482d-4f75-ab43-3980f6995b17', 'clusterArn': 'arn:aws:ecs:us-east-1:221691463461:cluster/HowMuchSnow', 'taskDefinitionArn': 'arn:aws:ecs:us-east-1:221691463461:task-definition/HowMuchSnow:2', 'overrides': {'containerOverrides': [{'name': 'HowMuchSnow'}]}, 'lastStatus': 'PROVISIONING', 'desiredStatus': 'RUNNING', 'cpu': '256', 'memory': '512', 'containers': [{'containerArn': 'arn:aws:ecs:us-east-1:221691463461:container/9a76562b-1fef-457f-ae04-0f0eb4003e7b', 'taskArn': 'arn:aws:ecs:us-east-1:221691463461:task/10b2473f-482d-4f75-ab43-3980f6995b17', 'name': 'HowMuchSnow', 'lastStatus': 'PENDING', 'networkInterfaces': []}], 'version': 1, 'createdAt': datetime.datetime(2019, 6, 17, 14, 57, 29, 831000, tzinfo=tzlocal()), 'group': 'family:HowMuchSnow', 'launchType': 'FARGATE', 'platformVersion': '1.3.0', 'attachments': [{'id': 'e6ec4941-9e91-47d1-adff-d406f28b1931', 'type': 'ElasticNetworkInterface', 'status': 'PRECREATED', 'details': [{'name': 'subnetId', 'value': 'subnet-ebce7c8c'}]}]}], 'failures': [], 'ResponseMetadata': {'RequestId': '3a2506ef-9110-11e9-b57a-d7e334b6f5f7', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': '3a2506ef-9110-11e9-b57a-d7e334b6f5f7', 'content-type': 'application/x-amz-json-1.1', 'content-length': '1026', 'date': 'Mon, 17 Jun 2019 14:57:29 GMT'}, 'RetryAttempts': 0}}
To my eyes this looks alright, but the task never actually runs. I do briefly see a pending task in the tasks list in the ECS console for my cluster, but it doesn't run nearly as long as the actual task should. It produces no logs in CloudWatch like it does when I run it manually, and I see no errors in the logs either.
One thing I will note is that I have to pick a VPC when running the task manually from the console, but that's not a valid argument to boto3's ECS run_task function, so I don't pass it.
Anyone know what might be going wrong or where I might look for information?
Here's what works for me.
When setting up Lambda:
Role must have ECS run task abilities
Don't specify a VPC in the Lambda function settings itself
Here's the Lambda code (replace the subnets, security groups, etc. with your own).
import boto3

client = boto3.client('ecs')
cluster_name = "demo-cluster"
task_definition = "demo-task:1"

def lambda_handler(event, context):
    try:
        response = client.run_task(
            cluster=cluster_name,
            launchType='FARGATE',
            taskDefinition=task_definition,
            count=1,
            platformVersion='LATEST',
            networkConfiguration={
                'awsvpcConfiguration': {
                    'subnets': [
                        'subnet-0r6gh701',
                        'subnet-a73d7c10'
                    ],
                    'securityGroups': [
                        "sg-54cb123f",
                    ],
                    'assignPublicIp': 'ENABLED'
                }
            })
        print(response)
        return {
            'statusCode': 200,
            'body': "OK"
        }
    except Exception as e:
        print(e)
        return {
            'statusCode': 500,
            'body': str(e)
        }
I had this problem and it turned out that I had commented out the CMD line at the end of my Dockerfile during debugging. As a result the Lambda ran, but no ECS task was logged. Uncommenting the CMD led to the ECS task running and logging again.
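More generally, when a Fargate task disappears without producing logs like this, the stopped task usually records why it stopped. A minimal sketch of pulling that reason out (the cluster name is taken from the question; one common reason in this setup is a failed image pull when assignPublicIp is DISABLED and the subnet has no NAT gateway):
import boto3

ecs = boto3.client('ecs')

# Recently stopped tasks keep their stop reason for a while.
stopped = ecs.list_tasks(cluster='HowMuchSnow', desiredStatus='STOPPED')
if stopped['taskArns']:
    tasks = ecs.describe_tasks(cluster='HowMuchSnow', tasks=stopped['taskArns'])
    for task in tasks['tasks']:
        # e.g. 'CannotPullContainerError: ...' when the task
        # has no network route to the image registry.
        print(task['taskArn'], task.get('stoppedReason'))
        for container in task['containers']:
            print('  ', container['name'], container.get('reason'))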