I am just about at my wits' end with this one. I am currently working on setting up auto deployment for an AWS application. The application is running inside ECS without a load balancer.
The idea here is that when a new Docker image gets pushed to ECR, CloudTrail picks up the log, CloudWatch triggers an alarm, said alarm goes to SNS, and SNS triggers a Lambda to deploy. So far I have everything working except the Lambda.
The code I am using is in Python and is as follows:
import boto3
import pprint
import os

region = "us-east-2"
client = boto3.client('ecs', region_name=region)
response = client.list_task_definitions(familyPrefix='stage-dcs-task-000', status='ACTIVE')

def lambda_handler(event, context):
    response = client.register_task_definition(
        family='stage-dcs-task-000',
        networkMode='bridge',
        containerDefinitions=[
            {
                'name': 'dcs-stage-container-000',
                'image': 'ecrUrlHere:latest',
                'cpu': 10,
                'memory': 500,
                'portMappings': [
                    {
                        'containerPort': 8080,
                        'hostPort': 80,
                        'protocol': 'tcp'
                    },
                ],
                'essential': True,
            },
        ],
    )

    taskDefinitionRev = response['taskDefinition']['family'] + ':' + str(response['taskDefinition']['revision'])

    response = client.update_service(
        cluster='stage-secretName-conversation-service',
        service='stage-dcs-service-000',
        desiredCount=1,
        taskDefinition=taskDefinitionRev,
        deploymentConfiguration={
            'maximumPercent': 200,
            'minimumHealthyPercent': 100
        },
        forceNewDeployment=True
    )
Now, this Lambda fails with the error:
"errorMessage": "An error occurred (InvalidParameterException) when calling the UpdateService operation: The container stage-dcs-container-000is not valid. The value for 'containerName' must already be specified in the task definition. Registry: ARN-LINK-HERE"
Looking at what gets created in AWS, the task definition was created correctly, and the container was named and configured correctly. Based on what I understand from the docs, my call to update-service is correct; I just have no idea why the container name would be invalid.
Something interesting to note is that I thought the container names might have to be unique, so I started adding the task revision number to the end of the container name in place of the 000. However, I always get stage-dcs-container-000 back as the invalid name and have no idea where it might be coming from. Any ideas here?
After doing some more debugging, I have figured out that it is the container name within the current task definition that is throwing the error. Based on the container name rules, I cannot have dashes, so I changed them to underscores, but the same issue still persists.
Resources: this is the tutorial I have been following: https://medium.com/#YadavPrakshi/automate-zero-downtime-deployment-with-amazon-ecs-and-lambda-c4e49953273d
It took a little while for me to figure this one out, but I eventually got it solved. Truth be told, I am not 100% sure what the fix was, but I know there were at least two errors.
The first was the fact that I had somehow managed to fall into a state where I had illegal characters in the names of different parts of the stack. To give an explicit example, I had dashes (-) in the names of the containers, which is an illegal character. To solve this, I just ended up recreating the whole tech stack using camel casing.
The second was an error in the Lambda function, specifically in the creation of the task revision. What I ended up noticing was that when I created things via the CLI, I had somehow missed the Docker requirement. The way I got around this was to create the task revision in the AWS console and then copy the JSON into the register_task_definition method inside the Lambda. This is what I started with that DID NOT WORK:
response = client.register_task_definition(
    family='stage-dcs-task-000',
    networkMode='bridge',
    containerDefinitions=[
        {
            'name': 'dcs-stage-container-000',
            'image': 'ecrUrlHere:latest',
            'cpu': 10,
            'memory': 500,
            'portMappings': [
                {
                    'containerPort': 8080,
                    'hostPort': 80,
                    'protocol': 'tcp'
                },
            ],
            'essential': True,
        },
    ],
)
This is what I finished with that DOES WORK:
response = client.register_task_definition(
    containerDefinitions=[
        {
            "portMappings": [
                {
                    "hostPort": 80,
                    "protocol": "tcp",
                    "containerPort": 8080
                }
            ],
            "cpu": 10,
            "environment": [],
            "mountPoints": [],
            "memoryReservation": 500,
            "volumesFrom": [],
            "image": "image uri here",
            "essential": True,
            "name": "productionDCSContainer"
        }
    ],
    family="productionDCSTask",
    requiresCompatibilities=[
        "EC2"
    ],
    networkMode="bridge"
)
It should be noted that it is easy to fall into a state where you deadlock on the revision update, that is to say, something like minimumHealthyPercent = 100, maximumPercent = 100 and desiredCount = 1. In this case, AWS will never kill the running task to start the update. A way around this is to set minimumHealthyPercent = 0, maximumPercent = 200 and forceNewDeployment = True.
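For reference, a minimal sketch of what that deadlock-avoiding update_service call could look like (the cluster and service names are just the placeholders used above):

import boto3

client = boto3.client('ecs', region_name='us-east-2')

# Allow ECS to stop the old task first (minimumHealthyPercent=0) so a
# single-instance service without a load balancer does not deadlock.
response = client.update_service(
    cluster='stage-secretName-conversation-service',
    service='stage-dcs-service-000',
    desiredCount=1,
    taskDefinition='productionDCSTask',  # without a :revision suffix ECS uses the latest ACTIVE revision
    deploymentConfiguration={
        'maximumPercent': 200,
        'minimumHealthyPercent': 0
    },
    forceNewDeployment=True
)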
I'm trying to run DescribeUserPoolClient through Python code and also through CloudShell, and the command returns almost nothing:
{
"UserPoolClient": {
"UserPoolId": "id",
"ClientName": "name",
"ClientId": "id",
"ClientSecret": "secret",
"LastModifiedDate": "2021-05-10T14:21:24.733000+00:00",
"CreationDate": "2021-05-10T14:21:24.733000+00:00",
"RefreshTokenValidity": 30,
"TokenValidityUnits": {},
"AllowedOAuthFlows": [
"client_credentials"
],
"AllowedOAuthScopes": [
":write"
],
"AllowedOAuthFlowsUserPoolClient": true
}
}
These are the only parameters it returns, but the documentation says there should be a lot more, like "ExplicitAuthFlows" and others. Is it something with AWS, or maybe something with my access rights?
For anyone having trouble with the same issue: if a property is set to its default value and you have never touched it (never edited it), Amazon won't return it in the response. This is also how many other AWS CLI commands work.
Maybe it is common knowledge, but I struggled with it.
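In code, the practical consequence is to treat those optional fields defensively; a minimal sketch, with placeholder pool and client IDs and an assumed region:

import boto3

client = boto3.client('cognito-idp', region_name='us-east-1')  # region is an assumption

resp = client.describe_user_pool_client(
    UserPoolId='us-east-1_EXAMPLE',   # placeholder
    ClientId='exampleclientid123'     # placeholder
)
app_client = resp['UserPoolClient']

# Properties that were never explicitly set may simply be absent from the response,
# so fall back to a sensible default instead of assuming the key exists.
explicit_auth_flows = app_client.get('ExplicitAuthFlows', [])
print(explicit_auth_flows)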
import json
import datetime
import boto3

AWS_REGION = 'us-east-1'  # placeholder: set this to the region your RDS instance runs in

def lambda_handler(event, context):
    cloudwatch = boto3.client('cloudwatch', region_name=AWS_REGION)
    response = cloudwatch.get_metric_data(
        MetricDataQueries=[
            {
                'Id': 'memory',
                'MetricStat': {
                    'Metric': {
                        'Namespace': 'AWS/RDS',
                        'MetricName': 'TotalMemory',
                        'Dimensions': [
                            {
                                "Name": "DBInstanceIdentifier",
                                "Value": "mydb"
                            }
                        ]
                    },
                    'Period': 30,
                    'Stat': 'Average',
                }
            }
        ],
        StartTime=(datetime.datetime.now() - datetime.timedelta(seconds=300)).timestamp(),
        EndTime=datetime.datetime.now().timestamp()
    )
    print(response)
The result is like below:
{'MetricDataResults': [{'Id': 'memory', 'Label': 'TotalMemory', 'Timestamps': [], 'Values': [], 'StatusCode': 'Complete'}]
If you are looking to get the configured vCPU/memory, it seems like we need to call the DescribeDBInstances API to get the DBInstanceClass, which determines the hardware specification.
You would need to use one of the CloudWatch metric names from https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/MonitoringOverview.html#rds-metrics, and it seems like the currently available memory can be retrieved using the FreeableMemory metric. Using that metric name with your sample code, I was able to get data (in bytes) matching what the RDS Monitoring console shows.
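A small sketch of both suggestions together (the instance identifier is the placeholder from the question; the region is an assumption):

import datetime
import boto3

AWS_REGION = 'us-east-1'    # assumption
DB_IDENTIFIER = 'mydb'      # placeholder from the question

rds = boto3.client('rds', region_name=AWS_REGION)
cloudwatch = boto3.client('cloudwatch', region_name=AWS_REGION)

# Configured hardware: the instance class determines vCPU/memory.
instance = rds.describe_db_instances(DBInstanceIdentifier=DB_IDENTIFIER)['DBInstances'][0]
print(instance['DBInstanceClass'])

# Currently available memory: FreeableMemory is reported in bytes.
response = cloudwatch.get_metric_data(
    MetricDataQueries=[{
        'Id': 'memory',
        'MetricStat': {
            'Metric': {
                'Namespace': 'AWS/RDS',
                'MetricName': 'FreeableMemory',
                'Dimensions': [{'Name': 'DBInstanceIdentifier', 'Value': DB_IDENTIFIER}]
            },
            'Period': 60,
            'Stat': 'Average',
        }
    }],
    StartTime=datetime.datetime.utcnow() - datetime.timedelta(minutes=10),
    EndTime=datetime.datetime.utcnow()
)
print(response['MetricDataResults'][0]['Values'])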
You can check the total amount of memory and other useful information associated with the RDS instance in the CloudWatch console.
Step 1: Go to the CloudWatch console and navigate to Log groups.
Step 2: Search for RDSOSMetrics in the search bar.
Step 3: Click on the log stream. You will find all the details in the JSON; your total memory is in the field titled memory.total. A sample result looks like this:
{
"engine": "MYSQL",
"instanceID": "dbName",
"uptime": "283 days, 21:08:36",
"memory": {
"writeback": 0,
"free": 171696,
"hugePagesTotal": 0,
"inactive": 1652000,
"pageTables": 19716,
"dirty": 324,
"active": 5850016,
"total": 7877180,
"buffers": 244312
}
}
I have intentionally trimmed the JSON because of its size, but there are many other useful fields you can find there.
You can use the jq command-line utility to extract the fields you want from these log groups.
You can read more about this in the CloudWatch Enhanced Monitoring documentation.
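If you want that value programmatically rather than via the console or jq, something like the following sketch can read the latest Enhanced Monitoring record from the RDSOSMetrics log group described above (the region is an assumption, and Enhanced Monitoring must be enabled for the log group to exist):

import json
import boto3

logs = boto3.client('logs', region_name='us-east-1')  # assumption

# Each monitored instance writes to its own log stream; grab the most recently active one.
stream = logs.describe_log_streams(
    logGroupName='RDSOSMetrics',
    orderBy='LastEventTime',
    descending=True,
    limit=1
)['logStreams'][0]

# Read the newest event in that stream and pull memory.total out of the JSON payload.
events = logs.get_log_events(
    logGroupName='RDSOSMetrics',
    logStreamName=stream['logStreamName'],
    limit=1,
    startFromHead=False
)['events']

if events:
    metrics = json.loads(events[0]['message'])
    print(metrics['memory']['total'])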
I'm running an AWS step function with parallel execution branches.
Each branch succeeds individually, however the overall function fails with the following error:
States.DataLimitExceeded - The state/task returned a result with a size exceeding the maximum number of characters service limit.
I then found an article from AWS that describes this issue and suggests a workaround:
https://docs.aws.amazon.com/step-functions/latest/dg/connect-lambda.html
That article says:
The Lambda invoke API includes logs in the response by default. Multiple Lambda invocations in a workflow can trigger States.DataLimitExceeded errors. To avoid this, include "LogType" = "None" as a parameter when you invoke your Lambda functions.
My question is where exactly do I put it? I've tried putting it various places in the state machine definition, however I get the following error:
The field 'LogType' is not supported by Step Functions
That error seems contrary to the support article, so perhaps I'm doing it wrong!
Any advice is appreciated, thanks in advance!
Cheers
UPDATE 1 :
To be clear, this is a parallel function, with 26 parallel branches. Each branch has a small output as per the example below. The biggest item in this data is the LogResult, which (when base64 decoded) is just the billing info. I think this info multiplied by 26 has led to the error, so I just want to turn this LogResult off!!!
{
"ExecutedVersion": "$LATEST",
"LogResult": "U1RBUlQgUmVxdWVzdElkOiBlODJjZTRkOS0zMjk2LTRlNDctYjcyZC1iYmEwMzI1YmM3MGUgVmVyc2lvbjogJExBVEVTVApFTkQgUmVxdWVzdElkOiBlODJjZTRkOS0zMjk2LTRlNDctYjcyZC1iYmEwMzI1YmM3MGUKUkVQT1JUIFJlcXVlc3RJZDogZTgyY2U0ZDktMzI5Ni00ZTQ3LWI3MmQtYmJhMDMyNWJjNzBlCUR1cmF0aW9uOiA3NzI5Ljc2IG1zCUJpbGxlZCBEdXJhdGlvbjogNzgwMCBtcwlNZW1vcnkgU2l6ZTogMTAyNCBNQglNYXggTWVtb3J5IFVzZWQ6IDEwNCBNQglJbml0IER1cmF0aW9uOiAxMTY0Ljc3IG1zCQo=",
"Payload": {
"statusCode": 200,
"body": {
"signs": 63,
"nil": ""
}
},
"SdkHttpMetadata": {
"HttpHeaders": {
"Connection": "keep-alive",
"Content-Length": "53",
"Content-Type": "application/json",
"Date": "Thu, 21 Nov 2019 04:00:42 GMT",
"X-Amz-Executed-Version": "$LATEST",
"X-Amz-Log-Result": "U1RBUlQgUmVxdWVzdElkOiBlODJjZTRkOS0zMjk2LTRlNDctYjcyZC1iYmEwMzI1YmM3MGUgVmVyc2lvbjogJExBVEVTVApFTkQgUmVxdWVzdElkOiBlODJjZTRkOS0zMjk2LTRlNDctYjcyZC1iYmEwMzI1YmM3MGUKUkVQT1JUIFJlcXVlc3RJZDogZTgyY2U0ZDktMzI5Ni00ZTQ3LWI3MmQtYmJhMDMyNWJjNzBlCUR1cmF0aW9uOiA3NzI5Ljc2IG1zCUJpbGxlZCBEdXJhdGlvbjogNzgwMCBtcwlNZW1vcnkgU2l6ZTogMTAyNCBNQglNYXggTWVtb3J5IFVzZWQ6IDEwNCBNQglJbml0IER1cmF0aW9uOiAxMTY0Ljc3IG1zCQo=",
"x-amzn-Remapped-Content-Length": "0",
"x-amzn-RequestId": "e82ce4d9-3296-4e47-b72d-bba0325bc70e",
"X-Amzn-Trace-Id": "root=1-5dd60be1-47c4669ce54d5208b92b52a4;sampled=0"
},
"HttpStatusCode": 200
},
"SdkResponseMetadata": {
"RequestId": "e82ce4d9-3296-4e47-b72d-bba0325bc70e"
},
"StatusCode": 200
}
I ran into exactly the same problem as you recently. You haven't said what your lambdas are doing or returning; however, I found that AWS documents the limits that tasks have within executions: https://docs.aws.amazon.com/step-functions/latest/dg/limits.html#service-limits-task-executions.
What I found was that my particular lambda had an extremely long response with tens of thousands of characters. Amending it so that the response from the lambda was more reasonable got past the error in the step function.
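In practice that just meant returning a compact summary instead of the full data set; a rough, illustrative sketch (the fields are made up):

def lambda_handler(event, context):
    # Imagine this is the large result the branch produces.
    items = [{"id": i, "payload": "x" * 1000} for i in range(100)]

    # Return only what the next state actually needs, not the raw data.
    return {
        "statusCode": 200,
        "count": len(items),
        "ids": [item["id"] for item in items[:5]]
    }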
I had the same problem a week ago.
The way I solved it is as follows:
You can define which portion of the result is transmitted to the next step.
For that you have to use
"OutputPath": "$.part2",
In your JSON input you have:
"part1": {
"portion1": {
"procedure": "Delete_X"
},
"portion2":{
"procedure": "Load_X"
}
},
"part2": {
"portion1": {
"procedure": "Delete_Y"
},
"portion2":{
"procedure": "Load_Y"
}
}
Once part1 is processed, you make sure that part1 (and the ResultPath related to it) is not sent in the output; just part2, which is needed by the following steps, is passed on.
You do that with this: "OutputPath": "$.part2",
Let me know if that helps.
I got stuck on the same issue. Step Functions imposes a limit of 32,768 characters on the data that can be passed between two states.
https://docs.aws.amazon.com/step-functions/latest/dg/limits.html
Maybe you need to think about and break down your problem in a different way? That is what I did, because removing the log response gives you some headroom, but the solution will not scale past a certain limit.
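As a rough guard while restructuring, you can check how close a return value is to that limit before handing it back to Step Functions; a minimal sketch using the 32,768-character figure quoted above:

import json

STATE_DATA_LIMIT = 32768  # characters, per the Step Functions limits page linked above

def check_payload_size(result):
    # Serialize the result the way it will be passed between states and compare sizes.
    size = len(json.dumps(result))
    if size > STATE_DATA_LIMIT:
        raise ValueError(f"Result is {size} characters, over the {STATE_DATA_LIMIT} limit")
    return result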
I handle large data in my Step Functions by storing the result in an S3 bucket, and then having my State Machine return the path to the result-file (and a brief summary of the data or a status like PASS/FAIL).
The same could be done using a DB if that's more comfortable.
This way you won't have to modify your results' current format: you can just pass the reference around instead of a huge amount of data, and the results are persisted for as long as you'd like to keep them.
The start of the Lambdas looks something like this to figure out if the input is from a file or plain data:
bucket_name = util.env('BUCKET_NAME')

if 'result_path' in input_data.keys():
    # Results are in a file that is referenced.
    try:
        result_path = input_data['result_path']
        result_data = util.get_file_content(result_path, bucket_name)
    except Exception as e:
        report.append(f'Failed to parse JSON from {result_path}: {e}')
else:
    # Results are just raw data, not a reference.
    result_data = input_data
Then at the end of the Lambda they will upload their results and return directions to that file:
import json
import boto3

def upload_results_to_s3(bucket_name, filename, result_data_to_upload):
    s3 = boto3.resource('s3')
    results_prefix = 'Path/In/S3/'
    results_suffix = '_Results.json'
    result_file_path = results_prefix + filename + results_suffix
    s3.Object(bucket_name, result_file_path).put(
        Body=bytes(json.dumps(result_data_to_upload, indent=2).encode('UTF-8'))
    )
    return result_file_path

result_path = upload_results_to_s3(bucket_name, filename, result_data_to_upload)
result_obj = {
    "result_path": result_path,
    "bucket_name": bucket_name
}
return result_obj
Then the next Lambda will have the first code snippet in it, in order to get the input from the file.
The Step Function Nodes look like this, where the Result will be result_obj in the python code above:
"YOUR STATE":
{
"Comment": "Call Lambda that puts results in file",
"Type": "Task",
"Resource": "arn:aws:lambda:YOUR LAMBDA ARN",
"InputPath": "$.next_function_input",
"ResultPath": "$.next_function_input",
"Next": "YOUR-NEXT-STATE"
}
Something you can do is add "emptyOutputPath": "" to your JSON:
"emptyOutputPath": "",
"part1": {
    "portion1": { "procedure": "Delete_X" },
    "portion2": { "procedure": "Load_X" }
},
"part2": {
    "portion1": { "procedure": "Delete_Y" },
    "portion2": { "procedure": "Load_Y" }
}
That will allow you to use "OutputPath": "$.emptyOutputPath", which is empty and will clear the ResultPath.
Hope that helps
Just following up on this issue to close the loop.
I basically gave up on using parallel Lambdas in favour of using SQS message queues instead.
I've got an AWS Simple Systems Manager document set up that I'm trying to trigger from AWS Lambda using Python 2.7.
My call looks like this:
ssmCommand = ssm.send_command(
    Targets = [
        {
            'Key': 'tag:Backup',
            'Values': ['db']
        }
    ],
    DocumentName = 'set-backup-mode-begin',
    TimeoutSeconds = 240,
    Comment = 'Set Backup Mode',
    Parameters = {}
)
But, nothing happens. I think the problem is here:
Targets = [
    {
        'Key': 'tag:Backup',
        'Values': ['db']
    }
]
Currently, there is exactly one instance with tag "Backup" and value of "db". (If I can get this to work, there will be many more, but for now, just one.)
I think (maybe?) I'm doing something wrong and Targets is getting defined as null, so nothing happens? It appears to be syntactically correct. (It compiles and runs, but nothing happens.)
I'm brand new to Python and AWS Lambda. Can someone offer a suggestion?
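One way to see whether the Targets filter actually matched any instances is to look up the command's invocations right after sending it; a minimal sketch (it assumes the Lambda role is allowed to call ssm:ListCommandInvocations):

import boto3

ssm = boto3.client('ssm')

ssmCommand = ssm.send_command(
    Targets=[{'Key': 'tag:Backup', 'Values': ['db']}],
    DocumentName='set-backup-mode-begin',
    TimeoutSeconds=240,
    Comment='Set Backup Mode',
    Parameters={}
)
command_id = ssmCommand['Command']['CommandId']
print('CommandId:', command_id)

# If this list stays empty, the tag filter matched no managed instances.
invocations = ssm.list_command_invocations(CommandId=command_id)['CommandInvocations']
print('Matched instances:', [inv['InstanceId'] for inv in invocations])

An empty invocation list usually means the instance is not registered with Systems Manager (for example, no SSM agent or instance profile), so the tag filter has nothing to match.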
I submitted a skill to Amazon for Alexa, and it failed certification due to intellectual property rights. Amazon suggested that I say the service is "for" the IP rights holder, so I modified the name and am now getting this error for everything I try to do.
{
"errorMessage": "Exception: TypeError: Cannot read property 'application' of undefined"
}
So far, I updated the Skill Name, Invocation Name, and Welcome Message. Is there something else I need to update or run on the dev portal to get this to work again?
Update: When I try to start the skill from the Alexa Development portal, I see this in the logs for
console.log("event.session.application.applicationId=" + event.session.application.applicationId);
{
"version": "1.0",
"session": {
"new": true,
"sessionId": "SessionId.8b65b2f5-0193-4307-9bef-88c116d9344b",
"application": {
"applicationId": "amzn1.echo-sdk-ams.app.5987b947-c8e9-4fc4-a0b8-2ba12c57ea59"
},
"attributes": null,
"user": {
"userId": "amzn1.ask.account.ABCDEFG" // masked my account value
}
},
"request": {
"type": "IntentRequest",
"requestId": "EdwRequestId.4d19f589-cdca-4303-99dc-0dc5cec781d2",
"timestamp": "2016-04-18T16:21:04Z",
"intent": {
"name": "DontKnowIntent"
}
}
}
The application ID matches the one supplied in the Alexa Development portal, so I don't think that's causing any issues. The property 'application' is only ever accessed after 'session', which clearly is defined. I don't know if the null attributes field is causing an error. Maybe someone can look at a successful request?
Finally, here is my code: https://github.com/Shwheelz/alexa-skills-kit-js/blob/master/my_skills/pokemonTrivia/src/index.js
I have changed the name twice before, on a Node app and a Java 8 app. All I had to do was change the name (I also changed the invocation name) under the skill information. It worked the first time out. My skill name did not update in the Alexa app once, but did the other time. Since you are not certified yet, you may want to recreate the skill from scratch. This should only take about 5 or 10 minutes. Just do not forget to change or add the new application ID to your Lambda.