I have an AWS Lambda function that performs operations against Kinesis Firehose.
The function uses a backoff mechanism (which at this point I think is wasting my computation time).
In any case, at some point in my code I would like to fail the execution.
What command should I use to make the execution stop?
P.s.
I found out that there are commands such as:
context.done()
context.succeed()
context.fail()
I've got to tell you, I could not find any documentation about these commands in AWS documentation.
Those methods are available only for backward compatibility, since they were first introduced with Node.js v0.10.42. If you use Node.js version 4.* or 6.*, use the callback() function instead.
Check: Using the Callback Parameter in Lambda for more information on how to take advantage of this function.
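For a Python handler there is no callback parameter; raising an exception is what ends the invocation and marks it as failed. A minimal sketch, assuming a hypothetical `FirehoseDeliveryError` type and a hypothetical `records` field in the event (neither comes from the question):

```python
class FirehoseDeliveryError(Exception):
    """Hypothetical error type used to abort the invocation."""


def handler(event, context):
    # "records" is an assumed field name, used only for illustration.
    records = event.get("records", [])
    if not records:
        # Raising stops the handler here; Lambda reports the invocation as failed.
        raise FirehoseDeliveryError("no records to deliver")
    return {"delivered": len(records)}
```

Whether a failed invocation is retried depends on how the function was invoked, so failing fast does not by itself switch retries off.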
Here is my solution (probably not perfect, but it works for me):
time.sleep(context.get_remaining_time_in_millis() / 1000)
The code is in Python, but I am sure you can apply the same logic in any other language. The idea is to make my Lambda function "fall asleep" for the remaining time, so that no "retries" happen.
The full example may look like this:
import json
import time
from boto3 import client

...
# some code that processes logs from CloudWatch when my ECS container stops (job is finished)
...
# Send an email notification
sns_client = client('sns')
sns_client.publish(
    TopicArn=sns_topic,
    Subject="ECS task success detected for container",
    Message=json.dumps(detail)
)
# Make sure it was sent only once, therefore 'sleep'
# until Lambda stops 'retries'
time.sleep(context.get_remaining_time_in_millis() / 1000)
So, the email is sent only once. I hope it helps!
Related
I would like to run one Lambda function that returns a list of parameters. Based on the number of parameters, I would like to trigger other Lambda functions that each finish their part of the process independently (e.g. 100 independent sub-Lambda functions).
How can this be done? It would be great if there were some examples on GitHub. Thanks a lot.
There are three options: first, call Invoke using the AWS SDK for your language.
Second, use Step Functions.
Third, write each parameter onto an SQS queue, and configure the second Lambda to be triggered by that queue. This is the approach that I'd use.
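The third option can be sketched as follows. This is only an illustration: `enqueue_parameters` and `chunked` are hypothetical helper names, and `sqs_client` is assumed to be something like `boto3.client("sqs")` passed in by the caller. It relies on the `send_message_batch` call, which accepts at most 10 entries per request.

```python
import json


def chunked(items, size):
    """Yield successive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def enqueue_parameters(sqs_client, queue_url, parameters):
    """Write one SQS message per parameter, 10 per batch call."""
    sent = 0
    for batch_no, batch in enumerate(chunked(parameters, 10)):
        entries = [
            # Ids only need to be unique within a single batch request.
            {"Id": f"{batch_no}-{i}", "MessageBody": json.dumps(param)}
            for i, param in enumerate(batch)
        ]
        response = sqs_client.send_message_batch(QueueUrl=queue_url, Entries=entries)
        failed = response.get("Failed", [])
        if failed:
            raise RuntimeError(f"{len(failed)} messages were not enqueued")
        sent += len(entries)
    return sent
```

The first Lambda would call `enqueue_parameters` with its parameter list, and the second Lambda, triggered by the queue, would receive each parameter in the message body.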
I have nearly 1000 items in my DB, and I have to run the same operation on each item. The issue is that the operation calls a third-party service that is rate limited to one operation per second. Until now, I was able to do the entire thing inside a single Lambda function, but it is now getting close to the 15 minute (900 second) timeout limit.
I was wondering what the best way for splitting this would be. Can I dump each item (or batches of items) into SQS and have a lambda function process them sequentially? But from what I understood, this isn't the recommended way to do this as I can't delay invocations sufficiently long. Or I would have to call lambda within a lambda, which also sounds weird.
Is AWS Step Functions the way to go here? I haven't used that service yet, so I was wondering if there are other options too. I am also using the serverless framework for doing this if it is of any significance.
Both methods you mentioned are options that would work. Within lambda you could add a delay (sleep) after one item has been processed and then trigger another lambda invocation following the delay. You'll be paying for that dead time, of course, if you use this approach, so step functions may be a more elegant solution. One lambda can certainly invoke another--even invoking itself. If you invoke the next lambda asynchronously, then the initial function will finish while the newly-invoked function starts to run. This article on Asynchronous invocation will be useful for that approach. Essentially, each lambda invocation would be responsible for processing one item, delaying sufficiently to accommodate the service limit, and then invoking the next item.
If anything goes wrong you'd want to build appropriate exception handling so a problem with one item either halts the rest or allows the rest of the chain to continue, depending on what is appropriate for your use case.
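A rough sketch of such a self-invoking chain, under a few assumptions: the event carries the item list and a running `index` (both field names are hypothetical), `process_item` stands in for the third-party call, and the client is injected so the logic can be exercised without AWS. `invoke` with `InvocationType="Event"` is the asynchronous form.

```python
import json
import time


def process_item(item):
    """Hypothetical stand-in for the rate-limited third-party call."""
    return item


def handler(event, context, lambda_client=None, delay_seconds=1.0):
    items = event["items"]
    index = event.get("index", 0)

    if index < len(items):
        process_item(items[index])
        index += 1
    if index >= len(items):
        return {"done": True, "next_index": index}

    # Wait out the service limit; this idle time is billed.
    time.sleep(delay_seconds)

    if lambda_client is None:
        import boto3  # imported lazily so the sketch runs without AWS in tests
        lambda_client = boto3.client("lambda")

    # Asynchronous invoke: this invocation can finish while the next one starts.
    lambda_client.invoke(
        FunctionName=context.function_name,
        InvocationType="Event",
        Payload=json.dumps({"items": items, "index": index}).encode("utf-8"),
    )
    return {"done": False, "next_index": index}
```

Asynchronous payloads are size limited, so for a large table it is probably better to pass item IDs (or just the index) than the full items.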
Step Functions would also work well to handle this use case. With options like Wait and using a loop you could achieve the same result. For example, your step function flow could invoke one lambda that processes an item and returns the next item, then it could next run a wait step, then process the next item and so on until you reach the end. You could use a Map that runs a lambda task and a wait task:
The Map state ("Type": "Map") can be used to run a set of steps for
each element of an input array. While the Parallel state executes
multiple branches of steps using the same input, a Map state will
execute the same steps for multiple entries of an array in the state
input.
This article on Iterating a Loop Using Lambda is also useful.
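As an illustration of the Map approach, the following builds a state machine definition as a Python dict and prints it as JSON. It is a sketch, not a tested deployment: the state names, the `$.items` input path, and the ARN are placeholders. `MaxConcurrency` of 1 makes the Map process one item at a time, and the Wait state inside each iteration spaces out the calls.

```python
import json


def build_definition(process_lambda_arn, wait_seconds=1):
    """Return an Amazon States Language definition: Map over items, one at a time."""
    return {
        "StartAt": "ProcessAll",
        "States": {
            "ProcessAll": {
                "Type": "Map",
                "ItemsPath": "$.items",   # assumed location of the array in the input
                "MaxConcurrency": 1,      # run iterations sequentially
                "Iterator": {
                    "StartAt": "ProcessItem",
                    "States": {
                        "ProcessItem": {
                            "Type": "Task",
                            "Resource": process_lambda_arn,
                            "Next": "Pause",
                        },
                        "Pause": {"Type": "Wait", "Seconds": wait_seconds, "End": True},
                    },
                },
                "End": True,
            }
        },
    }


if __name__ == "__main__":
    # Placeholder ARN, for illustration only.
    arn = "arn:aws:lambda:REGION:ACCOUNT_ID:function:process-item"
    print(json.dumps(build_definition(arn), indent=2))
```

Newer versions of the States Language name the `Iterator` field `ItemProcessor`; the older name is used here on the assumption that it is still accepted.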
If you want the messages to be processed serially and are happy to dump the messages to SQS, set both the concurrency of the Lambda and the batchSize property of the SQS event that triggers the function to 1.
Make it a FIFO queue, if that is important, so that messages don't potentially get processed more than once.
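Expressed as SDK calls, those two settings might look like the sketch below. `configure_serial_processing` is a hypothetical helper, and `lambda_client` is assumed to be `boto3.client("lambda")`; the same settings can equally be made in the Serverless Framework configuration or the console.

```python
def configure_serial_processing(lambda_client, function_name, queue_arn):
    """Limit the function to one concurrent execution and one message per invocation."""
    # Reserved concurrency of 1: at most one instance of the function runs at a time.
    lambda_client.put_function_concurrency(
        FunctionName=function_name,
        ReservedConcurrentExecutions=1,
    )
    # Batch size of 1: each invocation receives a single SQS message.
    return lambda_client.create_event_source_mapping(
        EventSourceArn=queue_arn,
        FunctionName=function_name,
        BatchSize=1,
    )
```

This only serializes processing; the one-second spacing between calls still has to come from the function itself, for example by sleeping after each item.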
When my AWS Step Functions state machine fails (ExecutionsFailed), I'd like to trigger a Lambda function in response to it.
It seems that you have to create a rule in CloudWatch, but I couldn't find a description of how to do that (in particular, what the Event Pattern is supposed to look like).
P.S. In my case it fails because it exceeds the 25,000-event execution history limit. That is not easy to handle within the state machine without adding loop counters etc., so I'd prefer to let it fail and then handle the failure via Lambda.
My current workaround is to create a cron rule for a scheduled event in CloudWatch, then check the state machine and, if it has failed, handle it.
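For reference, a sketch of what such a rule could look like. Step Functions publishes "Step Functions Execution Status Change" events with source `aws.states`, and the pattern below filters them to failed executions of one state machine. The helper names, rule name, and target Id are hypothetical, and `events_client` is assumed to be `boto3.client("events")`.

```python
import json


def failed_execution_pattern(state_machine_arn):
    """Event pattern matching FAILED executions of one state machine."""
    return {
        "source": ["aws.states"],
        "detail-type": ["Step Functions Execution Status Change"],
        "detail": {
            "status": ["FAILED"],
            "stateMachineArn": [state_machine_arn],
        },
    }


def create_failure_rule(events_client, rule_name, state_machine_arn, target_lambda_arn):
    """Create the rule and point it at the handling Lambda."""
    events_client.put_rule(
        Name=rule_name,
        EventPattern=json.dumps(failed_execution_pattern(state_machine_arn)),
        State="ENABLED",
    )
    events_client.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "failure-handler", "Arn": target_lambda_arn}],
    )
```

The target Lambda also needs a resource-based permission that allows the events service to invoke it. Adding "TIMED_OUT" or "ABORTED" to the status list would catch those outcomes as well.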
We are experiencing double invocations of Lambdas triggered by S3 ObjectCreated events. The second invocation happens exactly 10 minutes after the first invocation started, not 10 minutes after the first try completed. The original invocation takes anywhere between 0.1 and 5 seconds. None of the invocations result in errors; they all complete successfully.
We are aware of the fact that SQS for example does not guarantee exactly-once but at-least-once delivery of messages and we would accept some of the lambdas getting invoked a second time due to results of the distributed system underneath. A delay of 10 minutes however sounds very weird.
Of about 10k messages 100-200 result in double invocations.
The AWS Support basically says "the 10 minute wait time is by design but we cannot tell you why", which is not at all helpful.
Has anyone else experienced this behaviour before?
How did you solve the issue or did you simply ignore it (which we could do)?
One proposed solution is not to use direct S3-lambda-triggers, but let S3 put its event on SNS and subscribe a Lambda to that. Any experience with that approach?
example log: two invocations, 10 minutes apart, same RequestId
START RequestId: f9b76436-1489-11e7-8586-33e40817cb02 Version: 13
2017-03-29 14:14:09 INFO ImageProcessingLambda:104 - handle 1 records
and
START RequestId: f9b76436-1489-11e7-8586-33e40817cb02 Version: 13
2017-03-29 14:24:09 INFO ImageProcessingLambda:104 - handle 1 records
After a couple of rounds with the AWS support and others and a few isolated trial runs it seems like this is simply "by design". It is not clear why, but it simply happens. The problem is neither S3 nor SQS / SNS but simply the lambda invocation and how the lambda service dispatches the invocations to lambda instances.
The double invocations happen for somewhere between 1% and 3% of all invocations, 10 minutes after the first invocation. Surprisingly, there are even triple (and probably quadruple) invocations, whose rate follows powers of the base probability (for example, about 0.09% for triple invocations at a 3% base rate). The triple invocations happened 20 minutes after the first one.
If you encounter this, you simply have to work around it using whatever you have access to. We, for example, now store the already processed entities in Cassandra with a TTL of 1 hour, and the Lambda only responds to a message if the entity has not been processed yet. The double and triple invocations all happen within this one hour timeframe.
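The logic of that workaround can be sketched with an in-memory stand-in for the shared store. This only illustrates the check: `SeenStore` is a hypothetical class, and in a real deployment the store has to be external (as Cassandra is above), because separate Lambda instances do not share memory.

```python
import time


class SeenStore:
    """Remembers keys for `ttl_seconds`; a stand-in for a shared store with TTL."""

    def __init__(self, ttl_seconds=3600, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._expiry = {}  # key -> time at which the entry expires

    def first_time(self, key):
        """Return True only if `key` has not been seen within the TTL."""
        now = self._clock()
        # Drop expired entries so the store does not grow without bound.
        self._expiry = {k: t for k, t in self._expiry.items() if t > now}
        if key in self._expiry:
            return False
        self._expiry[key] = now + self._ttl
        return True


store = SeenStore(ttl_seconds=3600)


def handler(event, context):
    # "entity_id" is an assumed field name identifying the processed entity.
    if not store.first_time(event["entity_id"]):
        return {"skipped": True}
    return {"skipped": False}
```

With an external store, the check and the insert should be a single atomic operation (for example a conditional write), otherwise two near-simultaneous invocations could both pass the check.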
Not wanting to spin up a data store like Dynamo just to handle this, I did two things to solve our use case:
Write a lock file per function into S3 (which we were already using for this one) and check for its existence on function entry, aborting if present; for this function we only ever want one of it running at a time. The lock file is removed before we call callback on error or success.
Write a request time in the initial event payload and check the request time on function entry; if the request time is too old then abort. We don't want Lambda retries on error unless they're done quickly, so this handles the case where a duplicate or retry is sent while another invocation of the same function is not already running (which would be stopped by the lock file) and also avoids the minimal overhead of the S3 requests for the lock file handling in this case.
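The request-time check from the second point might look like this. The field name `requested_at` (epoch seconds written by whoever creates the event) and the 60 second cutoff are assumptions chosen for illustration.

```python
import time


def is_stale(event, max_age_seconds=60, now=None):
    """True if the event was created more than `max_age_seconds` ago."""
    now = time.time() if now is None else now
    return now - event["requested_at"] > max_age_seconds


def handler(event, context):
    if is_stale(event):
        # A late duplicate or retry: abort before doing any work.
        return {"skipped": True}
    # ... the real work would go here ...
    return {"skipped": False}
```

Because the check needs no network call, it runs before the S3 lock file is consulted, which is what saves the extra S3 requests for late duplicates.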
I'm making an application that will continually send CFHTTP requests to a server to search for items, as well as sending further CFHTTP requests to perform actions on any returned results.
The issue I'm having is that the server has a maximum threshold of 3 requests per second. Even when I add a sleep call every 4 milliseconds, it doesn't work properly: although it delays, the CFHTTP requests can queue up if they take a couple of seconds to return, so several are then sent in the same second and the threshold is exceeded.
Is there a way I can ensure that there are never more than 3 active CFHTTP requests?
I think you are going to need to implement some sort of logging widget as part of your process. The log will keep track of request frequency. If the threshold has already been reached, then you would just skip over that iteration of your CFHTTP call. I don't mean a file log or a database log, but something implemented in the application or even request scope, depending on your implementation. There is no way to throttle CFHTTP itself. It is basically a very simplistic wrapper around a Java HTTP library which then goes straight to the underlying operating system.
If you're limiting concurrent requests, then the first part of this answer applies. If you're looking to limit the number of requests per second, then the bit at the end applies. The question kind of asks both things.
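The idea of an in-memory request log can be sketched as a sliding window of timestamps. It is written in Python rather than CFML purely for illustration, and `RequestLog` is a hypothetical name; in ColdFusion the equivalent object would live in the application scope.

```python
import threading
import time
from collections import deque


class RequestLog:
    """Tracks recent request times and refuses requests beyond the threshold."""

    def __init__(self, max_requests=3, window_seconds=1.0, clock=time.monotonic):
        self._max = max_requests
        self._window = window_seconds
        self._clock = clock
        self._timestamps = deque()
        self._lock = threading.Lock()

    def try_record(self):
        """Record a request and return True, or return False if the limit is reached."""
        with self._lock:
            now = self._clock()
            # Forget requests that have left the window.
            while self._timestamps and now - self._timestamps[0] >= self._window:
                self._timestamps.popleft()
            if len(self._timestamps) >= self._max:
                return False
            self._timestamps.append(now)
            return True
```

A caller would check `try_record()` before each HTTP call and skip (or retry later) when it returns False. Note that this limits how many requests start per window, not how many are in flight at once.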
If I understand correctly, you've got a number of threads (either as requests CF is processing or threads CF has created itself) which all need to make calls to the same rate-limited domain. What you need is a central way of co-ordinating access, combined with a nice way of controlling program execution.
I don't know of any native limits that CF might support (I'd be happy to be proven wrong), so you're likely to have to implement your own. The cheap'n'nasty way to do this is to increment and decrement an allowed_connections variable in a long-lived scope such as application. The downsides are that you have to implement checking all over the place and that, if there are no spare connections, you'll have to wait somehow.
Really what you have is a resource pool (of allowed HTTP connections) and I'm guessing that you want your code to wait until a connection is free. CF does this kind of thing already for database connections.
In your case, there isn't really a need to keep anything in a pool (as HTTP connections aren't long-lived), other than a permit to use the resource. Java provides a class which ought to provide what you're after, the Semaphore.
I've not tried it but in theory, something like the snippet below ought to work:
//Application.cfc:onApplicationStart()
application.http_pool = CreateObject("java","java.util.concurrent.Semaphore").init(3);
//Meanwhile, elsewhere in your code
application.http_pool.acquire();
try {
    //Make my HTTP call
} finally {
    //Always return the permit, even if the HTTP call throws
    application.http_pool.release();
}
You could even wrap the HTTP object to provide this functionality without having to use the acquire/release each time, which would make it more reliable.
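The wrapping idea, sketched in Python for illustration (the CFML version would wrap the HTTP object in the same way). `limit_concurrency` is a hypothetical helper; the wrapped function stands for whatever makes the HTTP call.

```python
import functools
import threading

# One shared pool of 3 permits, mirroring the Semaphore(3) above.
http_permits = threading.BoundedSemaphore(3)


def limit_concurrency(func, semaphore=http_permits):
    """Return a version of `func` that holds a permit for the duration of each call."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # The `with` block blocks until a permit is free and always releases it,
        # even if `func` raises.
        with semaphore:
            return func(*args, **kwargs)

    return wrapper
```

Callers then use the wrapped function and never touch acquire/release directly, so a forgotten release cannot leak a permit.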
EDIT
If you're looking to limit rates, look at Guava's RateLimiter, which has the same general interface as the Semaphore above but implements rate limiting for you. You'd need to add Guava to ColdFusion's classpath, use JavaLoader, or use CF10, which has class-loading facilities built in.