How to receive an endless WebSocket data source using AWS Lambda? - amazon-web-services

I want to crawl data from a WebSocket data source. WebSocket feeds are usually endless streams, while an AWS Lambda function has a Timeout limit whose maximum allowed value is 900 seconds.
If my Lambda function acts as a WebSocket client and connects to a WebSocket URL, e.g., wss://ws-feed-public.sandbox.pro.coinbase.com, it starts receiving data and is terminated 900 seconds later.
How can I keep my Lambda function running forever? Thanks!
Right now I'm running my crawler inside a Linux VM; is it possible to migrate it to AWS Lambda?

AWS Lambda functions run for a maximum of 900 seconds (15 minutes).
There is no way to extend this.
You should continue using an Amazon EC2 instance or a container (ECS, Fargate).
Fun fact: When initially released, the limit was 3 minutes. It was later extended to 5 minutes, then to 15 minutes.
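On an EC2 instance or a container, a long-running consumer might look like the sketch below. It assumes the third-party `websockets` package is installed (`pip install websockets`); the feed URL is the Coinbase sandbox from the question, and the reconnect/backoff logic is an illustrative assumption, not part of any AWS API.

```python
import asyncio
import json

FEED_URL = "wss://ws-feed-public.sandbox.pro.coinbase.com"

def backoff_delays(base=1, cap=60):
    """Yield exponentially growing reconnect delays, capped at `cap` seconds."""
    delay = base
    while True:
        yield delay
        delay = min(delay * 2, cap)

async def consume_forever(url=FEED_URL):
    import websockets  # imported lazily; this sketch assumes the package exists
    delays = backoff_delays()
    while True:  # reconnect loop: the process, not Lambda, owns the lifetime
        try:
            async with websockets.connect(url) as ws:
                delays = backoff_delays()  # reset backoff after a good connect
                async for raw in ws:
                    handle(json.loads(raw))
        except Exception:
            await asyncio.sleep(next(delays))

def handle(message):
    # Placeholder for real processing (write to S3, Kinesis, a DB, ...).
    print(message.get("type"))

# asyncio.run(consume_forever())  # run under systemd or a container entrypoint
```

The key design point is that the reconnect loop lives in a process you own, so there is no platform-imposed 900-second cutoff.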

Related

AWS Lambda function: Timeout after 900 secs

I am invoking an AWS Lambda function locally using the aws-sam CLI and I have set the Timeout property to 900 seconds, but it still shows a function timeout error. However, when I invoked this function from the Lambda console, these 900 seconds were enough for the inference.
Please help me figure out a solution for this issue. Also, what is the maximum Timeout value I can set?
AWS Lambda functions (as of July 2021) can only run for a maximum of 15 minutes (900 seconds).
Some people do 'interesting' things like:
Call another Lambda function to continue the work, or
Use AWS Step Functions to orchestrate multiple AWS Lambda functions
However, it would appear that your use-case is Machine Learning, which does not tolerate having operations stopped in the middle of processing. Therefore, AWS Lambda is not suitable for your use-case.
Instead, I would recommend using Amazon EC2 spot instances, which will likely be lower-cost for your use-case. While spot instances might occasionally be terminated, your use-case can probably handle the need to re-run some processing if this happens.
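For completeness, the first pattern mentioned above (calling another Lambda function to continue the work) can be sketched as below. This is a hedged illustration, not the answerer's code: the payload shape and the placeholder work functions are assumptions, and the real `boto3` invoke call needs `lambda:InvokeFunction` IAM permission.

```python
import json

SAFETY_MARGIN_MS = 60_000  # hand off when less than a minute remains

def should_hand_off(remaining_ms, margin_ms=SAFETY_MARGIN_MS):
    """Pure decision helper: continue in a fresh invocation if time is short."""
    return remaining_ms < margin_ms

def handler(event, context):
    state = event.get("state", {"processed": 0})
    while work_remains(state):
        state = do_one_unit(state)
        if should_hand_off(context.get_remaining_time_in_millis()):
            import boto3  # lazy import: only needed when actually handing off
            boto3.client("lambda").invoke(
                FunctionName=context.function_name,
                InvocationType="Event",          # async: don't wait for the child
                Payload=json.dumps({"state": state}),
            )
            return {"handed_off": True, "state": state}
    return {"handed_off": False, "state": state}

def work_remains(state):   # placeholder for the real job
    return state["processed"] < 100

def do_one_unit(state):    # placeholder: one resumable unit of work
    return {"processed": state["processed"] + 1}
```

The pattern only works if the job can be checkpointed into a serialisable `state`, which is exactly why it is a poor fit for a single long ML inference.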

AWS Lambda hangs between invocations

I am using the following 3 services: Amazon S3, Amazon SQS and AWS Lambda.
The same configuration is created for processing both CSV and EXCEL files (the Lambda function that processes EXCEL files just converts them to CSV and re-uploads them to S3 so that the other Lambda function can process them).
AWS Lambda configurations:
Memory: 1024 MB
Timeout: 6 minutes
Reserved concurrency: 1 (for the current testing I don't need multiple parallel functions)
Retry attempts: 0
DLQ: none configured at the moment (will be added later)
For Amazon S3:
On a 's3:ObjectCreated:*' event, the S3 sends a message to a configured SQS queue.
The SQS queue has a Lambda trigger attached to it.
I have an external process that is uploading files to my S3 bucket.
This is the start of the entire workflow (S3 --> SQS --> Lambda).
This process has uploaded around 40 files in a very short period of time (some CSV files and some EXCEL files as well).
I was watching the SQS queues and CloudWatch to see how the processing was going. I could see about 15 in-flight messages on the SQS queue handling the CSV files and about 17 in-flight messages on the queue handling the EXCEL files; the CloudWatch logs were being updated and everything was going well.
After about 15 seconds of processing, everything stopped. Both Lambda functions were just hanging. I was still seeing around 15 and 13 in-flight messages on the two SQS queues, but absolutely nothing was being done by the Lambda functions.
It looked like something went wrong.
After about 5 minutes of doing nothing both functions suddenly started to process the files. Both functions processed a couple of files for about 15 seconds and then silence once again.
After another 5 minutes of doing nothing both functions started again to process the files.
This happened a couple of times with 5 minutes breaks.
The Lambda functions are not doing any external calls or something that could make them hang. The waiting was between AWS Lambda invocations so it wasn't within my code.
For example:
2021-01-22T17:23:56.426+02:00 REPORT RequestId: d0a01831-ff93-5a71-83d6-40b50fd0affa Duration: 453.19 ms Billed Duration: 454 ms Memory Size: 1024 MB Max Memory Used: 319 MB
2021-01-22T17:29:41.860+02:00 START RequestId: 752f0eef-6738-5c24-ad52-566b96983c92 Version: $LATEST
What is making AWS Lambda hang?
PS: If it would be helpful then I could attach the CloudWatch logs.
I think the culprit is "Reserved concurrency: 1". The SQS --> Lambda part involves an invisible middle man called event source mapping which polls the queue and invokes the Lambda for you.
When messages are available, [The event source mapping] Lambda reads up to 5 batches and sends them to your function.
Since you have configured the reserved concurrency to be 1, when the event source mapping tries to do 5 invocations at once, 4 of them get a "Rate exceeded" error. Those messages are then put back into the queue after the visibility timeout, the queue triggers the Lambda again, and this trial-and-error process continues. Each time, the Lambda is only able to process 1 message, and the "hanging" behaviour you are seeing is actually the remaining 4 messages waiting out the visibility timeout.
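Two possible fixes can be sketched with boto3. Both the function name and the event source mapping UUID below are placeholders, running this requires AWS credentials and the matching IAM permissions, and `ScalingConfig` is only available in newer boto3/Lambda releases; treat this as a sketch, not a drop-in fix.

```python
import boto3

lam = boto3.client("lambda")

# Option 1: give the function enough reserved concurrency for the event
# source mapping's parallel batch reads (it reads up to 5 batches at once).
lam.put_function_concurrency(
    FunctionName="csv-processor",          # placeholder function name
    ReservedConcurrentExecutions=5,
)

# Option 2: cap concurrency on the SQS event source mapping itself, so the
# poller never over-invokes the function (the minimum allowed value is 2).
lam.update_event_source_mapping(
    UUID="00000000-0000-0000-0000-000000000000",  # placeholder mapping UUID
    ScalingConfig={"MaximumConcurrency": 2},
)
```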
Your use case may also be a good fit for a workflow using AWS Step Functions. You can break each step into a separate Lambda function and track each step in Step Functions. It is a really nice tool: you can step through and watch each step execute. For details, see:
Create AWS serverless workflows by using the AWS SDK for Java

Lambda timeout after 1 second

According to https://docs.aws.amazon.com/lambda/latest/dg/limits.html, Lambda functions are supposed to time out after 5 minutes, but mine is getting a "Task timed out" error after 1 second. It reads a small text file from an S3 bucket, parses it and performs an action.
How can I increase the timeout?
Edit: after moving it to a different region I now get the same problem after a much more generous 3 seconds. I now have another problem, which is that there are no CloudFront trigger options in the eu-west-1 and eu-west-2 regions, which I need to run it.
You can increase the Lambda function timeout in two ways:
Use the AWS Console
Use the CLI
In the AWS Console, open the Lambda function and modify the timeout setting in its configuration.
With the CLI, pass the --timeout flag to update-function-configuration:
https://docs.aws.amazon.com/cli/latest/reference/lambda/update-function-configuration.html
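For example, the following command would set the timeout of a hypothetical function named my-function to the 900-second maximum (it requires configured AWS credentials):

```shell
aws lambda update-function-configuration \
    --function-name my-function \
    --timeout 900
```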

Warming Lambda Function with Cloudwatch schedule rules

I'm trying to warm a Lambda function (inside a VPC, accessing a private RDS) with CloudWatch. The rate is 5 minutes (just for the experiment); I intend to make it 35 minutes later on.
After I saw the CloudWatch logs indicating that the function had been called (and completed; I have set up a condition that, if no input is given, returns an API Gateway response immediately), I called the function from the API Gateway URL.
However, I'm still getting what looks like a cold start: the response takes 2 seconds. If I do it again, I get the response in 200 ms.
So my questions are:
What did I do wrong? Can I actually warm a Lambda function with a CloudWatch schedule?
Does dropping the request immediately affect this behaviour? The DB connection is not established if the request comes from CloudWatch.
Thanks!
****EDIT****
I tried connecting to the DB before returning early when the function is called by CloudWatch, but it doesn't change anything. The first request through the API is still around 2 s and the next ones are around 200 ms.
****EDIT 2****
I tried removing the schedule entirely and the cold start takes 9 s. So I guess the 2 s figure already excludes most of the cold start. Is it possible that the problem lies in another service, such as API Gateway?
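For reference, a warming handler along the lines described in the question might look like the sketch below. It assumes the schedule rule is the only caller whose event carries `"source": "aws.events"`, and `connect_to_rds` is a stand-in so the sketch is self-contained; the real API Gateway event shape depends on your integration.

```python
import json

db_connection = None  # reused across warm invocations of the same container

def get_connection():
    global db_connection
    if db_connection is None:
        db_connection = connect_to_rds()  # placeholder for your DB client
    return db_connection

def handler(event, context):
    if event.get("source") == "aws.events":
        # Warming ping: touch the connection so it is ready, then return early.
        get_connection()
        return {"warmed": True}
    get_connection()
    return {"statusCode": 200, "body": json.dumps({"ok": True})}

def connect_to_rds():
    return object()  # stand-in connection object for this sketch
```

Note that a schedule rule keeps at most one container warm: concurrent API requests beyond the first can still hit cold starts, and API Gateway adds latency of its own on top of the function duration.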

AWS Lambda not executing concurrently

So I defined a fairly simple AWS Lambda. I created an HTTP GET URL for it using AWS API Gateway. I deployed it and tested the URL in the browser and it worked. I then created a desktop app to call the URL, it only takes one query string parameter. I ran the code serially to call the URL 100 times with a different query string input each time, and saw that the lambda executed an average of 500 milliseconds each time.
I then changed my desktop app to issue the requests in parallel. I expected the overall time to take maybe 1 second or so to complete, given that the longest execution time was like 950 milliseconds on average. However, when I did this, it took more than 30 seconds to complete all the requests.
I've done other tests to know the desktop app really is issuing all the URL requests in parallel, so that's not the issue. I just don't understand why it didn't spin up 100 lambdas to service each URL request so that they executed concurrently. It appears that the requests were buffered.
The only difference between each URL is the query string parameter. I am, at this point, considering creating 100 different lambdas, each built with the different value previously passed in the query string, but each with a different URL so I can achieve actual concurrent execution.
Am I missing something?
AWS Lambda limits the number of concurrent executions per account; when I originally answered, I believed the limit was 75, i.e., at most 75 Lambda invocations could run at a time.
EDIT: By default, AWS Lambda limits the total concurrent executions across all functions within a given region to 1000.
The 75 figure above is outdated; I didn't check the latest concurrent limit.
Here is the Documentation.
If you need more concurrency, you need to raise a limit-increase request with AWS Support.
Concurrent execution count will differ depending on whether or not your Lambda function is processing events from a stream-based event source.
Event sources that aren't stream-based – If you create a Lambda function to process events from event sources that aren't stream-based (for example, Amazon S3 or API Gateway), each published event is a unit of work. Therefore, the number of events (or requests) these event sources publish influences the concurrency.
You can use the following formula to estimate your concurrent Lambda function invocations:
events (or requests) per second * function duration
For example, consider a Lambda function that processes Amazon S3 events. Suppose that the Lambda function takes on average three seconds and Amazon S3 publishes 10 events per second. Then, you will have 30 concurrent executions of your Lambda function.
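The S3 example above, written out as arithmetic:

```python
def estimated_concurrency(events_per_second, avg_duration_seconds):
    """Estimate concurrent Lambda executions: request rate * duration."""
    return events_per_second * avg_duration_seconds

# 10 S3 events/second, 3 seconds average duration:
print(estimated_concurrency(10, 3))  # → 30
```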
Request Rate
Request rate refers to the rate at which your Lambda function is invoked. For all services except the stream-based services, the request rate is the rate at which the event sources generate events. For stream-based services, AWS Lambda calculates the request rate as follows:
request rate = number of concurrent executions / function duration
For example, if there are five active shards on a stream (that is, you have five Lambda functions running in parallel) and your Lambda function takes about two seconds, the request rate is 2.5 requests/second.
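The stream-based example above, written out the same way:

```python
def request_rate(concurrent_executions, function_duration_seconds):
    """Request rate for stream-based sources: concurrency / duration."""
    return concurrent_executions / function_duration_seconds

# 5 active shards (5 parallel executions), ~2 seconds per invocation:
print(request_rate(5, 2))  # → 2.5
```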
Source :- http://docs.aws.amazon.com/lambda/latest/dg/concurrent-executions.html