Amazon Lambda list running functions - amazon-web-services

How can I check the running Lambda functions running using the aws cli?
It seems that there is no a command to check it:
aws lambda XXXX
I have several scripts running, and I'd like to monitor the situation.
It is enough to show how many functions are running.
Thank you

Watching or monitoring the cloud watch logs would be the best way to monitor if a lambda is running or not. These logs are not real time, but may be near real time enough for your needs. You could ask CloudWatch for the last X minutes of log for a particular lambda and monitor the timing of the log statements. As Aniket Chopade stated though, knowing why you're trying to do this could help someone provide a better solution.

You will not see any state such as "running" in CLI ouput.
From users perspective, Lambda functions are always up running (actually right word is invokable) because they respond to triggers.
There is concept of keeping them "warm" , that means keeping one physical instance alive having containers running in it. But again these level of details are hidden from lambda's users.
I am curious why you want to know such state for lambda functions.

Related

Best AWS architecture solution for migrating data to cloud

Say I have 4 or 5 data sources that I access through API calls. The data aggregation and mining is all scripted in a python file. Lets say the output is all structured data. I know there are plenty of considerations, but from a high level, what would some possible solutions look like if I ultimately wanted to run analysis in BI software?
Can I host the python script in Lambda and set a daily trigger to run the python file. And then have the output stored in RDS/Aurora? Or since the applications I'm running API calls to aren't in AWS, would I need the data to be in an AWS instance before running a Lambda function?
Or host the python script in an EC2 instance, use lambda to trigger a daily refresh that just stores the data in EC2-ESB or Redshift?
Just starting to learn AWS cloud architecture so my knowledge is fairly limited. Just seems like there can be multiple solutions to any problem so not sure if the 2 ideas above are viable.
You've mentioned two approaches which are working. Ultimately it very depends on your use case, budget etc.. and you are right, usually in AWS you will have different solutions that can solve the same problem. For example, another possible solution could be to Dockerize your Python script and run it on containers services (ECS/EKS). But considering you just started with AWS I will focus on the approaches you mentioned as it's probably the most 2 common ones.
In short, based on your description, I would not suggest to go with EC2 because it adds complexity to your use case and moreover extra costs. If you can imagine the final setup, you will need to configure and manage the instance itself, its class type, AMI, your script deployment, access to internet, subnets, etc. Also a minor thing to clarify: you would probably set a cron expression on it to trigger your script (not a lambda reaching the EC2 !). As you can see, quite a big setup for poor benefits (except maybe gaining some experience with AWS ;)) and the instance would be idle most of the time which is far from optimum.
If you just have to run a daily Python script and need to store the output somewhere I would suggest to use lambda for the processing, you can simply have a scheduled event (prefered way is now Amazon EventBridge instead) that triggers your lambda function once a day. Then depending on your output and how you need to process it, you can use RDS obviously from lambda using the Python SDK but you can also use S3 as blob storage if you don't need to run specific queries - for example if you can store your output in json format.
Note that one limitation to lambda is that it can only run for 15 minutes straight per execution. The good thing is that by default lambda has internet access so you don't need to care about any gateway setup and can reach your external endpoints.
Also from a cost perspective running one lambda/day combined with S3 should be free or almost free. The pricing in lambda is very cheap. Running 24/7 an EC2 instance or RDS (which is also an instance) will cost you some money.
Lambda with storage in S3 is the way to go. EC2 / EBS costs add up over time and EC2 will limit the parallelism you can achieve.
Look into Step Functions as a way to organize and orchestrate your Lambdas. I have python code that copies 500K+ files to S3 and takes a week to run. If I copy the files in parallel (500-ish at a time) this process takes about 10 hours. The parallelism is limited by the sourcing system as I can overload it by going wider. The main Lambda launches the file copy Lambdas at a controlled rate but also terminates after a few minutes of run time but returns the last file updated to the controlling Step Function. The Step Function restarts the main Lambda where the last one left off.
Since you have multiple sources you can have multiple top level Lambdas running in parallel all from the same Step Function and each launching a controlled number of worker Lambdas. You won't overwhelm S3 but you will want to make sure you don't overload your sources.
The best part of this is that it costs pennies (at the scale I'm using it).
Once the data is in S3 I'm copying it up to Redshift and transforming it. These processes are also part of the Step Function through additional Lambda Functions.

Lambda execution time out after 15 mins what I can do?

I have a script running on Lambda, I've set the timeout to maximum 15 mins but it's still giving me time out error, there is not much infomation in the lofs, how I can solve this issue and spot what is taking soo much time? I tested the script locally and it's fairly quick.
Here's the error:
{
"errorMessage": "2020-09-10T18:26:53.180Z xxxxxxxxxxxxxxx Task timed out after 900.10 seconds"
}
If you're exceeding the 15 minutes period there are a few things you should check to identify:
Is the Lambda connecting to resources in a VPC? If so is it attached via VPC config, and do the target resources allow inbound access from the Lambda.
Is the Lambda connecting to a public IP but using VPC configuration? If so it will need a NAT attached to allow outbound access.
Are there any long running processes as part of your Lambda?
Once you've ruled these out consider increasing the available resources of your Lambda, perhaps its hitting a cap and is therefore performing slow. Increasing the memory will also increase the available CPU for you.
Adding comments in the code will log to CloudWatch logs, these can help you identify where in the code the slowness starts. This is done by simply calling the general output/debug function of your language i.e. print() in Python or console.log() in NodeJS.
If the function is still expected to last longer than 15 minutes after this you will need to break it down into smaller functions performing logical segments of the operation
A suggested orchestrator for this would be to use a step function to handle the workflow for each stage. If you need shared storage between each Lambda you can make use of EFS to be attached to all of your Lambdas so that they do not need to upload/download between the operations.
Your comment about it connecting to a SQL DB is likely the key. I assume that DB is in AWS in your VPC. This requires particular setup. Check out
https://docs.aws.amazon.com/lambda/latest/dg/configuration-vpc.html
https://docs.aws.amazon.com/lambda/latest/dg/services-rds-tutorial.html
Another thing you can do is enable debug level logging and then look at the details in CloudWatch after trying to run it. You didn't mention which language your lambda uses, so how to do this could be different for the language you use. Here's how it would be done in python:
LOGGER = logging.getLogger()
LOGGER.setLevel(logging.getLevelName('DEBUG'))

How to improve lambda performance?

Hi I am trying to understand the lambda architecture in depth. Below is my understanding about lambda.
Whenever we create lambda function, container will spin up. If we select python as run time the python container will spin up. Now there is cold start. For example, If we dint call lambda for long time, container will become inactive. It will call new container and it will take some time to spin up new container. This is cold start. Now I am bit confused here. If I want to avoid this delay what is the right approach? We can trigger lambda every 5 min using cloud watch. Any other good approaches to handle this?
Also there is /tmp folder where we can store static files. So /tmp is not part of container? Whenever new container spins up, /tmp data will be lost or remain? Can someone help me to understand this concepts and tell me to use best approaches to handle this? Any help would be appreciated. Thank you.
You are correct there is a cold start issue but it's been observed that it depends on a lot of factors(runtime, memory, zip size....for e.g. a java lambda will have more cold start compared to python) and basically it was a big problem for lambdas inside a user-defined VPC. wherein there is an overhead of creating an elastic network interface and then invoking the lambda. But the recent rollout has changed this and now you should not see this problem. improved-vpc-networking for lambda.
Also just in the reinvent 2019 aws have announced the Provisioned Concurrency So for lambda Functions using Provisioned Concurrency will execute with consistent start-up latency.
With Provisioned Concurrency, functions can instantaneously serve a
burst of traffic with consistent start-up latency for every invoke up
to the specified scale. Customers only pay for the amount of concurrency that they configure and for the period of time that it is configured.
Regarding the /tmp please note that Each Lambda function receives 512MB of non-persistent disk space in its own /tmp directory. So you cannot rely on it. Lambda limits If you are looking for persistent storage you should be using S3.

Is AWS Lambda the proper way of running a batch job?

I have a batch job that I need to run on AWS. I'm wondering what's the best service to use. The job needs to run once a day, so I think that naturally AWS Lambda with a CloudWatch Rule triggering it would do it. However, I'm starting to think that AWS Lambda is thought to be used as a service to handle requests. This AWS official library to integrate Spring-Boot is very oriented to handle HTTP requests, and when creating a lambda via AWS Console, only test cases that send an input to the lambda can be written.
Then, is this a use case for AWS Lambda? Also, these functions can run up to 15 minutes. What should I use if my job needs to run longer?
The purpose of Lambda, as compared to AWS EC2, is to simplify building smaller, on-demand applications that are responsive to events and new information.
If your batch is running within a limit of 15 minutes then you can go with a lambda function.
But if you want batch processing to be done, you should check AWS batch.
Here is nice article which demonstrates the usage of AWS batch.
If you are already using some batch framework like spring-batch, you can also take a look at ECS scheduled task with Fargate.
With ECS Fargate you can launch and stop container services that you need to run only at certain times.
Here are some related articles on Fargate event and scheduled task and Scheduled Tasks.
If you're confident that your function will only run at maximum of 15mins, AWS Lambda could be the solution. Here are the AWS Lambda limits that could help you decide on that.
Also note that lambda has cold start, it's when it will run slower at first but will eventually pick up the pace. Here are some good reads about it that could help you decide on the lambda direction, but feel free to check on any articles that could better explain at your disposal.
This one shows a brief lists that you would like to consider and the factors affecting it.
This one might have a deeper explanation of the cold start with regards to how it works internally.
What should I use if my job needs to run longer?
Depending on your infrastructure, you could maybe explore Scheduled Tasks

How to handle backpressure using google cloud functions

Using google cloud functions, is there a way to manage execution concurrency the way AWS Lambda is doing? (https://docs.aws.amazon.com/lambda/latest/dg/concurrent-executions.html)
My intent is to design a function that consumes a file of tasks and publish those tasks to a work queue (pub/sub). I want to have a function that consumes tasks from the work queue (pub/sub) and execute the task.
The above could result in a large number of almost concurrent execution. My dowstream consumer service is slow and cannot consume many concurrent requests at a time. In all likelyhood, it would return HTTP 429 response to try to slow down the producer.
Is there a way to limit the concurrency for a given Google Cloud functions the way it is possible to do it using AWS?
This functionality is not available for Google Cloud Functions. Instead, since you are asking to handle the pace at which the system will open concurrent tasks, Task Queues is the solution.
Push queues dispatch requests at a reliable, steady rate. They guarantee reliable task execution. Because you can control the rate at which tasks are sent from the queue, you can control the workers' scaling behavior and hence your costs.
In your case, you can control the rate at which the downstream consumer service is called.
This is now possible with the current gcloud beta! You can set a max that can run at once:
gcloud beta functions deploy FUNCTION_NAME --max-instances 10 FLAGS...
See docs https://cloud.google.com/functions/docs/max-instances
You can set the number of "Function invocations per second" with quotas. It's documented here:
https://cloud.google.com/functions/quotas#rate_limits
The documentation tells you how to increase it, but you can also decrease it to achieve the kind of throttling that you are looking for.
You can control the pace at which cloud functions are triggered by controlling the triggers themselves. For example, if you have set "new file creation in a bucket" as trigger for your cloud function, then by controlling how many new files are created in that bucket you can manage concurrent execution.
Such solutions are not perfect though because sometimes the cloud functions fails and get restart automatically (if you've configure your cloud function that way) without you having any control over it. In effect, the number of active instances of cloud functions will be sometimes more than you plan.
What AWS is offering is a neat feature though.