Is there a Lambda-Warmer package for dotnet?

I want to keep my AWS Lambda instances warm (even beyond the default warm time of 5 minutes). I have seen good packages like this, but I was unable to find any good examples using dotnet.
Could you suggest a Lambda-Warmer package for dotnet that can do the following, please?
keep the Lambda warm even after the default 5 minutes
keep concurrent instances warm (not just a single instance)
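For context, warmer packages generally implement the same pattern regardless of runtime: a scheduled rule invokes the function with a marker payload, the handler short-circuits when it sees the marker, and several overlapping invocations are fired so that more than one container is touched. Below is a minimal sketch of the invoking side with Python and boto3; the function name, payload shape, and concurrency count are assumptions, and the handler (in whatever language) would need to pause briefly on warmer events so the concurrent invocations land on distinct containers:

    import json
    from concurrent.futures import ThreadPoolExecutor

    import boto3

    lambda_client = boto3.client("lambda")
    CONCURRENCY = 5  # how many containers to keep warm (assumption)

    def ping(i):
        # RequestResponse keeps the invocations overlapping, so several
        # containers are warmed instead of the same one five times.
        return lambda_client.invoke(
            FunctionName="my-dotnet-function",  # hypothetical name
            InvocationType="RequestResponse",
            Payload=json.dumps({"warmer": True, "instance": i}),
        )

    with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
        list(pool.map(ping, range(CONCURRENCY)))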

Related

Setup serverless local environment for AWS using serverless framework

Hi, I am using the Serverless Framework to develop my application, and I need to set it up in a local environment. I am using API Gateway, Lambda, VPC, SNS, SQS, and a DB connected via VPC peering. Currently, every time I test my code I have to deploy, which is a tedious process and takes 5 minutes. Is there any way to set up a local environment that has everything in one place?
It should be possible in theory, but it is not an easy thing to do. There are products like LocalStack that offer exactly this.
But I would not recommend going that route. By design, this will always be a huge cat-and-mouse game: AWS introduces a new feature or changes some minor detail of its implementation, and products like LocalStack need to catch up. Furthermore, you will always only get an "approximation" of the "actual cloud". It will never be a 100% match.
I would also expect a lot of work to get products like LocalStack working properly with your setup and to keep them running well.
Therefore, I would propose to invest the same time into proper developer experience within the "actual cloud". That is what we do: every developer deploys their version of the project to AWS.
This is also not trivial, but the end result is not a "fake version" of the cloud that might or might not reflect the "real cloud".
The key to achieving this is infrastructure as code and as much automation as possible. We use Terraform and Makefiles, which work very well for us. Done properly, we only ever build and deploy what we changed. The result is that changes can be deployed to AWS in seconds, and the developer can test the result either through the Makefile itself or using the AWS console.
Another upside is that, in theory, you need to do all the same work anyway for your continuous deployment, so ultimately you are reducing work by not having to maintain both local deployments and cloud deployments.

AWS service for doing jobs

I have the following need: the code needs to call some APIs, get some data, and store it in a database (a flat file will do for our purpose). As the APIs give access to a huge number of records, we want to split the work into 30 parts, each part scraping a certain section of the data from the APIs. We want these 30 scrapers to run on 30 different machines, and for that we have a Python program that does the following:
Call the API and get the data, based on a parameter (which part of the API to call).
Dump it to a local flat file.
And then later, we will merge the output from the 30 files into one giant DB.
Question is - which AWS tool should we use for our purpose? We could use EC2 instances, but then we would have to keep a console open on our desktop for each instance to run the Python program, and it is not feasible to keep 30 connections open on my laptop. Getting remote desktop onto those machines is very complicated, so logging in there, starting the job, and then disconnecting is also not feasible.
What we want is this - start the tasks (one each on 30 machines), let them run and finish by themselves, and if possible notify me (or let me check their health periodically myself).
Can anyone guide me which AWS tool suits our purpose, and how?
"We can use EC2 instance, but we have to keep the EC2 console open on
our desktop where we connect to it to run the Python program"
That just means you are running the script wrong; you need to look into running it as a service (for example, under nohup or as a systemd unit) so it keeps running after you disconnect.
In general, you should look into queueing these tasks up in SQS and then triggering either EC2 auto-scaling or Lambda functions, depending on whether your script can run within the Lambda runtime restrictions.
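A minimal sketch of the queueing side with boto3, assuming a queue named scraper-tasks already exists (hypothetical name); each worker, whether an EC2 instance or a Lambda function, pulls one message and scrapes only its section of the API:

    import json

    import boto3

    sqs = boto3.resource("sqs")
    queue = sqs.get_queue_by_name(QueueName="scraper-tasks")  # hypothetical queue

    # One message per partition of the API; workers consume these
    # independently, so 30 of them can run without any open consoles.
    for part in range(30):
        queue.send_message(MessageBody=json.dumps({"part": part}))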
This seems like a good application for Step Functions. Step Functions let you orchestrate multiple Lambda functions, Glue jobs, and other services into a business process. You could write Lambda functions that call the API endpoints and store the results in S3. Once all the data is gathered, your step function could trigger a Lambda function, Glue job, or something else that processes the data into your database. Step Functions help with error handling and retries, and allow easy monitoring of your process.
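A minimal sketch of the worker Lambda that a Step Functions Map state could fan out over; the bucket name, the event shape, and the call_api stub are assumptions standing in for the scraping logic described in the question:

    import json

    import boto3

    s3 = boto3.client("s3")

    def call_api(part):
        # Placeholder for the actual API-scraping logic from the question.
        return [{"part": part}]

    def handler(event, context):
        part = event["part"]  # 0..29, supplied by the Map state
        records = call_api(part)
        # Store the raw output in S3; a later state can merge the 30 files.
        s3.put_object(
            Bucket="scraper-results",  # hypothetical bucket
            Key=f"raw/part-{part}.json",
            Body=json.dumps(records),
        )
        return {"part": part, "count": len(records)}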

Pyspark job freezes with too many vcpus

TLDR: I have a pyspark job that finishes in 10 minutes when I run it on an EC2 instance with 16 vCPUs, but freezes (it doesn't fail, it just never finishes) if I use an instance with over 20 vCPUs. I have tried everything I could think of and I just don't know why this happens.
Full story:
I have around 200 small pyspark jobs that, for cost and flexibility reasons, I execute using AWS Batch with Spark dockers instead of EMR. Recently I decided to experiment to find the best configuration for those jobs, and I noticed something weird: a job that finished quickly (around 10 minutes) with 16 vCPUs or less would just never end with 20 or more (I waited for 3 hours). First I thought it could be a problem with Batch or the way the ecs-agent manages the task, so I tried running the docker image on an EC2 instance directly and had the same problem. Then I thought the problem was the docker image, so I tried creating a new one:
The first one had Spark installed from the AWS Glue compatible distribution (https://aws-glue-etl-artifacts.s3.amazonaws.com/glue-1.0/spark-2.4.3-bin-hadoop2.8.tgz)
The new one was Ubuntu 20 based, with Spark installed from the Apache mirror (https://apache.mirror.digitalpacific.com.au/spark/spark-$SPARK_VERSION/spark-$SPARK_VERSION-bin-hadoop$HADOOP_VERSION.tgz)
The same thing happened. Then I decided the problem was with using docker at all, so I installed everything directly on the EC2 instance; same result. I tried changing the Spark version; same thing. I thought the hardware could be blocking too many threads, so I switched to an AMD instance; nothing changed. I tried modifying some configurations, such as the amount of memory used by the driver, but it always ends the same way: with 16 vCPUs it works; with more than that, it stalls.
Other details:
According to the logs, it seems to always stop at the same point: a parquet read operation on S3. But the parquet file is super small (< 1 MB), so I don't think that is the actual problem.
After that it still has logs sometimes but nothing really useful, just "INFO ContextCleaner: Cleaned accumulator".
I use s3a to read the files from s3.
I don't get any errors or spark logs.
I appreciate any help on the matter!
Stop using the Hadoop 2.7 binaries. They are woefully obsolete, especially for S3 connectivity. Replace all the Hadoop 2.7 artifacts with Hadoop 2.8 ones or, preferably, Hadoop 3.2 or later, with consistent dependencies.
Set `spark.hadoop.fs.s3a.experimental.fadvise` to `random`.
If you still see problems, see if you can replicate them on hadoop 3.3.x, and if so: file a bug.
(Advice correct as of 2021-03-09; the longer it sits on SO unedited, the less it should be believed.)
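For reference, a minimal sketch of applying the fadvise setting suggested above when building the session; the app name and the S3 path are placeholders:

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("s3a-read")
        # Random-access read policy for S3A; suits columnar formats like
        # Parquet, which seek around inside the file instead of streaming it.
        .config("spark.hadoop.fs.s3a.experimental.fadvise", "random")
        .getOrCreate()
    )

    df = spark.read.parquet("s3a://my-bucket/path/to/data.parquet")  # placeholder path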

Shutdown an ec2 instance after finishing computing

I’m coding a chess engine for a school project, and I need more computing power than my PC can offer.
So I turned to AWS, specifically EC2. I want to test different algorithms.
I know how to start the instance and how to kick off the computation on it, but as soon as the computation finishes I would like to automatically send the data files to S3 (I know the command, but not how to execute it automatically) and shut down the instance to avoid paying for nothing.
Thank you for your help,
You mention your script is in Python; in that case you could probably execute it with AWS Lambda and configure a CloudWatch Events rule to define the time to run the script. Keep in mind that the maximum run time Lambda allows is 15 minutes, so if that does not suit your case you can look at ECS Fargate scheduled tasks to run the process at the defined time and then tear the container down.
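A minimal sketch of wiring up such a schedule with boto3; the rule name, function name, and ARN are placeholders, and the permission grant is included because the rule cannot invoke the function without it:

    import boto3

    events = boto3.client("events")
    lam = boto3.client("lambda")

    # Fire every day at 03:00 UTC (placeholder schedule).
    rule = events.put_rule(
        Name="nightly-compute",
        ScheduleExpression="cron(0 3 * * ? *)",
    )
    # Allow CloudWatch Events to invoke the function.
    lam.add_permission(
        FunctionName="my-compute-fn",  # hypothetical function
        StatementId="allow-nightly-compute",
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule["RuleArn"],
    )
    events.put_targets(
        Rule="nightly-compute",
        Targets=[{"Id": "1", "Arn": "arn:aws:lambda:us-east-1:123456789012:function:my-compute-fn"}],
    )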
As pointed out by samtoddler on Jan 11, 2021:
a call to os.system("sudo shutdown now -h") at the end of the computation provided what I wanted!
Thank you
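For completeness, the tail end of such a compute script could look like this; a minimal sketch assuming a results.csv output and a bucket named my-results-bucket (both hypothetical), an instance role that allows s3:PutObject, and shutdown behavior set to terminate so that halting the OS also stops the billing:

    import os

    import boto3

    # Upload the result file before powering off.
    s3 = boto3.client("s3")
    s3.upload_file("results.csv", "my-results-bucket", "runs/results.csv")

    # Halt the machine; with shutdown behavior "terminate", the instance
    # is terminated rather than merely stopped.
    os.system("sudo shutdown now -h")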

Can I run 1000 AWS micro instances simultaneously?

I have a simple pure C program that takes an integer as input, runs for a while (let's say an hour), and then writes a text file. I want to run this program 1000 times with input integers from 1 to 1000.
Currently I'm running this program in parallel on 4 processors, which takes 250 hours. The program is such that it fits on an AWS micro instance (I've tested it). Would it be possible to use 1000 micro instances in AWS to do the whole job in about an hour, at a cost of roughly $20 ($0.02 per instance-hour)?
If it is possible, does anybody have some guidelines on how to do that?
If it is not, does anybody have a similarly low-budget alternative?
Thanks a lot!
In order to achieve this, you will need to:
Create an S3 bucket to store your bootstrapping scripts, application, input data and output data.
Custom lightweight AMI: you might want to create a custom lightweight AMI that knows how to download the bootstrapping script.
Bootstrapping script: downloads your software from the S3 bucket, parses a custom instance tag containing the integer [1..1000], and downloads any additional data.
Your application: does the actual processing.
End-of-processing script: uploads the result to a separate results S3 bucket and terminates the instance; you might also want to send an SNS notification to communicate the end-of-processing status.
If you need result consolidation, you might want to create another instance and use it as a coordinator, waiting for all the "end of processing" notifications in order to finish the processing. In that case you might also consider Amazon's Hadoop MapReduce offering (Elastic MapReduce) in the future, since it does almost all of this heavy lifting for you.
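A minimal sketch of the launch loop with boto3; the AMI ID and instance type are placeholders, and each instance carries its integer as a tag that the bootstrapping script can read back and pass to the program:

    import boto3

    ec2 = boto3.resource("ec2")

    for n in range(1, 1001):
        ec2.create_instances(
            ImageId="ami-0123456789abcdef0",  # your custom lightweight AMI
            InstanceType="t2.micro",
            MinCount=1,
            MaxCount=1,
            # Let the end-of-processing script's OS shutdown terminate the instance.
            InstanceInitiatedShutdownBehavior="terminate",
            TagSpecifications=[{
                "ResourceType": "instance",
                "Tags": [{"Key": "WorkItem", "Value": str(n)}],
            }],
        )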
You didn't specify what language you'd like to do this in, and since the app you use to deploy your program doesn't have to be in the same language, I would recommend C#. There are a number of examples around the web on how to programmatically spawn new Amazon instances using the SDK. For example:
How to start an Amazon EC2 instance programmatically in .NET
You can create your own AMI with the program already present, but that might become a pain if you want to make adjustments, since it requires recreating the entire AMI. I'd recommend creating an extra instance, or simply hosting the program at a publicly accessible URL. Then I would create some kind of service, installed on the AMI ahead of time, that lets me specify the URL to download the app from, along with whatever command-line parameters I want for that particular instance.
Hope this helps.
Be careful: by default you can only launch 20 instances per region. You need to ask Amazon to raise this limit in order to use more instances.
If you want low cost and don't mind a delay, you should use spot instances.
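A minimal sketch of such a spot request with boto3; the price, AMI ID, and instance type are placeholders:

    import boto3

    ec2 = boto3.client("ec2")
    ec2.request_spot_instances(
        SpotPrice="0.005",  # maximum hourly price you are willing to pay
        InstanceCount=10,
        LaunchSpecification={
            "ImageId": "ami-0123456789abcdef0",
            "InstanceType": "t1.micro",
        },
    )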