AWS Lambda - Work with services outside Amazon services?

After reading about AWS Lambda, I've taken quite an interest in it. However, there is one thing I couldn't find any information on: is it possible to have Lambda work with services outside Amazon? Say I have a database from some other provider; would it be possible to perform operations on it through AWS Lambda?

Yes.
AWS Lambda functions run code just like any other application. So, you can write a Lambda function that calls any service on the Internet, just like any computer. Of course, the function needs network access to the external service across the Internet, as well as permission (for example, credentials) to use it.
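There are some limitations to Lambda functions. At the time of writing, functions could run for a maximum of five minutes and had only 512 MB of local disk space in /tmp; these limits have since been raised to a 15-minute maximum timeout and up to 10 GB of configurable /tmp storage.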
So when should you use Lambda? Typically, it's when you want some code to run in response to an event, such as a file being uploaded to Amazon S3, data arriving via Amazon Kinesis, or a skill being activated on an Alexa device. If this fits your use case, go for it!
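As a minimal sketch of the idea, assuming the external database exposes an HTTPS/JSON API, a Python Lambda handler can call it with the standard library alone. The endpoint URL, the EXTERNAL_API_URL and EXTERNAL_API_TOKEN environment variables and the request format are hypothetical placeholders; a database that speaks its own wire protocol would need a driver packaged with the function instead. A function that is not attached to a VPC has outbound Internet access by default; inside a VPC it needs a NAT gateway to reach the Internet.

```python
import json
import os
import urllib.request

# Hypothetical endpoint of a database/API hosted outside AWS.
EXTERNAL_API_URL = os.environ.get("EXTERNAL_API_URL", "https://db.example.com/query")


def build_request(payload, token):
    """Build a JSON POST request to the external service."""
    return urllib.request.Request(
        EXTERNAL_API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {token}"},
        method="POST",
    )


def lambda_handler(event, context, opener=urllib.request.urlopen):
    """Forward a query from the triggering event to the external service.

    `opener` defaults to urllib's urlopen; it is a parameter only so the
    handler can be exercised without network access.
    """
    token = os.environ.get("EXTERNAL_API_TOKEN", "")  # hypothetical credential
    request = build_request({"query": event.get("query")}, token)
    with opener(request, timeout=10) as response:
        result = json.loads(response.read().decode("utf-8"))
    return {"statusCode": 200, "body": json.dumps(result)}
```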

Related

How would I create a Minecraft EC2 server that automatically starts when someone tries to use it?

Currently, I have a working modded Minecraft server running on a C5 EC2 instance. The problem is that I have to start and stop the server manually, which can be annoying for my friends. I was wondering whether it would be possible to automate the EC2 state so that the instance starts as soon as a player attempts to join the server. This would be similar to how Minecraft Realms behaves, which I heard Mojang runs on AWS:
https://aws.amazon.com/blogs/aws/hosting-minecraft-realms-on-aws/
I have looked up tutorials for this, and this is the best I could find:
https://github.com/trevor-laher/OnDemandMinecraft
The problem with this solution is that it requires a separate website to log users in and start the EC2 instance, whereas I want startup and shutdown to be completely automatic.
I would appreciate any guidance.
If the server is off, it would not be possible to "connect" to the server. Therefore, another mechanism is required that can be used to start the server.
Combine that with your desire to minimise costs and the only real solution is to somehow trigger an AWS Lambda function, which could start the server.
There are a few ways you could have users trigger the AWS Lambda function:
Make a call to API Gateway
Upload an object to Amazon S3
Somehow put a message in an SNS topic or an SQS queue
Trigger an Amazon CloudWatch Alarm (which calls Lambda via SNS)
...and probably other ways
When considering a method to use, you should consider security implications such as:
Whether only authorized users should be able to trigger the Lambda function, or whether it is okay if anybody (eg a web crawler) triggers it.
Whether you are willing to give your friends AWS credentials (not a good idea) that they could use to start the server directly, or whether it should be an indirect method.
Frankly, I would recommend the following architecture:
Create an AWS Lambda function that turns on the server
Create an API Gateway that triggers the Lambda function
Give a URL to your friends that calls the API Gateway and passes a 'secret' (effectively a password)
The API Gateway will call the Lambda function, passing the secret
The Lambda function confirms that the secret is correct and starts the Amazon EC2 instance with Minecraft installed
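A minimal sketch of the Lambda function in that architecture, in Python, assuming an API Gateway proxy integration that passes the secret as a `secret` query-string parameter. The environment variable names and the placeholder instance ID are hypothetical; `start_instances` is the boto3 EC2 client call, and the function's IAM role would need `ec2:StartInstances` on that instance.

```python
import hmac
import os

# Hypothetical configuration, set as environment variables on the function.
INSTANCE_ID = os.environ.get("MINECRAFT_INSTANCE_ID", "i-0123456789abcdef0")
SECRET = os.environ.get("START_SECRET", "change-me")


def secret_is_valid(supplied):
    """Constant-time comparison of the supplied secret with the configured one."""
    if supplied is None:
        return False
    return hmac.compare_digest(supplied.encode("utf-8"), SECRET.encode("utf-8"))


def lambda_handler(event, context, ec2=None):
    # API Gateway sends None when the URL has no query string at all.
    params = event.get("queryStringParameters") or {}
    if not secret_is_valid(params.get("secret")):
        return {"statusCode": 403, "body": "Forbidden"}
    if ec2 is None:
        import boto3  # included in the Lambda Python runtime
        ec2 = boto3.client("ec2")
    ec2.start_instances(InstanceIds=[INSTANCE_ID])
    return {"statusCode": 200, "body": "Server is starting; try joining in a minute or two."}
```

Comparing with hmac.compare_digest rather than == avoids leaking how much of the secret matched through response timing. Sending the secret in a request header instead of the URL would also keep it out of access logs.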
Here is a tutorial that shows many of these concepts: Build an API Gateway API with Lambda Integration
The purpose of the secret is to prevent the server from starting if an unauthorized person (or a bot) happens to hit the API Gateway endpoint. They will not provide the secret, so the server will not be started.
Stopping the server after a period of non-use is a different matter. The library you referenced might help here. You could have a script running on the Minecraft server that monitors the game and, after a period of inactivity, simply tells the operating system to shut down.
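Here is a rough Python sketch of such an idle-shutdown script. How you count online players is left as a hypothetical get_player_count() hook (for example, sending the `list` command over RCON); the 15-minute limit and the one-minute check interval are arbitrary assumptions. Ideally the script would also stop the Minecraft server cleanly so the world is saved before the OS shuts down.

```python
import subprocess
import time

IDLE_LIMIT_SECONDS = 15 * 60  # assumption: stop after 15 minutes with no players
CHECK_INTERVAL_SECONDS = 60


class IdleTracker:
    """Decide when the server has been empty for long enough to shut down."""

    def __init__(self, idle_limit):
        self.idle_limit = idle_limit
        self.idle_since = None  # time at which the server became empty

    def should_shutdown(self, player_count, now):
        if player_count > 0:
            self.idle_since = None  # someone is playing, reset the timer
            return False
        if self.idle_since is None:
            self.idle_since = now
        return now - self.idle_since >= self.idle_limit


def get_player_count():
    """Hypothetical hook: return the number of online players (e.g. via RCON "list")."""
    raise NotImplementedError("query the Minecraft server here")


def main():
    tracker = IdleTracker(IDLE_LIMIT_SECONDS)
    while True:
        if tracker.should_shutdown(get_player_count(), time.monotonic()):
            # Ask the OS to power off; EC2 then records the instance as stopped
            # when the instance-initiated shutdown behavior is "stop".
            subprocess.run(["sudo", "shutdown", "-h", "now"], check=False)
            return
        time.sleep(CHECK_INTERVAL_SECONDS)
```

The script could be started at boot, for example from a systemd unit, by calling main(). For an EBS-backed instance the default instance-initiated shutdown behavior is stop, so the instance stops (and can be started again by the Lambda function) rather than being terminated; it is worth checking that this setting has not been changed to terminate.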
You could use a BungeeCord hub server that lets users start a connection to the main server and spins the main server up via AWS.
However, this would require the BungeeCord server to be running all the time, although a small BungeeCord server should be relatively cheap to host.
I don't think there's any way you could do this without having a small server that receives the initial request to spin up the AWS machine.

Ideas and guidelines for an end-to-end AWS solution

I want to build an end-to-end automated system that consists of the following steps:
Get data from the source into a landing bucket in AWS S3 using AWS Lambda
Run a transformation job using AWS Lambda and store the result in a processed bucket in AWS S3
Run the Redshift COPY command using AWS Lambda to push the transformed/processed data from AWS S3 to AWS Redshift
Of the above steps, I've completed pulling and transforming the data, and I have run the COPY command manually against Redshift using a SQL query tool.
Doubts:
I've heard AWS CloudWatch can be used to schedule/automate things, but I have never worked with it. So, if I want to run the steps above in a streamlined fashion, how should I go about it?
Should I use Lambda to trigger the COPY and INSERT statements? Or are there better AWS services for this?
Any other suggestions on AWS services and the like are most welcome.
Constraint: Want as many tasks as possible to be serverless (except for semantic layer, Redshift).
CloudWatch:
Your options here are to use either CloudWatch Alarms or CloudWatch Events (since renamed Amazon EventBridge).
With alarms, you can respond to any metric of your system (eg CPU utilization, disk IOPS, count of Lambda invocations, etc) when it crosses some threshold, and when the alarm is triggered, invoke a Lambda function (or send an SNS notification, etc) to perform a task.
With events, you can use either a cron expression or an AWS service event (eg EC2 instance state change, SNS notification, etc) to trigger another service (eg Lambda). For example, you could run a clean-up operation via Lambda on a regular schedule, or create a snapshot of an EBS volume when its instance is shut down.
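As a sketch of the scheduled variant, assuming boto3 clients for `events` and `lambda` are passed in, the following creates a rule that runs a Lambda function every day at 02:00 UTC, grants the events service permission to invoke it, and attaches the function as the rule's target. The rule name, function ARN and schedule are placeholders.

```python
def schedule_lambda(events, lambda_client, rule_name, function_arn,
                    schedule="cron(0 2 * * ? *)"):
    """Create (or update) a scheduled rule that invokes a Lambda function.

    The default schedule (an assumption) fires daily at 02:00 UTC; fields are
    minutes, hours, day-of-month, month, day-of-week, year.
    """
    rule_arn = events.put_rule(
        Name=rule_name, ScheduleExpression=schedule, State="ENABLED"
    )["RuleArn"]
    # Allow the events service to invoke the function, for this rule only.
    lambda_client.add_permission(
        FunctionName=function_arn,
        StatementId=f"{rule_name}-invoke",
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule_arn,
    )
    events.put_targets(Rule=rule_name, Targets=[{"Id": "1", "Arn": function_arn}])
    return rule_arn
```

For example: schedule_lambda(boto3.client("events"), boto3.client("lambda"), "nightly-etl", "<your-function-arn>"). Calling add_permission a second time with the same StatementId fails with ResourceConflictException, so a re-runnable setup script would need to handle that.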
Lambda itself is a very powerful tool and should let you write a decent copy/insert function in a language you are familiar with. AWS also has several GitHub repos with lots of examples; see for example the serverless examples and the many samples. There may be other services that work for you in your specific case, but part of Lambda's power is its flexibility.
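For the COPY step, one option (not the only one) is the Redshift Data API, which lets Lambda run SQL without managing a database connection. Below is a minimal sketch, assuming the function is triggered by an S3 ObjectCreated event on the processed bucket and that the files are CSV with a header row. The cluster, database, user, table and IAM role names are hypothetical.

```python
import os
from urllib.parse import unquote_plus

# Hypothetical settings; replace with your own cluster, database, user, table and role.
CLUSTER_ID = os.environ.get("REDSHIFT_CLUSTER_ID", "my-cluster")
DATABASE = os.environ.get("REDSHIFT_DATABASE", "dev")
DB_USER = os.environ.get("REDSHIFT_DB_USER", "etl_user")
TARGET_TABLE = os.environ.get("TARGET_TABLE", "public.processed_events")
IAM_ROLE_ARN = os.environ.get("COPY_ROLE_ARN", "arn:aws:iam::123456789012:role/RedshiftCopyRole")


def build_copy_sql(table, s3_uri, iam_role_arn):
    """Build a Redshift COPY statement for a CSV file with a header row.

    Assumes object keys contain no single quotes.
    """
    return (
        f"COPY {table} FROM '{s3_uri}' "
        f"IAM_ROLE '{iam_role_arn}' FORMAT AS CSV IGNOREHEADER 1;"
    )


def lambda_handler(event, context, client=None):
    """Triggered by an S3 ObjectCreated event on the processed bucket."""
    s3_info = event["Records"][0]["s3"]
    bucket = s3_info["bucket"]["name"]
    key = unquote_plus(s3_info["object"]["key"])  # S3 event keys are URL-encoded
    sql = build_copy_sql(TARGET_TABLE, f"s3://{bucket}/{key}", IAM_ROLE_ARN)
    if client is None:
        import boto3  # bundled with the Lambda Python runtime
        client = boto3.client("redshift-data")
    response = client.execute_statement(
        ClusterIdentifier=CLUSTER_ID, Database=DATABASE, DbUser=DB_USER, Sql=sql
    )
    # The Data API is asynchronous: this is a statement ID, not a result.
    return response["Id"]
```

Because execute_statement only submits the SQL, you would poll describe_statement with the returned ID (or check it in a later step) to confirm the load succeeded. The function's role needs permission to call the Data API (and, with the DbUser option, redshift:GetClusterCredentials), and the role named in IAM_ROLE must be associated with the cluster and allowed to read the bucket.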

AWS application consistent snapshots of EC2 instances

I'm currently setting up a small Lambda to take snapshots of all the important volumes of our EC2 instances. To guarantee application consistency I need to trigger actions inside the instances: One to quiesce the application before the snapshot and one to wake it up again after the snapshot is done. So far I have no clue how to do this.
I've thought about using SNS or SQS to notify the instances about start and stop of the snapshot, but that has several problems:
I'll need to install (and develop) a custom listener inside the instances.
I won't get feedback on whether the quiescing/wake-up has finished.
So here's my question: How can I trigger an action inside an instance from a Lambda function?
But maybe I'm approaching this from the wrong direction. Is there really no simple backup solution? I know Azure has a snapshot-based backup service that can do application-consistent backups. Did I just miss an equivalent AWS service?
Edit 1:
Ok, it looks like the feature 'Run Command' of AWS Systems Manager is what I really need. It allows me to run scripts, Ansible playbooks and more inside an EC2 instance. When I've got a working solution I'll post the necessary steps.
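Following that edit, here is a minimal Python sketch of a Lambda function that uses Run Command to quiesce the application, takes the EBS snapshot, and then wakes the application up again. Polling get_command_invocation also gives the feedback that the SNS/SQS approach lacked. The two script paths are hypothetical and must already exist on the instance, which needs the SSM Agent and an instance profile that allows Systems Manager; `AWS-RunShellScript` is the standard Linux document.

```python
import time

# Hypothetical scripts that must already exist on the instance.
QUIESCE_SCRIPT = "/usr/local/bin/app-quiesce.sh"
RESUME_SCRIPT = "/usr/local/bin/app-resume.sh"
FINAL_STATUSES = {"Success", "Failed", "Cancelled", "TimedOut"}


def run_on_instance(ssm, instance_id, commands, sleep=time.sleep, poll_seconds=2):
    """Run shell commands via SSM Run Command and wait for the final status."""
    sent = ssm.send_command(
        InstanceIds=[instance_id],
        DocumentName="AWS-RunShellScript",
        Parameters={"commands": commands},
    )
    command_id = sent["Command"]["CommandId"]
    while True:
        sleep(poll_seconds)  # give the invocation time to be registered
        invocation = ssm.get_command_invocation(CommandId=command_id, InstanceId=instance_id)
        if invocation["Status"] in FINAL_STATUSES:
            return invocation["Status"]


def consistent_snapshot(ssm, ec2, instance_id, volume_id, sleep=time.sleep):
    """Quiesce, snapshot one volume, and always attempt to resume the application."""
    try:
        status = run_on_instance(ssm, instance_id, [QUIESCE_SCRIPT], sleep)
        if status != "Success":
            raise RuntimeError(f"quiesce finished with status {status}; no snapshot taken")
        snapshot = ec2.create_snapshot(
            VolumeId=volume_id, Description=f"app-consistent snapshot of {instance_id}"
        )
    finally:
        # Resume even if quiescing or the snapshot call failed.
        run_on_instance(ssm, instance_id, [RESUME_SCRIPT], sleep)
    return snapshot["SnapshotId"]
```

AWS documents that an EBS snapshot captures the data written to the volume at the moment the snapshot is requested, even though the snapshot stays pending while it is copied, which is why the sketch resumes the application as soon as create_snapshot returns. Right after send_command, get_command_invocation can briefly raise InvocationDoesNotExist; the initial sleep makes that less likely, but production code should catch it and retry.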
You can trigger a Lambda function on demand:
Using AWS Lambda with Amazon API Gateway (On-Demand Over HTTPS)
You can invoke AWS Lambda functions over HTTPS. You can do this by defining a custom REST API and endpoint using Amazon API Gateway, and then mapping individual methods, such as GET and PUT, to specific Lambda functions. Alternatively, you could add a special method named ANY to map all supported methods (GET, POST, PATCH, DELETE) to your Lambda function. When you send an HTTPS request to the API endpoint, the Amazon API Gateway service invokes the corresponding Lambda function. For more information about the ANY method, see Step 3: Create a Simple Microservice using Lambda and API Gateway.

Accessing Large files stored in AWS s3 using AWS Lambda functions

I have a file of more than 30 GB stored in S3, and I want to write a Lambda function that will access that file, parse it, and then run some algorithm on it.
I am not sure whether my Lambda function can handle such a big file, as the maximum execution time for a Lambda function is 300 seconds (5 minutes).
I found the S3 Transfer Acceleration feature, but will it help?
Apart from Lambda, can anyone suggest another service to host my code as a microservice and parse the file?
Thanks in Advance
It depends entirely on your processing requirements and how often you need to process the file.
You can use Amazon EMR to parse the file and run the algorithm and, depending on your requirements, either terminate the cluster afterwards or keep it alive for frequent processing. https://aws.amazon.com/emr/getting-started/
You can try the (recently launched) Amazon Athena service, which can parse and query files stored in S3 directly. Amazon takes care of the required infrastructure. http://docs.aws.amazon.com/athena/latest/ug/getting-started.html
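To give an idea of the Athena route, here is a minimal Python sketch that submits a query and waits for it to finish, assuming a table has already been defined over the S3 file (for example with CREATE EXTERNAL TABLE). The database name, query and results location are hypothetical; a boto3 Athena client would be passed in as `athena`.

```python
import time


def start_athena_query(athena, sql, database, output_location):
    """Submit a query; Athena reads the data directly from S3."""
    response = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]


def wait_for_query(athena, execution_id, sleep=time.sleep, poll_seconds=2):
    """Poll until the query reaches a final state and return that state."""
    while True:
        status = athena.get_query_execution(QueryExecutionId=execution_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            return state
        sleep(poll_seconds)
```

For example, start_athena_query(boto3.client("athena"), "SELECT col_a, COUNT(*) FROM big_file GROUP BY col_a", "analytics", "s3://my-athena-results/") followed by wait_for_query on the returned ID. Athena bills by the amount of data scanned, so compressed or columnar formats such as Parquet keep the cost of querying a 30 GB file down.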
For complex processing workflows, you can combine AWS services, such as AWS Data Pipeline for managing the flow and Amazon EMR or EC2 for running the processing task. https://aws.amazon.com/datapipeline/
Hope this helps, thanks

Is AWS Lambda suitable for web scraping?

If I create a function to fetch web pages, will it run on a different IP for each execution so that my scraping requests don't get blocked?
I would use this AWS pipeline:
At the source, an EC2 instance running JAUNT feeds the URLs or HTML pages into a Kinesis stream. A Lambda function does the HTML parsing, and Kinesis Firehose delivers everything into S3 or Redshift.
JAUNT can run through a standard web proxy service with rotating IPs.
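A rough Python sketch of the Lambda stage of that pipeline, assuming (hypothetically) that the EC2 scraper writes each page to Kinesis as a JSON object with `url` and `html` fields. Kinesis delivers record data base64-encoded; the handler decodes it, extracts links with the standard-library HTMLParser as a stand-in for whatever parsing you actually need, and forwards one JSON line per page to a Firehose delivery stream. The stream name `scraped-pages` is a placeholder.

```python
import base64
import json
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def parse_page(html):
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


def lambda_handler(event, context, firehose=None, stream_name="scraped-pages"):
    rows = []
    for record in event["Records"]:
        # Kinesis record payloads arrive base64-encoded.
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        rows.append({"url": payload["url"], "links": parse_page(payload["html"])})
    if firehose is not None and rows:
        # One newline-delimited JSON record per page for S3/Redshift delivery.
        firehose.put_record_batch(
            DeliveryStreamName=stream_name,
            Records=[{"Data": (json.dumps(r) + "\n").encode("utf-8")} for r in rows],
        )
    return rows
```

In the deployed function you would pass boto3.client("firehose") in, for example from a thin wrapper handler. put_record_batch accepts at most 500 records per call, so a larger Kinesis batch size would need splitting, and FailedPutCount in the response should be checked so failed records can be retried.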
Yes, by default Lambda does not use a fixed outbound IP, so requests can come from different AWS IP addresses (though not necessarily a new one on every execution). You can trigger it using services like EventBridge, so you can run the script on a schedule, such as every hour. Others may recommend API Gateway; however, exposing an API endpoint that anyone can trigger is insecure, so you would have to write additional logic to protect it, either with hard-coded headers or with something like OAuth.
AWS Lambda doesn't have a fixed source IP; however, I guess the IP changes only when the function has cooled down (that is, when a new execution environment is started), not during the same invocation.
The web server doesn't see your function's internal private IP address; it sees a public IP address from AWS. That is fine as long as you set appropriate headers and signal to the web server that your application is a bot. See this link for some best practices on creating a web scraper:
https://www.blog.datahut.co/post/web-scraping-best-practices-tips
In my opinion, the main thing is to be respectful of how much data you pull and how often.
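In that spirit, here is a minimal Python sketch of a polite fetcher: it identifies itself with an honest User-Agent, checks robots.txt with the standard-library urllib.robotparser, and waits a minimum delay between requests. The bot name, info URL and five-second delay are assumptions, not recommendations from the linked article.

```python
import time
import urllib.request
import urllib.robotparser

# Hypothetical bot identity: name your scraper and point to a page explaining it.
USER_AGENT = "ExampleScraperBot/1.0 (+https://example.com/bot-info)"


def make_robot_parser(robots_txt):
    """Parse robots.txt content that has already been downloaded."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


class PoliteFetcher:
    """Fetch pages with an honest User-Agent, obeying robots.txt and a minimum delay."""

    def __init__(self, robots, min_delay=5.0, opener=urllib.request.urlopen,
                 clock=time.monotonic, sleep=time.sleep):
        self.robots = robots
        self.min_delay = min_delay
        self.opener = opener
        self.clock = clock
        self.sleep = sleep
        self._last_fetch = None

    def fetch(self, url):
        if not self.robots.can_fetch(USER_AGENT, url):
            return None  # the site asked bots not to fetch this path
        if self._last_fetch is not None:
            wait = self.min_delay - (self.clock() - self._last_fetch)
            if wait > 0:
                self.sleep(wait)
        request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
        with self.opener(request, timeout=10) as response:
            body = response.read()
        self._last_fetch = self.clock()
        return body
```

Note that a Lambda function's state only persists while its execution environment stays warm, so the delay here spaces out requests within one invocation (or one warm environment), not across concurrent invocations.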
Lambda is triggered by events, such as a file being placed in S3 or data being added to Kinesis or DynamoDB. That is often the opposite of what a web scraper needs, though something like S3 could certainly act as a queue/job runner.
Scraping from different IPs? Lambda is certainly deployed on many machines, but that doesn't actually help you, since you can't control the machines or their IP addresses.