My job is to move our existing java calculation (servlet as WAR file) from our own server to AWS. This is a calculation without user interface or database. Other companies should be able to call the calculation in their programs. The servlet takes a post request with Json payload and the response sends Json payload back to client after the calculation is performed. The calculation is relatively heavy and therefore time-consuming (1-2 sec.).
I have decided to use AWS Elastic Beanstalk for the cloud computing but I'm in doubt as to what EB Environment to use - Server or Worker environment? and if I should use AWS API Gateway in front of EB?
Hopefully somebody can clarify this for me.
Worker environment produces an SQS queue where you submit your jobs into. To enable access to it from outside of AWS you would have to front it with API Gateway (preferred way).
However, the worker environment works in asynchronous way. It does not return job results to the caller. You would need to have some other mechanism for your clients to get the results back, e.g. though different API call.
An alternative is web environment where the clients get the response back directly from your json processing application. 1-2 seconds is not that long of a wait for an HTTP request.
For more complex solution based on EB, one could look at Creating links between Elastic Beanstalk environments. You would have a front-end environment for your clients linked with worker environment that does the json job processing.
The other way would be to rewrite the app into lambda, if possible of course. Lambda seems as a good fit for 1-2 seconds processing tasks.
Related
Currently, I have a working modded Minecraft server working running on a C5 EC2 instance. The problem is that I have to manually start and stop the server which can be annoying for my friends. I was wondering if it would be possible to automate the EC2 state so that it runs as soon as a player attempts to join the sever. This would be similar to how Minecraft realms behaves which I heard Mojang was using AWS for:
https://aws.amazon.com/blogs/aws/hosting-minecraft-realms-on-aws/
I have looked up tutorials for this and this is the best I could come across:
https://github.com/trevor-laher/OnDemandMinecraft
The problem with this solution is that it requires to make a separate website to log users in and start the EC2 instance while I want the startup and shutdown to be completely automatic.
I would appreciate any guidance.
If the server is off, it would not be possible to "connect" to the server. Therefore, another mechanism is required that can be used to start the server.
Combine that with your desire to minimise costs and the only real solution is to somehow trigger an AWS Lambda function, which could start the server.
There are a few ways you could have users trigger the AWS Lambda function:
Make a call to API Gateway
Upload an object to Amazon S3
Somehow put a message in an SNS topic or an SQS queue
Trigger an Amazon CloudWatch Alarm (which calls Lambda via SNS)
...and probably other ways
When considering a method to use, you should consider security implications such as:
Whether only authorized users should be able to trigger the Lambda function, or is it okay that anybody (eg a web crawler) might trigger it.
Whether you willing to give your friends AWS credentials (not a good idea) that they could use to start the server directly, or whether it should be an indirect method.
Frankly, I would recommend the following architecture:
Create an AWS Lambda function that turns on the server
Create an API Gateway that triggers the Lambda function
Give a URL to your friends that calls the API Gateway and passes a 'secret' (effectively a password)
The API Gateway will call the Lambda function, passing the secret
The Lambda function confirms that the secret is correct and starts the Amazon EC2 instance with Minecraft installed
Here is a tutorial that shows many of these concepts: Build an API Gateway API with Lambda Integration
The purpose of the secret is to avoid the server from starting if an unauthorized person (or a bot) happens to hit the API Gateway endpoint. They will not provide the secret, so the server will not be started.
Stopping the server after a period of non-use is a different matter. The library you referenced might be able to assist with finding a way to do this. You could have a script running on the Minecraft server that monitors the game and, after a period of inactivity, simply calls the operating system to perform a Shutdown.
You could use a BungeeCord hub server that then allows user to begin a connection to the main server and spin it up via. AWS.
This would require the bungee server to be always up however, but the cost of hosting a small bungee server should be relatively cheap.
I don't think there's any way you could do this without having a small server that receives the initial request to spin up the AWS machine.
I am looking for a service or framework in Native AWS which given, a csv file, creates a task and process that task asynchronously and returns a task id or job id to the client and notifies the client when the task is completed. Some requirements for this:
Client should be able to check the progress of the task by job id at any time.
Processing of entire task can take more than 15 mins.
There should be a way for clients to see the reasons of failures.
All the business logic would be at line item level. (this is the only thing developer should care about)
Is there any in-built service or framework for that in Native AWS? I know one can build this kind of service using some SQS, Lambda, SNS, Dynamodb but I am just looking if there is a already available AWS offering for it, which can do all of these?
The closest service to this concept is AWS Step Functions.
However, it would just be one component of a solution. You would still need to create the compute component by using Amazon EC2 or AWS Lambda. You would need to build the interface for users, add authentication, notifications, etc.
Bottom line: There is no AWS service that does what you describe. However, there are the building blocks if you wish to create one yourself.
I have a system that consists of one central server, many mobile clients and many worker server. Each worker server has its own database and may be on the customer infrastructure (when he purchases the on premise installation).
In my current design, mobile clients send updates to the central server, which updates its database. The worker servers periodically pull the central to get updated information. This "pull model" creates a lot of requests and is still not suficient, because workers often use outdated information.
I want a "push model", where the central server can "post" updates to "somewhere", which persist the last version of the data. Then workers can "subscribe" to this "somewhere" and be always up-to-date.
The main problems are:
A worker server may be offline when an update happen. When it come back online, it should receive the updates it lost.
A new worker server may be created and need to get the updated data, even the data that was posted before it exists.
A bonus point:
Not needing to manage this "somewhere" myself. My application is deployed at AWS, so if there's any combination of services I can use to achieve that would be great. Everything I found has limited time data retention.
The problems with a push model are:
If clients are offline, the central system would need a retry method, which would generate many more requests than a push model
The clients might be behind firewalls, so cannot receive the message
It is not scalable
A pull model is much more efficient:
Clients should retrieve the latest data when the start, and also at regular intervals
New clients simply connect to the central server -- no need to update the central server with a list of clients (depending upon your security needs)
It is much more scalable
There are several options for serving traffic to pull requests:
Via an API call, powered by AWS API Gateway. You would then need an AWS Lambda function or a web server to handle the request.
Directly from DynamoDB (but the clients would require access credentials)
From an Amazon S3 bucket
Using an S3 bucket has many advantages: Highly scalable, a good range of security options (public; via credentials; via pre-signed URLs), no servers required.
Simply put the data in an S3 bucket and have the clients "pull" the data. You could have one set of files for "every" client, and a specific file for each individual client, thereby enabling individual configuration. Just think of S3 as a very large key-value datastore.
I'm hoping to use Amazon Lambda to run some background tasks for my web app. These particular tasks will only need to run once for the app (not once per user), so I'd like any user to see in the UI if a task is already running, and I'd like to disable the UI that allows them to start that task again.
Does Lambda offer a way to check the status of a function to see if it is running? If not, what is the best way to persist this info to my web app? Am I taking the wrong approach here altogether?
Lambda functions are supposed to be stateless and keeping functions stateless enables AWS Lambda to rapidly launch as many copies of the function as needed to scale to the rate of incoming events. While AWS Lambda’s programming model is stateless, your code can access stateful data by calling other web services, such as Amazon S3 or Amazon DynamoDB.
If I create a function to get webpages. Will it execute it on different IP per execution so that my scraping requests dont get blocked?
I would use this AWS pipeline:
Where at source on the left you will have an EC2 instance with JAUNT which then feeds the URLS or HTML pages into a Kinesis Stream. The Lambda will do your HTML parsing and via Firehose stuff everything into S3 or Redshift.
The JAUNT can run via a standard WebProxy service with a rotating IP.
Yes, lambda by default executes with random IPs. You can trigger it using things like event bridge so you can have a schedule to execute the script every hour or similar. Others can possibly recommend using API Gateway, however, it is highly insecure to expose API endpoints available for anyone to trigger. So you have to write additional logic to protect it either by hard coded headers or say oauth.
AWS Lambda doesn't have a fixed IP source as mentioned here
however, I guess this will happen when it gets cooled down, not during the same invocation.
Your internal private IP address isn't what is seen by the web server, it's the public ip address from AWS. As long as you are setting the headers and signaling to the webserver that your application is a bot. See this link for some best practices on creating a web scrapper:
https://www.blog.datahut.co/post/web-scraping-best-practices-tips
Just be respectful of how much data you pull and how often is the main thing in my opinion.
Lambda is triggered when a file is placed in S3, or data is being added to Kinesis or DynamoDB. That is often backwards of what a web scraper needs, though certainly things like S3 could perform as a queue/job runner.
Scraping on different IPs? Certainly lambda is deployed on many machines, though that doesn't actually help you, since you can't control the machines or their IP.