Amazon Web Services (AWS) EC2, EMR, S3 [closed] - amazon-web-services

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 9 years ago.
I have been learning AWS for quite some time. I would like to confirm the overall picture of what I have learned so far. I take a normal PC as an analogy:
**EC2 is similar to the arithmetic and logic unit (ALU) of a PC
EMR is similar to the OS of a PC
S3 is similar to the hard disk of a PC**
Please correct me if I am wrong, and explain AWS EC2, EMR, and S3 by comparison with another system/service, etc.
(Please don't point me to Amazon doc links/tutorials, as I have already been through all of those and I want to confirm my understanding.)
Thanks in advance

I think your analogies are reasonable from a 10,000-foot view. However, I wouldn't say they are correct, since there are a lot of subtleties involved. Let me list a few.
EC2 does handle the compute side of your application, so it plays a role similar to the one an ALU plays in a microprocessor. However, there are two major differences.
a) EC2 is not like an ALU because EC2 includes the ability to launch/terminate new compute resources. An ALU is by definition a fixed compute entity, while EC2 is by definition a system for provisioning compute resources. Very different.
b) EC2 is not stateless, but an ALU is. EC2-provided instances have disk, memory, etc., so they can carry the entire state of the application; S3 is not a required component. In a computer, the ALU by itself isn't useful; additional memory is required.
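To make the provisioning point concrete, here is a minimal boto3 sketch (the AMI ID, region, and instance type are placeholder assumptions) that launches a new compute resource and then disposes of it again - something an ALU has no equivalent of:

```python
import boto3

# Placeholder region; substitute your own.
ec2 = boto3.client("ec2", region_name="us-east-1")

# Provision a brand-new compute resource on demand.
result = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # placeholder AMI ID
    InstanceType="t3.micro",
    MinCount=1,
    MaxCount=1,
)
instance_id = result["Instances"][0]["InstanceId"]
print("Launched", instance_id)

# ...and throw it away again when done.
ec2.terminate_instances(InstanceIds=[instance_id])
```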
EMR vs. an OS: EMR is really just managed Hadoop, and Hadoop is a task distribution platform. EMR is like an OS in that it does task scheduling. However, a major part of an OS is arbitrating between different application threads, whereas Hadoop is about taking a big data problem and running it in a distributed fashion across many computers. It does no resource arbitration and works on one problem at a time, so it's not really like an OS. Apache YARN, to me, is closer to an OS, by the way.
Your S3 analogy is also partially correct. AWS has many types of storage. There is ephemeral storage, which is like memory and goes away when an instance dies. There are EBS volumes, which are persistent disks attached to instances (or sitting idle) with data on them. S3 is a third type of storage, which is like web storage: you can upload files to S3 and access them, so it is very much like a remote disk. To complete the picture, AWS also has Glacier, which is archival storage even more distant than S3.
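As a small illustration of the "remote disk" behaviour, a hedged boto3 sketch (the bucket and key names are made up):

```python
import boto3

s3 = boto3.client("s3")

# Upload a local file to a bucket (names here are hypothetical).
s3.upload_file("report.csv", "my-example-bucket", "backups/report.csv")

# Fetch it back later from anywhere that has access to the bucket.
s3.download_file("my-example-bucket", "backups/report.csv", "report-copy.csv")
```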
Hope this helps.


Are there AWS services to facilitate an on-premise DR environment? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 5 days ago.
I have a customer who already has a Production workload running in AWS, and they are exploring DR options. They have a bit of on-prem infrastructure available and were wondering if they could use their on-prem data center as the DR environment for AWS. In searching through the docs or Googling for a solution, everything that comes back is the more typical use case of running production on-prem and DR in AWS. Has anyone seen this reverse setup, and if so, what was the approach?
I would recommend against this approach.
Due to the elastic nature of cloud compute, it is possible to deploy Disaster Recovery systems in the cloud when required, without having them running all the time. This is a different approach to traditional on-premises DR where equipment is sitting unused "just in case" it is required.
This also means that a DR system can be deployed in the cloud that is identical to Production, whereas attempting to deploy on "a bit of on-prem infrastructure" would be quite complex because it would not be identical to Production.
If your customer deployed to AWS using Infrastructure as Code (e.g. using CloudFormation or Terraform), then deploying the DR system would be as simple as running a script.
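For example, if Production is described by a CloudFormation template, standing up the DR copy could look roughly like this boto3 sketch (the stack name, template URL, and DR region are assumptions for illustration):

```python
import boto3

cfn = boto3.client("cloudformation", region_name="us-west-2")  # assumed DR region

# Create the DR stack from the same template used for Production.
cfn.create_stack(
    StackName="prod-dr",                                         # hypothetical name
    TemplateURL="https://s3.amazonaws.com/my-bucket/prod.yaml",  # hypothetical template
    Capabilities=["CAPABILITY_NAMED_IAM"],
)

# Wait until the stack has finished creating.
cfn.get_waiter("stack_create_complete").wait(StackName="prod-dr")
```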
An alternative approach to Disaster Recovery is not to fail over, but instead to always run systems in parallel. For example, instead of two web servers running in one Availability Zone, run one web server in each of two different Availability Zones. A Load Balancer would be able to direct traffic to both web servers. If one Availability Zone were to experience a disaster, the web server and load balancer running in the other Availability Zone would continue to operate. This approach is "High Availability" as opposed to Disaster Recovery. Under such an architecture, the system can keep operating even when suffering failures, as opposed to having to "fail over" to an alternative site. Plus, it avoids having to "fail back" to the original site, which is typically the hardest part of it all.
An analogy: High Availability is a bit like having two small trucks instead of one large truck -- work can continue even if one truck fails, and another truck can be 'launched' relatively quickly. In contrast, Disaster Recovery would react to a broken truck by pulling a horse out of the stable.
For more information, see: Disaster recovery options in the cloud - Disaster Recovery of Workloads on AWS: Recovery in the Cloud

AWS best way to handle high volume transactions [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 2 years ago.
I am writing a system that has an extremely high volume of transactions (CRUD), and I am working with AWS. What considerations must I keep in mind, given that none of the data should be lost?
I have done some research, and the advice is to use SQS queues to make sure that data is not lost. What other backup, redundancy, and fast-processing considerations should I keep in mind?
If you want to create a system that is highly resilient whilst also being redundant, I would advise you to read the AWS Well-Architected Framework. It goes into more detail than a person can provide on Stack Overflow.
Regarding individual technologies:
If your workload is transactional, as you said, then you should look at using a relational data store for the data. I'd recommend taking a look at Amazon Aurora; it has built-in features like auto scaling of read replicas and multi-master support. Whilst you might be expecting large numbers, by using autoscaling you will only pay for what you use.
Try to decouple your APIs: have a thin validation layer before handing off to your backend if you can. Technologies like SQS (as you mentioned) help with decoupling when combined with Lambda.
SQS guarantees at-least-once delivery, so if your system must not write duplicates, you'll want to account for idempotency in your application.
Also use a dead-letter queue (DLQ) to handle any failed actions.
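A rough boto3 sketch of the SQS-plus-DLQ wiring (the queue names and retry count are assumptions); since delivery is at-least-once, the consumer is where you would add an idempotency check, e.g. deduplicating on a business key:

```python
import json
import boto3

sqs = boto3.client("sqs")

# Dead-letter queue for messages that repeatedly fail processing.
dlq_url = sqs.create_queue(QueueName="orders-dlq")["QueueUrl"]
dlq_arn = sqs.get_queue_attributes(
    QueueUrl=dlq_url, AttributeNames=["QueueArn"]
)["Attributes"]["QueueArn"]

# Main queue: after 5 failed receives a message is moved to the DLQ.
sqs.create_queue(
    QueueName="orders",
    Attributes={
        "RedrivePolicy": json.dumps(
            {"deadLetterTargetArn": dlq_arn, "maxReceiveCount": "5"}
        )
    },
)
```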
Ensure any resources residing in your VPC are spread across availability zones.
Use S3, AWS Backup (for EC2/EBS), and RDS snapshots to ensure data is backed up. Most other services have some sort of backup functionality you can enable.
Use autoscaling wherever possible to ensure you're reducing costs.
Build any infrastructure using an IaC tool (CloudFormation or Terraform), and do any configuration of resources via a tool like Ansible, Puppet, or Chef. Try to follow a pre-baked AMI workflow to ensure that it is quick to return to the base server state.

Data Engineering - infrastructure/services for efficient extraction of data (AWS) [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 4 years ago.
Let's assume the standard data engineering problem:
every day at 3.00 AM connect to an API
download data
store them in a data lake
Let's say there is a python script that does the API hit and storage, but that is not that important.
Ideally I would like to have some service that comes alive, runs this script, and kills itself... So far, I have thought about these possibilities (using AWS services):
(AWS) Lambda - FaaS, an ideal match for the use case. But there is a problem: the bandwidth of the function (limited RAM/CPU) and the timeout of 5 minutes.
(AWS) Lambda + Step Functions + range requests: fire multiple Lambdas in parallel, each downloading a part of the file, with coordination via Step Functions. It solves the issue of 1), but it feels very complicated.
(AWS EC2) Static VM: the classic approach: I have a VM, I have a Python interpreter, I have a cron job -> every night I run the script. Or every night, I can trigger the build of a new EC2 machine using CloudFormation, run the script, and then kill it. Problems: feels very old-school - like there has to be a better way to do it.
(AWS ECS) Docker: I have very little experience with Docker. Probably similar to the VM case, but it feels more versatile/controllable. I don't know if there is a good orchestrator for this kind of job and how easy it is (starting a container and killing it).
How I see it:
Exactly what I would like to have, but it is not good for downloading big data because of the resource constraints.
Complicated workaround for 1)
Feels very old-school, with additional DevOps expenses
Don't know a lot about this topic; feels like the current state of the art
My question is: what is the current state of the art for this kind of job? What services are useful, and what are the experiences with them?
A variation on #3... Launch a Linux Amazon EC2 instance with a User Data script, with Shutdown Behavior set to Terminate.
The User Data script performs the download and copies the data to Amazon S3. It then executes sudo shutdown -h to turn off the instance. (Or, if the script is complex, the User Data script can download a program from an S3 bucket, then execute it.)
Linux EC2 instances are now charged per-second, so think of it like a larger version of Lambda that has more disk space and does not have a 5-minute limit.
There is no need to use CloudFormation to launch the instance, because you'd then also have to delete the CloudFormation stack afterwards. Instead, just launch the instance directly with the necessary parameters. You could even create a Launch Template with the parameters and then simply launch an instance from the Launch Template.
You could even add a few smarts to the process and launch the instance using Spot Pricing (set the bid price to normal On-Demand pricing, since worst case you'll just pay the normal price). If the Spot Instance won't launch due to insufficient spare capacity, then launch an On-Demand instance instead.
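A hedged boto3 sketch of that pattern - a User Data script that does the work, copies the result to S3, and shuts the instance down, with shutdown behaviour set to terminate and an optional Spot request; the AMI, bucket, and script path are placeholders:

```python
import boto3

# Runs at boot as root; the download script and bucket below are hypothetical.
user_data = """#!/bin/bash
python3 /opt/fetch_data.py
aws s3 cp /tmp/output/ s3://my-data-lake/ --recursive
shutdown -h now   # with the setting below, this terminates the instance
"""

ec2 = boto3.client("ec2", region_name="us-east-1")  # placeholder region

ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # placeholder AMI
    InstanceType="m5.large",
    MinCount=1,
    MaxCount=1,
    UserData=user_data,
    # Make the shutdown at the end of the script delete the instance.
    InstanceInitiatedShutdownBehavior="terminate",
    # Optional: ask for Spot capacity; drop this argument for On-Demand.
    InstanceMarketOptions={"MarketType": "spot"},
)
```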

Moving wordpress to Amazon Web Services [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Closed 10 years ago.
I'm planning to move my website, which runs multiple WordPress installations, to Amazon Web Services. My daily visitors are about 22,000 and sometimes go over 90k, and the site crashes! The hosting company charges me nearly $100 including support; without support it would cost $50. The average bandwidth is about 400 GB.
Can I ask how much this will cost me, and how I can get started with Amazon Web Services?
Kind regards
Start out by looking at the different types of hosting that Amazon offers and which one will be the correct fit for your site. Amazon EC2 (Elastic Compute Cloud) provides the servers that you can have hosted in the cloud.
Costs differ depending on how much storage space and bandwidth you use. They have a helpful cost guide on their EC2 page, with different pricing for the different types of servers you need. They offer On-Demand and Spot Instances, which can be brought up and down on the fly. If you need a server to be running constantly, you can pay an upfront fee and have a Reserved Instance to provide the server.
You can calculate your fees depending on your current usage from the tools AWS provides. http://calculator.s3.amazonaws.com/calc5.html
This is also a good article for getting started with using WordPress under AWS.
http://wp.tutsplus.com/tutorials/scaling-caching/deploy-your-wordpress-blog-to-the-cloud/
AWS also provides a Free Tier of services provided you stay under a certain amount of usage. That is detailed at http://aws.amazon.com/free/ . I also found this YouTube video on setting up EC2 instances very helpful. http://www.youtube.com/watch?v=JPFoDnjR8e8 . From what I understand, unless your WordPress install gets a crazy number of hits you will probably fall under the Free Tier.

Is Amazon Web Services reasonably priced for a personal server? [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Closed 10 years ago.
I currently have a Linux, Apache, MySQL, PHP, Postfix web server that I set up on a spare computer at home, and I am exploring transferring it to Amazon Web Services. It's about as simple a personal web server as it gets; I mainly use it for personal experimentation for PHP development, I have a blog, it hosts my e-mail, plus I do some C++ development on the server and run some small executable and networked personal applications.
The only traffic the server really sees is me (on a daily basis), plus some web crawlers, and the occasional hit from a Google search.
Is it reasonable to transfer my server to Amazon Web Services? Or is Amazon Web Services specifically targeted at larger-scale servers? Roughly what is the cheapest I can expect to pay for this hosting?
I tried using the AWS Simple Monthly Calculator but had a hard time estimating the numbers. Perhaps someone is doing something similar to my plans, and can inform me of what they are paying.
One of the reasons I am interested in AWS, is I am contemplating using my website as cloud storage for a mobile application I am working on, and if that application takes off quickly, I would like to be able to quickly scale to the traffic.
If you need a simple setup, it is sufficient to use a t1.micro instance. The monthly price for such an instance (depending on the location of the server) is about US$15. If you plan to run your server for a longer time, consider using Reserved Instances: you pay a one-time fee and get reduced hourly prices afterwards. If you run your server all the time, you should use a "High Utilization" Reserved Instance. I think you won't get a lot of traffic or EBS requests, so I would focus on the main cost component, which is EC2 instance hours.
Here is a basic example calculation with the above setup as a starting point. This calculation does not include the one-year Free Tier that Amazon offers.
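A back-of-the-envelope version of that calculation in Python, assuming a historical on-demand rate of roughly $0.02/hour for a t1.micro (the rate is an assumption; check the current pricing page):

```python
hourly_rate = 0.02          # assumed historical t1.micro on-demand rate, USD/hour
hours_per_month = 24 * 30   # ~720 hours

monthly_cost = hourly_rate * hours_per_month
print(f"~${monthly_cost:.2f} per month")   # roughly $14-15, matching the estimate above
```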
If you need to scale, then you have a lot of options available. You can launch bigger instances if you need it. Have a look at the instance types page to get an overview (also includes details on the Micro instance). If scaling and possible upgrades are a main factor in your decision, then you should consider AWS.
Is it reasonable to transfer my server to Amazon Web Services
I think yes. Amazon has a list of Linux images you can use to get a free server at no cost. Bear in mind that, for example, with the free server's DB you can't connect to your database from an external IP (i.e. from an external DB tool), but port redirection will work.
Usually I use Amazon for demo versions (< 10k users), but it works great.