AWS: Instances and Reliability - amazon-web-services

Short of creating ginormous instances, is there any way to either force instances to run on separate physical machines or detect how many physical machines are being used by multiple instances of the same image on Amazon Web Services (AWS)?
I'm thinking about reliability here. If I fool myself into thinking that I have three independent servers for fault tolerance purposes (think Paxos, Quicksilver, ZooKeeper, etc.) because I have three different instances running, but all three end up running on the same physical machine, I could be in for a very, very rude surprise.
I realize the issue may be forced by using separate regions, but it would be nice to know if there is an intra-region or even intra-availability-zone solution, as I'm not sure I've ever seen AWS actually give me more than one availability-zone choice in the supposedly multi-choice pulldown menu when creating an instance.
OK, I appreciate the advice from the first two answers to my question, but I was trying to simplify the problem, without writing a novel, by positing 3 machines in one region. Let me try again: as I scale a hypothetical app stack up/outward, I'm going to both statically and dynamically ("elastically") add instances. Of course, any manner of failure/disaster can happen (including an entire data center burning to the ground due to, say, an unfortunate breakroom accident involving a microwave, a CD, and two idiots saying "oh yeah? well watch this!!!"), but by far the most likely is a hard machine failure of some sort, followed not too far behind by a dead port. Running multiple instances of the same type T on a single piece of virtualized hardware adds computation power, but not fault tolerance.

Obviously, if I'm scaling up/out, I'm most likely going to be using "larger" instances. Obviously, if AWS's largest machine has memory size M and processor count C, and I choose an instance with memory size m such that m > (M/2) or with CPU count c such that c > (C/2), then I guarantee my instances run on separate machines. However, I don't know what Mmax and Cmax are today; I certainly don't know what they will be a year from now, or two years from now, and so on, as Amazon buys Bigger Better Faster.

I know this sounds like nitpicking and belaboring the point, but not knowing how instances are distributed, or whether there is a mechanism to control instance distribution, means I can make genuine mistakes in assumptions: in calculating the effective F+1 or 2F+1 using current distributed computing algorithms, in evaluating new algorithms for use in new applications, in sharding and locality decisions, in minimum reserved vs. elastic instance counts for portions of the app stack that see less traffic, etc.
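To make the failure arithmetic concrete, here is a rough sketch (Python, purely illustrative) of the counts I'm trying to get right; the formulas only hold if "instance failure" and "machine failure" are actually independent, which is exactly the assumption I can't verify:

```python
# Rough quorum arithmetic for crash-fault (F+1) and consensus-style (2F+1)
# replication. The catch: "instances" only count as failure domains if they
# run on separate physical machines.

def instances_needed(f_machine_failures, consensus=True):
    """Instances required to survive F *machine* failures, assuming
    one instance per physical machine."""
    return 2 * f_machine_failures + 1 if consensus else f_machine_failures + 1

def failures_tolerated(n_instances, n_physical_machines, consensus=True):
    """Worst case: instances pile onto as few machines as possible, so the
    effective replica count is the number of distinct machines."""
    effective = min(n_instances, n_physical_machines)
    return (effective - 1) // 2 if consensus else effective - 1

# 3 ZooKeeper/Paxos-style instances on 3 machines: tolerates 1 machine loss.
print(failures_tolerated(3, 3))   # -> 1
# The rude surprise: 3 instances that all landed on 1 machine tolerate nothing.
print(failures_tolerated(3, 1))   # -> 0
```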

You always have at least two availability zones per region, and that should work for high-availability scenarios. Intra-AZ separation would not get you very far on reliability anyway, as a whole AZ may go down (unlikely, but possible).
If you absolutely must force "intra-AZ, separate hardware", dedicated instances in different accounts would achieve that, but they would cost more and would not be much better.
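If you do go that route, here is a minimal boto3 sketch of what launching onto dedicated (single-tenant) hardware looks like; the AMI, key pair, and subnet IDs are placeholders, and note that dedicated tenancy only isolates you from other accounts' instances, which is why the separate-accounts trick is needed to keep your own instances apart:

```python
# Sketch: launch an instance on single-tenant (dedicated) hardware with boto3.
# The AMI ID, key name, and subnet ID below are placeholders, not real resources.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.run_instances(
    ImageId="ami-xxxxxxxx",              # placeholder AMI
    InstanceType="m3.large",
    KeyName="my-key",                    # placeholder key pair
    SubnetId="subnet-xxxxxxxx",          # dedicated tenancy requires a VPC subnet
    MinCount=1,
    MaxCount=1,
    Placement={"Tenancy": "dedicated"},  # single-tenant hardware
)
print(response["Instances"][0]["InstanceId"])
```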

Not only are there multiple availability zones (think separate data centers) within each region, you can also have servers split across different regions (west coast, east coast, Europe, etc.).
As far as redundancy and reliability are concerned, you're much better off spreading your work across AZs and regions than trying to figure out or ensure that instances within a single AZ are on separate pieces of hardware.
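For what it's worth, here is a hedged boto3 sketch of the "spread across AZs" approach (the AMI ID and instance type are placeholders): one replica is requested per availability zone, so no single data-center failure takes out more than one of them.

```python
# Sketch: place one replica in each availability zone of a region using boto3.
# The AMI ID and instance type are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

zones = [
    z["ZoneName"]
    for z in ec2.describe_availability_zones()["AvailabilityZones"]
    if z["State"] == "available"
]

for zone in zones:
    ec2.run_instances(
        ImageId="ami-xxxxxxxx",                 # placeholder AMI
        InstanceType="t2.micro",
        MinCount=1,
        MaxCount=1,
        Placement={"AvailabilityZone": zone},   # pin each replica to its own AZ
    )
```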

Related

What are some possible causes for CPU Spikes in AWS that are resolved by a new deployment?

I have several enterprise applications in AWS ECS built in Java that begin to incur high CPU usage - I mean a large spike - and it continues until I re-release. The code version can be exactly the same, but for some reason, by releasing and renewing the resources, the CPU goes back down to a sane level.
What are some possible causes of a scenario like this? There is no memory spike to correlate the CPU spikes with. It is not a code issue because different applications with completely different code bases experience the same symptoms. But obviously not all applications have this issue, so it is a tangible, solvable problem (not a "bug" in AWS).
What techniques can I use to diagnose and, once and for all, fix this widespread problem in our services? If it involves a CPU dump, what would one look for, and what indications would I expect to see that would point to specific problems?

Several t2.micro better than a single t2.small or t2.medium?

I read EC2's docs: instance types, pricing, FAQ, burstable performance and also this about CPU credits.
I even asked AWS support the following, and the answer wasn't clear.
The thing is, according to the docs (although they're not too clear) and AWS support, all 3 instance types have the same performance while bursting: 100% usage of a certain type of CPU core.
So this is my thought process, assuming a t2.micro's RAM is enough and that the software can scale horizontally. Two t2.micro instances cost the same as one t2.small; assuming the load is distributed equally between them (probably via an AWS load balancer), they will use the same amount of total CPU and consume the same amount of CPU credits. If they were to fall back to baseline performance, it would be the same.
BUT, while they are bursting, 2 t2.micro can achieve 2x the performance of a t2.small (again, for the same cost). The same concept applies to t2.medium. Also, using smaller instances allows for tighter auto (or manual) scaling, which allows one to save money.
So my question is: given that RAM and horizontal scaling are not a problem, why would one use anything other than t2.micro?
EDIT: After some replies, here are a few notes about them:
I asked AWS support and, supposedly, each vCPU of the t2.medium can achieve 50% of "the full core". This means the same argument applies to t2.medium (if what they said was correct).
t2.micro instances CAN be used in production. Depending on the technology and implementation, a single instance can handle over 400 RPS. I do, and so does this guy.
They do require a closer look to make sure credits don't run low, but I don't accept that as a reason not to use them.
Your analysis seems correct.
While the processor type isn't clearly documented, I typically see my t2.micro instances equipped with one Intel Xeon E5-2670 v2 (Ivy Bridge) core, and my t2.medium instances have two of them.
The micro and small should indeed have the same burst performance for as long as they have a reasonable number of CPU credits remaining. I say "a reasonable number" because the performance is documented to degrade gracefully over a 15 minute window, rather than dropping off sharply like the t1.micro does.
Everything about the three classes multiplies by two as you step up (except the core count, which is the same for micro and small): baseline, credits earned per hour, and credit cap. Arguably, the medium is very closely equivalent to two smalls when it comes to short-term burst performance (with its two cores), but then again, that's also exactly the capability you have with two micros, as you point out. If memory is not a concern and traffic is appropriately bursty, your analysis is sensible.
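To put rough numbers on "everything multiplies by two", here is a small sketch; the figures are the ones the T2 documentation listed at the time, so treat them as assumptions that may since have changed:

```python
# Back-of-the-envelope burst comparison for the original T2 sizes.
# Figures are from the T2 documentation of the time and may have changed;
# 1 CPU credit = 1 vCPU running at 100% for 1 minute.
t2 = {
    #             vCPUs  credits/hour  max credit balance
    "t2.micro":  (1,     6,            144),
    "t2.small":  (1,     12,           288),
    "t2.medium": (2,     24,           576),
}

def full_burst_minutes(instance_type, count=1):
    """How long `count` instances can run all vCPUs at 100% from a full balance."""
    vcpus, _, max_balance = t2[instance_type]
    burn_per_minute = vcpus * count           # credits consumed per minute at 100%
    return (max_balance * count) / burn_per_minute

print(full_burst_minutes("t2.small"))            # 288 minutes on one core
print(full_burst_minutes("t2.micro", count=2))   # 144 minutes, but on two cores at once
```

Same total work per full balance either way; the two micros just deliver it across two cores in half the wall-clock time, which lines up with the "half as long" point made below.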
While the t1 class was almost completely unsuited to a production environment, the same thing is not true of the t2 class. They are worlds apart.
If your code is tight and efficient with memory, and your workload is appropriate for the cpu credit-based model, then I concur with your analysis about the excellent value a t2.micro represents.
Of course, that's a huge "if." However, I have systems in my networks that fit this model perfectly -- their memory is allocated almost entirely at startup and their load is relatively light but significantly variable over the course of a day. As long as you don't approach exhaustion of your credit balances, there's nothing I see wrong with this approach.
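If you do go this route, the thing to keep an eye on is the CPUCreditBalance CloudWatch metric, so you notice credit exhaustion before your users do. A minimal boto3 sketch (the instance ID is a placeholder):

```python
# Sketch: read the CPUCreditBalance metric for a T2 instance from CloudWatch.
# The instance ID is a placeholder.
from datetime import datetime, timedelta
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

stats = cloudwatch.get_metric_statistics(
    Namespace="AWS/EC2",
    MetricName="CPUCreditBalance",
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
    StartTime=datetime.utcnow() - timedelta(hours=3),
    EndTime=datetime.utcnow(),
    Period=300,                      # 5-minute datapoints
    Statistics=["Average"],
)

for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Average"])
```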
There are a lot of moving targets here. What are your instances doing?
You said the traffic varies over the day but is not spiky. So if you wish to "closely follow" the load with a small number of t2.micro instances, you won't be able to rely much on bursting, because each newly scaled-up instance starts with a low CPU credit balance. If most of your instances run only while they are under load, they will never accumulate CPU credits.
You also lose time and money on each startup and on the started-but-unused usage hours, so scaling up and down too frequently isn't the most cost-efficient approach.
Last but not least, the operating system and other software have a more or less fixed overhead; running them twice instead of once takes more resources away from your application on a system where you earn CPU credits only below 20% load.
If you want extreme cost efficiency, use spot instances.
The maximum credit balance assigned to each instance type varies. So while two micros can provide double the performance of a small during a burst, they can only do so for half as long.
I generally prefer at least two instances for availability purposes. But with the burstable model, workload also comes into consideration. Are you looking at sustained load? or are you expecting random spikes throughout the day?

Erlang concurrency/distribution - then vs. now

As to the question of how many nodes can be in an Erlang system on a practical (not theoretical) level, I've seen answers ranging from 100 in most cases to one answer which stated "150-200 max."
I was surprised to see this, because wasn't Erlang designed for massive concurrency and distribution in order to implement telecom networks, phone switches, etc.? If so, wouldn't you assume (I know I did) that this would entail more than 100 nodes in a system (I always assumed in the hundreds, possibly thousands)?
I guess my question is: what was considered "massive concurrency/distribution" back when these old telecoms used Erlang? How many machines would they typically have connected together, running Erlang and doing concurrency?
Just curious, and thanks for any answers.
You got the answer: for a cluster of nodes, with current technology, the practical limit is 100 to 200 nodes, because we are talking about almost transparent distribution. The reasons for this limitation are explained in the documentation; in a few words, they come down to the mutual monitoring of every node by every other node, so the bandwidth and resources available to your application shrink faster and faster as the cluster grows (see the quick arithmetic below).
To have more nodes, you must program the cooperation between clusters and/or single nodes yourself. The libraries offer some facilities to do that, but of course it is not transparent, and not Erlang-specific.
It is also recommended for security reasons to avoid huge clusters: today, in an Erlang cluster, you can do whatever you want on any other node without restriction.
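To see why the overhead grows the way it does: a fully meshed cluster of N nodes maintains N(N-1)/2 connections, each carrying its own monitoring traffic. A quick illustration of the arithmetic (in Python, just for the numbers):

```python
# Connections in a fully meshed cluster: every node connects to every other
# node, so link count (and mutual monitoring traffic) grows quadratically
# while the share of bandwidth left for the application keeps shrinking.
def mesh_links(n_nodes):
    return n_nodes * (n_nodes - 1) // 2

for n in (10, 100, 200, 1000):
    print(f"{n:5d} nodes -> {mesh_links(n):7d} connections, "
          f"{n - 1:4d} monitored peers per node")
```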
It depends. It depends on lots of things you didn't specify or define, and I suspect that if you specified enough that a "real" answer was possible you would be disappointed because it wouldn't be useful. That's why these sorts of questions are generally discouraged.
You don't say what date range you mean by "when these old telecoms used Erlang". They still use it (it's never had traction outside of Ericsson and there was never a time when Ericsson used it significantly more than the present). Here's a video of them talking about using Erlang on their SGSN-MME: http://vimeo.com/44718243
You don't say what you mean by "an Erlang system". Is that a single machine? Erlang did not have SMP support when it started (is that the time frame you're asking about?). Do you mean concurrent processes?
Is that a single cluster using net_kernel:connect_node/1? How are you defining a cluster? Erlang clusters, by default, are a complete mesh. That limits the maximum size based on the performance limits of the network and the machines' interfaces. But you can connect nodes in a chain and then there's no limit. But if you count that as a cluster, why not count it when you use your own TCP connections instead of just net_kernel's? There are lots of Ericsson routers in use on the Internet, so we could think of the Internet as one "system" where many of its component routers are using Erlang.
In the video I linked, you can see that in the early 2000s, Ericsson's SGSN product was a single box (containing multiple machines) that could serve maybe a few thousand mobile phones simultaneously. We might assume that each connected phone had one Erlang process managing it, plus a negligible number of system processes.

Difference between Fork/Join and Map/Reduce

What is the key difference between Fork/Join and Map/Reduce?
Do they differ in the kind of decomposition and distribution (data vs. computation)?
One key difference is that F-J seems to be designed to work on a single Java VM, while M-R is explicitly designed to work on a large cluster of machines. These are very different scenarios.
F-J offers facilities to partition a task into several subtasks, in a recursive-looking fashion; more tiers, possibility of 'inter-fork' communication at this stage, much more traditional programming. Does not extend (at least in the paper) beyond a single machine. Great for taking advantage of your eight-core.
M-R only does one big split, with the mapped splits not talking between each other at all, and then reduces everything together. A single tier, no inter-split communication until reduce, and massively scalable. Great for taking advantage of your share of the cloud.
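As a toy illustration of the structural difference (written in Python rather than the actual Java Fork/Join framework or Hadoop, so purely a sketch): fork/join splits recursively and joins partial results as the recursion unwinds, while map/reduce does one flat split, maps each chunk independently, and reduces once at the end.

```python
# Toy sketch of the two decomposition styles, counting words in a list of lines.
# Illustrative Python only, not the Java Fork/Join framework or Hadoop itself.
from functools import reduce

# Fork/Join style: recursive splitting, subtasks joined as the recursion unwinds.
def fork_join_count(lines, threshold=4):
    if len(lines) <= threshold:                       # small enough: compute directly
        return sum(len(line.split()) for line in lines)
    mid = len(lines) // 2
    left = fork_join_count(lines[:mid], threshold)    # "fork"
    right = fork_join_count(lines[mid:], threshold)   # "fork"
    return left + right                               # "join"

# Map/Reduce style: one flat split, independent map over chunks, single reduce.
def map_reduce_count(lines, chunk_size=4):
    chunks = [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]
    mapped = [sum(len(line.split()) for line in chunk) for chunk in chunks]  # map
    return reduce(lambda a, b: a + b, mapped, 0)                             # reduce

lines = ["the quick brown fox", "jumps over", "the lazy dog"] * 10
assert fork_join_count(lines) == map_reduce_count(lines) == 90
```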
There is a whole scientific paper on the subject, Comparing Fork/Join and MapReduce.
The paper compares the performance, scalability and programmability of three parallel paradigms: fork/join, MapReduce, and a hybrid approach.
What they find is basically that Java fork/join has low startup latency and scales well for small inputs (<5MB), but it cannot process larger inputs due to the size restrictions of shared-memory, single-node architectures. On the other hand, MapReduce has significant startup latency (tens of seconds), but scales well for much larger inputs (>100MB) on a compute cluster.
But there is a lot more to read there if you're up for it.

What challenges promote the use of parallel/concurrent architectures?

I am quite excited by the possibility of using languages which have parallelism / concurrency built in, such as stackless python and erlang, and have a firm belief that we'll all have to move in that direction before too long - or will want to because it will be a good/easy way to get to scalability and performance.
However, I am so used to thinking about solutions in a linear/serial/OOP/functional way that I am struggling to cast any of my domain problems in a way that merits using concurrency. I suspect I just need to unlearn a lot, but I thought I would ask the following:
Have you implemented anything reasonably large in stackless or erlang or other?
Why was it a good choice? Was it a good choice? Would you do it again?
What characteristics of your problem meant that concurrent/parallel was right?
Did you re-cast an existing problem to take advantage of concurrency/parallelism? And
if so, how?
Anyone have any experience they are willing to share?
In the past, when desktop machines had a single CPU, parallelization only applied to "special" parallel hardware. But these days desktops usually have from 2 to 8 cores, so parallel hardware is now the standard. That's a big difference, and therefore it is not just about which problems suggest parallelism, but also about how to apply parallelism to a wider set of problems than before.
In order to take advantage of parallelism, you usually need to recast your problem in some way. Parallelism changes the playground in many ways:
You get data coherence and locking problems, so you need to organize your problem so that you have semi-independent data structures which can be handled by different threads, processes, and computation nodes (see the sketch after this list).
Parallelism can also introduce nondeterminism into your computation, if the relative order in which the parallel components do their jobs affects the results. You may need to protect against that, and define a parallel version of your algorithm which is robust against different scheduling orders.
When you transcend intra-motherboard parallelism and get into networked / cluster / grid computing, you also get the issues of network bandwidth, network going down, and the proper management of failing computational nodes. You may need to modify your problem so that it becomes easier to handle the situations where part of the computation gets lost when a network node goes down.
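A minimal Python sketch of that "semi-independent data structures" idea, under the assumption of an embarrassingly parallel reduction: each worker owns its shard, so there is nothing to lock, and the partial results are merged in a fixed order so scheduling differences cannot change the answer (which also addresses the nondeterminism point).

```python
# Sketch: recast a computation into semi-independent shards so workers never
# touch shared mutable state, and merge the partial results in a fixed order
# so different scheduling orders cannot change the answer.
from concurrent.futures import ProcessPoolExecutor

def shard_sum_of_squares(shard):
    """Pure function over its own slice of the data: no locks needed."""
    return sum(x * x for x in shard)

def parallel_sum_of_squares(data, workers=4):
    shards = [data[i::workers] for i in range(workers)]   # partition the data
    with ProcessPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(shard_sum_of_squares, shards))
    return sum(partials)                                   # deterministic merge

if __name__ == "__main__":
    data = list(range(100_000))
    assert parallel_sum_of_squares(data) == sum(x * x for x in data)
```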
Before we had operating systems, people building applications would sit down and discuss things like:
how will we store data on disks
what file system structure will we use
what hardware will our application work with
etc, etc
Operating systems emerged from collections of 'developer libraries'.
The beauty of an operating system is that your UNWRITTEN software has certain characteristics; it can:
talk to permanent storage
talk to the network
run in a command line
be used in batch
talk to a GUI
etc, etc
Once you have shifted to an operating system - you don't go back to the status quo ante...
Erlang/OTP (ie not Erlang) is an application system - it runs on two or more computers.
The beauty of an APPLICATION SYSTEM is that your UNWRITTEN software has certain characteristics; it can:
fail over between two machines
work in a cluster
etc, etc...
Guess what, once you have shifted to an Application System - you don't go back either...
You don't have to use Erlang/OTP, Google have a good Application System in their app engine, so don't get hung up about the language syntax.
There may well be good business reasons to build on the Erlang/OTP stack not the Google App Engine - the biz dev guys in your firm will make that call for you.
The problems will stay almost the same in the future, but the underlying hardware for realizing the solutions is changing. To make use of it, the way of communication between objects (components, processes, services, whatever you call them) will change. Messages will be sent asynchronously, without waiting for a direct response; instead, after a job is done, the process will call the sender back with the answer. It's like people working together.
I'm currently designing a lightweight event-driven architecture based on Erlang/OTP. It's called Tideland EAS. I'm describing the ideas and principles here: http://code.google.com/p/tideland-eas/wiki/IdeasAndPrinciples. It's not ready, but maybe you'll understand what I mean.
mue
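A tiny sketch (in Python, not Erlang/OTP or Tideland EAS itself) of the "send asynchronously, get called back with the answer" style described above:

```python
# Sketch of asynchronous request/reply: the sender does not block waiting for
# an answer; the worker calls back with the result once the job is done.
import queue
import threading

jobs = queue.Queue()

def worker():
    while True:
        payload, callback = jobs.get()
        if payload is None:                 # shutdown signal
            break
        callback(payload * payload)         # "call the sender back with the answer"

threading.Thread(target=worker, daemon=True).start()

done = threading.Event()
def on_result(result):
    print("got result:", result)
    done.set()

jobs.put((7, on_result))                    # send the message and carry on
print("sender is free to do other work while the job runs")
done.wait()
jobs.put((None, None))                      # stop the worker
```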
Erlang makes you think of the problem in parallel. You won't forget it for one second. After a while you adapt. Not a big problem. Except the solution becomes parallel in every little corner. All other languages you have to tweak to be concurrent. And that doesn't feel natural. Then you end up hating your solution. Not fun.
The biggest advantage Erlang has is that there is no global garbage collection. It will never take a break. That is kind of important when you have 10,000 page views a second.