I'm working on a project for a client who does message pre- and post-processing at very high volumes. I'm trying to figure out a reliable configuration in which two or three API servers push outgoing email messages to any of two or more Postfix instances. This is for outbound mail only, and we'd prefer not to have a single point of failure.
I am a bit lost in the AWS ecosystem. All I know is that we cannot use SES, and the client's account is set up for high-volume SMTP with Amazon, so throttling is not an issue.
I've looked into ELB, HAProxy, and a few other things, but the whole picture has gotten muddy and I'm not sure whether I'm just overthinking it.
Any quick thoughts would be appreciated.
Thanks
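Since the main requirement is avoiding a single point of failure, one option that needs no load balancer at all is to give each API server the list of Postfix relays and let it fail over on the client side. A minimal sketch with Python's standard smtplib, assuming the relay hostnames are hypothetical placeholders and that the Postfix instances already accept relaying from the API servers:

```python
import random
import smtplib
from email.message import EmailMessage

# Hypothetical relay hostnames: replace with your own Postfix instances.
POSTFIX_RELAYS = ["postfix-a.internal", "postfix-b.internal"]


def make_message(sender, recipient, subject, body):
    """Build a plain-text message with the standard library."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_with_failover(msg, relays=POSTFIX_RELAYS, port=25, timeout=10,
                       smtp_factory=smtplib.SMTP):
    """Hand the message to the first relay that accepts it.

    Relays are tried in random order so load spreads across them; if one
    is down or refuses the message, the next one is tried. Returns the
    hostname of the relay that accepted the message.
    """
    candidates = list(relays)
    random.shuffle(candidates)
    errors = {}
    for host in candidates:
        try:
            with smtp_factory(host, port, timeout=timeout) as smtp:
                smtp.send_message(msg)
            return host
        except (smtplib.SMTPException, OSError) as exc:
            errors[host] = exc  # note why this relay failed, then try the next
    raise RuntimeError(f"no relay accepted the message: {errors!r}")
```

One caveat of client-side retries: if a relay stores the message but the connection drops before its "250 OK" reply arrives, the message will be retried on another relay and sent twice, so downstream processing should tolerate the occasional duplicate.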
We have a REST API. Right now our /health endpoint runs a smoke test against every dependency we have (a database and a couple of microservices) and returns 200 if there are no errors.
The problem is that not all dependencies are mandatory for our application to work. So while a problem accessing the database can be critical, problems accessing some microservices will only affect a small portion of our app.
On top of that we have an Amazon ELB. It doesn't seem right to mark our app as unhealthy just because one dependency is unhealthy. The ELB should only try to recover the unhealthy dependency, and with that our app would be healthy again.
Which leads to the question: what should we actually check in our health check? It looks like we shouldn't be checking any dependency at all. On the other hand, it's really helpful to know whether our app can reach all of its dependencies (e.g. for troubleshooting), so is it common to use a separate endpoint for that purpose (say /sanity or /diagnostics)?
Do not go overboard trying to check for every service, every dependency, etc. in your health check. Basically think of your health check as a Go / No Go test so that the load balancer knows if the service is running.
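The Go / No Go idea can be sketched independently of any web framework: /health checks only the dependencies the service cannot run without (here, the database) and returns 503 if any of them fail, while a separate /diagnostics endpoint reports every dependency but always returns 200. Pointing the ELB health check at /health then never takes an instance out of service because of a non-critical microservice. The function names and the dict-of-callables shape are assumptions for illustration; wire them into whatever framework serves the API.

```python
from typing import Callable, Dict, Tuple

# A check returns True when its dependency is reachable.
CheckFn = Callable[[], bool]


def run_checks(checks: Dict[str, CheckFn]) -> Dict[str, bool]:
    """Run every check; a check that raises counts as failed."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            results[name] = False
    return results


def health(critical: Dict[str, CheckFn]) -> Tuple[int, dict]:
    """Go / No Go for the load balancer: critical dependencies only."""
    ok = all(run_checks(critical).values())
    return (200, {"status": "ok"}) if ok else (503, {"status": "fail"})


def diagnostics(all_checks: Dict[str, CheckFn]) -> Tuple[int, dict]:
    """Detailed report for humans and monitoring. Always 200, so the
    load balancer never acts on it."""
    return 200, {"dependencies": run_checks(all_checks)}
```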
Load balancers will not recover failed instances. They will just take your service offline. Auto Scaling Groups can recover failed instances by creating new instances and terminating failed instances. CloudWatch can monitor your instances and report problems and cause events to happen (e.g. rebooting).
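For the CloudWatch side, EC2 publishes a StatusCheckFailed_Instance metric for each instance, and an alarm on it can use CloudWatch's built-in EC2 reboot action. A sketch with boto3; the instance ID, region and three-minute window are placeholders, and the parameters are built in a separate function so they can be inspected without AWS credentials:

```python
def reboot_alarm_params(instance_id, region, minutes=3):
    """Keyword arguments for CloudWatch put_metric_alarm: reboot the
    instance after `minutes` consecutive one-minute periods in which the
    EC2 instance status check failed."""
    return {
        "AlarmName": f"reboot-{instance_id}-status-check",
        "AlarmDescription": "Reboot instance when its status check keeps failing",
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed_Instance",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",
        "Period": 60,
        "EvaluationPeriods": minutes,
        "Threshold": 1.0,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        # Built-in EC2 alarm action: CloudWatch reboots the instance itself.
        "AlarmActions": [f"arn:aws:automate:{region}:ec2:reboot"],
    }


def create_reboot_alarm(instance_id, region):
    import boto3  # imported here so the builder above works without boto3

    cloudwatch = boto3.client("cloudwatch", region_name=region)
    cloudwatch.put_metric_alarm(**reboot_alarm_params(instance_id, region))
```

Replacing rather than rebooting is the Auto Scaling group's job; this alarm only covers the "cause events to happen" case.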
You can implement more comprehensive tests that run internally on your server and choose a reporting / recovery path. Examples might include sending an SNS notification to your email or cell phone, rebooting the server, etc.
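A small script of that kind, run from cron on the server, might look like this sketch. The checks are plain callables, only failures trigger a notification, and the SNS publishing is wrapped so the script can be tried without AWS. The hostnames, port and topic ARN you would pass in are hypothetical:

```python
import socket


def tcp_check(host, port, timeout=2.0):
    """Return a check that passes if a TCP connection to host:port opens."""
    def check():
        with socket.create_connection((host, port), timeout=timeout):
            return True
    return check


def failing_checks(checks):
    """Run every check; a check that returns False or raises has failed."""
    failed = []
    for name, check in checks.items():
        try:
            ok = check()
        except Exception:
            ok = False
        if not ok:
            failed.append(name)
    return failed


def check_and_notify(checks, publish):
    """Run the checks and call publish(subject, message) only on failure."""
    failed = failing_checks(checks)
    if failed:
        # SNS subjects must be shorter than 100 characters.
        subject = f"Health problems on {socket.gethostname()}"[:99]
        publish(subject, "Failed checks: " + ", ".join(failed))
    return failed


def sns_publisher(topic_arn, region):
    """Adapt boto3's SNS client to the publish(subject, message) shape."""
    import boto3  # imported here so the checks above run without boto3

    sns = boto3.client("sns", region_name=region)

    def publish(subject, message):
        sns.publish(TopicArn=topic_arn, Subject=subject, Message=message)

    return publish
```

For example, `check_and_notify({"db": tcp_check("db.internal", 5432)}, sns_publisher(TOPIC_ARN, "us-east-1"))`, where `db.internal` and `TOPIC_ARN` stand in for your own values. Subscribing an email address or phone number to the topic delivers the alert.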
Amazon has a number of services to help monitor, report and manage services. Look into CloudWatch for monitoring, SNS or SES for reporting, ASG for auto scaling, etc.
Think through what type of fault tolerance, high availability and recovery strategy you need for your service. Then implement an approach that is simple enough that the monitoring itself does not become a point of failure.
First of all, my knowledge of ActiveMQ, AMQPS and AWS auto scaling is fairly limited, and I have been handed a task where I need to create a scalable broker architecture for messaging over AMQPS using ActiveMQ.
In my current architecture I have a single-machine ActiveMQ broker where messaging happens over AMQP + SSL, and as a product requirement there is publisher/subscriber authentication (TLS authentication) to ensure the right parties are talking to each other. That part is working fine.
Now the problem is that I need to scale the whole broker setup on the AWS cloud with auto scaling in mind. Without auto scaling, I assume I could create a master/slave architecture using EC2 instances, but then adding more slaves would be a manual process rather than an automatic one.
I want to understand whether either of the two options below can serve the purpose:
ELB + ActiveMQ nodes being auto scaled
Something like a Bitnami-powered ActiveMQ AMI running with auto scaling enabled.
In the first case, with an ELB in front, I understand that the ELB terminates SSL, which would break my mutual authentication. I am also not sure whether my pub/sub model will still work when different ActiveMQ instances run independently with no shared DB. If it can, a pointer or some reference material would help, as I have not been able to find any myself.
In the second case, my concern again is how multiple ActiveMQ instances would coordinate with each other and ensure that every instance has access to the messages held in the queues.
The questions may be lame, but any pointers would be helpful.
AJ
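On the mutual-authentication concern: the client certificate travels inside the TLS handshake, so whatever terminates TLS is the only party that sees it. If a load balancer listener terminates SSL, the broker never receives the client's certificate; a listener that forwards raw TCP (a Classic ELB TCP listener, or a Network Load Balancer TCP listener) keeps the handshake end-to-end between client and broker. The client half of mutual TLS looks roughly like this sketch using Python's ssl module; file paths are hypothetical, and port 5671 is the conventional AMQPS port, which may differ from your broker's configuration:

```python
import socket
import ssl


def mutual_tls_context(ca_file=None, cert_file=None, key_file=None):
    """Client-side TLS context for AMQPS with mutual authentication.

    ca_file:   CA bundle that signed the broker's certificate
               (None falls back to the system's default CAs).
    cert_file: this client's certificate, presented to the broker
               during the handshake; key_file is its private key.
    """
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    # Already the defaults for SERVER_AUTH; set explicitly because the
    # scheme relies on the client verifying the broker as well.
    ctx.verify_mode = ssl.CERT_REQUIRED
    ctx.check_hostname = True
    if cert_file:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx


def open_amqps_socket(host, port=5671, ctx=None, timeout=10):
    """Open a TLS connection straight to a broker.

    If anything between client and broker terminates TLS, the broker
    never sees cert_file and its client-certificate check fails.
    """
    raw = socket.create_connection((host, port), timeout=timeout)
    context = ctx if ctx is not None else mutual_tls_context()
    try:
        return context.wrap_socket(raw, server_hostname=host)
    except Exception:
        raw.close()  # don't leak the TCP socket if the handshake fails
        raise
```

The same context can be handed to an AMQP client library if it accepts an ssl.SSLContext; whether yours does is something to confirm in its documentation.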
My customer wants to organize a live conference that will be streamed to 2,000-4,000 people online (he will do this once a month).
I don't think I can just use one server with Wowza.
What would be the most suitable/simplest solution?
I've heard of Amazon EC2. Can someone help me choose the right solution/infrastructure to live stream to 2,000-4,000 people?
Does EC2 automatically resize its bandwidth according to the number of viewers?
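A back-of-the-envelope calculation shows why a single server is a stretch. EC2 does not grow an instance's network bandwidth with demand; what is available depends on the instance type. The bitrates below are assumptions, since the question doesn't give one, and protocol overhead is ignored:

```python
def required_egress_mbps(viewers, bitrate_kbps):
    """Total outbound bandwidth (Mbit/s) if every viewer pulls its own copy
    of the stream from the same origin, ignoring protocol overhead."""
    return viewers * bitrate_kbps / 1000


# Assumed bitrates for illustration; the question doesn't give one.
for kbps in (500, 1000, 1500):
    for viewers in (2000, 4000):
        print(f"{viewers} viewers at {kbps} kbit/s -> "
              f"{required_egress_mbps(viewers, kbps) / 1000:.1f} Gbit/s")
```

At 1 Mbit/s per viewer, 4,000 viewers need about 4 Gbit/s of sustained egress from whatever serves them, which is the load the answers below spread across several servers or hand off to a CDN.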
You can use Elastic Load Balancing to spread your users' traffic across several servers so that no single server is overloaded. If you're streaming internationally, you can also use Route 53 to automatically route users to the server closest to them.
Or, if you want to do the same thing without depending on AWS, Wowza has its own solution for this: Dynamic Load Balancing.
When using Wowza's solution, it looks like you could also use this feature to distribute your live stream across all of those servers.
I'd most definitely use a CDN. You can simply push a stream to a CDN, or let the CDN pull a stream from your encoder / media server. There's no need for ELB, Route 53 or DLB: a single EC2 instance connected to CloudFront will do. If you don't want to go with AWS, choose another CDN like BitGravity, Edgecast, StreamZilla, Limelight, etc. Wowza has built-in support for connecting a stream to a CDN.
However, there are also solutions where you won't even need a media server: several services on the web can help you publish a stream and distribute it to thousands of users.
So I am moving a lot of my sites' "live" updates (currently done over AJAX) to WebSockets. I tried Pusher.com, but the pricing is ridiculously high for my amount of traffic, so I got slanger (cheers Steve!) running on a big fat EC2 instance plus a Redis EC instance, and all is good. At about 100M frames/day it seems to work fine, but I'd like to think ahead about what happens when traffic grows further.
AWS ELBs do not support WebSocket communication, from what I have read here and on the AWS forums so far (quite lame, considering people have been asking for this since WebSockets first appeared; thanks, AWS!). So I am thinking of the following:
0) start with one instance, ws.mydomain.com
1) set up an auto-scaling group
2) cloudwatch alert on average CPU/memory usage
3) when it goes above 75%, fire an SQS message saying something like "scale up now"
4) when the message from #3 is received by some other random server polling the queue, fire up a new instance, add it to the group (ohnoes, that AWS API again!) and add its public IP to the Route53 DNS record for ws.mydomain.com, so there will be two of them
5) when load drops, fire another message, basically doing everything the other way around
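The DNS part of steps 4 and 5 could be done with boto3's Route 53 client. A sketch, assuming a hypothetical hosted zone ID and that the caller already knows the complete list of live node IPs; the change batch is built separately so it can be checked without AWS:

```python
def ws_record_batch(name, ips, ttl=60):
    """Route 53 ChangeBatch that points `name` at every live node IP.

    UPSERT replaces the whole A record set, so `ips` must be the complete
    current list (existing nodes plus the new one, or minus the removed one).
    """
    values = sorted(set(ips))
    if not values:
        raise ValueError("a record set needs at least one IP")
    return {
        "Comment": "slanger node pool change",
        "Changes": [{
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": name,
                "Type": "A",
                "TTL": ttl,  # short TTL so clients pick up changes sooner
                "ResourceRecords": [{"Value": ip} for ip in values],
            },
        }],
    }


def apply_batch(hosted_zone_id, batch):
    import boto3  # imported here so the batch builder works without boto3

    route53 = boto3.client("route53")
    return route53.change_resource_record_sets(
        HostedZoneId=hosted_zone_id, ChangeBatch=batch)
```

Plain multi-value DNS does no health checking: clients keep using cached answers until the TTL expires, and a dead node stays in the record until step 5 removes it. Separately, a CloudWatch alarm can trigger an Auto Scaling policy directly, which may make the SQS relay in steps 3 and 4 unnecessary.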
So the question is: could this work, or would it be easier to go with an ELB in front of the slanger nodes?
TIA
Later edit:
1) don't care if we don't get the client IP
2) the slanger docs say that connection state is stored in Redis, so it does not matter which node a client connects to, and we don't need any session stickiness
When writing a web app with Django or such, what's the best way to connect to dynamic EC2 instances, such as a cluster of Redis or memcache instances? IP addresses change between reboots, etc. Elastic IPs are limited to 5 by default - what are some other options for auto-discovering/auto-updating which machines are available?
Late answer, but use Boto: http://boto.cloudhackers.com/en/latest/index.html
You can use security groups, tags, and other means to hit the EC2 API and pick the instances/IPs for each thing (DB Server, caching server, etc.) at load-time. We do this with great success in deployment, and are moving that way with our Django settings.py, as well.
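A sketch of the tag-based lookup. The answer refers to the original Boto library; this uses its successor boto3, and the tag key `Role` is a hypothetical convention. Only running instances are returned, and the response parsing is split out so it can be tested without AWS:

```python
def private_ips(describe_page):
    """Pull private IPs out of one describe_instances response page."""
    ips = []
    for reservation in describe_page.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            ip = instance.get("PrivateIpAddress")
            if ip:  # instances that are not running may have none
                ips.append(ip)
    return ips


def discover(role, region, tag_key="Role"):
    """IPs of running instances tagged tag_key=role (tag key is hypothetical)."""
    import boto3  # imported here so private_ips() works without boto3

    ec2 = boto3.client("ec2", region_name=region)
    paginator = ec2.get_paginator("describe_instances")
    filters = [
        {"Name": f"tag:{tag_key}", "Values": [role]},
        {"Name": "instance-state-name", "Values": ["running"]},
    ]
    ips = []
    for page in paginator.paginate(Filters=filters):
        ips.extend(private_ips(page))
    return sorted(ips)
```

In settings.py the result could feed, for example, a memcached cache LOCATION list built as "ip:11211" strings, evaluated once at load time as the answer describes.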
One method that I heard mentioned recently in an AWS webinar was to store this sort of information in SimpleDB. Essentially, you would use SimpleDB as the central configuration location, and each instance that you launch would register its IP etc. with this configuration, so you would always have a complete description of all of your instances in one place. I haven't seen this in practice so I don't know what the best practices would be exactly, but the idea sounds reasonable. I suppose you could use SNS or something to signal all the other instances whenever the configuration changes, so everyone could refresh their in-memory cache of the configuration.
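The self-registration idea could look like this sketch using boto3's SimpleDB ("sdb") client. The domain name, attribute names and roles are hypothetical, pagination of select results is ignored, and how an instance learns its own IP is left out. SimpleDB is an older service; the same pattern works with other shared stores such as DynamoDB.

```python
import time


def registration_attributes(role, ip, port):
    """Attributes this instance writes about itself; Replace=True overwrites
    old values instead of adding a second value to the same attribute."""
    values = {"role": role, "ip": ip, "port": str(port),
              "registered_at": str(int(time.time()))}
    return [{"Name": k, "Value": v, "Replace": True} for k, v in values.items()]


def parse_items(select_response):
    """Turn a SimpleDB select() response into plain dicts."""
    items = []
    for item in select_response.get("Items", []):
        record = {a["Name"]: a["Value"] for a in item.get("Attributes", [])}
        record["instance_id"] = item["Name"]
        items.append(record)
    return items


def register(domain, instance_id, role, ip, port, region):
    import boto3  # imported here so the helpers above work without boto3

    sdb = boto3.client("sdb", region_name=region)
    sdb.put_attributes(DomainName=domain, ItemName=instance_id,
                       Attributes=registration_attributes(role, ip, port))


def lookup(domain, role, region):
    import boto3

    sdb = boto3.client("sdb", region_name=region)
    # role comes from our own config, not user input; first page only.
    expr = f"select * from `{domain}` where role = '{role}'"
    return parse_items(sdb.select(SelectExpression=expr))
```

The registered_at timestamp gives readers a way to ignore entries from instances that stopped re-registering.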
I don't know the AWS administrative APIs yet really, but there's probably an API call to list your EC2 instances, at which point you could use some sort of custom protocol to ping each of them and ask it what it is -- part of the memcache cluster, Redis, etc.
I'm having a similar problem and haven't found a solution yet, because we also need to map load balancer addresses.
For your problem, there are two good alternatives:
If you are not using EC2 micro instances or load balancers, you should definitely use Amazon Virtual Private Cloud, because it lets you control instance IPs and routing tables (check all of its limitations before using it).
If you are only using EC2 instances, you could write a script that uses the EC2 API tools to run ec2-describe-instances, find all instances and their public/private IPs, map instance names to hostnames, and update /etc/hosts. Finally, put the script in the crontab of every computer/instance that needs to access the EC2 instances (see ec2-describe-instances).
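The /etc/hosts step could look like the following sketch. It assumes the hostname-to-IP mapping has already been collected (from ec2-describe-instances, the newer AWS CLI, or boto3; that parsing is left out), and it rewrites only a clearly marked block so hand-written entries survive each cron run. The marker comments are hypothetical:

```python
import os
import re

# Hypothetical markers delimiting the section this script owns.
BEGIN = "# BEGIN ec2-hosts (generated)"
END = "# END ec2-hosts"
BLOCK_RE = re.compile(re.escape(BEGIN) + r".*?" + re.escape(END) + r"\n?", re.S)


def render_block(hosts):
    """hosts maps hostname -> IP; output is sorted for stable diffs."""
    lines = [BEGIN]
    lines += [f"{ip}\t{name}" for name, ip in sorted(hosts.items())]
    lines.append(END)
    return "\n".join(lines) + "\n"


def replace_block(current, hosts):
    """Swap the generated block in /etc/hosts text, or append it if absent."""
    block = render_block(hosts)
    if BLOCK_RE.search(current):
        # A function replacement avoids re treating backslashes specially.
        return BLOCK_RE.sub(lambda _m: block, current, count=1)
    if current and not current.endswith("\n"):
        current += "\n"
    return current + block


def update_hosts_file(hosts, path="/etc/hosts"):
    """Rewrite the file atomically so readers never see a half-written copy."""
    with open(path) as f:
        current = f.read()
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        f.write(replace_block(current, hosts))
    os.replace(tmp, path)
```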
If you want to stay with EC2 instances (I'm in the same boat; I've read that you can do such things with their VPC, or use an S3 bucket or something like that), I'm in the middle of writing something like this. It's all really simple up to the point where you need to contact the server from a server in your data center. The way I'm doing it currently is to use the API to create and start the instance; once it's ready, I contact the server and execute a PowerShell script that I keep on it. The script renames the computer and reboots it, which takes care of the hostname and MAC we need for our data center firewalls. I haven't found a way yet to rename a computer remotely.
As far as knowing the IP goes, Elastic IPs are the way to go. They say you're only allowed 5 and have to apply for more, but we've been regularly requesting more and they give them to us; we're up to about 15 now and they haven't complained yet.
Another option, if you don't want to do all the computer renaming and such: you could use DHCP and set up your computer so that when it boots it gets its computer name and everything else from DHCP. I'm not sure exactly how to do this, but during my research for Amazon I've come across very smart people telling me that's the way to do it.
I would definitely recommend that you get into the Amazon API. I've been working with it for less than a month and I can do all kinds of crazy things: my code can detect areas of our system that are getting stressed, spin up 10 Amazon servers all configured to act as whatever needs stress relief, and be ready to send jobs to all of them in less than 7 minutes. Brings a tear to my eye.
The documentation is very complete, and the API itself is a work of art and a joy to program against; I've very much enjoyed working with it. (And no, I don't work for them, lol.)
Do it the traditional way: with DNS. This is what it was built for, so use it! When a machine boots, have it ask for the domain name(s) related to its function, and use that for your configuration. If it stops responding, re-resolve the DNS (or just do that periodically anyway).
I think Route 53 and Elastic Load Balancing can be used to do this, if you want to stick to Amazon solutions.
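Following the DNS approach, a client can cache resolved addresses for a bounded time and re-resolve either when that time is up or right after a connection to one of them fails. A sketch with the standard library's socket.getaddrinfo; the hostname `redis.internal` and the 60-second max_age are hypothetical, and because getaddrinfo does not expose the DNS record's TTL, max_age is a fixed stand-in for it:

```python
import socket
import time


class ResolvingEndpoint:
    """A service name that is re-resolved periodically or after a failure."""

    def __init__(self, name, port, max_age=60.0,
                 resolver=socket.getaddrinfo, clock=time.monotonic):
        self.name = name
        self.port = port
        self.max_age = max_age
        self._resolver = resolver
        self._clock = clock
        self._addrs = []
        self._resolved_at = None

    def _resolve(self):
        infos = self._resolver(self.name, self.port, type=socket.SOCK_STREAM)
        # Each entry is (family, type, proto, canonname, sockaddr);
        # sockaddr[0] is the IP address.
        self._addrs = sorted({info[4][0] for info in infos})
        self._resolved_at = self._clock()

    def addresses(self):
        """Current addresses, re-resolving if the cached ones are too old."""
        if (self._resolved_at is None
                or self._clock() - self._resolved_at > self.max_age):
            self._resolve()
        return list(self._addrs)

    def invalidate(self):
        """Call after a connection failure to force a fresh lookup."""
        self._resolved_at = None
```

Usage: `ep = ResolvingEndpoint("redis.internal", 6379)`, connect to one of `ep.addresses()`, and call `ep.invalidate()` if that connection fails.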