Ideas for scaling chat in AWS?

I'm trying to come up with the best solution for scaling a chat service in AWS. I've come up with a couple potential solutions:
Redis Pub/Sub - When a user establishes a connection to a server that server subscribes to that user's ID. When someone sends a message to that user, a server will perform a publish to the channel with the user's id. The server the user is connected to will receive the message and push it down to the appropriate client.
SQS - I've thought of creating a queue for each user. The server the user is connected to will poll (or use SQS long-polling) that queue. When a new message is discovered, it will be pushed to the user from the server.
SNS - I really liked this solution until I discovered the 100 topic limit. I would need to create a topic for each user, which would only support 100 users.
Are there any other ways chat could be scaled using AWS? Is the SQS approach viable? How long does it take AWS to add a message to a queue?

Building a chat service isn't as easy as you would think.
I've built full XMPP servers, clients, and SDKs and can attest to some of the subtle and difficult problems that arise. A prototype where users see each other and chat is easy. A full-featured system with account creation, security, discovery, presence, offline delivery, and friend lists is much more of a challenge. Scaling that across an arbitrary number of servers is especially difficult.
PubSub is a feature offered by chat services (see XEP-0060) rather than a traditional means of building a chat service. I can see the allure, but PubSub can have drawbacks.
Some questions for you:
Are you doing this over the Web? Are users going to be connecting and long-polling, or do you have a WebSockets solution?
How many users? How many connections per user? Ratio of writes to reads?
Your idea for using SQS that way is interesting, but probably won't scale. It's not unusual to have 50k or more users on a chat server. If you're polling each SQS Queue for each user you're not going to get anywhere near that. You would be better off having a queue for each server, and the server polls only that queue. Then it's on you to figure out what server a user is on and put the message into the right queue.
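A minimal sketch of the queue-per-server idea, using in-memory deques as a stand-in for real SQS queues. The `Router` class and its method names are hypothetical, and the user-to-server lookup is a plain dict here; in a real deployment that mapping would have to live in some shared store.

```python
from collections import defaultdict, deque


class Router:
    """Routes chat messages to one queue per server (in-memory stand-in for SQS)."""

    def __init__(self):
        self.user_to_server = {}                 # which server holds each user's connection
        self.server_queues = defaultdict(deque)  # one queue per server, not per user

    def register(self, user_id, server_id):
        # Called when a user connects to a front-end server.
        self.user_to_server[user_id] = server_id

    def send(self, to_user, body):
        # Look up the recipient's server and enqueue the message there.
        server_id = self.user_to_server.get(to_user)
        if server_id is None:
            return False  # user is not connected; offline delivery is not handled here
        self.server_queues[server_id].append({"to": to_user, "body": body})
        return True

    def poll(self, server_id):
        # Each server drains only its own queue.
        queue = self.server_queues[server_id]
        messages = list(queue)
        queue.clear()
        return messages
```

With 50k users spread over a handful of servers, this means a handful of queues to poll rather than 50k.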
I suspect you'll want to go something like:
A big RDS database on the backend.
A bunch of front-end servers handling the client connections.
Some middle tier Java / C# code tracking everything and routing messages to the right place.
To get an idea of the complexity of building a chat server, read the XMPP RFCs.
SQS/SNS might not fit your chatty requirement. We have observed some latency in SQS, which might not be suitable for a chat application. Also, standard SQS queues do not guarantee FIFO ordering. I have worked with Redis on AWS. It is quite easy and stable if it is configured with all the best practices in mind.

I've thought about building a chat server using SNS, but instead of doing one topic per user, as you describe, doing one topic for the entire chat system and having each server subscribe to the topic - where each server is running some sort of long polling or web sockets chat system. Then, when an event occurs, the data is sent in the payload of the SNS notification. The server can then use this payload to determine what clients in its queue should receive the response, leaving any unrelated clients untouched. I actually built a small prototype for this, but haven't done a ton of testing to see if it's robust enough for a large number of users.
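Roughly what I mean, as an in-memory sketch rather than real SNS code. `Topic` and `ChatServer` are made-up names, and a Python list stands in for each client's socket. Every server receives every notification and only acts on the ones addressed to its own clients.

```python
class ChatServer:
    """One front-end server holding a set of connected clients."""

    def __init__(self, name):
        self.name = name
        self.clients = {}  # user_id -> list of delivered messages (stand-in for a socket)

    def connect(self, user_id):
        self.clients[user_id] = []

    def on_notification(self, payload):
        # Deliver only if the recipient is connected to this server.
        inbox = self.clients.get(payload["to"])
        if inbox is not None:
            inbox.append(payload["body"])


class Topic:
    """Single system-wide topic that fans out to every subscribed server."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self, server):
        self.subscribers.append(server)

    def publish(self, payload):
        for server in self.subscribers:
            server.on_notification(payload)
```

The trade-off is that every server processes every message, so the filtering cost grows with total traffic rather than with per-server traffic.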

Hi, realtime chat doesn't work well with SNS. It's designed for email/SMS or services where a latency of one or a few seconds is acceptable. In realtime chat, a delay of one or a few seconds is not acceptable.
Latency (i.e. “Realtime”) for PubNub vs SNS
Amazon SNS provides no latency guarantees, and the vast majority of latencies are measured over 1 second, and often many seconds slower. Again, this is somewhat irrelevant; Amazon SNS is designed for server-to-server (or email/SMS) notifications, where a latency of many seconds is often acceptable and expected.
Because PubNub delivers data via an existing, established open network socket, latencies are under 0.25 seconds from publish to subscribe at the 95th percentile of subscribed devices. Most humans perceive something as “realtime” if the event is perceived within 0.6 – 0.7 seconds.

The way I would implement such a thing (if not using some framework) is the following:
Have a web server (on EC2) which accepts the messages from the user.
Use an Auto Scaling group for this web server. The web server can update any DB on Amazon RDS, which can scale easily.
If you are using your own DB, you might consider decoupling the DB from the web server using SQS (by sending all requests to the same queue), and then you can have a consumer which consumes the queue. This consumer can also be placed behind an Auto Scaling group, so that if the queue is larger than X messages, it will scale (you can set it up with alarms).
SQS normally updates pretty fast, i.e. in less than one second from the moment you send a message to the moment it appears on the queue, and rarely more than that.

Since the new AWS IoT service started to support WebSockets, Keepalive and Pub/Sub a couple of months ago, you may easily build an elastic chat on it. AWS IoT is a managed service with lots of SDKs for different languages, including JavaScript, that was built to handle monster loads (billions of messages) with zero administration.
You can read more about update here:
Last SQS update (2016/11): you can now use Amazon Simple Queue Service (SQS) for applications that require messages to be processed in a strict sequence and exactly once using First-in, First-out (FIFO) queues. FIFO queues are designed to ensure that the order in which messages are sent and received is strictly preserved and that each message is processed exactly once.
From now on, implementing SQS + SNS looks like a good idea too.


gunicorn + django + telegram + mqtt client

We use gunicorn with django and django-telegrambot. We also have an MQTT client in its own app. When some MQTT messages arrive we send Telegram messages, and the other way around. The problem is that when we use gunicorn with multiple workers, we have multiple MQTT clients, so when an MQTT message arrives we send the same Telegram message multiple times.
When we use gunicorn's preload option with workers, we only have one MQTT client, but then all processes share the same Telegram TCP connection and we get weird SSL errors. As an alternative we could use only one process and multiple threads, but then sometimes MQTT and Telegram messages do not get processed (idk why).
Is there a way to get this running?
Instead of using webhooks one could use botpolling, but django-telegrambot says:
Polling mode by management command (an easy to way to run bot in local machine, not recommended in production!)
I'm not familiar with the django-telegrambot library, so I can't judge why the authors chose to make this statement (maybe ask on the GitHub repository …). However, both polling and webhooks are officially supported by Telegram (see here). IMHO both have pros and cons. Webhooks may have a slight performance benefit over polling, but also require more work to set up. Polling requires you to continuously fetch updates, which can be seen as a downside. OTOH with webhooks you have to have a web server running. For small to medium sized bots (in terms of number of users), polling should be fine - I'm using polling without problems for my (rather small) bots.
Please take this with a grain of salt as I'm far from being an expert on networking topics.
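Separately from the polling question, one possible way to end up with a single MQTT client without using preload is to let the gunicorn workers compete for an advisory file lock and start the client only in the worker that wins. This is only a sketch under assumptions: it relies on `fcntl`, so it is Unix-only, and the lock path and function name are made up for illustration.

```python
import fcntl
import os
import tempfile

# Hypothetical lock location; any path shared by all workers on the host would do.
DEFAULT_LOCK_PATH = os.path.join(tempfile.gettempdir(), "mqtt-client.lock")


def try_become_leader(lock_path=DEFAULT_LOCK_PATH):
    """Return an open file handle if this process won the lock, else None."""
    handle = open(lock_path, "w")
    try:
        # Non-blocking exclusive lock: fails immediately if another holder exists.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    # Keep the handle open for the life of the process; closing it releases the lock.
    return handle
```

Each worker would call `try_become_leader()` once at startup and create the MQTT client only if the result is not `None`. Workers that do not hold the lock would still need some way to hand outgoing MQTT messages to the leader, which is not shown here.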

How to implement a request-response pattern on Google Cloud PubSub?

I have multiple clients A (main application) and multiple clients B (payments service).
If I publish a message from client A that will be processed and answered on client B (publishing an answer in another topic), how to capture this answer on client A?
The problem is that client A has multiple instances, so I can't guarantee that exactly the same instance that triggered the request will receive the response (PubSub will randomly pick one instance).
Saw that other brokers like RabbitMQ have "reply-to" option. Is there anything similar on Google PubSub?
That way, I could simulate a "synchronous" operation on client A and only answer to the user when processing/response is finished, instead of dealing with this check on front-end every time.
Thank you!
Decoupling publishers from subscribers is one of the core features of Cloud Pub/Sub, which follows the publish-subscribe pattern. There’s currently no support in Cloud Pub/Sub for sending responses from subscribers directly to an entity that published a given message.
You could work around this by including information about the instance of client A that published a given message, so client B could figure out which instance of client A to notify once processing has finished. For example, client B could send an RPC directly to the publisher, or if there are few enough instances of client A, they can each have dedicated topics where they receive “processing complete” messages as subscribers (on a topic that client B is the publisher for).
Potential issues to watch out for while you think about the right approach:
Cloud Pub/Sub offers at-least-once delivery. There is a possibility of having duplicate messages sent to subscribers, and your system will need to be resilient to this.
What happens if a given instance of client A or client B crashes at any point in your process? Would it introduce the risk of processing erroneous/duplicate payments?
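A small sketch of the workaround described above, with no Pub/Sub client code involved: client A tags each request with its own instance ID and a correlation ID, and ignores replies that belong to another instance or that it has already handled (which also covers duplicate deliveries). The field names `reply_to` and `correlation_id` and the `PendingRequests` class are assumptions for illustration, not part of the Cloud Pub/Sub API.

```python
import uuid


class PendingRequests:
    """Tracks requests sent by one instance of client A until a reply arrives."""

    def __init__(self, instance_id):
        self.instance_id = instance_id
        self.pending = {}   # correlation_id -> original payload
        self.results = {}   # correlation_id -> result from client B

    def build_request(self, payload):
        # Attach enough information for client B to address the reply.
        correlation_id = str(uuid.uuid4())
        self.pending[correlation_id] = payload
        return {
            "correlation_id": correlation_id,
            "reply_to": self.instance_id,
            "payload": payload,
        }

    def handle_reply(self, reply):
        # Replies meant for another instance of client A are ignored.
        if reply["reply_to"] != self.instance_id:
            return False
        # Unknown or already-answered IDs are ignored, so duplicates are harmless.
        correlation_id = reply["correlation_id"]
        if correlation_id not in self.pending:
            return False
        del self.pending[correlation_id]
        self.results[correlation_id] = reply["result"]
        return True
```

This does not address the crash question: if the instance that sent the request dies, its pending map is lost, so anything payment-related would need that state stored somewhere durable.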

Are websockets a suitable lowest latency and robust real-time communication protocol between two nearby servers in the same AWS Availability Zones?

Suitable technologies I am aware of:
Please suggest others if they are a better fit for my problem.
For this use case I have just two machines, the sender and the receiver, and it's important to note they are fixed "nearby" each other, as they will be in the same availability zone on AWS. Answers which potentially relate to message passing over large spans of the internet aren't necessarily applicable. Note also the receiver server isn't queuing these up as tasks, it will just be forwarding select message feeds to website visitors over a websocket. The sending server does a lot of pre-processing and collating of the messages.
The solution needs to:
1. Be very high throughput. At present the sending server is processing about 10,000 messages per second (written in Rust) without breaking a sweat. Bursty traffic may increase this up to 20,000 or a bit more. I know zeromq can handle this.
2. Robust. The communication pipe will be open 24/7 365 days per year. My budget is extremely limited in terms of setting up clusters of machines as failovers so I have to do the best I can with two machines.
3. Message durability isn't required or a concern, the receiving server isn't required to store anything, it just needs all the data. The sender server asynchronously writes a durable 5 second summary of the data to a database and to a cache.
4. Messages must retain the order in which they are sent.
5. Low latency. This is very important as the data needs to be as realtime as possible.
A websocket seems to get this job done for 1 to 4. What I don't know is how robust a websocket is for communication that runs 24 hours a day, 7 days a week. I've observed websocket connections getting dropped online in general (of course I will write reconnect code and heartbeat monitoring if required, but this still concerns me). I also wonder if the high throughput is too much for the websocket.
I have zero experience in this kind of problem but I have a very good websocket library that I'm comfortable using. I ruled out Apache Kafka as it seems expensive to get high throughput, tricky to manage with dev ops (zookeeper) and seems overkill as I don't need durability and it's only communication between 2 machines. So I'm hoping for a simple solution.
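For the reconnect and heartbeat part, something along these lines is what I have in mind. It is a library-independent sketch: the function names and the default values are placeholders, and the actual websocket calls are left out.

```python
def backoff_delays(base=1.0, factor=2.0, cap=30.0, attempts=6):
    """Yield the wait in seconds before each reconnect attempt."""
    delay = base
    for _ in range(attempts):
        # Grow the delay exponentially but never wait longer than `cap`.
        yield min(delay, cap)
        delay *= factor


def is_stale(last_heartbeat, now, timeout=3.0):
    """True when no heartbeat has arrived within `timeout` seconds."""
    return (now - last_heartbeat) > timeout
```

The receiver would check `is_stale` on a timer and, once the link looks dead, walk through `backoff_delays()` between reconnect attempts. For a low-latency link inside one availability zone, a much smaller `base` than the placeholder here would probably make sense.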
It sounds like you are describing precisely what EC2 cluster placement groups provide:
Edit: You should be able to create the placement group with 2 machines to cover your limited budget. Using larger instances, according to your budget, will also support higher network throughput.
Point 4 looks like it would be supported by SQS FIFO, although SQS FIFO queues only support up to 3,000 messages per second with batching, which is below the 10,000 messages per second you are already processing.
A managed streaming solution like Kinesis Data Streams would definitely cover your use case, at scale, much better than a raw web socket. Using Kinesis Client Libraries, you can write your consumer to read from the stream.
AWS also has a managed Kafka service that removes the overhead of managing necessary components like Apache ZooKeeper:

Expose Amazon SQS directly to clients or via a Webservice as proxy

I would like to use Amazon SQS in my application to queue requests from other external systems that don't belong to me.
What is the better way of doing this: directly expose the SQS queue and the required message format, or publish a web service (WCF) that queues the request?
Also, I read that SQS is relatively slow for a single access, but am I right that it can easily handle a lot of concurrent accesses from different clients?
This is largely a matter of preference and depends a bit on your situation. But my recommendation would be to wrap it with your own web-service.
Building your web-service allows you to do things like validation, throttling, schema versioning etc. E.g. you can reject invalid messages with immediate synchronous feedback to the sender. If the external systems are publishing directly to your queue, then invalid messages become your problem not theirs, and if you revise your schema and want to reject old-schema messages then you either have to drop them or set up a separate back-channel to feed back information to the publisher. That adds unnecessary complexity to your system. Having a web-service would even let you switch to other queuing technologies later if you need to.
But building your own web-service has downsides too: will your own service be able to handle the same load as the SQS API with the same low latency? It won't scale infinitely like SQS, so how responsive will you need to be to changes in load? Have you got the resources to manage a separate service? And it's more work than just giving a client's AWS account permission to publish to your queue.
If you're happy with the extra work involved, and you want a more future-proof system, IMHO it's worth building the web-service wrapper.
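To make the validation and schema-versioning point concrete, here is a rough sketch of what the wrapper would do before anything reaches the queue. The field names, the supported version number, and the status codes are invented for illustration, and a plain list stands in for the SQS queue.

```python
SUPPORTED_VERSIONS = {2}  # hypothetical: only schema version 2 is accepted
REQUIRED_FIELDS = {"schema_version", "request_id", "body"}


def validate(message):
    """Return a list of problems; an empty list means the message is acceptable."""
    problems = []
    missing = REQUIRED_FIELDS - set(message)
    if missing:
        problems.append("missing fields: " + ", ".join(sorted(missing)))
    version = message.get("schema_version")
    if version is not None and version not in SUPPORTED_VERSIONS:
        problems.append(f"unsupported schema_version: {version}")
    return problems


def accept(message, queue):
    """Reject bad messages synchronously; enqueue good ones."""
    problems = validate(message)
    if problems:
        # The sender learns about the problem immediately.
        return {"status": 400, "errors": problems}
    queue.append(message)  # stand-in for the real enqueue call
    return {"status": 202}
```

Because callers only ever see `accept`, the list could later be swapped for SQS or any other queuing technology without the external systems noticing.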

Best practices for sending automated daily emails from web service

I am running a web service that currently sends confirmation emails out to new users via the gmail smtp servers. As I'm only getting a few new users each day, this hasn't been a problem.
I've recently added new features to the webapp that will require a customized message to be sent out to each user every day. Think of this as similar to the regular messages LinkedIn sends out that give you a status report on the activity in your network. Every user's message will be different. With thousands of users, this means thousands of unique messages will be sent each day.
Edit: I've since found that these types of email are called "transactional or relationship messages". Spamtacular has a good article on differentiating between marketing and transactional email.
I don't think using gmail's smtp servers will cut it anymore, but I don't know that for sure. I don't know what gmail's maximum outgoing messages per account is (it might be 100/day), but they limit outgoing mail to 500 recipients per message. I'm not sending a single message to 500 recipients, but I'm going to be sending 1000's of customized messages with each recipient getting one per day.
I'm interested to learn any best practices for doing this (especially for Java-based webapps). Here are some of my thoughts and concerns on it:
Should I set up my own outgoing mail server? If I do this, it seems like I'll have all sorts of other issues to worry about, such as preventing mail server abuse, monitoring bounces, allowing ways to opt out of emails, etc. Are there any tools or services to help with this? Maybe something like OpenEMM or a service like MailChimp? But those seem focused more toward email marketing campaigns.
I don't think I should have the webapp itself handle sending emails as it currently does for new user signups. I'm thinking I should set up a separate messaging server that can access the same backend/datastore as the webapp. Thoughts on this?
Should I consider setting up some sort of message queueing service to help with this, such as JMS, RabbitMQ, ActiveMQ, etc.?
Do I need to provide users a way to opt-out? Do I need to flag these as bulk messages? I don't really consider these email marketing messages, but I'm unsure what is considered appropriate or proper netiquette.
Any advice is appreciated. I'm also very interested in open source tools or web services that simplify things and could help me to ramp up as quickly as possible.
With regard to your first question, yes, you should set up your own mail server. Using gmail to do this might work for a while, but they are likely to shut you down in short order when they see this kind of activity. You could sign up for a business account and use app engine to send messages. Here's a link with information about mail quotas for that service.
Regarding your second and third questions, it would be a good idea to have messages queued by the web app and sent out by a centralized service rather than having the app send out the messages on its own.
Usually I would just use a database table as a queue - the web app inserts rows for each message it wants to send. A service/scheduled task app would grab new messages out of the table and send them off. This gives you lots of flexibility if you want to switch mail servers later, better reliability if the mail server is down, easier diagnostics if there are problems with recipients not getting messages, and the ability to resend messages. As for using JMS/MQ to do this - probably not necessary. IMO a database table used as a queue would give you more flexibility here than an actual JMS-based queue system.
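A minimal sketch of the table-as-queue approach, using SQLite so it runs anywhere. The table name `outbox`, its columns, and the `send` callback are assumptions for illustration; the real thing would use your existing database and an SMTP call in place of `send`.

```python
import sqlite3


def create_queue(conn):
    # One row per message the web app wants sent.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS outbox ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " recipient TEXT NOT NULL,"
        " body TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'pending',"
        " attempts INTEGER NOT NULL DEFAULT 0)"
    )


def enqueue(conn, recipient, body):
    # Called by the web app instead of sending mail directly.
    conn.execute("INSERT INTO outbox (recipient, body) VALUES (?, ?)", (recipient, body))
    conn.commit()


def process_pending(conn, send, batch_size=100):
    """Run by the scheduled task: try to send pending rows, oldest first."""
    rows = conn.execute(
        "SELECT id, recipient, body FROM outbox"
        " WHERE status = 'pending' ORDER BY id LIMIT ?",
        (batch_size,),
    ).fetchall()
    sent = 0
    for row_id, recipient, body in rows:
        try:
            send(recipient, body)
        except Exception:
            # Leave the row pending so the next run retries it.
            conn.execute("UPDATE outbox SET attempts = attempts + 1 WHERE id = ?", (row_id,))
        else:
            conn.execute(
                "UPDATE outbox SET status = 'sent', attempts = attempts + 1 WHERE id = ?",
                (row_id,),
            )
            sent += 1
    conn.commit()
    return sent
```

Rows that fail stay `pending` and are picked up on the next run, and the `attempts` column gives you the diagnostics mentioned above. The `batch_size` limit is also a crude way to throttle how many messages go out per run.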
As for opt outs, YES - you should give people a way to opt out. I don't think you need to flag the messages as bulk though.
On the architecture side of things I would definitely consider decoupling the sending of the emails from the main service via some form of asynchronous message queuing (or a facsimile thereof using a database as an intermediary). Another benefit of this approach is that if the SMTP server or network is down you could build in retry semantics. Additionally, for future scalability you could implement multiple mail senders reading from the same queue, or implement send throttling or scheduling (i.e. send n messages per hour), etc.