Given an AWS Elastic-Beanstalk Worker box, is it possible to use Flask/port:80 to serve the messages coming in from the associated SQS queue?
I have seen conflicting information about what is going on, inside an ELB-worker. The ELB Worker Environment page says:
Elastic Beanstalk simplifies this process by managing the Amazon SQS queue and running a daemon process on each instance that reads from the queue for you. When the daemon pulls an item from the queue, it sends an HTTP POST request locally to http://localhost/ on port 80 with the contents of the queue message in the body. All that your application needs to do is perform the long-running task in response to the POST.
This SO question Differences in Web-server versus Worker says:
The most important difference in my opinion is that worker tier instances do not run web server processes (apache, nginx, etc).
Based on this, I would have expected that I could just run a Flask-server on port 80, and it would handle the SQS messages. However, the post appears incorrect. Even the ELB-worker boxes have Apache running on them, apparently for doing health-checks (when I stopped it, my server turned red). And of course it's using port 80...
I already have Flask/Gunicorn on an EC2 server that I was trying to move to ELB, and I would like to keep using that - is it possible? (Note: the queue-daemon only posts messages to port 80, that can't be changed...)
The docs aren't clear, but it sounds like they expect you to modify Apache to proxy to Flask, maybe? I hope that's not the only way.
Or, what is the "correct" way of setting up an ELB-worker to process the SQS messages? How are you supposed to "perform the long-running task"?
Note: now that I've used ELB more, and have a fairly good understanding of it - let me make clear that this is not the use-case that Amazon designed the ELB-workers for, and it has some glitches (which will be noted). The standard use-case, basically, is that you create a simple Flask app and hook it into an ELB-EC2 server, which is configured to make it simple to run that Flask app.
My use-case was, I already had an EC2 server with a large Flask app, running under gunicorn, as well as various other things going on. I wanted to use that server (as an image) to build the ELB server, and have it respond to SQS-queue messages. It's possible there are better solutions, like just writing a queue-polling daemon, and that no-one else will ever take this option, but there it is...
The ELB worker is connected to an SQS queue by a daemon that listens to that queue and (internally) posts any messages to http://localhost:80. Apache is listening on port 80, to handle health-checks, which are done by the ELB manager (or something in the eco-system). Apache passes non-health-check requests, using mod_wsgi, to the Flask app that was uploaded, which is at:
/opt/python/current/app/application.py
I suspect it would be possible but difficult to remove Apache and handle the health-checks some other way (Flask), thus freeing up port 80. But that's enough of a change that I decided it wasn't worth it.
So the solution I found is to change which port the local daemon posts to - by reconfiguring it via a YAML config-file, it will post to port 5001, where my Flask app is running. This means Apache can continue to handle the health-checks on port 80, and Flask can handle the SQS messages from the daemon.
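The handler side of this can be a minimal Flask sketch like the following. Assumptions to verify against the aws-sqsd docs: the daemon POSTs the raw message body to the root path, and a 200 response tells it to delete the message (anything else triggers a retry after the visibility timeout).

```python
# Minimal sketch: a Flask app bound to port 5001 that accepts the
# daemon's POSTs. The "/" route and the 200-means-delete behavior
# are assumptions based on aws-sqsd's documented defaults.
from flask import Flask, request

application = Flask(__name__)

@application.route("/", methods=["POST"])
def handle_sqs_message():
    body = request.get_data(as_text=True)  # raw SQS message body
    # ... perform the long-running task here ...
    return "", 200  # 200 => daemon deletes the message; non-200 => retry

if __name__ == "__main__":
    application.run(host="127.0.0.1", port=5001)
```

Binding to 127.0.0.1 is deliberate: the daemon's POSTs are local, so the port never needs to be exposed externally.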
You configure the daemon, and stop/start it (as root):
/etc/aws-sqsd.d/default.yaml
/opt/elasticbeanstalk/addons/sqsd/hooks/stop-sqsd.sh
/opt/elasticbeanstalk/addons/sqsd/hooks/start-sqsd.sh
/opt/elasticbeanstalk/addons/sqsd/hooks/restart-sqsd.sh
Actual daemon:
/opt/elasticbeanstalk/lib/ruby/bin/aws-sqsd
/opt/elasticbeanstalk/lib/ruby/lib/ruby/gems/2.2.0/gems/aws-sqsd-2.3/bin/aws-sqsd
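The edit itself is small - a sketch of the relevant part of /etc/aws-sqsd.d/default.yaml, assuming the daemon's config uses an http_port key (the other keys shown are illustrative; check the file on your own instance, as your values will differ):

```yaml
---
http_connections: 10
http_port: 5001      # was 80; point the daemon at Flask instead of Apache
http_path: /
mime_type: application/json
```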
Glitches:
If you ever use the ELB GUI to configure daemon options, it will over-write the config-file, and you will have to re-edit the Port (and re-start the daemon).
Note: All of the HTTP traffic is internal, either to the ELB eco-system or the worker - so it is possible to close off all external ports (I keep 22 open), including Port 80. Otherwise your Worker has Apache responding to POSTs on port 80, meaning it's open to the world. I assume the server is configured fairly securely, but Port 80 doesn't need to be open at all, for health-checks or anything else.
Related
I have a situation where I have a NodeJs app that runs as an event listener. This NodeJs app listens, over a websocket, for external events originating outside of my application.
I need each of the events coming in to only be processed once by my Nodejs app.
However, it's also crucial to ensure that this particular NodeJs app instance can auto-scale up/down when needed and is highly available so that it wouldn't be a bottleneck.
Usually, when it comes to scaling and HA, the first thing that comes to my mind is to run a few instances behind a load balancer, or run multiple containers on something like ECS. Doing so would introduce multiple instances of the NodeJs app, which would also mean each event from the websocket gets processed more than once - once by every instance/container that receives it.
What would be a good solution and design to tackle such a problem?
Not sure I fully understand the situation here, but I think what you are saying is that you have a socket server that emits to other services, and that a single instance, even with dedicated resources, is subject to bottlenecks.
Assuming that is in line with the question, what you probably want to look at (not sure if you're using socket.io or not) is the socket.io Redis adapter. This essentially uses Redis to store the sockets, so you can cluster your socket server without it sending duplicates or missing users.
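Whichever transport you use, the "process each event only once" requirement usually reduces to an atomic claim on a shared store - in production, something like a Redis SET with the NX flag keyed by event id. A rough sketch of the idea (in Python for brevity, with an in-memory dict standing in for Redis; the event-id field name is made up):

```python
import threading

class Deduper:
    """Claim each event id at most once. The dict stands in for a
    shared store; in production you'd use an atomic Redis SET NX so
    the claim is shared across all instances."""
    def __init__(self):
        self._seen = {}
        self._lock = threading.Lock()

    def claim(self, event_id):
        # Returns True only for the first caller to present this id.
        with self._lock:
            if event_id in self._seen:
                return False
            self._seen[event_id] = True
            return True

dedupe = Deduper()

def handle(event):
    # Every instance receives the event; only one wins the claim.
    if not dedupe.claim(event["id"]):
        return "skipped"
    # ... actual processing ...
    return "processed"
```

With a shared store behind `claim`, you can run as many instances as you like: all of them see every event, but only the first claimer processes it.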
To your question about scale: you would definitely want to use containers for this. We actually use DigitalOcean 'apps' as an easy way to deploy our containers without having to manage Kubernetes and Docker images. The only downside right now is no auto-scaling, but scaling out is just a click of a button, and with alerts set up we know when to scale up or down.
With this setup, we have our socket server running against a managed Redis server; when we need more socket servers, we just tick the count up and we have more throughput.
I have an application deployed with docker on an EC2 instance: t3a.xlarge.
My application uses 7 different containers (cf. image docker-ps.png):
A Django App, as an API (using python 3.6)
An Angular Application (using Angular2+)
A memcached server
A certbot (using letsencrypt to automatically renew my SSL certificates)
An Nginx, used as a reverse proxy to serve my Angular application and my Django API
A Postgres database
A Pgadmin in order to manage my database
The issues happen when we send a push notification to our users using Firebase (around 42,000 users). The API stops responding for anywhere from 1 to 6 minutes.
The Django API uses the Gunicorn web server (https://gunicorn.org/) with this configuration:
gunicorn xxxx_api.wsgi -b 0.0.0.0:80 --max-requests 500 --max-requests-jitter 50 --enable-stdio-inheritance -k gevent --workers=16 -t 80
Neither the server nor the containers ever crash. When I check the metrics, we never use more than 60% of the CPU. Here is a screenshot of some metrics from when the notification was sent: https://ibb.co/Mc0v7R1
Is it because we are using more bandwidth than our instance allows? Or should I use another AWS service?
Memory utilisation metrics are not captured for EC2 instances, since OS-level metrics are not available to AWS. You can collect custom metrics yourself.
Reference:
https://awscloudengineer.com/create-custom-cloudwatch-metrics-centos-7/
I think your problem is one of design: try sending your push notifications through an async queue, using something like SNS & SQS (the AWS way) or Celery & Redis (the traditional way).
If you choose the traditional way this post could help you
https://blog.devartis.com/sending-real-time-push-notifications-with-django-celery-and-redis-829c7f2a714f
I think it's because of queued HTTP requests to Firebase. I believe you are sending 42,000 Firebase requests in a loop. I/O calls are blocking in nature: if you are running the Django app with single-threaded Gunicorn workers, those 42,000 HTTP calls will block new requests until they finish; new requests will sit in the queue for as long as the connection stays alive and stays within nginx's thresholds. I don't think 42,000 push notifications will exhaust memory or CPU unless the payload is very large.
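One way to keep the request workers free is to fan the blocking Firebase calls out over a bounded background pool - or, better, to hand the whole batch to a queue (Celery/SQS) as suggested above. A rough stdlib sketch of the fan-out part, where send_one is a made-up stand-in for the real Firebase HTTP call:

```python
from concurrent.futures import ThreadPoolExecutor

def send_one(token):
    # Stand-in for the blocking Firebase HTTP call for one device token.
    return f"sent:{token}"

def send_push_batch(tokens, max_workers=32):
    # Fan the blocking I/O out over a bounded thread pool, so a burst of
    # 42,000 sends runs 32-at-a-time instead of serially tying up the
    # process that has to answer API requests.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(send_one, tokens))
```

In a view you would still not call this inline for 42,000 tokens - you would enqueue the batch and let a worker process run `send_push_batch`, so the API returns immediately.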
If this question seems basic to more IT-oriented folks, then I apologize in advance. I'm not sure it falls under the ServerFault domain, but correct me if I'm wrong...
This question concerns some backend operations of a web application, hosted in a cloud environment (Google). I'm trying to assess options for coordinating our various virtual machines. I'll describe what we currently have, and those "in the know" can maybe suggest a better way (I hope!).
In our application there are a number of different analyses that can be run, each of which has different hardware requirements. They are typically very large, and we do NOT want these to be run on the application server (referred to as app_server below).
To that end, when we start one of these analyses, app_server will start a new VM (call this VM1). For some of these analyses we only need VM1; it performs the analysis and sends an HTTP POST request back to app_server to let it know the work is complete.
For other analyses, VM1 will in turn launch a number of worker machines (worker-1,...,worker-N), which run very similar tasks in parallel. Once the task on a single worker (e.g. worker-K) is complete, it should communicate back to VM1: "hey, this is worker-K and I am done!". Once all the workers (worker-1,...,worker-N) are complete, VM1 does some merging operations and finally communicates back to app_server.
My question is:
Aside from starting a web server on VM1 which listens for POST requests from the workers (worker-1,..), what are the potential mechanisms for having those workers communicate back to VM1? Are there non-webserver ways to listen for HTTP POST requests and do something with the request?
I should note that all of my VMs are operating within the same region/zone on GCE, so they are able to communicate via internal IPs without any special firewall rules, etc. (e.g. running $ ping <other VM's IP addr> works). I obviously do not want any of these VMs (VM1, worker-1, ..., worker-N) to be exposed to the internet.
Thanks!
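For what it's worth, "listening for HTTP POSTs" doesn't require a full web-server stack: any process can bind a socket and speak just enough HTTP with the standard library. A rough sketch of a listener VM1 could run (the plain-text "worker-K" payload is made up for illustration):

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

completed = []  # names of workers that have reported in

class DoneHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the body ("hey, this is worker-K and I am done!")
        length = int(self.headers.get("Content-Length", 0))
        worker = self.rfile.read(length).decode()
        completed.append(worker)
        self.send_response(200)
        self.end_headers()

    def log_message(self, *args):
        pass  # silence per-request logging

def serve(port=0):
    # port=0 picks a free port; bind to the internal address in practice.
    server = HTTPServer(("127.0.0.1", port), DoneHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

This is still technically an HTTP listener, just embedded in the coordinating process rather than a separate nginx/apache deployment - though for this fan-out/fan-in pattern, the Pub/Sub suggestion below avoids hand-rolled HTTP entirely.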
Sounds like the right use-case for Cloud Pub/Sub. https://cloud.google.com/pubsub
In your case workers would be publishing events to the queue and VM1 would be subscribing to them.
Hard to tell from your high-level overview whether it's a match, but take a look at Cloud Composer too: https://cloud.google.com/composer/
We are using an AWS EC2 (ubuntu-xenial-16.04-amd64-server) instance to run a PHP WebSocket server.
We are using the following command in order to keep the WebSocket server running continuously:
nohup php -q server.php >/dev/null 2>&1 &
It runs very well for up to two days, but if no client has connected to the WebSocket server in the last two days, it automatically stops responding.
I checked the status of the WebSocket port with lsof -i:9000 and got the following output:
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
php 1467 ubuntu 4u IPv4 17137 0t0 TCP *:9000 (LISTEN)
It seems the WebSocket server is running, but the client (i.e. the mobile application) is not able to connect.
Is there any specific reason behind this problem? We are not able to figure out exact issue.
You'll need to provide more information for the SO community to be able to help you.
Let's look at the layers of your infrastructure and make an educated guess about where the problem might be.
We have:
external connector (the mobile app)
PHP script acting as a server (receiver).
OS (Ubuntu)
The OS kernel can kill running processes for various kinds of misbehavior, most commonly via the OOM killer (out-of-memory).
It's not uncommon to see PHP scripts become unresponsive, especially when stream (socket) programming is involved; we'll need to see that code.
You are saying that everything is fine for two days, so we can rule out external-connector problems and concentrate on resource-mismanagement problems: garbage collection, memory leaks, stream leaks, etc. Either some external process is killing your PHP script, or the script itself becomes unresponsive.
The investigation should start at:
Sharing the server.php, and then moving to
Log analysis.
We have a significantly complex Django application currently served by
apache/mod_wsgi and deployed on multiple AWS EC2 instances behind a
AWS ELB load balancer. Client applications interact with the server
using AJAX. They also periodically poll the server to retrieve notifications
and updates to their state. We wish to remove the polling and replace
it with "push", using web sockets.
Because arbitrary instances handle web socket requests from clients
and hold onto those web sockets, and because we wish to push data to
clients who may not be on the same instance that provides the source
data for the push, we need a way to route data to the appropriate
instance and then from that instance to the appropriate client web
socket.
We realize that apache/mod_wsgi do not play well with web sockets and
plan to replace these components with nginx/gunicorn and use the
gevent-websocket worker. However, if one of several worker processes
receive requests from clients to establish a web socket, and if the
lifetime of worker processes is controlled by the main gunicorn
process, it isn't clear how other worker processes, or in fact
non-gunicorn processes can send data to these web sockets.
A specific case is this one: a user who issues an HTTP request is
directed to one EC2 instance (host) and the desired behavior is that data is
to be sent to another user who has a web socket open in a completely
different instance. One can easily envision a system where a message
broker (e.g. rabbitmq) running on each instance can be sent a message
containing the data to be sent via web sockets to the client connected
to that instance. But how can the handler of these messages access
the web socket, which was received in a worker process of gunicorn?
The high-level python web socket objects created by gevent-websocket and
made available to a worker cannot be pickled (they are instance
methods with no support for pickling), so they cannot easily be shared
by a worker process to some long-running, external process.
In fact, the root of this question comes down to how can web sockets
which are initiated by HTTP requests from clients and handled by WSGI
handlers in servers such as gunicorn be accessed by external
processes? It doesn't seem right that gunicorn worker processes,
which are intended to handle HTTP requests would spawn long-running
threads to hang onto web sockets and support handling messages from
other processes to send messages to the web sockets that have been
attached through those worker processes.
Can anyone explain how web sockets and WSGI-based HTTP request
handlers can possibly interplay in the environment I've described?
Thanks.
I think you've made the correct assessment that mod_wsgi + websockets is a nasty combination.
You would find all of your wsgi workers hogged by the web sockets and an attempt to (massively) increase the size of the worker pool would probably choke the server because of the memory usage and context switching.
If you'd like to stick with the synchronous wsgi worker architecture (as opposed to the reactive approach implemented by gevent, twisted, tornado etc.), I would suggest looking into uWSGI as an application server. Recent versions can handle some URLs in the old way (i.e. your existing django views would still work the same as before) and route other URLs to an async websocket handler. This might be a relatively smooth migration path for you.
It doesn't seem right that gunicorn worker processes, which are intended to handle HTTP requests would spawn long-running threads to hang onto web sockets and support handling messages from other processes to send messages to the web sockets that have been attached through those worker processes.
Why not? This is a long-running connection, after all. A long-running thread to take care of such a connection would seem... absolutely natural to me.
Often in these evented situations, writing is handled separately from reading.
A worker that is currently handling a websocket connection would wait for relevant message to come down from a messaging server, and then pass that down the websocket.
You can also use gevent's async-friendly Queues to handle in-code message passing, if you like.
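The read/write split described above can be sketched with plain stdlib primitives: a per-connection inbox queue, fed by whatever consumes the message broker, and drained by a writer that owns the socket. Here `ws_send` is a stand-in for the real websocket send (with gevent you'd use its own Queue and greenlets instead of threads):

```python
import queue
import threading

def connection_writer(ws_send, inbox):
    """Writer half of one connection: block on the inbox (fed by the
    broker handler or other processes) and push each message down the
    socket. Only this thread ever touches the socket for writing, so
    the unpicklable socket object never has to leave its process."""
    while True:
        msg = inbox.get()
        if msg is None:  # sentinel: connection closed
            break
        ws_send(msg)

# Usage sketch: one inbox per connected client; a list stands in
# for the socket so the flow is visible.
sent = []
inbox = queue.Queue()
writer = threading.Thread(target=connection_writer, args=(sent.append, inbox))
writer.start()
inbox.put("notification-1")  # what the broker handler would do
inbox.put(None)
writer.join()
```

The point is that other processes never need the websocket object itself - they only need a route (via the broker) to the inbox of the process that holds it.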