Threading vs Containers in Orchestration - amazon-web-services

I am working on re-designing an existing Spring Boot application which is trying to automate a series of executions of work that the company is currently having done manually by it's staff. It has a main application that works almost as an orchestration application in the sense that it is the service that will call other applications to get the overall piece of work done. It has 7 sub-systems that it invokes, 3 of these systems need to be invoked in some form of order and complete before the other 4 are invoked but they can be invoked asynchronously.
All of these sub-systems have now been moved to Spring Microservices and the application I'm working on must invoke these microservices (some in order and some asynchronously), it is possible that my application will be called more than once at the same time so I need to consider that multiple containers may be needed for each sub-system. I've implemented Open-Feign to invoke each of the Microservices.
They also have the plan in the not so distant future to move this to AWS ECS/Fargate, however for the time being it is going to be run in Linux VM's and the containers are created on the same private network for communication. I'm wondering if I should remove ThreadPoolTaskExecutor completely and just invoke a new container for each simultaneous request to my application, however I've read that threads on a process are still faster and have less overhead than creating a process on a container and considering there's not going to be many containers invoked simultaneously I'm perplexed as to the best approach.
Any advice would be appreciated.

Unless each request scales up the memory consumption linearly in the application by at least an additional 1Gb (then new pod base memory would not be that much compared with it) it is an overkill to spin up a new pod for each request... it is 200Mb of additional memory for each request and I don't see a benefit nor the need.

Related

How to create one docker container per web service request?

I have a quite heavy batch process (a python script called "run_simulation.py") on which I have very little control, it can be launched by a single user through a web api but it read and writes from disk so it wouldn't handle parallel requests.
Now, I'd like to have one docker container instanciated per request so that all requests can be handled in parallel, what would be the way to do this ? Is this even doable with Docker ? What would be the module responsible to instanciate the container and pass the http request to it ?
Generally you don’t do this. There are two good reasons for that: if you unconditionally launch a container per request it becomes very easy to swamp your system with these background jobs to the point where none can progress; and the setup that would allow you to launch more Docker containers would also give you unlimited root-level access to the host, which you don’t want in a process that accepts network requests.
A better approach is to set up a job queue system. RabbitMQ is popular and open-source, but by no means the only option. When you receive a request that needs background work, you add a job to the queue and return immediately. Meanwhile, you have some number of worker processes which accept jobs from the queue and do the work.
This gives you a couple of benefits. You control how much work can be done in parallel (by controlling the number of worker containers). If you need to do more work by setting up a second server (or even more), they can all connect back to the same queue server, without requiring a complex multi-host container setup. If your workers crash (or get OOM-killed) their jobs will be returned to the queue and can be picked up and retried by other workers. If you decide Docker doesn’t work for you, or that you need a different orchestrator (Nomad, Kubernetes) you can run this exact same setup without making any code changes, just changing the deployment configuration.

Running multiple app instances on a single container in PCF

We have an internal installation of PCF.
A developer wants to push a stateless (obeys 12 factor rules) nodejs app which will spawn other app instances i.e leverage nodejs clustering as per https://nodejs.org/api/cluster.html. Hence there would be multiple processes running on each container. Any known issues with this from a PCF perspective? I appreciate it violates the rule/suggestion of one app instance per container but that is just a suggestion :) All info welcome.
Regards
John
When running an application on Cloud Foundry that spawns child processes, the number one thing you need to watch out for is memory consumption. You set a memory limit when you push your application which is for the entire container. That includes the parent process, whatever child processes are spawned and a little overhead (currently init process, sshd process & health check).
Why is this a problem? Most buildpacks make the assumption that only one process will be running and that it will consume as much memory as possible while staying under the defined memory limit. They attempt to configure the software which is running your application to do this. When you spawn child processes, this breaks the buildpack's assumptions and can create scenarios where your application will exceed the defined memory limit. When this happens, even by one byte, the process will be killed and restarted.
If you're concerned with scaling your application, you should not try to spin off child processes in one extra large container. Instead, let the platform help you and scale up the number of application instances. The platform can easily do this and by using multiple smaller containers you can scale just as well. In fact, if you already have a 12-factor app, it should be well positioned to work in this manner.
Good luck!

Running RabbitMQ+Celery in the same server as production environment

I'm running a Django app in an EC2 instance, which uses RabbitMQ + Celery for task queuing. Are there any drawbacks to running my RabbitMQ node from the same EC2 instance as my production app?
The answer to this questions really depends on the context of your application.
When you're faced with scenarios you should always consider a few things.
Seperation of concerns
Here, we want to make sure that if one of the systems are not responsible for the running of other systems. This includes things like
If the ec2 instance running all the stuff goes down, will the remaining tasks in queue continue running
if my RAM is full, will all systems remain functioning
Can I scale just one segment of my app without having to redesign infrastructure.
By having rabbit and django (with some kind of service, wsgi, gunicorn, waitress etc) all on one box, you loose a lot of resource contingency.
Although RAM and CPU may be abundant, there is a limit to IO, disk writes, network writes etc. This means that if for some reason you have a heavy write function, all other systems may suffer as a result. If you have a heavy write to RAM funciton, the same applies.
So really the downfalls from keeping things in one system that I can see from your question and my own experience are as follows.
Multiple points of failure. If your one instance of rabbit fails, your queues and tasks stop working.
If your app starts generating big traffic, other systems start to contend for recourses.
If any component goes down, that could mean other downtime of other services.
System downtime means complete downtime of all components.
Lots of headaches when your application demands more resources with minimal downtime.
Lots of web traffic will slow down task running
Lots of task running will slow down web requests
Lots of IO will slow down all the things
The rule of thumb that I usually follow is keep single points of failures far from each other - that way you only need to manage those components. A good use case for this would be to use an EC2 instance for your app, another for your workers and another for your rabbit. That way you can apply smaller/bigger instances for just those components if you need to. You can even create AMIs and create autoscaling groups - if it is your use case.
Here are some articles for reference
Seperation of concern
Modern design architectures
Single points of failure
TLDR; If you can run on one EC2 you should but make it easy to scale today.
Both Joshnidhin and Giannis covered the RAM, IO and CPU aspects.
I have run production apps in single instances with containerization and slept with peace of mind that if tomorrow suddenly lots of people want what I have built, I can scale pretty quickly by deploying those containers on different instances instead of one single instance.
Docker allows you to put a limit on CPU consumption and memory usage for each container hence you can also be sure that they will not step into each other.
If we take EC2 instance out of this question it becomes:
Are there any drawbacks in running RabbitMQ Node on the same server as my productions app?
I would say it depends on various things like, kind of workloads and its composition, complexity of the workload, do you expect growth in usage etc.
If your workload is well behaved and the server is big enough for both (app + task q) then why not as there will be only one server to manage. Make sure to protect these 2 process from each other by limiting their system resource usage.
If your traffic is not well behaved then you might want more the one server. In this case having dedicated servers is better (separation of concerns) as you will have to manage more than one server.
Now back to EC2, all the above still apply. EC2 makes horizontal scaling of applications easier so if you have them on separate instance then you can scale them individually and cost effectively. If not when you scale there will be wastage of resources.

How to restrict the resources within one JVM

I'm trying with WSO2 products, and I'm thinking about a scenario where bad code could take up all the CPU time (e.g. dead loop or so). I did try it with WSO2 AS with 2 tenants, A and B. And A's bad code does affect B and B's app will have a very long reponse delay or even stuck. Is there a way to restrict the CPU usage of a tenant? Thanks!
At the moment, you will have to setup your environment in what is known as private jet mode, where each tenant gets its own JVM, if you need total isolation.
In a shared environment, we have stuck thread detection which will ensure that critical threads will not run for more than a specified time period. We have plans for CPU usage limiting on per tenant basis. This would be available in a future release.
My suggestion would be to not run two tenants in one application server. Run two separate processes on the same machine. Better yet, run two separate processes in separate OS-level containers (like a jail or an lxc container). Or separate virtual machines if you can't use containers.
Operating systems give you tools for controlling CPU use - rlimit and nice for processes, and implementation-specific facilities for containers and VMs. Because they're implemented in the OS (or virtual machine manager), they are capable of doing this job correctly and reliably. There's no way an application server can do it anywhere near as well.
In any case, having separate applications share an application server and JVM is a terrible idea which should have been put to death in the '90s. There's just no need for it, and it introduces so many potential headaches.

How to serve CPU intensive webservice requests in the cloud?

Background: I'm running a webservice in which each request involves a fair amount of computations (up to 10 seconds on a quadcore machine).
Each request can be broken down to about 150 independent (and equally small) subtasks.
What I'm after: l'm looking for a hosting service that allows me to serve these kinds of requests efficiently in a scalable manner.
What I've considered: I've looked into Google App Engine and Rackspace.
It seems to me as if GAE is intended for simple requests, requiering litte resources to process. Problem with something like Rackspace is that I can't tell in advance how many vCPUs I may need (and even if I knew how big future spikes would be, I don't want to sit with, say, 40 servers idling the rest of the time)
Questions:
Would it be possible to use GAE in the following way:
For each request, split it up into 150 subtasks
Process all subtasks independently by doing 150 concurrent HTTP requests to the same webapp (but through a differrnt method)
Collect the results from the "subresults" and return a response to the original request.
Is there any possibility that Map Reduce for GAE could be of any help?
Is there any other service better suited for this task?
Yes, this is possible. The usual way would be to use Task Queue, possibly via DeferredTask helper class.
1.3 Normal web requests (to frontend instances) are limited to 30s, so doing this in synchronous way is not guaranteed to succeed. Also note that instances are artificially limited to do 10 parallel requests (if multithreading is enabled).
Yes, this is a job for map reduce. But note that map reduce is async - you give it tasks to do and it will be done sometime in the future.
Given the processing you need you might want to look at GAE backends (they are long running with multithrading and come in different sizes). If you need even more processing power, then you might want to look at Compute Engine.
Unless all of these 150 subtasks are read-only activities, trying to run them all in a single thread is just not safe. Web requests are unreliable - people can cancel, hit refresh if it takes too long, close windows in the middle, or just time out due to network issues. The background HTTP requests, likewise, can have a whole mess of problems. The standard solution is to have your front-end code simply build a list of things that need to be done, so it can get back to the user quickly, and have a back-end 'worker' process handle the (potentially unreliable) subtasks. Depending on what your application is doing, you might bounce the user to a "working" screen (like searching for airfare) where they can safely wait for the results to come up, or it might just be stuffed away as a "pending" job (like ordering something from Amazon).
There's countless different ways to handle this basic workflow. If you stick with Google App Engine, they have a "task queue" as part of the platform - providing a simple mechanisms for creating & dispatching background tasks. If you go with Rackspace, their cloud offering is less of a unified platform so you'll have to either roll your own queue or get one to plug into your setup.