Deploying distributed queue mechanism on GCP - google-cloud-platform

Background
I have a system working on Google Cloud. This system built on Micro-services architecture. The application communicating with each other with queues. Currently, I using Azure Storage Queues and want to move this managed service into GCP too.
Requirements
Mandatory:
Max processing time - 1 second to 1 hour.
Ack mechanism to mark tasks as completed.
Scaleable solution - should support the load of a few thousands of messages per second.
Support pulling of messages.
Nice to have:
Handle priorities - some tasks queues are more important than others.
Managed solution - I prefer to use a managed solution rather than install it my self.
Solutions I have checked
I already checked those services and I refused it because:
Microsoft Azure Storage Queue - Outside of google cloud.
Google Cloud PubSub - Not meet the mandatory requirement - Processing time is limited to 10 minutes.
Google Cloud Task - Seems like mostly designed for serverless applications, which are not suitable for my application.
I also check the RabbitMQ solution which seems to support all my requirements, except the one of 'managed solution'. It seems like GCP not provide RabitMQ as a service. So, I'll have to deploy it into virtual machines and make all the configuration my self...
The question
To be honest, it seems like the solution that I'm looking for should be pretty common. Nothing really special. But, all of the services I described here have some critical cons.
Which service did I miss here?

Related

What is the best alternative of ParkMyCloud to rightsize and schedule when your resources run and stop for GCP?

Context: We are working on GCP and using ParkMyCloud(PMC) to schedule the resources run and stop. The developers from different teams in the company have access to the run/stop operations on the PMC application. This is to reduce the costs on GCP.
Problem: ParkMyCloud is no more available for GCP (because it is bought by IBM and the substituted application is very expensive)
The application which are proposed to be an alternative to PMC are doing only monitoring&security in general.
My question is that if there is an application which performs the following:
Resources (instances, sql operations etc.) can by shut down manually,
Scheduling the the resources run and stop is possible and can be customised.
After googling I have found some candidates such as Doit-Flexsave and Rapid7-InsightCloudSec, but these are doing only Monitoring, Reporting or Security. I am not looking for such tools.
Another solution is to use Cloud Functions and Scheduler, but for now I would like to know that if I can find a tool like ParkMyCloud.

Use Azure Batch for non-parallel work? Is there a better option?

I have a scenario where our Azure App Service needs to run a job every night. The job cannot scale to multiple machines -- it involves downloading a large data file, and does special processing on it (only takes a couple minutes). Special software will be required to be installed as well. A lot of memory will be needed on the machine for the computation, therefore I was thinking one of the Ev-series machines. For these reasons, I cannot run the job as a web job on the Azure App Service, and I need to delegate it elsewhere.
Anyway, I have experience with Azure Batch so at first I was thinking of Azure Batch. But I am not sure this makes sense for my scenario because the work cannot scale to multiple machines. Does it make sense to have a pool with a single node and single vm on the node? When I need to do the work, an Azure web job enqueues the job, and the pool automatically sizes from 0 to 1?
Are there better options out there? I was look to see if there are any .NET libraries to spin up a single VM and start executing work on it, then disable the VM when done, but I couldn't find anything.
Thanks!
For Azure Batch, the scenario of a single VM in a single pool is valid. Azure Container Instances or Azure Functions would appear to be a better fit, however, if you can provision the appropriate VM sizes for your workload.
As you suggested, you can combine Azure Functions/Web Jobs to enqueue the work to an Azure Batch Job. If you have autoscaling or an autopool set on the Azure Batch Pool or Job, respectively, then the work will be processed and the compute resources will be deallocated after (assuming you have the correct settings in-place).

Cloud computing service to run thousands of containers in parallel

Is there any provider, that offers such an option out of the box? I need to run at least 1K concurrent sessions (docker containers) of headless web-browsers (firefox) for complex UI tests. I have a Docker image that I just want to deploy and scale to 1000 1CPU/1GB instances in second, w/o spending time on maintaining the cluster of servers (I need to shut them all down after the job is done), just focuse on the code. The most close thing I found so far is Amazon ECS/Fargate, but their limits have no sense to me ("Run containerized applications in production" -> max limit: 50 tasks -> production -> ok). Am I missing something?
I think that AWS Batch might be a better solution for your use case. You define a "compute environment" that provides a certain level of capacity, then submit tasks that are run on that compute environment.
I don't think that you'll find anything that can start up an environment and deploy a large number of tasks in "one second": in my experience it takes about a minute or two ramp-up time for Batch, although once the machines are up and running they are able to sequence jobs quickly. You should also give consideration to whether it makes sense to run all 1,000 jobs concurrently; that will depend on what you're trying to get out of your tests.
You'll also need to be aware of any places where you might be throttled (for example, retrieving configuration from the AWS Parameter Store). This talk from last year's NY Summit covers some of the issues that the speaker ran into when deploying multiple-thousands of concurrent tasks.
You could use lambda layers to run headless browsers (I know there are several implementations for chromium/selenium on github, not sure about firefox).
Alternatively you could try and contact the AWS team to see how much the limit for concurrent tasks on Fargate can be increased. As you can see at the documentation, the 50 task is a soft limit and can be raised.
Be aware if you start via Fargate, there is some API limit on the requests per second. You need to make sure you throttle your API calls or you use the ECS Create Service.
In any case, starting 1000 tasks would require 1000 seconds, which is probably not what you expect.
Those limits are not there if you use ECS, but in that case you need to manage the cluster, so it might be a good idea to explore the lambda option.

Simple task queue using Google Cloud Platform : issue with Google PubSub

My task : I cannot speak openly about what the specifics of my task are, but here is an analogy : every two hours, I get a variable number of spoken audio files. Sometimes only 10, sometimes 800 or more. Let's say I have a costly python task to perform on these files, for example Automatic Speech Recognition. I have a Google Intance managed group that can deploy any number of VMs for executing this task.
The issue : right now, I'm using Google PubSub. Every two hours, a topic is filled with audio ids. Instances of the managed group can be deployed depending on the size of queue. The problem is, only one worker get all the messages from the PubSub subscription, while the others are not receiving any, perhaps because the queue is not that long (maximum ~1000 messages). This issue is reported in a few cases in the python Google Cloud github, and it is not clear if it is the intended purpose of PubSub, or just a bug.
How could I implement the equivalent of a simple serverless task queue in Python and Google Cloud, and can spawn instances based on a given metric, for example the size of the queue ? Is this the intended purpose of PubSub ?
Thanks in advance.
In App Engine you can create push queues and set rate/concurrency limits and Google will handle the rest for you. App Engine will scale as needed (e.g. increase Python instances).
If you're outside of App Engine (e.g. GKE), the pubsub Python client library may be pulling many messages at once. We had a hard time controlling this (for google-cloud-pubsub==0.34.0) as well so we ended up writing a small adjustment on top of google-cloud-pubsub calling SubscriberClient.pull with max_messages set). The server side pubsub API does adhere to max_messages.

How to convert a WAMP stacked app running on a VPS to a scalable AWS app?

I have a web app running on php, mysql, apache on a virtual windows server. I want to redesign it so it is scalable (for fun so I can learn new things) on AWS.
I can see how to setup an EC2 and dump it all in there but I want to make it scalable and take advantage of all the cool features on AWS.
I've tried googling but just can't find a simple guide (note - I have no command line experience of Linux)
Can anyone direct me to detailed resources that can lead me through the steps and teach me? Or alternatively, summarise the steps in an answer so I can research based on what you say.
Thanks
AWS is growing and changing all the time, so there aren't a lot of books to help. Amazon offers training that's excellent. I took their three day class on Architecting with AWS that seems to be just what you're looking for.
Of course, not everyone can afford to spend the travel time and money to attend a class. The AWS re:Invent conference in November 2012 had a lot of sessions related to what you want, and most (maybe all) of the sessions have videos available online for free. Building Web Scale Applications With AWS is probably relevant (slides and video available), as is Dissecting an Internet-Scale Application (slides and video available).
A great way to understand these options better is by fiddling with your existing application on AWS. It will be easy to just move it to an EC2 instance in AWS, then start taking more advantage of what's available. The first thing I'd do is get rid of the MySql server on your own machine and use one offered with RDS. Once that's stable, create one or more read replicas in RDS, and change your application to read from them for most operations, reading from the main (writable) database only when you need completely current results.
Does your application keep any data on the web server, other than in the database? If so, get rid of all local storage by moving that data off the EC2 instance. Some of it might go to the database, some (like big files) might be suitable for S3. DynamoDB is a good place for things like session data.
All of the above reduces the load on the web server to just your application code, which helps with scalability. And now that you keep no state on the web server, you can use ELB and Auto-scaling to automatically run multiple web servers (and even automatically launch more as needed) to handle greater load.
Does the application have any long running, intensive operations that you now perform on demand from a web request? Consider not performing the operation when asked, but instead queueing the request using SQS, and just telling the user you'll get to it. Now have long running processes (or cron jobs or scheduled tasks) check the queue regularly, run the requested operation, and email the result (using SES) back to the user. To really scale up, you can move those jobs off your web server to dedicated machines, and again use auto-scaling if needed.
Do you need bigger machines, or perhaps can live with smaller ones? CloudWatch metrics can show you how much IO, memory, and CPU are used over time. You can use provisioned IOPS with EC2 or RDS instances to improve performance (at a cost) as needed, and use difference size instances for more memory or CPU.
All this AWS setup and configuration can be done with the AWS web console, or command-line tools, or SDKs available in many languages (Python's boto library is great). After learning the basics, look into CloudFormation to automate it better (I've written a couple of posts about that so far).
That's a bit of the 10,000 foot high view of one approach. You'll need to discover the details of each AWS service when you try to use them. AWS has good documentation about all of them.
Depending on how you look at it, this is more of a comment than it is an answer, but it was too long to write as a comment.
What you're asking for really can't be answered on SO--it's a huge, complex question. You're basically asking is "How to I design a highly-scalable, durable application that can be deployed on a cloud-based platform?" The answer depends largely on:
The specifics of your application--what does it do and how does it work?
Your tolerance for downtime balanced against your budget
Your present development and deployment workflow
The resources/skill sets you have on-staff to support the application
What your launch time frame looks like.
I run a software consulting company that specializes in consulting on Amazon Web Services architecture. About 80% of our business is investigating and answering these questions for our clients. It's a multi-week long project each time.
However, to get you pointed in the right direction, I'd recommend that you look at Elastic Beanstalk. It's a PaaS-like service that abstracts away the underlying AWS resources, making AWS easier to use for developers who don't have a lot of sysadmin experience. Think of it as "training wheels" for designing an autoscaling application on AWS.