Scaling Google Cloud Pull subscribers - google-cloud-platform

I'm considering Google Cloud Pub/Sub and trying to determine whether to go with the pull or the push subscriber model.
Notably, the pull model can handle larger throughput. My key requirements:
Large volume of messages (many more than 1/second).
Efficiency and throughput of message processing is critical.
However, the push model can sit behind an HTTP load balancer, and is therefore able to auto-scale subscriber nodes during times when the number of queued messages exceeds the capacity of a single subscriber node.
The pull model is also more secure, because it does not require exposing sensitive operations in an HTTP endpoint.
The issue is: how can we scale subscriber nodes in the pull model? Is there something in GCP for this kind of situation?

There are several options for auto-scaling pull subscribers:
Use GKE and set up autoscaling Deployments with Cloud Monitoring metrics. This allows you to scale the number of replicas based on the num_undelivered_messages metric.
Use a GCE managed instance group and scale based on the num_undelivered_messages metric.
Use Dataflow to process messages from Pub/Sub and set up autoscaling.
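For the first two options, the scaling signal is the subscription backlog. A minimal sketch, assuming the google-cloud-monitoring Python client and placeholder project/subscription names, of reading num_undelivered_messages from Cloud Monitoring (useful for validating the metric before wiring it into an autoscaler):
```python
# Sketch: read the Pub/Sub backlog metric an autoscaler would key on.
# Assumes `pip install google-cloud-monitoring`; PROJECT_ID and
# SUBSCRIPTION_ID are placeholders.
import time

from google.cloud import monitoring_v3

PROJECT_ID = "my-project"            # placeholder
SUBSCRIPTION_ID = "my-subscription"  # placeholder

client = monitoring_v3.MetricServiceClient()
now = int(time.time())
interval = monitoring_v3.TimeInterval(
    {"end_time": {"seconds": now}, "start_time": {"seconds": now - 300}}
)

results = client.list_time_series(
    request={
        "name": f"projects/{PROJECT_ID}",
        "filter": (
            'metric.type="pubsub.googleapis.com/subscription/num_undelivered_messages" '
            f'AND resource.labels.subscription_id="{SUBSCRIPTION_ID}"'
        ),
        "interval": interval,
        "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
    }
)
for series in results:
    for point in series.points:
        print("undelivered messages:", point.value.int64_value)
```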

When you pull from a subscription, you need an open connection to it. For the lowest latency, staying connected full time is required, and you can do this with Compute Engine.
Your instances pull from the queue and consume the messages. With a huge volume of messages, the instances' resource usage (CPU and memory) will increase. You can put these instances into a managed instance group (MIG) and set an autoscaling threshold, such as CPU utilization.
Of course, pulling is more efficient in terms of network bandwidth and protocol handshakes. However, it requires compute that is up full time, and the scaling velocity is slow.
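As a concrete illustration of the pull loop described above, here is a minimal sketch of a long-running worker using the google-cloud-pubsub Python client; the project, subscription, and process() function are placeholders. Each MIG instance would run something like this full time:
```python
# Sketch: a long-running pull worker, one per MIG instance.
# Assumes `pip install google-cloud-pubsub`; PROJECT_ID, SUBSCRIPTION_ID,
# and process() are placeholders.
from google.cloud import pubsub_v1

PROJECT_ID = "my-project"            # placeholder
SUBSCRIPTION_ID = "my-subscription"  # placeholder

def process(data: bytes) -> None:
    print("processing", data)  # placeholder for real work

def callback(message: pubsub_v1.subscriber.message.Message) -> None:
    process(message.data)
    message.ack()  # ack only after successful processing

subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path(PROJECT_ID, SUBSCRIPTION_ID)

# Stay connected full time, as described above; rising CPU across the
# group is what trips the MIG autoscaling threshold.
streaming_pull_future = subscriber.subscribe(subscription_path, callback=callback)
try:
    streaming_pull_future.result()  # blocks indefinitely
except KeyboardInterrupt:
    streaming_pull_future.cancel()
```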
If you consider a push subscription, the HTTPS protocol does consume more bandwidth and is less efficient, but you can push the messages to Cloud Run or Cloud Functions. The scaling there is very elastic and is based on traffic (the number of messages pushed) rather than on CPU usage.
In addition, you can push Pub/Sub messages securely to Cloud Functions and Cloud Run by setting the correct identity (a service account) on your Pub/Sub subscription.
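For illustration, a minimal sketch of the receiving side on Cloud Run, assuming a Flask app; the JSON envelope shape is the standard push-delivery format, and platform-level authentication of the subscription's service account is assumed rather than shown:
```python
# Sketch: a minimal push endpoint for Cloud Run, assuming Flask.
# A push subscription POSTs a JSON envelope of the form
# {"message": {"data": "<base64>", ...}, "subscription": "..."}.
import base64

from flask import Flask, request

app = Flask(__name__)

@app.route("/", methods=["POST"])
def receive_message():
    envelope = request.get_json()
    if not envelope or "message" not in envelope:
        return "Bad Request: invalid Pub/Sub envelope", 400

    data = envelope["message"].get("data", "")
    payload = base64.b64decode(data).decode("utf-8") if data else ""
    print("received:", payload)  # placeholder for real processing

    # Any 2xx response acknowledges the message; non-2xx triggers redelivery.
    return "", 204
```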

Related

GCP: Splitting traffic in ratio across regions (Active-Active)

I am designing an application that runs in multiple regions, say R1 and R2.
Files are submitted to a multi-region Cloud Storage bucket. A PUT event in the bucket publishes a notification that either directly triggers a Cloud Function or goes to a Pub/Sub topic.
I want 80% of processing to be done by R1, and 20% by R2.
Approach1:
Have 2 Cloud functions: CF-R1, CF-R2.
How do I ensure that 80% of storage bucket notifications trigger CF-R1 & 20% trigger CF-R2?
Approach 2:
Have a Pub/Sub topic that captures notifications from the storage bucket.
Is it possible to configure CF-R1 & CF-R2 on the topic so that I can split traffic?
Or is there any other approach to handle this scenario?
Approach 1: Use a Load balancer with URL maps
You could use a Cloud Function or Cloud Run service behind a load balancer with a URL map (announced in June in this blog post - see documentation).
If you use the load balancer, you can send the notification to the balancer directly or via Pub/Sub with a push subscription.
Note that the load balancer is a separate product, and you should take a close look at its usage and pricing.
Approach 2: Several Pub/Sub subscriptions with filters
I think the second option could be viable. It's crazy to do for your case, but it will work.
Google now has in beta the option to apply a filter when you create a subscription on a Pub/Sub topic.
Then you can have a Cloud Function (or a Cloud Run service) reacting to the Pub/Sub notifications it receives on its own subscription.
With this beta feature, you can filter on message attributes (equals =, not equals !=, and hasPrefix).
The trick here is to have enough information to distribute the messages evenly between the functions, because you cannot change the filter after you create the subscription.
If you can pass that information from your app, or as part of the filename, you can do it this way quite easily.
If not, I guess the crc32 might carry enough information for the filter you need.
But this filter has a 128 character limit that you hit with this:
hasPrefix(attributes.crc32,"A") OR hasPrefix(attributes.crc32,"B") OR hasPrefix(attributes.crc32,"C") OR hasPrefix(attributes.crc32,"D") OR hasPrefix(attributes.crc32,"E")
With the filter above you cover almost 10% of the possible CRC32 cases. Not bad for some simple cases, but not good for you, since you would have to configure a lot of subscriptions.
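If you do have an attribute to split on, creating the filtered subscriptions is straightforward. A minimal sketch with the google-cloud-pubsub Python client; the topic name, subscription names, and "region" attribute are hypothetical:
```python
# Sketch: two filtered subscriptions on one topic, so CF-R1 and CF-R2
# each consume their own share. Assumes `pip install google-cloud-pubsub`;
# the topic name, subscription names, and "region" attribute are
# hypothetical -- the publisher must stamp each message accordingly.
from google.cloud import pubsub_v1

PROJECT_ID = "my-project"  # placeholder

subscriber = pubsub_v1.SubscriberClient()
topic_path = subscriber.topic_path(PROJECT_ID, "uploads")  # placeholder

for sub_id, filter_expr in [
    ("uploads-r1", 'attributes.region = "r1"'),
    ("uploads-r2", 'attributes.region = "r2"'),
]:
    subscriber.create_subscription(
        request={
            "name": subscriber.subscription_path(PROJECT_ID, sub_id),
            "topic": topic_path,
            "filter": filter_expr,  # cannot be changed after creation
        }
    )
```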

restful API maximum limit : API Queue

I am developing a REST service using Spring Boot. The service takes an input file, does some operation on it, and returns the processed file.
I know that in Spring Boot we have the configuration property "server.tomcat.max-threads", which can be a maximum of 400.
My REST application will be deployed on a cluster.
I want to understand how I should handle more than 400 concurrent requests in the case where my cluster has only one node.
Basically, I want to understand the standard way to serve more requests than "max-threads-per-node × N nodes" in a cloud solution.
Welcome to AWS and Cloud Computing in general. What you have described is system elasticity, which is made very easy and accessible in this ecosystem.
Have a look at AWS Auto Scaling. It is a service which will monitor your application and automatically scale out to meet the increasing demand and scale in to save costs when the demand is low.
You can set triggers for this. For example, if you know that your application load is a function of memory usage, you can add nodes to the cluster whenever memory usage hits 80%. Read more about the various scaling policies here.
One such scaling metric is ALBRequestCountPerTarget. It scales the number of nodes in the cluster to maintain the average request count per node (target). With some buffer, you can set this to 300 and achieve what you are looking for. Read more about this in the docs.
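A minimal sketch of such a target-tracking policy with boto3; the group name, policy name, and ALB/target-group resource label are placeholders:
```python
# Sketch: target-tracking policy that keeps ~300 requests per target.
# Assumes boto3 and an existing Auto Scaling group behind an ALB; the
# group name, policy name, and resource label are placeholders.
import boto3

autoscaling = boto3.client("autoscaling")

autoscaling.put_scaling_policy(
    AutoScalingGroupName="rest-service-asg",   # placeholder
    PolicyName="keep-300-requests-per-node",   # placeholder
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ALBRequestCountPerTarget",
            # Format: app/<lb-name>/<lb-id>/targetgroup/<tg-name>/<tg-id>
            "ResourceLabel": "app/my-alb/0123456789abcdef/targetgroup/my-tg/fedcba9876543210",  # placeholder
        },
        "TargetValue": 300.0,  # buffer below the 400-thread ceiling
    },
)
```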

AWS, how to prepare for user rush due to Push notification?

AWS should be useful for getting more server resources during a user rush (for instance, one caused by a push notification).
What are the components (AWS has so many services) I should look at?
Would it be possible to increase server capacity dynamically (programmatically) just before we send out an all-user push notification?
Assuming that the emphasis on push notifications is just an example of a cause of overload on EC2 instances, you should first have an Auto Scaling group:
If the period when your load increases is known and fixed, you can set up scheduled scaling for that Auto Scaling group.
Or you can watch metrics such as CPU usage and trigger scaling in and out with CloudWatch alarms. For a deeper understanding of the concept, see: https://docs.aws.amazon.com/autoscaling/ec2/userguide/as-scaling-simple-step.html
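For the "programmatically, just before the push" part of the question, a minimal sketch with boto3 of a scheduled scale-out ahead of a planned notification; names, sizes, and timings are placeholders:
```python
# Sketch: schedule extra capacity shortly before a planned all-user push.
# Assumes boto3 and an existing Auto Scaling group; the group name,
# action name, sizes, and timing are placeholders.
from datetime import datetime, timedelta, timezone

import boto3

autoscaling = boto3.client("autoscaling")
push_time = datetime.now(timezone.utc) + timedelta(hours=1)  # planned send time

autoscaling.put_scheduled_update_group_action(
    AutoScalingGroupName="app-asg",               # placeholder
    ScheduledActionName="pre-push-scale-out",     # placeholder
    StartTime=push_time - timedelta(minutes=15),  # warm up before the rush
    MinSize=4,
    MaxSize=20,
    DesiredCapacity=10,
)
```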

Tracking Usage per API key in a multi region application

I have an app deployed in 5 regions.
The latency between the regions varies from 150 ms to 300 ms.
Currently, we use the method outlined in this article (usage tracking part):
http://highscalability.com/blog/2018/4/2/how-ipdata-serves-25m-api-calls-from-10-infinitely-scalable.html
But we export logs from Stackdriver to Cloud Pub/Sub. Then we use Cloud Dataflow to count the number of requests consumed per API key and update it in a MongoDB Atlas database, which is geo-replicated across the 5 regions.
In our app, we only read usage info from the nearest Mongo replica, for low latency. The app never updates any usage data directly in Mongo, as that would incur a latency cost: the data has to be written to the primary, which may be in another region.
Updating the API key usage counter directly from the app in Mongo doesn't seem feasible, because we have traffic coming in at 10,000 RPS, and due to the latency between regions I think it would run into other issues. This is just a hunch; I haven't tested it so far. I came to this conclusion based on my reading of https://www.mongodb.com/blog/post/active-active-application-architectures-with-mongodb
One problem is that we end up paying for both Cloud Pub/Sub and Dataflow. Are there strategies to avoid this?
I searched on Google but didn't find how other multi-region apps keep track of usage per API key in real time. I'm not surprised: from my understanding, most apps operate in a single region for simplicity, and until now it was not feasible to deploy an app in multiple regions without significant overhead.
If you want real-time, then the best option is to go with Dataflow. You could change the way data arrives at Dataflow, for example using Stackdriver → Cloud Storage → Dataflow: instead of going through Pub/Sub you would go through Storage, so it's more a matter of convenience and of comparing each product's cost for your use case. Here's an example of how it could look with Cloud Storage.
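A minimal sketch of what the Cloud Storage variant could look like as an Apache Beam (Dataflow) batch pipeline; the bucket path, log format, and extract_api_key() helper are assumptions about your export:
```python
# Sketch: a batch Beam/Dataflow pipeline that reads exported log files
# from Cloud Storage and counts requests per API key. Assumes
# `pip install apache-beam[gcp]`; the bucket path, one-JSON-object-per-line
# log format, and "apiKey" field are assumptions about your export.
import json

import apache_beam as beam

def extract_api_key(line: str) -> str:
    return json.loads(line).get("apiKey", "unknown")  # hypothetical field

with beam.Pipeline() as pipeline:
    (
        pipeline
        | "ReadLogs" >> beam.io.ReadFromText("gs://my-log-bucket/logs-*.json")  # placeholder
        | "KeyByApiKey" >> beam.Map(lambda line: (extract_api_key(line), 1))
        | "CountPerKey" >> beam.CombinePerKey(sum)
        # In your case, replace the print with a write to Mongo Atlas
        # (e.g. a ParDo using pymongo).
        | "Print" >> beam.Map(print)
    )
```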

AWS: Autoscaling based on the size of the queue

AWS auto scaling works based on load (number of concurrent requests). It works perfectly for websites and web APIs. However, there are situations in which the number of required EC2 instances is not related to the requests but depends on something else, such as the number of items in a queue.
For example, an order-processing system which pulls orders from a custom queue (and not SQS) might need to scale out to process the orders quicker. How can we make this happen?
Auto Scaling groups can be configured to scale in or out by linking their scaling policies to CloudWatch alarms. Many people use CPU utilization as a scaling trigger, but you can use any CloudWatch metric you like. In your case you could use a queue-depth metric: for SQS that would be ApproximateNumberOfMessagesVisible; for a custom queue you can publish an equivalent custom metric.
For example, if you create an alarm that fires when the queue-depth metric is > 500 and link that to the scale-out policy of your Auto Scaling group, the group will create new instances whenever the queue has more than 500 messages in it.
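Since the queue here is custom rather than SQS, one way to wire this up with boto3 is to publish the queue depth as a custom CloudWatch metric and alarm on it; the namespace, metric name, get_queue_depth() helper, and policy ARN are placeholders:
```python
# Sketch: publish the custom queue's depth as a CloudWatch metric and
# alarm on it to drive an existing scale-out policy. Assumes boto3; the
# namespace, metric name, get_queue_depth() helper, and policy ARN are
# placeholders.
import boto3

cloudwatch = boto3.client("cloudwatch")

def get_queue_depth() -> int:
    return 0  # hypothetical helper: query your custom queue's depth

# 1. Publish the depth periodically (e.g. every minute, from a cron job).
cloudwatch.put_metric_data(
    Namespace="OrderSystem",  # placeholder
    MetricData=[{"MetricName": "QueueDepth", "Value": get_queue_depth()}],
)

# 2. Alarm when the depth stays above 500, wired to the scale-out policy.
cloudwatch.put_metric_alarm(
    AlarmName="queue-depth-high",  # placeholder
    Namespace="OrderSystem",
    MetricName="QueueDepth",
    Statistic="Average",
    Period=60,
    EvaluationPeriods=2,
    Threshold=500,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:autoscaling:..."],  # scale-out policy ARN (elided)
)
```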