I am currently working on a deploying a front-end that will scale dynamically based on the usage on google cloud platform. I was advised by a friend to use google cloud run. I have my angular front end building to a docker image with a simple express server and deployed on google cloud run. This (from what I understand) means that when the request threshold is met on one of the docker instances, another will boot up and take on the additional requests. How does this differ from a load balancer? Do I need a load balancer on top of google cloud run scaling?
I apologize in advance for my lack of devops knowledge.
Cloud Run provides autoscaling, meaning that you don't necessarily need to put a Load Balancer in front of your Cloud run services (which in the case of serverless products in GCP are known as Network Endpoint Groups), as this is done automatically on your behalf: each revision is automatically scaled to the number of container instances needed to handle all incoming requests, and even cooler since it's a scale to zero service the number of instances can reach zero if you are not receiving any requests (be aware that spinning up each new instance does necessarily take some time, which is known as cold starts, so you can always set a value of min_instances to avoid this type of issues). The use of Network Endpoint Groups is more oriented if you'd only have the backend part of your application hosted in Cloud Run, need your Load Balancer to do some sort of special routing and I believe the most wide use will be if you need to have a fixed external IP address for your application.
Related
Is it possible to mirror the traffic of one Cloud Run revision onto another?
We have a running Cloud Run service (one revision with 100% traffic), and we want to evaluate a change of in our algorithm, without actually deploying it to production.
It would be ideal, if we can just deploy a second revision (with 0% traffic, but with a revision URL) and mirror all incoming requests onto this URL.
I've seen, that you can mirror traffic using an Internal HTTP(S) Load Balancer (https://cloud.google.com/load-balancing/docs/l7-internal/setting-up-traffic-management#multiple_allowed_in_a_url_map ).
However, as far as I understand, I can't use an Internal HTTP(S) Load Balancer for Cloud Run, but only for VMs (Compute Engine).
For the serverless NEGs, it's possible to create External HTTP(S) Load Balancer, but those don't support this feature.
Am I understand it correctly, that it's not possible to mirror the traffic of Cloud Run with load balancers?
Are there any other solutions? Or do we need need to deploy our own load balancer (e.g. Nginx), and define our mirroring strategy there?
AFAIK, you can't mirror the request. you need, as you said, to deploy a proxy that split the traffic.
You can use another CLoud Run in front of your target service and then duplicate the request to the service tags. Nginx is an option, you can deploy it on Cloud Run for example.
I'd like to have my Google Cloud Run services privately communicate with one another over non-HTTP and/or without having to add bearer authentication in my code.
I'm aware of this documentation from Google which describes how you can do authenticated access between services, although it's obviously only for HTTP.
I think I have a general idea of what's necessary:
Create a custom VPC for my project
Enable the Serverless VPC Connector
What I'm not totally clear on is:
Is any of this necessary? Can Cloud Run services within the same project already see each other?
How do services address one another after this?
Do I gain the ability to use simpler by-convention DNS names? For example, could I have each service in Cloud Run manifest on my VPC as a single first level DNS name like apione and apitwo rather than a larger DNS name that I'd then have to hint in through my deployments?
If not, is there any kind of mechanism for services to discover names?
If I put my managed Cloud SQL postgres database on this network, can I control its DNS name?
Finally, are there any other gotchas I might want to be aware of? You can assume my use case is very simple, two or more long lived services on Cloud Run, doing non-HTTP TCP/UDP communications.
I also found a potentially related Google Cloud Run feature request that is worth upvoting if this isn't currently possible.
Cloud Run services are only reachable through HTTP request. you can't use other network protocol (SSH to log into instances for example, or TCP/UDP communication).
However, Cloud Run can initiate these kind of connection to external services (for instance Compute Engine instances deployed in your VPC, thanks to the serverless VPC Connector).
the serverless VPC connector allow you to make a bridge between the Google Cloud managed environment (where live the Cloud Run (and Cloud Functions/App Engine) instances) and the VPC of your project where you have your own instances (Compute Engine, GKE node pools,...)
Thus you can have a Cloud Run service that reach a Kubernetes pods on GKE through a TCP connection, if it's your requirement.
About service discovery, it's not yet the case but Google work actively on that and Ahmet (Google Cloud Dev Advocate on Cloud Run) has released recently a tool for that. But nothing really build in.
Background:
Trying to run Vault in Google Cloud Run (fully managed) and trying to decide if setting up HA is possible. Vault requires a single active node (container), and inbound requests to a standby node (container) need to be forwarded or redirected.
Forwarded means a side connection on another port (i.e. clients on tcp/8200 and pod-to-pod on tcp/8201). Is this possible, I don't see anything about this in docs.
Redirected means that a standby node (container) would need to 307 redirect to the active node's address. This would either be the Cloud Run url or the pod specific url. If it was the Cloud Run url then the load balancer could just send it right back to the standby node (loop); not good. It would need to be the pod url. Would the Cloud Run "proxy" (not sure what to call it) be able to accept the client request but do an internal redirect from pod to pod to reach the active pod?
It seems like you’re new to the programming and traffic serving model of Cloud Run. I recommend checking out documentation and https://github.com/ahmetb/cloud-run-faq for some answers.
Briefly answering some of your points:
only 1 port number can be exposed to the outside world from a container running on Cloud Run
Cloud Run apps are only accessible via HTTPS (includes gRPC) protocol over port :443.
you cannot ensure 2 running containers at a time on Cloud Run (that's not what it's designed for, that's something Kubernetes or VMs are more suitable for).
Cloud Run is, by definition, for running stateless HA apps
there's no such thing as "pod URL" in Cloud Run. multiple replicas of an app will have the same address.
as you said, Cloud Run cannot distinguish multiple instances of the same app. if a container forwards a request to its own URL, it might end up getting the request again.
Your best bet is to deploy these two containers as separate applications to Cloud Run, so they have different URLs and different lifecycles. You can set "maximum instances" to 1 to ensure these VaultService1 and VaultService2 never get additional replicas.
I have simple server now (some xeon cpu hosted somewhere), running apache/php/mysql (no docker, but its a possibility) and Im expecting some heavy traffic and I need my server to handle that.
Currently the server can handle about 100 users at once, I need it to handle couple thousands possibly.
What would be easiest and fastest solution to move my app to some scalable hosting?
I have no experience with AWS or something like that.
I was reading about AWS and similar, but Im mostly confused and not sure what should I choose.
The basic choice is:
Scale vertically by using a bigger computer. However, you will eventually hit a limit and you will have a single-point of failure (one server!), or
Scale horizontally by adding more servers and spreading the traffic across the servers. This has the added advantage of handling failure because, if one server fails, the others can continue serving traffic.
A benefit of doing horizontal scaling in the cloud is the ability to add/remove servers based on workload. When things are busy, add more servers. When things are quiet, remove servers. This also allows you to lower costs when things are quiet (which is not possible on-premises when you own your own equipment).
The architecture involves putting multiple servers behind a Load Balancer:
Traffic comes into a Load Balancer
The Load Balancer sends the request to a server (often based upon some measure of how "busy" each server is)
The server processes the request and sends a response back to the Load Balancer
The Load Balancer sends the response to the original requester
AWS has several Load Balancers available, which vary by need. If you are simply sending traffic to a single application that is installed on all servers, a Network Load Balancer should be sufficient. For situations where different parts of the application are on different servers (eg mobile interface vs web interface), you could use a Application Load Balancer.
AWS also assists with horizontal scaling by providing the Amazon EC2 Auto Scaling service. This allows you to specify details of the servers to launch (disk image, instance type, network settings) and Auto Scaling can then automatically launch new servers when required and terminate ones that aren't required. (Note that they launch and terminate, not start and stop.)
You can further define scaling policies that tell Auto Scaling when to launch/terminate instances by measuring metrics such as CPU Utilization. This way, the number of servers can approximately match the volume of traffic.
It should be mentioned that if you have a database, it should be stored separately to the application servers so that it does not get terminated. You could use the Amazon Relational Database Service (RDS) to run a database for you, or you could run one on a separate Amazon EC2 instance.
If you want to find out more about any of the above technologies, there are plenty of talks on YouTube or blog posts that can explain and demonstrate their use.
I asked this on serverfault but evidently to basic for them.
I have read through a ton of documents on the Google cloud platform but most of it is over my head, I am a developer and not a network type person. I think what I am trying to do is pretty basic but I can't find anywhere that has step by step instructions on how to accomplish the process. Google documentation seems to assume a good deal of networking knowledge.
I have :
created a "managed instance group" with Autoscaling turned on.
RDP'd into the server and installed the required software
upload all the code to run a site
set up DNS to point to that site
tested and everything seems to work just as I would expect.
I need to set up a load balancer and change the DNS to point to that instead of the server.
My web app doesn't have a back-end perse as it is entirely api driven so not sure what to do with the "backend configuration" part of setting up the load balance service.
I have an SSL cert on the server but don't know how to move it to the load balancer.
When the autoscaling kicks in will all the software and code from the current server be used or is there another step that I need to do to make this happen. If I update code on the server via RDP will the new autoscale created instances be aware of it?
Can anyone explain these steps to point me to a place NOT written for a sysadmin that I can try to understand them myself?
Here I am sharing with you a short YouTube video (less than 5 mins) of step by step instructions on how to quickly configure a load balancer in Google Cloud Platform with backend services.
I also would like to mention here that SSL terminates at the load balancer. Here is the public documentation on Creating and Using SSL Certificates in load balancing.
Finally, you want to make sure that all the software and configurations you want on each instance is done before you create the managed instance group, otherwise, the changes you make on one server will not reflect in the others.
To do this, configure your server with all the necessary software and settings. Once the server is in the correct state, create an image out of your server. You can then use this image to create an instance template which you will use for the managed instance group.