For Google Kubernetes Engine, the master node and etcd cluster are abstracted away from me, the user.
Most etcd backup guides (such as) assume I have either the endpoint or the file system access needed to perform backups.
As such, how do I perform a backup and restoration of etcd in GKE?
Or does GKE provide a managed backup/restore service similar to Cloud SQL?
Also, if a full backup is not possible, even namespace-level backups would be great.
To clarify, the scenario to guard against is not "if Google goes down", but "if we do something stupid".
The GKE backend is completely managed, so there is no way to access the etcd API. Even if you could access the cluster's etcd, there are no backwards-compatibility guarantees for the storage backend, so the storage layer could change.
You'll have to use the Kubernetes API, which is backwards compatible, for any backups you might want. There is some discussion on the kubernetes-users Google group here which should clarify this further.
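For illustration, here is a minimal sketch of what such an API-level backup could look like, using the official kubernetes Python client; the client choice, the set of resource types dumped, and the file layout are assumptions for the example, not part of the original discussion.

```python
# Minimal sketch: export namespaced API objects to YAML files via the Kubernetes API.
# Assumes the official `kubernetes` client and a kubeconfig with read access to the
# GKE cluster; the resource types covered here are illustrative, not exhaustive.
import os
import yaml
from kubernetes import client, config

config.load_kube_config()  # reuses the credentials gcloud/kubectl already set up

core = client.CoreV1Api()
apps = client.AppsV1Api()
serializer = client.ApiClient()

backup_dir = "cluster-backup"
os.makedirs(backup_dir, exist_ok=True)

def dump(name, items):
    """Serialize a list of API objects into one YAML file."""
    with open(os.path.join(backup_dir, f"{name}.yaml"), "w") as f:
        yaml.safe_dump_all(
            [serializer.sanitize_for_serialization(item) for item in items], f
        )

for ns in core.list_namespace().items:
    name = ns.metadata.name
    dump(f"{name}-deployments", apps.list_namespaced_deployment(name).items)
    dump(f"{name}-services", core.list_namespaced_service(name).items)
    dump(f"{name}-configmaps", core.list_namespaced_config_map(name).items)
```

Restoring then means re-applying those manifests to a (new) cluster, again through the Kubernetes API rather than through etcd.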
I am creating a connection with a Google service account in my Google Cloud Composer environment, used by a DAG for a specific use case that deals with sensitive data. The point is that I want that connection to be exclusive to a certain DAG, so that no other DAG can see or use it.
Is there a way of doing this?
Currently this is not possible in Airflow, and you cannot implement it even with a custom secrets backend or another workaround, because a connection is not a context variable; it is accessible from anywhere in Airflow, not only from a run context.
Unfortunately, the service account given to Cloud Composer at cluster creation is shared by all DAGs in that cluster.
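To make the first point concrete: any task code, in any DAG, can resolve any connection by its id, which is why a per-DAG exclusive connection cannot be enforced. A small illustration for Airflow 2.x, with a hypothetical connection id:

```python
# Illustration only: nothing scopes an Airflow connection to a single DAG.
# "sensitive_sa_conn" is a hypothetical connection id, not one from the question.
from airflow.hooks.base import BaseHook

def task_in_some_unrelated_dag():
    conn = BaseHook.get_connection("sensitive_sa_conn")
    # The secret parts are readable here just as in the DAG that "owns" the connection:
    print(conn.host, conn.login, conn.password, conn.extra_dejson)
```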
It may be overkill, but you could create a second Cloud Composer 2 cluster (GKE Autopilot) with minimal machine sizing, containing only the DAG that deals with sensitive data.
Then you can give that cluster a service account with the needed privileges.
The disadvantage of this solution is the higher cost of running a second cluster; it increases the bill even if the machine sizes are small.
It is worth noting that Composer 2 with GKE Autopilot is cheaper than a classic GKE cluster.
Another possible solution, if the rework is not too significant, is to rewrite only the DAG that handles sensitive data as a Cloud Workflows workflow.
Cloud Workflows is serverless and you can give it a dedicated service account.
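A minimal sketch of that idea, assuming the google-cloud-workflows Python client; the project, location, workflow id, service account, and workflow source below are placeholders:

```python
# Sketch: deploy a workflow that runs under its own dedicated service account,
# instead of the service account shared by every DAG in a Composer environment.
from google.cloud import workflows_v1

client = workflows_v1.WorkflowsClient()

workflow = workflows_v1.Workflow(
    # The key point: executions run with this identity only.
    service_account="sensitive-data-sa@my-project.iam.gserviceaccount.com",
    source_contents="""
main:
  steps:
    - log_step:
        call: sys.log
        args:
          text: "processing sensitive data"
""",
)

operation = client.create_workflow(
    parent="projects/my-project/locations/us-central1",
    workflow=workflow,
    workflow_id="sensitive-data-workflow",
)
operation.result()  # wait for the deployment to complete
```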
If I want to set up a PostgreSQL-compatible database on AWS, I have 3 choices:
Manual: EC2 (spin up an EC2 instance and manually set up and maintain a PostgreSQL database on it).
Managed: RDS-PostgreSQL (AWS will set up and maintain the database instances).
Fully managed: RDS Aurora in PostgreSQL-compatible mode (AWS will set up and maintain the database instances, just like with RDS-PostgreSQL?)
My question concerns the difference between "managed" and "fully managed". Many AWS certification training materials highlight the "fully managed" feature as an advantage that RDS Aurora in PostgreSQL-compatible mode has over RDS-PostgreSQL. I don't understand what the distinction is.
AWS documentation lists these items as the advantage of the "managed" feature of RDS Postgres: "hardware provisioning, database setup, patching and backups". To compare, these are the items that AWS documentation lists as the advantage of the "fully managed" feature of RDS Aurora: "hardware provisioning, software patching, setup, configuration, or backups". As far as I can tell, the only difference between these lists is "configuration". What am I missing?
Note that I am not asking about other differences between RDS Aurora and RDS Postgres. I'm specifically asking about the difference between "managed" and "fully managed".
Your question comes down to understanding fully managed versus managed services.
Let me explain a fully managed service with the example of DynamoDB. AWS manages all of the infrastructure and software updates; in the end, all you need to do is use the service and perhaps set up some IAM permissions to access it.
Managed services, on the other hand, work on a shared-responsibility basis: in short, you have more control over them, and AWS does not manage everything at the infrastructure level, such as security patching, updates, and scaling.
The above explanation of managed versus fully managed services can be applied to many AWS services.
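To make the contrast concrete, here is a hedged sketch with boto3 and illustrative names: with a fully managed service like DynamoDB you only describe the table, while with a managed service like RDS you still choose, and later maintain, the instance class, storage, credentials, and so on.

```python
# Illustrative comparison; table, instance and credential values are placeholders.
import boto3

# Fully managed (DynamoDB): no instances, versions or storage volumes to pick.
dynamodb = boto3.client("dynamodb")
dynamodb.create_table(
    TableName="orders",
    AttributeDefinitions=[{"AttributeName": "order_id", "AttributeType": "S"}],
    KeySchema=[{"AttributeName": "order_id", "KeyType": "HASH"}],
    BillingMode="PAY_PER_REQUEST",  # no capacity planning required
)

# Managed (RDS for PostgreSQL): you still own sizing and several operational choices.
rds = boto3.client("rds")
rds.create_db_instance(
    DBInstanceIdentifier="app-postgres",
    Engine="postgres",
    DBInstanceClass="db.t3.micro",   # you pick (and later resize) the hardware tier
    AllocatedStorage=20,             # you size and grow the storage (GiB)
    MasterUsername="appadmin",
    MasterUserPassword="change-me",  # you manage credentials, parameter groups, etc.
)
```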
Google Cloud Platform has made hybrid- and multi-cloud computing a reality through Anthos, which is an open application modernization platform. How does Anthos work for distributed data platforms?
For example, I have my data in Teradata on-premises, AWS Redshift, and Snowflake on Azure. Can Anthos join all the datasets and allow users to query or run reports with low latency? What is the equivalent of GCP Anthos in AWS and Azure?
Your question is broad. Anthos is designed for managing and distributing containers across several Kubernetes clusters.
For a simpler view, imagine this: you have the Anthos master, and its direct nodes are Kubernetes masters. If you ask the Anthos master to deploy a pod on AWS, for example, the Anthos master forwards the request to the Kubernetes master deployed on EKS, and your pod is deployed on AWS.
Now, back to your question: what about the data? Nothing magic here; if your data are spread across several clusters, you have to federate them with a system designed for that. It's quite similar to having a single cluster with data spread across different nodes.
Anyway, you have pointed to the real next challenge of multi-cloud/hybrid deployment. Solutions will emerge to fill this empty space.
Finally, your last point: the Azure and AWS equivalents. There aren't any.
The newest Azure Arc seems to be lightweight: it only allows you to manage VMs outside the Azure platform with an agent on them, which is nothing as manageable as Anthos. For example: you have 3 VMs on GCP and you manage them with Azure Arc. You deploy NGINX on each of them and want to set up a load balancer in front of your 3 VMs; I don't see how you can do this with Azure Arc. With Anthos, it's simply a Kubernetes service exposure: the load balancer will be deployed according to the cloud platform's implementation.
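To illustrate that last point: exposing workloads behind a cloud load balancer is just a Service of type LoadBalancer against the Kubernetes API, and each provider's controller provisions its own load balancer behind it. A small sketch with the kubernetes Python client, using illustrative names:

```python
# Sketch: a LoadBalancer Service in front of pods labelled app=nginx.
# The same object works on GKE, EKS or AKS; each cloud provisions its own LB.
from kubernetes import client, config

config.load_kube_config()

service = client.V1Service(
    metadata=client.V1ObjectMeta(name="nginx-lb"),
    spec=client.V1ServiceSpec(
        type="LoadBalancer",
        selector={"app": "nginx"},
        ports=[client.V1ServicePort(port=80, target_port=80)],
    ),
)

client.CoreV1Api().create_namespaced_service(namespace="default", body=service)
```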
As for AWS, Outposts is a hardware solution: you have to buy AWS-specific hardware and plug it into your on-premises infrastructure. More on-premises investment as part of your move-to-cloud strategy? Hard to sell. And it's not compatible with other cloud providers. But re:Invent is coming next month; maybe an outsider will appear?
I want to get a list of containers and their details running on GCP Kubernetes (GKE).
From the API page https://developers.google.com/apis-explorer/#p/container/v1/
we can get cluster and node details, but I'm looking for more granular levels like pods and containers.
Is there any way to get those?
Pod and container details are accessible through the Kubernetes API, not through the Google Cloud SDK.
Unfortunately, this means getting that information on a per cluster basis.
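For example, once credentials for a cluster have been fetched (e.g. with gcloud container clusters get-credentials), a short sketch with the official kubernetes Python client can list every pod and its containers; the client choice here is an assumption, and kubectl get pods -A would show the same thing:

```python
# Sketch: list pods and their containers for the cluster in the current kubeconfig
# context; repeat per cluster, since there is no project-wide Kubernetes endpoint.
from kubernetes import client, config

config.load_kube_config()

v1 = client.CoreV1Api()
for pod in v1.list_pod_for_all_namespaces().items:
    for container in pod.spec.containers:
        print(pod.metadata.namespace, pod.metadata.name,
              container.name, container.image)
```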
Alternatively, the Cloud Console can be used by going to Kubernetes Engine > Workloads, which lists all replica sets and individual pods (when they are not controlled by a replica set).
You might be able to configure Stackdriver Monitoring to create a group consisting of all the pod and container resources in use in the project and then use the Monitoring API to query that group, but I haven't tested that.
I have a MySQL database pod running within a Google Cloud cluster, which also uses a persistent volume (GCEPersistentDisk) to back up my data.
Is there a way to also have an AWS persistent volume (AWSElasticBlockStore) backing up the same pod, in case something goes wrong with Google Cloud Platform and I can't reach any of my data? That way, when I create another Kubernetes pod within AWS, I would be able to get my latest data (from before GCP crashed) from that AWSElasticBlockStore.
If not, what's the best way to simultaneously back up a Kubernetes database pod to two different cloud providers, so that when one crashes, you are still able to deploy on the other?