Custom DNS resolver for Google Cloud Dataflow pipeline - google-cloud-platform

I am trying to access Kafka and 3rd-party services (e.g., InfluxDB) running in GKE, from a Dataflow pipeline.
I have a DNS server for service discovery, also running in GKE. I also have a route in my network to access the GKE IP range from Dataflow instances, and this is working fine. I can manually nslookup from the Dataflow instances using my custom server without issues.
However, I cannot find a proper way to set up an additional DNS server when running my Dataflow pipeline. How could I achieve that, so that KafkaIO and similar sources/writers can resolve hostnames against my custom DNS?
sun.net.spi.nameservice.nameservers is tricky to use, because it must be set very early, before the name service is statically instantiated. Normally I would pass it with java -D, but Dataflow launches the worker JVM itself, so I have no control over the command line.
In addition, I would not want to replace the system's resolvers outright, but merely append a new one to the GCP project-specific resolvers that the instance comes pre-configured with.
Finally, I have not found any way to use a startup script on Dataflow workers the way I can on a regular GCE instance.
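For reference, one approach I am considering (untested; it assumes the Beam Java SDK's org.apache.beam.sdk.harness.JvmInitializer hook is available and runs early enough on the Dataflow workers, and the DNS IP below is just a placeholder) would be to set the property from a JVM initializer:

import com.google.auto.service.AutoService;
import org.apache.beam.sdk.harness.JvmInitializer;

// Registered via ServiceLoader (here with @AutoService) so the worker harness
// picks it up when the JVM starts, before any pipeline code runs.
@AutoService(JvmInitializer.class)
public class CustomDnsInitializer implements JvmInitializer {
  @Override
  public void onStartup() {
    // 10.0.0.53 is a placeholder for the custom DNS server running in GKE.
    System.setProperty("sun.net.spi.nameservice.nameservers", "10.0.0.53");
    System.setProperty("sun.net.spi.nameservice.provider.1", "dns,sun");
  }
}

Whether this actually runs before the JDK instantiates its name service is something I would still need to verify.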

I can't think of a way today of specifying a custom DNS server in a VM other than editing the /etc/resolv.conf file [1] on the box. I don't know whether it is possible to share the default network; if it is, machines are reachable at hostName.c.[PROJECT_ID].internal, which may serve your purpose if hostName is stable [2].
[1] https://cloud.google.com/compute/docs/networking#internal_dns_and_resolvconf [2] https://cloud.google.com/compute/docs/networking

Related

Run Job in GKE from a cloud function

So I was able to connect to a GKE cluster from a Java project and run a job using this:
https://github.com/fabric8io/kubernetes-client/blob/master/kubernetes-examples/src/main/java/io/fabric8/kubernetes/examples/JobExample.java
All I needed was to configure my machine's local kubectl to point to the GKE cluster.
Now I want to ask if it is possible to trigger a job inside a GKE cluster from a Google Cloud Function, which means using the same library https://github.com/fabric8io/kubernetes-client
but from a serverless environment. I have tried to run it, but obviously kubectl is not installed on the machine where the Cloud Function runs. I have seen something similar working with an AWS Lambda function that uses https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/ecs/AmazonECS.html
to run jobs in an ECS cluster. We're basically trying to migrate from that to GCP, so we're open to any suggestion for triggering the jobs in the cluster from some kind of code hosted in GCP, in case a Cloud Function can't do it.
Yes, you can trigger your GKE job from a serverless environment, but you have to be aware of some edge cases.
With Cloud Functions, you don't manage the runtime environment, so you can't control what is installed in the container; in particular, you can't install kubectl on it.
You have 2 solutions:
Kubernetes exposes an API. From your Cloud Function you can simply call that API directly; kubectl is just an API call wrapper, nothing more. It requires more effort, but if you want to stay on Cloud Functions you don't have any other choice (see the sketch after this answer).
You can switch to Cloud Run. With Cloud Run, you can define your own container and therefore install kubectl in it, in addition to your web server (you have to wrap your function in a web server with Cloud Run, but it's pretty easy).
Whichever solution you choose, you also have to be aware of how the GKE control plane is exposed. If it is publicly exposed (generally not recommended), there is no issue. But if you have a private GKE cluster, the control plane is only reachable from the internal network; to solve that from a serverless product, you have to create a Serverless VPC Access connector to bridge the Google-managed serverless VPC with your GKE control-plane VPC.
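A minimal sketch of the first option, assuming a recent fabric8 kubernetes-client (6.x); masterUrl, caCertData and token are hypothetical parameters you would fill with the GKE cluster endpoint, its CA certificate and a Kubernetes service-account token obtained by the function:

import io.fabric8.kubernetes.api.model.batch.v1.Job;
import io.fabric8.kubernetes.api.model.batch.v1.JobBuilder;
import io.fabric8.kubernetes.client.Config;
import io.fabric8.kubernetes.client.ConfigBuilder;
import io.fabric8.kubernetes.client.KubernetesClient;
import io.fabric8.kubernetes.client.KubernetesClientBuilder;

public class GkeJobTrigger {
  public void runJob(String masterUrl, String caCertData, String token) {
    // Build the client from the GKE endpoint and a service-account token
    // instead of relying on a local kubeconfig / kubectl.
    Config config = new ConfigBuilder()
        .withMasterUrl(masterUrl)     // cluster endpoint from the GKE API
        .withCaCertData(caCertData)   // cluster CA certificate (base64)
        .withOauthToken(token)        // Kubernetes service-account token
        .build();

    try (KubernetesClient client = new KubernetesClientBuilder().withConfig(config).build()) {
      Job job = new JobBuilder()
          .withNewMetadata().withName("pi").endMetadata()
          .withNewSpec()
            .withNewTemplate()
              .withNewSpec()
                .addNewContainer()
                  .withName("pi")
                  .withImage("perl")
                  .withCommand("perl", "-Mbignum=bpi", "-wle", "print bpi(100)")
                .endContainer()
                .withRestartPolicy("Never")
              .endSpec()
            .endTemplate()
          .endSpec()
          .build();

      client.batch().v1().jobs().inNamespace("default").resource(job).create();
    }
  }
}

The service account used for the token only needs enough RBAC permissions to create Jobs in the target namespace.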

Can we run an application that is configured to run on multi-node AWS EC2 K8s cluster using kops into local kubernetes cluster (using kubeadm)?

Can we run an application that is configured for a multi-node AWS EC2 Kubernetes cluster built with kops (project link) on a local Kubernetes cluster (set up using kubeadm)?
My thinking is that if the application runs in a k8s cluster backed by AWS EC2 instances, it should also run in a local k8s cluster. I am trying it locally for testing purposes.
Here's what I have tried so far, but it is not working:
First I set up my local two-node cluster using kubeadm.
Then I modified the project's installation script (link given above) by removing all the references to EC2 (as I am using local machines) and to the kops state (particularly in their create_cluster.py script).
I modified their application YAML files (app requirements) to match my local two-node setup.
Unfortunately, although most of the application pods are created and running, some pods fail to start, so I cannot run the whole application on my local cluster.
I appreciate your help.
That is the beauty of Docker and Kubernetes: they help keep your development environment matching production. For simple applications written without custom resources, you can deploy the same workload to any cluster running on any cloud provider.
However, the ability to deploy the same workload to different clusters depends on several factors, such as:
How do you manage authorization and authentication in your cluster? For example, IAM, IRSA.
Are you using any cloud-native custom resources? For example, AWS ALBs used as LoadBalancer Services.
Are you using any cloud-native storage? For example, pods that rely on EFS/EBS volumes.
Is your application cloud agnostic? For example, is it built on provider-native technologies like Neptune?
Can you mock cloud technologies locally? For example, using LocalStack to mock Kinesis and DynamoDB.
How do you resolve DNS routes? For example, say you are using RDS in AWS and access it through a Route 53 entry; locally you might be running a MySQL instance and need a DNS mechanism to discover it.
I did a Google search and looked at the kOps documentation; I could not find any info about deploying locally, and it only supports public cloud providers.
IMO, you need to figure out a way to set up your local cluster to mirror the EKS cluster, and wherever cloud-native technologies are used, you need to find an alternative that works locally.
The true answer, as Rajan Panneer Selvam said in his response, is that it depends, but I'd like to expand somewhat on his answer by saying that your application should run on any K8S cluster given that it provides the services that the application consumes. What you're doing is considered good practice to ensure that your application is portable, which is always a factor in non-trivial applications where simply upgrading a downstream service could be considered a change of environment/platform requiring portability (platform-independence).
To help you achieve this, you should be developing a 12-Factor Application (12-FA) or one of its more up-to-date derivatives (12-FA is getting a little dated now and many variations have been suggested, but mostly they're all good).
For example, if your application uses a database, then it should use DB-independent SQL or NoSQL so that you can switch it out. In production you may run on Oracle, but in your local environment you may use MySQL: your application should not care. The credentials and connection string should be passed to the application via the usual K8S techniques of Secrets and ConfigMaps to help you achieve this. And all logging should be sent to stdout (and stderr) so that you can use a log-shipping agent to send the logs somewhere more useful than a local filesystem.
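As a minimal sketch (the DB_URL, DB_USER and DB_PASSWORD variable names are hypothetical, injected from a Secret or ConfigMap), the application only ever sees generic connection settings:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class DatabaseConfig {
  // The application only sees generic JDBC settings; whether they point at
  // Oracle in production or MySQL locally is decided by the Secret/ConfigMap.
  public static Connection connect() throws SQLException {
    String url = System.getenv("DB_URL");        // e.g. jdbc:mysql://mysql:3306/app
    String user = System.getenv("DB_USER");
    String password = System.getenv("DB_PASSWORD");
    return DriverManager.getConnection(url, user, password);
  }
}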
If you run your app locally then you have to provide a surrogate for every 'platform' service that is provided in production, and this may mean switching out major components of what you consider to be your application but this is ok, it is meant to happen. You provide a platform that provides services to your application-layer. Switching from EC2 to local may mean reconfiguring the ingress controller to work without the ELB, or it may mean configuring kubernetes secrets to use local-storage for dev creds rather than AWS KMS. It may mean reconfiguring your persistent volume classes to use local storage rather than EBS. All of this is expected and right.
What you should not have to do is start editing microservices to work in the new environment. If you find yourself doing that then the application has made a factoring and layering error. Platform services should be provided to a set of microservices that use them, the microservices should not be aware of the implementation details of these services.
Of course, it is possible that you have some non-portable code in your system; for example, you may be using some Oracle-specific PL/SQL that can't be run elsewhere. This code should be extracted into config files, with equivalents provided for each database you wish to run on. This isn't always possible, in which case you should abstract as much as possible into isolated services and reimplement only those services on each new platform, which can still be time-consuming but is ultimately worth the effort for most non-trivial systems.

GCP service to ssh and run a script on 10 Virtual Machines in GCE without using a bastion VM

In a GCP project, I have 10 virtual machines in GCE (each running sshd).
I need to run a script on each of the 10 virtual machines once an hour. I would like this to be centralized because the number of VMs will grow over time and I do not want to have to set this up on every single VM. In addition, I want to analyze the data I get back in a central place.
However, I do not want to use a bastion VM, because I would like a cloud-native solution that does not require maintaining yet another virtual machine.
Which GCP service can do this?
I have looked into Cloud Run and Cloud Composer. I was not able to do this with Cloud Run, although that may be my own lack of familiarity with the product. Cloud Composer seems like overkill.
As @JohnHanley mentioned, you will need to write code or scripts to launch commands on the VMs dynamically, because GCP doesn't have the type of service you require.
You may want to consider Cloud Identity-Aware Proxy (IAP) as it can be used for building your solution:
IAP helps to protect SSH access to your VMs without needing to provide your VMs with public IP addresses, and without having to set up bastion hosts.
For instance, you can check the enable IAP on Compute Engine guide.
You can also create a feature request for Google to consider implementing this solution.
Your best solution, with no additional charges, would be to:
Use a startup script on each GCE instance to set up a cron entry that executes your script.
crontab.guru can help you find the cron expression; hourly is 0 * * * *.

Any way to tell which cloud provider the current k8s cluster is running on?

I'm writing a k8s operator. Knowing which cloud provider the cluster is currently running on, I can do some platform-specific tasks for users, such as preparing some default storage classes.
But how can an operator running inside the k8s cluster know whether it is on GCP or AWS?
Scanning through the APIs, the cloud provider does leave some clues here and there; for example, the GKE cluster I am running now exposes an API group named /apis/nodemanagement.gke.io/v1alpha1,
but that feels a little too hacky, and I wonder if there is a more formal way to get this info.
No, this is not exposed in a consistent way. You should have the user put it in their config file or similar.
Indeed, it's not consistent. When the configuration is added to kubectl with the provider's defaults, you get context names with these patterns:
> kubectl config current-context
# For GCP
gke_gbl-imt-homerider-basguillaueb_europe-west1-b_my-first-cluster-1
# For AWS
arn:aws:eks:eu-west-1:306974639454:cluster/demo-knative
You can also rename the context if you prefer your own pattern.
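As a quick heuristic sketch of that approach (it assumes kubectl and a kubeconfig are available, e.g. for tooling run outside the cluster, and it breaks as soon as a context has been renamed):

import java.io.BufferedReader;
import java.io.InputStreamReader;

public class CloudDetector {
  // Relies only on the default context names kubectl generates:
  // "gke_..." for GKE and "arn:aws:eks:..." for EKS.
  public static String detect() throws Exception {
    Process p = new ProcessBuilder("kubectl", "config", "current-context").start();
    try (BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
      String context = r.readLine();
      if (context == null) return "unknown";
      if (context.startsWith("gke_")) return "gcp";
      if (context.startsWith("arn:aws:eks:")) return "aws";
      return "unknown";
    }
  }
}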

Communication between instances in Cloud Foundry

Is there a possibility to communicate between multiple instances of an application deployed to Cloud Foundry?
I checked the Cloud Foundry API but I couldn't find any mention of this subject.
I already tried Hazelcast but unfortunately, my Cloud Foundry provider doesn't support Multicasting, so I would have to know the IP addresses of every other instance in order to connect.
I think I can't be the only one interested in this.
I recommend you use a messaging service (like RabbitMQ) to communicate between instances of applications. You can also store shared information in a database service or any remote location outside the file system.
It is generally not a good practice to build applications that require this type of communication in the cloud. Each instance should ideally be able to run independently and be stateless.
If you can programmatically access the IP addresses, you can build up the Hazelcast Config object and use that to set up your cluster. You can then rely on tcp-ip discovery.
pseudo code:
// cloudfoundry.getIps() stands in for however you obtain the other instances' IP addresses
List<String> ipAddresses = cloudfoundry.getIps();
Config config = new Config();
JoinConfig join = config.getNetworkConfig().getJoin();
join.getMulticastConfig().setEnabled(false);
join.getTcpIpConfig().setEnabled(true).setMembers(ipAddresses);
HazelcastInstance hz = Hazelcast.newHazelcastInstance(config);
You can even load your existing Hazelcast XML configuration and enhance it on the fly using the XmlConfigBuilder.
There might be occasions where the instances need to communicate among themselves rather than through an external component, to gain efficiency or avoid a dependency. One possibility is to route an HTTP request to a specific instance through the gorouter itself, using the header below.
X-Cf-App-Instance: app-guid:instance-index
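For illustration, a minimal sketch using Java's built-in HTTP client (the app GUID and instance index are assumptions; the GUID can be looked up with cf app <name> --guid or read from VCAP_APPLICATION at runtime):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class InstanceCall {
  public static String callInstance(String route, String appGuid, int instanceIndex)
      throws Exception {
    // The gorouter uses this header to deliver the request to one specific instance.
    HttpRequest request = HttpRequest.newBuilder()
        .uri(URI.create(route))
        .header("X-Cf-App-Instance", appGuid + ":" + instanceIndex)
        .build();
    HttpResponse<String> response =
        HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
    return response.body();
  }
}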
Another option, which I have yet to explore, is explained in this link:
https://ict.swisscom.ch/2018/05/container-networking-with-cloud-foundry/