How to connect Airflow with AI Platform? - google-cloud-platform

I am working on a project where I need to connect Airflow to AI Platform (Google Cloud's artificial intelligence platform). Are there any connectors I can use? I also need to start a job from Airflow. Thanks.

Have you considered using the Airflow GCP operators?
Airflow has extensive support for the Google Cloud Platform.
See the GCP connection type documentation to configure connections to GCP.
All hooks are based on airflow.gcp.hooks.base.GoogleCloudBaseHook.
See: https://airflow.readthedocs.io/en/latest/operators-and-hooks-ref.html#gcp-google-cloud-platform
Alternatively, Google has productised Apache Airflow as Google Cloud Composer, which has built-in integrations with Google AI services. Using Cloud Composer might therefore prove beneficial in the long run.
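For the "start a job" part, one low-dependency option is to have a BashOperator call the gcloud CLI. The sketch below only builds the command string. It assumes the Cloud SDK is installed on the Airflow workers (Cloud Composer workers have it) and that your trainer is packaged as a Python package. The job ID, package path, module name, region and bucket in the example are placeholders. Airflow also ships dedicated ML Engine (the earlier name of AI Platform) training operators; check the operators reference linked above for the exact class name in your Airflow version.

```python
import shlex
from typing import List, Optional


def training_job_command(
    job_id: str,
    package_path: str,
    module_name: str,
    region: str,
    staging_bucket: str,
    scale_tier: str = "BASIC",
    user_args: Optional[List[str]] = None,
) -> str:
    """Build a `gcloud ai-platform jobs submit training` command line.

    All values are placeholders supplied by the caller; nothing here
    talks to Google Cloud.
    """
    cmd = [
        "gcloud", "ai-platform", "jobs", "submit", "training", job_id,
        f"--package-path={package_path}",  # trainer package on the machine running gcloud
        f"--module-name={module_name}",    # e.g. trainer.task
        f"--region={region}",
        f"--staging-bucket={staging_bucket}",
        f"--scale-tier={scale_tier}",
    ]
    if user_args:
        # Arguments after a bare "--" are forwarded to the trainer module.
        cmd.append("--")
        cmd.extend(user_args)
    # shlex.join (Python 3.8+) quotes each piece so the string is safe for bash.
    return shlex.join(cmd)


if __name__ == "__main__":
    print(training_job_command(
        "my_job_001", "trainer", "trainer.task",
        "us-central1", "gs://my-staging-bucket",
        user_args=["--epochs", "5"],
    ))
```

In a DAG, the resulting string would be passed as something like BashOperator(task_id="submit_training", bash_command=training_job_command(...)). AI Platform job IDs must be unique within a project, so in practice you would append a timestamp or the run ID.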

Related

Where to keep the Dataflow and Cloud composer python code?

This is probably a silly question. In my project we'll be using Dataflow and Cloud Composer. For that, I asked for permission to create a VM instance in the GCP project to hold both the Dataflow and the Cloud Composer Python programs. The client asked why I needed a VM instance and told me that Dataflow can be executed without one.
Is that possible? If so, how? Can anyone please explain? It would be really helpful.
You can run Dataflow pipelines or manage Composer environments from your own computer once your credentials are authenticated and you have both the Google Cloud SDK and the Dataflow Python library (the Apache Beam SDK) installed. The pipeline itself then runs on Dataflow-managed workers, not on your machine. However, this depends on how you want to manage your resources. I prefer to use a VM instance so that all the resources I use are in the cloud, where it is easier to set up VPC networks that span different services. Also, saving data from a VM instance into GCS buckets is usually faster than from an on-premises computer or server.
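To illustrate that nothing has to live on a VM: a Dataflow job is launched by a local process, and the data processing then happens on Dataflow workers. The sketch below builds the command you might run from a laptop. It assumes the following: the pipeline is a Beam module (hypothetically named my_pipeline) that passes its command-line arguments to PipelineOptions; you have already run gcloud auth application-default login; and the project, region and bucket are placeholders.

```python
import subprocess
import sys
from typing import List


def dataflow_launch_args(
    module: str, project: str, region: str, temp_location: str
) -> List[str]:
    """Return argv that submits a Beam pipeline module to the Dataflow service."""
    return [
        sys.executable, "-m", module,
        "--runner=DataflowRunner",           # execute on Dataflow, not locally
        f"--project={project}",
        f"--region={region}",
        f"--temp_location={temp_location}",  # GCS path for temporary files
    ]


def launch(module: str, project: str, region: str, temp_location: str) -> None:
    # This machine only launches (and may wait on) the job;
    # the processing itself runs on Dataflow-managed workers.
    subprocess.run(
        dataflow_launch_args(module, project, region, temp_location), check=True
    )
```

Usage might look like launch("my_pipeline", "my-project", "us-central1", "gs://my-bucket/tmp"). The same command works from any machine with the SDKs installed, including a Cloud Composer worker.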

Running code from an instance in Google Cloud Composer

I am new to Google Cloud Composer. I have some code on a Google Compute Engine instance, for example test.py.
Currently I use Jenkins as my scheduler, and I run the code like this:
echo "cd /home/user/src/digital_platform && /home/user/venvs/bdp/bin/python -m test.test.test" | ssh user@instance-dp
I want to run the same code from Google Cloud Composer.
How can I do that?
Basically, I need to SSH into an instance in Google Cloud and run the code automatically from Google Cloud Composer.
SSHOperator might work for you. This operator is an Airflow feature, not a Cloud Composer feature per se.
The other operator you might want to look at before making your final decision is BashOperator.
You need to create a DAG (workflow): Cloud Composer schedules only the DAGs that are in the DAGs folder of the environment's Cloud Storage bucket. Each Cloud Composer environment has a web server that runs the Airflow web interface, which you can use to manage DAGs.
BashOperator is useful for running command-line programs. I suggest you follow the Cloud Composer Quickstart, which shows how to create a Cloud Composer environment in the Google Cloud Console and run a simple Apache Airflow DAG.
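Here is a concrete sketch of the BashOperator route. Composer workers have the gcloud CLI, so the Jenkins one-liner from the question can be turned into a gcloud compute ssh call. This assumes the environment's service account is allowed to SSH into the instance and that the instance is in the placeholder zone us-central1-a; adapt both. The function only builds the command string and does not connect to anything.

```python
import shlex


def remote_python_command(
    instance: str, zone: str, workdir: str, python: str, module: str
) -> str:
    """Build a `gcloud compute ssh` command that runs `python -m module` on a VM.

    `instance` may be given as USER@INSTANCE to log in as a specific user.
    """
    # The command executed on the VM, equivalent to the Jenkins one-liner.
    remote = f"cd {shlex.quote(workdir)} && {shlex.quote(python)} -m {shlex.quote(module)}"
    # gcloud compute ssh runs the --command string on the instance over SSH.
    return shlex.join([
        "gcloud", "compute", "ssh", instance,
        f"--zone={zone}",
        f"--command={remote}",
    ])


if __name__ == "__main__":
    print(remote_python_command(
        "user@instance-dp", "us-central1-a",
        "/home/user/src/digital_platform",
        "/home/user/venvs/bdp/bin/python",
        "test.test.test",
    ))
```

Inside a DAG, this might be used as BashOperator(task_id="run_test", bash_command=remote_python_command(...)). With SSHOperator, you would instead put only the remote part (the cd ... && ... -m test.test.test string) in its command parameter and keep the host and key details in an Airflow SSH connection referenced by ssh_conn_id.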

Set up "Stackdriver Kubernetes Monitoring" for AWS

Google Cloud Platform announced "Stackdriver Kubernetes Monitoring" at KubeCon 2018. It looks awesome.
I am an AWS user running a few Kubernetes clusters, and I was immediately envious until I saw that it also supported AWS and "on prem".
Stackdriver Kubernetes Engine Monitoring
This is where I am getting a bit lost.
I cannot find any documentation to help me deploy the agents onto my Kubernetes clusters. The closest example I could find was here: Manual installation of Stackdriver support, but there the agents poll "internal" GCP metadata services:
E0512 05:14:12 7f47b6ff5700 environment.cc:100 Exception: Host not found (authoritative): 'http://metadata.google.internal./computeMetadata/v1/instance/attributes/cluster-name'
I'm not sure whether my Stackdriver dashboard has "Stackdriver Kubernetes Monitoring" turned on. I don't seem to have the same interface as the demo on YouTube here.
I'm not sure whether this is something that will get turned on once I configure the agents correctly, or something I'm missing.
I think I might be missing some "getting started" documentation that walks me through the setup.
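The log line explains the failure. The agent tries to read attributes such as cluster-name from the GCE metadata server at metadata.google.internal, and that name only resolves inside Google Cloud. The minimal stdlib sketch below shows that kind of lookup; the helper names are hypothetical and this is not the agent's code. On an AWS node the request fails at DNS resolution, before any HTTP is exchanged, which matches the "Host not found" error.

```python
import urllib.request
from typing import Optional

METADATA_ROOT = "http://metadata.google.internal/computeMetadata/v1/"


def metadata_request(path: str) -> urllib.request.Request:
    """Build a request for a metadata path such as instance/attributes/cluster-name."""
    req = urllib.request.Request(METADATA_ROOT + path)
    # The metadata server rejects requests that lack this header.
    req.add_header("Metadata-Flavor", "Google")
    return req


def read_metadata(path: str, timeout: float = 2.0) -> Optional[str]:
    """Return the metadata value, or None when no metadata server is reachable."""
    try:
        with urllib.request.urlopen(metadata_request(path), timeout=timeout) as resp:
            return resp.read().decode("utf-8")
    except OSError:
        # URLError (including DNS "host not found") and timeouts are OSError subclasses.
        return None


if __name__ == "__main__":
    value = read_metadata("instance/attributes/cluster-name")
    print(value if value is not None else "no GCE metadata server (not running on GCP?)")
```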
You can use a Stackdriver partner service, Blue Medora BindPlane, to monitor Kubernetes on AWS, or almost anything else in AWS or on premises for that matter. Here's an article from the Google docs about the partnership: About Blue Medora. You can sign up for BindPlane through the Google Cloud Platform Marketplace.
It looks like BindPlane is replacing the deprecated Stackdriver monitoring agents. See Google Cloud: Transition guide for deprecated third-party integrations.
As per this article, the Stackdriver Kubernetes Monitoring beta release currently supports only Kubernetes v1.10.2 clusters running on Google Cloud Platform's Kubernetes Engine. To track when this feature becomes available for AWS, I suggest creating a feature request in the Public Issue Tracker.
Stackdriver monitoring of Amazon EKS, Azure AKS, and general-purpose Kubernetes running on non-GCP-hosted VMs is available if you enable the BindPlane option for Stackdriver.
https://cloud.google.com/stackdriver/blue-medora

How to integrate on-premises logs with GCP Stackdriver

I am evaluating Stackdriver from GCP for logging across multiple microservices.
Some of these services are deployed on premises and some of them are on AWS/GCP.
Our services are either .NET or Node.js apps, and we are invested in winston for Node.js and NLog for .NET.
I was looking at integrating our on-premises Node.js application with Stackdriver Logging. Looking at the documentation at https://cloud.google.com/logging/docs/setup/nodejs, it seems that we need to install the agent on any machine other than Google Compute Engine instances. Is this correct?
If we need to install the agent, is there any way I can test the logging during development? The development environment is either Windows 10 or macOS.
There's a new option for ingesting logs (and metrics) into Stackdriver, since most of the agents for non-Google environments appear to be deprecated: https://cloud.google.com/stackdriver/docs/deprecations/third-party-apps
A Google post on logging on-prem resources with stackdriver and Blue Medora
https://cloud.google.com/solutions/logging-on-premises-resources-with-stackdriver-and-blue-medora
For logs, you still need to install an agent on each machine to collect them; it's a BindPlane agent, not a Google agent.
For Node.js, you can use the @google-cloud/logging-winston and @google-cloud/logging-bunyan modules from anywhere (on premises, AWS, GCP, etc.). You will need to provide the projectId and auth credentials manually if you are not running on GCP. Instructions on how to set these up are available on the linked pages.
When running on GCP, we detect the exact environment (App Engine, Compute Engine, etc.) automatically, and the logs should show up under those resources in the Logging UI. If you are going to use the modules from your development machines, we report the logs against the 'global' resource by default. You can customize this by passing a specific resource descriptor yourself.
Let us know if you run into any trouble.
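To make the 'global' default concrete, here is a sketch of the body of a Cloud Logging API entries.write request, built in Python with only the standard library. It is not the logging-winston implementation. It assumes you already have OAuth credentials for the project (not shown) and uses placeholder project and log names. Setting dryRun to true asks the API to validate the request without storing the entries, which can be handy when testing from a Windows or Mac development machine.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict, Optional

# REST endpoint the body would be POSTed to, with an OAuth bearer token.
WRITE_URL = "https://logging.googleapis.com/v2/entries:write"


def build_write_request(
    project_id: str,
    log_id: str,
    payload: Dict[str, Any],
    severity: str = "INFO",
    resource: Optional[Dict[str, Any]] = None,
    dry_run: bool = False,
) -> Dict[str, Any]:
    """Build an entries.write body for one structured log entry."""
    if resource is None:
        # Default to the "global" monitored resource, as the answer describes
        # for logs sent from machines outside GCP.
        resource = {"type": "global", "labels": {"project_id": project_id}}
    return {
        "logName": f"projects/{project_id}/logs/{log_id}",
        "resource": resource,
        "entries": [{
            "severity": severity,
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "jsonPayload": payload,
        }],
        "dryRun": dry_run,  # True: validate only, do not store the entries
    }


if __name__ == "__main__":
    body = build_write_request("my-project", "my-service", {"message": "hello"}, dry_run=True)
    print(json.dumps(body, indent=2))
```

Passing your own resource dict replaces the "global" default, which corresponds to the "specific resource descriptor" option mentioned above.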
I tried setting this up on my local k8s cluster by following this: https://kubernetes.io/docs/tasks/debug-application-cluster/logging-stackdriver/
But I couldn't get it to work; the fluentd-gcp-v2.0-qhqzt pod keeps crashing.
Also, the page mentions that there are several issues with Stackdriver logging if you DON'T use it on Google GKE. See the screenshot.
I think Google is trying to lock you into GKE.

Difference between Cloud Foundry & Pivotal Web Services

I read on Wikipedia that the Cloud Foundry open source software is available to anyone, whereas Pivotal Web Services is a commercial product from Pivotal.
I searched a lot on the internet but did not find any example of an implementation of the open source Cloud Foundry software. Everything is about the Pivotal product, which offers a two-month free trial.
So can anyone tell me what the Cloud Foundry open source software is?
And what exactly is the difference between Cloud Foundry OSS and Pivotal CF?
Cloud Foundry is open source software, but if you are looking to tinker with it for the first time, using the OSS is a bit involved. You will need to have a provisioned cloud environment, you will install it yourself using MicroBosh, and everything will be done through the command line.
Pivotal Cloud Foundry is a commercial implementation that makes it easier to get up and running as you are learning the project. It provides a hosted environment in Pivotal Web Services so that you don't have to install it yourself, a web interface that makes managing the environment easier, and a number of pre-provisioned services including relational databases and messaging queues. This is the best starting point if you are just learning the technology.
To add to the above answer, Pivotal offers a public cloud called Pivotal Web Services, where you can sign up and deploy your apps to a cloud hosted by Pivotal.
On the other hand, Pivotal also lets enterprises host a private cloud environment by installing the components of the cloud infrastructure on VMware vSphere, AWS, or OpenStack. Check out this link: http://docs.pivotal.io/pivotalcf/installing/pcf-docs.html