Location of /home/airflow - google-cloud-platform

I have specified 3 nodes when creating a cloud composer environment. I tried to connect to worker nodes via SSH but I am not able to find airflow directory in /home. So where exactly is it located?

Cloud Composer runs Airflow on GKE, so you won't find data directly on any of the host GCE instances. Instead, Airflow processes are run within Kubernetes-managed containers, which either mount or sync data to the /home/airflow directory. To find the directory you will need to look within a running container.
Since each environment stores its Airflow data in a GCS bucket, you can alternatively inspect files by using Cloud Console or gsutil. If you really want to view /home/airflow with a shell, you can use kubectl exec which allows you to run commands/open a shell on any pod/container in the Kubernetes cluster. For example:
# Obtain the name of the Composer environment's GKE cluster
$ gcloud composer environments describe $ENV_NAME
# Fetch Kubernetes credentials for that cluster
$ gcloud container cluster get-credentials $GKE_CLUSTER_NAME
Once you have Kubernetes credentials, you can list running pods and SSH into them:
# List running pods
$ kubectl get pods
# SSH into a pod
$ kubectl exec -it $POD_NAME bash
airflow-worker-a93j$ ls /home/airflow

Related

Google Cloud Compute - Cannot get access to account security instance

I'm trying to run a docker container via Compute (not Cloud Run as I need long term instance)
The container works fine on the local machine, it can access GCloud account resources.
I can see that it's running on a Container-Optimized OS on Compute
I tried running the following commands in order, but I get the same issue. CONSUMER_INVALID. The IAM account has access to all the required permissions, I tripled checked this.
// Fix gcloud issue
alias gcloud='(docker images google/cloud-sdk || docker pull google/cloud-sdk) > /dev/null;docker run -t -i --net=host -v $HOME/.config:/.config -v /var/run/docker.sock:/var/run/docker.sock -v /usr/bin/docker:/usr/bin/docker google/cloud-sdk gcloud'
// Setup GCloud
gcloud
// Enable access via to dockers gcloud
gcloud auth configure-docker
// Enable access via to dockers gcloud
gcloud auth configure-docker
Need to run
docker-credential-gcr configure-docker
export GOOGLE_CLOUD_PROJECT=project-352014
Not sure what to do now; seems Compute isn't communicating property with the internal account resources?

Can you access the Airflow CLI within Google Cloud Composer?

I'm aware that many of the common Airflow management commands are made available through the gcloud CLI. However, I'm troubleshooting some DAG scheduling and would like to use the schedule and next_execution commands directly on the cluster.
Is there an easy way to do this?
It's possible to access the full Airflow CLI by using kubectl exec to SSH into Composer pods. To do so, obtain the name of the GKE cluster associated with your environment, and get cluster credentials for it:
gcloud container clusters get-credentials $CLUSTER_NAME --zone=$ZONE
Then, use kubectl to check for the Composer namespace, and then find a pod and SSH to it:
kubectl get namespaces | grep composer
kubectl get pods --namespace=$NAMESPACE | grep airflow
kubectl exec -it --namespace=$NAMESPACE $POD_NAME -- bash
From within a pod, you can use airflow with any command supported by that version of Airflow. However, it should also be noted that this also provides full access to commands that can make your environment permanently unusable (such as resetdb), so they should be used with care.

Connect to Memorystore from Cloud Run

I want to run a service on Google Cloud Run that uses Cloud Memorystore as cache.
I created an Memorystore instance in the same region as Cloud Run and used the example code to connect: https://github.com/GoogleCloudPlatform/golang-samples/blob/master/memorystore/redis/main.go this didn't work.
Next I created a Serverless VPC access Connectore which didn't help. I use Cloud Run without a GKE Cluster so I can't change any configuration.
Is there a way to connect from Cloud Run to Memorystore?
To connect Cloud Run (fully managed) to Memorystore you need to use the mechanism called "Serverless VPC Access" or a "VPC Connector".
As of May 2020, Cloud Run (fully managed) has Beta support for the Serverless VPC Access. See Connecting to a VPC Network for more information.
Alternatives to using this Beta include:
Use Cloud Run for Anthos, where GKE provides the capability to connect to Memorystore if the cluster is configured for it.
Stay within fully managed Serverless but use a GA version of the Serverless VPC Access feature by using App Engine with Memorystore.
While waiting for serverless VPC connectors on Cloud Run - Google said yesterday that announcements would be made in the near term - you can connect to Memorystore from Cloud Run using an SSH tunnel via GCE.
The basic approach is the following.
First, create a forwarder instance on GCE
gcloud compute instances create vpc-forwarder --machine-type=f1-micro --zone=us-central1-a
Don't forget to open port 22 in your firewall policies (it's open by default).
Then install the gcloud CLI via your Dockerfile
Here is an example for a Rails app. The Dockerfile makes use of a script for the entrypoint.
# Use the official lightweight Ruby image.
# https://hub.docker.com/_/ruby
FROM ruby:2.5.5
# Install gcloud
RUN curl https://dl.google.com/dl/cloudsdk/release/google-cloud-sdk.tar.gz > /tmp/google-cloud-sdk.tar.gz
RUN mkdir -p /usr/local/gcloud \
&& tar -C /usr/local/gcloud -xvf /tmp/google-cloud-sdk.tar.gz \
&& /usr/local/gcloud/google-cloud-sdk/install.sh
ENV PATH $PATH:/usr/local/gcloud/google-cloud-sdk/bin
# Generate SSH key to be used by the SSH tunnel (see entrypoint.sh)
RUN mkdir -p /home/.ssh && ssh-keygen -b 2048 -t rsa -f /home/.ssh/google_compute_engine -q -N ""
# Install bundler
RUN gem update --system
RUN gem install bundler
# Install production dependencies.
WORKDIR /usr/src/app
COPY Gemfile Gemfile.lock ./
ENV BUNDLE_FROZEN=true
RUN bundle install
# Copy local code to the container image.
COPY . ./
# Run the web service on container startup.
CMD ["bash", "entrypoint.sh"]
Finally open an SSH tunnel to Redis in your entrypoint.sh script
# !/bin/bash
# Memorystore config
MEMORYSTORE_IP=10.0.0.5
MEMORYSTORE_REMOTE_PORT=6379
MEMORYSTORE_LOCAL_PORT=6379
# Forwarder config
FORWARDER_ID=vpc-forwarder
FORWARDER_ZONE=us-central1-a
# Start tunnel to Redis Memorystore in background
gcloud compute ssh \
--zone=${FORWARDER_ZONE} \
--ssh-flag="-N -L ${MEMORYSTORE_LOCAL_PORT}:${MEMORYSTORE_IP}:${MEMORYSTORE_REMOTE_PORT}" \
${FORWARDER_ID} &
# Run migrations and start Puma
bundle exec rake db:migrate && bundle exec puma -p 8080
With the solution above Memorystore will be available to your application on localhost:6379.
There are a few caveats though
This approach requires the service account configured on your Cloud Run service to have the roles/compute.instanceAdmin role, which is quite powerful.
The SSH keys are backed into the image to speedup container boot time. That's not ideal.
There is no failover if your forwarder crashes.
I've written a longer and more elaborated approach in a blog post that improves the overall security and adds failover capabilities. The solution uses plain SSH instead of the gcloud CLI.
If you need something in your VPC, you can also spin up Redis on Compute Engine
It's more costly (especially for a Cluster) than Redis Cloud - but an temp solution if you have to keep the data in your VPC.

Kubernetes run on AWS

I've been struggling with configuring Kubernetes for many hours and I don't know how to move it forward.
What I did :
I created few services using spring cloud
I created docker images for each service
I pushed those images to docker hub
I launched AWS by running
export KUBERNETES_PROVIDER=aws; wget -q -O - https://get.k8s.io | bash
Command kubectl cluster-info shows that it actually works.
I created Kubernetes pods for each service. Command kubectl get pods
shows that all pods have status running.
The problem is that when I log to my AWS account I don't see any running instance, although I can see kubernetes-staging created in my S3 bucket.
My goal is to actually access my service , not on localhost. How can I do it ?
You should be able to see instances of course - as #kichik mentioned check whether your AWS console is using the same region as the deployment scripts.
To use your services/applications the next step is to expose them to the public with Kubernetes services as described here and here

How to setup Kubernetes Master HA on AWS

What I am trying to do:
I have setup kubernete cluster using documentation available on Kubernetes website (http_kubernetes.io/v1.1/docs/getting-started-guides/aws.html). Using kube-up.sh, i was able to bring kubernete cluster up with 1 master and 3 minions (as highlighted in blue rectangle in the diagram below). From the documentation as far as i know we can add minions as and when required, So from my point of view k8s master instance is single point of failure when it comes to high availability.
Kubernetes Master HA on AWS
So I am trying to setup HA k8s master layer with the three master nodes as shown above in the diagram. For accomplishing this I am following kubernetes high availability cluster guide, http_kubernetes.io/v1.1/docs/admin/high-availability.html#establishing-a-redundant-reliable-data-storage-layer
What I have done:
Setup k8s cluster using kube-up.sh and provider aws (master1 and minion1, minion2, and minion3)
Setup two fresh master instance’s (master2 and master3)
I then started configuring etcd cluster on master1, master 2 and master 3 by following below mentioned link:
http_kubernetes.io/v1.1/docs/admin/high-availability.html#establishing-a-redundant-reliable-data-storage-layer
So in short i have copied etcd.yaml from the kubernetes website (http_kubernetes.io/v1.1/docs/admin/high-availability/etcd.yaml) and updated Node_IP, Node_Name and Discovery Token on all the three nodes as shown below.
NODE_NAME NODE_IP DISCOVERY_TOKEN
Master1
172.20.3.150 https_discovery.etcd.io/5d84f4e97f6e47b07bf81be243805bed
Master2
172.20.3.200 https_discovery.etcd.io/5d84f4e97f6e47b07bf81be243805bed
Master3
172.20.3.250 https_discovery.etcd.io/5d84f4e97f6e47b07bf81be243805bed
And on running etcdctl member list on all the three nodes, I am getting:
$ docker exec <container-id> etcdctl member list
ce2a822cea30bfca: name=default peerURLs=http_localhost:2380,http_localhost:7001 clientURLs=http_127.0.0.1:4001
As per documentation we need to keep etcd.yaml in /etc/kubernete/manifest, this directory already contains etcd.manifest and etcd-event.manifest files. For testing I modified etcd.manifest file with etcd parameters.
After making above changes I forcefully terminated docker container, container was existing after few seconds and I was getting below mentioned error on running kubectl get nodes:
error: couldn't read version from server: Get httplocalhost:8080/api: dial tcp 127.0.0.1:8080: connection refused
So please kindly suggest how can I setup k8s master highly available setup on AWS.
To configure an HA master, you should follow the High Availability Kubernetes Cluster document, in particular making sure you have replicated storage across failure domains and a load balancer in front of your replicated apiservers.
Setting up HA controllers for kubernetes is not trivial and I can't provide all the details here but I'll outline what was successful for me.
Use kube-aws to set up a single-controller cluster: https://coreos.com/kubernetes/docs/latest/kubernetes-on-aws.html. This will create CloudFormation stack templates and cloud-config templates that you can use as a starting point.
Go the AWS CloudFormation Management Console, click the "Template" tab and copy out the complete stack configuration. Alternatively, use $ kube-aws up --export to generate the cloudformation stack file.
User the userdata cloud-config templates generated by kube-aws and replace the variables with actual values. This guide will help you determine what those values should be: https://coreos.com/kubernetes/docs/latest/getting-started.html. In my case I ended up with four cloud-configs:
cloud-config-controller-0
cloud-config-controller-1
cloud-config-controller-2
cloud-config-worker
Validate your new cloud-configs here: https://coreos.com/validate/
Insert your cloud-configs into the CloudFormation stack config. First compress and encode your cloud config:
$ gzip -k cloud-config-controller-0
$ cat cloud-config-controller-0.gz | base64 > cloud-config-controller-0.enc
Now copy the content into your encoded cloud-config into the CloudFormation config. Look for the UserData key for the appropriate InstanceController. (I added additional InstanceController objects for the additional controllers.)
Update the stack at the AWS CloudFormation Management Console using your newly created CloudFormation config.
You will also need to generate TLS asssets: https://coreos.com/kubernetes/docs/latest/openssl.html. These assets will have to be compressed and encoded (same gzip and base64 as above), then inserted into your userdata cloud-configs.
When debugging on the server, journalctl is your friend:
$ journalctl -u oem-cloudinit # to debug problems with your cloud-config
$ journalctl -u etcd2
$ journalctl -u kubelet
Hope that helps.
There is also kops project
From the project README:
Operate HA Kubernetes the Kubernetes Way
also:
We like to think of it as kubectl for clusters
Download the latest release, e.g.:
cd ~/opt
wget https://github.com/kubernetes/kops/releases/download/v1.4.1/kops-linux-amd64
mv kops-linux-amd64 kops
chmod +x kops
ln -s ~/opt/kops ~/bin/kops
See kops usage, especially:
kops create cluster
kops update cluster
Assuming you already have s3://my-kops bucket and kops.example.com hosted zone.
Create configuration:
kops create cluster --state=s3://my-kops --cloud=aws \
--name=kops.example.com \
--dns-zone=kops.example.com \
--ssh-public-key=~/.ssh/my_rsa.pub \
--master-size=t2.medium \
--master-zones=eu-west-1a,eu-west-1b,eu-west-1c \
--network-cidr=10.0.0.0/22 \
--node-count=3 \
--node-size=t2.micro \
--zones=eu-west-1a,eu-west-1b,eu-west-1c
Edit configuration:
kops edit cluster --state=s3://my-kops
Export terraform scripts:
kops update cluster --state=s3://my-kops --name=kops.example.com --target=terraform
Apply changes directly:
kops update cluster --state=s3://my-kops --name=kops.example.com --yes
List cluster:
kops get cluster --state s3://my-kops
Delete cluster:
kops delete cluster --state s3://my-kops --name=kops.identityservice.co.uk --yes