I am currently trying to install the Ops Agent on an Ubuntu 18.04 gcloud VM instance that I am creating with a bash script, based on the gcloud guide accessible here. The script goes as follows:
echo "Installing components for agent policies"
gcloud components install beta
echo "Enabling API and setting proper permissions for monitoring"
sh set-permissions.sh --project=XXX
gcloud beta compute instances ops-agents policies create ops-agents-policy-safe-rollout \
--agent-rules="type=logging,version=current-major,package-state=installed,enable-autoupgrade=true;type=metrics,version=current-major,package-state=installed,enable-autoupgrade=true" \
--os-types=short-name=ubuntu,version=18.04 \
--project=XXX \
--instances=zones/us-central1-a/instances/instance-XXX
...
gcloud compute instances create instance-XXX --boot-disk-size=100GB \
--boot-disk-type=pd-ssd --metadata=enable-oslogin=TRUE \
--image-family=ubuntu-minimal-1804-lts --image-project=ubuntu-os-cloud \
--no-service-account --no-scopes --project=XXX --zone=us-central1-a \
--network-interface "" --network-interface subnet=.../regions/us-central1/subnetworks/XXX,no-address
I am not getting any errors when executing this script, but when I go to GCP and look for metrics for my instance, the charts for Memory Utilization and Disk Space Utilization say that the Ops Agent is required and that I should install it. Following the guide, and after verifying that the OS Config agent is installed, I follow the steps in "The OS Config agent is installed but does not install the Ops agents". When I do so I get two errors, neither of which is addressed in the guide:
Dec 14 15:34:34 bastion OSConfigAgent[600]: 2021-12-14T15:34:34.1627Z OSConfigAgent Error policies.go:49: Error running LookupEffectiveGuestPolicies: error getting token from metadata: metadata: GCE metadata "instance/service-accounts/default/identity?audience=osconfig.googleapis.com&format=full" not defined
Dec 14 15:34:36 bastion OSConfigAgent[600]: 2021-12-14T15:34:36.9551Z OSConfigAgent Error inventory.go:76: Error reporting inventory checksum: error getting token from metadata: metadata: GCE metadata "instance/service-accounts/default/identity?audience=osconfig.googleapis.com&format=full" not defined
How can I fix these errors to effectively install the Ops Agent? Thank you!
The log you've provided tells little, and this could have many causes.
Make sure that, e.g., all entries in /etc/apt/sources.list.d/ are valid repositories.
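For example, a quick check on the VM itself (just a sketch):
ls /etc/apt/sources.list.d/
sudo apt-get update   # apt will report any repository that cannot be reached or is malformed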
Also make sure that the metadata is a) set up correctly and b) can be accessed:
enable-guest-attributes TRUE
enable-osconfig TRUE
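For example, a minimal sketch of setting those keys and verifying they can be read (INSTANCE_NAME and ZONE are placeholders):
gcloud compute instances add-metadata INSTANCE_NAME --zone=ZONE \
  --metadata=enable-guest-attributes=TRUE,enable-osconfig=TRUE
# from inside the VM, confirm the metadata server is reachable:
curl -H "Metadata-Flavor: Google" "http://metadata.google.internal/computeMetadata/v1/instance/attributes/"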
This may well have to do with the --agent-rules argument you're passing.
Have you thought about a startup script that would simply install the agent?
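A minimal sketch of that approach, using the install script from Google's Ops Agent documentation (file names here are just examples):
# install-ops-agent.sh
curl -sSO https://dl.google.com/cloudagents/add-google-cloud-ops-agent-repo.sh
sudo bash add-google-cloud-ops-agent-repo.sh --also-install
Then pass it at instance creation time, combined with the other flags from your script:
gcloud compute instances create instance-XXX --metadata-from-file=startup-script=install-ops-agent.sh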
Also see: Managing Agent Policies - Troubleshooting.
I want to execute a script that is on my Compute Engine instance using Cloud Build, but somehow Cloud Build is not able to SSH into my VM. On my VM, OS Login is enabled and it only has an internal IP.
Here is my cloudbuild.yaml file:
steps:
- name: 'gcr.io/cloud-builders/gcloud'
  id: Update staging server
  entrypoint: /bin/sh
  args:
  - '-c'
  - |
    set -x &&
    gcloud compute ssh vm_name --zone=us-central1-c --command='/bin/sh /pullscripts/pull.sh'
I am attaching my error pics
cloudbuild error page 1
cloudbuild error page 2
Also, my question is: is it possible to connect to a VM using the Cloud SDK if OS Login is enabled?
You'll probably have to add the roles/iap.tunnelResourceAccessor role to the Cloud Build service account (a sketch of granting it follows the error description below). Please read this Google documentation, which shows you what to do with a certain error code.
Error code 4033
Either you don't have permission to access the instance, the instance doesn't exist, or the instance is stopped.
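A sketch of granting that role to the Cloud Build service account (PROJECT_ID and PROJECT_NUMBER are placeholders for your own project):
gcloud projects add-iam-policy-binding PROJECT_ID \
  --member="serviceAccount:PROJECT_NUMBER@cloudbuild.gserviceaccount.com" \
  --role="roles/iap.tunnelResourceAccessor"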
In fact, you can use Cloud Build to connect to any VM; you just need a Docker configuration and to upload the files (private key, scripts, etc.). I have this repo that solves this problem: https://github.com/jmbl1685/gcloudbuild-vm-ssh-connect
I hope the above helps you.
Try adding --internal-ip, which looks as follows:
gcloud compute ssh vm_name --zone=us-central1-c --internal-ip
We have installed Istio manually on a GKE cluster. We want to install/add the Istio Stackdriver adapter so that Istio metrics are available on the Stackdriver monitoring dashboard of GCP. I am not able to get the metrics despite adding the CRD as mentioned in
https://github.com/GoogleCloudPlatform/istio-samples/blob/master/common/install_istio.sh
git clone https://github.com/istio/installer && cd installer
helm template istio-telemetry/mixer-telemetry --execute=templates/stackdriver.yaml -f global.yaml --set mixer.adapters.stackdriver.enabled=true --namespace istio-system | kubectl apply -f -
I feel we are missing the authentication part. Can anyone help in resolving this?
I was unable to replicate your setup, and I noticed that the Istio version downloaded by the script was 1.4.2, which is not supported by GKE at this moment.
Nonetheless, I'd recommend you check this document for troubleshooting and consult this guide to get Istio installed on GKE.
You should also be aware of a couple of limitations when using Istio on GKE.
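If you want to check whether the Stackdriver adapter resources from the question were actually applied, and whether mixer is reporting authentication errors, something along these lines may help (the label and container names are assumptions based on a default istio-telemetry install):
# look for the stackdriver handler/rule/instance resources
kubectl get handlers,rules,instances -n istio-system | grep -i stackdriver
# inspect the mixer telemetry logs for Stackdriver or auth errors
kubectl logs -n istio-system -l app=telemetry -c mixer | grep -i stackdriver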
I have a service created on Google Cloud Run that I am able to deploy manually through the Google Cloud Console UI using an image on Container Registry, but deployment from the CLI is failing. Here is the command I am using and the error I get. I am not able to understand what I am missing:
$ gcloud beta run deploy service-name --platform managed --region region-name --image image-url
Deploying container to Cloud Run service [service-name] in project [project-name] region [region-name]
X Deploying...
. Creating Revision...
. Routing traffic...
Deployment failed
ERROR: (gcloud.beta.run.deploy) INVALID_ARGUMENT: The request has errors
- '#type': type.googleapis.com/google.rpc.BadRequest
fieldViolations:
- description: spec.revisionTemplate.spec.container.ports should be empty
field: spec.revisionTemplate.spec.container.ports
Update 1:
I have updated the SDK using gcloud components update, but I still have the same issue
Here's my SDK Version
$gcloud version
Google Cloud SDK 270.0.0
beta 2019.05.17
bq 2.0.49
core 2019.11.04
gsutil 4.46
I am using a multi-stage Docker build. Here's my Dockerfile:
FROM custom-dev-image
COPY . /project_dir
WORKDIR /project_dir
RUN CGO_ENABLED=0 GOOS=linux GOARCH=amd64 \
/usr/local/bin/go build -a \
-ldflags '-w -extldflags "-static"' \
-o /root/go/bin/executable ./cmds/project/main.go
FROM alpine:3.10
ENV GIN_MODE=release APP_NAME=project_name
COPY --from=0 /root/go/bin/executable /usr/local/bin/
CMD executable
I had this same problem, and I assume it was because I had an older Cloud Run deployment that had been created before I ran gcloud components update.
I was able to fix it by deleting the whole Cloud Run service (through the GUI) and deploying it from scratch again (via terminal). I noticed that the ports: definition disappeared from the YAML once I did this.
After this I could do deployments normally.
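For reference, a rough sketch of the same delete-and-redeploy flow done entirely from the terminal, reusing the placeholder names from the question:
gcloud run services delete service-name --platform managed --region region-name
gcloud beta run deploy service-name --platform managed --region region-name --image image-url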
This was a bug in Cloud Run. It has been fixed and deploying with CLI is working for me now. Here's the link to the issue I had raised with Google Cloud which has a response from them https://issuetracker.google.com/issues/144069696.
I want to test Kubernetes for GitLab CI, so I want to create my first k8s cluster on AWS.
So I follow the docs:
sudo snap install conjure-up --classic
# re-login may be required at that point if you just installed snap utility
conjure-up kubernetes
In the install process, I choose:
Canonical Distribution of Kubernetes
Helm
AWS
my credentials
us-east-2
Juju-as-a-Service (JaaS) Free Controller
Then I must log into JaaS. I log in with my Ubuntu One account, but it always fails:
Login failed, please try again: ERROR cannot log into "jimm.jujucharms.com": cannot get user details for "https://login.ubuntu.com/+id/W8KzXrQ":
not found
What am I forgetting?
What I am trying to do:
I have set up a Kubernetes cluster using the documentation available on the Kubernetes website (http://kubernetes.io/v1.1/docs/getting-started-guides/aws.html). Using kube-up.sh, I was able to bring the cluster up with 1 master and 3 minions (as highlighted in the blue rectangle in the diagram below). From the documentation, as far as I know, we can add minions as and when required, so from my point of view the k8s master instance is a single point of failure when it comes to high availability.
Kubernetes Master HA on AWS
So I am trying to set up an HA k8s master layer with the three master nodes as shown above in the diagram. To accomplish this I am following the Kubernetes high-availability cluster guide, http://kubernetes.io/v1.1/docs/admin/high-availability.html#establishing-a-redundant-reliable-data-storage-layer
What I have done:
Set up a k8s cluster using kube-up.sh and provider aws (master1, plus minion1, minion2, and minion3)
Set up two fresh master instances (master2 and master3)
I then started configuring the etcd cluster on master1, master2, and master3 by following the link below:
http://kubernetes.io/v1.1/docs/admin/high-availability.html#establishing-a-redundant-reliable-data-storage-layer
So, in short, I copied etcd.yaml from the Kubernetes website (http://kubernetes.io/v1.1/docs/admin/high-availability/etcd.yaml) and updated NODE_NAME, NODE_IP, and the discovery token on all three nodes as shown below.
NODE_NAME   NODE_IP        DISCOVERY_TOKEN
Master1     172.20.3.150   https://discovery.etcd.io/5d84f4e97f6e47b07bf81be243805bed
Master2     172.20.3.200   https://discovery.etcd.io/5d84f4e97f6e47b07bf81be243805bed
Master3     172.20.3.250   https://discovery.etcd.io/5d84f4e97f6e47b07bf81be243805bed
And on running etcdctl member list on all three nodes, I am getting:
$ docker exec <container-id> etcdctl member list
ce2a822cea30bfca: name=default peerURLs=http://localhost:2380,http://localhost:7001 clientURLs=http://127.0.0.1:4001
As per the documentation, we need to keep etcd.yaml in /etc/kubernetes/manifests; this directory already contains etcd.manifest and etcd-event.manifest files. For testing I modified the etcd.manifest file with the etcd parameters.
After making the above changes I forcefully terminated the docker container; the container was exiting after a few seconds, and I was getting the error below when running kubectl get nodes:
error: couldn't read version from server: Get http://localhost:8080/api: dial tcp 127.0.0.1:8080: connection refused
So please suggest how I can set up a highly available k8s master on AWS.
To configure an HA master, you should follow the High Availability Kubernetes Cluster document, in particular making sure you have replicated storage across failure domains and a load balancer in front of your replicated apiservers.
Setting up HA controllers for kubernetes is not trivial and I can't provide all the details here but I'll outline what was successful for me.
Use kube-aws to set up a single-controller cluster: https://coreos.com/kubernetes/docs/latest/kubernetes-on-aws.html. This will create CloudFormation stack templates and cloud-config templates that you can use as a starting point.
Go to the AWS CloudFormation Management Console, click the "Template" tab, and copy out the complete stack configuration. Alternatively, use $ kube-aws up --export to generate the CloudFormation stack file.
Use the userdata cloud-config templates generated by kube-aws and replace the variables with actual values. This guide will help you determine what those values should be: https://coreos.com/kubernetes/docs/latest/getting-started.html. In my case I ended up with four cloud-configs:
cloud-config-controller-0
cloud-config-controller-1
cloud-config-controller-2
cloud-config-worker
Validate your new cloud-configs here: https://coreos.com/validate/
Insert your cloud-configs into the CloudFormation stack config. First compress and encode your cloud config:
$ gzip -k cloud-config-controller-0
$ cat cloud-config-controller-0.gz | base64 > cloud-config-controller-0.enc
Now copy the content of your encoded cloud-config into the CloudFormation config. Look for the UserData key for the appropriate InstanceController. (I added additional InstanceController objects for the additional controllers.)
Update the stack at the AWS CloudFormation Management Console using your newly created CloudFormation config.
You will also need to generate TLS assets: https://coreos.com/kubernetes/docs/latest/openssl.html. These assets will have to be compressed and encoded (same gzip and base64 as above), then inserted into your userdata cloud-configs, as sketched below.
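For example, for one of the generated assets (apiserver.pem is just one of the files that guide produces; repeat for the others):
$ gzip -k apiserver.pem
$ cat apiserver.pem.gz | base64 > apiserver.pem.enc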
When debugging on the server, journalctl is your friend:
$ journalctl -u oem-cloudinit # to debug problems with your cloud-config
$ journalctl -u etcd2
$ journalctl -u kubelet
Hope that helps.
There is also the kops project.
From the project README:
Operate HA Kubernetes the Kubernetes Way
also:
We like to think of it as kubectl for clusters
Download the latest release, e.g.:
cd ~/opt
wget https://github.com/kubernetes/kops/releases/download/v1.4.1/kops-linux-amd64
mv kops-linux-amd64 kops
chmod +x kops
ln -s ~/opt/kops ~/bin/kops
See kops usage, especially:
kops create cluster
kops update cluster
Assuming you already have an s3://my-kops bucket and a kops.example.com hosted zone.
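If you don't have those yet, they can be created roughly like this (same placeholder names as below; the caller-reference only needs to be unique):
aws s3 mb s3://my-kops
aws route53 create-hosted-zone --name kops.example.com --caller-reference $(date +%s)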
Create configuration:
kops create cluster --state=s3://my-kops --cloud=aws \
--name=kops.example.com \
--dns-zone=kops.example.com \
--ssh-public-key=~/.ssh/my_rsa.pub \
--master-size=t2.medium \
--master-zones=eu-west-1a,eu-west-1b,eu-west-1c \
--network-cidr=10.0.0.0/22 \
--node-count=3 \
--node-size=t2.micro \
--zones=eu-west-1a,eu-west-1b,eu-west-1c
Edit configuration:
kops edit cluster --state=s3://my-kops
Export terraform scripts:
kops update cluster --state=s3://my-kops --name=kops.example.com --target=terraform
Apply changes directly:
kops update cluster --state=s3://my-kops --name=kops.example.com --yes
List cluster:
kops get cluster --state s3://my-kops
Delete cluster:
kops delete cluster --state s3://my-kops --name=kops.example.com --yes