Not able to exec into AKS pod

I'm trying to exec into one of the pods in my AKS cluster using the following command:
kubectl exec -n hermes --stdin --tty hermes-deployment-ddb88855b-dzgvt -- /bin/sh
I'm running it from the regular Windows CMD, but I get the following error:
error: v1.Pod: ObjectMeta: v1.ObjectMeta: readObjectFieldAsBytes: expect : after object field, parsing 3193 ...:{},"k:{\"... at {"kind":"Pod","apiVersion":"v1","metadata"
The error is quite long and, I believe, includes some metadata and environment variables. I can't share the metadata due to corporate restrictions. I'm logged in to Azure via az login, and my kubeconfig has the correct context.
Does anyone know what exactly this error means?

Related

How to make aws cli quiet

When I am using aws cli commands it adds debug data to its output.
Is there a way to make it quiet?
Here is my use case:
# get deployed version
COMMAND="git describe --tags"
aws ecs execute-command --cluster="${CLUSTER}" --task="${TASK}" --container="${SERVICE}" --command="${COMMAND}" --interactive > VERSION
The issue is that instead of the expected contents of the VERSION file (just the version number):
0.0.67
I get something like this:
The Session Manager plugin was installed successfully. Use the AWS CLI to start a session.
Starting session with SessionId: ecs-execute-command-123456789abcdefgh
0.0.67
Exiting session with sessionId: ecs-execute-command-123456789abcdefgh.
How can I get rid of the debug data?
I already tried adding a --quiet parameter (it does not exist)
and redirecting the error output; neither helped.
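
One possible workaround is to capture the output and drop the banner lines afterwards. The following is a minimal sketch in Python, not an AWS CLI feature: the helper name strip_session_banner is hypothetical, and it assumes the banner lines always start with the three prefixes seen in the output above.

```python
# Hypothetical helper: removes the Session Manager banner lines shown above
# from captured `aws ecs execute-command` output.
# Assumption: banner lines always start with one of these prefixes.
BANNER_PREFIXES = (
    "The Session Manager plugin was installed successfully",
    "Starting session with SessionId:",
    "Exiting session with sessionId:",
)


def strip_session_banner(raw_output: str) -> str:
    """Return raw_output without banner lines and without blank lines."""
    kept = []
    for line in raw_output.splitlines():
        stripped = line.strip()  # also drops stray carriage returns
        if not stripped or stripped.startswith(BANNER_PREFIXES):
            continue
        kept.append(stripped)
    return "\n".join(kept)


# Sample input copied from the output shown in the question.
SAMPLE = """The Session Manager plugin was installed successfully. Use the AWS CLI to start a session.
Starting session with SessionId: ecs-execute-command-123456789abcdefgh
0.0.67
Exiting session with sessionId: ecs-execute-command-123456789abcdefgh.
"""

version = strip_session_banner(SAMPLE)
```

With the sample above, version holds only the line produced by the command itself. Filtering by prefix is brittle if the plugin changes its messages, so treat this as a stopgap.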

AWS-RunBashScript errors/warnings with Python

I have many EC2 instances that hold Celery jobs for processing. To start working through the queue efficiently, I tested AWS-RunBashScript in AWS SSM with a Bash script that calls a Python script. For example, on a single instance this begins with sh start_celery.sh.
When I run the command in SSM, I get the following output (compare it with the other output below, after reading on):
/home/ec2-user/dh2o-py/venv/local/lib/python2.7/dist-packages/celery/utils/imports.py:167:
UserWarning: Cannot load celery.commands extension u'flower.command:FlowerCommand':
ImportError('No module named compat',)
namespace, class_name, exc))
/home/ec2-user/dh2o-py/tasks/task_harness.py:49: YAMLLoadWarning: calling yaml.load() without
Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
task_configs = yaml.load(conf)
Running a worker with superuser privileges when the worker accepts messages serialized with pickle is a very bad idea!
If you really want to continue then you have to set the C_FORCE_ROOT
environment variable (but please think about this before you do).
User information: uid=0 euid=0 gid=0 egid=0
failed to run commands: exit status 1
Note that only warnings are thrown. When I SSH to the same instance and run the same command (i.e. sh start_celery.sh), I get the same output, BUT the process runs.
I have verified that the process does NOT run when started via SSM, and I have no idea why. As a workaround, I tried running sh start_celery.sh from the user data bootstrap on each EC2 instance, but that failed too.
So why does SSM fail to run the process when the identical command succeeds over SSH on each instance? The details below relate to the machine and Python configuration:

Knative on GKE is not working with some images, shows RevisionMissing error

I am running Knative on a GKE cluster. The sample images provided on the Knative website work, but when I switch to other images, it stops working. Only 2 of the 3 containers work, the route's ready state remains 'unknown', and the Reason shows as 'RevisionMissing'.
I tried with multiple images, k8s.gcr.io/hpa-example is one of them.
Edit: The cluster has a two-node configuration of type n1-standard-4 (4 vCPUs, 15 GB memory). I created this cluster using the GCP console with the latest version of Kubernetes and with the Enable Istio checkbox checked. I used the following commands to install Knative:
kubectl apply --selector knative.dev/crd-install=true \
-f https://github.com/knative/serving/releases/download/v0.8.0/serving.yaml \
-f https://github.com/knative/eventing/releases/download/v0.8.0/release.yaml \
-f https://github.com/knative/serving/releases/download/v0.8.0/monitoring.yaml
kubectl apply \
-f https://github.com/knative/serving/releases/download/v0.8.0/serving.yaml \
-f https://github.com/knative/eventing/releases/download/v0.8.0/release.yaml \
-f https://github.com/knative/serving/releases/download/v0.8.0/monitoring.yaml
Thanks

OK, I found the problem. I tried deploying custom images. Everything worked until I changed the port (inside the image) to 80. That image not only failed as a Knative service, it also did not work on Cloud Run.
Bottom line: either read the port number from an environment variable, or hard-code it to any port other than 80.
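
As an illustration of the first option, here is a minimal sketch in Python. It assumes the variable is named PORT (to my knowledge the name that Knative Serving and Cloud Run inject) and that 8080 is an acceptable fallback for local runs; resolve_port, HelloHandler, and make_server are hypothetical names for this example only.

```python
import os
from http.server import BaseHTTPRequestHandler, HTTPServer

DEFAULT_PORT = 8080  # assumption: fallback used when PORT is not set


def resolve_port(environ=os.environ, default=DEFAULT_PORT) -> int:
    """Read the listening port from the PORT variable, else use the default."""
    value = environ.get("PORT", "").strip()
    return int(value) if value else default


class HelloHandler(BaseHTTPRequestHandler):
    """Tiny handler that answers every GET with a plain-text body."""

    def do_GET(self):
        body = b"hello\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(environ=os.environ) -> HTTPServer:
    """Build a server bound to all interfaces on the resolved port."""
    return HTTPServer(("0.0.0.0", resolve_port(environ)), HelloHandler)
```

The container entrypoint would then call make_server().serve_forever(), so the image never needs to hard-code a port.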

Thanks for the details.
When you installed Knative, you should have seen errors like these:
# Without CRD
unable to recognize "https://github.com/knative/serving/releases/download/v0.8.0/serving.yaml": no matches for kind "Gateway" in version "networking.istio.io/v1alpha3"
unable to recognize "https://github.com/knative/serving/releases/download/v0.8.0/serving.yaml": no matches for kind "Gateway" in version "networking.istio.io/v1alpha3"
unable to recognize "https://github.com/knative/serving/releases/download/v0.8.0/serving.yaml": no matches for kind "Image" in version "caching.internal.knative.dev/v1alpha1"
unable to recognize "https://github.com/knative/eventing/releases/download/v0.8.0/release.yaml": no matches for kind "ClusterChannelProvisioner" in version "eventing.knative.dev/v1alpha1"
# Without CRD
Error from server (NotFound): error when creating "https://github.com/knative/serving/releases/download/v0.8.0/monitoring.yaml": namespaces "istio-system" not found
Error from server (NotFound): error when creating "https://github.com/knative/serving/releases/download/v0.8.0/monitoring.yaml": namespaces "istio-system" not found
Error from server (NotFound): error when creating "https://github.com/knative/serving/releases/download/v0.8.0/monitoring.yaml": namespaces "istio-system" not found
Error from server (NotFound): error when creating "https://github.com/knative/serving/releases/download/v0.8.0/monitoring.yaml": namespaces "istio-system" not found
You don't have Istio installed. Install it, then rerun the Knative installation (with and without CRD) to resolve the errors above, and enjoy!

GCP: kubectl exec/logs fails to container on using UBUNTU as OS

I created a 2-node cluster with UBUNTU as the OS.
After deploying a container, kubectl exec and kubectl logs fail with the following error:
Error from server: error dialing backend: No SSH tunnels currently open. Were the targets able to accept an ssh-key for user <username>
Please tell me how to make it work.
The nodes are part of the default pool only.
Steps to reproduce:
gcloud container clusters create "gke-test-cluster" --image-type=UBUNTU --machine-type=n1-standard-2 --zone us-east1-c --num-nodes 2 --cluster-version=1.8
kubectl create -f https://k8s.io/docs/tasks/debug-application-cluster/shell-demo.yaml
kubectl get pod shell-demo
kubectl exec -it shell-demo -- /bin/bash
Error from server: error dialing backend: No SSH tunnels currently open. Were the targets able to accept an ssh-key for user "gke-0c"?
kubectl logs shell-demo
Error from server: Get https://10.142.0.5:10250/containerLogs/default/shell-demo/nginx: No SSH tunnels currently open. Were the targets able to accept an ssh-key for user "gke-0c"?
I am using my laptop for all CLI commands.
This issue has already been raised at:
https://issuetracker.google.com/issues/77986235
https://serverfault.com/questions/907468/gcp-kubectl-exec-logs-fails-to-container-on-using-ubuntu-as-os/907882?noredirect=1#comment1177112_907882

I tried to reproduce your issue with your exact commands, and everything worked just fine. This has to be caused by something else (like the firewall, as suggested in the issue tracker).
To confirm, check that you have these three firewall rules:
gke-gke-test-cluster-07424324-all ...
gke-gke-test-cluster-07424324-ssh ...
gke-gke-test-cluster-07424324-vms ...
As for Cloud Shell versus your laptop, there is not much difference as long as you are correctly authenticated with the Cloud SDK. So saying "This issue is also reproducible from gcp cloud-shell" doesn't really make sense.
If you do have the firewall rules and haven't done much in the project yet, I would recommend creating a new project and starting over there.

It was an issue with the size of the project metadata. We cleaned it up and it worked.

chef-client failing as node_name not present in client.rb

I followed a tutorial here to bootstrap and register a node with the Chef server. The instance is in an autoscaling group, which is why I opted for this bootstrapping method.
The scenario: client.rb, validation.pem, and trusted_certs are copied from S3 to the newly launched instance via userdata.
client.rb
log_location STDOUT
chef_server_url "https://chef.myserver.org/organizations/org"
validation_client_name "org-validator"
# Using default node name (fqdn)
trusted_certs_dir "/etc/chef/trusted_certs"
After the required files are downloaded, the following command is executed to run chef-client with $INSTANCE_ID as the node_name.
chef-client -N $INSTANCE_ID -j /etc/chef/first-boot.json
The initial bootstrapping is successful and the node is registered with the Chef server using the instance ID as its node name, but subsequent chef-client runs fail with the error:
ERROR: 401 "Unauthorized"
This is because node_name is not present in client.rb.
How can I add the node_name entry to client.rb during the very first chef-client run?

This is generally handled in the userdata script or config, a la echo "node_name '$HOSTNAME'" >>/etc/chef/client.rb. The specifics vary with your naming scheme; sometimes you'll make some string edits to $HOSTNAME or use a different name entirely. This isn't strictly required, but without a name in the config file, Chef uses whatever the current FQDN of the system is, and it sounds like something in the initial Chef run changes the FQDN. Another option is simply to avoid changing the FQDN.
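
If the userdata step is written in Python rather than shell, the same idea might look like the sketch below. It is only an illustration: ensure_node_name is a hypothetical helper, and it assumes client.rb is writable and that the name you pass (for example the instance ID) is the one used for the first chef-client run.

```python
from pathlib import Path


def ensure_node_name(client_rb: Path, node_name: str) -> bool:
    """Append a node_name entry to client.rb unless one already exists.

    Returns True if a line was appended, False if an entry was already there.
    """
    text = client_rb.read_text() if client_rb.exists() else ""
    for line in text.splitlines():
        parts = line.split()
        # Commented-out lines start with "#", so they do not match here.
        if parts and parts[0] == "node_name":
            return False
    # Make sure the new entry starts on its own line.
    prefix = "" if text == "" or text.endswith("\n") else "\n"
    with client_rb.open("a") as handle:
        handle.write(f'{prefix}node_name "{node_name}"\n')
    return True
```

Calling ensure_node_name(Path("/etc/chef/client.rb"), instance_id) before the first chef-client run keeps the step idempotent, so rerunning the userdata script does not add duplicate entries.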
