I have the following pods:
NAME READY STATUS RESTARTS AGE
airflow-database-init-job-ggk95 0/1 Completed 0 3h
airflow-redis-0 1/1 Running 0 3h
airflow-scheduler-7594cd584-mlfrt 2/2 Running 9 3h
airflow-sqlproxy-74f64b8b97-csl8h 1/1 Running 0 3h
airflow-worker-5fcd4fffff-7w2sg 2/2 Running 0 3h
airflow-worker-5fcd4fffff-m44bs 2/2 Running 0 3h
airflow-worker-5fcd4fffff-mm55s 2/2 Running 0 3h
composer-agent-0034135a-3fed-49a6-b173-9d3f9d0569db-ktwwt 0/1 Completed 0 3h
composer-agent-0034135a-3fed-49a6-b173-9d3f9d0569db-nmjvw 0/1 Error 0 3h
composer-agent-d043348f-025a-4aa1-89b4-d4a5fae91653-8zdwk 0/1 Completed 0 3h
composer-fluentd-daemon-grwsp 1/1 Running 0 3h
composer-fluentd-daemon-rxhjc 1/1 Running 0 3h
composer-fluentd-daemon-xxrmr 1/1 Running 0 3h
I don't know which of them are webserver pods. airflow-worker is probably not the webserver, right? I want to poke it to check whether it is working properly, because it seems it is not.
As explained in the documentation about Cloud Composer's architecture, the Airflow webserver is running in an App Engine flexible environment hosted in a Google-managed tenant project to which users do not have access.
Unfortunately, the webserver logs are not forwarded to the Composer's main project (i.e. your project), although there is an open Feature Request for this in the Public Issue Tracker, so feel free to star it and comment on it to let the Composer engineering team know how important this feature is for your use case. Therefore, if you believe you have any other, similar issue regarding the webserver itself, I recommend either contacting support (if you are eligible to do so) or opening an issue in the corresponding Public Issue Tracker so that it can be investigated by the GCP Support Team.
In case you want to know more about the Airflow webserver, you can find some additional information on its documentation page too.
With regard to the logs of the Airflow webserver: these are visible in Stackdriver Logging.
Navigate to GCP Menu -> Logging -> Logs Viewer
If you are using the classic Stackdriver UI, select "Cloud Composer Environment" in the "resource" dropdown and then select "airflow-webserver" in the second dropdown, as shown in this picture.
If you are using the new Stackdriver menu, put the following query into the query box:
resource.type="cloud_composer_environment"
logName="projects/<your project name>/logs/airflow-webserver"
... and you will get the logs generated by airflow-webserver.
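If you prefer the command line, roughly the same query can be run with the gcloud CLI (a sketch; substitute <your project name> with your actual project ID and adjust the limit as needed):
gcloud logging read 'resource.type="cloud_composer_environment" AND logName="projects/<your project name>/logs/airflow-webserver"' --project <your project name> --limit 50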
I started observing the below validation error on the EMR console.
Upon checking the status of the instance controller service, I observed that
the output of sudo systemctl status instance-controller.service is not consistent; it varies between running and auto-restart.
The master node system logs show:
(console) 2023-02-03 21:55:23 About to start instance controller.
(console) 2023-02-03 21:55:23 Listing currently running instance controllers:
hadoop 8439 1 0 21:55 ? 00:00:00 /bin/bash -l /usr/bin/instance-controller
hadoop 8510 8439 0 21:55 ? 00:00:00 /etc/alternatives/jre/bin/java -Xmx1024m -XX:+ExitOnOutOfMemoryError -XX:MinHeapFreeRatio=10 -server -cp /usr/share/aws/emr/instance-controller/lib/*:/home/hadoop/conf -Dlog4j.defaultInitOverride aws157.instancecontroller.Main
hadoop 8541 8439 0 21:55 ? 00:00:00 grep -i instance
root 8542 8439 0 21:55 ? 00:00:00 sudo tee -a /emr/instance-state/console.log-2023-02-03-21-55 /dev/console
oozie 26477 1 26 21:53 ? 00:00:22 /etc/alternatives/jre/bin/java -Xmx1024m -Xmx1024m -Doozie.home.dir=/usr/lib/oozie -Doozie.config.dir=/etc/oozie/conf -Doozie.log.dir=/var/log/oozie -Doozie.data.dir=/var/lib/oozie -Doozie.instance.id=ip-10-111-24-159.pvt.lp192.cazena.com -Doozie.config.file=oozie-site.xml -Doozie.log4j.file=oozie-log4j.properties -Doozie.log4j.reload=10 -Djava.library.path= -cp /usr/lib/oozie/embedded-oozie-server/*:/usr/lib/oozie/embedded-oozie-server/dependency/*:/usr/lib/oozie/lib/*:/usr/lib/oozie/libtools/*:/usr/lib/oozie/libext/*:/usr/lib/oozie/embedded-oozie-server:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/conf/*:/usr/share/aws/emr/emrfs/auxlib/* org.apache.oozie.server.EmbeddedOozieServer
root 27455 1 42 21:54 ? 00:00:35 /etc/alternatives/jre/bin/java -Xmx1024m -XX:+ExitOnOutOfMemoryError -XX:MinHeapFreeRatio=10 -server -cp /usr/share/aws/emr/instance-controller/lib/*:/home/hadoop/conf -Dlog4j.defaultInitOverride aws157.logpusher.Main /etc/logpusher/logpusher.properties
(console) 2023-02-03 21:55:23 Displaying last 10 lines of instance controller logfile:
2023-02-03 21:55:17,719 INFO main: isV2FrameworkEnabled: false, extraInstanceData.numCandidates: 1
2023-02-03 21:55:17,735 WARN main: Invalid metrics information null fetched from checkpoint, will start continuing from current moment instead.
2023-02-03 21:55:17,735 INFO main: Initialized YARN checkpointing state with ckpFileAvl: true, ckpInfo: [ lastCkpTs(0), totalHdfsBytesReadCompletedApps(0), totalHdfsBytesWrittenCompletedApps(0), totalS3BytesReadCompletedApps(0), totalS3BytesWrittenCompletedApps(0)]
2023-02-03 21:55:17,745 ERROR main: Thread + 'main' failed with error
java.lang.RuntimeException: LocalStartupState is FAILED, so not allowing instance controller to start
at aws157.instancecontroller.common.InstanceConfigurator.hasAlreadyBeenConfigured(InstanceConfigurator.java:124)
at aws157.instancecontroller.common.InstanceConfigurator.<init>(InstanceConfigurator.java:100)
at aws157.instancecontroller.InstanceController.<init>(InstanceController.java:223)
at aws157.instancecontroller.Main.runV1Framework(Main.java:239)
at aws157.instancecontroller.Main.main(Main.java:222)
I tried restarting the service multiple times with
sudo systemctl start instance-controller.service, and rebooted the node hoping that the service would come back after the reboot, but it is not working. (By the way, this worked on a lower environment.)
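For reference, the checks and restart attempts above were along these lines, and the instance controller log can be tailed directly as well (the log path here is the usual EMR location and is an assumption on my part):
sudo systemctl status instance-controller.service
sudo systemctl start instance-controller.service
sudo tail -n 50 /emr/instance-controller/log/instance-controller.log
sudo reboot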
Jobs on the cluster are running fine without any issues, but I am not able to see application logs being pushed to S3 or shown on the console.
I need input on how to restart the instance controller service.
I am trying to enable canary deployments for AWS EKS, but my Kayenta pod is not starting. When I describe the pod I see this error. Can anyone help?
Warning Unhealthy 12m (x2 over 12m) kubelet Readiness probe failed: wget: can't connect to remote host (127.0.0.1): Connection refused
Warning Unhealthy 2m56s (x59 over 12m) kubelet Readiness probe failed: wget: server returned error: HTTP/1.1 503
This is the status of pod:
NAME READY STATUS RESTARTS AGE
spin-clouddriver-d796bdc59-tpznw 1/1 Running 0 3h40m
spin-deck-77cc75b57d-w7rfp 1/1 Running 0 3h40m
spin-echo-db954bb9-phfd5 1/1 Running 0 3h40m
spin-front50-7c5684cf9-t7vl8 1/1 Running 0 3h40m
spin-gate-78d6779854-7xqz4 1/1 Running 0 3h40m
spin-kayenta-6d7b5fdfc6-p5tcp 0/1 Running 0 21m
spin-kayenta-869c46bfcf-8t5fh 0/1 Running 0 28m
spin-orca-7ddd66758d-mpnkg 1/1 Running 0 3h40m
spin-redis-5975cfcdc8-rnm9g 1/1 Running 0 45h
spin-rosco-b7dbb577-z4szz 1/1 Running 0 3h40m
I will try to address your issue from the Kubernetes perspective.
The errors you were experiencing:
Warning Unhealthy 12m (x2 over 12m) kubelet Readiness probe failed: wget: can't connect to remote host (127.0.0.1): Connection refused
Warning Unhealthy 2m56s (x59 over 12m) kubelet Readiness probe failed: wget: server returned error: HTTP/1.1 503
indicate that there was a problem with your ReadinessProbe configuration. Removing the ReadinessProbe from the deployment "fixed" the error but can cause more issues down the road. To avoid that, I recommend adding it back with a proper configuration:
Probes have a number of fields that you can use to more precisely control the behavior of liveness and readiness checks:
initialDelaySeconds: Number of seconds after the container has started before liveness or readiness probes are initiated. Defaults to 0 seconds. Minimum value is 0.
periodSeconds: How often (in seconds) to perform the probe. Defaults to 10 seconds. Minimum value is 1.
timeoutSeconds: Number of seconds after which the probe times out. Defaults to 1 second. Minimum value is 1.
successThreshold: Minimum consecutive successes for the probe to be considered successful after having failed. Defaults to 1. Must be 1 for liveness. Minimum value is 1.
failureThreshold: When a probe fails, Kubernetes will try failureThreshold times before giving up. Giving up in case of a liveness probe means restarting the container. In case of a readiness probe the Pod will be marked Unready. Defaults to 3. Minimum value is 1.
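For illustration, a readiness probe using these fields might look like the snippet below (a sketch only; the path, port and timings are assumptions, not the actual Kayenta manifest values):
readinessProbe:
  httpGet:
    path: /health        # hypothetical health endpoint
    port: 8090           # hypothetical container port
  initialDelaySeconds: 60  # give the service time to start before the first probe
  periodSeconds: 10
  timeoutSeconds: 5
  successThreshold: 1
  failureThreshold: 5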
You'll need to adjust the probe's configuration based on your app's behavior (usually by trial and error). The two resources I would recommend to help you with that are:
Configure Liveness, Readiness and Startup Probes
Kubernetes best practices: Setting up health checks with readiness and liveness probes
I am attempting to deploy CockroachDB v2.1.6 to a new AWS EKS cluster. Everything deploys successfully; the StatefulSet, services, PVs and PVCs are created. The AWS EBS volumes are created successfully too.
The issue is the pods never get to a READY state.
pod/cockroachdb-0 0/1 Running 0 14m
pod/cockroachdb-1 0/1 Running 0 14m
pod/cockroachdb-2 0/1 Running 0 14m
If I 'describe' the pods I get the following:
Normal Pulled 46s kubelet, ip-10-5-109-70.eu-central-1.compute.internal Container image "cockroachdb/cockroach:v2.1.6" already present on machine
Normal Created 46s kubelet, ip-10-5-109-70.eu-central-1.compute.internal Created container cockroachdb
Normal Started 46s kubelet, ip-10-5-109-70.eu-central-1.compute.internal Started container cockroachdb
Warning Unhealthy 1s (x8 over 36s) kubelet, ip-10-5-109-70.eu-central-1.compute.internal Readiness probe failed: HTTP probe failed with statuscode: 503
If I examine the logs of a pod I see this:
I200409 11:45:18.073666 14 server/server.go:1403 [n?] no stores bootstrapped and --join flag specified, awaiting init command.
W200409 11:45:18.076826 87 vendor/google.golang.org/grpc/clientconn.go:1293 grpc: addrConn.createTransport failed to connect to {cockroachdb-0.cockroachdb:26257 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp: lookup cockroachdb-0.cockroachdb on 172.20.0.10:53: no such host". Reconnecting...
W200409 11:45:18.076942 21 gossip/client.go:123 [n?] failed to start gossip client to cockroachdb-0.cockroachdb:26257: initial connection heartbeat failed: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp: lookup cockroachdb-0.cockroachdb on 172.20.0.10:53: no such host"
I came across this comment from the CockroachDB forum (https://forum.cockroachlabs.com/t/http-probe-failed-with-statuscode-503/2043/6)
Both the cockroach_out.log and cockroach_output1.log files you sent me (corresponding to mycockroach-cockroachdb-0 and mycockroach-cockroachdb-2) print out no stores bootstrapped during startup and prefix all their log lines with n?, indicating that they haven’t been allocated a node ID. I’d say that they may have never been properly initialized as part of the cluster.
I have deleted everything, including the PVs, PVCs and AWS EBS volumes, with kubectl delete and reapplied, but the issue remains.
Any thoughts would be very much appreciated. Thank you
I was not aware that you had to initialize the CockroachDB cluster after creating it. I did the following to resolve my issue:
kubectl exec -it cockroachdb-0 -n <namespace> -- /bin/sh
/cockroach/cockroach init
See here for more details - https://www.cockroachlabs.com/docs/v19.2/cockroach-init.html
After this the pods started running correctly.
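In case it helps anyone else, you can confirm the cluster came up properly afterwards with something like (a sketch; --insecure assumes a non-TLS test cluster):
kubectl get pods
kubectl exec -it cockroachdb-0 -- /cockroach/cockroach node status --insecure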
When I start minikube and apply istio.yaml, the ingress can't start up:
eumji@eumji:~$ kubectl get pods -n istio-system
NAME READY STATUS RESTARTS AGE
istio-ca-76dddbd695-bdwm9 1/1 Running 5 2d
istio-ingress-85fb769c4d-qtbcx 0/1 CrashLoopBackOff 67 2d
istio-mixer-587fd4bbdb-ldvhb 3/3 Running 15 2d
istio-pilot-7db8db896c-9znqj 2/2 Running 10 2d
When I try to see the log I get the following output:
eumji@eumji:~$ kubectl logs -f istio-ingress-85fb769c4d-qtbcx -n istio-system
ERROR: logging before flag.Parse: I1214 05:04:26.193386 1 main.go:68] Version root@24c944bda24b-0.3.0-24ec6a3ac3a1d592d1873d2d8198278a849b8301
ERROR: logging before flag.Parse: I1214 05:04:26.193463 1 main.go:109] Proxy role: proxy.Node{Type:"ingress", IPAddress:"", ID:"istio-ingress-85fb769c4d-qtbcx.istio-system", Domain:"istio-system.svc.cluster.local"}
ERROR: logging before flag.Parse: I1214 05:04:26.193480 1 resolve.go:35] Attempting to lookup address: istio-mixer
ERROR: logging before flag.Parse: I1214 05:04:41.195879 1 resolve.go:42] Finished lookup of address: istio-mixer
Error: lookup failed for udp address: i/o timeout
Usage:
agent proxy [flags]
--serviceregistry string Select the platform for service registry, options are {Kubernetes, Consul, Eureka} (default "Kubernetes")
--statsdUdpAddress string IP Address and Port of a statsd UDP listener (e.g. 10.75.241.127:9125)
--zipkinAddress string Address of the Zipkin service (e.g. zipkin:9411)
Global Flags:
--log_backtrace_at traceLocation when logging hits line file:N, emit a stack trace (default :0)
-v, --v Level log level for V logs (default 0)
--vmodule moduleSpec comma-separated list of pattern=N settings for file-filtered logging
ERROR: logging before flag.Parse: E1214 05:04:41.198640 1 main.go:267] lookup failed for udp address: i/o timeout
What could be the reason?
There is not enough information in your post to figure out what may be wrong; in particular, it seems that somehow your ingress isn't able to resolve istio-mixer, which is unexpected.
Can you file a detailed issue at
https://github.com/istio/issues/issues/new
and we can take it from there?
Thanks
Are you using something like minikube? The quick-start docs give this hint: "Note: If your cluster is running in an environment that does not support an external load balancer (e.g., minikube), the EXTERNAL-IP of istio-ingress says <pending>. You must access the application using the service NodePort, or use port-forwarding instead."
https://istio.io/docs/setup/kubernetes/quick-start.html
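For example, on minikube the ingress can be reached through its NodePort along these lines (a sketch; the service name and port index follow the quick-start defaults and may differ in your install):
export INGRESS_HOST=$(minikube ip)
export INGRESS_PORT=$(kubectl -n istio-system get svc istio-ingress -o jsonpath='{.spec.ports[0].nodePort}')
curl http://$INGRESS_HOST:$INGRESS_PORT/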
I would like to do some real-world data testing on Micro Cloud Foundry, but the capacity of the PostgreSQL database is limited to 256 MB, which is not sufficient for my testing. Is there a way to increase the DB capacity temporarily, in offline mode, for testing?
If not, can somebody point me to the latest instructions for setting up a private Cloud Foundry server on Ubuntu Server 12.04?
You can SSH into the instance, change the configuration for the MySQL node and restart the service.
SSH to the MCF instance:
$ ssh vcap@api.<your-mcf-instance-name-here>.cloudfoundry.me
Note: if you can't remember the password for the vcap user, you can change it via the VM console menu by selecting option 3.
Edit the mysql_node configuration file:
$ vi /var/vcap/jobs/mysql_node/config/mysql_node.yml
The file should look something like (or exactly like) this:
---
local_db: sqlite3:/var/vcap/store/mysql_node.db
base_dir: /var/vcap/store/mysql
mbus: nats://nats:f5dc63f74be5e38f@127.0.0.1:4222
index: 0
logging:
  level: debug
  file: /var/vcap/sys/log/mysql_node/mysql_node.log
pid: /var/vcap/sys/run/mysql_node/mysql_node.pid
available_storage: 2048
node_id: mysql_node_1
max_db_size: 256
max_long_query: 3
mysql:
  host: localhost
  port: 3306
  socket: /var/vcap/sys/run/mysqld/mysqld.sock
  user: root
  pass: dc64fad710976ea5
migration_nfs: /var/vcap/services_migration
max_long_tx: 0
max_user_conns: 20
mysqldump_bin: /var/vcap/packages/mysql/bin/mysqldump
mysql_bin: /var/vcap/packages/mysql/bin/mysql
gzip_bin: /bin/gzip
ip_route: 127.0.0.1
z_interval: 30
max_nats_payload: 1048576
The two lines you are interested in are:
available_storage: 2048
and
max_db_size: 256
The first line is the maximum amount of disk storage made available to MySQL; the second is the maximum size per MySQL DB instance. Set these to your desired values; obviously, available_storage has to be larger than max_db_size and should also be a multiple of that value.
Save the file and then restart the VM (shut it down via the menu in the VM console or do it via SSH) and you should be good to go!
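If you'd rather script the change, the same edit can be done non-interactively along these lines (a sketch; 1024 is just an example value, and the sed pattern assumes the file layout shown above):
$ ssh vcap@api.<your-mcf-instance-name-here>.cloudfoundry.me
$ sudo sed -i 's/^max_db_size: .*/max_db_size: 1024/' /var/vcap/jobs/mysql_node/config/mysql_node.yml
$ sudo reboot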