Can't interact with services after installing istio 1.0.3 - istio

I have used istio-1.1.0-snapshot.2 and everything worked well. Then i have upgraded istio to istio-1.0.3. and after that i could not interact with services in mesh.
In logs of istio-ingressgateway i see such problems:
[2018-10-28 09:18:41.317][20][info][main] external/envoy/source/server/drain_manager_impl.cc:63] shutting down parent after drain
[2018-10-28 09:19:35.188][34][info][main] external/envoy/source/server/drain_manager_impl.cc:63] shutting down parent after drain
[2018-10-28 09:19:35.189][20][info][main] external/envoy/source/server/hot_restart_impl.cc:444] shutting down due to child request
[2018-10-28 09:19:35.189][20][warning][main] external/envoy/source/server/server.cc:373] caught SIGTERM
[2018-10-28 09:19:35.189][20][info][main] external/envoy/source/server/server.cc:436] main dispatch loop exited
[2018-10-28 09:19:35.197][20][info][main] external/envoy/source/server/server.cc:472] exiting
[2018-11-02 09:22:33.045][34][warning][upstream] external/envoy/source/common/config/grpc_mux_impl.cc:240] gRPC config stream closed: 13,
[2018-11-02 09:22:43.322][34][warning][upstream] external/envoy/source/common/config/grpc_mux_impl.cc:240] gRPC config stream closed: 14, upstream connect error or disconnect/reset before headers
[2018-11-02 09:22:53.503][34][warning][upstream] external/envoy/source/common/config/grpc_mux_impl.cc:240] gRPC config stream closed: 14, upstream connect error or disconnect/reset before headers
[2018-11-02 09:23:05.420][34][warning][upstream] external/envoy/source/common/config/grpc_mux_impl.cc:240] gRPC config stream closed: 14, upstream connect error or disconnect/reset before headers
[2018-11-02 09:23:15.810][34][warning][upstream] external/envoy/source/common/config/grpc_mux_impl.cc:240] gRPC config stream closed: 14, upstream connect error or disconnect/reset before headers
Afeter executing command istioctl proxy-status, i have seen that versions of my side-car proxies if 1.0.2
Any suggestiongs?

After upgrading Istio control plane, you need to upgrade sidecar containers. In simple words, you need to re-inject them.
If you use manual injection, you can upgrade them by the command:
$ kubectl apply -f <(istioctl kube-inject -f $ORIGINAL_DEPLOYMENT_YAML)
If you use automatic sidecar injection, you can upgrade the sidecar by doing a rolling update for all the pods, so that the new version of the sidecar will be automatically re-injected.
Here is the link on documentation.

1.0.3 Appears to be rather broke, it doesn't route requests between containers. Even a simple wordpress/mysql site does not work in 1.0.3

Related

Cloud Foundry cli i/o timeout

I was able to successfully deploy BOSH and CF on GCP. I was able to install the cf cli on my worker machine and was able to cf login to the api endpoint without any issues. Now I am attempting to deploy a python and a node.js hello-world style application (cf push) but I am running into the following error:
Python:
**ERROR** Could not install python: Get https://buildpacks.cloudfoundry.org/dependencies/python/python-3.5.4-linux-x64-5c7aa3b0.tgz: dial tcp: lookup buildpacks.cloudfoundry.org on 169.254.0.2:53: read udp 10.255.61.196:36513->169.254.0.2:53: i/o timeout
Failed to compile droplet: Failed to run all supply scripts: exit status 14
NodeJS
-----> Nodejs Buildpack version 1.6.28
-----> Installing binaries
engines.node (package.json): unspecified
engines.npm (package.json): unspecified (use default)
**WARNING** Node version not specified in package.json. See: http://docs.cloudfoundry.org/buildpacks/node/node-tips.html
-----> Installing node 6.14.3
Download [https://buildpacks.cloudfoundry.org/dependencies/node/node-6.14.3-linux-x64-ae2a82a5.tgz]
**ERROR** Unable to install node: Get https://buildpacks.cloudfoundry.org/dependencies/node/node-6.14.3-linux-x64-ae2a82a5.tgz: dial tcp: lookup buildpacks.cloudfoundry.org on 169.254.0.2:53: read udp 10.255.61.206:34802->169.254.0.2:53: i/o timeout
Failed to compile droplet: Failed to run all supply scripts: exit status 14
I am able to download and ping the build pack urls manually on the worker machine, jumpbox, and the bosh vms so I believe DNS is working properly on each of those machine types.
As part of the default deployment, I believe a socks5 tunnel is created to allow communication from my worker machine to the jumpbox so this is where I believe the issue lies. https://docs.cloudfoundry.org/cf-cli/http-proxy.html
When running bbl print-env, export BOSH_ALL_PROXY=ssh+socks5://jumpbox#35.192.140.0:22?private-key=/tmp/bosh-jumpbox725514160/bosh_jumpbox_private.key , however when I export https_proxy=socks5://jumpbox#35.192.140.0:22?private-key=/tmp/bosh-jumpbox389236516/bosh_jumpbox_private.key and do a cf push I receive the following error:
Request error: Get https://api.cloudfoundry.costub.com/v2/info: proxy: SOCKS5 proxy at 35.192.140.0:22 has unexpected version 83
TIP: If you are behind a firewall and require an HTTP proxy, verify the https_proxy environment variable is correctly set. Else, check your network connection.
FAILED
Am I on the right track? Is my https_proxy variable formatted correctly? I also tried https_proxy=socks5://jumpbox#35.192.140.0:22 with the same result.

504 gateway timeout flask socketio

I am working on a flask-socketio server which is getting stuck in a state where only 504s (gateway timeout) are returned. We are using AWS ELB in front of the server. I was wondering if anyone wouldn't mind giving some tips as to how to debug this issue.
Other symptoms:
This problem does not occur consistently, but once it begins happening, only 504s are received from requests. Restarting the process seems to fix the issue.
When I run netstat -nt on the server, I see many entries with rec-q's of over 100 stuck in the CLOSE_WAIT state
When I run strace on the process, I only see select and clock_gettime
When I run tcpdump on the server, I can see the valid requests coming into the server
AWS health checks are coming back succesfully
EDIT:
I should also add two things:
flask-socketio's server is used for production (not gunicorn or uWSGI)
Python's daemonize function is used for daemonizing the app
It seemed that switching to gunicorn as the wsgi server fixed the problem. This legitimately might be an issue with the flask-socketio wsgi server.

Cannot connect Celery to RabbitMQ on Windows Server

I am trying to setup rabbitMQ to use as a message broker for Celery. I am trying to set these up on a Windows Server 2012 R2. After I start the rabbitMQ server using the RabbitMQ start service on the applications menu, I try to start the celery app with the command.
celery -A proj worker -l info
I get the following error after the above command.
[2018-01-09 10:03:02,515: ERROR/MainProcess] consumer: Cannot connect to amqp://
guest:**#127.0.0.1:5672//: [WinError 10042] An unknown, invalid, or unsupported
option or level was specified in a getsockopt or setsockopt call.
Trying again in 2.00 seconds...
So, I tried debugging, by check the status of the RabbitMQ server, for which I went into the RabbitMQ command prompt and typed rabbitmqctl status, on which I got the following response.
These are the services that I used to start RabbitMQ and the RabbitMQ command line
Here's my Django settings for Celery. I tried putting ports and usernames before and after the hosts, but same error.
CELERY_BROKER_URL = 'amqp://localhost//'
CELERY_RESULT_BACKEND = 'amqp://localhost//'
What is the issue here? How do I check if the RabbitMQ service started or not? What setting do I need to put on the Django Settings file.
I was fighting the same issue. Ended up downgrading amqp to 2.1.3 based on the open issue in py-amqp:
https://github.com/celery/py-amqp/issues/130
Uninstall amqp using pip uninstall amqp
Install amqp using pip install -Iv amqp==2.1.3

Spark 0.90 Stand alone connection refused

I am using spark 0.90 stand alone mode.
When I tried with a streaming application in stand alone mode, I am getting a connection refused exception.
I added hostname in /etc/hosts also tried with IP alone. In both cases worker got registered with master without any issues.
Is there a way to solve this issue?
14/02/28 07:15:01 INFO Master: akka.tcp://driverClient#127.0.0.1:55891 got disassociated, removing it.
14/02/28 07:15:04 INFO Master: Registering app Twitter Streaming
14/02/28 07:15:04 INFO Master: Registered app Twitter Streaming with ID app-20140228071504-0000
14/02/28 07:34:42 INFO Master: akka.tcp://spark#127.0.0.1:33688 got disassociated, removing it.
14/02/28 07:34:42 INFO LocalActorRef: Message [akka.remote.transport.ActorTransportAdapter$DisassociateUnderlying] from Actor[akka://sparkMaster/deadLetters] to Actor[akka://sparkMaster/system/transports/akkaprotocolmanager.tcp0/akkaProtocol-tcp%3A%2F%2FsparkMaster%4010.165.35.96%3A38903-6#-1146558090] was not delivered. [2] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
14/02/28 07:34:42 ERROR EndpointWriter: AssociationError [akka.tcp://sparkMaster#10.165.35.96:8910] -> [akka.tcp://spark#127.0.0.1:33688]: Error [Association failed with [akka.tcp://spark#127.0.0.1:33688]] [
akka.remote.EndpointAssociationException: Association failed with [akka.tcp://spark#127.0.0.1:33688]
Caused by: akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: Connection refused: /127.0.0.1:33688
I had a similar issue when running in Spark in cluster mode. My problem was that the server was started with the hostname 'fluentd:7077' and not the FQDN. I edited the
/sbin/start-master.sh
to reflect how my remote nodes connect with the -ip flag.
/usr/lib/jvm/jdk1.7.0_51/bin/java -cp :/home/vagrant/spark-0.9.0-incubating-bin- hadoop2/conf:/home/vagrant/spark-0.9.0-incuba
ting-bin-hadoop2/assembly/target/scala-2.10/spark-assembly_2.10-0.9.0-incubating-hadoop2.2.0.jar -Dspark.akka.logLifecycleEvents=true -Djava.library.path= -Xms512m -Xmx512m org.ap
ache.spark.deploy.master.Master --ip fluentd.alex.dev --port 7077 --webui-port 8080
Hope this helps.

Problems getting RabbitMQ and Django-Celery Running: Target Machine actively refused connection

I am trying to get Django-Celery running on my Django App. I cannot get the worker server to run. When I try I get the message: No Connection could be made because the target machine actively refused it
Here is what I have done so far. First, I installed the django celery package: http://pypi.python.org/pypi/django-celery
I can load it into python without problems. I also installed the RabbitMQ server per the windows install instructions: http://www.rabbitmq.com/install.html#windows
Starting the tutorials in pytho on the RabbitMQ site I saw the need to install pika: http://pypi.python.org/pypi/pika. It imports without any problems.
From there I start the RabbitMQ server by running this at the command line: rabbitmq-service start
I get the message back that Service RabbitMQ started
Here is where I start to have problems.
I attempted the first steps in django-celery: http://packages.python.org/django-celery/getting-started/first-steps-with-django.html and the "hello world" example on the rabbitMQ site: http://www.rabbitmq.com/tutorials/tutorial-one-python.html
In both cases I get the message: No Connection could be made because the target machine actively refused it
My first thought was that this sounded like a firewall problem. So I went into the windows 7 firewall and added inbound and outbound rules to open the local and remote ports 5672 and 5673 to TCP protocol, but I still get the same error message.
When I run rabbitmqctl status i get the message:
Error: unable to connect to node 'rabbit#hostname': nodedown
diagnostics:
- nodes and their ports on hostname: [{rabbitmqctl18856, 505031}]
Does that mean it that it is trying to operate on those ports? what about the default 5672?
Any suggestions?
UPDATE: This was actually a problem resulting from several failed rabbitmq installs conflicting with the latest installation. If you have to remove rabbitmq use the 'rabbitmq-service remove' command and not SC DELETE, which cause a lot of problems for me and I had to go in and clean up my windows registry file.
The nodedown error indicated by rabbitmqctl suggests that the server isn't running on that machine.
Try going though the steps in RabbitMQ's troubleshooting guide. In particular, pay close attention to the logs. Has the server crashed for some reason? Could you post the logs somewhere?