Connection reset by peer when specifying TLS traffic in Istio - istio

I have a distributed database (Consul) that I want to run within Istio on Kubernetes. Communication between peers is encrypted and verified using mTLS. I am using a headless service to allow the peers to reach each other:
apiVersion: v1
kind: Service
metadata:
  name: cluster
  namespace: consul
spec:
  clusterIP: None
  publishNotReadyAddresses: true
  selector:
    app: consul
  ports:
    - name: tcp-server
      port: 8300
      targetPort: 8300
    - name: tcp-serflan
      protocol: TCP
      port: 8301
      targetPort: tcp-serflan
    - name: udp-serflan
      protocol: UDP
      port: 8301
      targetPort: udp-serflan
However, when I name the tcp-server port on the headless service tls-server instead, I see a load of connection reset errors:
2020/02/18 08:46:44 [INFO] serf: EventMemberUpdate: consul-0
2020/02/18 08:47:33 [ERR] agent: Coordinate update error: rpc error making call: stream closed
2020/02/18 08:47:38 [WARN] raft: Heartbeat timeout from "10.0.3.146:8300" reached, starting election
2020/02/18 08:47:38 [INFO] raft: Node at 10.0.2.114:8300 [Candidate] entering Candidate state in term 106
2020/02/18 08:47:38 [ERROR] raft: Failed to make RequestVote RPC to {Voter 54224806-ed63-0d1b-ae2c-9c1a09de43c4 10.0.3.146:8300}: EOF
2020/02/18 08:47:38 [ERROR] raft: Failed to make RequestVote RPC to {Voter 5b2e92ce-61d8-032e-dd94-c0d9eb1319a0 10.0.1.209:8300}: read tcp 10.0.2.114:40018->10.0.1.209:8300: read: connection reset by peer
2020/02/18 08:47:47 [WARN] raft: Election timeout reached, restarting election
2020/02/18 08:47:47 [INFO] raft: Node at 10.0.2.114:8300 [Candidate] entering Candidate state in term 107
2020/02/18 08:47:47 [ERROR] raft: Failed to make RequestVote RPC to {Voter 54224806-ed63-0d1b-ae2c-9c1a09de43c4 10.0.3.146:8300}: read tcp 10.0.2.114:40762->10.0.3.146:8300: read: connection reset by peer
2020/02/18 08:47:47 [ERROR] raft: Failed to make RequestVote RPC to {Voter 5b2e92ce-61d8-032e-dd94-c0d9eb1319a0 10.0.1.209:8300}: read tcp 10.0.2.114:40062->10.0.1.209:8300: read: connection reset by peer
What is Istio/Envoy doing to traffic declared as TLS that could be causing this issue? The traffic really is TLS (I am handling the encryption myself via Consul's mTLS), yet it seems I have to pretend it is plain TCP for it to work.
Consul version: 1.6.2
Istio version: 1.4.4
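A minimal workaround sketch, assuming the cause is Istio's port-name-based protocol selection: keep the port name prefixed with tcp- so the sidecar treats the traffic as opaque TCP, and/or explicitly tell the sidecars not to layer Istio's own TLS onto that port with a DestinationRule. The host and port below are derived from the Service above; treat this as an illustration, not a verified fix.
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: consul-server-plain-tcp
  namespace: consul
spec:
  host: cluster.consul.svc.cluster.local   # headless Service "cluster" in namespace "consul"
  trafficPolicy:
    portLevelSettings:
      - port:
          number: 8300
        tls:
          mode: DISABLE   # don't add Istio mTLS on top of Consul's own mTLS on 8300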

Related

AWS Managed Hyperledger Fabric v1.4.7 blockchain - Getting bad certificate error when connecting to the fabric network

I have deployed an AWS Managed Hyperledger Fabric v1.4.7 blockchain. The HLF blockchain network and the EC2 instance (hlf-client) are in the same VPC, and everything seems to be working fine since I am able to invoke transactions using the cli container.
My client-app uses the fabric-sdk-go gateway API to connect to the fabric network via the connection-profile.yaml to invoke/query the blockchain. This client-app runs in a docker container on the same EC2 instance as the cli container, which has all the necessary security configuration. The client-app is unable to connect to the fabric network due to a bad certificate error.
The error log on the client app is:
[fabsdk/util] 2021/11/02 09:55:17 UTC - lazyref.(*Reference).refreshValue -> WARN Error - initializer returned error: QueryBlockConfig failed: QueryBlockConfig failed: queryChaincode failed: Transaction processing for endorser [nd-cjfwwnimujabllevl6yitqqmxi.m-l3ascxxbincwrbtirbgpp4bp7u.n-rh3k6kahfnd6bgtxxgru7c3b5q.managedblockchain.ap-southeast-1.amazonaws.com:30003]: Endorser Client Status Code: (2) CONNECTION_FAILED. Description: dialing connection on target [nd-cjfwwnimujabllevl6yitqqmxi.m-l3ascxxbincwrbtirbgpp4bp7u.n-rh3k6kahfnd6bgtxxgru7c3b5q.managedblockchain.ap-southeast-1.amazonaws.com:30003]: connection is in TRANSIENT_FAILURE. Will retry again later
The corresponding peer log is:
[36m2021-11-02 10:07:17.789 UTC [grpc] handleRawConn -> DEBU 39501a[0m grpc: Server.Serve failed to complete security handshake from "10.0.2.131:39100": remote error: tls: bad certificate
[31m2021-11-02 10:10:17.809 UTC [core.comm] ServerHandshake -> ERRO 395322[0m TLS handshake failed with error remote error: tls: bad certificate server=PeerServer remoteaddress=10.0.2.131:12696
The same certificate files are used when invoking transactions via the cli. Could anyone tell me what's wrong with my setup here, or am I missing some other configuration?
I have generated the ccp (connection-profile.yaml) as below:
---
name: n-RH3K6KAHFND6BGTXXGRU7C3B5Q
version: 1.0.0
client:
  organization: Org1
  connection:
    timeout:
      peer:
        endorser: "300"
channels:
  mychannel:
    peers:
      nd-CJFWWNIMUJABLLEVL6YITQQMXI:
        endorsingPeer: true
        chaincodeQuery: true
        ledgerQuery: true
        eventSource: true
organizations:
  Org1:
    mspid: m-L3ASCXXBINCWRBTIRBGPP4BP7U
    peers:
      - nd-CJFWWNIMUJABLLEVL6YITQQMXI
    certificateAuthorities:
      - m-L3ASCXXBINCWRBTIRBGPP4BP7U
peers:
  nd-CJFWWNIMUJABLLEVL6YITQQMXI:
    url: grpcs://nd-cjfwwnimujabllevl6yitqqmxi.m-l3ascxxbincwrbtirbgpp4bp7u.n-rh3k6kahfnd6bgtxxgru7c3b5q.managedblockchain.managedblockchain.us-east-1.amazonaws.com:30003
    eventUrl: grpcs://nd-cjfwwnimujabllevl6yitqqmxi.m-l3ascxxbincwrbtirbgpp4bp7u.n-rh3k6kahfnd6bgtxxgru7c3b5q.managedblockchain.managedblockchain.us-east-1.amazonaws.com:30004
    grpcOptions:
      ssl-target-name-override: nd-CJFWWNIMUJABLLEVL6YITQQMXI
    tlsCACerts:
      path: /home/ec2-user/managedblockchain-tls-chain.pem
certificateAuthorities:
  m-L3ASCXXBINCWRBTIRBGPP4BP7U:
    url: https://ca.m-l3ascxxbincwrbtirbgpp4bp7u.n-rh3k6kahfnd6bgtxxgru7c3b5q.managedblockchain.managedblockchain.us-east-1.amazonaws.com:30002
    httpOptions:
      verify: false
    tlsCACerts:
      path: /home/ec2-user/managedblockchain-tls-chain.pem
    caName: m-L3ASCXXBINCWRBTIRBGPP4BP7U
The following solution applies to:
HLF v1.4.7 AWS Managed Blockchain
Fabric client [fabric-sdk-go v1.0.0] Gateway programming model
To resolve the issue, remove the grpcOptions stanza from the peer definition, as shown below.
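For illustration, the peers section of the connection profile then looks like this (same URLs and certificate path as above, only the grpcOptions block removed):
peers:
  nd-CJFWWNIMUJABLLEVL6YITQQMXI:
    url: grpcs://nd-cjfwwnimujabllevl6yitqqmxi.m-l3ascxxbincwrbtirbgpp4bp7u.n-rh3k6kahfnd6bgtxxgru7c3b5q.managedblockchain.managedblockchain.us-east-1.amazonaws.com:30003
    eventUrl: grpcs://nd-cjfwwnimujabllevl6yitqqmxi.m-l3ascxxbincwrbtirbgpp4bp7u.n-rh3k6kahfnd6bgtxxgru7c3b5q.managedblockchain.managedblockchain.us-east-1.amazonaws.com:30004
    tlsCACerts:
      path: /home/ec2-user/managedblockchain-tls-chain.pem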

Istio 1.9 integration with virtual machine (AWS EC2): generated hosts file is empty

I have installed MySQL in a VM and want my EKS cluster (with Istio 1.9 installed) to talk to it. I am following https://istio.io/latest/docs/setup/install/virtual-machine/, but at the step that generates the VM files, the hosts file that gets generated is empty.
I tried with the empty hosts file anyway; after starting the VM agent with
sudo systemctl start istio
and tailing /var/log/istio/istio.log, I get:
2021-03-22T18:44:02.332421Z info Proxy role ips=[10.8.1.179 fe80::dc:36ff:fed3:9eea] type=sidecar id=ip-10-8-1-179.vm domain=vm.svc.cluster.local
2021-03-22T18:44:02.332429Z info JWT policy is third-party-jwt
2021-03-22T18:44:02.332438Z info Pilot SAN: [istiod.istio-system.svc]
2021-03-22T18:44:02.332443Z info CA Endpoint istiod.istio-system.svc:15012, provider Citadel
2021-03-22T18:44:02.332997Z info Using CA istiod.istio-system.svc:15012 cert with certs: /etc/certs/root-cert.pem
2021-03-22T18:44:02.333093Z info citadelclient Citadel client using custom root cert: istiod.istio-system.svc:15012
2021-03-22T18:44:02.410934Z info ads All caches have been synced up in 82.7974ms, marking server ready
2021-03-22T18:44:02.411247Z info sds SDS server for workload certificates started, listening on "./etc/istio/proxy/SDS"
2021-03-22T18:44:02.424855Z info sds Start SDS grpc server
2021-03-22T18:44:02.425044Z info xdsproxy Initializing with upstream address "istiod.istio-system.svc:15012" and cluster "Kubernetes"
2021-03-22T18:44:02.425341Z info Starting proxy agent
2021-03-22T18:44:02.425483Z info dns Starting local udp DNS server at localhost:15053
2021-03-22T18:44:02.427627Z info dns Starting local tcp DNS server at localhost:15053
2021-03-22T18:44:02.427683Z info Opening status port 15020
2021-03-22T18:44:02.432407Z info Received new config, creating new Envoy epoch 0
2021-03-22T18:44:02.433999Z info Epoch 0 starting
2021-03-22T18:44:02.690764Z warn ca ca request failed, starting attempt 1 in 91.93939ms
2021-03-22T18:44:02.693579Z info Envoy command: [-c etc/istio/proxy/envoy-rev0.json --restart-epoch 0 --drain-time-s 45 --parent-shutdown-time-s 60 --service-cluster istio-proxy --service-node sidecar~10.8.1.179~ip-10-8-1-179.vm~vm.svc.cluster.local --local-address-ip-version v4 --bootstrap-version 3 --log-format %Y-%m-%dT%T.%fZ %l envoy %n %v -l warning --component-log-level misc:error --concurrency 2]
2021-03-22T18:44:02.782817Z warn ca ca request failed, starting attempt 2 in 195.226287ms
2021-03-22T18:44:02.978344Z warn ca ca request failed, starting attempt 3 in 414.326774ms
2021-03-22T18:44:03.392946Z warn ca ca request failed, starting attempt 4 in 857.998629ms
2021-03-22T18:44:04.251227Z warn sds failed to warm certificate: failed to generate workload certificate: create certificate: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup istiod.istio-system.svc on 10.8.0.2:53: no such host"
2021-03-22T18:44:04.849207Z warn ca ca request failed, starting attempt 1 in 91.182413ms
2021-03-22T18:44:04.940652Z warn ca ca request failed, starting attempt 2 in 207.680983ms
2021-03-22T18:44:05.148598Z warn ca ca request failed, starting attempt 3 in 384.121814ms
2021-03-22T18:44:05.533019Z warn ca ca request failed, starting attempt 4 in 787.704352ms
2021-03-22T18:44:06.321042Z warn sds failed to warm certificate: failed to generate workload certificate: create certificate: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup istiod.istio-system.svc on 10.8.0.2:53: no such host"
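The "no such host" lines suggest the VM cannot resolve istiod.istio-system.svc because the generated hosts file never got its entry. A sketch of the usual manual workaround, assuming an east-west gateway named as in the Istio 1.9 VM installation guide (the service name and the IP placeholder are assumptions and may differ in your setup):
# On a machine with cluster access: look up the east-west gateway's external IP
kubectl get svc istio-eastwestgateway -n istio-system \
  -o jsonpath='{.status.loadBalancer.ingress[0].ip}'

# On the VM: map istiod's in-cluster name to that IP so the agent can reach it
echo "<EASTWEST_GATEWAY_IP> istiod.istio-system.svc" | sudo tee -a /etc/hosts
sudo systemctl restart istio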

Web connection refused on EC2 Meteor deployment

I am using MUP to deploy a meteor app to an EC2 instance running Ubuntu 18. My deployment seems to work, but when I try to access the public URL of the instance in my browser, I get "connection refused." I'm going crazy with this one!
I assume this would be an AWS issue like a port not being open, but my EC2 inbound rules look like they should work.
I SSH'ed into the instance to see if everything is working, and I think it is. For starters, the docker container seems to be running fine:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
2b70717ce5c9 mup-oil-pricing:latest "/bin/sh -c 'exec $M…" About an hour ago Up About an hour 0.0.0.0:80->80/tcp oil-pricing
While still SSH'ed in, when I hit curl localhost:80 I get back HTML in the console, which suggests the app (a Meteor app) is running fine.
I checked to see if the Ubuntu firewall is active, and I don't think it is:
ubuntu@ip-172-30-1-118:~$ sudo ufw status verbose
Status: inactive
My ports also seem fine (as far as I can tell):
ubuntu@ip-172-30-1-118:~$ sudo netstat -tulpn | grep LISTEN
tcp 0 0 10.0.3.1:53 0.0.0.0:* LISTEN 3230/dnsmasq
tcp 0 0 127.0.0.53:53 0.0.0.0:* LISTEN 344/systemd-resolve
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 7903/sshd: /usr/sbi
tcp6 0 0 :::22 :::* LISTEN 7903/sshd: /usr/sbi
tcp6 0 0 :::80 :::* LISTEN 13597/docker-proxy
But when I go to Chrome on my local machine and try to access the site using the EC2 instance via the Elastic IP I've assigned (34.231.39.181) or via the EC2 address (https://ec2-34-231-39-181.compute-1.amazonaws.com/) I get :
This site can’t be reached
ec2-34-231-39-181.compute-1.amazonaws.com refused to connect.
I don't think it's a MUP issue, but here's the MUP config just in case that matters:
module.exports = {
  servers: {
    one: {
      host: '34.231.39.181',
      username: 'ubuntu',
      pem: [[MY PEM FILE]]
    }
  },
  hooks: {
    'pre.deploy': {
      remoteCommand: 'docker system prune -a --force' // PRUNE DOCKER IMAGES
    },
  },
  app: {
    name: 'oil-pricing',
    path: '../',
    servers: {
      one: {},
    },
    buildOptions: {
      serverOnly: true,
    },
    env: {
      ROOT_URL: 'https://ec2-34-231-39-181.compute-1.amazonaws.com/',
      MONGO_URL: [[MY MONGO URL]],
      PORT: 80,
    },
    docker: {
      image: 'abernix/meteord:node-8.15.1-base', // per: https://github.com/zodern/meteor-up/issues/692
    },
    enableUploadProgressBar: true
  },
};
When I run mup deploy everything checks out:
Started TaskList: Pushing Meteor App
[34.231.39.181] - Pushing Meteor App Bundle to the Server
[34.231.39.181] - Pushing Meteor App Bundle to the Server: SUCCESS
[34.231.39.181] - Prepare Bundle
[34.231.39.181] - Prepare Bundle: SUCCESS
Started TaskList: Configuring App
[34.231.39.181] - Pushing the Startup Script
[34.231.39.181] - Pushing the Startup Script: SUCCESS
[34.231.39.181] - Sending Environment Variables
[34.231.39.181] - Sending Environment Variables: SUCCESS
Started TaskList: Start Meteor
[34.231.39.181] - Start Meteor
[34.231.39.181] - Start Meteor: SUCCESS
[34.231.39.181] - Verifying Deployment
[34.231.39.181] - Verifying Deployment: SUCCESS
I'm using Meteor 1.8.1 if that matters.
Any help would be greatly appreciated!
Your sudo netstat -tulpn | grep LISTEN shows that you are listening on port 80. But you are using HTTPS in:
https://ec2-34-231-39-181.compute-1.amazonaws.com
This will connect to port 443, where nothing is listening. So either change your app to listen for HTTPS connections on port 443 (which will require proper SSL certificates), or use HTTP, which will go to port 80 (unencrypted):
http://ec2-34-231-39-181.compute-1.amazonaws.com
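You can confirm this quickly from your local machine with plain curl (hostname taken from the question; nothing Meteor- or MUP-specific here):
# Plain HTTP on port 80 should return the app's response headers...
curl -I http://ec2-34-231-39-181.compute-1.amazonaws.com/

# ...while HTTPS on port 443 should be refused, since nothing listens there.
curl -I https://ec2-34-231-39-181.compute-1.amazonaws.com/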

Istio upgrade from 1.4.6 -> 1.5.0 throws istiod errors: remote error: tls: error decrypting message

I just upgraded Istio from 1.4.6 (Helm) to 1.5.0 (istioctl) [purged Istio and reinstalled with istioctl], but the istiod logs keep throwing the following:
2020-03-16T18:25:45.209055Z info grpc: Server.Serve failed to complete security handshake from "10.150.56.111:56870": remote error: tls: error decrypting message
2020-03-16T18:25:46.792447Z info grpc: Server.Serve failed to complete security handshake from "10.150.57.112:49162": remote error: tls: error decrypting message
2020-03-16T18:25:46.930483Z info grpc: Server.Serve failed to complete security handshake from "10.150.56.160:36878": remote error: tls: error decrypting message
2020-03-16T18:25:48.284122Z info grpc: Server.Serve failed to complete security handshake from "10.150.52.230:44758": remote error: tls: error decrypting message
2020-03-16T18:25:48.288180Z info grpc: Server.Serve failed to complete security handshake from "10.150.57.149:56756": remote error: tls: error decrypting message
2020-03-16T18:25:49.108515Z info grpc: Server.Serve failed to complete security handshake from "10.150.57.151:53970": remote error: tls: error decrypting message
2020-03-16T18:25:49.111874Z info Handling event update for pod contentgatewayaidest-7f4694d87-qmq8z in namespace djin-content -> 10.150.53.50
2020-03-16T18:25:49.519861Z info grpc: Server.Serve failed to complete security handshake from "10.150.57.91:59510": remote error: tls: error decrypting message
2020-03-16T18:25:50.133664Z info grpc: Server.Serve failed to complete security handshake from "10.150.57.203:59726": remote error: tls: error decrypting message
2020-03-16T18:25:50.331020Z info grpc: Server.Serve failed to complete security handshake from "10.150.57.195:59970": remote error: tls: error decrypting message
2020-03-16T18:25:52.110695Z info Handling event update for pod contentgateway-d74b44c7-dtdxs in namespace djin-content -> 10.150.56.215
2020-03-16T18:25:53.312761Z info Handling event update for pod dysonpriority-b6dbc589b-mk628 in namespace djin-content -> 10.150.52.91
2020-03-16T18:25:53.496524Z info grpc: Server.Serve failed to complete security handshake from "10.150.56.111:57276": remote error: tls: error decrypting message
This also means no sidecars launch successfully; they fail with:
2020-03-16T18:32:17.265394Z info Envoy proxy is NOT ready: config not received from Pilot (is Pilot running?): cds updates: 16 successful, 0 rejected; lds updates: 0 successful, 0 rejected
2020-03-16T18:32:19.269334Z info Envoy proxy is NOT ready: config not received from Pilot (is Pilot running?): cds updates: 16 successful, 0 rejected; lds updates: 0 successful, 0 rejected
2020-03-16T18:32:21.265214Z info Envoy proxy is NOT ready: config not received from Pilot (is Pilot running?): cds updates: 16 successful, 0 rejected; lds updates: 0 successful, 0 rejected
2020-03-16T18:32:23.266159Z info Envoy proxy is NOT ready: config not received from Pilot (is Pilot running?): cds updates: 16 successful, 0 rejected; lds updates: 0 successful,
Weirdly, other clusters that I upgraded went through fine. Any idea where this error might be coming from? istioctl analyze works fine.
The error goes away after killing (recreating) the nodes, but the istio-proxies still fail with:
info Envoy proxy is NOT ready: config not received from Pilot (is Pilot running?): cds updates: 1 successful, 0 rejected; lds updates: 0 successful, 0 rejected
As far as I know, since version 1.4.4 there is istioctl upgrade, which should be used when you want to upgrade Istio from 1.4.x to 1.5.0.
The istioctl upgrade command performs an upgrade of Istio. Before performing the upgrade, it checks that the Istio installation meets the upgrade eligibility criteria. Also, it alerts the user if it detects any changes in the profile default values between Istio versions.
The upgrade command can also perform a downgrade of Istio.
See the istioctl upgrade reference for all the options provided by the istioctl upgrade command.
istioctl upgrade --help
The upgrade command checks for upgrade version eligibility and, if eligible, upgrades the Istio control plane components in-place. Warning: traffic may be disrupted during upgrade. Please ensure PodDisruptionBudgets are defined to maintain service continuity.
I made a test on a GCP cluster with Istio 1.4.6 installed via istioctl, then used istioctl upgrade from version 1.5.0, and everything worked fine.
kubectl get pods -n istio-system
NAME READY STATUS RESTARTS AGE
istio-ingressgateway-598796f4d9-lvzdb 1/1 Running 0 12m
istiod-7d9c7bdd6-mggx7 1/1 Running 0 12m
prometheus-b47d8c58c-7spq5 2/2 Running 0 12m
I checked the logs and ran some simple examples, and no errors occurred in istiod like in your example.
Upgrade prerequisites for istioctl upgrade
Ensure you meet these requirements before starting the upgrade process:
Istio version 1.4.4 or higher is installed.
Your Istio installation was installed using istioctl.
I assume that because of the differences between 1.4.x and 1.5.0 there might be issues when you mix both installation methods, Helm and istioctl. The best option here would be to install Istio 1.4.6 with istioctl and then upgrade it to 1.5.0, as sketched below.
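A rough sketch of that path, run with the matching istioctl binary for each step (commands as they exist in the 1.4.x/1.5.0 releases; profiles and flags may differ in your environment):
# Using the 1.4.6 istioctl: install the control plane via istioctl
istioctl manifest apply

# Using the 1.5.0 istioctl: check upgrade eligibility and upgrade in place
istioctl upgrade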
I hope this answers your question. Let me know if you have any more questions.

Kubernetes ingress on GKE results in 502 response on http / SSL_ERROR_SYSCALL on https

I've tested my configuration on minikube, where it works perfectly. On GKE, however, HTTP requests get a 502 response while HTTPS connections are terminated.
I have no idea how to diagnose this issue; which logs could I look at?
Here is a verbose curl log when accessing over https://
* Expire in 0 ms for 1 (transfer 0x1deb470)
* Expire in 0 ms for 1 (transfer 0x1deb470)
* Expire in 0 ms for 1 (transfer 0x1deb470)
* Trying 35.244.154.110...
* TCP_NODELAY set
* Expire in 200 ms for 4 (transfer 0x1deb470)
* Connected to chrischrisexample.de (35.244.154.110) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* Cipher selection: ALL:!EXPORT:!EXPORT40:!EXPORT56:!aNULL:!LOW:!RC4:#STRENGTH
* TLSv1.2 (OUT), TLS header, Certificate Status (22):
* TLSv1.2 (OUT), TLS handshake, Client hello (1):
* OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to chrischrisexample.de:443
* Closing connection 0
curl: (35) OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to chrischrisexample.de:443
To solve it I had to:
Respond with an HTTP 200 on the health check (from the Google load balancer!)
Set an SSL certificate secret in the ingress (even if a self-signed one)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Sync 14m (x20 over 157m) loadbalancer-controller Could not find TLS certificates. Continuing setup for the load balancer to serve HTTP. Note: this behavior is deprecated and will be removed in a future version of ingress-gce
Warning Translate 3m56s (x18 over 9m24s) loadbalancer-controller error while evaluating the ingress spec: could not find port "80" in service "default/app"; could not find port "80" in service "default/app"; could not find port "80" in service "default/app"; could not find port "80" in service "default/app"
These errors were shown in the kubectl describe ingress... It still doesn't make sense why it would error on the SSL handshake/connection, though.
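As an illustration of the second fix above (setting an SSL certificate secret in the ingress), a self-signed certificate can be generated and stored as a Kubernetes TLS secret roughly like this (the app-tls secret name is made up for the example; the Ingress then references it under spec.tls[].secretName):
# Create a self-signed certificate for the domain used in the question
openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
  -keyout tls.key -out tls.crt -subj "/CN=chrischrisexample.de"

# Store it as a TLS secret that the Ingress can reference
kubectl create secret tls app-tls --key tls.key --cert tls.crt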