How to Set Up PKI for Multicluster Istio Mesh

I am trying to configure a multicluster mesh topology with replicated control planes in Istio, as described in https://istio.io/docs/setup/install/multicluster/gateways/. My PKI setup has 3 tiers and looks like the following.
PKI Hierarchy
A Root CA (root-ca.pem)
Intermediate CA to sign each cluster's Citadel CA (intermediate-ca.pem)
Each cluster's Citadel CA (ca-cert.pem)
Following the install instructions, I install the certificates into the istio-system namespace with the following command.
kubectl create secret generic cacerts -n istio-system --from-file=./ca-cert.pem \
--from-file=./ca-key.pem --from-file=./root-cert.pem \
--from-file=./cert-chain.pem
In this command, ca-cert.pem is the cluster's CA certificate, ca-key.pem is the private key for ca-cert.pem, and cert-chain.pem is the full chain of ca-cert.pem, i.e. cert-chain.pem=$(cat ca-cert.pem intermediate-ca.pem root-ca.pem)
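A minimal sketch of assembling cert-chain.pem in the order described above (cluster CA first, then intermediate, then root). The helper names `count_certs` and `build_chain` are hypothetical, and the sketch only concatenates the files; it does not verify any signatures.

```python
from pathlib import Path

PEM_BEGIN = "-----BEGIN CERTIFICATE-----"


def count_certs(pem_text: str) -> int:
    """Count the certificate blocks in a PEM string."""
    return pem_text.count(PEM_BEGIN)


def build_chain(paths):
    """Concatenate PEM files in the given order (leaf first, root last)."""
    parts = []
    for path in paths:
        text = Path(path).read_text().strip()
        if count_certs(text) == 0:
            raise ValueError(f"{path} contains no PEM certificate")
        parts.append(text)
    # One newline between blocks and a trailing newline at the end.
    return "\n".join(parts) + "\n"
```

Calling `build_chain(["ca-cert.pem", "intermediate-ca.pem", "root-ca.pem"])` and writing the result to cert-chain.pem would reproduce the `cat` expression above; `count_certs` on the result should then report 3.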
When I install this setup into a cluster, mTLS works fine within the cluster using my custom CA, as expected. However, when I go to set up the multicluster environment, calls from cluster A to cluster B fail the root certificate validation.
Does anyone have insight into why these certificates would not be trusted when they share the same root CA structure?
Update: I believe this might have to do with the ingress-gateway of the destination cluster crashing when it tries to proxy the connection to the backend service.
[Envoy (Epoch 0)] [2020-04-07 15:58:34.193][22][debug][filter] [external/envoy/source/common/tcp_proxy/tcp_proxy.cc:232] [C2] new tcp proxy session
[Envoy (Epoch 0)] [2020-04-07 15:58:34.193][22][trace][connection] [external/envoy/source/common/network/connection_impl.cc:294] [C2] readDisable: enabled=true disable=true state=0
[Envoy (Epoch 0)] [2020-04-07 15:58:34.194][22][trace][filter] [external/envoy/source/extensions/filters/network/sni_cluster/sni_cluster.cc:16] [C2] sni_cluster: new connection with server name outbound_.80_._.nginx.istio-fkt.global
[Envoy (Epoch 0)] [2020-04-07 15:58:34.194][22][trace][filter] [src/envoy/tcp/tcp_cluster_rewrite/tcp_cluster_rewrite.cc:55] [C2] tcp_cluster_rewrite: new connection with server name outbound_.80_._.nginx.istio-fkt.global
[Envoy (Epoch 0)] [2020-04-07 15:58:34.194][22][trace][filter] [src/envoy/tcp/tcp_cluster_rewrite/tcp_cluster_rewrite.cc:64] [C2] tcp_cluster_rewrite: final tcp proxy cluster name outbound_.80_._.nginx.istio-fkt.svc.cluster.local
[Envoy (Epoch 0)] [2020-04-07 15:58:34.194][22][critical][main] [external/envoy/source/exe/terminate_handler.cc:13] std::terminate called! (possible uncaught exception, see trace)
[Envoy (Epoch 0)] [2020-04-07 15:58:34.194][22][critical][backtrace] [bazel-out/k8-opt/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:70] Backtrace (use tools/stack_decode.py to get line numbers):
[Envoy (Epoch 0)] [2020-04-07 15:58:34.194][22][critical][backtrace] [bazel-out/k8-opt/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:71] Envoy version: 73f240a29bece92a8882a36893ccce07b4a54664/1.13.1-dev/Clean/RELEASE/BoringSSL
[Envoy (Epoch 0)] [2020-04-07 15:58:34.205][22][critical][backtrace] [bazel-out/k8-opt/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:75] #0: Envoy::TerminateHandler::logOnTerminate()::$_0::operator()() [0x562ba8ae7dae]
[Envoy (Epoch 0)] [2020-04-07 15:58:34.216][22][critical][backtrace] [bazel-out/k8-opt/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:77] #1: [0x562ba8ae7cb9]
[Envoy (Epoch 0)] [2020-04-07 15:58:34.225][22][critical][backtrace] [bazel-out/k8-opt/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:75] #2: std::__terminate() [0x562ba904aa73]
[Envoy (Epoch 0)] [2020-04-07 15:58:34.234][22][critical][backtrace] [bazel-out/k8-opt/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:75] #3: Envoy::Tcp::TcpClusterRewrite::TcpClusterRewriteFilter::onNewConnection() [0x562ba7209c4d]
[Envoy (Epoch 0)] [2020-04-07 15:58:34.244][22][critical][backtrace] [bazel-out/k8-opt/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:75] #4: Envoy::Network::FilterManagerImpl::onContinueReading() [0x562ba862a582]
[Envoy (Epoch 0)] [2020-04-07 15:58:34.256][22][critical][backtrace] [bazel-out/k8-opt/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:75] #5: Envoy::Network::FilterManagerImpl::initializeReadFilters() [0x562ba862a4e5]
[Envoy (Epoch 0)] [2020-04-07 15:58:34.267][22][critical][backtrace] [bazel-out/k8-opt/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:75] #6: Envoy::Server::ConnectionHandlerImpl::ActiveTcpListener::newConnection() [0x562ba861a547]
[Envoy (Epoch 0)] [2020-04-07 15:58:34.278][22][critical][backtrace] [bazel-out/k8-opt/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:75] #7: Envoy::Server::ConnectionHandlerImpl::ActiveTcpSocket::continueFilterChain() [0x562ba861a1fb]
[Envoy (Epoch 0)] [2020-04-07 15:58:34.287][22][critical][backtrace] [bazel-out/k8-opt/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:75] #8: Envoy::Server::ConnectionHandlerImpl::ActiveTcpListener::onAcceptWorker() [0x562ba861a2f1]
[Envoy (Epoch 0)] [2020-04-07 15:58:34.295][22][critical][backtrace] [bazel-out/k8-opt/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:75] #9: Envoy::Network::ListenerImpl::listenCallback() [0x562ba862dd4c]
[Envoy (Epoch 0)] [2020-04-07 15:58:34.306][22][critical][backtrace] [bazel-out/k8-opt/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:75] #10: listener_read_cb [0x562ba89547c3]
[Envoy (Epoch 0)] [2020-04-07 15:58:34.317][22][critical][backtrace] [bazel-out/k8-opt/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:75] #11: event_process_active_single_queue [0x562ba89529ab]
[Envoy (Epoch 0)] [2020-04-07 15:58:34.329][22][critical][backtrace] [bazel-out/k8-opt/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:75] #12: event_base_loop [0x562ba895123e]
[Envoy (Epoch 0)] [2020-04-07 15:58:34.341][22][critical][backtrace] [bazel-out/k8-opt/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:75] #13: Envoy::Server::WorkerImpl::threadRoutine() [0x562ba8617278]
[Envoy (Epoch 0)] [2020-04-07 15:58:34.352][22][critical][backtrace] [bazel-out/k8-opt/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:75] #14: Envoy::Thread::ThreadImplPosix::ThreadImplPosix()::$_0::__invoke() [0x562ba8b1d953]
[Envoy (Epoch 0)] [2020-04-07 15:58:34.352][22][critical][backtrace] [bazel-out/k8-opt/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:75] #15: start_thread [0x7ff80cbd16db]
[Envoy (Epoch 0)] [2020-04-07 15:58:34.352][22][critical][backtrace] [bazel-out/k8-opt/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:83] Caught Aborted, suspect faulting address 0x10
[Envoy (Epoch 0)] [2020-04-07 15:58:34.352][22][critical][backtrace] [bazel-out/k8-opt/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:70] Backtrace (use tools/stack_decode.py to get line numbers):
[Envoy (Epoch 0)] [2020-04-07 15:58:34.352][22][critical][backtrace] [bazel-out/k8-opt/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:71] Envoy version: 73f240a29bece92a8882a36893ccce07b4a54664/1.13.1-dev/Clean/RELEASE/BoringSSL
[Envoy (Epoch 0)] [2020-04-07 15:58:34.352][22][critical][backtrace] [bazel-out/k8-opt/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:75] #0: __restore_rt [0x7ff80cbdc890]
[Envoy (Epoch 0)] [2020-04-07 15:58:34.363][22][critical][backtrace] [bazel-out/k8-opt/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:77] #1: [0x562ba8ae7cb9]
[Envoy (Epoch 0)] [2020-04-07 15:58:34.363][22][critical][backtrace] [bazel-out/k8-opt/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:75] #2: std::__terminate() [0x562ba904aa73]
[Envoy (Epoch 0)] [2020-04-07 15:58:34.363][22][critical][backtrace] [bazel-out/k8-opt/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:75] #3: Envoy::Tcp::TcpClusterRewrite::TcpClusterRewriteFilter::onNewConnection() [0x562ba7209c4d]
[Envoy (Epoch 0)] [2020-04-07 15:58:34.363][22][critical][backtrace] [bazel-out/k8-opt/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:75] #4: Envoy::Network::FilterManagerImpl::onContinueReading() [0x562ba862a582]
[Envoy (Epoch 0)] [2020-04-07 15:58:34.363][22][critical][backtrace] [bazel-out/k8-opt/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:75] #5: Envoy::Network::FilterManagerImpl::initializeReadFilters() [0x562ba862a4e5]
[Envoy (Epoch 0)] [2020-04-07 15:58:34.363][22][critical][backtrace] [bazel-out/k8-opt/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:75] #6: Envoy::Server::ConnectionHandlerImpl::ActiveTcpListener::newConnection() [0x562ba861a547]
[Envoy (Epoch 0)] [2020-04-07 15:58:34.363][22][critical][backtrace] [bazel-out/k8-opt/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:75] #7: Envoy::Server::ConnectionHandlerImpl::ActiveTcpSocket::continueFilterChain() [0x562ba861a1fb]
[Envoy (Epoch 0)] [2020-04-07 15:58:34.363][22][critical][backtrace] [bazel-out/k8-opt/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:75] #8: Envoy::Server::ConnectionHandlerImpl::ActiveTcpListener::onAcceptWorker() [0x562ba861a2f1]
[Envoy (Epoch 0)] [2020-04-07 15:58:34.363][22][critical][backtrace] [bazel-out/k8-opt/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:75] #9: Envoy::Network::ListenerImpl::listenCallback() [0x562ba862dd4c]
[Envoy (Epoch 0)] [2020-04-07 15:58:34.363][22][critical][backtrace] [bazel-out/k8-opt/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:75] #10: listener_read_cb [0x562ba89547c3]
[Envoy (Epoch 0)] [2020-04-07 15:58:34.363][22][critical][backtrace] [bazel-out/k8-opt/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:75] #11: event_process_active_single_queue [0x562ba89529ab]
[Envoy (Epoch 0)] [2020-04-07 15:58:34.363][22][critical][backtrace] [bazel-out/k8-opt/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:75] #12: event_base_loop [0x562ba895123e]
[Envoy (Epoch 0)] [2020-04-07 15:58:34.363][22][critical][backtrace] [bazel-out/k8-opt/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:75] #13: Envoy::Server::WorkerImpl::threadRoutine() [0x562ba8617278]
[Envoy (Epoch 0)] [2020-04-07 15:58:34.363][22][critical][backtrace] [bazel-out/k8-opt/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:75] #14: Envoy::Thread::ThreadImplPosix::ThreadImplPosix()::$_0::__invoke() [0x562ba8b1d953]
[Envoy (Epoch 0)] [2020-04-07 15:58:34.363][22][critical][backtrace] [bazel-out/k8-opt/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:75] #15: start_thread [0x7ff80cbd16db]
2020-04-07T15:58:34.392193Z error Epoch 0 exited with error: signal: aborted (core dumped)
2020-04-07T15:58:34.392220Z info No more active epochs, terminating
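The two tcp_cluster_rewrite lines in the log above show the SNI cluster name ending in `.global` being mapped to one ending in `.svc.cluster.local` just before the crash. A minimal sketch of that mapping, assuming it is a plain suffix substitution (the pattern and replacement are inferred from the log lines, not taken from the filter's configuration):

```python
import re

# Assumed pattern and replacement, inferred from the two log lines above.
GLOBAL_SUFFIX = re.compile(r"\.global$")
LOCAL_SUFFIX = ".svc.cluster.local"


def rewrite_cluster_name(sni: str) -> str:
    """Map a *.global SNI cluster name to its in-cluster equivalent."""
    return GLOBAL_SUFFIX.sub(LOCAL_SUFFIX, sni)
```

This can help when checking by hand which cluster name the gateway is expected to look up for a given SNI value.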

AWS OpenVPN dns not resolving

We're using an OpenVPN server on AWS which we configured using this tutorial. However, when we connect to the VPN the internet does not seem to work, because the DNS is not resolving anything. When we switch the DNS to 8.8.8.8 in the configuration panel, everything works as expected.
We've tried reinstalling everything from scratch, but the problem remains the same. We used the standard AWS AMI template for OpenVPN provided by AWS.
Our DNS is:
nameserver[0] : 172.31.0.2
nameserver[0] : 172.31.0.2
When I ping this IP this is the response:
Request timeout for icmp_seq 0
ping: sendto: No route to host
I've executed some commands to provide more information:
dig @127.0.0.1 google.com
; <<>> DiG 9.11.3-1ubuntu1.17-Ubuntu <<>> @127.0.0.1 google.com
; (1 server found)
;; global options: +cmd
;; connection timed out; no servers could be reached
dig google.com
; <<>> DiG 9.11.3-1ubuntu1.17-Ubuntu <<>> google.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 45371
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;google.com. IN A
;; ANSWER SECTION:
google.com. 124 IN A 142.250.185.238
;; Query time: 0 msec
;; SERVER: 127.0.0.53#53(127.0.0.53)
;; WHEN: Tue Jul 19 07:30:15 UTC 2022
;; MSG SIZE rcvd: 55
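To compare resolvers from the VPN client side, a small probe can send one DNS query over UDP and report whether any reply arrives. This is a minimal sketch using only the standard library; the function names are hypothetical, and it only checks that a reply with the matching query id comes back, not that the answer is valid.

```python
import socket
import struct


def build_query(name: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query for an A record (RFC 1035 wire format)."""
    # Header: id, flags (recursion desired), 1 question, 0 other records.
    header = struct.pack(">HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A), QCLASS=1 (IN).
    return header + qname + struct.pack(">HH", 1, 1)


def resolver_answers(server: str, name: str = "google.com", timeout: float = 2.0) -> bool:
    """Return True if `server` replies to a UDP DNS query within `timeout`."""
    query = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server, 53))
            reply, _ = sock.recvfrom(512)
        except OSError:
            # Covers timeouts as well as "no route to host".
            return False
    # A reply echoes the query id in its first two bytes.
    return reply[:2] == query[:2]
```

Running `resolver_answers("172.31.0.2")` and `resolver_answers("8.8.8.8")` while connected to the VPN would show whether the VPC resolver is unreachable through the tunnel while the public one is reachable.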

AWS IoT closes mqtt connection after Handshake

I have an implementation that uses the AWS SDK to connect to AWS IoT. It works well on Linux.
I am trying to port it to a FreeRTOS-based embedded system.
The AWS SDK uses mbedtls through an SSL wrapper.
There is a small modification on the mbedtls side (I provide the time from SNTP to mbedtls).
When I enable mbedtls debug output, I see that everything is fine and the handshake completes. But after the handshake I get a connection close message from the AWS SDK.
ssl_cli.c : 3303 - client state: MBEDTLS_SSL_FLUSH_BUFFERS (14)
ssl_cli.c : 3303 - client state: MBEDTLS_SSL_HANDSHAKE_WRAPUP (15)
ssl_tls.c : 5024 - <= handshake wrapup
ssl_tls.c : 6346 - <= handshake
ssl_tls.c : 2701 - => write record
ssl_tls.c : 1258 - => encrypt buf
ssl_tls.c : 1400 - before encrypt: msglen = 125, including 0 bytes of padding
ssl_tls.c : 1560 - <= encrypt buf
ssl_tls.c : 2838 - output record: msgtype = 23, version = [3:3], msglen = 141
ssl_tls.c : 2416 - => flush output
ssl_tls.c : 2435 - message length: 146, out_left: 146
ssl_tls.c : 2441 - ssl->f_send() returned 146 (-0xffffff6e)
ssl_tls.c : 2460 - <= flush output
ssl_tls.c : 2850 - <= write record
ssl_tls.c : 6883 - <= write
ssl_tls.c : 6514 - => read
ssl_tls.c : 3728 - => read record
ssl_tls.c : 2208 - => fetch input
ssl_tls.c : 2366 - in_left: 0, nb_want: 5
ssl_tls.c : 2390 - in_left: 0, nb_want: 5
ssl_tls.c : 2391 - ssl->f_recv(_timeout)() returned 5 (-0xfffffffb)
ssl_tls.c : 2403 - <= fetch input
ssl_tls.c : 3488 - input record: msgtype = 21, version = [3:3], msglen = 26
ssl_tls.c : 2208 - => fetch input
ssl_tls.c : 2366 - in_left: 5, nb_want: 31
ssl_tls.c : 2390 - in_left: 5, nb_want: 31
ssl_tls.c : 2391 - ssl->f_recv(_timeout)() returned 26 (-0xffffffe6)
ssl_tls.c : 2403 - <= fetch input
ssl_tls.c : 1576 - => decrypt buf
ssl_tls.c : 2051 - <= decrypt buf
ssl_tls.c : 3961 - **got an alert message, type: [1:0]**
ssl_tls.c : 3976 - **is a close notify message**
From what I have read, "got an alert message, type: [1:0]" means AWS closes the connection, but why, and what does it mean?
I saw an "Application Data" entry in Wireshark, so I am probably getting the AWS close alert in the middle of an application data transaction.
I also saw a comment like "it means certificate is not permissive enough for AWS", but I am using the same certificates on both the Linux and embedded sides.
Any ideas? How can I debug this?
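For reading the debug output above: in a TLS alert the first number is the alert level and the second is the alert description, and the record `msgtype` values are the TLS content types. A minimal lookup sketch, with only a handful of the values defined by the TLS specification (the tables are deliberately not exhaustive and the function name is hypothetical):

```python
# TLS record content types (the "msgtype" values in the log).
RECORD_TYPES = {
    20: "change_cipher_spec",
    21: "alert",
    22: "handshake",
    23: "application_data",
}

ALERT_LEVELS = {1: "warning", 2: "fatal"}

# A few alert descriptions from the TLS specification; not exhaustive.
ALERT_DESCRIPTIONS = {
    0: "close_notify",
    10: "unexpected_message",
    20: "bad_record_mac",
    40: "handshake_failure",
    42: "bad_certificate",
    48: "unknown_ca",
}


def describe_alert(level: int, description: int) -> str:
    """Render an alert such as [1:0] in readable form."""
    level_name = ALERT_LEVELS.get(level, f"unknown level {level}")
    desc_name = ALERT_DESCRIPTIONS.get(description, f"unknown alert {description}")
    return f"{level_name}: {desc_name}"
```

So `[1:0]` is a warning-level close_notify, sent in a record of type 21 (alert) right after the client's type 23 (application data) record. The peer is closing the session cleanly rather than reporting a TLS error.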
I would recommend triple-checking your policy (and the certificates, and the links between them and your thing).
I had the same issue and the solution was to change the policy from:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": "iot :*",
"Resource": "*"
}
]
}
to:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": "iot:*",
"Resource": "*"
}
]
}
i.e., the space character has been removed from the "Action" string.
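A stray space like this is easy to miss by eye. A minimal sketch of a check that lists every "Action" value containing whitespace in a policy document (the function name is hypothetical, and it only inspects the "Action" field of each statement):

```python
import json


def actions_with_whitespace(policy_json: str):
    """Return the Action strings in a policy that contain whitespace."""
    policy = json.loads(policy_json)
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):
        statements = [statements]
    bad = []
    for statement in statements:
        actions = statement.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        bad.extend(a for a in actions if any(ch.isspace() for ch in a))
    return bad
```

For the first policy above this returns `["iot :*"]`; for the corrected one it returns an empty list.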

Downsizing/Scaling-in MySQL cluster nodes

I am trying to set up MySQL Cluster within an AWS Auto Scaling group. I am starting out with two EC2 instances, each with its own management (ndb_mgmd), data (ndbmtd) and SQL (mysqld) node. When scaling out (I have configured live scale-out, which works fine), it adds two more EC2 instances (because the number of replicas is set to 2 for ndbd) and creates a new node group.
Since I can't control exactly which instances AWS shuts down during a scale-in event, it always takes out a whole node group, rendering the cluster invalid and causing it to crash.
From what I can see, MySQL Cluster is not really designed to scale in online, but is there a way I can achieve this without bringing the whole system down for maintenance? The idea is to add new identical instances to the cluster during scale-out and take instances off during scale-in events fired by the AWS Auto Scaling group.
Let me know if I missed any details, cheers!
This is what the initial config looks like:
Cluster Configuration
---------------------
[ndbd(NDB)] 2 node(s)
id=1 @10.0.0.149 (mysql-5.6.31 ndb-7.4.12, Nodegroup: 0, *)
id=2 @10.0.0.81 (mysql-5.6.31 ndb-7.4.12, Nodegroup: 0)
[ndb_mgmd(MGM)] 2 node(s)
id=101 @10.0.0.149 (mysql-5.6.31 ndb-7.4.12)
id=102 @10.0.0.81 (mysql-5.6.31 ndb-7.4.12)
[mysqld(API)] 2 node(s)
id=51 @10.0.0.149 (mysql-5.6.31 ndb-7.4.12)
id=52 @10.0.0.81 (mysql-5.6.31 ndb-7.4.12)
This is an example of the scaled-out version of the same cluster (+2 instances):
Cluster Configuration
---------------------
[ndbd(NDB)] 4 node(s)
id=1 @10.0.0.149 (mysql-5.6.31 ndb-7.4.12, Nodegroup: 0, *)
id=2 @10.0.0.81 (mysql-5.6.31 ndb-7.4.12, Nodegroup: 0)
id=3 @10.0.0.151 (mysql-5.6.31 ndb-7.4.12, Nodegroup: 1)
id=4 @10.0.0.83 (mysql-5.6.31 ndb-7.4.12, Nodegroup: 1)
[ndb_mgmd(MGM)] 4 node(s)
id=101 @10.0.0.149 (mysql-5.6.31 ndb-7.4.12)
id=102 @10.0.0.81 (mysql-5.6.31 ndb-7.4.12)
id=103 @10.0.0.151 (mysql-5.6.31 ndb-7.4.12)
id=104 @10.0.0.83 (mysql-5.6.31 ndb-7.4.12)
[mysqld(API)] 4 node(s)
id=51 @10.0.0.149 (mysql-5.6.31 ndb-7.4.12)
id=52 @10.0.0.81 (mysql-5.6.31 ndb-7.4.12)
id=53 @10.0.0.151 (mysql-5.6.31 ndb-7.4.12)
id=54 @10.0.0.83 (mysql-5.6.31 ndb-7.4.12)
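The failure mode described above (a scale-in event removing every data node of one node group) can be checked before instances are terminated. A minimal sketch, using the data nodes from the scaled-out listing; the function name is hypothetical, and how the list of hosts about to be terminated is obtained from the Auto Scaling group is left out:

```python
from collections import defaultdict

# Data nodes from the scaled-out listing above: node id -> (host, node group).
DATA_NODES = {
    1: ("10.0.0.149", 0),
    2: ("10.0.0.81", 0),
    3: ("10.0.0.151", 1),
    4: ("10.0.0.83", 1),
}


def lost_node_groups(data_nodes, terminated_hosts):
    """Return the node groups that would have no surviving data node."""
    survivors = defaultdict(int)
    groups = set()
    for host, group in data_nodes.values():
        groups.add(group)
        if host not in terminated_hosts:
            survivors[group] += 1
    return sorted(g for g in groups if survivors[g] == 0)
```

Terminating 10.0.0.151 and 10.0.0.83 together loses node group 1 entirely, whereas terminating 10.0.0.149 and 10.0.0.151 leaves one data node in each group. A check like this could be used to reject or reorder a proposed scale-in.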

connection issue with hazelcast on amazon AWS

I am using Hazelcast v3.6 on two Amazon AWS virtual machines (not using the AWS-specific settings for Hazelcast). The connection is supposed to work via TCP/IP connection settings (not multicast). I have opened ports 5701-5801 for connections on the virtual machines.
I have tried iperf on the two virtual machines, and I can see that the client on one VM connects to the server on the other VM (and vice versa when I swap the client and server roles for iperf).
When I launch two Hazelcast servers on different VMs, the connection is not established. The log statements and the hazelcast.xml config are given below (I am not using the programmatic settings for Hazelcast). I have changed the IP addresses below:
20160401-16:41:02.812 [cached2] InitConnectionTask INFO - [45.46.47.48]:5701 [dev] [3.6] Connecting to /22.23.24.25:5701, timeout: 0, bind-any: true
20160401-16:41:02.812 [cached3] InitConnectionTask INFO - [45.46.47.48]:5701 [dev] [3.6] Connecting to /22.23.24.25:5703, timeout: 0, bind-any: true
20160401-16:41:02.813 [cached1] InitConnectionTask INFO - [45.46.47.48]:5701 [dev] [3.6] Connecting to /22.23.24.25:5702, timeout: 0, bind-any: true
20160401-16:41:02.816 [cached1] InitConnectionTask INFO - [45.46.47.48]:5701 [dev] [3.6] Could not connect to: /22.23.24.25:5702. Reason: SocketException[Connection refused to address /22.23.24.25:5702]
20160401-16:41:02.816 [cached1] TcpIpJoiner INFO - [45.46.47.48]:5701 [dev] [3.6] Address[22.23.24.25]:5702 is added to the blacklist.
20160401-16:41:02.817 [cached3] InitConnectionTask INFO - [45.46.47.48]:5701 [dev] [3.6] Could not connect to: /22.23.24.25:5703. Reason: SocketException[Connection refused to address /22.23.24.25:5703]
20160401-16:41:02.817 [cached3] TcpIpJoiner INFO - [45.46.47.48]:5701 [dev] [3.6] Address[22.23.24.25]:5703 is added to the blacklist.
20160401-16:41:02.834 [cached2] TcpIpConnectionManager INFO - [45.46.47.48]:5701 [dev] [3.6] Established socket connection between /45.46.47.48:51965 and /22.23.24.25:5701
20160401-16:41:02.849 [hz._hzInstance_1_dev.IO.thread-in-0] TcpIpConnection INFO - [45.46.47.48]:5701 [dev] [3.6] Connection [Address[22.23.24.25]:5701] lost. Reason: java.io.EOFException[Remote socket closed!]
20160401-16:41:02.851 [hz._hzInstance_1_dev.IO.thread-in-0] NonBlockingSocketReader WARN - [45.46.47.48]:5701 [dev] [3.6] hz._hzInstance_1_dev.IO.thread-in-0 Closing socket to endpoint Address[54.89.161.228]:5701, Cause:java.io.EOFException: Remote socket closed!
20160401-16:41:03.692 [cached2] InitConnectionTask INFO - [45.46.47.48]:5701 [dev] [3.6] Connecting to /22.23.24.25:5701, timeout: 0, bind-any: true
20160401-16:41:03.693 [cached2] TcpIpConnectionManager INFO - [45.46.47.48]:5701 [dev] [3.6] Established socket connection between /45.46.47.48:60733 and /22.23.24.25:5701
20160401-16:41:03.696 [hz._hzInstance_1_dev.IO.thread-in-1] TcpIpConnection INFO - [45.46.47.48]:5701 [dev] [3.6] Connection [Address[22.23.24.25]:5701] lost. Reason: java.io.EOFException[Remote socket closed!]
Part of Hazelcast config
<?xml version="1.0" encoding="UTF-8"?>
<hazelcast xsi:schemaLocation="http://www.hazelcast.com/schema/config hazelcast-config-3.6.xsd"
xmlns="http://www.hazelcast.com/schema/config"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<group>
<name>abc</name>
<password>defg</password>
</group>
<network>
<port auto-increment="true" port-count="100">5701</port>
<outbound-ports>
<ports>0-5900</ports>
</outbound-ports>
<join>
<multicast enabled="false">
<!--<multicast-group>224.2.2.3</multicast-group>
<multicast-port>54327</multicast-port>-->
</multicast>
<tcp-ip enabled="true">
<member>22.23.24.25</member>
</tcp-ip>
</join>
<interfaces enabled="true">
<interface>45.46.47.48</interface>
</interfaces>
<ssl enabled="false" />
<socket-interceptor enabled="false" />
<symmetric-encryption enabled="false">
<algorithm>PBEWithMD5AndDES</algorithm>
<!-- salt value to use when generating the secret key -->
<salt>thesalt</salt>
<!-- pass phrase to use when generating the secret key -->
<password>thepass</password>
<!-- iteration count to use when generating the secret key -->
<iteration-count>19</iteration-count>
</symmetric-encryption>
</network>
<partition-group enabled="false"/>
iperf server and client log statements
Server listening on TCP port 5701
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
------------------------------------------------------------
Client connecting to 22.23.24.25, TCP port 5701
TCP window size: 1.33 MByte (default)
------------------------------------------------------------
[ 5] local 172.31.17.104 port 57398 connected with 22.23.24.25 port 5701
[ 4] local 172.31.17.104 port 5701 connected with 22.23.24.25 port 55589
[ ID] Interval Transfer Bandwidth
[ 5] 0.0-10.0 sec 662 MBytes 555 Mbits/sec
[ 4] 0.0-10.0 sec 797 MBytes 666 Mbits/sec
Server listening on TCP port 5701
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[ 4] local xxx.xx.xxx.xx port 5701 connected with 22.23.24.25 port 57398
------------------------------------------------------------
Client connecting to 22.23.24.25, TCP port 5701
TCP window size: 1.62 MByte (default)
------------------------------------------------------------
[ 6] local 172.31.17.23 port 55589 connected with 22.23.24.25 port 5701
[ ID] Interval Transfer Bandwidth
[ 6] 0.0-10.0 sec 797 MBytes 669 Mbits/sec
[ 4] 0.0-10.0 sec 662 MBytes 553 Mbits/sec
Note:
I forgot to mention that I can connect from a Hazelcast client to a server, i.e. when I use a Hazelcast client to connect to a single Hazelcast server node, I am able to connect just fine.
An outbound port range that includes 0 is interpreted by Hazelcast as "use ephemeral ports", so the <outbound-ports> element actually has no effect in your configuration. There is an associated test in the Hazelcast sources: https://github.com/hazelcast/hazelcast/blob/75251c4f01d131a9624fc3d0c4190de5cdf7d93a/hazelcast/src/test/java/com/hazelcast/nio/NodeIOServiceTest.java#L60
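The rule described in this answer can be modeled in a few lines to check a range before putting it into hazelcast.xml. This is only a sketch of the behavior as stated above, not Hazelcast's own parsing code; both function names are hypothetical, and only a single port or one "low-high" range is handled:

```python
def parse_port_range(spec: str):
    """Parse "low-high" or a single port into an inclusive (low, high) pair."""
    low, _, high = spec.strip().partition("-")
    low_port = int(low)
    high_port = int(high) if high else low_port
    return low_port, high_port


def falls_back_to_ephemeral(spec: str) -> bool:
    """True if the range contains port 0, which per the answer above disables it."""
    low, high = parse_port_range(spec)
    return low <= 0 <= high
```

With the configuration from the question, `falls_back_to_ephemeral("0-5900")` is true, so the element would have to start at a non-zero port to restrict outbound ports at all.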

WSO2 BAM No space left on device

Hi, I'm running BAM on a server for testing. This server has the following filesystem layout:
root@serv:/# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/VG-LV_root_ 3.8G 2.3G 1.4G 62% /
udev 3.0G 4.0K 3.0G 1% /dev
tmpfs 1.2G 224K 1.2G 1% /run
none 5.0M 0 5.0M 0% /run/lock
none 3.0G 0 3.0G 0% /run/shm
/dev/mapper/VG-LV_var 7.3G 4.5G 2.5G 65% /var
I put BAM in /var, which has 7.3 GB of space.
As you can see, I have enough space to run the WSO2 products, except for BAM :-D
After running BAM for 3 days I'm facing problems with disk space.
In /tmp, hadoop takes up 1 GB:
root@serv:/# du -sh /tmp/*
1023M /tmp/hadoop
4.0K /tmp/hadoop-root
4.0K /tmp/hsperfdata_root
4.0K /tmp/mc-root
117M /tmp/root
And inside the BAM directory, tmp uses 2.9 GB:
root@serv:/# du -sh /var/BAM/wso2bam-2.0.1/*
160K /var/BAM/wso2bam-2.0.1/bin
236K /var/BAM/wso2bam-2.0.1/dbscripts
8.0K /var/BAM/wso2bam-2.0.1/INSTALL.txt
5.0M /var/BAM/wso2bam-2.0.1/lib
52K /var/BAM/wso2bam-2.0.1/LICENSE.txt
12K /var/BAM/wso2bam-2.0.1/README.txt
8.0K /var/BAM/wso2bam-2.0.1/release-notes.html
540M /var/BAM/wso2bam-2.0.1/repository
80K /var/BAM/wso2bam-2.0.1/resources
14M /var/BAM/wso2bam-2.0.1/samples
2.9G /var/BAM/wso2bam-2.0.1/tmp
88K /var/BAM/wso2bam-2.0.1/webapp-mode
4.0K /var/BAM/wso2bam-2.0.1/wso2carbon.pid
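To watch which directories grow between runs, the `du -sh` listings above can be approximated from a script. A minimal sketch using only the standard library; the function names are hypothetical, and it sums apparent file sizes, so the numbers can differ somewhat from `du`, which counts disk blocks:

```python
import os


def dir_size(path: str) -> int:
    """Total size in bytes of regular files under `path` (symlinks not followed)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def largest_children(path: str, top: int = 5):
    """Return the `top` largest entries directly under `path` as (bytes, name)."""
    sizes = []
    for entry in os.scandir(path):
        if entry.is_dir(follow_symlinks=False):
            sizes.append((dir_size(entry.path), entry.name))
        elif entry.is_file(follow_symlinks=False):
            sizes.append((entry.stat().st_size, entry.name))
    return sorted(sizes, reverse=True)[:top]
```

Calling `largest_children("/var/BAM/wso2bam-2.0.1")` once a day and comparing the results would show how quickly the tmp directory grows.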
My question: the disk still shows free space, yet in just 3 days BAM used about 4 GB. What disk size is advisable for a long-running deployment, and what can I do to avoid the errors below? (In a production environment I can provide much more disk space, but this is a concern for my clients.)
Note that during these 3 days I was monitoring only 2 servers, AS and ESB, under minimal load.
TID: [0] [BAM] [2012-11-06 12:00:00,026] INFO {org.wso2.carbon.analytics.hive.task.HiveScriptExecutorTask} - Running script executor task for script esb_stats_0. [Tue Nov 06 12:00:00 CST 2012] {org.wso2.carbon.analytics.hive.task.HiveScriptExecutorTask}
TID: [0] [BAM] [2012-11-06 12:00:00,026] INFO {org.wso2.carbon.analytics.hive.task.HiveScriptExecutorTask} - Running script executor task for script service_stats_848. [Tue Nov 06 12:00:00 CST 2012] {org.wso2.carbon.analytics.hive.task.HiveScriptExecutorTask}
TID: [0] [BAM] [2012-11-06 12:00:01,213] ERROR {org.apache.hadoop.hive.ql.exec.ExecDriver} - Exception: /var/BAM/wso2bam-2.0.1/tmp/root/hive_2012-11-06_12-00-00_487_706674934402805629/-local-10000/plan.xml (No space left on device) {org.apache.hadoop.hive.ql.exec.ExecDriver}
TID: [0] [BAM] [2012-11-06 12:00:01,213] ERROR {org.apache.hadoop.hive.ql.exec.ExecDriver} - Exception: /var/BAM/wso2bam-2.0.1/tmp/root/hive_2012-11-06_12-00-00_487_706674934402805629/-local-10000/plan.xml (No space left on device) {org.apache.hadoop.hive.ql.exec.ExecDriver}
TID: [0] [BAM] [2012-11-06 12:00:01,216] ERROR {org.apache.hadoop.hive.ql.Driver} - FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MapRedTask {org.apache.hadoop.hive.ql.Driver}
TID: [0] [BAM] [2012-11-06 12:00:01,216] ERROR {org.apache.hadoop.hive.ql.Driver} - FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MapRedTask {org.apache.hadoop.hive.ql.Driver}
TID: [0] [BAM] [2012-11-06 12:00:01,219] ERROR {org.apache.hadoop.hive.ql.exec.ExecDriver} - Exception: /var/BAM/wso2bam-2.0.1/tmp/root/hive_2012-11-06_12-00-00_500_8714455975079609718/-local-10000/plan.xml (No space left on device) {org.apache.hadoop.hive.ql.exec.ExecDriver}
TID: [0] [BAM] [2012-11-06 12:00:01,219] ERROR {org.apache.hadoop.hive.ql.exec.ExecDriver} - Exception: /var/BAM/wso2bam-2.0.1/tmp/root/hive_2012-11-06_12-00-00_500_8714455975079609718/-local-10000/plan.xml (No space left on device) {org.apache.hadoop.hive.ql.exec.ExecDriver}
TID: [0] [BAM] [2012-11-06 12:00:01,221] ERROR {org.apache.hadoop.hive.ql.Driver} - FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MapRedTask {org.apache.hadoop.hive.ql.Driver}
TID: [0] [BAM] [2012-11-06 12:00:01,221] ERROR {org.wso2.carbon.analytics.hive.impl.HiveExecutorServiceImpl} - Error while executing Hive script.
Query returned non-zero code: 9, cause: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MapRedTask {org.wso2.carbon.analytics.hive.impl.HiveExecutorServiceImpl}
java.sql.SQLException: Query returned non-zero code: 9, cause: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MapRedTask
at org.apache.hadoop.hive.jdbc.HiveStatement.executeQuery(HiveStatement.java:189)
at org.wso2.carbon.analytics.hive.impl.HiveExecutorServiceImpl$ScriptCallable.call(HiveExecutorServiceImpl.java:325)
at org.wso2.carbon.analytics.hive.impl.HiveExecutorServiceImpl$ScriptCallable.call(HiveExecutorServiceImpl.java:225)
at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
TID: [0] [BAM] [2012-11-06 12:00:01,221] ERROR {org.apache.hadoop.hive.ql.Driver} - FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MapRedTask {org.apache.hadoop.hive.ql.Driver}
TID: [0] [BAM] [2012-11-06 12:00:01,223] ERROR {org.wso2.carbon.analytics.hive.impl.HiveExecutorServiceImpl} - Error while executing Hive script.
Query returned non-zero code: 9, cause: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MapRedTask {org.wso2.carbon.analytics.hive.impl.HiveExecutorServiceImpl}
java.sql.SQLException: Query returned non-zero code: 9, cause: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MapRedTask
at org.apache.hadoop.hive.jdbc.HiveStatement.executeQuery(HiveStatement.java:189)
at org.wso2.carbon.analytics.hive.impl.HiveExecutorServiceImpl$ScriptCallable.call(HiveExecutorServiceImpl.java:325)
at org.wso2.carbon.analytics.hive.impl.HiveExecutorServiceImpl$ScriptCallable.call(HiveExecutorServiceImpl.java:225)
at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
TID: [0] [BAM] [2012-11-06 12:00:01,231] ERROR {org.wso2.carbon.analytics.hive.task.HiveScriptExecutorTask} - Error while executing script : esb_stats_0 {org.wso2.carbon.analytics.hive.task.HiveScriptExecutorTask}
org.wso2.carbon.analytics.hive.exception.HiveExecutionException: Error while executing Hive script.Query returned non-zero code: 9, cause: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MapRedTask
at org.wso2.carbon.analytics.hive.impl.HiveExecutorServiceImpl.execute(HiveExecutorServiceImpl.java:110)
at org.wso2.carbon.analytics.hive.task.HiveScriptExecutorTask.execute(HiveScriptExecutorTask.java:60)
at org.wso2.carbon.ntask.core.impl.TaskQuartzJobAdapter.execute(TaskQuartzJobAdapter.java:56)
at org.quartz.core.JobRunShell.run(JobRunShell.java:213)
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
TID: [0] [BAM] [2012-11-06 12:00:01,232] ERROR {org.wso2.carbon.analytics.hive.task.HiveScriptExecutorTask} - Error while executing script : service_stats_848 {org.wso2.carbon.analytics.hive.task.HiveScriptExecutorTask}
org.wso2.carbon.analytics.hive.exception.HiveExecutionException: Error while executing Hive script.Query returned non-zero code: 9, cause: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MapRedTask
at org.wso2.carbon.analytics.hive.impl.HiveExecutorServiceImpl.execute(HiveExecutorServiceImpl.java:110)
at org.wso2.carbon.analytics.hive.task.HiveScriptExecutorTask.execute(HiveScriptExecutorTask.java:60)
at org.wso2.carbon.ntask.core.impl.TaskQuartzJobAdapter.execute(TaskQuartzJobAdapter.java:56)
at org.quartz.core.JobRunShell.run(JobRunShell.java:213)
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
TID: [0] [BAM] [2012-11-06 12:00:01,743] INFO {org.wso2.carbon.core.services.util.CarbonAuthenticationUtil} - 'admin#carbon.super [-1234]' logged in at [2012-11-06 12:00:01,742-0500] {org.wso2.carbon.core.services.util.CarbonAuthenticationUtil}
TID: [0] [BAM] [2012-11-06 12:00:01,750] INFO {org.wso2.carbon.dashboard.gadgetrepopopulator.GadgetRepoPopulator} - Couldn't find a Dashboard at '/var/BAM/wso2bam-2.0.1/repository/resources/gadget-repo/gadget-repo.xml'. Giving up. {org.wso2.carbon.dashboard.gadgetrepopopulator.GadgetRepoPopulator}
TID: [0] [BAM] [2012-11-06 12:00:04,213] ERROR {org.apache.tiles.jsp.context.JspTilesRequestContext} - JSPException while including path '/hive-explorer/listscripts.jsp'. {org.apache.tiles.jsp.context.JspTilesRequestContext}
javax.servlet.jsp.JspException: ServletException while including page.
at org.apache.tiles.jsp.context.JspUtil.doInclude(JspUtil.java:102)
at org.apache.tiles.jsp.context.JspTilesRequestContext.include(JspTilesRequestContext.java:88)
at org.apache.tiles.jsp.context.JspTilesRequestContext.dispatch(JspTilesRequestContext.java:82)
at org.apache.tiles.impl.BasicTilesContainer.render(BasicTilesContainer.java:465)
at org.apache.tiles.jsp.taglib.InsertAttributeTag.render(InsertAttributeTag.java:140)
at org.apache.tiles.jsp.taglib.InsertAttributeTag.render(InsertAttributeTag.java:117)
at org.apache.tiles.jsp.taglib.RenderTagSupport.execute(RenderTagSupport.java:171)
at org.apache.tiles.jsp.taglib.RoleSecurityTagSupport.doEndTag(RoleSecurityTagSupport.java:75)
at org.apache.tiles.jsp.taglib.ContainerTagSupport.doEndTag(ContainerTagSupport.java:80)
at org.apache.jsp.admin.layout.template_jsp._jspx_meth_tiles_insertAttribute_7(template_jsp.java:539)
at org.apache.jsp.admin.layout.template_jsp._jspService(template_jsp.java:290)
at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:97)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:722)
at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:332)
at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:314)
at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:264)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:722)
at org.wso2.carbon.ui.JspServlet.service(JspServlet.java:161)
at org.wso2.carbon.ui.TilesJspServlet.service(TilesJspServlet.java:80)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:722)
at org.eclipse.equinox.http.helper.ContextPathServletAdaptor.service(ContextPathServletAdaptor.java:36)
at org.eclipse.equinox.http.servlet.internal.ServletRegistration.handleRequest(ServletRegistration.java:90)
at org.eclipse.equinox.http.servlet.internal.ProxyServlet.processAlias(ProxyServlet.java:111)
at org.eclipse.equinox.http.servlet.internal.ProxyServlet.service(ProxyServlet.java:67)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:722)
at org.wso2.carbon.tomcat.ext.servlet.DelegationServlet.service(DelegationServlet.java:68)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:305)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
at org.apache.catalina.core.ApplicationDispatcher.invoke(ApplicationDispatcher.java:690)
at org.apache.catalina.core.ApplicationDispatcher.processRequest(ApplicationDispatcher.java:477)
at org.apache.catalina.core.ApplicationDispatcher.doForward(ApplicationDispatcher.java:402)
at org.apache.catalina.core.ApplicationDispatcher.forward(ApplicationDispatcher.java:329)
at org.eclipse.equinox.http.servlet.internal.RequestDispatcherAdaptor.forward(RequestDispatcherAdaptor.java:30)
at org.eclipse.equinox.http.helper.ContextPathServletAdaptor$RequestDispatcherAdaptor.forward(ContextPathServletAdaptor.java:258)
at org.apache.tiles.servlet.context.ServletTilesRequestContext.forward(ServletTilesRequestContext.java:198)
at org.apache.tiles.servlet.context.ServletTilesRequestContext.dispatch(ServletTilesRequestContext.java:185)
at org.apache.tiles.impl.BasicTilesContainer.render(BasicTilesContainer.java:419)
at org.apache.tiles.impl.BasicTilesContainer.render(BasicTilesContainer.java:370)
at org.wso2.carbon.ui.action.ActionHelper.render(ActionHelper.java:52)
at org.wso2.carbon.ui.TilesJspServlet.service(TilesJspServlet.java:101)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:722)
at org.eclipse.equinox.http.helper.ContextPathServletAdaptor.service(ContextPathServletAdaptor.java:36)
at org.eclipse.equinox.http.servlet.internal.ServletRegistration.handleRequest(ServletRegistration.java:90)
at org.eclipse.equinox.http.servlet.internal.ProxyServlet.processAlias(ProxyServlet.java:111)
at org.eclipse.equinox.http.servlet.internal.ProxyServlet.service(ProxyServlet.java:67)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:722)
at org.wso2.carbon.tomcat.ext.servlet.DelegationServlet.service(DelegationServlet.java:68)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:305)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
at org.wso2.carbon.tomcat.ext.filter.CharacterSetFilter.doFilter(CharacterSetFilter.java:61)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:225)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:472)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
at org.wso2.carbon.tomcat.ext.valves.CompositeValve.invoke(CompositeValve.java:172)
at org.wso2.carbon.tomcat.ext.valves.CarbonStuckThreadDetectionValve.invoke(CarbonStuckThreadDetectionValve.java:156)
at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:927)
at org.wso2.carbon.tomcat.ext.valves.CarbonContextCreatorValve.invoke(CarbonContextCreatorValve.java:52)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1001)
at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:579)
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1653)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
You can clean up these tmp files. They are generated during Hadoop MapReduce job executions, and once a particular job is complete it is safe to delete them. So it should be safe to delete a temporary file if it hasn't been touched or accessed for the past 24 hours. You might use a cron job to do this. See [1].
[1] http://mail-archives.apache.org/mod_mbox/hadoop-general/201011.mbox/%3CABC24175AFD3BE4DA15F4CD375ED413D0639E23F45#hq-ex-mb02.ad.navteq.com%3E
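As a rough sketch of the cleanup described above, the script below walks a directory and deletes regular files whose last access and last modification are both older than 24 hours. The function name `remove_stale_files`, its parameters, and the idea of passing the directory as an argument are assumptions made for illustration; point it at wherever the Hadoop temporary files accumulate on your installation, and run it with `dry_run=True` first to see what would be removed.

```python
import os
import stat
import time


def remove_stale_files(root, max_age_seconds=24 * 60 * 60, now=None, dry_run=True):
    """Return (and optionally delete) files under root untouched for max_age_seconds.

    Hypothetical helper: a file counts as stale only if BOTH its access time
    and its modification time are older than the cutoff.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_seconds
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.lstat(path)  # lstat: do not follow symlinks
            except OSError:
                continue  # file vanished between listing and stat
            if not stat.S_ISREG(info.st_mode):
                continue  # skip symlinks, sockets, and other special files
            if max(info.st_atime, info.st_mtime) < cutoff:
                if not dry_run:
                    try:
                        os.remove(path)
                    except OSError:
                        continue  # still in use or permission denied; leave it
                stale.append(path)
    return stale
```

Scheduling this from cron once a day with `dry_run=False` would match the suggestion above. Note that on filesystems mounted with `noatime` the access time is not updated, so the check effectively falls back to the modification time.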
Thanks,
Kasun.