Hyperledger Fabric: Peer nodes fail to restart with byfn script when machine is shut down while network is running - amazon-web-services

I have a Hyperledger Fabric network running on a single AWS instance using the default byfn script.
ERROR: The orderer, CLI, and CA Docker containers show "Up" status; the peers show "Exited" status.
The error occurs when:
The BYFN network is running and the machine is rebooted (not under my control; it happens for some external reason).
The network is left running overnight without shutting the machine down; it shows the same status the next morning.
Error shown:
docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
b0523a7b1730 hyperledger/fabric-tools:latest "/bin/bash" 23 seconds ago Up 21 seconds cli
bfab227eb4df hyperledger/fabric-peer:latest "peer node start" 28 seconds ago Exited (2) 23 seconds ago peer1.org1.example.com
6fd7e818fab3 hyperledger/fabric-peer:latest "peer node start" 28 seconds ago Exited (2) 19 seconds ago peer1.org2.example.com
1287b6d93a23 hyperledger/fabric-peer:latest "peer node start" 28 seconds ago Exited (2) 22 seconds ago peer0.org2.example.com
2684fc905258 hyperledger/fabric-orderer:latest "orderer" 28 seconds ago Up 26 seconds 0.0.0.0:7050->7050/tcp orderer.example.com
93d33b51d352 hyperledger/fabric-peer:latest "peer node start" 28 seconds ago Exited (2) 25 seconds ago peer0.org1.example.com
Attaching docker log: https://hastebin.com/ahuyihubup.cs
Only the peers fail to start up.
Steps I have tried to solve the issue:
docker start $(docker ps -aq), or starting individual peers manually.
byfn down, generate, and then up again (see the sketch after this list). Shows the same result as above.
Rolled back to previous versions of the Fabric binaries. Same result on 1.1, 1.2, and 1.4. With the older binaries, the error does not recur if the network is left running overnight, but it does recur when the machine is restarted.
Used older Docker images, such as 1.1 and 1.2.
Tried starting up only one peer, the orderer, and the CLI.
Changed the network name and domain name.
Uninstalled Docker and docker-compose and reinstalled them.
Changed the port numbers of all nodes.
Tried restarting without mounting any volumes.
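For reference, a minimal sketch of that teardown/regenerate cycle, run from the fabric-samples first-network directory (older releases use the ./byfn.sh -m <command> form instead):
./byfn.sh down        # remove containers, chaincode images and generated artifacts
./byfn.sh generate    # regenerate crypto material and channel artifacts
./byfn.sh up          # bring the network back up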
The only thing that works is reformatting the AWS instance and reinstalling everything from scratch. Also, I am NOT using the AWS Blockchain Template.
Any help would be appreciated. I have been stuck on this issue for a month now.

The error was resolved by adding the following lines to peer-base.yaml:
GODEBUG=netdns=go
dns_search: .
Thanks to @gari-singh for the answer:
https://stackoverflow.com/a/49649678/5248781
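For context, a minimal sketch of where those two lines sit in the peer-base service definition; the surrounding keys are only illustrative of the stock byfn peer-base.yaml, not an exact copy:
services:
  peer-base:
    image: hyperledger/fabric-peer:latest
    environment:
      # existing CORE_* variables stay as they are; only GODEBUG is new
      - CORE_VM_ENDPOINT=unix:///host/var/run/docker.sock
      - GODEBUG=netdns=go   # force Go's built-in DNS resolver
    dns_search: .           # stop the container from inheriting the host's DNS search domains
    command: peer node start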

Related

Is Docker swarm a container-aware load balancer?

A 6-node Docker swarm (cluster): 3 managers, 3 workers.
After running the command below:
docker service create --name psight -p 8080:8080 --replicas 5 <image>
We see that mgr3 does not run any task (shown below):
$ docker service ps psight
ID NAME IMAGE NODE DESIRED_STATE CURRENT_STATE ERROR PORTS
yoj psight.1 image wrk2 Running Running 19 minutes ago
sjb psight.2 image wrk3 Running Running 19 minutes ago
vv6 psight.3 image mgr1 Running Running 19 minutes ago
scf psight.4 image mgr2 Running Running 19 minutes ago
7i2 psight.5 image wrk1 Running Running 19 minutes ago
But can the service be reached from mgr3, given the actual state shown above?
As long as mgr3 is reachable as a manager (ref. Monitor swarm health), it should be able to perform the usual tasks of a manager.
If your instances are exposed to the public internet with a public IP, with SSH open to the world (e.g. 0.0.0.0/0, ::/0), and you have your SSH key, then you should be able to connect to the instance.
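A quick way to confirm that mgr3 is healthy, following the "Monitor swarm health" reference above (node names are the ones from the question):
docker node ls                                                          # STATUS should be Ready; MANAGER STATUS Reachable or Leader
docker node inspect mgr3 --format '{{ .ManagerStatus.Reachability }}'   # expect "reachable"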

AWS EC2 instance randomly rebooting several times a day

I have a t2.nano instance that often reboots several times a day, as shown by the output of last reboot:
reboot system boot 3.13.0-74-generi Tue Sep 12 17:26 - 19:15 (01:49)
reboot system boot 3.13.0-74-generi Tue Sep 12 13:58 - 19:15 (05:17)
reboot system boot 3.13.0-74-generi Tue Sep 12 11:13 - 19:15 (08:02)
reboot system boot 3.13.0-74-generi Tue Sep 12 00:48 - 19:15 (18:27)
reboot system boot 3.13.0-74-generi Fri Sep 1 23:48 - 19:15 (10+19:27)
As you can see, the server was up and running for 10 days until it randomly rebooted. It then rebooted a total of 4 times over the next few hours.
There is nothing in /var/log/syslog at the time of the reboots. Initially the instance was running a web server, but the web server is not configured to start back up automatically after a reboot. So nothing is running on the server, yet the instance still reboots several more times.
What's going on here? Is it likely that I'm being hacked, or is there a problem with Amazon's servers?
The reboots look to be taking place at 19:15.
Do you have any scripts or cron jobs running that could be playing a part in it?
Also check the AWS service health dashboard:
https://status.aws.amazon.com/
Reboots should be expected, but no more frequently than you'd expect with commodity hardware.
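A few hedged checks for ruling out scheduled jobs and finding any trace of the reboots (log paths are the usual Ubuntu ones; adjust as needed):
crontab -l                                          # per-user cron jobs
ls /etc/cron.d /etc/cron.hourly /etc/cron.daily     # system-wide cron jobs
grep -i -e reboot -e shutdown /var/log/syslog*      # any logged shutdown/reboot activity
last -x reboot shutdown | head                      # recent reboot/shutdown records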

Generating a REST API in Hyperledger Composer error

I'm following the tutorial to generate a REST API with the Digital Land Title Network example, but I get the following error:
To restart the REST server using the same options, issue the following command:
composer-rest-server -p defaultProfile -n digitalproperty-network -i WebAppAdmin -s DJY27pEnl16d -N always
Discovering types from business network definition ...
Connection fails: Error: {"created":"#1494321356.313456565","description":"Failed parsing HTTP/2","file":"../src/core/ext/transport/chttp2/transport/chttp2_transport.c","file_line":2022,"grpc_status":14,"referenced_errors":[{"created":"#1494321356.313436962","description":"Expected SETTINGS frame as the first frame, got frame type 80","file":"../src/core/ext/transport/chttp2/transport/parsing.c","file_line":479}{"created":"#1494321356.313450563","description":"Trying to connect an http1.x server","file":"../src/core/ext/transport/chttp2/transport/chttp2_transport.c","file_line":1995,"http_status":400}]}
It will be retried for the next request.
{ Error: {"created":"#1494321356.313456565","description":"Failed parsing HTTP/2","file":"../src/core/ext/transport/chttp2/transport/chttp2_transport.c","file_line":2022,"grpc_status":14,"referenced_errors":[{"created":"#1494321356.313436962","description":"Expected SETTINGS frame as the first frame, got frame type 80","file":"../src/core/ext/transport/chttp2/transport/parsing.c","file_line":479}{"created":"#1494321356.313450563","description":"Trying to connect an http1.x server","file":"../src/core/ext/transport/chttp2/transport/chttp2_transport.c","file_line":1995,"http_status":400}]}
at /usr/lib/node_modules/composer-rest-server/node_modules/grpc/src/node/src/client.js:417:17 code: 14, metadata: Metadata { _internal_repr: {} } }
It happens too when deploying a new network definition... it seems that it can't communicate with Hyperledger Fabric. But Fabric is running:
calmadmin@localhost:~/composer-sample-applications-hlfv1/packages/getting-started$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
04d28aa6dcbc dev-peer1-digitalproperty-network-0.7.0 "chaincode -peer.addr" About an hour ago Up About an hour dev-peer1-digitalproperty-network-0.7.0
ebdaa8cb6e17 dev-peer0-digitalproperty-network-0.7.0 "chaincode -peer.addr" About an hour ago Up About an hour dev-peer0-digitalproperty-network-0.7.0
71d6fe2731a5 hyperledger/fabric-peer:x86_64-1.0.0-alpha "peer node start --pe" About an hour ago Up About an hour 0.0.0.0:7056->7051/tcp, 0.0.0.0:7058->7053/tcp peer1
24302fa77160 hyperledger/fabric-peer:x86_64-1.0.0-alpha "peer node start --pe" About an hour ago Up About an hour 0.0.0.0:7051->7051/tcp, 0.0.0.0:7053->7053/tcp peer0
fc0cb6a66977 hyperledger/fabric-ca:x86_64-1.0.0-alpha "sh -c 'fabric-ca-ser" About an hour ago Up About an hour 0.0.0.0:7054->7054/tcp ca_peerOrg1
0750ca58d06f hyperledger/fabric-orderer:x86_64-1.0.0-alpha "orderer" About an hour ago Up About an hour 0.0.0.0:7050->7050/tcp orderer0
Thanks!
You are running an HLF 1.0.0-alpha fabric, but when you started the REST server you specified defaultProfile, which is a profile for HLF v0.6. When you deployed the digitalproperty-network you specified the profile hlfv1 (which is created for you when you follow the quickstart guide), and that is the profile you need to use when you start the REST server.
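In other words, restart the REST server against the hlfv1 profile; a sketch reusing the same identity and options from the question:
composer-rest-server -p hlfv1 -n digitalproperty-network -i WebAppAdmin -s DJY27pEnl16d -N always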

Spark shuts down after 10 seconds of running

I'm trying to set up a cluster in my AWS account (Amazon). I followed this tutorial to set it up. I ran into some problems regarding ports, but I finally got it to work until... it shut down after 10 seconds, giving me no more than this error:
16/05/12 12:52:46 INFO client.AppClient$ClientActor: Connecting to master spark://ip-to-my-machine:7077...
16/05/12 12:53:06 INFO client.AppClient$ClientActor: Connecting to master spark://ip-to-my-machine:7077...
16/05/12 12:53:26 ERROR cluster.SparkDeploySchedulerBackend: Application has been killed. Reason: All masters are unresponsive! Giving up.
16/05/12 12:53:26 ERROR scheduler.TaskSchedulerImpl: Exiting due to error from cluster scheduler: All masters are unresponsive! Giving up.
This was the command I ran:
bin/spark-shell --master spark://ip-to-my-machine:7077
I opened TCP port 7077; what seems to be the problem?

Cntlmd not starting under systemd on CentOS 7.1

Had a weird error trying to start cntlmd on CentOS 7.1.
systemctl start cntlmd results in the following in the logs (and yes, the truncated "...and becoming" is exactly how it appears in the logs :)):
systemd: Started SYSV: Cntlm is meant to be given your proxy address and becoming
The weird things are:
It did run initially, right after installation.
The exact same config works perfectly on another machine (provisioned with Chef, so 100% the same config).
If I run it in the foreground it works, but through systemd it does not.
To "fix" it, I had to manually remove and reinstall it, whereupon it worked again.
Has anybody seen this error (Google reveals nothing) and does anyone know what's going on?
I realised that the /var/run/cntlm directory seemed to be "removed" after every boot. It turns out the /var/run/cntlm directory is never created by systemd-tmpfiles on boot (thanks to this SO answer), which then resulted in:
Feb 29 06:13:04 node01 cntlm: Using following NTLM hashes: NTLMv2(1) NT(0) LM(0)
Feb 29 06:13:04 node01 cntlm[10540]: Daemon ready
Feb 29 06:13:04 node01 cntlm[10540]: Changing uid:gid to 996:995 - Success
Feb 29 06:13:04 node01 cntlm[10540]: Error creating a new PID file
because cntlm couldn't write its PID file, since /var/run/cntlm didn't exist.
So, to get systemd-tmpfiles to create the /var/run/cntlm directory on boot, you need to add the following file at /usr/lib/tmpfiles.d/cntlm.conf:
d /run/cntlm 700 cntlm cntlm
Reboot and Bob's your uncle.
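If you would rather not reboot, the same rule can be applied immediately; a small sketch, assuming the cntlmd service name used above:
systemd-tmpfiles --create /usr/lib/tmpfiles.d/cntlm.conf   # create /run/cntlm now from the new rule
systemctl start cntlmd                                     # cntlm can now write its PID file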