Why is aws cli giving me a connection reset by peer? - amazon-web-services

I'm trying to use the AWS CLI (ecs-cli) to deploy a Docker container and I'm getting a connection reset. Not sure why?
ecs-cli compose --project-name blah service up --create-log-groups --cluster-config blah --timeout 20
WARN[0000] Skipping unsupported YAML option for service... option name=expose service name=app
WARN[0000] Ignoring the ip address while transforming it to task definition container=app portMapping="0.0.0.0:8080:8080"
WARN[0000] Ignoring the ip address while transforming it to task definition container=app portMapping="0.0.0.0:8080:8080"
INFO[0000] Using ECS task definition TaskDefinition="blah:2"
WARN[0000] No log groups to create; no containers use 'awslogs'
INFO[0001] Updated ECS service successfully desiredCount=1 force-deployment=false service=blah
INFO[0047] (service blah) has started 1 tasks: (task d4d52496-057a-4b24-878a-4cf654085eff). timestamp="2019-05-24 18:15:02 +0000 UTC"
INFO[0125] (service blah) has started 1 tasks: (task 9484000f-4a4d-4b2d-8fdb-a9668012d6ae). timestamp="2019-05-24 18:16:23 +0000 UTC"
INFO[0202] (service blah) has started 1 tasks: (task f13dea68-e996-40ce-a44f-be266219b001). timestamp="2019-05-24 18:17:32 +0000 UTC"
ERRO[0392] Error describing service error="RequestError: send request failed\ncaused by: Post https://ecs.us-west-2.amazonaws.com/: read tcp 192.168.1.231:53094->52.119.169.134:443: read: connection reset by peer" service=blah
FATA[0392] RequestError: send request failed
caused by: Post https://ecs.us-west-2.amazonaws.com/: read tcp 192.168.1.231:53094->52.119.169.134:443: read: connection reset by peer
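In case it helps narrow things down, these are the kinds of checks that could be run from the same machine to see whether the reset comes from the local network path or from the ECS endpoint (the endpoint and region are taken from the error above; none of this is a confirmed fix):
# Confirm the ECS endpoint is reachable over TLS from this host
curl -sv https://ecs.us-west-2.amazonaws.com/ -o /dev/null
# Re-run a simple ECS API call with verbose logging to see where the connection drops
aws ecs list-clusters --region us-west-2 --debug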

Related

AWS managed hyperledger fabric v1.4.7 blockchain - Getting bad certificate error when connecting to the fabric network

I have deployed an AWS managed Hyperledger Fabric v1.4.7 blockchain. The HLF blockchain network and the EC2 instance (hlf-client) are in the same VPC, and everything seems to be working fine since I am able to invoke transactions using the cli container.
My client-app uses the fabric-sdk-go gateway API to connect to the fabric network via the connection-profile.yaml to invoke/query the blockchain. The client-app runs in a Docker container on the same EC2 instance as the cli container and has all the necessary security configuration. The client-app is unable to connect to the fabric network due to a bad certificate error.
The error log on the client app is:
[fabsdk/util] 2021/11/02 09:55:17 UTC - lazyref.(*Reference).refreshValue -> WARN Error - initializer returned error: QueryBlockConfig failed: QueryBlockConfig failed: queryChaincode failed: Transaction processing for endorser [nd-cjfwwnimujabllevl6yitqqmxi.m-l3ascxxbincwrbtirbgpp4bp7u.n-rh3k6kahfnd6bgtxxgru7c3b5q.managedblockchain.ap-southeast-1.amazonaws.com:30003]: Endorser Client Status Code: (2) CONNECTION_FAILED. Description: dialing connection on target [nd-cjfwwnimujabllevl6yitqqmxi.m-l3ascxxbincwrbtirbgpp4bp7u.n-rh3k6kahfnd6bgtxxgru7c3b5q.managedblockchain.ap-southeast-1.amazonaws.com:30003]: connection is in TRANSIENT_FAILURE. Will retry again later
The corresponding peer log is:
[36m2021-11-02 10:07:17.789 UTC [grpc] handleRawConn -> DEBU 39501a[0m grpc: Server.Serve failed to complete security handshake from "10.0.2.131:39100": remote error: tls: bad certificate
[31m2021-11-02 10:10:17.809 UTC [core.comm] ServerHandshake -> ERRO 395322[0m TLS handshake failed with error remote error: tls: bad certificate server=PeerServer remoteaddress=10.0.2.131:12696
The same certificate files are used when invoking transactions from the cli container. Could anyone tell me what's wrong with my setup here, or am I missing some other configuration?
I have generated the ccp (connection-profile.yaml) as below:
---
name: n-RH3K6KAHFND6BGTXXGRU7C3B5Q
version: 1.0.0
client:
  organization: Org1
  connection:
    timeout:
      peer:
        endorser: "300"
channels:
  mychannel:
    peers:
      nd-CJFWWNIMUJABLLEVL6YITQQMXI:
        endorsingPeer: true
        chaincodeQuery: true
        ledgerQuery: true
        eventSource: true
organizations:
  Org1:
    mspid: m-L3ASCXXBINCWRBTIRBGPP4BP7U
    peers:
      - nd-CJFWWNIMUJABLLEVL6YITQQMXI
    certificateAuthorities:
      - m-L3ASCXXBINCWRBTIRBGPP4BP7U
peers:
  nd-CJFWWNIMUJABLLEVL6YITQQMXI:
    url: grpcs://nd-cjfwwnimujabllevl6yitqqmxi.m-l3ascxxbincwrbtirbgpp4bp7u.n-rh3k6kahfnd6bgtxxgru7c3b5q.managedblockchain.managedblockchain.us-east-1.amazonaws.com:30003
    eventUrl: grpcs://nd-cjfwwnimujabllevl6yitqqmxi.m-l3ascxxbincwrbtirbgpp4bp7u.n-rh3k6kahfnd6bgtxxgru7c3b5q.managedblockchain.managedblockchain.us-east-1.amazonaws.com:30004
    grpcOptions:
      ssl-target-name-override: nd-CJFWWNIMUJABLLEVL6YITQQMXI
    tlsCACerts:
      path: /home/ec2-user/managedblockchain-tls-chain.pem
certificateAuthorities:
  m-L3ASCXXBINCWRBTIRBGPP4BP7U:
    url: https://ca.m-l3ascxxbincwrbtirbgpp4bp7u.n-rh3k6kahfnd6bgtxxgru7c3b5q.managedblockchain.managedblockchain.us-east-1.amazonaws.com:30002
    httpOptions:
      verify: false
    tlsCACerts:
      path: /home/ec2-user/managedblockchain-tls-chain.pem
    caName: m-L3ASCXXBINCWRBTIRBGPP4BP7U
The following solution applies to:
HLF v1.4.7 on AWS Managed Blockchain
Fabric client: fabric-sdk-go v1.0.0 (Gateway programming model)
To resolve the issue, just remove the grpcOptions stanza from the connection profile.
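As a side check (not part of the fix above), you can verify from the EC2 instance that the peer endpoint is reachable and serves a certificate chaining to the Managed Blockchain TLS CA. The endpoint and PEM path below are taken from the question, and this only exercises the server side of the handshake, not the SDK configuration:
openssl s_client -connect nd-cjfwwnimujabllevl6yitqqmxi.m-l3ascxxbincwrbtirbgpp4bp7u.n-rh3k6kahfnd6bgtxxgru7c3b5q.managedblockchain.ap-southeast-1.amazonaws.com:30003 -CAfile /home/ec2-user/managedblockchain-tls-chain.pem </dev/null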

istio upgrade from 1.4.6 -> 1.5.0 throws istiod erros : remote error: tls: error decrypting message

I just upgraded Istio from 1.4.6 (Helm) to 1.5.0 (istioctl) [purged Istio and reinstalled with istioctl], but the istiod logs keep throwing the following:
2020-03-16T18:25:45.209055Z info grpc: Server.Serve failed to complete security handshake from "10.150.56.111:56870": remote error: tls: error decrypting message
2020-03-16T18:25:46.792447Z info grpc: Server.Serve failed to complete security handshake from "10.150.57.112:49162": remote error: tls: error decrypting message
2020-03-16T18:25:46.930483Z info grpc: Server.Serve failed to complete security handshake from "10.150.56.160:36878": remote error: tls: error decrypting message
2020-03-16T18:25:48.284122Z info grpc: Server.Serve failed to complete security handshake from "10.150.52.230:44758": remote error: tls: error decrypting message
2020-03-16T18:25:48.288180Z info grpc: Server.Serve failed to complete security handshake from "10.150.57.149:56756": remote error: tls: error decrypting message
2020-03-16T18:25:49.108515Z info grpc: Server.Serve failed to complete security handshake from "10.150.57.151:53970": remote error: tls: error decrypting message
2020-03-16T18:25:49.111874Z info Handling event update for pod contentgatewayaidest-7f4694d87-qmq8z in namespace djin-content -> 10.150.53.50
2020-03-16T18:25:49.519861Z info grpc: Server.Serve failed to complete security handshake from "10.150.57.91:59510": remote error: tls: error decrypting message
2020-03-16T18:25:50.133664Z info grpc: Server.Serve failed to complete security handshake from "10.150.57.203:59726": remote error: tls: error decrypting message
2020-03-16T18:25:50.331020Z info grpc: Server.Serve failed to complete security handshake from "10.150.57.195:59970": remote error: tls: error decrypting message
2020-03-16T18:25:52.110695Z info Handling event update for pod contentgateway-d74b44c7-dtdxs in namespace djin-content -> 10.150.56.215
2020-03-16T18:25:53.312761Z info Handling event update for pod dysonpriority-b6dbc589b-mk628 in namespace djin-content -> 10.150.52.91
2020-03-16T18:25:53.496524Z info grpc: Server.Serve failed to complete security handshake from "10.150.56.111:57276": remote error: tls: error decrypting message
This also means no sidecars launch successfully; they fail with:
2020-03-16T18:32:17.265394Z info Envoy proxy is NOT ready: config not received from Pilot (is Pilot running?): cds updates: 16 successful, 0 rejected; lds updates: 0 successful, 0 rejected
2020-03-16T18:32:19.269334Z info Envoy proxy is NOT ready: config not received from Pilot (is Pilot running?): cds updates: 16 successful, 0 rejected; lds updates: 0 successful, 0 rejected
2020-03-16T18:32:21.265214Z info Envoy proxy is NOT ready: config not received from Pilot (is Pilot running?): cds updates: 16 successful, 0 rejected; lds updates: 0 successful, 0 rejected
2020-03-16T18:32:23.266159Z info Envoy proxy is NOT ready: config not received from Pilot (is Pilot running?): cds updates: 16 successful, 0 rejected; lds updates: 0 successful,
Weirdly, other clusters that I upgraded went through fine. Any idea where this error might be coming from? istioctl analyze works fine.
The error goes away after killing (recreating) the nodes, but the istio-proxies still fail with:
info Envoy proxy is NOT ready: config not received from Pilot (is Pilot running?): cds updates: 1 successful, 0 rejected; lds updates: 0 successful, 0 rejected
As far as I know, since version 1.4.4 there is an istioctl upgrade command, which should be used when you want to upgrade Istio from 1.4.x to 1.5.0.
The istioctl upgrade command performs an upgrade of Istio. Before performing the upgrade, it checks that the Istio installation meets the upgrade eligibility criteria. Also, it alerts the user if it detects any changes in the profile default values between Istio versions.
The upgrade command can also perform a downgrade of Istio.
See the istioctl upgrade reference for all the options provided by the istioctl upgrade command.
istioctl upgrade --help
The upgrade command checks for upgrade version eligibility and, if eligible, upgrades the Istio control plane components in-place. Warning: traffic may be disrupted during upgrade. Please ensure PodDisruptionBudgets are defined to maintain service continuity.
I made a test on a GCP cluster with Istio 1.4.6 installed with istioctl, then used istioctl upgrade from version 1.5.0, and everything works fine.
kubectl get pods -n istio-system
NAME READY STATUS RESTARTS AGE
istio-ingressgateway-598796f4d9-lvzdb 1/1 Running 0 12m
istiod-7d9c7bdd6-mggx7 1/1 Running 0 12m
prometheus-b47d8c58c-7spq5 2/2 Running 0 12m
I checked the logs and ran some simple examples, and no errors like the ones in your example occur in istiod.
Upgrade prerequisites for istioctl upgrade
Ensure you meet these requirements before starting the upgrade process:
Istio version 1.4.4 or higher is installed.
Your Istio installation was installed using istioctl.
I assume that because of the differences between 1.4.x and 1.5.0 there might be issues when you mix the two installation methods, Helm and istioctl. The best option here would be to install Istio 1.4.6 with istioctl and then upgrade it to 1.5.0, as sketched below.
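A rough sketch of that flow with the versions from this thread (the download URL and istioctl subcommands are the standard ones; treat this as an outline, not a verified procedure for your cluster):
# Install Istio 1.4.6 with istioctl (default profile assumed; use your own IstioOperator/values file if you have one)
curl -L https://istio.io/downloadIstio | ISTIO_VERSION=1.4.6 sh -
./istio-1.4.6/bin/istioctl manifest apply
# Then upgrade in place with the 1.5.0 binary
curl -L https://istio.io/downloadIstio | ISTIO_VERSION=1.5.0 sh -
./istio-1.5.0/bin/istioctl upgrade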
I hope this answers your question. Let me know if you have any more questions.

CockroachDB on AWS EKS cluster - [n?] no stores bootstrapped

I am attempting to deploy CockroachDB v2.1.6 to a new AWS EKS cluster. Everything deploys successfully; the statefulset, services, PVs and PVCs are created. The AWS EBS volumes are created successfully too.
The issue is the pods never get to a READY state.
pod/cockroachdb-0 0/1 Running 0 14m
pod/cockroachdb-1 0/1 Running 0 14m
pod/cockroachdb-2 0/1 Running 0 14m
If I 'describe' the pods I get the following:
Normal Pulled 46s kubelet, ip-10-5-109-70.eu-central-1.compute.internal Container image "cockroachdb/cockroach:v2.1.6" already present on machine
Normal Created 46s kubelet, ip-10-5-109-70.eu-central-1.compute.internal Created container cockroachdb
Normal Started 46s kubelet, ip-10-5-109-70.eu-central-1.compute.internal Started container cockroachdb
Warning Unhealthy 1s (x8 over 36s) kubelet, ip-10-5-109-70.eu-central-1.compute.internal Readiness probe failed: HTTP probe failed with statuscode: 503
If I examine the logs of a pod I see this:
I200409 11:45:18.073666 14 server/server.go:1403 [n?] no stores bootstrapped and --join flag specified, awaiting init command.
W200409 11:45:18.076826 87 vendor/google.golang.org/grpc/clientconn.go:1293 grpc: addrConn.createTransport failed to connect to {cockroachdb-0.cockroachdb:26257 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp: lookup cockroachdb-0.cockroachdb on 172.20.0.10:53: no such host". Reconnecting...
W200409 11:45:18.076942 21 gossip/client.go:123 [n?] failed to start gossip client to cockroachdb-0.cockroachdb:26257: initial connection heartbeat failed: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp: lookup cockroachdb-0.cockroachdb on 172.20.0.10:53: no such host"
I came across this comment from the CockroachDB forum (https://forum.cockroachlabs.com/t/http-probe-failed-with-statuscode-503/2043/6)
Both the cockroach_out.log and cockroach_output1.log files you sent me (corresponding to mycockroach-cockroachdb-0 and mycockroach-cockroachdb-2) print out no stores bootstrapped during startup and prefix all their log lines with n?, indicating that they haven’t been allocated a node ID. I’d say that they may have never been properly initialized as part of the cluster.
I have deleted everything, including the PVs, PVCs and AWS EBS volumes, with kubectl delete and reapplied, but the issue persists.
Any thoughts would be very much appreciated. Thank you
I was not aware that you had to initialize the CockroachDB cluster after creating it. I did the following to resolve my issue:
kubectl exec -it cockroachdb-0 -n <namespace> -- /bin/sh
/cockroach/cockroach init
See here for more details - https://www.cockroachlabs.com/docs/v19.2/cockroach-init.html
After this the pods started running correctly.
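If the pods are in the default namespace, the same initialization can be run as a single command. A minimal sketch, assuming the cluster was started in insecure mode (the --insecure flag is my assumption, not part of the original steps):
# Run cockroach init once against the first pod of the statefulset
kubectl exec -it cockroachdb-0 -- /cockroach/cockroach init --insecure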

Multiple app instances on Cloud Foundry not shown in Netflix Hystrix dashboard

I have set up Netflix Eureka, Hystrix and Turbine on Cloud Foundry, split into two apps:
A monitoring app "mrc-service" includes Eureka Server, Turbine and Hystrix Dashboard. The application.yml for this app looks like this:
---
spring:
  profiles: cloud
eureka:
  instance:
    nonSecurePort: 80
    hostname: ${vcap.application.uris[0]}
    leaseRenewalIntervalInSeconds: 10
    metadataMap:
      instanceId: ${spring.application.name}:${vcap.application.instance_id:${spring.application.instance_id:${random.value}}}
  client:
    registerWithEureka: true
    fetchRegistry: true
    service-url:
      defaultZone: https://mrc-service.myurl/eureka/
turbine:
  aggregator:
    clusterConfig: LOG-TEST
  appConfig: log-test
The Hystrix-stream-producing app, called "log-test", has multiple instances on Cloud Foundry. The app is a Eureka client and exposes a Hystrix stream using Spring Actuator. Here is the application.yml for the app:
---
spring:
  profiles: cloud
eureka:
  instance:
    nonSecurePort: 80
    hostname: ${vcap.application.uris[0]}
    metadataMap:
      instanceId: ${spring.application.name}:${vcap.application.instance_id:${spring.application.instance_id:${random.value}}}
    secure-port-enabled: true
  client:
    healthcheck:
      enabled: true
    service-url:
      defaultZone: https://mrc-service.myurl/eureka/
The two instances of the log-test app register correctly with the Eureka server. But when I start to monitor the Turbine stream, the Hystrix dashboard shows only one host instead of two.
The Turbine log retrieves both instances correctly, but then says that only one host is up:
2017-08-23T10:12:10.764+02:00 [APP/PROC/WEB/0] [OUT] 2017-08-23 08:12:10.764 INFO 19 --- [ Timer-0] o.s.c.n.turbine.EurekaInstanceDiscovery : Fetching instances for app: log-test
2017-08-23T10:12:10.764+02:00 [APP/PROC/WEB/0] [OUT] 2017-08-23 08:12:10.764 INFO 19 --- [ Timer-0] o.s.c.n.turbine.EurekaInstanceDiscovery : Received instance list for app: log-test, size=2
2017-08-23T10:12:10.764+02:00 [APP/PROC/WEB/0] [OUT] 2017-08-23 08:12:10.763 INFO 19 --- [ Timer-0] o.s.c.n.t.CommonsInstanceDiscovery : Fetching instance list for apps: [log-test]
2017-08-23T10:12:10.764+02:00 [APP/PROC/WEB/0] [OUT] 2017-08-23 08:12:10.764 INFO 19 --- [ Timer-0] c.n.t.discovery.InstanceObservable : Retrieved hosts from InstanceDiscovery: 2
2017-08-23T10:12:10.765+02:00 [APP/PROC/WEB/0] [OUT] 2017-08-23 08:12:10.764 INFO 19 --- [ Timer-0] c.n.t.discovery.InstanceObservable : Found hosts that have been previously terminated: 0
2017-08-23T10:12:10.765+02:00 [APP/PROC/WEB/0] [OUT] 2017-08-23 08:12:10.764 DEBUG 19 --- [ Timer-0] c.n.t.discovery.InstanceObservable : Retrieved hosts from InstanceDiscovery: [StatsInstance [hostname=log-test.myurl:80, cluster: LOG-TEST, isUp: true, attrs={securePort=443, fusedHostPort=log-test.myurl:443, instanceId=log-test:97d83c44-8b9e-44c4-56b4-742cef7bada0, port=80}], StatsInstance [hostname=log-test.myurl:80, cluster: LOG-TEST, isUp: true, attrs={securePort=443, fusedHostPort=log-test.myurl:443, instanceId=log-test:3d8359e4-a5c1-4aa0-5109-5b49a77a1f6f, port=80}]]
2017-08-23T10:12:10.765+02:00 [APP/PROC/WEB/0] [OUT] 2017-08-23 08:12:10.764 INFO 19 --- [ Timer-0] c.n.t.discovery.InstanceObservable : Hosts up:1, hosts down: 0
So I wonder if Turbine actually aggregates the Hystrix streams of the two instances. Turbine would have to contact the instances individually, e.g. using Cloud Foundry specific header parameters like X-CF-APP-INSTANCE (a sketch of such a request is below). Not sure if this already happens.
Is the described approach even feasible on Cloud Foundry, or do I have to use Turbine Stream with RabbitMQ instead?
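For illustration, this is roughly how a single request could be pinned to one app instance with that header (the app GUID placeholder and the stream path are assumptions for illustration, not values from my setup):
# <app-guid> can be obtained with: cf app log-test --guid
curl -H "X-CF-APP-INSTANCE: <app-guid>:0" https://log-test.myurl/hystrix.stream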
I got an official reply from the Spring Cloud Netflix Issue tracker: aggregation of Hystrix data from multiple app instances on Cloud Foundry requires Turbine Stream in combination with a broker (e.g. RabbitMQ).
To open Turbine in aggregate mode, the steps are the same as for Hystrix, but you point the dashboard at the Turbine stream for the cluster: http://localhost:8989/turbine.stream?cluster=READ.
That opens the same screen as Hystrix, but if there are more services they appear in an aggregated view.

Not able to access HDFS

I installed the Cloudera VM and started trying some basic stuff. First I just wanted to ls the HDFS directories, so I issued the below command.
[cloudera@quickstart ~]$ hadoop fs -ls /
ls: Failed on local exception: java.net.SocketException: Network is unreachable; Host Details : local host is: "quickstart.cloudera/10.0.2.15"; destination host is: "quickstart.cloudera":8020;
Though ps -fu hdfs says both the namenode and datanode are running, I checked the status using the service command.
[cloudera@quickstart ~]$ sudo service hadoop-hdfs-namenode status
Hadoop namenode is not running [FAILED]
Thinking all the problems would be resolved if I restarted all the services, I executed the below command.
[cloudera@quickstart conf]$ sudo /home/cloudera/cloudera-manager --express --force
[QuickStart] Shutting down CDH services via init scripts...
[QuickStart] Disabling CDH services on boot...
[QuickStart] Starting Cloudera Manager daemons...
[QuickStart] Waiting for Cloudera Manager API...
[QuickStart] Configuring deployment...
Submitted jobs: 92
[QuickStart] Deploying client configuration...
Submitted jobs: 93
[QuickStart] Starting Cloudera Management Service...
Submitted jobs: 101
[QuickStart] Enabling Cloudera Manager daemons on boot...
Now I thought all services would be up, so I checked the status of the namenode service again. Again it came back failed.
[cloudera@quickstart ~]$ sudo service hadoop-hdfs-namenode status
Hadoop namenode is not running [FAILED]
Now I decided to manually stop and start the namenode service. Again not much use.
[cloudera@quickstart ~]$ sudo service hadoop-hdfs-namenode stop
no namenode to stop
Stopped Hadoop namenode: [ OK ]
[cloudera@quickstart ~]$ sudo service hadoop-hdfs-namenode status
Hadoop namenode is not running [FAILED]
[cloudera@quickstart ~]$ sudo service hadoop-hdfs-namenode start
starting namenode, logging to /var/log/hadoop-hdfs/hadoop-hdfs-namenode-quickstart.cloudera.out
Failed to start Hadoop namenode. Return value: 1 [FAILED]
I checked the file /var/log/hadoop-hdfs/hadoop-hdfs-namenode-quickstart.cloudera.out. It just says the below:
log4j:ERROR Could not find value for key log4j.appender.RFA
log4j:ERROR Could not instantiate appender named "RFA".
I also checked /var/log/hadoop-hdfs/hadoop-cmf-hdfs-NAMENODE-quickstart.cloudera.log.out and found the below when I searched for errors. Can anyone please suggest the best way to get the services back on track? Unfortunately I am not able to access Cloudera Manager from the browser. Is there anything I can do from the command line?
2016-02-24 21:02:48,105 WARN com.cloudera.cmf.event.publish.EventStorePublisherWithRetry: Failed to publish event: SimpleEvent{attributes={ROLE_TYPE=[NAMENODE], CATEGORY=[LOG_MESSAGE], ROLE=[hdfs-NAMENODE], SEVERITY=[IMPORTANT], SERVICE=[hdfs], HOST_IDS=[quickstart.cloudera], SERVICE_TYPE=[HDFS], LOG_LEVEL=[WARN], HOSTS=[quickstart.cloudera], EVENTCODE=[EV_LOG_EVENT]}, content=Only one image storage directory (dfs.namenode.name.dir) configured. Beware of data loss due to lack of redundant storage directories!, timestamp=1456295437905} - 1 of 17 failure(s) in last 79302s
java.io.IOException: Error connecting to quickstart.cloudera/10.0.2.15:7184
at com.cloudera.cmf.event.shaded.org.apache.avro.ipc.NettyTransceiver.getChannel(NettyTransceiver.java:249)
at com.cloudera.cmf.event.shaded.org.apache.avro.ipc.NettyTransceiver.<init>(NettyTransceiver.java:198)
at com.cloudera.cmf.event.shaded.org.apache.avro.ipc.NettyTransceiver.<init>(NettyTransceiver.java:133)
at com.cloudera.cmf.event.publish.AvroEventStorePublishProxy.checkSpecificRequestor(AvroEventStorePublishProxy.java:122)
at com.cloudera.cmf.event.publish.AvroEventStorePublishProxy.publishEvent(AvroEventStorePublishProxy.java:196)
at com.cloudera.cmf.event.publish.EventStorePublisherWithRetry$PublishEventTask.run(EventStorePublisherWithRetry.java:242)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.SocketException: Network is unreachable
You can try this:
Check which process is using port 7184 (the port in the connection error above), e.g. with the netstat Linux command, kill it, and then restart the namenode (see the sketch below).
Or
Change your namenode port in the configuration and restart Hadoop.
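A rough sketch of the first option (the port comes from the connection error above; the PID is of course a placeholder):
# See which process is bound to port 7184, kill it, then restart the namenode service
sudo netstat -tulpn | grep 7184
sudo kill <pid>
sudo service hadoop-hdfs-namenode restart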