WSO2 MB Cluster Giving Connection reset by peer - wso2

Test cluster of two brokers, WKA membership scheme, PostgreSQL message store, working fine for a couple of days, then throwing following errors:
TID: [] [] [2016-07-19 12:09:24,738] ERROR {org.wso2.andes.server.protocol.MultiVersionProtocolEngine} - Error establishing session {org.wso2.andes.server.protocol.MultiVersionProtocolEngine}
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:197)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
at org.apache.mina.transport.socket.nio.SocketIoProcessor.read(SocketIoProcessor.java:218)
at org.apache.mina.transport.socket.nio.SocketIoProcessor.process(SocketIoProcessor.java:198)
at org.apache.mina.transport.socket.nio.SocketIoProcessor.access$400(SocketIoProcessor.java:45)
at org.apache.mina.transport.socket.nio.SocketIoProcessor$Worker.run(SocketIoProcessor.java:485)
at org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:51)
at java.lang.Thread.run(Thread.java:745)
Startup of Message Broker looks fine, no errors, JDBC connection to PostgreSQL DB is ok, Registry mount looks ok. Then after that error appears in wso2carbon.log several times/minute.
Anyone any ideas? As far as I know nothing's changed and I don't know what it's trying to connect to.

This usually happens when client's whom connected to MB tries to create connections per message. jms is heavy connection and not recommended to create connections per each message. Therefore, please go through client implementation and verify connections are not created per message.
If by any chance you are using wso2 esb to publish/subscribe queues/topics to mb there is a property "transport.jms.CacheLevel" connection caching in esb axis2.xml.Read the documentation and use appropriate caching level for your usecase.
There was bug in connection caching property to be ignored in esb 4.8.1 which is currently fixed in 4.9.0 as well.
These are the possible cases I can think of with the given information. If you need more info please provide a detailed usecase.

Related

Target Connection is Stale ERROR in WSO2 EI 6.4.0

I'm trying to push incoming JSON payload to AWS SQS Queue in WSO2 EI 6.4.0. Facing Exception like java.io.IOException: Target Connection is stale intermittently.
We're unable to push payload to Queue.
Log:
[2022-08-27 03:08:49,801] [-1] [] [HTTPS-Sender I/O dispatcher-5] WARN {org.apache.synapse.transport.passthru.Targe
tHandler} - Connection closed by target host while sending the request Remote Address : proxy.abc.com/
10.0.x.x:3090
[2022-08-27 03:08:49,801] [-1234] [] [PassThroughMessageProcessor-29] ERROR {org.apache.synapse.transport.passthru.P
assThroughHttpSSLSender} - IO while building message
java.io.IOException: Target Connection is stale..
As per this wso2 link, do i need to disable this http.connection.stalecheck by making value as 1 in <ESB_Home>/repository/conf/nhttp.properties file?
Please suggest to resolve this issue.
The error java.io.IOException: Target Connection is stale is not the cause of your issue, it's just a by product of the actual issue. With the information provided I believe the actual issue is the following.
[2022-08-27 03:08:49,801] [-1] [] [HTTPS-Sender I/O dispatcher-5] WARN {org.apache.synapse.transport.passthru.Targe
tHandler} - Connection closed by target host while sending the request Remote Address : proxy.abc.com/
10.0.x.x:3090
Since the connection was closed by the target(Assume this is SQS) the collections are getting staled after some time. This is expected, so try to find out why the target is closing the connection. If you are going through a corporate Proxy, check the proxy first, then check SQS side to see whether there is any information useful to debug the issue.

Redislabs UI logging error when number of nodes more than one

I am new at Redis Enterprise and can't fix this problem:
I have a Redis Enterprise cluster (v.6.0) in AWS with two nodes. When I have only one node I can enter UI, but after adding other (second) nodes always throws me out to the login page after entering credentials. Meanwhile, the cluster works fine (information is taken from rladmin).
In what direction I should investigate the issue?
P.S.: Can this error from logs cause an issue?
ERROR redis_mgr MainThread: Connect failed: connect: connection failed: Error 2 connecting to unix socket: /var/opt/redislabs/run/ccs.sock. No such file or directory.: retrying
Possibly, this solution will help anybody:
the reason was that ALB before UI didn't use sticky sessions.
the solution was to enable a sticky session and it works.

Agent Unreachable after restore VMware Horizon Connection Server

After a power outage I got both of my View Connection Servers unbootable with BSOD and I could not recovery it and also I don't have backup of it.
After all steps below I could not get things fixed, all VMs are "Agent Unreachable"
Created a new VM for connection server (WITH THE SAME NAME AS THE OLD "VIEWCS01")
Before doing it correctly I connected the base disk, and not the last
snapshot, to the new VM and broke up the whole thing with the error
"file system specific implementation of ioctl [file] failed", I solved
this correcting the CID - https://kb.vmware.com/s/article/1007969
Installed Windows (Same version)
As in https://kb.vmware.com/s/article/76770:
Installed Connection Server (Same Version)
Restored LDF backup
Removed all View Connection Servers and all Security Servers (vdmadmin -S -s -r viewcs01, viewss01 and viewcs02)
Uninstalled the connection server
Reinstalled the connection server reusing the AD LDS
I did removed the "viewcs01" because with my previous tests I was not removing it, I think because of >this, after the recovery steps done no console was opening, In previous tests I also not using the old >machine name, instead of it I was using "viewcs03".
Ok, Console opened, I changed the vCenter credentials (Just putting password was not working with error - https://kb.vmware.com/s/article/60152 - Log below:
2020-12-31T18:23:48.108-02:00 ERROR (18F8-0D0C) <MessageFrameWorkDispatch> [ws_java_bridgeDLL] BCryptDecrypt FAILED, status={Data Error}
An error in reading or writing data occurred. (0xC000003E)
2020-12-31T18:23:48.109-02:00 ERROR (18F8-0C54) <VCHealthUpdate> [SecurityManagerUtil] decryptAsText: com.vmware.vdi.crypto.SecurityManagerException: decrypt: Cannot decrypt: Cipher scheme decryption failed.
2020-12-31T18:23:48.109-02:00 DEBUG (18F8-0C54) <VCHealthUpdate> [ServiceConnection25] Connecting instance VCHealth Test instance at URL https://vcenterd.DOMAIN.net:443/sdk
Corrected Composer credentials, and added license.
All machines are "Agent Unreachable" - Connection Server Log below:
2020-12-31T18:23:49.160-02:00 DEBUG (18F8-1A6C) <DesktopControlJMS> [DesktopTracker] CHANGEKEY message from agent/bda3fbe6-029c-41f8-b9f8-017af574f56b accepted as key and thumbprints match machine record
2020-12-31T18:23:49.162-02:00 DEBUG (18F8-1A6C) <DesktopControlJMS> [DesktopTracker] found broker thumbprints: 0f:9e:80:5d:f6:33:c7:1b:a2:d5:8c:9a:9f:12:45:16:0f:6f:c0:2b:46:8d:d0:33:62:87:53:a9:48:8d:57:8c#SHA_256;51:c5:d0:44:02:7f:ca:6d:5a:ad:5b:f6:8d:f5:11:23:e8:aa:e1:91:d0:5c:ff:71:3b:fb:e2:4b:f4:12:5e:d5#SHA_256
2020-12-31T18:23:49.162-02:00 WARN (18F8-1A6C) <DesktopControlJMS> [JMSMessageSecurity] Failed to sign message: Cannot sign message
2020-12-31T18:23:49.162-02:00 DEBUG (18F8-1A6C) <DesktopControlJMS> [DesktopTracker] CHANGEKEY message from agent/bda3fbe6-029c-41f8-b9f8-017af574f56b result: true (success)
Excerpt from VM agent log:
2020-12-31T19:53:44.322-03:00 DEBUG (1EDC-0FA8) <Thread-4> [AgentJmsConfig] Using paired signing key
2020-12-31T19:53:44.322-03:00 DEBUG (1EDC-0FA8) <Thread-4> [AgentMessageSecurityHandler] Configuring message security (ENHANCED).
2020-12-31T19:53:44.369-03:00 DEBUG (1EDC-0FA8) <Thread-4> [BrokerUpdateUtility] Published CHANGEKEY request
2020-12-31T19:53:59.386-03:00 DEBUG (1EDC-0FA8) <Thread-4> [BrokerUpdateUtility] Timeout waiting for success response
2020-12-31T19:59:33.944-03:00 DEBUG (1430-2558) <Thread-4> [JmsManager] Using connection broker viewcs01.DOMAIN.net
2020-12-31T19:59:33.944-03:00 DEBUG (1430-2494) <MessageFrameWorkDispatch> [MessageFrameWork] KeyVault service got operation=getEndEntityCertificates, ok=1, msecs=0
2020-12-31T19:59:33.944-03:00 DEBUG (1430-2494) <MessageFrameWorkDispatch> [MessageFrameWork] KeyVault service got operation=getEndEntityCertificates, ok=1, msecs=0
2020-12-31T19:59:33.975-03:00 DEBUG (1430-2558) <Thread-4> [JmsManager] username for swiftmq connection is: agent/90916ab8-704c-4fe3-a605-c4a7745b246e
2020-12-31T19:59:33.975-03:00 DEBUG (1430-2558) <Thread-4> [AgentJmsConfig] Skipping pair operation: already paired
2020-12-31T19:59:33.975-03:00 DEBUG (1430-2558) <Thread-4> [AgentMessageSecurityHandler] Configuring message security (ENHANCED).
2020-12-31T19:59:33.975-03:00 DEBUG (1430-2558) <Thread-4> [JmsManager] Re-connecting using secure port 4002
2020-12-31T19:59:34.381-03:00 DEBUG (1430-2780) <SwiftMQ-ConnectorPool-2> [AgentSSLSocketFactory] Received cert with subject cn=router/viewcs01
2020-12-31T19:59:34.381-03:00 WARN (1430-2780) <SwiftMQ-ConnectorPool-2> [AgentSSLSocketFactory] Certificate thumbprint verification failed, no matching thumbprint. Presented identity: router/viewcs01
2020-12-31T19:59:34.381-03:00 DEBUG (1430-2558) <Thread-4> [JmsManager] Unable to connect to JMS server viewcs01.DOMAIN.net com.vmware.vdi.logger.Logger.debug(Logger.java:44)
javax.net.ssl.SSLHandshakeException: java.security.cert.CertificateException: Unexpected certificate: router/viewcs01
2020-12-31T19:59:34.381-03:00 WARN (1430-2558) <Thread-4> [JmsManager] Unable to connect to any listed host. The agent will continue to retry: [viewcs02.DOMAIN.net, viewcs01.DOMAIN.net]
Reinstalled the agent and also tried the command below, as mentioned in https://kb.vmware.com/s/article/2038679, nothing has worked at all.
vdmadmin -A -d desktop-pool-name -m name-of-machine-in-pool -resetkey
Update to question:
Some piece (That I don't have now) of the log lead me to This KB, so I have uninstalled the Connection Server, remove all certificates and reinstalled it again, nothing changed.
After reading the following links [1], [2], [3]:
I've changed the security mode to Mixed, nothing changed.
But after change it to mixed and after reinstalling the Agent (I've reinstalled it before change to mixed and it didn't worked) from the VM have turned it to Available, I still not able to access machine with tunnel errors (Changed tunnel configurations also).
Updated the Connection Server to 7.13, stopped to open console.
Started the whole process from zero, now machine name is viewcs04, not worked also.
For any who encounter the same problem, I decided to create a new Connection Server and I will create manual pools, so people can work and I will migrate everyone to new linked-clone pools.
Just to mention, I cannot just create new pools, all pools are dedicated and have manually installed applications, printers etc.

Random “upstream connect error or disconnect/reset before headers” between services with Istio 1.3

So, this problem is happening randomly (it seems) and between different services.
For example we have a service A which needs to talk to service B, and some times we get this error, but after a while, the error goes away. And this error doesn't happen too often.
When this happens, we see the error log in service A throwing the “upstream connect error” message, but none in service B. So we think it might be related with the sidecars.
One thing we notice is that in service B, we get a lot of this error messages in the istio-proxy container:
[src/istio/mixerclient/report_batch.cc:109] Mixer Report failed with: UNAVAILABLE:upstream connect error or disconnect/reset before headers. reset reason: connection failure
And according to documentation when a request comes in, envoy asks Mixer if everything is good (authorization and other things), and if Mixer doesn’t reply, the request is not success. So that’s why exists an option called policyCheckFailOpen.
We have that in false, I guess is a sane default, we don’t want the request to go through if Mixer cannot be reached, but why can’t?
disablePolicyChecks: true
policyCheckFailOpen: false
controlPlaneSecurityEnabled: false
NOTE: istio-policy is running with the istio-proxy sidecar. Is that correct?
We don’t see that error in some other service which can also fail.
Another log that I can see a lot, and this one happens in all the services not running as root with fsGroup defined in the YAML files is:
watchFileEvents: "/etc/certs": MODIFY|ATTRIB
watchFileEvents: "/etc/certs/..2020_02_10_09_41_46.891624651": MODIFY|ATTRIB
watchFileEvents: notifying
One of the leads I'm chasing is about default circuitBreakers values. Could that be related with this?
Thanks
The error you are seeing is because of a failure to establish a connection to istio-policy
Based on this github issue
Community members add two answers here which could help you with your issue
If mTLS is enabled globally make sure you set controlPlaneSecurityEnabled: true
I was facing the same issue, then I read about protocol selection. I realised the name of the port in the service definition should start with for example http-. This fixed the issue for me. And . if you face the issue still you might need to look at the tls-check for the pods and resolve it using destinationrules and policies.
istio-policy is running with the istio-proxy sidecar. Is that correct?
Yes, I just checked it and it's with sidecar.
Let me know if that help.

Worker role using event hubs gives 'No connection handler was found for virtual host'

I have a worker role that uses an EventProcessorHost to ingest data from an EventHub. I frequently receive error messages of the following kind:
Microsoft.ServiceBus.Messaging.MessagingCommunicationException:
No connection handler was found for virtual host 'myservicebusnamespace.servicebus.windows.net:42777'. Remote container id is 'f37c72ee313c4d658588ad9855773e51'. TrackingId:1d200122575745cc89bb714ffd533b6d_B5_B5, SystemTracker:SharedConnectionListener, Timestamp:8/29/2016 6:13:45 AM
at Microsoft.ServiceBus.Common.ExceptionDispatcher.Throw(Exception exception)
at Microsoft.ServiceBus.Common.Parallel.TaskHelpers.EndAsyncResult(IAsyncResult asyncResult)
at Microsoft.ServiceBus.Messaging.IteratorAsyncResult`1.StepCallback(IAsyncResult result)
I can't seem to find a way to catch this exception. It seems I can just ignore the error because everything works as expected (I had previously mentioned here that it was dropping messages because of this error, but I have since found out that a bug in the software that sends the messages caused this problem), however I would like to know what causes these errors, since they are clogging up my logging now and then.
Can anyone shed some light on the cause?
The Event Hub partitions are distributed across multiple servers. They sometimes move due to load balancing, upgrade and other reasons. When this happens, the client connection is lost with this error. The connection will be reestablished very quickly so you should not see any issues with message processing. It is safe to ignore this communication error.