Target Connection is Stale ERROR in WSO2 EI 6.4.0

I'm trying to push an incoming JSON payload to an AWS SQS queue in WSO2 EI 6.4.0. Intermittently I get an exception like java.io.IOException: Target Connection is stale, and we're unable to push the payload to the queue.
Log:
[2022-08-27 03:08:49,801] [-1] [] [HTTPS-Sender I/O dispatcher-5] WARN {org.apache.synapse.transport.passthru.TargetHandler} - Connection closed by target host while sending the request Remote Address : proxy.abc.com/10.0.x.x:3090
[2022-08-27 03:08:49,801] [-1234] [] [PassThroughMessageProcessor-29] ERROR {org.apache.synapse.transport.passthru.PassThroughHttpSSLSender} - IO while building message
java.io.IOException: Target Connection is stale..
As per this WSO2 link, do I need to disable http.connection.stalecheck by setting its value to 1 in the <ESB_Home>/repository/conf/nhttp.properties file?
Please suggest how to resolve this issue.
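For clarity, this is the change I mean in nhttp.properties (my understanding is that 1 enables the stale check and 0 disables it, following the Apache HttpClient flag this maps to, so I may have the values backwards):

# <ESB_Home>/repository/conf/nhttp.properties
# 1 = check pooled connections for staleness before reuse, 0 = skip the check
http.connection.stalecheck=1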

The error java.io.IOException: Target Connection is stale is not the cause of your issue; it's just a by-product of the actual issue. With the information provided, I believe the actual issue is the following.
[2022-08-27 03:08:49,801] [-1] [] [HTTPS-Sender I/O dispatcher-5] WARN {org.apache.synapse.transport.passthru.TargetHandler} - Connection closed by target host while sending the request Remote Address : proxy.abc.com/10.0.x.x:3090
Since the connection was closed by the target (assume this is SQS), the connections become stale after some time. This is expected, so try to find out why the target is closing the connection. If you are going through a corporate proxy, check the proxy first, then check the SQS side to see whether there is any information useful for debugging the issue.
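If the target (or an intermediate proxy) turns out to simply drop idle keep-alive connections and you cannot change that behaviour, one mitigation is to stop reusing connections on the sender side. A sketch, assuming your EI version exposes the passthrough transport's keep-alive toggle under this name (verify it against the passthru-http.properties documentation for your release):

# passthru-http.properties (file location varies by product version)
# open a fresh connection per request instead of reusing pooled ones;
# this trades some throughput for resilience against targets that close idle connections
http.connection.disable.keepalive=true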

Related

Is there any way (like a retry) to handle the AWS SQS SendMessage failure scenario in WSO2 EI 6.4.0?

I am performing the SendMessage operation in WSO2 EI 6.4.0 using the AWS SQS Connector (v1.0.7).
Sometimes the message is not posted to the AWS SQS queue, and I get the ERROR/WARNING messages in the log mentioned below.
Error codes from the log:
Error_code = 101506 or Error_code = 101508
Warning Message:
[HTTPS-Sender I/O dispatcher-2] WARN {org.apache.synapse.transport.passthru.TargetHandler} - Connection closed by target host before receiving the response Remote Address : host/ip
So whenever a failure occurs, mediation goes to the fault sequence; I'm just looking for a solution like a retry.
Can I add some endpoint timeout error handling inside the sendMessage template code and rebuild it?
Or should I perform the same sendMessage operation once again inside the fault sequence?
Kindly let me know a feasible solution.
Did you try using a Message Store and a Message Processor to implement a Guaranteed Delivery System? You publish the message to a Message Store, and a Message Processor tries to post it to SQS. If that fails, the message is added to a Failover Message Store, and another Message Processor adds it back to the original Message Store after some time. This way, it will keep retrying until it succeeds; a rough sketch follows the link below.
https://docs.wso2.com/display/EI640/Guaranteed+Delivery+with+Failover+Message+Store+and+Scheduled+Failover+Message+Forwarding+Processor
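A rough sketch of that pattern, using in-memory stores for brevity (production setups normally use persistent JMS-backed stores, and SQSEndpoint is a hypothetical endpoint wrapping the SQS call):

<!-- original store; failover wiring as in the doc's (JMS-store) example -->
<messageStore name="SQSStore"
              class="org.apache.synapse.message.store.impl.memory.InMemoryStore"
              xmlns="http://ws.apache.org/ns/synapse">
    <parameter name="store.failover.message.store.name">SQSFailoverStore</parameter>
</messageStore>

<messageStore name="SQSFailoverStore"
              class="org.apache.synapse.message.store.impl.memory.InMemoryStore"
              xmlns="http://ws.apache.org/ns/synapse"/>

<!-- retries delivery to the (hypothetical) SQS endpoint every 5 seconds -->
<messageProcessor name="SQSForwarder"
                  class="org.apache.synapse.message.processor.impl.forwarder.ScheduledMessageForwardingProcessor"
                  messageStore="SQSStore" targetEndpoint="SQSEndpoint"
                  xmlns="http://ws.apache.org/ns/synapse">
    <parameter name="interval">5000</parameter>
    <parameter name="max.delivery.attempts">10</parameter>
</messageProcessor>

<!-- moves failed messages from the failover store back into the original store -->
<messageProcessor name="SQSFailoverForwarder"
                  class="org.apache.synapse.message.processor.impl.failover.FailoverScheduledMessageForwardingProcessor"
                  messageStore="SQSFailoverStore"
                  xmlns="http://ws.apache.org/ns/synapse">
    <parameter name="interval">60000</parameter>
    <parameter name="message.target.store.name">SQSStore</parameter>
</messageProcessor>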
If this solution is too complex, you can go with your second option, where you call the sendMessage operation inside the fault sequence.
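For that second option, a minimal sketch of a fault sequence that retries once (SendToSQS is a hypothetical sequence wrapping your amazonsqs.sendMessage call, and the SQS_RETRIED guard property is likewise an assumption of this sketch, there to avoid an infinite retry loop):

<sequence name="SQSFaultSequence" xmlns="http://ws.apache.org/ns/synapse">
    <filter xpath="not(get-property('SQS_RETRIED') = 'true')">
        <then>
            <!-- first failure: mark the message as retried and try the send again -->
            <property name="SQS_RETRIED" value="true" scope="default"/>
            <log level="custom">
                <property name="status" value="sendMessage failed; retrying once"/>
            </log>
            <sequence key="SendToSQS"/>
        </then>
        <else>
            <!-- second failure: stop retrying and drop (or store) the message -->
            <log level="custom">
                <property name="status" value="sendMessage failed after retry"/>
            </log>
            <drop/>
        </else>
    </filter>
</sequence>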

How does WSO2 APIM handle 40x HTTP error codes returned by an endpoint?

I'm working with WSO2 APIM 3.1.0, and some of my endpoints are constantly ending up in the SUSPENDED state.
The endpoint was built to return HTTP codes 400, 403, and 404 as part of its business logic.
In the Advanced Configurations for a given endpoint, we can set the error codes that move the endpoint into the suspended or timeout state.
The error codes below are available for selection:
101000 Receiver input/output error sending
101001 Receiver input/output error receiving
101500 Sender input/output error sending
101501 Sender input/output error receiving
101503 Connection failed
101504 Connection timed out (no input was detected on this connection over the maximum period of inactivity)
101505 Connection closed
101506 NHTTP protocol violation
101507 Connection canceled
101508 Request to establish new connection timed out
101509 Send abort
101510 Response processing failed
Are HTTP codes such as 400/403/404 returned by the endpoint mapped to some of those WSO2 error codes?
Yes. If you set a suspension error code, it is internally mapped to the response code.
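For reference, those codes end up in the endpoint's markForSuspension/suspendOnFailure settings; a sketch of the Synapse configuration behind the Advanced Configurations UI (the URI, codes, and durations here are placeholders):

<endpoint name="SampleBackendEndpoint" xmlns="http://ws.apache.org/ns/synapse">
    <http uri-template="https://backend.example.com/resource">
        <!-- errors counted before the endpoint is marked as suspended -->
        <markForSuspension>
            <errorCodes>101504,101505</errorCodes>
            <retriesBeforeSuspension>3</retriesBeforeSuspension>
            <retryDelay>1000</retryDelay>
        </markForSuspension>
        <!-- errors that suspend immediately, plus the suspension back-off curve -->
        <suspendOnFailure>
            <errorCodes>101500,101501,101503</errorCodes>
            <initialDuration>30000</initialDuration>
            <progressionFactor>2</progressionFactor>
            <maximumDuration>300000</maximumDuration>
        </suspendOnFailure>
    </http>
</endpoint>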

ERROR on API Manager 2.0.0 gateway worker on start-up

The following ERROR is logged on the gateway worker nodes on start-up.
2016-08-23 12:32:42,344 [-] [Timer-5] ERROR KeyTemplateRetriever Exception when retrieving throttling data from remote endpoint
Unexpected character (<) at position 0.
at org.json.simple.parser.Yylex.yylex(Unknown Source)
at org.json.simple.parser.JSONParser.nextToken(Unknown Source)
at org.json.simple.parser.JSONParser.parse(Unknown Source)
at org.json.simple.parser.JSONParser.parse(Unknown Source)
at org.json.simple.parser.JSONParser.parse(Unknown Source)
at org.wso2.carbon.apimgt.gateway.throttling.util.KeyTemplateRetriever.retrieveKeyTemplateData(KeyTemplateRetriever.java:100)
at org.wso2.carbon.apimgt.gateway.throttling.util.KeyTemplateRetriever.loadKeyTemplatesFromWebService(KeyTemplateRetriever.java:111)
at org.wso2.carbon.apimgt.gateway.throttling.util.KeyTemplateRetriever.run(KeyTemplateRetriever.java:54)
at java.util.TimerThread.mainLoop(Timer.java:555)
at java.util.TimerThread.run(Timer.java:505)
Despite the error, the gateway worker nodes start up and the environment can be used to successfully invoke a sample API.
All the APIM nodes, bar the traffic manager, however, report these warnings:
2016-08-22 16:40:56,652 [-] [Timer-5] WARN KeyTemplateRetriever Failed retrieving throttling data from remote endpoint: Connection refused. Retrying after 15 seconds...
2016-08-22 16:40:56,653 [-] [Timer-4] WARN BlockingConditionRetriever Failed retrieving Blocking Conditions from remote endpoint: Connection refused. Retrying after 15 seconds...
Environment:
APIM 2.0.0 cluster
publisher (default profile)
store (default profile)
gw manager and 2 gw workers (default profiles)
traffic manager (using traffic-manager profile)
Database: MariaDB Server, wsrep_25.10.r4144
Userstore : Read/write LDAP
JVM: java version "1.8.0_92"
OS: CentOS Linux release 7.0.1406 (Core)
N.B. the key manager is unconfigured, using default pack settings.
If you disable Advanced Throttling in api-manager.xml as shown below, that error will go away. If you enable it, a key manager node is required.
<EnableAdvanceThrottling>false</EnableAdvanceThrottling>
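For context, that element sits under ThrottlingConfigurations in <APIM_Home>/repository/conf/api-manager.xml; a sketch of the surrounding block (other children unchanged):

<ThrottlingConfigurations>
    <EnableAdvanceThrottling>false</EnableAdvanceThrottling>
    <!-- leave the remaining throttling settings as they are -->
</ThrottlingConfigurations>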
I encountered the issue recently, and the cause was that throttle#data#v1.war (repository/deployment/server/webapps/throttle#data#v1.war) hadn't been deployed at the time the worker started up.
If you have a distributed APIM 2.0 deployment, make sure the Key Manager is up and throttle#data#v1.war is deployed on it before worker startup.

WSO2 MB Cluster Giving Connection reset by peer

Test cluster of two brokers, WKA membership scheme, PostgreSQL message store; working fine for a couple of days, then throwing the following errors:
TID: [] [] [2016-07-19 12:09:24,738] ERROR {org.wso2.andes.server.protocol.MultiVersionProtocolEngine} - Error establishing session {org.wso2.andes.server.protocol.MultiVersionProtocolEngine}
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:197)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
at org.apache.mina.transport.socket.nio.SocketIoProcessor.read(SocketIoProcessor.java:218)
at org.apache.mina.transport.socket.nio.SocketIoProcessor.process(SocketIoProcessor.java:198)
at org.apache.mina.transport.socket.nio.SocketIoProcessor.access$400(SocketIoProcessor.java:45)
at org.apache.mina.transport.socket.nio.SocketIoProcessor$Worker.run(SocketIoProcessor.java:485)
at org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:51)
at java.lang.Thread.run(Thread.java:745)
Startup of the Message Broker looks fine: no errors, the JDBC connection to the PostgreSQL DB is OK, and the registry mount looks OK. After that, the error appears in wso2carbon.log several times per minute.
Anyone have any ideas? As far as I know nothing has changed, and I don't know what it's trying to connect to.
This usually happens when clients connected to MB try to create a connection per message. A JMS connection is heavyweight, and creating one per message is not recommended. Therefore, please go through the client implementation and verify that connections are not created per message.
If by any chance you are using WSO2 ESB to publish/subscribe to queues/topics on MB, there is a connection-caching property, "transport.jms.CacheLevel", in the ESB's axis2.xml. Read the documentation and use the appropriate caching level for your use case (a sketch follows below).
There was a bug in ESB 4.8.1 where the connection caching property was ignored; it is fixed in 4.9.0.
These are the possible cases I can think of with the given information. If you need more info, please provide a detailed use case.
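For the ESB case, the caching level can be set on the JMS connection factory definition in repository/conf/axis2/axis2.xml (or as a parameter on the JMS endpoint URL); a sketch, where the factory name, JNDI settings, and the chosen level are placeholders for whatever your setup already uses:

<transportSender name="jms" class="org.apache.axis2.transport.jms.JMSSender">
    <parameter name="default" locked="false">
        <parameter name="java.naming.factory.initial" locked="false">org.wso2.andes.jndi.PropertiesFileInitialContextFactory</parameter>
        <parameter name="java.naming.provider.url" locked="false">repository/conf/jndi.properties</parameter>
        <parameter name="transport.jms.ConnectionFactoryJNDIName" locked="false">QueueConnectionFactory</parameter>
        <!-- reuse connections/sessions instead of opening one per message -->
        <parameter name="transport.jms.CacheLevel" locked="false">producer</parameter>
    </parameter>
</transportSender>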

Spark - Remote Akka Client Disassociated

I am setting up Spark 0.9 on AWS and am finding that when launching the interactive Pyspark shell, my executors / remote workers are first being registered:
14/07/08 22:48:05 INFO cluster.SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@ip-xx-xx-xxx-xxx.ec2.internal:54110/user/Executor#-862786598] with ID 0
and then disassociated almost immediately, before I have the chance to run anything:
14/07/08 22:48:05 INFO cluster.SparkDeploySchedulerBackend: Executor 0 disconnected, so removing it
14/07/08 22:48:05 ERROR scheduler.TaskSchedulerImpl: Lost an executor 0 (already removed): remote Akka client disassociated
Any idea what might be wrong? I've tried adjusting the JVM options spark.akka.frameSize and spark.akka.timeout, but I'm pretty sure this is not the issue since (1) I'm not running anything to begin with, and (2) my executors are disconnecting a few seconds after startup, which is well within the default 100s timeout.
Thanks!
Jack
I had a very similar problem, if not the same one.
It started working for me once the workers were connecting to the master using the very same name the master thought it had.
My log messages were something like:
ERROR remote.EndpointWriter: AssociationError [akka.tcp://sparkWorker@idc1-hrm1.heylinux.com:7078] -> [akka.tcp://sparkMaster@vagrant-centos64.vagrantup.com:7077]: Error [Association failed with [akka.tcp://sparkMaster@vagrant-centos64.vagrantup.com:7077]].
ERROR remote.EndpointWriter: AssociationError [akka.tcp://sparkWorker@192.168.121.127:7078] -> [akka.tcp://sparkMaster@idc1-hrm1.heylinux.com:7077]: Error [Association failed with [akka.tcp://sparkMaster@idc1-hrm1.heylinux.com:7077]]
WARN util.Utils: Your hostname, idc1-hrm1 resolves to a loopback address: 127.0.0.1; using 192.168.121.187 instead (on interface eth0)
So check the log of the master and see what name it thinks it has, then use that very same name on the workers.
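In my case that meant pinning the name on the master and reusing it verbatim when starting the workers; roughly like this (the hostname is the one from my logs above, and the paths assume a standalone Spark 0.9 layout):

# conf/spark-env.sh on the master: advertise one canonical, resolvable name
export SPARK_MASTER_IP=idc1-hrm1.heylinux.com

# start each worker against exactly that name
./bin/spark-class org.apache.spark.deploy.worker.Worker spark://idc1-hrm1.heylinux.com:7077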