connection was closed in ignite map reduce tasks - mapreduce

Environment:
ignite server:
centos6.5 with kernel 2.6.32-431.el6.x86_64
ignite version 1.9
hadoop version 2.6.2
3 server nodes with each having '-Xms16g -Xmx16g -server -XX:+AggressiveOpts -XX:MaxMetaspaceSize=256m' set when started
I run a map reduce test job with ignite map reduce. The job is simply getting the average number for each people. The data is like:
Jack 0.35
Tom 0.78
Lily 0.92
Jack 0.28
Tom 0.18
...
At first, I generated a data set of 100M lines. It's about 2.53GB. The job finished correctly in about 30s. Then I generated a data set of 1 Billion lines, about 25.3GB. The job always failed with exceptions. I tried several times but the same result.
The ignite server node threw exception below:
[15:06:56,804][ERROR][sys-#2740%null%][GridTcpRestProtocol] Failed to process client request [ses=GridSelectorNioSessionImpl [worker=ByteBufferNioClientWorker [readBuf=java.nio.HeapByteBuffer[pos=549 lim=549 cap=8192], super=AbstractNioClientWorker [selector=sun.nio.ch.EPollSelectorImpl#1cba0431, idx=3, bytesRcvd=0, bytesSent=0, bytesRcvd0=0, bytesSent0=0, select=true, super=GridWorker [name=grid-nio-worker-tcp-rest-3, gridName=null, finished=false, isCancelled=false, hashCode=906881587, interrupted=false, runner=grid-nio-worker-tcp-rest-3-#50%null%]]], writeBuf=null, readBuf=null, inRecovery=null, outRecovery=null, super=GridNioSessionImpl [locAddr=/172.31.68.204:11211, rmtAddr=/172.31.68.202:39473, createTime=1493967985751, closeTime=1493968009502, bytesSent=2715, bytesRcvd=2641, bytesSent0=0, bytesRcvd0=0, sndSchedTime=1493968016794, lastSndTime=1493967998303, lastRcvTime=1493968009502, readsPaused=false, filterChain=FilterChain[filters=[GridNioCodecFilter [parser=GridTcpRestParser [jdkMarshaller=JdkMarshaller [], routerClient=false], directMode=false]], accepted=true]], msg=GridClientTaskRequest [taskName=o.a.i.i.processors.hadoop.proto.HadoopProtocolJobStatusTask, arg=HadoopProtocolTaskArguments []]]
class org.apache.ignite.IgniteCheckedException: Failed to send message (connection was closed): GridSelectorNioSessionImpl [worker=ByteBufferNioClientWorker [readBuf=java.nio.HeapByteBuffer[pos=549 lim=549 cap=8192], super=AbstractNioClientWorker [selector=sun.nio.ch.EPollSelectorImpl#1cba0431, idx=3, bytesRcvd=0, bytesSent=0, bytesRcvd0=0, bytesSent0=0, select=true, super=GridWorker [name=grid-nio-worker-tcp-rest-3, gridName=null, finished=false, isCancelled=false, hashCode=906881587, interrupted=false, runner=grid-nio-worker-tcp-rest-3-#50%null%]]], writeBuf=null, readBuf=null, inRecovery=null, outRecovery=null, super=GridNioSessionImpl [locAddr=/172.31.68.204:11211, rmtAddr=/172.31.68.202:39473, createTime=1493967985751, closeTime=1493968009502, bytesSent=2715, bytesRcvd=2641, bytesSent0=0, bytesRcvd0=0, sndSchedTime=1493968016794, lastSndTime=1493967998303, lastRcvTime=1493968009502, readsPaused=false, filterChain=FilterChain[filters=[GridNioCodecFilter [parser=GridTcpRestParser [jdkMarshaller=JdkMarshaller [], routerClient=false], directMode=false]], accepted=true]]
at org.apache.ignite.internal.util.IgniteUtils.cast(IgniteUtils.java:7239)
at org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:170)
at org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:119)
at org.apache.ignite.internal.processors.rest.protocols.tcp.GridTcpRestNioListener$1$1.apply(GridTcpRestNioListener.java:264)
at org.apache.ignite.internal.processors.rest.protocols.tcp.GridTcpRestNioListener$1$1.apply(GridTcpRestNioListener.java:261)
at org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:271)
at org.apache.ignite.internal.util.future.GridFutureAdapter.listen(GridFutureAdapter.java:228)
at org.apache.ignite.internal.processors.rest.protocols.tcp.GridTcpRestNioListener$1.apply(GridTcpRestNioListener.java:261)
at org.apache.ignite.internal.processors.rest.protocols.tcp.GridTcpRestNioListener$1.apply(GridTcpRestNioListener.java:229)
at org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:271)
at org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListeners(GridFutureAdapter.java:259)
at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:389)
at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:355)
at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:332)
at org.apache.ignite.internal.processors.rest.GridRestProcessor$2$1.apply(GridRestProcessor.java:158)
at org.apache.ignite.internal.processors.rest.GridRestProcessor$2$1.apply(GridRestProcessor.java:155)
at org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:271)
at org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListeners(GridFutureAdapter.java:259)
at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:389)
at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:355)
at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:332)
at org.apache.ignite.internal.util.future.GridFutureChainListener.applyCallback(GridFutureChainListener.java:78)
at org.apache.ignite.internal.util.future.GridFutureChainListener.apply(GridFutureChainListener.java:70)
at org.apache.ignite.internal.util.future.GridFutureChainListener.apply(GridFutureChainListener.java:30)
at org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:271)
at org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListeners(GridFutureAdapter.java:259)
at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:389)
at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:355)
at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:332)
at org.apache.ignite.internal.processors.rest.handlers.task.GridTaskCommandHandler$2.apply(GridTaskCommandHandler.java:294)
at org.apache.ignite.internal.processors.rest.handlers.task.GridTaskCommandHandler$2.apply(GridTaskCommandHandler.java:257)
at org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:271)
at org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListeners(GridFutureAdapter.java:259)
at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:389)
at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:355)
at org.apache.ignite.internal.processors.task.GridTaskWorker.finishTask(GridTaskWorker.java:1579)
at org.apache.ignite.internal.processors.task.GridTaskWorker.finishTask(GridTaskWorker.java:1547)
at org.apache.ignite.internal.processors.task.GridTaskWorker.reduce(GridTaskWorker.java:1157)
at org.apache.ignite.internal.processors.task.GridTaskWorker.onResponse(GridTaskWorker.java:942)
at org.apache.ignite.internal.processors.task.GridTaskProcessor.processJobExecuteResponse(GridTaskProcessor.java:996)
at org.apache.ignite.internal.processors.task.GridTaskProcessor$JobMessageListener.onMessage(GridTaskProcessor.java:1221)
at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1222)
at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:850)
at org.apache.ignite.internal.managers.communication.GridIoManager.access$2100(GridIoManager.java:108)
at org.apache.ignite.internal.managers.communication.GridIoManager$7.run(GridIoManager.java:790)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Failed to send message (connection was closed): GridSelectorNioSessionImpl [worker=ByteBufferNioClientWorker [readBuf=java.nio.HeapByteBuffer[pos=549 lim=549 cap=8192], super=AbstractNioClientWorker [selector=sun.nio.ch.EPollSelectorImpl#1cba0431, idx=3, bytesRcvd=0, bytesSent=0, bytesRcvd0=0, bytesSent0=0, select=true, super=GridWorker [name=grid-nio-worker-tcp-rest-3, gridName=null, finished=false, isCancelled=false, hashCode=906881587, interrupted=false, runner=grid-nio-worker-tcp-rest-3-#50%null%]]], writeBuf=null, readBuf=null, inRecovery=null, outRecovery=null, super=GridNioSessionImpl [locAddr=/172.31.68.204:11211, rmtAddr=/172.31.68.202:39473, createTime=1493967985751, closeTime=1493968009502, bytesSent=2715, bytesRcvd=2641, bytesSent0=0, bytesRcvd0=0, sndSchedTime=1493968016794, lastSndTime=1493967998303, lastRcvTime=1493968009502, readsPaused=false, filterChain=FilterChain[filters=[GridNioCodecFilter [parser=GridTcpRestParser [jdkMarshaller=JdkMarshaller [], routerClient=false], directMode=false]], accepted=true]]
at org.apache.ignite.internal.util.nio.GridNioServer.send0(GridNioServer.java:554)
at org.apache.ignite.internal.util.nio.GridNioServer.send(GridNioServer.java:494)
at org.apache.ignite.internal.util.nio.GridNioServer$HeadFilter.onSessionWrite(GridNioServer.java:3036)
at org.apache.ignite.internal.util.nio.GridNioFilterAdapter.proceedSessionWrite(GridNioFilterAdapter.java:118)
at org.apache.ignite.internal.util.nio.GridNioCodecFilter.onSessionWrite(GridNioCodecFilter.java:94)
at org.apache.ignite.internal.util.nio.GridNioFilterAdapter.proceedSessionWrite(GridNioFilterAdapter.java:118)
at org.apache.ignite.internal.util.nio.GridNioFilterChain$TailFilter.onSessionWrite(GridNioFilterChain.java:264)
at org.apache.ignite.internal.util.nio.GridNioFilterChain.onSessionWrite(GridNioFilterChain.java:189)
at org.apache.ignite.internal.util.nio.GridNioSessionImpl.send(GridNioSessionImpl.java:108)
at org.apache.ignite.internal.processors.rest.protocols.tcp.GridTcpRestNioListener$1.apply(GridTcpRestNioListener.java:258)
... 40 more
The job client threw exception below:
java.io.IOException: Failed to get job status: job_1fbf9083-9a44-4be9-9199-695a97652dc2_0002
at org.apache.ignite.internal.processors.hadoop.impl.proto.HadoopClientProtocol.getJobStatus(HadoopClientProtocol.java:197)
at org.apache.hadoop.mapreduce.Job$1.run(Job.java:326)
at org.apache.hadoop.mapreduce.Job$1.run(Job.java:323)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
at org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:323)
at org.apache.hadoop.mapreduce.Job.isComplete(Job.java:611)
at org.apache.hadoop.mapreduce.Job.monitorAndPrintJob(Job.java:1357)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1318)
at com.tscloud.sdk.test.ignite.MRTest.run(MRTest.java:81)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at com.tscloud.sdk.test.ignite.MRTest.main(MRTest.java:53)
Caused by: class org.apache.ignite.internal.client.impl.connection.GridClientConnectionResetException: Failed to perform request (connection failed): /172.31.68.204:11211
at org.apache.ignite.internal.client.impl.connection.GridClientConnection.getCloseReasonAsException(GridClientConnection.java:491)
at org.apache.ignite.internal.client.impl.connection.GridClientNioTcpConnection.close(GridClientNioTcpConnection.java:339)
at org.apache.ignite.internal.client.impl.connection.GridClientNioTcpConnection.close(GridClientNioTcpConnection.java:299)
at org.apache.ignite.internal.client.impl.connection.GridClientConnectionManagerAdapter$NioListener.onDisconnected(GridClientConnectionManagerAdapter.java:630)
at org.apache.ignite.internal.util.nio.GridNioFilterChain$TailFilter.onSessionClosed(GridNioFilterChain.java:253)
at org.apache.ignite.internal.util.nio.GridNioFilterAdapter.proceedSessionClosed(GridNioFilterAdapter.java:93)
at org.apache.ignite.internal.util.nio.GridNioCodecFilter.onSessionClosed(GridNioCodecFilter.java:70)
at org.apache.ignite.internal.util.nio.GridNioFilterAdapter.proceedSessionClosed(GridNioFilterAdapter.java:93)
at org.apache.ignite.internal.util.nio.GridNioServer$HeadFilter.onSessionClosed(GridNioServer.java:3005)
at org.apache.ignite.internal.util.nio.GridNioFilterChain.onSessionClosed(GridNioFilterChain.java:147)
at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.close(GridNioServer.java:2306)
at org.apache.ignite.internal.util.nio.GridNioServer$ByteBufferNioClientWorker.processRead(GridNioServer.java:929)
at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.processSelectedKeysOptimized(GridNioServer.java:2026)
at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.bodyInternal(GridNioServer.java:1863)
at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.body(GridNioServer.java:1568)
at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
at java.lang.Thread.run(Thread.java:745)
The job configuration is below:
Configuration configuration = new Configuration();
configuration.set(MRConfig.FRAMEWORK_NAME, IgniteHadoopClientProtocolProvider.FRAMEWORK_NAME);
configuration.set(MRConfig.MASTER_ADDRESS, "172.31.68.202:11211");
configuration.set("fs.igfs.impl", "org.apache.ignite.hadoop.fs.v1.IgniteHadoopFileSystem");
configuration.set("fs.default.name", "igfs://igfs#172.31.68.202/");
I checked nodes status after the job failed using ignitevisorcmd.sh. All server nodes were OK, but there were sometime one of the node server was down. I did not know why it behaved like this.
Any help is appreciated.
Edit(2017-05-16):
I changed the hadoop core-site.xml and add hadoop.tmp.dir property as below
<property>
<name>hadoop.tmp.dir</name>
<value>/data/hadoop/hadoop-2.6.2/tmp</value>
</property>
Then I reformatted hdfs and uploaded the 25.3GB data file. I run the test successfully. It turns out my hdfs has something wrong. Reformatting namenode solves the problem.
Before above steps, I tried checking the jvm heap usage by VisualVM.
One of the server node visualvm monitor snapshot

If your node was down temporarily, then it is reasonable that a job would fail-over to another node or fail altogether if there are no other nodes. I would check that your network was not reset or down. Also, you should check for the presence of any software firewalls between your nodes (it is best to disable operating system firewalls).

Related

Migrating data from Teradata to BigQuery

I'm running Teradata express on VMWare player. In order to migrate the data to BigQuery I'm trying to follow the official documentation capturing BigQuery transfer job.
https://cloud.google.com/bigquery-transfer/docs/teradata-migration
I have downloaded the required jars and trying to start the migration process alfer successful initialization. Providing the config file contents generated after successful initialization.
TDExpress1620_Sles11:~/Desktop # cat testpath/bq.config
{
"agent-id": "84f7cf04-2133-4c5a-ae43-dd22a30281ba",
"transfer-configuration": {
"project-id": "106730138445",
"location": "us",
"id": "5e1c7eda-0000-249c-aab6-94eb2c06245a"
},
"source-type": "teradata",
"console-log": false,
"silent": false,
"teradata-config": {
"connection": {
"host": "localhost"
},
"local-processing-space": "testpath",
"database-credentials-file-path": "",
"max-local-storage": "200GB",
"gcs-upload-chunk-size": "32MB",
"use-tpt": false,
"max-sessions": 0,
"spool-mode": "NoSpool",
"max-parallel-upload": 1,
"max-parallel-extract-threads": 1,
"session-charset": "UTF8",
"max-unload-file-size": "2GB"
}
After running the command to start the migration process, I come across below error.
TDExpress1620_Sles11:~/Desktop # java -cp terajdbc4.jar:mirroring-agent.jar com.google.cloud.bigquery.dms.Agent --configuration-file=testpath/bq.config
Reading data from gs://data_transfer_agent/latest/version
WARNING: Failed to get the latest released agent version with error: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
Exception in thread "main" com.google.api.gax.rpc.UnavailableException: io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
at com.google.api.gax.rpc.ApiExceptionFactory.createException(ApiExceptionFactory.java:69)
at com.google.api.gax.grpc.GrpcApiExceptionFactory.create(GrpcApiExceptionFactory.java:72)
at com.google.api.gax.grpc.GrpcApiExceptionFactory.create(GrpcApiExceptionFactory.java:60)
at com.google.api.gax.grpc.GrpcExceptionCallable$ExceptionTransformingFuture.onFailure(GrpcExceptionCallable.java:97)
at com.google.api.core.ApiFutures$1.onFailure(ApiFutures.java:68)
at com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1349)
at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:398)
at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1024)
at com.google.common.util.concurrent.AbstractFuture.addListener(AbstractFuture.java:670)
at com.google.common.util.concurrent.ForwardingListenableFuture.addListener(ForwardingListenableFuture.java:45)
at com.google.api.core.ApiFutureToListenableFuture.addListener(ApiFutureToListenableFuture.java:52)
at com.google.common.util.concurrent.Futures.addCallback(Futures.java:1330)
at com.google.api.core.ApiFutures.addCallback(ApiFutures.java:63)
at com.google.api.gax.grpc.GrpcExceptionCallable.futureCall(GrpcExceptionCallable.java:67)
at com.google.api.gax.rpc.AttemptCallable.call(AttemptCallable.java:86)
at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:57)
at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Suppressed: com.google.api.gax.rpc.AsyncTaskException: Asynchronous task failed
at com.google.api.gax.rpc.ApiExceptions.callAndTranslateApiException(ApiExceptions.java:57)
at com.google.api.gax.rpc.UnaryCallable.call(UnaryCallable.java:112)
at com.google.cloud.bigquery.datatransfer.v1.DataTransferServiceClient.getTransferConfig(DataTransferServiceClient.java:741)
at com.google.cloud.bigquery.datatransfer.v1.DataTransferServiceClient.getTransferConfig(DataTransferServiceClient.java:718)
at com.google.cloud.bigquery.dms.gcloud.TransferServiceClient.getTransferConfig(TransferServiceClient.java:62)
at com.google.cloud.bigquery.dms.gcloud.TransferServiceClient.getDestinationBucketName(TransferServiceClient.java:94)
at com.google.cloud.bigquery.dms.Agent.main(Agent.java:235)
Caused by: io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
at io.grpc.Status.asRuntimeException(Status.java:533)
at io.grpc.stub.ClientCalls$UnaryStreamToFuture.onClose(ClientCalls.java:490)
at io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)
at io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)
at io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40)
at io.grpc.internal.CensusStatsModule$StatsClientInterceptor$1$1.onClose(CensusStatsModule.java:700)
at io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)
at io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)
at io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40)
at io.grpc.internal.CensusTracingModule$TracingClientInterceptor$1$1.onClose(CensusTracingModule.java:399)
at io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:500)
at io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:65)
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.close(ClientCallImpl.java:592)
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.access$700(ClientCallImpl.java:508)
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:632)
at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
... 7 more
Caused by: io.grpc.netty.shaded.io.netty.channel.AbstractChannel$AnnotatedNoRouteToHostException: null: bigquerydatatransfer.googleapis.com/2404:6800:4009:810:0:0:0:200a:443
at io.grpc.netty.shaded.io.netty.channel.unix.Errors.throwConnectException(Errors.java:104)
at io.grpc.netty.shaded.io.netty.channel.unix.Socket.connect(Socket.java:257)
at io.grpc.netty.shaded.io.netty.channel.epoll.AbstractEpollChannel.doConnect0(AbstractEpollChannel.java:730)
at io.grpc.netty.shaded.io.netty.channel.epoll.AbstractEpollChannel.doConnect(AbstractEpollChannel.java:715)
at io.grpc.netty.shaded.io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.connect(AbstractEpollChannel.java:557)
at io.grpc.netty.shaded.io.netty.channel.DefaultChannelPipeline$HeadContext.connect(DefaultChannelPipeline.java:1340)
at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeConnect(AbstractChannelHandlerContext.java:532)
at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:517)
at io.grpc.netty.shaded.io.netty.channel.ChannelDuplexHandler.connect(ChannelDuplexHandler.java:50)
at io.grpc.netty.shaded.io.grpc.netty.WriteBufferingAndExceptionHandler.connect(WriteBufferingAndExceptionHandler.java:136)
at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeConnect(AbstractChannelHandlerContext.java:532)
at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.access$1000(AbstractChannelHandlerContext.java:38)
at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext$9.run(AbstractChannelHandlerContext.java:522)
at io.grpc.netty.shaded.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
at io.grpc.netty.shaded.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:404)
at io.grpc.netty.shaded.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:333)
at io.grpc.netty.shaded.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:905)
at io.grpc.netty.shaded.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
... 1 more
Caused by: java.net.NoRouteToHostException
... 19 more
Please let me know if somebody has successfully implemented the same or able to figure out the exact issue being faced by me.

Issues while configuring the API Subscription BPS WSO2

So, i've my WSO2 BPS 3.6.0 configured to support SSL and a custom hostname i.e. mydomain.domain.com:9445 etc. and i'm trying to implement the API Subscription Workflow by following this documentation.
Now i've performed the following steps:
set the offset of wso2 bps to 2 and it is running fine with port: 9445
edited the wsa:Address tag in bothSubscriptionService.epr and SubscriptionCallbackService.epr located in API-M_HOME/business-processes/epr
as the bps server had a customized hostname instead of localhost (not sure if performing this step was right)
SubscriptionService.epr
SubscriptionCallBackService.epr
copy-pasted the epr folder from API-M_HOME/business-processes/epr to BPS_HOME/repository/conf/epr
Added the required BPEL package and human task accordingly
Navigated to the carbon console from APIM and edited the workflow-extensions.xml, here's how it looks like
set the TaskCoordinationEnabled tag of b4p-cordination-config.xml to true located in BPS_Home\repository\conf
Consider OTHER required configurations:
At API Manager End:
site.json file located at APIM_Home\repository\deployment\server\jaggeryapps\admin\site\conf
{
"theme": {
"base": "wso2",
"subtheme": "modern"
},
"context": "/admin",
"request_url": "READ_FROM_REQUEST",
"tasksPerPage": 10,
"allowedPermission": "/permission/admin/manage/apim_admin",
"workflows": {
"workFlowServerURL": "https://mydomain.domain.com:9445/services/",
},
"ssoConfiguration": {
"enabled": "false",
"issuer": "API_WORKFLOW_ADMIN",
"identityProviderURL": "https://localhost:9443/samlsso",
"keyStorePassword": "",
"identityAlias": "",
"keyStoreName": "",
"verifyAssertionValidityPeriod": "true",
"audienceRestrictionsEnabled": "true",
"responseSigningEnabled": "true",
"assertionSigningEnabled": "true",
"assertionEncryptionEnabled": "false",
"idpInit" : "false",
"idpInitSSOURL" : "https://localhost:9443/samlsso?spEntityID=API_WORKFLOW_ADMIN",
"externalLogoutPage" : "https://localhost:9443/samlsso?slo=true"
},
"reverseProxy": {
"enabled": false,
// values true , false , "auto" - will look for X-Forwarded-* headers
"host": "sample.proxydomain.com",
// If reverse proxy do not have a domain name use IP
"context": ""
//"regContext":"" // Use only if different path is used for registry
}
}
the workflowconfiguration in api-manager.xml
At BPS end:
carbon.xml
Issue: Now whenever a user navigates to APIM Store and subscribes to any API, the subscription request is listed at the APIM Admin console. When i select APPROVE from the provided ddl and click on the COMPLETE button, the record vanishes. However, this is the error that i see at WSO2's CMD windows:
APIM's cmd window
[2017-11-09 00:13:17,022] INFO - TimeoutHandler This engine will
expire all cal lbacks after GLOBAL_TIMEOUT: 120 seconds, irrespective
of the timeout action, af ter the specified or optional timeout
[2017-11-09 00:13:17,164] ERROR - TargetHandler I/O error: Host name
verificatio n failed for host : localhost javax.net.ssl.SSLException:
Host name verification failed for host : localhost
at org.apache.synapse.transport.http.conn.ClientSSLSetupHandler.verify(C
lientSSLSetupHandler.java:171)
at org.apache.http.nio.reactor.ssl.SSLIOSession.doHandshake(SSLIOSession
.java:308)
at org.apache.http.nio.reactor.ssl.SSLIOSession.isAppInputReady(SSLIOSes
sion.java:410)
at org.apache.http.impl.nio.reactor.AbstractIODispatch.inputReady(Abstra
ctIODispatch.java:119)
at org.apache.http.impl.nio.reactor.BaseIOReactor.readable(BaseIOReactor
.java:159)
at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvent(Abstr
actIOReactor.java:338)
at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvents(Abst
ractIOReactor.java:316)
at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIO
Reactor.java:277)
at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.
java:105)
at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.
run(AbstractMultiworkerIOReactor.java:586)
at java.lang.Thread.run(Thread.java:745)
[2017-11-09 00:13:17,188] WARN - EndpointContext Endpoint : AnonymousEndpoint w
ith address
https://localhost:9443/store/site/blocks/workflow/workflow-listener/
ajax/workflow-listener.jag will be marked SUSPENDED as it failed
[2017-11-09 00:13:17,193] WARN - EndpointContext Suspending endpoint
: Anonymou sEndpoint with address
https://localhost:9443/store/site/blocks/workflow/workflo
w-listener/ajax/workflow-listener.jag - current suspend duration is :
30000ms - Next retry after : Thu Nov 09 00:13:47 EST 2017
[2017-11-0900:13:17,201] INFO - LogMediator STATUS = Executing default 'fault'
sequence, ERROR_CODE = 101500, ERROR_MESSAGE = Error in Sender
[2017-11-09 00:14:17,238] INFO - SourceHandler Writer null when
calling informW riterError [2017-11-09 00:14:17,238] WARN -
SourceHandler Connection time out after reques t is read:
http-incoming-1 Socket Timeout : 60000 Remote Address : /10.10.30.130
:49249
[2017-11-09 00:14:24,671] ERROR - AxisEngine The endpoint
reference (EPR) for th e Operation not found is
/services/WorkflowCallbackService and the WSA Action = null. If this
EPR was previously reachable, please contact the server administra
tor. org.apache.axis2.AxisFault: The endpoint reference (EPR) for the
Operation not f ound is /services/WorkflowCallbackService and the WSA
Action = null. If this EPR was previously reachable, please contact
the server administrator.
at org.apache.axis2.engine.DispatchPhase.checkPostConditions(DispatchPha
se.java:102)
at org.apache.axis2.engine.Phase.invoke(Phase.java:329)
at org.apache.axis2.engine.AxisEngine.invoke(AxisEngine.java:261)
at org.apache.axis2.engine.AxisEngine.receive(AxisEngine.java:167)
at org.apache.synapse.transport.passthru.ServerWorker.processNonEntityEn
closingRESTHandler(ServerWorker.java:325)
at org.apache.synapse.transport.passthru.ServerWorker.run(ServerWorker.j
ava:158)
at org.apache.axis2.transport.base.threads.NativeWorkerPool$1.run(Native
WorkerPool.java:172)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.
java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor
.java:617)
at java.lang.Thread.run(Thread.java:745) [2017-11-09 00:14:24,673] ERROR - ServerWorker Error processing GET request for :
/services/WorkflowCallbackService org.apache.axis2.AxisFault: The
endpoint reference (EPR) for the Operation not f ound is
/services/WorkflowCallbackService and the WSA Action = null. If this
EPR was previously reachable, please contact the server
administrator.
at org.apache.axis2.engine.DispatchPhase.checkPostConditions(DispatchPha
se.java:102)
at org.apache.axis2.engine.Phase.invoke(Phase.java:329)
at org.apache.axis2.engine.AxisEngine.invoke(AxisEngine.java:261)
at org.apache.axis2.engine.AxisEngine.receive(AxisEngine.java:167)
at org.apache.synapse.transport.passthru.ServerWorker.processNonEntityEn
closingRESTHandler(ServerWorker.java:325)
at org.apache.synapse.transport.passthru.ServerWorker.run(ServerWorker.j
ava:158)
at org.apache.axis2.transport.base.threads.NativeWorkerPool$1.run(Native
WorkerPool.java:172)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.
java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor
.java:617)
at java.lang.Thread.run(Thread.java:745)
BPS's cmd window:
[2017-11-09 00:14:16,738] ERROR {org.wso2.carbon.bpel.core.ode.integration.Partn erService} - Error
sending message to Axis2 for ODE mex {PartnerRoleMex#hqejbhc
nphrcr2a32g83oh [PID
{http://workflow.subscription.apimgt.carbon.wso2.org}Subscr
iptionApprovalWorkFlowProcess-1] calling
org.apache.ode.bpel.epr.WSAEndpoint#705 fc38f.resumeEvent(...) Status
REQUEST} org.apache.axis2.AxisFault: Read timed out
at org.apache.axis2.AxisFault.makeFault(AxisFault.java:430)
at org.apache.axis2.transport.http.HTTPSender.sendViaPost(HTTPSender.jav
a:199)
at org.apache.axis2.transport.http.HTTPSender.send(HTTPSender.java:77)
at org.apache.axis2.transport.http.CommonsHTTPTransportSender.writeMessa
geWithCommons(CommonsHTTPTransportSender.java:451)
at org.apache.axis2.transport.http.CommonsHTTPTransportSender.invoke(Com
monsHTTPTransportSender.java:278)
at org.apache.axis2.engine.AxisEngine.send(AxisEngine.java:442)
at org.apache.axis2.description.OutOnlyAxisOperationClient.executeImpl(O
utOnlyAxisOperation.java:297)
at org.apache.axis2.client.OperationClient.execute(OperationClient.java:
149)
at org.wso2.carbon.bpel.core.ode.integration.utils.AxisServiceUtils.invo
keService(AxisServiceUtils.java:323)
at org.wso2.carbon.bpel.core.ode.integration.PartnerService.invoke(Partn
erService.java:333)
at org.wso2.carbon.bpel.core.ode.integration.BPELMessageExchangeContextI
mpl.invokePartner(BPELMessageExchangeContextImpl.java:43)
at org.apache.ode.bpel.engine.BpelRuntimeContextImpl.invoke(BpelRuntimeC
ontextImpl.java:897)
at org.apache.ode.bpel.runtime.INVOKE.run(INVOKE.java:130)
at sun.reflect.GeneratedMethodAccessor54.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAcces
sorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at org.apache.ode.jacob.vpu.JacobVPU$JacobThreadImpl.run(JacobVPU.java:4
51)
at org.apache.ode.jacob.vpu.JacobVPU.execute(JacobVPU.java:139)
at org.apache.ode.bpel.engine.BpelRuntimeContextImpl.execute(BpelRuntime
ContextImpl.java:1002)
at org.apache.ode.bpel.engine.PartnerLinkMyRoleImpl.invokeInstance(Partn
erLinkMyRoleImpl.java:250)
at org.apache.ode.bpel.engine.BpelProcess$1.invoke(BpelProcess.java:288)
at org.apache.ode.bpel.engine.BpelProcess.invokeProcess(BpelProcess.java
:224)
at org.apache.ode.bpel.engine.BpelProcess.invokeProcess(BpelProcess.java
:279)
at org.apache.ode.bpel.engine.BpelProcess.handleJobDetails(BpelProcess.j
ava:434)
at org.apache.ode.bpel.engine.BpelEngineImpl.onScheduledJob(BpelEngineIm
pl.java:558)
at org.apache.ode.bpel.engine.BpelServerImpl.onScheduledJob(BpelServerIm
pl.java:467)
at org.apache.ode.scheduler.simple.SimpleScheduler$RunJob$1.call(SimpleS
cheduler.java:633)
at org.apache.ode.scheduler.simple.SimpleScheduler$RunJob$1.call(SimpleS
cheduler.java:627)
at org.apache.ode.scheduler.simple.SimpleScheduler.execTransaction(Simpl
eScheduler.java:298)
at org.apache.ode.scheduler.simple.SimpleScheduler.execTransaction(Simpl
eScheduler.java:253)
at org.apache.ode.scheduler.simple.SimpleScheduler$RunJob.call(SimpleSch
eduler.java:627)
at org.apache.ode.scheduler.simple.SimpleScheduler$RunJob.call(SimpleSch
eduler.java:611)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.
java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor
.java:617)
at java.lang.Thread.run(Thread.java:745) Caused by: java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:150)
at java.net.SocketInputStream.read(SocketInputStream.java:121)
at sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
at sun.security.ssl.InputRecord.read(InputRecord.java:503)
at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:961)
at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:918)
at sun.security.ssl.AppInputStream.read(AppInputStream.java:105)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
at org.apache.commons.httpclient.HttpParser.readRawLine(HttpParser.java:
78)
at org.apache.commons.httpclient.HttpParser.readLine(HttpParser.java:106
)
at org.apache.commons.httpclient.HttpConnection.readLine(HttpConnection.
java:1116)
at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$Http
ConnectionAdapter.readLine(MultiThreadedHttpConnectionManager.java:1413)
at org.apache.commons.httpclient.HttpMethodBase.readStatusLine(HttpMetho
dBase.java:1973)
at org.apache.commons.httpclient.HttpMethodBase.readResponse(HttpMethodB
ase.java:1735)
at org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.j
ava:1098)
at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(Htt
pMethodDirector.java:398)
at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMe
thodDirector.java:171)
at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.jav
a:397)
at org.apache.axis2.transport.http.AbstractHTTPSender.executeMethod(Abst
ractHTTPSender.java:659)
at org.apache.axis2.transport.http.HTTPSender.sendViaPost(HTTPSender.jav
a:195)
... 34 more
What could be the issue here? Any idea? do let me know. Thanks
Note that the bps workflow for API STATE CHANGE works just fine with the same configurations
Please note, that you are using calls with HTTPS with specific domain names
Host name verification failed for host : localhost at org.apache.synapse.transport.http.conn.ClientSSLSetupHandler.verify(ClientSSLSetupHandler.java:171) at org.apache.http.nio.reactor.ssl.SSLIOSession.doHandshake(SSLIOSession .java:308)
the certificate provided is CN=localhost, so indeed the host verification fails
what you can do about it
simplest way is switching to http when on secure network (behind firewall, vpn, ..)
update SSL certificates of BPS and APIM to match their hostnames and they have to trust each others certificate (or certificate issuer)
disable SSL hostname validation in axis2.xml (I do not recommend it, good for DEV, VERY BAD for PROD) - set <parameter name="HostnameVerifier">AllowAll</parameter>

Runtime exception while executing sqoop job

I am trying to execute sqoop job in biginsights. I am importing data from oracle db to hdfs. Below is the sqoop command which starts executing mapper and stops after sometime.
sudo sqoop import --connect "jdbc:oracle:thin:#(DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = ***.**.**.***)(PORT = 1908)) (CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = GPRSDB1) ) )" --username **** --password ***** --query "select distinct pr_con_id from s_org_ext where \$CONDITIONS" --where "PAR_DIVN_ID ='1-1POG9V' AND rownum< 100" --target-dir /user/bigsql/crm/s_contact_mh --fields-terminated-by ',' --m 1
Below is the error :
Error: java.lang.RuntimeException: java.lang.RuntimeException: java.sql.SQLException: The Network Adapter could not establish the connection
at org.apache.sqoop.mapreduce.db.DBInputFormat.setConf(DBInputFormat.java:167)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:746)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.RuntimeException: java.sql.SQLException: The Network Adapter could not establish the connection
at org.apache.sqoop.mapreduce.db.DBInputFormat.getConnection(DBInputFormat.java:220)
at org.apache.sqoop.mapreduce.db.DBInputFormat.setConf(DBInputFormat.java:165)
... 9 more
Caused by: java.sql.SQLException: The Network Adapter could not establish the connection
at oracle.jdbc.driver.SQLStateMapping.newSQLException(SQLStateMapping.java:70)
at oracle.jdbc.driver.DatabaseError.newSQLException(DatabaseError.java:133)
at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:199)
at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:480)
at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:413)
at oracle.jdbc.driver.PhysicalConnection.<init>(PhysicalConnection.java:508)
at oracle.jdbc.driver.T4CConnection.<init>(T4CConnection.java:203)
at oracle.jdbc.driver.T4CDriverExtension.getConnection(T4CDriverExtension.java:33)
at oracle.jdbc.driver.OracleDriver.connect(OracleDriver.java:510)
at java.sql.DriverManager.getConnection(DriverManager.java:571)
at java.sql.DriverManager.getConnection(DriverManager.java:215)
at org.apache.sqoop.mapreduce.db.DBConfiguration.getConnection(DBConfiguration.java:302)
at org.apache.sqoop.mapreduce.db.DBInputFormat.getConnection(DBInputFormat.java:213)
... 10 more
Caused by: oracle.net.ns.NetException: The Network Adapter could not establish the connection
at oracle.net.nt.ConnStrategy.execute(ConnStrategy.java:328)
at oracle.net.resolver.AddrResolution.resolveAndExecute(AddrResolution.java:421)
at oracle.net.ns.NSProtocol.establishConnection(NSProtocol.java:630)
at oracle.net.ns.NSProtocol.connect(NSProtocol.java:206)
at oracle.jdbc.driver.T4CConnection.connect(T4CConnection.java:966)
at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:292)
... 18 more
Caused by: java.net.ConnectException: Connection timed out
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at java.net.Socket.connect(Socket.java:528)
at java.net.Socket.<init>(Socket.java:425)
at java.net.Socket.<init>(Socket.java:208)
at oracle.net.nt.TcpNTAdapter.connect(TcpNTAdapter.java:127)
at oracle.net.nt.ConnOption.connect(ConnOption.java:126)
at oracle.net.nt.ConnStrategy.execute(ConnStrategy.java:306)
... 23 more
Please help me out to solve this issue. Thanks in advance.
SQOOP Eval function just acts on the RDBMS database and returns you the resultset. Here hadoop does not come into picture.
SQOOP Import function tries to import the data from RDBMS and load it into HDFS. For Sqoop Import to work as desired the db server should be reachable from all nodes in the cluster.

wso2 bam2.4 connect to external cassandra failed by .AuthenticationException

I think I have followed all the examples what I can find,
and I am working on bam2.4.1, the apache cassandra is 2.1.2.
I have configured casandra to disable security by setting cassandra.yaml
authenticator: AllowAllAuthenticator
authorizer: AllowAllAuthorizer
Then I follow bam2.4.1 document to connect to external cassandra, then start wso2 bam by
wso2server.bat -Ddisable.cassandra.server.startup=true
All most everything works except authentication
[2014-11-25 20:39:41,295] ERROR {org.wso2.carbon.bam.notification.task.Notificat
ionDispatchTask} - Error executing notification dispatch task: Access denied fo
r user wso2bam to login TCP,192.168.1.7:7611,TCP,192.168.1.7:7711
org.wso2.carbon.databridge.commons.exception.AuthenticationException: Access den
ied for user wso2bam to login TCP,192.168.1.7:7611,TCP,192.168.1.7:7711
at org.wso2.carbon.databridge.agent.thrift.internal.publisher.authentica
tor.AgentAuthenticator.connect(AgentAuthenticator.java:54)
at org.wso2.carbon.databridge.agent.thrift.DataPublisher.start(DataPubli
sher.java:273)
at org.wso2.carbon.databridge.agent.thrift.DataPublisher.<init>(DataPubl
isher.java:211)
at org.wso2.carbon.bam.notification.task.NotificationDispatchTask.initPu
blisherKS(NotificationDispatchTask.java:100)
at org.wso2.carbon.bam.notification.task.NotificationDispatchTask.execut
e(NotificationDispatchTask.java:210)
at org.wso2.carbon.ntask.core.impl.TaskQuartzJobAdapter.execute(TaskQuar
tzJobAdapter.java:67)
at org.quartz.core.JobRunShell.run(JobRunShell.java:213)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:47
1)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.
java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor
.java:603)
at java.lang.Thread.run(Thread.java:722)
Caused by: org.wso2.carbon.databridge.agent.thrift.exception.AgentAuthenticatorE
xception: Thrift exception
at org.wso2.carbon.databridge.agent.thrift.internal.publisher.authentica
tor.ThriftAgentAuthenticator.connect(ThriftAgentAuthenticator.java:51)
at org.wso2.carbon.databridge.agent.thrift.internal.publisher.authentica
tor.AgentAuthenticator.connect(AgentAuthenticator.java:51)
... 12 more
I found some more exception like
Caused by: org.apache.thrift.transport.TTransportException: java.net.SocketExcep
tion: Connection closed by remote host
at org.apache.thrift.transport.TIOStreamTransport.write(TIOStreamTranspo
rt.java:147)
at org.apache.thrift.protocol.TBinaryProtocol.writeI32(TBinaryProtocol.j
ava:163)
at org.apache.thrift.protocol.TBinaryProtocol.writeMessageBegin(TBinaryP
rotocol.java:91)
at org.apache.thrift.TServiceClient.sendBase(TServiceClient.java:62)
at org.wso2.carbon.databridge.commons.thrift.service.secure.ThriftSecure
EventTransmissionService$Client.send_connect(ThriftSecureEventTransmissionServic
e.java:82)
at org.wso2.carbon.databridge.commons.thrift.service.secure.ThriftSecure
EventTransmissionService$Client.connect(ThriftSecureEventTransmissionService.jav
a:73)
at org.wso2.carbon.databridge.agent.thrift.internal.publisher.authentica
tor.ThriftAgentAuthenticator.connect(ThriftAgentAuthenticator.java:47)
... 13 more
Caused by: java.net.SocketException: Connection closed by remote host
at sun.security.ssl.SSLSocketImpl.checkWrite(SSLSocketImpl.java:1506)
at sun.security.ssl.AppOutputStream.write(AppOutputStream.java:70)
at org.apache.thrift.transport.TIOStreamTransport.write(TIOStreamTranspo
rt.java:145)
... 19 more

java.io.IOException: Broken pipe on increasing number of mappers/reducers, a lot

I am running MapReduce job on a hadoop cluster of 6 nodes with 4 map tasks and 10 reduce tasks configured.
Mapper/Reducer fails a lot on increasing number of map/reduce tasks as below,
I encounter the following error:
stderr logs
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 143
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:362)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:576)
at org.apache.hadoop.streaming.PipeReducer.reduce(PipeReducer.java:130)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:519)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
and this:
syslog logs
2014-03-01 15:11:30,118 WARN org.apache.hadoop.streaming.PipeMapRed: java.io.IOException: Broken pipe
at java.io.FileOutputStream.writeBytes(Native Method)
at java.io.FileOutputStream.write(FileOutputStream.java:260)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
at java.io.DataOutputStream.flush(DataOutputStream.java:106)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:569)
at org.apache.hadoop.streaming.PipeReducer.reduce(PipeReducer.java:130)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:519)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
2014-03-01 15:11:30,118 INFO org.apache.hadoop.streaming.PipeMapRed: PipeMapRed failed!
2014-03-01 15:11:30,121 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
2014-03-01 15:11:30,146 INFO org.apache.hadoop.io.nativeio.NativeIO: Initialized cache for UID to User mapping with a cache timeout of 14400 seconds.
2014-03-01 15:11:30,146 INFO org.apache.hadoop.io.nativeio.NativeIO: Got UserName hduser for UID 1001 from the native implementation
2014-03-01 15:11:30,147 WARN org.apache.hadoop.mapred.Child: Error running child
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 143
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:362)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:576)
at org.apache.hadoop.streaming.PipeReducer.reduce(PipeReducer.java:130)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:519)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
2014-03-01 15:11:30,149 WARN org.apache.hadoop.mapred.Task: Parent died. Exiting attempt_201402281751_0042_r_000004_0
2014-03-01 15:11:31,252 INFO org.apache.hadoop.streaming.PipeMapRed: Records R/W=983976/1957694
Even the simplest program is creating this problem:
mapper.py
#!/usr/bin/env python
import sys
for line in sys.stdin:
if line:
print "%s\n%s"%(line, line)
reducer.py
#!/usr/bin/env python
import sys
for line in sys.stdin:
if line:
print "%s"%(line)
I am using the following command to run hadoop.
hadoop jar /usr/local/hadoop/contrib/streaming/hadoop-streaming-1.0.3.jar -D mapred.reduce.tasks=10 -file /home/hduser/code/K1D/code1/mapper2.py -mapper mapper2.py -file /home/hduser/code/K1D/code1/reducer2.py -reducer reducer2.py -input /user/hduser/data-out/part-00000 -output /user/hduser/data-out1 -partitioner org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner
Can you suggest anything?
I ran into something similar and realized that I didn't have the exact version of python installed into one of my data nodes. I had
#!/usr/bin/env python
which I changed to
#!/usr/bin/env python2.7