SWF workflow causing START_TO_CLOSE timeout for selected workflows - amazon-web-services

I am using SWF for one of my applications, where I use ExponentialRetry to poll continuously for an activity result. However, sometimes the activity hits a START_TO_CLOSE timeout. This does not happen for all my workflows; it happens roughly 1 in 30 times, which makes it difficult to debug/reproduce. From the decider's logs I can see the following. Could someone explain what the issue might be?
com.amazon.metrics.swf.AwsSwfMetricsRequestHandler: An error was thrown in previous decision in the thread <SWF Decider AWSFuncTaskList_110.0 7>, task-token <AAAAKgAAAAIAAAAAAAAAAgwtVps42342343434343A3hvItwKY0Vav/Kpexk2cat5fsWkiN1SxhfeoRwVgl+F2/EZQrhBP4RoA41LmLLC77WLU26uSMXaVnl+Cz64x+RZP0sBzofJWdAOdiwHAzsePFNQETXfyl+HibRiYxxO4Xyxn8ndVQ50f97W3IKkwrO7mySJSXbpe6Yaw/AiPmi4f6VoqQo/+nhRSEbzQpKNQeZAaCcAB/6oxEKOgYbW75AF9JsPbZEOdYE7Kq2JVjyghP2id9xAGKgj3ww3d1UBoRFxlulSUsNJmlpgR2+HPyWDHZKF7ECw==>, workflow-execution <RiskAnalysis-313434142331#22NB47i321UtA7w9dPnUTmmtKMeIP1DWrepdAJb0WdGqc=>, domain <Prod>, workflow-type <RiskAnalysisWF#1.7>
at com.amazon.metrics.swf.DecisionsMetricsExtractor.internalHandlePollForDecisionTask(DecisionsMetricsExtractor.java:183)
at com.amazon.metrics.swf.DecisionsMetricsExtractor.handlePollForDecisionTask(DecisionsMetricsExtractor.java:168)
at com.amazon.metrics.swf.AwsSwfMetricsRequestHandler.handlePollForDecisionTask(AwsSwfMetricsRequestHandler.java:508)
at com.amazon.metrics.swf.AwsSwfMetricsRequestHandler.extractMetrics(AwsSwfMetricsRequestHandler.java:362)
at com.amazon.metrics.sdk.AwsSdkMetricsRequestHandler.handleCall(AwsSdkMetricsRequestHandler.java:218)
at com.amazon.metrics.sdk.AwsSdkMetricsRequestHandler.afterResponse(AwsSdkMetricsRequestHandler.java:196)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.afterResponse(AmazonHttpClient.java:975)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:746)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:717)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:699)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:667)
at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:649)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:513)
at com.amazonaws.services.simpleworkflow.AmazonSimpleWorkflowClient.doInvoke(AmazonSimpleWorkflowClient.java:3390)
at com.amazonaws.services.simpleworkflow.AmazonSimpleWorkflowClient.invoke(AmazonSimpleWorkflowClient.java:3366)
at com.amazonaws.services.simpleworkflow.AmazonSimpleWorkflowClient.executePollForDecisionTask(AmazonSimpleWorkflowClient.java:2112)
at com.amazonaws.services.simpleworkflow.AmazonSimpleWorkflowClient.pollForDecisionTask(AmazonSimpleWorkflowClient.java:2088)
at com.amazonaws.services.simpleworkflow.flow.worker.DecisionTaskPoller.poll(DecisionTaskPoller.java:191)
at com.amazonaws.services.simpleworkflow.flow.worker.DecisionTaskPoller.access$000(DecisionTaskPoller.java:39)
at com.amazonaws.services.simpleworkflow.flow.worker.DecisionTaskPoller$DecisionTaskIterator.next(DecisionTaskPoller.java:71)
at com.amazonaws.services.simpleworkflow.flow.worker.DecisionTaskPoller$DecisionTaskIterator.next(DecisionTaskPoller.java:45)
at com.amazonaws.services.simpleworkflow.flow.worker.HistoryHelper$EventsIterator.<init>(HistoryHelper.java:269)
at com.amazonaws.services.simpleworkflow.flow.worker.HistoryHelper$SingleDecisionEventsIterator.<init>(HistoryHelper.java:74)
at com.amazonaws.services.simpleworkflow.flow.worker.HistoryHelper.<init>(HistoryHelper.java:318)
at com.amazonaws.services.simpleworkflow.flow.worker.AsyncDecisionTaskHandler.handleDecisionTask(AsyncDecisionTaskHandler.java:73)
at com.amazonaws.services.simpleworkflow.flow.worker.DecisionTaskPoller.pollAndProcessSingleTask(DecisionTaskPoller.java:223)
at com.amazonaws.services.simpleworkflow.flow.worker.GenericWorker$PollServiceTask.run(GenericWorker.java:85)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
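For reference, the retry is declared roughly like this (a simplified sketch; the activity interface, method names, and timeout values are placeholders, not my exact code):

import com.amazonaws.services.simpleworkflow.flow.annotations.Activities;
import com.amazonaws.services.simpleworkflow.flow.annotations.ActivityRegistrationOptions;
import com.amazonaws.services.simpleworkflow.flow.annotations.ExponentialRetry;

@Activities(version = "1.0")
@ActivityRegistrationOptions(defaultTaskScheduleToStartTimeoutSeconds = 60,
        defaultTaskStartToCloseTimeoutSeconds = 300)
public interface RiskAnalysisActivities {
    // Retried with exponential backoff until a result is available.
    @ExponentialRetry(initialRetryIntervalSeconds = 5, maximumAttempts = 30)
    String pollForResult(String requestId);
}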

All activities have a startToCloseTimeout when they are scheduled (by the decider). If it is not explicitly specified, they use the default from the versioned ActivityType registered in SWF for that activity. You are hitting the timeout because your retries push the activity past that configured limit. If you think you need more time, then when the activity is scheduled you will need to specify a larger startToCloseTimeout in the ScheduleActivityTask decision.
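For example, with the AWS Flow Framework for Java the override can be passed through ActivitySchedulingOptions when the activity is invoked. This is a minimal sketch: the workflow interface, the generated activities client, and the activity method are hypothetical stand-ins for your own, and the timeout values are illustrative only.

import com.amazonaws.services.simpleworkflow.flow.ActivitySchedulingOptions;
import com.amazonaws.services.simpleworkflow.flow.core.Promise;

public class RiskAnalysisWorkflowImpl implements RiskAnalysisWorkflow {
    // "RiskAnalysisActivitiesClient" stands in for your generated activities client.
    private final RiskAnalysisActivitiesClient client = new RiskAnalysisActivitiesClientImpl();

    @Override
    public void execute(String input) {
        ActivitySchedulingOptions options = new ActivitySchedulingOptions();
        options.setStartToCloseTimeoutSeconds(600L);     // larger than the ActivityType default
        options.setScheduleToCloseTimeoutSeconds(900L);  // overall budget, including queue time
        // Generated clients accept a per-invocation ActivitySchedulingOptions override.
        Promise<String> result = client.analyzeRisk(Promise.asPromise(input), options);
    }
}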

Related

cfnUpdate of pipeline-aws-plugin throws WaiterUnrecoverableException

When trying to run cfnUpdate from pipeline-aws-plugin I get a WaiterUnrecoverableException, but when I create the stack through the Amazon console it is created without problems.
Details:
Plugin version: Pipeline: AWS Steps 1.27
I'm trying to execute:
cfnUpdate(stack:"${stack}", url:"${urlTemplate}", params: 'roleName':"${roleName}",'bucket':"${bucket}",'pathS3':"${pathS3}",'handler':"${handler}"],timeoutInMinutes:10)
where:
${stack} is the name of the stack
${urlTemplate} is a link to a template stored in S3
It throws this in the Jenkins log:
com.amazonaws.waiters.WaiterUnrecoverableException: Resource never entered the desired state as it failed.
at com.amazonaws.waiters.WaiterExecution.pollResource(WaiterExecution.java:78)
at com.amazonaws.waiters.WaiterImpl.run(WaiterImpl.java:88)
at com.amazonaws.waiters.WaiterImpl$1.call(WaiterImpl.java:110)
at com.amazonaws.waiters.WaiterImpl$1.call(WaiterImpl.java:106)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused: java.util.concurrent.ExecutionException
at org.apache.http.concurrent.BasicFuture.getResult(BasicFuture.java:71)
at org.apache.http.concurrent.BasicFuture.get(BasicFuture.java:84)
at de.taimos.pipeline.aws.cloudformation.EventPrinter.waitAndPrintEvents(EventPrinter.java:135)
at de.taimos.pipeline.aws.cloudformation.EventPrinter.waitAndPrintStackEvents(EventPrinter.java:92)
at de.taimos.pipeline.aws.cloudformation.CloudFormationStack.create(CloudFormationStack.java:119)
at de.taimos.pipeline.aws.cloudformation.CFNUpdateStep$Execution.whenStackMissing(CFNUpdateStep.java:125)
at de.taimos.pipeline.aws.cloudformation.AbstractCFNCreateStep$Execution$1.run(AbstractCFNCreateStep.java:137)
As a reference, my template is similar to this one:
Cloudformation Template
Can someone help me with this or recommend an adjustment?
Regards
You are closing the "params" map, but you are not opening it. In a Pipeline (Groovy) step call, params: expects a map literal delimited by [ and ], so the opening bracket is required; without it the named parameters are malformed and the stack never gets the values it needs. Try this:
cfnUpdate(stack:"${stack}", url:"${urlTemplate}", params: ['roleName':"${roleName}",'bucket':"${bucket}",'pathS3':"${pathS3}",'handler':"${handler}"],timeoutInMinutes:10)

Unable to resolve akka.pattern.AskTimeoutException: Ask timed out

I am trying to replace the default LevelDB persistence in OpenDaylight with Apache Ignite, which I am unable to do after making changes to the akka.conf file and deploying the akka-persistence-ignite jar that I found here: https://github.com/Romeh/akka-persistance-ignite
I am facing an issue in the following line of the source code (the AbstractDataStoreClientActor class), where it throws a RuntimeException:
private static final Function1<ActorRef, ?> GET_CLIENT_FACTORY = ExplicitAsk.toScala(GetClientRequest::new);

@SuppressWarnings("checkstyle:IllegalCatch")
public static DataStoreClient getDistributedDataStoreClient(@Nonnull final ActorRef actor,
        final long timeout, final TimeUnit unit) {
    return (DataStoreClient) Await.result(ExplicitAsk.ask(actor, GET_CLIENT_FACTORY,
            Timeout.apply(timeout, unit)), Duration.Inf());
}
which gives the following error:
akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://opendaylight-cluster-data/user/$a#-809157907]] after [30000 ms]. Sender[null] sent message of type "org.opendaylight.controller.cluster.databroker.actors.dds.GetClientRequest".
My question is: how can I know the behavior of the actor to which the above message is sent? Is there any way to check whether the actor has been created properly? What could be the reason the ask is timing out?
EDIT: error stack trace from karaf.log:
2018-07-12T11:27:01,755 | ERROR | opendaylight-cluster-data-akka.actor.default-dispatcher-18 | DistributedDataStoreClientActor | 90 - com.typesafe.akka.slf4j - 2.5.11 | Persistence failure when replaying events for persistenceId [member-1-frontend-datastore-config]. Last known sequence number [0]
java.lang.NullPointerException: null
at akka.japi.Util$.option(JavaAPI.scala:271) ~[84:com.typesafe.akka.actor:2.5.11]
at akka.persistence.snapshot.japi.SnapshotStore.$anonfun$loadAsync$1(SnapshotStore.scala:20) ~[87:com.typesafe.akka.persistence:2.5.11]
at scala.util.Success.$anonfun$map$1(Try.scala:251) ~[323:org.scala-lang.scala-library:2.12.5.v20180316-130912-VFINAL-30a1428]
at scala.util.Success.map(Try.scala:209) ~[323:org.scala-lang.scala-library:2.12.5.v20180316-130912-VFINAL-30a1428]
at scala.concurrent.Future.$anonfun$map$1(Future.scala:288) ~[323:org.scala-lang.scala-library:2.12.5.v20180316-130912-VFINAL-30a1428]
at scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:29) ~[323:org.scala-lang.scala-library:2.12.5.v20180316-130912-VFINAL-30a1428]
at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:29) ~[323:org.scala-lang.scala-library:2.12.5.v20180316-130912-VFINAL-30a1428]
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:60) ~[323:org.scala-lang.scala-library:2.12.5.v20180316-130912-VFINAL-30a1428]
at akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55) ~[84:com.typesafe.akka.actor:2.5.11]
at akka.dispatch.BatchingExecutor$BlockableBatch.$anonfun$run$1(BatchingExecutor.scala:91) ~[84:com.typesafe.akka.actor:2.5.11]
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:12) [323:org.scala-lang.scala-library:2.12.5.v20180316-130912-VFINAL-30a1428]
at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:81) [323:org.scala-lang.scala-library:2.12.5.v20180316-130912-VFINAL-30a1428]
at akka.dispatch.BatchingExecutor$BlockableBatch.run(BatchingExecutor.scala:91) [84:com.typesafe.akka.actor:2.5.11]
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40) [84:com.typesafe.akka.actor:2.5.11]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:?]
at java.lang.Thread.run(Thread.java:748) [?:?]
The issue is not with the DistributedDataStoreClientActor - it is a side effect of an issue with the persistence backend; see my previous comment. Notice that the error stack trace contains an NPE emanating from akka.persistence.snapshot.japi.SnapshotStore, which indicates the backing snapshot store unexpectedly returned null from loadAsync. This points to the Ignite plugin.
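To make the contract concrete, here is a hedged sketch against the akka-persistence 2.5 Java API (the class and its lookup method are hypothetical; the real Ignite plugin code will differ). loadAsync must complete with Optional.empty() when no snapshot exists; completing the future with null is what produces the NPE in akka.japi.Util.option seen above.

import java.util.Optional;
import akka.dispatch.Futures;
import akka.persistence.SelectedSnapshot;
import akka.persistence.SnapshotMetadata;
import akka.persistence.SnapshotSelectionCriteria;
import akka.persistence.snapshot.japi.SnapshotStore;
import scala.concurrent.Future;

public class NullSafeSnapshotStore extends SnapshotStore {
    @Override
    public Future<Optional<SelectedSnapshot>> doLoadAsync(String persistenceId,
            SnapshotSelectionCriteria criteria) {
        SelectedSnapshot found = lookup(persistenceId, criteria);
        // An absent snapshot must surface as Optional.empty(), never as null.
        return Futures.successful(Optional.ofNullable(found));
    }

    @Override
    public Future<Void> doSaveAsync(SnapshotMetadata metadata, Object snapshot) {
        return Futures.successful(null); // persist the snapshot here
    }

    @Override
    public Future<Void> doDeleteAsync(SnapshotMetadata metadata) {
        return Futures.successful(null); // delete one snapshot here
    }

    @Override
    public Future<Void> doDeleteAsync(String persistenceId, SnapshotSelectionCriteria criteria) {
        return Futures.successful(null); // delete matching snapshots here
    }

    private SelectedSnapshot lookup(String persistenceId, SnapshotSelectionCriteria criteria) {
        return null; // placeholder: query the backing store (e.g. an Ignite cache) here
    }
}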

"Response is committed, can't handle exception" Error When sending the same Rest API request twice

I am using a Teiid virtual procedure to create a REST API and expose my data. I have enabled result-set caching using cache hints. When I send the same API request twice, I get no data on the second attempt, and the Teiid console logs the exception below. However, when caching is disabled, or when I send the second request after waiting for the cache to be invalidated (after the TTL), the requests are executed properly and I get the relevant response. Another important observation: when the response size is limited (e.g. using a LIMIT clause to cap it at 10 records), requests are served properly even with caching enabled. The problem appears only once the number of records exceeds a particular size (in my case 15).
Can someone explain the reason behind this, and suggest any fixes or workarounds, so I can continue to use result-set caching without this issue?
05:04:52,909 ERROR [io.undertow.request] (default task-20) UT005023: Exception handling request to /TestView_1/report/get_data: org.jboss.resteasy.spi.UnhandledException: RESTEASY003770: Response is committed, can't handle exception
at org.jboss.resteasy.core.SynchronousDispatcher.writeException(SynchronousDispatcher.java:167)
at org.jboss.resteasy.core.SynchronousDispatcher.writeResponse(SynchronousDispatcher.java:471)
at org.jboss.resteasy.core.SynchronousDispatcher.invoke(SynchronousDispatcher.java:415)
at org.jboss.resteasy.core.SynchronousDispatcher.invokePropagateNotFound(SynchronousDispatcher.java:240)
at org.jboss.resteasy.plugins.server.servlet.ServletContainerDispatcher.service(ServletContainerDispatcher.java:225)
at org.jboss.resteasy.plugins.server.servlet.FilterDispatcher.doFilter(FilterDispatcher.java:62)
at io.undertow.servlet.core.ManagedFilter.doFilter(ManagedFilter.java:60)
at io.undertow.servlet.handlers.FilterHandler$FilterChainImpl.doFilter(FilterHandler.java:131)
at io.undertow.servlet.handlers.FilterHandler.handleRequest(FilterHandler.java:84)
at io.undertow.servlet.handlers.security.ServletSecurityRoleHandler.handleRequest(ServletSecurityRoleHandler.java:62)
at io.undertow.servlet.handlers.ServletDispatchingHandler.handleRequest(ServletDispatchingHandler.java:36)
at org.wildfly.extension.undertow.security.SecurityContextAssociationHandler.handleRequest(SecurityContextAssociationHandler.java:78)
at io.undertow.server.handlers.PredicateHandler.handleRequest(PredicateHandler.java:43)
at io.undertow.servlet.handlers.security.SSLInformationAssociationHandler.handleRequest(SSLInformationAssociationHandler.java:131)
at io.undertow.servlet.handlers.security.ServletAuthenticationCallHandler.handleRequest(ServletAuthenticationCallHandler.java:57)
at io.undertow.server.handlers.PredicateHandler.handleRequest(PredicateHandler.java:43)
at io.undertow.security.handlers.AbstractConfidentialityHandler.handleRequest(AbstractConfidentialityHandler.java:46)
at io.undertow.servlet.handlers.security.ServletConfidentialityConstraintHandler.handleRequest(ServletConfidentialityConstraintHandler.java:64)
at io.undertow.security.handlers.AuthenticationMechanismsHandler.handleRequest(AuthenticationMechanismsHandler.java:60)
at io.undertow.servlet.handlers.security.CachedAuthenticatedSessionHandler.handleRequest(CachedAuthenticatedSessionHandler.java:77)
at io.undertow.security.handlers.NotificationReceiverHandler.handleRequest(NotificationReceiverHandler.java:50)
at io.undertow.security.handlers.AbstractSecurityContextAssociationHandler.handleRequest(AbstractSecurityContextAssociationHandler.java:43)
at io.undertow.server.handlers.PredicateHandler.handleRequest(PredicateHandler.java:43)
at org.wildfly.extension.undertow.security.jacc.JACCContextIdHandler.handleRequest(JACCContextIdHandler.java:61)
at io.undertow.server.handlers.PredicateHandler.handleRequest(PredicateHandler.java:43)
at io.undertow.server.handlers.PredicateHandler.handleRequest(PredicateHandler.java:43)
at io.undertow.servlet.handlers.ServletInitialHandler.handleFirstRequest(ServletInitialHandler.java:284)
at io.undertow.servlet.handlers.ServletInitialHandler.dispatchRequest(ServletInitialHandler.java:263)
at io.undertow.servlet.handlers.ServletInitialHandler.access$000(ServletInitialHandler.java:81)
at io.undertow.servlet.handlers.ServletInitialHandler$1.handleRequest(ServletInitialHandler.java:174)
at io.undertow.server.Connectors.executeRootHandler(Connectors.java:202)
at io.undertow.server.HttpServerExchange$1.run(HttpServerExchange.java:793)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: already removed
at org.teiid.common.buffer.FileStore.checkRemoved(FileStore.java:162)
at org.teiid.common.buffer.FileStore.read(FileStore.java:156)
at org.teiid.common.buffer.FileStore$1.nextBuffer(FileStore.java:223)
at org.teiid.common.buffer.ExtensibleBufferedInputStream.ensureBytes(ExtensibleBufferedInputStream.java:42)
at org.teiid.common.buffer.ExtensibleBufferedInputStream.read(ExtensibleBufferedInputStream.java:54)
at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
at java.io.InputStreamReader.read(InputStreamReader.java:184)
at java.io.Reader.read(Reader.java:100)
at org.teiid.core.util.ReaderInputStream.read(ReaderInputStream.java:94)
at org.teiid.core.util.ObjectConverterUtil.write(ObjectConverterUtil.java:106)
at org.teiid.core.util.ObjectConverterUtil.write(ObjectConverterUtil.java:143)
at org.teiid.core.util.ObjectConverterUtil.write(ObjectConverterUtil.java:139)
at org.teiid.jboss.rest.TeiidRSProvider$1.write(TeiidRSProvider.java:72)
at org.jboss.resteasy.plugins.providers.StreamingOutputProvider.writeTo(StreamingOutputProvider.java:32)
at org.jboss.resteasy.plugins.providers.StreamingOutputProvider.writeTo(StreamingOutputProvider.java:17)
at org.jboss.resteasy.core.interception.AbstractWriterInterceptorContext.writeTo(AbstractWriterInterceptorContext.java:131)
at org.jboss.resteasy.core.interception.ServerWriterInterceptorContext.writeTo(ServerWriterInterceptorContext.java:60)
at org.jboss.resteasy.core.interception.AbstractWriterInterceptorContext.proceed(AbstractWriterInterceptorContext.java:120)
at org.jboss.resteasy.security.doseta.DigitalSigningInterceptor.aroundWriteTo(DigitalSigningInterceptor.java:145)
at org.jboss.resteasy.core.interception.AbstractWriterInterceptorContext.proceed(AbstractWriterInterceptorContext.java:124)
at org.jboss.resteasy.plugins.interceptors.encoding.GZIPEncodingInterceptor.aroundWriteTo(GZIPEncodingInterceptor.java:100)
at org.jboss.resteasy.core.interception.AbstractWriterInterceptorContext.proceed(AbstractWriterInterceptorContext.java:124)
at org.jboss.resteasy.core.ServerResponseWriter.writeNomapResponse(ServerResponseWriter.java:98)
at org.jboss.resteasy.core.SynchronousDispatcher.writeResponse(SynchronousDispatcher.java:466)
... 33 more
I was able to find the solution for this. I tried the JDBC client as @ramesh mentioned, but the issue persisted.
The issue occurs for both XML and JSON responses retrieved from the REST API.
It only happens when the response is larger than 4000 characters, which is the default string-length limit in Teiid. I increased this limit via System Properties in the management console and restarted the Teiid cluster:

property                    value    boot-time
org.teiid.maxStringLength   200000   true

This solved the empty cache response issue.
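If you prefer scripting the change instead of using the console, a hedged equivalent (assuming a WildFly/EAP-based Teiid server; verify the syntax against your version) is to add the system property via jboss-cli and then restart, since the property is read at boot:

/system-property=org.teiid.maxStringLength:add(value=200000)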

storm.kafka.UpdateOffsetException - Issue with Opaque Trident Kafka Spout

I'm using a Trident topology with OpaqueTridentKafkaSpout. Here is the snippet of the TridentKafkaConfig I'm using:
OpaqueTridentKafkaSpout kafkaSpout = null;
TridentKafkaConfig spoutConfig = new TridentKafkaConfig(new ZkHosts("xxx.x.x.9:2181,xxx.x.x.1:2181,xxx.x.x.2:2181"), "topic_name");
spoutConfig.scheme = new SchemeAsMultiScheme(new StringScheme());
spoutConfig.fetchSizeBytes = 147483600;
kafkaSpout = new OpaqueTridentKafkaSpout(spoutConfig);
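For completeness, the spout is then attached to the topology in the usual way (a hedged sketch; the stream name is a placeholder - Trident uses it as the transactional id under which offset metadata is stored in ZooKeeper):

import storm.trident.Stream;
import storm.trident.TridentTopology;

TridentTopology topology = new TridentTopology();
// "kafka-stream" is the txId Trident uses for offset state in ZooKeeper
Stream stream = topology.newStream("kafka-stream", kafkaSpout);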
I get this runtime exception from one of the workers:
java.lang.RuntimeException: storm.kafka.UpdateOffsetException
at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:135)
at backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:106)
at backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:80)
at backtype.storm.daemon.executor$fn__5694$fn__5707$fn__5758.invoke(executor.clj:819)
at backtype.storm.util$async_loop$fn__545.invoke(util.clj:479)
at clojure.lang.AFn.run(AFn.java:22)
at java.lang.Thread.run(Thread.java:745)
Caused by: storm.kafka.UpdateOffsetException
at storm.kafka.KafkaUtils.fetchMessages(KafkaUtils.java:186)
at storm.kafka.trident.TridentKafkaEmitter.fetchMessages(TridentKafkaEmitter.java:132)
at storm.kafka.trident.TridentKafkaEmitter.doEmitNewPartitionBatch(TridentKafkaEmitter.java:113)
at storm.kafka.trident.TridentKafkaEmitter.failFastEmitNewPartitionBatch(TridentKafkaEmitter.java:72)
at storm.kafka.trident.TridentKafkaEmitter.emitNewPartitionBatch(TridentKafkaEmitter.java:79)
at storm.kafka.trident.TridentKafkaEmitter.access$000(TridentKafkaEmitter.java:46)
at storm.kafka.trident.TridentKafkaEmitter$1.emitPartitionBatch(TridentKafkaEmitter.java:204)
at storm.kafka.trident.TridentKafkaEmitter$1.emitPartitionBatch(TridentKafkaEmitter.java:194)
at storm.trident.spout.OpaquePartitionedTridentSpoutExecutor$Emitter.emitBatch(OpaquePartitionedTridentSpoutExecutor.java:127)
at storm.trident.spout.TridentSpoutExecutor.execute(TridentSpoutExecutor.java:82)
at storm.trident.topology.TridentBoltExecutor.execute(TridentBoltExecutor.java:370)
at backtype.storm.daemon.executor$fn__5694$tuple_action_fn__5696.invoke(executor.clj:690)
at backtype.storm.daemon.executor$mk_task_receiver$fn__5615.invoke(executor.clj:436)
at backtype.storm.disruptor$clojure_handler$reify__5189.onEvent(disruptor.clj:58)
at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:127)
... 6 more
As per some posts, I have tried these spoutConfig settings:
spoutConfig.maxOffsetBehind = Long.MAX_VALUE;
spoutConfig.startOffsetTime = kafka.api.OffsetRequest.EarliestTime();
My Kafka retention time is the default of 168 hours, i.e. 7 days, and the Kafka producer is sending 6800 messages/second to the Storm/Trident topology. I have gone through most of the related posts, but none of them seems to solve this issue. What's the best way to handle it?
I still don't know what caused this issue, but basically we did not shut down Storm, ZooKeeper, and Kafka properly. This caused Storm topologies to fail, and we had to tear down the entire cluster and rebuild it. Updating to Storm 0.10.0 helped fix some of the other issues.

WSO2 BPS: No such channel error

I'm using WSO2 BPS 2.1.2 to run a simple BPEL process with three invokes called one after another in a loop. The loop runs around one hundred times. The problem is that sometimes the process hangs in the running state. In the logs I get this error:
[2013-03-25 14:44:17,897] ERROR - BpelEngineImpl - Scheduled job failed; jobDetail=JobDetails( instanceId: 14109433 mexId: null processId: null type: TIMER channel: 11513 correlatorId: null correlationKeySet: null retryCount: null inMem: false detailsExt: {})
java.lang.IllegalArgumentException: No such channel; id=11513
at org.apache.ode.jacob.vpu.ExecutionQueueImpl.findChannelFrame(ExecutionQueueImpl.java:205)
at org.apache.ode.jacob.vpu.ExecutionQueueImpl.consumeExport(ExecutionQueueImpl.java:232)
at org.apache.ode.jacob.vpu.JacobVPU$JacobThreadImpl.importChannel(JacobVPU.java:369)
at org.apache.ode.jacob.JacobObject.importChannel(JacobObject.java:47)
at org.apache.ode.bpel.engine.BpelRuntimeContextImpl$5.run(BpelRuntimeContextImpl.java:964)
at sun.reflect.GeneratedMethodAccessor44.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
at java.lang.reflect.Method.invoke(Method.java:611)
at org.apache.ode.jacob.vpu.JacobVPU$JacobThreadImpl.run(JacobVPU.java:451)
at org.apache.ode.jacob.vpu.JacobVPU.execute(JacobVPU.java:139)
at org.apache.ode.bpel.engine.BpelRuntimeContextImpl.execute(BpelRuntimeContextImpl.java:879)
at org.apache.ode.bpel.engine.BpelRuntimeContextImpl.timerEvent(BpelRuntimeContextImpl.java:968)
at org.apache.ode.bpel.engine.BpelProcess.handleJobDetails(BpelProcess.java:478)
at org.apache.ode.bpel.engine.BpelEngineImpl.onScheduledJob(BpelEngineImpl.java:560)
at org.apache.ode.bpel.engine.BpelServerImpl.onScheduledJob(BpelServerImpl.java:445)
at org.apache.ode.scheduler.simple.SimpleScheduler$RunJob$1.call(SimpleScheduler.java:537)
at org.apache.ode.scheduler.simple.SimpleScheduler$RunJob$1.call(SimpleScheduler.java:531)
at org.apache.ode.scheduler.simple.SimpleScheduler.execTransaction(SimpleScheduler.java:284)
at org.apache.ode.scheduler.simple.SimpleScheduler.execTransaction(SimpleScheduler.java:239)
at org.apache.ode.scheduler.simple.SimpleScheduler$RunJob.call(SimpleScheduler.java:531)
at org.apache.ode.scheduler.simple.SimpleScheduler$RunJob.call(SimpleScheduler.java:515)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:314)
at java.util.concurrent.FutureTask.run(FutureTask.java:149)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:897)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:919)
at java.lang.Thread.run(Thread.java:738)
I can't find any useful information about this error. I'm using an Oracle database. I tried modifying bps.xml with:
<tns:OpenJPAConfig>
<tns:property name="openjpa.FlushBeforeQueries" value="true"/>
<!-- added this line as for https://wso2.org/jira/browse/CARBON-7500 (use also Oracle 11g Driver!!) -->
<tns:property name="openjpa.jdbc.DBDictionary" value="oracle(batchLimit=0)"/>
</tns:OpenJPAConfig>
But this didn't help.
The process is really simple; it looks like this:
<forEach counterName="count" parallel="no">
  <doXslTransform …>
  <wait 1s>
  <invoke …>
  <doXslTransform …>
  <wait 1s>
  <invoke …>
  <doXslTransform …>
  <wait 1s>
  <invoke …>
</forEach>
How can I solve “No such channel” errors?
Thanks, Tomek
We identified the issue that was causing this problem and fixed it. It was caused by a missing process instance lock in the Apache ODE runtime embedded within BPS. See:
https://issues.apache.org/jira/browse/ODE-989
https://wso2.org/jira/browse/BPS-218
If you can attach your sample scenario to the JIRA, it would help us add another test case. The fix is already available in trunk and will be included in the next release.
Regards
Nandika
I removed the waits from the process and everything started working without problems. It seems there is a bug in the <wait> activity in WSO2 BPS 2.1.2. In BPS 3.0.0, waits appear to work.