How to handle a response of more than 4000 characters using a Camunda external task

When using the external task feature provided by Camunda, I am facing an issue if the response is more than 4k characters. I am using this module to implement the external task feature:
https://www.npmjs.com/package/camunda-external-task-client-js
Exception:
NULL, 3, NULL, NULL) was aborted. Call getNextException to see the cause.
org.postgresql.util.PSQLException: ERROR: value too long for type character varying(4000)
'. Flush summary:
[
INSERT HistoricVariableInstanceEntity[dd43fad7-904e-11ea-bb50-0242ac1f030a]
INSERT HistoricJobLogEventEntity[dd4421ea-904e-11ea-bb50-0242ac1f030a]
INSERT HistoricVariableUpdateEventEntity[dd3be481-904e-11ea-bb50-0242ac1f030a]
INSERT HistoricVariableUpdateEventEntity[dd3c32a4-904e-11ea-bb50-0242ac1f030a]
INSERT HistoricVariableUpdateEventEntity[dd43fad8-904e-11ea-bb50-0242ac1f030a]
INSERT HistoricExternalTaskLogEntity[dd3ccee6-904e-11ea-bb50-0242ac1f030a]
INSERT ByteArrayEntity[dd3c32a2-904e-11ea-bb50-0242ac1f030a]
INSERT ByteArrayEntity[dd3c32a3-904e-11ea-bb50-0242ac1f030a]
INSERT ByteArrayEntity[dd3ca7d5-904e-11ea-bb50-0242ac1f030a]
INSERT VariableInstanceEntity[dd43fad7-904e-11ea-bb50-0242ac1f030a]
INSERT MessageEntity[dd4421e9-904e-11ea-bb50-0242ac1f030a]
DELETE ExternalTaskEntity[cadab10c-904e-11ea-bb50-0242ac1f030a]
UPDATE VariableInstanceEntity[c82b4842-904e-11ea-bb50-0242ac1f030a]
UPDATE VariableInstanceEntity[c82b4845-904e-11ea-bb50-0242ac1f030a]
DELETE VariableInstanceEntity[cad273a0-904e-11ea-bb50-0242ac1f030a]
DELETE VariableInstanceEntity[cad273a2-904e-11ea-bb50-0242ac1f030a]
DELETE VariableInstanceEntity[cad273a4-904e-11ea-bb50-0242ac1f030a]

This is due to Camunda's 4000-character limit on String variables. To sort it out, your process has to store the response in some format other than a plain String. A couple of suggestions:
Wrap the response into an object (not a String). This way it is saved in the database as a blob.
Use Camunda Spin and process it as JSON https://docs.camunda.org/manual/7.12/reference/spin/
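With the Node.js client from the question, the first suggestion roughly amounts to completing the task with an object/JSON value instead of a string. For the Spin suggestion, here is a minimal sketch on the engine (Java) side, assuming the Camunda Spin process engine plugin is enabled; the delegate, variable names, and the small literal payload are illustrative only:
import org.camunda.bpm.engine.delegate.DelegateExecution;
import org.camunda.bpm.engine.delegate.JavaDelegate;
import org.camunda.spin.json.SpinJsonNode;
import static org.camunda.spin.Spin.JSON;

public class StoreResponseAsJsonDelegate implements JavaDelegate {
  @Override
  public void execute(DelegateExecution execution) throws Exception {
    // Illustrative payload; in practice this would be the full >4000-character
    // JSON response string obtained inside the delegate.
    String rawResponse = "{\"orderId\": 42, \"items\": []}";

    // Parsing with Spin and storing the SpinJsonNode persists the variable as a
    // serialized JSON value (byte array) rather than in the varchar(4000) column.
    SpinJsonNode response = JSON(rawResponse);
    execution.setVariable("response", response);
  }
}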

Related

How to solve stability problems in Google Dataflow

I have a Dataflow job that has been running stable for several months.
For the last 3 days or so, I've had problems with the job: it gets stuck after a certain amount of time and the only thing I can do is stop the job and start a new one. This happened after 2, 6 and 24 hours of processing. Here is the latest exception:
java.lang.ExceptionInInitializerError
at org.apache.beam.runners.dataflow.worker.options.StreamingDataflowWorkerOptions$WindmillServerStubFactory.create (StreamingDataflowWorkerOptions.java:183)
at org.apache.beam.runners.dataflow.worker.options.StreamingDataflowWorkerOptions$WindmillServerStubFactory.create (StreamingDataflowWorkerOptions.java:169)
at org.apache.beam.sdk.options.ProxyInvocationHandler.returnDefaultHelper (ProxyInvocationHandler.java:592)
at org.apache.beam.sdk.options.ProxyInvocationHandler.getDefault (ProxyInvocationHandler.java:533)
at org.apache.beam.sdk.options.ProxyInvocationHandler.invoke (ProxyInvocationHandler.java:158)
at com.sun.proxy.$Proxy54.getWindmillServerStub (Unknown Source)
at org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.<init> (StreamingDataflowWorker.java:677)
at org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.fromDataflowWorkerHarnessOptions (StreamingDataflowWorker.java:562)
at org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.main (StreamingDataflowWorker.java:274)
Caused by: java.lang.RuntimeException: Loading windmill_service failed:
at org.apache.beam.runners.dataflow.worker.windmill.WindmillServer.<clinit> (WindmillServer.java:42)
Caused by: java.io.IOException: No space left on device
at sun.nio.ch.FileDispatcherImpl.write0 (Native Method)
at sun.nio.ch.FileDispatcherImpl.write (FileDispatcherImpl.java:60)
at sun.nio.ch.IOUtil.writeFromNativeBuffer (IOUtil.java:93)
at sun.nio.ch.IOUtil.write (IOUtil.java:65)
at sun.nio.ch.FileChannelImpl.write (FileChannelImpl.java:211)
at java.nio.channels.Channels.writeFullyImpl (Channels.java:78)
at java.nio.channels.Channels.writeFully (Channels.java:101)
at java.nio.channels.Channels.access$000 (Channels.java:61)
at java.nio.channels.Channels$1.write (Channels.java:174)
at java.nio.file.Files.copy (Files.java:2909)
at java.nio.file.Files.copy (Files.java:3027)
at org.apache.beam.runners.dataflow.worker.windmill.WindmillServer.<clinit> (WindmillServer.java:39)
Seems like there is no space left on the device, but shouldn't this be managed by Google? Or is this an error in my job somehow?
UPDATE:
The workflow is as follows:
Reading mass data from PubSub (up to 1500/s)
Filter some messages
Keeping session window on key and grouping by it
Sort the data and do calculations
Output the data to another PubSub
You can increase the storage capacity through a parameter of your pipeline: look at diskSizeGb on this page.
In addition, the more data you keep in memory, the more memory you need. This is the case for windows: if you never close them, or if you allow late data for too long, you need a lot of memory to keep all that data around.
Tune either your pipeline or your machine type. Or both!
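For instance, a sketch of setting that option from a Java pipeline (diskSizeGb is a standard Dataflow worker option; the value 100 and the class name here are just examples, and the same effect can be had with --diskSizeGb on the command line):
import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class LargerWorkerDisks {
  public static void main(String[] args) {
    DataflowPipelineOptions options =
        PipelineOptionsFactory.fromArgs(args).withValidation().as(DataflowPipelineOptions.class);

    // Give each worker a larger persistent disk, in GB (example value).
    options.setDiskSizeGb(100);

    Pipeline pipeline = Pipeline.create(options);
    // ... build the PubSub -> window/group -> PubSub pipeline here ...
    pipeline.run();
  }
}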

alpakka cassandrasource read data from cassandra continuously

We are doing a POC to read a Cassandra table continuously using the Alpakka CassandraSource. Following is the sample code:
final Statement stmt = new SimpleStatement("SELECT * FROM testdb.emp1").setFetchSize(20);
final CompletionStage<List<Row>> rows = CassandraSource.create(stmt, session).runWith(Sink.seq(), materializer);
rows.thenAcceptAsync( e -> e.forEach(System.out::println));
The above code fetches the rows from the emp1 table. Since this table grows continuously, we need to keep reading as soon as data becomes available. Is there any way we can set up a continuous read in CassandraSource?
There is currently no support for continuously reading a table in the Alpakka Cassandra connector. However, you can make it work by wrapping CassandraSource.create in a RestartSource.withBackoff that will restart the Cassandra source after it completes. More about restarting sources can be found in the documentation.
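A minimal sketch of that wrapping, reusing the stmt, session and materializer from the question (this assumes the Akka javadsl and an Akka version whose RestartSource.withBackoff overload accepts java.time.Duration; exact signatures vary between releases, and the backoff values are illustrative):
// Needs: akka.NotUsed, akka.stream.javadsl.RestartSource, java.time.Duration
// (stmt, session and materializer are the same objects as in the question).

// RestartSource.withBackoff re-materializes the inner source whenever it
// completes (or fails), so the table is read again after each full pass.
final Source<Row, NotUsed> repeatedReads =
    RestartSource.withBackoff(
        Duration.ofSeconds(1),   // minBackoff before re-reading
        Duration.ofSeconds(30),  // maxBackoff
        0.2,                     // randomFactor to spread out restarts
        () -> CassandraSource.create(stmt, session));

repeatedReads.runWith(Sink.foreach(System.out::println), materializer);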

Streaming MutationGroups into Spanner

I'm trying to stream MutationGroups into Spanner with SpannerIO.
The goal is to write new MutationGroups every 10 seconds, as we will use Spanner to query near real-time KPIs.
When I don't use any windows, I get the following error:
Exception in thread "main" java.lang.IllegalStateException: GroupByKey cannot be applied to non-bounded PCollection in the GlobalWindow without a trigger. Use a Window.into or Window.triggering transform prior to GroupByKey.
at org.apache.beam.sdk.transforms.GroupByKey.applicableTo(GroupByKey.java:173)
at org.apache.beam.sdk.transforms.GroupByKey.expand(GroupByKey.java:204)
at org.apache.beam.sdk.transforms.GroupByKey.expand(GroupByKey.java:120)
at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:537)
at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:472)
at org.apache.beam.sdk.values.PCollection.apply(PCollection.java:286)
at org.apache.beam.sdk.transforms.Combine$PerKey.expand(Combine.java:1585)
at org.apache.beam.sdk.transforms.Combine$PerKey.expand(Combine.java:1470)
at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:537)
at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:491)
at org.apache.beam.sdk.values.PCollection.apply(PCollection.java:299)
at org.apache.beam.sdk.io.gcp.spanner.SpannerIO$WriteGrouped.expand(SpannerIO.java:868)
at org.apache.beam.sdk.io.gcp.spanner.SpannerIO$WriteGrouped.expand(SpannerIO.java:823)
at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:537)
at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:472)
at org.apache.beam.sdk.values.PCollection.apply(PCollection.java:286)
at quantum.base.transform.entity.spanner.SpannerProtoWrite.expand(SpannerProtoWrite.java:52)
at quantum.base.transform.entity.spanner.SpannerProtoWrite.expand(SpannerProtoWrite.java:20)
at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:537)
at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:491)
at org.apache.beam.sdk.values.PCollection.apply(PCollection.java:299)
at quantum.entitybuilder.pipeline.EntityBuilderPipeline$Write$SpannerWrite.expand(EntityBuilderPipeline.java:388)
at quantum.entitybuilder.pipeline.EntityBuilderPipeline$Write$SpannerWrite.expand(EntityBuilderPipeline.java:372)
at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:537)
at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:491)
at org.apache.beam.sdk.values.PCollection.apply(PCollection.java:299)
at quantum.entitybuilder.pipeline.EntityBuilderPipeline.main(EntityBuilderPipeline.java:122)
:entityBuilder FAILED
Because of the error above I assume the input collection needs to be windowed and triggered, as SpannerIO uses a GroupByKey (this is also what I need for my use case):
...
.apply("1-minute windows", Window.<MutationGroup>into(FixedWindows.of(Duration.standardMinutes(1)))
.triggering(Repeatedly.forever(AfterProcessingTime
.pastFirstElementInPane()
.plusDelayOf(Duration.standardSeconds(10))
).orFinally(AfterWatermark.pastEndOfWindow()))
.discardingFiredPanes()
.withAllowedLateness(Duration.ZERO))
.apply(SpannerIO.write()
.withProjectId(entityConfig.getSpannerProject())
.withInstanceId(entityConfig.getSpannerInstance())
.withDatabaseId(entityConfig.getSpannerDb())
.grouped());
When I do this, I get the following exceptions during runtime:
java.lang.IllegalArgumentException: Attempted to get side input window for GlobalWindow from non-global WindowFn
org.apache.beam.sdk.transforms.windowing.PartitioningWindowFn$1.getSideInputWindow(PartitioningWindowFn.java:49)
com.google.cloud.dataflow.worker.StreamingModeExecutionContext$StepContext.issueSideInputFetch(StreamingModeExecutionContext.java:631)
com.google.cloud.dataflow.worker.StreamingModeExecutionContext$UserStepContext.issueSideInputFetch(StreamingModeExecutionContext.java:683)
com.google.cloud.dataflow.worker.StreamingSideInputFetcher.storeIfBlocked(StreamingSideInputFetcher.java:182)
com.google.cloud.dataflow.worker.StreamingSideInputDoFnRunner.processElement(StreamingSideInputDoFnRunner.java:71)
com.google.cloud.dataflow.worker.SimpleParDoFn.processElement(SimpleParDoFn.java:323)
com.google.cloud.dataflow.worker.util.common.worker.ParDoOperation.process(ParDoOperation.java:43)
com.google.cloud.dataflow.worker.util.common.worker.OutputReceiver.process(OutputReceiver.java:48)
com.google.cloud.dataflow.worker.SimpleParDoFn$1.output(SimpleParDoFn.java:271)
org.apache.beam.runners.core.SimpleDoFnRunner.outputWindowedValue(SimpleDoFnRunner.java:219)
org.apache.beam.runners.core.SimpleDoFnRunner.access$700(SimpleDoFnRunner.java:69)
org.apache.beam.runners.core.SimpleDoFnRunner$DoFnProcessContext.output(SimpleDoFnRunner.java:517)
org.apache.beam.runners.core.SimpleDoFnRunner$DoFnProcessContext.output(SimpleDoFnRunner.java:505)
org.apache.beam.sdk.values.ValueWithRecordId$StripIdsDoFn.processElement(ValueWithRecordId.java:145)
After investigating further it appears to be due to the .apply(Wait.on(input)) in SpannerIO: It has a global side input which does not seem to work with my fixed windows, as the docs of Wait.java state:
If signal is globally windowed, main input must also be. This typically would be useful only in a batch pipeline, because the global window of an infinite PCollection never closes, so the wait signal will never be ready.
As a temporary workaround I tried the following:
add a GlobalWindow with triggers instead of fixed windows:
.apply("globalwindow", Window.<MutationGroup>into(new GlobalWindows())
.triggering(Repeatedly.forever(AfterProcessingTime
.pastFirstElementInPane()
.plusDelayOf(Duration.standardSeconds(10))
).orFinally(AfterWatermark.pastEndOfWindow()))
.discardingFiredPanes()
.withAllowedLateness(Duration.ZERO))
This results in writes to Spanner only when I drain my pipeline. I have the impression the Wait.on() signal is only triggered when the Global window closes, and doesn't work with triggers.
Disable the .apply(Wait.on(input)) in SpannerIO:
This results in the pipeline getting stuck on the view creation which
is described in this SO post:
SpannerIO Dataflow 2.3.0 stuck in CreateDataflowView.
When I check the worker logs for clues, I do get the following warnings:
logger: "org.apache.beam.sdk.coders.SerializableCoder"
message: "Can't verify serialized elements of type SpannerSchema have well defined equals method. This may produce incorrect results on some PipelineRunner
logger: "org.apache.beam.sdk.coders.SerializableCoder"
message: "Can't verify serialized elements of type BoundedSource have well defined equals method. This may produce incorrect results on some PipelineRunner"
Note that everything works with the DirectRunner and that I'm trying to use the DataflowRunner.
Does anyone have any other suggestions for things I can try to get this running? I can hardly imagine that I'm the only one trying to stream MutationGroups into Spanner.
Thanks in advance!
Currently, the SpannerIO connector is not supported with Beam streaming. Please follow this Pull Request, which adds streaming support for the Spanner IO connector.

The flume event was truncated

I'm facing an issue: I receive messages from a Kafka source and wrote an interceptor to extract two fields (dataSource and businessType) from the Kafka message (JSON format), using gson.fromJson(). But I get the error below.
I want to know whether Flume truncates the Flume event when it exceeds some limit. If yes, how do I set that limit to a bigger value? My Kafka messages are always very long, about 60K bytes.
Looking forward to a reply. Thanks in advance!
2015-12-09 11:48:05,665 (PollableSourceRunner-KafkaSource-apply)
[ERROR -
org.apache.flume.source.kafka.KafkaSource.process(KafkaSource.java:153)]
KafkaSource EXCEPTION, {} com.google.gson.JsonSyntaxException:
com.google.gson.stream.MalformedJsonException: Unterminated string at
line 1 column 4096
at com.google.gson.Gson.fromJson(Gson.java:809)
at com.google.gson.Gson.fromJson(Gson.java:761)
at com.google.gson.Gson.fromJson(Gson.java:710)
at com.xxx.flume.interceptor.JsonLogTypeInterceptor.intercept(JsonLogTypeInterceptor.java:43)
at com.xxx.flume.interceptor.JsonLogTypeInterceptor.intercept(JsonLogTypeInterceptor.java:61)
at org.apache.flume.interceptor.InterceptorChain.intercept(InterceptorChain.java:62)
at org.apache.flume.channel.ChannelProcessor.processEventBatch(ChannelProcessor.java:146)
at org.apache.flume.source.kafka.KafkaSource.process(KafkaSource.java:130)
Finally, I found the root cause by debugging the source code.
It is because I tried to convert event.getBody() to a map using Gson directly, which is incorrect, as event.getBody() is a byte[], not a String, and cannot be parsed that way. The correct code should be as below:
String body = new String(event.getBody(), "UTF-8");
Map<String, Object> map = gson.fromJson(body, new TypeToken<Map<String, Object>>() {}.getType());

How to debug "could not receive data from client: Connection reset by peer"

I'm running a django-celery application on Ubuntu-12.04.
When I run a celery task from my web interface, I get the following error, taken from the postgresql-9.3 logfile (maximum level of logging):
2013-11-12 13:57:01 GMT tss_usr 8113 LOG: could not receive data from client: Connection reset by peer
tss_usr is the postgresql user of the django application database and (in this example) 8113 is the pid of the process that killed the connection, I guess.
Have you got any idea on why this happens or at least how to debug this issue?
To make things work again I need to restart postgresql which is extremely uncomfortable.
I know this is an older post, but I just found it because I had the same error today in my postgres logs. I narrowed it down to a PDO select statement. I'm using Zend Framework 1.10.3 on Ubuntu Precise.
The following PDO statement generated an error if $opinion is a long text string. The column opinion is of type text in my postgres table. The query succeeds if $opinion is under a certain number of characters: 1000 characters works fine; 2000 characters fails with "could not receive data from client: Connection reset by peer".
$select = $this->db->select()
->from( 'datauserstopics' )
->where("opinion = ?",trim($opinion))
->where("datatopicsid = ?",trim($tid))
->where("datausersid= ?",$datausersid);
$stmt = $this->db->query($select);
I circumvented the problem by using:
->where("substr(opinion,1,100) = ?",trim(substr($opinion,1,100)))
This is not a perfect solution, but for my purposes, the select statement using substr() suffices.
Note that I have no problem inserting long strings into the same table/column. The disconnect problem only appears for me on the PDO select with relatively long text strings.
I'm getting it in 2017 with 9.4; I have no text fields and don't know what a PDO is. My select statement is about 50 bytes long, and I'm trying to fetch an int4 and a double precision. I suspect the error message can mean multiple things.
I've since found https://dba.stackexchange.com/questions/142350/postgres-could-not-receive-data-from-client-connection-reset-by-peer which indicates it could be a problem with the client configuration. My client is libpq and PQconnectdb() is giving me a CONNECTION_OK return. It works at least partly.
For me, restarting the hypervisor hosting both the Postgres server and the application using it helped. I've seen stack traces in dmesg before, though.