WSO2 EI log about Java heap space

I have called an endpoint and it responds with a large payload; unfortunately, the following error appears in the WSO2 carbon log. How can I solve it? Thank you.
TID: [-1] [] [2018-02-26 17:48:47,869] ERROR {org.wso2.carbon.das.messageflow.data.publisher.data.MessageFlowObserverStore} - Error occurred while notifying the statistics observer {org.wso2.carbon.das.messageflow.data.publisher.data.MessageFlowObserverStore}
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:3236)
at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:118)
at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:153)
at com.esotericsoftware.kryo.io.Output.flush(Output.java:181)
at com.esotericsoftware.kryo.io.Output.require(Output.java:160)
at com.esotericsoftware.kryo.io.Output.writeString_slow(Output.java:462)
at com.esotericsoftware.kryo.io.Output.writeString(Output.java:363)
at com.esotericsoftware.kryo.serializers.DefaultSerializers$StringSerializer.write(DefaultSerializers.java:191)
at com.esotericsoftware.kryo.serializers.DefaultSerializers$StringSerializer.write(DefaultSerializers.java:184)
at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628)
at com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:100)
at com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:40)
at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628)
at com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:100)
at com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:40)
at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628)
at com.esotericsoftware.kryo.serializers.MapSerializer.write(MapSerializer.java:113)
at com.esotericsoftware.kryo.serializers.MapSerializer.write(MapSerializer.java:39)
at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:534)
at org.wso2.carbon.das.messageflow.data.publisher.publish.StatisticsPublisher.addEventData(StatisticsPublisher.java:116)
at org.wso2.carbon.das.messageflow.data.publisher.publish.StatisticsPublisher.process(StatisticsPublisher.java:67)
at org.wso2.carbon.das.messageflow.data.publisher.observer.DASMediationFlowObserver.updateStatistics(DASMediationFlowObserver.java:55)
at org.wso2.carbon.das.messageflow.data.publisher.data.MessageFlowObserverStore.notifyObservers(MessageFlowObserverStore.java:71)
at org.wso2.carbon.das.messageflow.data.publisher.services.MessageFlowReporterThread.processAndPublishEventList(MessageFlowReporterThread.java:225)
at org.wso2.carbon.das.messageflow.data.publisher.services.MessageFlowReporterThread.run(MessageFlowReporterThread.java:95)

From the out-of-memory error alone it is hard to say anything about the culprit. To find the actual root cause, you have to analyze the heap dump (WSO2 servers automatically create one at CARBON_HOME/repository/logs/heap-dump.hprof) with an analysis tool such as Eclipse MAT or JProfiler.
However, if the response message is large, there is a possibility that the server goes OOM because it builds and keeps the response message in memory. If you want to process large messages, you can tune the heap memory allocation as described in the documentation.
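As a rough sketch (the startup script name and the default values differ between WSO2 versions, so treat these numbers as placeholders), the heap is controlled by the -Xms/-Xmx JVM options in the server startup script, e.g. <EI_HOME>/bin/integrator.sh or wso2server.sh:
# typical defaults found in the startup script
-Xms256m -Xmx1024m
# example values raised to cope with large response payloads
-Xms2g -Xmx4g
Restart the server after changing the script so the new heap settings take effect.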

Related

How to solve stability problems in Google Dataflow

I have a Dataflow job that has been running stably for several months.
For the last 3 days or so, I've had problems with the job: it gets stuck after a certain amount of time, and the only thing I can do is stop the job and start a new one. This happened after 2, 6 and 24 hours of processing. Here is the latest exception:
java.lang.ExceptionInInitializerError
at org.apache.beam.runners.dataflow.worker.options.StreamingDataflowWorkerOptions$WindmillServerStubFactory.create (StreamingDataflowWorkerOptions.java:183)
at org.apache.beam.runners.dataflow.worker.options.StreamingDataflowWorkerOptions$WindmillServerStubFactory.create (StreamingDataflowWorkerOptions.java:169)
at org.apache.beam.sdk.options.ProxyInvocationHandler.returnDefaultHelper (ProxyInvocationHandler.java:592)
at org.apache.beam.sdk.options.ProxyInvocationHandler.getDefault (ProxyInvocationHandler.java:533)
at org.apache.beam.sdk.options.ProxyInvocationHandler.invoke (ProxyInvocationHandler.java:158)
at com.sun.proxy.$Proxy54.getWindmillServerStub (Unknown Source)
at org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.<init> (StreamingDataflowWorker.java:677)
at org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.fromDataflowWorkerHarnessOptions (StreamingDataflowWorker.java:562)
at org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.main (StreamingDataflowWorker.java:274)
Caused by: java.lang.RuntimeException: Loading windmill_service failed:
at org.apache.beam.runners.dataflow.worker.windmill.WindmillServer.<clinit> (WindmillServer.java:42)
Caused by: java.io.IOException: No space left on device
at sun.nio.ch.FileDispatcherImpl.write0 (Native Method)
at sun.nio.ch.FileDispatcherImpl.write (FileDispatcherImpl.java:60)
at sun.nio.ch.IOUtil.writeFromNativeBuffer (IOUtil.java:93)
at sun.nio.ch.IOUtil.write (IOUtil.java:65)
at sun.nio.ch.FileChannelImpl.write (FileChannelImpl.java:211)
at java.nio.channels.Channels.writeFullyImpl (Channels.java:78)
at java.nio.channels.Channels.writeFully (Channels.java:101)
at java.nio.channels.Channels.access$000 (Channels.java:61)
at java.nio.channels.Channels$1.write (Channels.java:174)
at java.nio.file.Files.copy (Files.java:2909)
at java.nio.file.Files.copy (Files.java:3027)
at org.apache.beam.runners.dataflow.worker.windmill.WindmillServer.<clinit> (WindmillServer.java:39)
It seems like there is no space left on the device, but shouldn't this be managed by Google? Or is this an error in my job somehow?
UPDATE:
The workflow is as follows:
Reading a high volume of data from PubSub (up to 1500 messages/s)
Filtering some messages
Keeping a session window per key and grouping by it
Sorting the data and doing calculations
Outputting the data to another PubSub
You can increase the storage capacity through a parameter of your pipeline; look at diskSizeGb on this page.
In addition, the more data you keep in memory, the more memory you need. This is the case for windows: if you never close them, or if you allow late data for too long, you need a lot of memory to keep all of that data around.
Tune either your pipeline or your machine type. Or both!
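For example, here is a minimal sketch of raising those worker options when launching the job (diskSizeGb and workerMachineType are standard Dataflow pipeline options; the concrete values are placeholders to tune for your load):
import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class LaunchWithMoreResources {
    public static void main(String[] args) {
        // Parse the usual flags (--project, --region, --runner=DataflowRunner, ...).
        DataflowPipelineOptions options = PipelineOptionsFactory
                .fromArgs(args)
                .withValidation()
                .as(DataflowPipelineOptions.class);
        // Larger persistent disk per worker (flag equivalent: --diskSizeGb=100).
        options.setDiskSizeGb(100);
        // Bigger machine type if memory is the bottleneck (--workerMachineType=...).
        options.setWorkerMachineType("n1-standard-4");
        Pipeline pipeline = Pipeline.create(options);
        // ... build the PubSub -> filter -> session window -> PubSub flow here ...
        pipeline.run();
    }
}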

akka.persistence.RecoveryTimedOut: Recovery timed out, didn't get snapshot within 30000 milliseconds

I have a failing test because of the timeout. Here's what I see in log output:
2018-05-15 10:47:56.152 WARN com.datastax.driver.core.NettyUtil [UserDataServiceSpec-cassandra-plugin-default-dispatcher-27] [] [] - Found Netty's native epoll transport, but not running on linux-based operating system. Using NIO instead.
2018-05-15 10:48:38.616 ERROR n.f.c.indexing.UniquelyIndexingActor [UserDataServiceSpec-akka.actor.default-dispatcher-39] [UserDataServiceSpec-akka.actor.default-dispatcher-29] [akka.tcp://UserDataServiceSpec#127.0.0.1:51627/user/$c/user-email-indexer] - Persistence failure when replaying events for persistenceId [user-email-indexer]. Last known sequence number [0]
akka.persistence.RecoveryTimedOut: Recovery timed out, didn't get snapshot within 30000 milliseconds
2018-05-15 10:48:38.617 ERROR a.c.sharding.PersistentShardCoordinator [UserDataServiceSpec-akka.actor.default-dispatcher-39] [UserDataServiceSpec-akka.actor.default-dispatcher-30] [akka.tcp://UserDataServiceSpec#127.0.0.1:51627/system/sharding/userdataCoordinator/singleton/coordinator] - Persistence failure when replaying events for persistenceId [/sharding/userdataCoordinator]. Last known sequence number [0]
akka.persistence.RecoveryTimedOut: Recovery timed out, didn't get snapshot within 30000 milliseconds
2018-05-15 10:48:38.618 INFO akka.actor.LocalActorRef [UserDataServiceSpec-akka.actor.default-dispatcher-39] [UserDataServiceSpec-akka.actor.default-dispatcher-35] [akka://UserDataServiceSpec/user/$c/user-email-indexer] - Message [akka.persistence.SnapshotProtocol$LoadSnapshotFailed] from Actor[akka://UserDataServiceSpec/system/cassandra-snapshot-store#-750137778] to Actor[akka://UserDataServiceSpec/user/$c/user-email-indexer#1201357728] was not delivered. [1] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
2018-05-15 10:48:38.619 INFO akka.actor.LocalActorRef [UserDataServiceSpec-akka.actor.default-dispatcher-37] [UserDataServiceSpec-akka.actor.default-dispatcher-39] [akka://UserDataServiceSpec/system/sharding/userdataCoordinator/singleton/coordinator] - Message [akka.cluster.sharding.ShardCoordinator$RebalanceTick$] from Actor[akka://UserDataServiceSpec/system/sharding/userdataCoordinator/singleton/coordinator#-75387958] to Actor[akka://UserDataServiceSpec/system/sharding/userdataCoordinator/singleton/coordinator#-75387958] was not delivered. [2] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
(Don't ask me why we are not using in-memory storage to test persistent actors. It's not relevant to the problem right now.)
I am not experienced with Akka and the JVM, but the messages I see are just the equivalent of "you screwed up, man". There is no hint in them about how to fix the problem or why this RecoveryTimedOut occurs.
If someone could give me some valuable advice on how to diagnose the problem, it would be nice.
UniquelyIndexingActor is created as a cluster singleton.
Try adding this to your config:
akka {
  persistence {
    journal-plugin-fallback {
      recovery-event-timeout = 60s
    }
  }
}
It solved the problem for me. I found a reference to it in https://github.com/akka/akka/blob/master/akka-persistence/src/main/resources/reference.conf

Some questions about protobuf

We are building an RTB (real-time bidding) platform, using nginx as the HTTP server, a bidder written in Lua, Google Protocol Buffers for serializing data, and Zlog for logging. After test runs, we got three error messages in the nginx error log:
"[libprotobuf Error, google/protobuf/wire_format.cc:1053]
String field contains invalid UTF-8 data when parsing a protocol buffer.
Use the 'bytes' type if you intend to send raw bytes."
So we went back and checked the Protocol Buffers source code, and found that this check is controlled by a macro (-DNDEBUG, which means "not debug mode", according to the comment), and that -DNDEBUG disables GOOGLE_PROTOBUF_UTF8_VALIDATION (I think). So we enabled this macro (-DNDEBUG) in the configuration. However, after testing, we still got the same error message. We then changed all the "string" fields to "bytes" in XXX.proto, but after testing the same error message showed up again.
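For reference, the string-to-bytes change described here would look roughly like this in the .proto file (the message and field names are made up for illustration):
message BidRequest {
    // was: optional string user_agent = 1;
    // bytes carries raw octets, so the parser skips the UTF-8 validity check
    optional bytes user_agent = 1;
}
If the error persists after such a change, it is worth checking that every producer and consumer was regenerated and redeployed against the updated .proto.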
The other two errors were:
"worker process 53574 exited on signal 11 (core dumped), then process died."
"lua entry thread aborted: runtime error: /home/bilin/rtb/src/lua/shared/log.lua:34: 'short' is not callable"
Hope somebody can help us solve these problems.
Thank you.

Couchbase - ElasticSearch Java Heap memory

We have a Couchbase instance running on an Amazon Web Services server, and an Elasticsearch instance running on the same server.
The connection between the two of them works, and replication was running fine until...
Out of the blue, we got the following error log on ElasticSearch:
[2013-08-29 21:27:34,947][WARN ][cluster.metadata ] [01-Thor] failed to dynamically update the mapping in cluster_state from shard
java.lang.OutOfMemoryError: Java heap space
at org.apache.lucene.util.ArrayUtil.grow(ArrayUtil.java:343)
at org.elasticsearch.common.io.FastByteArrayOutputStream.write(FastByteArrayOutputStream.java:103)
at org.elasticsearch.common.jackson.core.json.UTF8JsonGenerator._flushBuffer(UTF8JsonGenerator.java:1848)
at org.elasticsearch.common.jackson.core.json.UTF8JsonGenerator.writeString(UTF8JsonGenerator.java:436)
at org.elasticsearch.common.xcontent.json.JsonXContentGenerator.writeString(JsonXContentGenerator.java:84)
at org.elasticsearch.common.xcontent.XContentBuilder.field(XContentBuilder.java:314)
at org.elasticsearch.index.mapper.core.AbstractFieldMapper.doXContentBody(AbstractFieldMapper.java:601)
at org.elasticsearch.index.mapper.core.NumberFieldMapper.doXContentBody(NumberFieldMapper.java:286)
at org.elasticsearch.index.mapper.core.LongFieldMapper.doXContentBody(LongFieldMapper.java:338)
at org.elasticsearch.index.mapper.core.AbstractFieldMapper.toXContent(AbstractFieldMapper.java:595)
at org.elasticsearch.index.mapper.object.ObjectMapper.toXContent(ObjectMapper.java:920)
at org.elasticsearch.index.mapper.object.ObjectMapper.toXContent(ObjectMapper.java:852)
at org.elasticsearch.index.mapper.object.ObjectMapper.toXContent(ObjectMapper.java:920)
at org.elasticsearch.index.mapper.object.ObjectMapper.toXContent(ObjectMapper.java:852)
at org.elasticsearch.index.mapper.object.ObjectMapper.toXContent(ObjectMapper.java:920)
at org.elasticsearch.index.mapper.object.ObjectMapper.toXContent(ObjectMapper.java:852)
at org.elasticsearch.index.mapper.object.ObjectMapper.toXContent(ObjectMapper.java:920)
at org.elasticsearch.index.mapper.object.ObjectMapper.toXContent(ObjectMapper.java:852)
at org.elasticsearch.index.mapper.object.ObjectMapper.toXContent(ObjectMapper.java:920)
at org.elasticsearch.index.mapper.DocumentMapper.toXContent(DocumentMapper.java:700)
at org.elasticsearch.index.mapper.DocumentMapper.refreshSource(DocumentMapper.java:682)
at org.elasticsearch.index.mapper.DocumentMapper.<init>(DocumentMapper.java:342)
at org.elasticsearch.index.mapper.DocumentMapper$Builder.build(DocumentMapper.java:224)
at org.elasticsearch.index.mapper.DocumentMapperParser.parse(DocumentMapperParser.java:231)
at org.elasticsearch.index.mapper.MapperService.parse(MapperService.java:380)
at org.elasticsearch.index.mapper.MapperService.merge(MapperService.java:190)
at org.elasticsearch.cluster.metadata.MetaDataMappingService$2.execute(MetaDataMappingService.java:185)
at org.elasticsearch.cluster.service.InternalClusterService$2.run(InternalClusterService.java:229)
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:95)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
[2013-08-29 21:27:56,948][WARN ][indices.ttl ] [01-Thor] failed to execute ttl purge
java.lang.OutOfMemoryError: Java heap space
at org.apache.lucene.util.ByteBlockPool$Allocator.getByteBlock(ByteBlockPool.java:66)
at org.apache.lucene.util.ByteBlockPool.nextBuffer(ByteBlockPool.java:202)
at org.apache.lucene.util.BytesRefHash.add(BytesRefHash.java:319)
at org.apache.lucene.util.BytesRefHash.add(BytesRefHash.java:274)
at org.apache.lucene.search.ConstantScoreAutoRewrite$CutOffTermCollector.collect(ConstantScoreAutoRewrite.java:131)
at org.apache.lucene.search.TermCollectingRewrite.collectTerms(TermCollectingRewrite.java:79)
at org.apache.lucene.search.ConstantScoreAutoRewrite.rewrite(ConstantScoreAutoRewrite.java:95)
at org.apache.lucene.search.MultiTermQuery$ConstantScoreAutoRewrite.rewrite(MultiTermQuery.java:220)
at org.apache.lucene.search.MultiTermQuery.rewrite(MultiTermQuery.java:288)
at org.apache.lucene.search.IndexSearcher.rewrite(IndexSearcher.java:639)
at org.apache.lucene.search.IndexSearcher.createNormalizedWeight(IndexSearcher.java:686)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:309)
at org.elasticsearch.indices.ttl.IndicesTTLService.purgeShards(IndicesTTLService.java:186)
at org.elasticsearch.indices.ttl.IndicesTTLService.access$000(IndicesTTLService.java:65)
at org.elasticsearch.indices.ttl.IndicesTTLService$PurgerThread.run(IndicesTTLService.java:122)
[2013-08-29 21:29:23,919][WARN ][indices.ttl ] [01-Thor] failed to execute ttl purge
java.lang.OutOfMemoryError: Java heap space
We tried changing several memory values, but we can't seem to get it right.
Has anyone experienced the same issue?
A few troubleshooting tips:
It is generally smart to dedicate one AWS instance exclusively to Elasticsearch, for predictable performance and easier debugging.
Monitor your memory usage with the Bigdesk plugin. This will show you whether the memory bottleneck really comes from Elasticsearch - it might instead be the OS, simultaneous heavy querying and indexing, or something unexpected.
Elasticsearch's Java heap should be set to around 50% of your box's total memory (see the sketch below).
This gist from Shay Banon offers several solutions to solve memory problems in Elasticsearch.
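As a sketch of that heap setting (for the Elasticsearch version in the log above, circa 2013, the heap is typically set through the ES_HEAP_SIZE environment variable before starting the node; newer versions use jvm.options instead), e.g. on a 16 GB box:
export ES_HEAP_SIZE=8g   # roughly 50% of the machine's RAM; sets -Xms and -Xmx to the same value
./bin/elasticsearch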

Jetty 8.1 flooding the log file with "Dispatched Failed" messages

We are using Jetty 8.1 as an embedded HTTP server. Under overload conditions the server sometimes starts flooding the log file with these messages:
warn: java.util.concurrent.RejectedExecutionException
warn: Dispatched Failed! SCEP#76107610{l(...)<->r(...),d=false,open=true,ishut=false,oshut=false,rb=false,wb=false,w=true,i=1r}...
The same message is repeated thousands of times, and the amount of logging appears to slow down the whole system. The messages themselves are fine; our request handler is just too slow to process the requests in time. But the huge number of repeated messages actually makes things worse and makes it harder for the system to recover from the overload.
So, my question is: is this normal behaviour, or are we doing something wrong?
Here is how we set up the server:
Server server = new Server();
SelectChannelConnector connector = new SelectChannelConnector();
connector.setAcceptQueueSize( 10 );
server.setConnectors( new Connector[]{ connector } );
server.setThreadPool( new ExecutorThreadPool( 32, 32, 60, TimeUnit.SECONDS,
new ArrayBlockingQueue<Runnable>( 10 )));
The SelectChannelEndPoint is the origin of this log message.
To stop seeing it, just set the named logger org.eclipse.jetty.io.nio.SelectChannelEndPoint to LEVEL=OFF.
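For example, assuming Jetty's default StdErrLog backend, that is a single line in jetty-logging.properties (if Jetty logging is routed through slf4j/log4j/logback instead, silence the same logger name there):
org.eclipse.jetty.io.nio.SelectChannelEndPoint.LEVEL=OFF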
Now as for why you see it, that is more interesting to the developers of Jetty. Can you detail what specific version of Jetty you are using and also what specific JVM you are using?