Kafka Streams : Stream Thread failed to lock State Directory - unit-testing

I am trying to test my Kafka Streams application. I have built a simple topology where I read from an input topic and store the same data in a state store.
I tried writing unit tests for this topology using TopologyTestDriver. When I run the test, I encounter the following error:
org.apache.kafka.streams.errors.LockException: stream-thread [main] task [0_0] Failed to lock the state directory for task 0_0
at org.apache.kafka.streams.processor.internals.AbstractTask.registerStateStores(AbstractTask.java:197)
at org.apache.kafka.streams.processor.internals.StreamTask.initializeStateStores(StreamTask.java:275)
at org.apache.kafka.streams.TopologyTestDriver.<init>(TopologyTestDriver.java:403)
at org.apache.kafka.streams.TopologyTestDriver.<init>(TopologyTestDriver.java:257)
at org.apache.kafka.streams.TopologyTestDriver.<init>(TopologyTestDriver.java:228)
at streams.checkStreams.checkStreamsTest.setup(checkStreamsTest.java:99)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
....
I can see the state store getting created locally in /tmp/kafka-streams, but somehow the stream thread is unable to get a lock on it. I searched and found that this error can occur when two stream threads try to access the store: one holds the lock, so the other has to wait. But I don't see two stream threads being created in my code. I am new to Kafka Streams and its testing; am I missing anything here?

The TopologyTestDriver does not create any background threads, so multi-threading (from KafkaStreams itself) should not be an issue. However, as @Bartosz Wardziński pointed out, if your testing framework executes tests in parallel and you use the same application.id in different tests, it may lead to locking issues.
The recommendation for tests is to generate a random application.id to avoid this issue.
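A minimal sketch of that recommendation in plain Java. The property keys are the standard Kafka Streams config names; the bootstrap address is a dummy, since TopologyTestDriver never contacts a broker:

```java
import java.util.Properties;
import java.util.UUID;

public class RandomAppId {
    // Give every test its own application.id so no two TopologyTestDriver
    // instances compete for the same state-directory lock.
    static Properties testStreamsConfig() {
        Properties props = new Properties();
        props.put("application.id", "test-" + UUID.randomUUID());
        props.put("bootstrap.servers", "dummy:1234"); // never contacted by the test driver
        return props;
    }

    public static void main(String[] args) {
        String a = testStreamsConfig().getProperty("application.id");
        String b = testStreamsConfig().getProperty("application.id");
        System.out.println(a.equals(b)); // two configs get distinct ids, so this is false
    }
}
```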

If your tests are not running in parallel, a solution could be to call the close() method on the TopologyTestDriver. This will clean up the resources and remove the locks; it is probably best practice for disposable objects anyway.
If you are running tests in parallel, you can set a random application.id. The problem with this is that if you're connected to a test schema registry, it will potentially create thousands of schemas (one for each test).
Your two options here are:
1. Use a unique application.id per test that is hard-coded (i.e. the name of the test) rather than random.
2. Don't run your tests in parallel and call close() on the TopologyTestDriver.
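The close() option can be sketched with try-with-resources: TopologyTestDriver implements Closeable, so the driver is closed (releasing the state-directory lock) even when an assertion fails mid-test. A FakeDriver stands in for the real driver here so the pattern is runnable without Kafka on the classpath:

```java
public class CloseDriverPattern {
    // Stand-in for TopologyTestDriver; the real driver's close() also
    // releases its state-directory lock.
    static class FakeDriver implements AutoCloseable {
        static boolean closed = false;
        @Override public void close() { closed = true; }
    }

    public static void main(String[] args) {
        try (FakeDriver driver = new FakeDriver()) {
            // ... pipe input records, read output, assert ...
        }
        System.out.println(FakeDriver.closed); // prints true: close() always runs
    }
}
```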

Related

Threads parked with HTTP-Kit

I have a few threads on the go, each of which make a blocking call to HTTP Kit. My code's been working but has recently taken to freezing after about 30 minutes. All of my threads are stuck at the following point:
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
clojure.core$promise$reify__7005.deref(core.clj:6823)
clojure.core$deref.invokeStatic(core.clj:2228)
clojure.core$deref.invoke(core.clj:2214)
my_project.web$fetch.invokeStatic(web.clj:35)
Line my_project.web.clj:35 is something like:
(let [result @(org.httpkit.client/get "http://example.com")]
(I'm using plain Java threads rather than core.async because I'm running in the context of a set of concurrent Apache Kafka clients, each in its own thread. The Kafka client does spin up a lot of its own threads, especially as I'm running a few instances, e.g. 5 in parallel.)
The fact that all of my threads end up parked like this in HTTP Kit suggests a resource leak, some code in HTTP Kit dying before it has a chance to deliver, or perhaps resource starvation.
Another thread seems to be stuck here. It's possible that it's blocking all of the promise deliveries.
sun.security.ssl.SSLEngineImpl.readNetRecord(SSLEngineImpl.java:850)
sun.security.ssl.SSLEngineImpl.unwrap(SSLEngineImpl.java:781)
javax.net.ssl.SSLEngine.unwrap(SSLEngine.java:624)
org.httpkit.client.HttpsRequest.unwrapRead(HttpsRequest.java:35)
org.httpkit.client.HttpClient.doRead(HttpClient.java:131)
org.httpkit.client.HttpClient.run(HttpClient.java:377)
java.lang.Thread.run(Thread.java:748)
Any ideas what the problem could be, or pointers for how to diagnose it?
A common thing to do is to set up a DefaultUncaughtExceptionHandler.
This will at least give you an indication if there are exceptions in your threads.
(defn init-jvm-uncaught-exception-logging []
  (Thread/setDefaultUncaughtExceptionHandler
    (reify Thread$UncaughtExceptionHandler
      (uncaughtException [_ thread ex]
        (log/error ex "Uncaught exception on" (.getName thread))))))
Stuart Sierra has written nicely on this: https://stuartsierra.com/2015/05/27/clojure-uncaught-exceptions
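For reference, the same safety net in plain Java (a minimal sketch; the thread name and exception are illustrative) looks like:

```java
public class UncaughtLogger {
    public static void main(String[] args) throws InterruptedException {
        // Log any exception that escapes a thread instead of losing it silently.
        Thread.setDefaultUncaughtExceptionHandler((thread, ex) ->
                System.err.println("Uncaught exception on " + thread.getName() + ": " + ex));

        Thread t = new Thread(() -> { throw new RuntimeException("boom"); }, "worker-1");
        t.start();
        t.join();
        System.out.println("done"); // the handler logged the RuntimeException to stderr
    }
}
```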

Issue with mule starting up (at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:234)

Mule is not starting up: it tries to start, hangs for a while, and after some time tries to start from the beginning again, like a restart. I took a thread dump. The analysis shows a warning: "3 threads are transitively BLOCKED. It's indicating lock is not getting released." This could be the potential issue, probably something to do with Jetty, but it's not clear what. Here is part of the thread dump analysis:
0x00000000e0f43f40
Object
Held by:
qtp383251638-61-acceptor-0-ServerConnector@7d75f858{HTTP/1.1}{0.0.0.0:7777}
Threads waiting to take lock:
qtp383251638-62-acceptor-1-ServerConnector@7d75f858{HTTP/1.1}{0.0.0.0:7777}
qtp383251638-63-acceptor-2-ServerConnector@7d75f858{HTTP/1.1}{0.0.0.0:7777}
qtp383251638-64-acceptor-3-ServerConnector@7d75f858{HTTP/1.1}{0.0.0.0:7777}
"qtp383251638-61-acceptor-0-ServerConnector@7d75f858{HTTP/1.1}{0.0.0.0:7777}": running, holding [0x00000000e0f43f40]
at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)
at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
at org.eclipse.jetty.server.ServerConnector.accept(ServerConnector.java:321)
at org.eclipse.jetty.server.AbstractConnector$Acceptor.run(AbstractConnector.java:460)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:601)
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:532)
at java.lang.Thread.run(Thread.java:745)
Acceptors are always in a blocked state when they are not actively accepting connections; that is normal for that kind of thread.
Your issue is elsewhere.
You haven't given enough details to troubleshoot it, though. (Sorry.)
Resolved the issue from the thread dump. There was an issue establishing a connection with the message broker.
nid=0xe128 in Object.wait() [0x00007f41303ef000]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at org.apache.activemq.transport.failover.FailoverTransport.oneway(FailoverTransport.java:613)
- locked <0x00000000ddddecf0> (a java.lang.Object)
at org.apache.activemq.transport.MutexTransport.oneway(MutexTransport.java:68)

Neo4j 1.9 throws "java.net.ConnectException: Connection refused" with multi-threaded neocons client

G'day,
I've written a little program in Clojure that uses neocons to jam a bunch of data into Neo4J v1.9.4, and after getting it working have been tinkering with performance.
On large data sets the bottleneck is inserting relationships into Neo4j, which isn't all that surprising given they have to be done one-at-a-time. So my thought was to sprinkle some pmap magic on it, to see if some naive parallelism helped.
Unexpectedly (at least to me), that resulted in neocons throwing a "java.net.ConnectException: Connection refused" exception, which seems odd given that the client will be defaulting to 10 threads (pmap creates no more than numberOfProcessors + 2 threads), while Neo4j will be defaulting to 80 threads (numberOfProcessors * 10, at least if I'm reading the docs right). Last time I checked, 10 was less than 80, so Neo4j should have... <takes off shoes>... lots of threads to spare.
The line of code in question is here - the only change that was made was to switch the "map" call to a "pmap" call.
Any ideas / suggestions?
Thanks in advance,
Peter
Peter,
I would recommend using batch mode for the creation of relationships too; I saw that you already use batch creation for the nodes.
Make sure that your batch sizes are roughly between 20k and 50k elements to be most efficient.
Otherwise you end up with two issues:
using one transaction per operation really drains your resources, because each commit does a synchronized, forced write to the transaction log
as creating a relationship locks both its start and end nodes, you'll get a lot of locks that other threads wait for, so you can easily end up stalling all of your server threads as they wait for locks to be released on their start or end nodes
You should see these locked threads by issuing a kill -3 (or jstack) against the Neo4j server.
Batching those relationship creations and grouping them by subgraph, so that there is as little overlap as possible between the batches, should help a lot.
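The batching step itself is generic and can be sketched in plain Java (this is not the neocons or Neo4j API, just the partitioning a client would do before submitting each chunk as one batch request):

```java
import java.util.ArrayList;
import java.util.List;

public class Batcher {
    // Split a large list of pending operations into chunks of batchSize,
    // so each chunk can be submitted as a single batch request.
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            batches.add(items.subList(i, Math.min(i + batchSize, items.size())));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> ops = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) ops.add(i);
        List<List<Integer>> batches = partition(ops, 25_000); // within the 20k-50k sweet spot
        System.out.println(batches.size());        // prints 4
        System.out.println(batches.get(3).size()); // prints 25000
    }
}
```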
Not related to your issue, but still worth investigating later.
Not sure what neocons uses there under the hood, but you might fare better with the transactional endpoint and cypher in neo4j 2.0.

Jetty 9 Hangs, QueuedThreadPool Growing Large

We recently upgraded our Jetty servers from version 6.1.25 to 9.0.4. They are deployed on Java 1.7.0_11 64-bit on a Windows 2008 server.
Other than required configuration changes for Jetty (start.ini - very nice), we kept all the JVM flags the same as they were previously. 6 days after deploying in the production environment, the server became unresponsive to HTTP requests. Internal 'heartbeat' processing continued to run per normal during this time but it was not servicing external requests. The service was restarted and 6 days later it again became unresponsive.
During my initial review, I thought I was onto something with https://bugs.eclipse.org/bugs/show_bug.cgi?id=357318. However, that JVM issue was backported from Java 1.8_0XX to Java 1.7.0_06. This led me to review the Thread processing.
I thought it might be related to cases 400617/410550 on the Eclipse site, although it doesn't present itself quite like the write-up, and that case had apparently been resolved in Jetty 9.0.3.
Monitoring the application via JMX shows that Thread count for 'qtp' threads continues to grow over time and I've been unsuccessful in searching for a resolution. Thread configuration is currently set for:
threads.min=10
threads.max=200
threads.timeout=60000
All the qtp threads are typically in WAITING state with the following stack trace:
Name: qtp1805176801-285
State: WAITING on java.util.concurrent.Semaphore$NonfairSync@4bf4a3b0
Total blocked: 0 Total waited: 110
Stack trace:
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(Unknown Source)
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(Unknown Source)
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(Unknown Source)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(Unknown Source)
java.util.concurrent.Semaphore.acquire(Unknown Source)
org.eclipse.jetty.util.BlockingCallback.block(BlockingCallback.java:96)
org.eclipse.jetty.server.HttpConnection$Input.blockForContent(HttpConnection.java:457)
org.eclipse.jetty.server.HttpInput.consumeAll(HttpInput.java:282)
- locked org.eclipse.jetty.util.ArrayQueue@3273ba91
org.eclipse.jetty.server.HttpConnection.completed(HttpConnection.java:360)
org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:340)
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:224)
org.eclipse.jetty.io.AbstractConnection$ReadCallback.run(AbstractConnection.java:358)
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:601)
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:532)
java.lang.Thread.run(Unknown Source)
After a closer look, this appears different from the newest threads that have the following state:
Name: qtp1805176801-734
State: TIMED_WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@77b83b6e
Total blocked: 5 Total waited: 478
Stack trace:
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.parkNanos(Unknown Source)
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(Unknown Source)
org.eclipse.jetty.util.BlockingArrayQueue.poll(BlockingArrayQueue.java:390)
org.eclipse.jetty.util.thread.QueuedThreadPool.idleJobPoll(QueuedThreadPool.java:509)
org.eclipse.jetty.util.thread.QueuedThreadPool.access$700(QueuedThreadPool.java:48)
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:563)
java.lang.Thread.run(Unknown Source)
Based on the naming convention, some of the qtp threads are very old (qtp1805176801-206) while some are very new (qtp1805176801-6973). I find it interesting that the older threads aren't timing out based on the 60 second idle timeout. The application services customers during US business hours and is largely idle in the early morning hours at which time I'd expect almost all of the pool to get cleaned up.
Hoping someone may be able to point me in the right direction in terms of how to track this issue down. My experience with Jetty leads me to believe their stuff is very solid and most issues are either programmatic in our implementation (been there) or JVM related (done that). Also open to suggestions if you think I might be chasing a red herring on the threads.
NEW INFORMATION:
Tracing the exceptions a little farther, this appears to be caused when GWT RPC calls time out while waiting for a response. The following stack trace shows an exception in the log file that is related to a thread in an invalid state. I'm using this to review and look for other reports of Jetty/GWT interaction issues.
2013-09-03 08:41:49.249:WARN:/webapp:qtp488328684-414: Exception while dispatching incoming RPC call
java.io.IOException: java.util.concurrent.TimeoutException: Idle timeout expired: 30015/30000 ms
at org.eclipse.jetty.util.BlockingCallback.block(BlockingCallback.java:103)
at org.eclipse.jetty.server.HttpConnection$Input.blockForContent(HttpConnection.java:457)
at org.eclipse.jetty.server.HttpInput.read(HttpInput.java:130)
at java.io.InputStream.read(Unknown Source)
at com.google.gwt.user.server.rpc.RPCServletUtils.readContent(RPCServletUtils.java:175)
at com.google.gwt.user.server.rpc.RPCServletUtils.readContentAsGwtRpc(RPCServletUtils.java:205)
at com.google.gwt.user.server.rpc.AbstractRemoteServiceServlet.readContent(AbstractRemoteServiceServlet.java:182)
at com.google.gwt.user.server.rpc.RemoteServiceServlet.processPost(RemoteServiceServlet.java:239)
at com.google.gwt.user.server.rpc.AbstractRemoteServiceServlet.doPost(AbstractRemoteServiceServlet.java:62)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:755)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:848)
at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:698)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1506)
at c.t.b.servlet.PipelineFilter.doFilter(PipelineFilter.java:56)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1494)
at c.v.servlet.SetRequestEncoding.doFilter(SetRequestEncoding.java:27)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1494)
at c.t.b.servlet.OutOfMemoryFilter.doFilter(OutOfMemoryFilter.java:39)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1486)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:503)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:138)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:564)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:213)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1094)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:432)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:175)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1028)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:136)
at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:258)
at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:109)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
at org.eclipse.jetty.server.Server.handle(Server.java:445)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:267)
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:224)
at org.eclipse.jetty.io.AbstractConnection$ReadCallback.run(AbstractConnection.java:358)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:601)
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:532)
at java.lang.Thread.run(Unknown Source)
Caused by:
java.util.concurrent.TimeoutException: Idle timeout expired: 30015/30000 ms
at org.eclipse.jetty.io.IdleTimeout.checkIdleTimeout(IdleTimeout.java:153)
at org.eclipse.jetty.io.IdleTimeout$1.run(IdleTimeout.java:50)
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(Unknown Source)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Ended up posting the question on the Eclipse/Jetty web site. The following link can be used to track any permanent fix:
https://bugs.eclipse.org/bugs/show_bug.cgi?id=416477
The issue has to do with the Semaphore lock on a QTP thread that has timed out during the request, as part of a GWT RPC call. The original request is timed, with a timeout of 30 seconds, and it times out while waiting on the Semaphore.acquire method to complete. As part of the clean-up of the request, the HttpConnection attempts consumeAll on the request, which again attempts a Semaphore.acquire. This time the acquire is not timed, and the lock remains in place until the thread is interrupted.
The issue does appear to be very specific to the platform, as Jetty has not been able to reproduce it and I've not been able to find any other reports. Furthermore, it only occurs in one of our production environments. My guess is that there is something going on between the GWT RPC code, Jetty, and the operating system. We have minor upgrades planned for the JDK, Jetty, and the GWT SDK.
Workaround
The initial workaround was to manually interrupt locked threads a couple of times a day via the JMX console. Our longer-term solution was to build a clean-up mechanism that looks for these locked threads and calls interrupt on them.
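That clean-up mechanism can be sketched in plain Java. The "qtp" name prefix and the frame check are assumptions matching the stack traces above; a real reaper would match org.eclipse.jetty.util.BlockingCallback, while this demo parks a thread on a CountDownLatch so it is runnable without Jetty:

```java
import java.util.Map;
import java.util.concurrent.CountDownLatch;

public class StuckThreadReaper {
    // Scan all live threads and interrupt pool threads ("qtp" prefix) that are
    // parked in a known blocking frame. "CountDownLatch" is the stand-in frame
    // for this demo; in production you would match BlockingCallback instead.
    static int interruptStuckQtpThreads() {
        int interrupted = 0;
        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
            Thread t = entry.getKey();
            if (!t.getName().startsWith("qtp") || t.getState() != Thread.State.WAITING) continue;
            for (StackTraceElement frame : entry.getValue()) {
                if (frame.getClassName().contains("BlockingCallback")
                        || frame.getClassName().contains("CountDownLatch")) {
                    t.interrupt();
                    interrupted++;
                    break;
                }
            }
        }
        return interrupted;
    }

    public static void main(String[] args) throws Exception {
        CountDownLatch never = new CountDownLatch(1); // nobody counts this down
        Thread stuck = new Thread(() -> {
            try { never.await(); } catch (InterruptedException expected) { /* reaped */ }
        }, "qtp-demo-1");
        stuck.start();
        while (stuck.getState() != Thread.State.WAITING) Thread.sleep(10); // wait until parked
        System.out.println(interruptStuckQtpThreads() >= 1); // prints true
        stuck.join(2000);
        System.out.println(!stuck.isAlive()); // prints true: the stuck thread was released
    }
}
```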
The QueuedThreadPool is a shared pool of threads. The threads in it will be reused for other processing. Yes, chasing the thread pool, assuming threads will be cleaned up, is a red herring. Those threads will fall off the pool, slowly, over a long period of time (think hours). This is a performance decision in the thread pool (create is expensive, do it as infrequently as possible).
As for the stack trace you pasted, it's incomplete, so the amount of guessing on behavior is extremely high. That being said, those two lines can indicate normal operation, but without the rest of the stack trace there's little to go on.
Also, the versions of Java you are using, 1.7.0_06 and 1.7.0_11, are very old, and you are missing hundreds of bug fixes.
I have the same issue with Jetty 9.2.3.v20140905 and Java 1.8.0_20-b26, 64-bit.
Workaround: install monit (http://mmonit.com/monit/):
# monit.conf
check process jetty-service with pidfile "/opt/jetty-service/jetty.pid"
start program = "/usr/sbin/service jetty-service start" with timeout 30 seconds
stop program = "/usr/sbin/service jetty-service stop"
if totalmem is greater than 1268 MB for 10 cycles then restart
if 5 restarts within 5 cycles then timeout

scala specs don't exit when testing actors

I'm trying to test some actors using Scala specs. I run the test in IDEA or Maven (as JUnit) and it does not exit. Looking at the code, my test finishes, but some internal threads (the scheduler) are hanging around. How can I make the test finish?
Currently this is only possible by causing the actor framework's scheduler to forcibly shut down:
scala.actors.Scheduler.impl.shutdown
However, the underlying implementation of the scheduler has been changing in patch releases lately, so this may be different or may not quite work with the version you are on. In 2.7.7 the default scheduler appears to be an instance of scala.actors.FJTaskScheduler2, for which this approach should work; however, if you end up with a SingleThreadedScheduler it will not, as its shutdown method is a no-op.
Note that this will only work if your actors are not waiting on a react at that time.