Akka: wait till all messages are processed in the whole ActorSystem

Is there any way to wait until all messages in mailboxes (and stashes) are processed in the whole ActorSystem?
I'm trying to design an integration-test architecture for a system based on Akka, and it would be convenient to have a method like actorSystem.awaitAllWorkDone(). An arbitrary test would then look like this:
1. Start the whole environment, consisting of many actors.
2. Send the message that triggers the functionality being tested.
3. Call actorSystem.awaitAllWorkDone(), which waits until all messages are processed, including the message sent in step 2 and all the internal messages it causes. For example, if the message from step 2 fires 10 other messages to other actors, it should wait for those too.
4. Now the system is in a "consistent" state and I can verify it, e.g. that all required changes were made in the DB or all required side effects took place.
Actually, step 3 could be replaced by waiting for a message specific to this test that indicates all inner communication has completed. But the whole point is to have one "await" method for any test case, rather than hunting for whichever message, for this particular test case, indicates that all inner communication is done.
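To make that alternative concrete, here is a minimal, self-contained sketch of the per-test indicator approach using Akka TestKit. All names (TriggerWork, AllWorkDone, the dummy entry-point actor) are illustrative; a real environment would fan messages out to other actors and only then report back:

import akka.actor.{Actor, ActorSystem, Props}
import akka.testkit.TestProbe
import scala.concurrent.duration._

object IndicatorMessageTest extends App {
  case object TriggerWork
  case object AllWorkDone // test-specific indicator that all inner communication finished

  val system = ActorSystem("integration-tests")
  val probe  = TestProbe()(system)

  // Stand-in for the real environment under test.
  val entryPoint = system.actorOf(Props(new Actor {
    def receive = { case TriggerWork => sender() ! AllWorkDone }
  }))

  entryPoint.tell(TriggerWork, probe.ref) // step 2: trigger the functionality
  probe.expectMsg(5.seconds, AllWorkDone) // step 3: wait, but tied to this one test
  // step 4: verify the database / side effects here
  system.terminate()
}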
Anybody have an idea how to achieve this?

Related

Spring Integration Multiple consumers not processing concurrently

I am using Spring Integration with ActiveMQ. I defined a DefaultMessageListenerContainer with maxConcurrentConsumers = 5; it is referenced in a channel adapter. After an int-xml:validating-filter and an int-xml:unmarshalling-transformer, I defined a queue channel, actionInstructionTransformed, and a poller for this queue channel. When I start my application, I can see in the ActiveMQ console that a connection is created with five sessions inside it.
Now, I have a @MessageEndpoint class with a method annotated
@ServiceActivator(inputChannel = "actionInstructionTransformed", poller = @Poller(value = "customPoller")).
I have a log statement at the method entrance. Processing each message takes a long time (several minutes). In my logs, I can see that thread-1 starts processing, and then I only see thread-1's output. Only once thread-1 has finished processing a message do I see thread-2 start processing the next one, and so on. I do NOT have any synchronized block inside my class annotated @MessageEndpoint, yet I have not managed to get thread-1, thread-2, etc. to process messages concurrently.
Has anybody experienced something similar?
Look, you say:
After an int-xml:validating-filter and an int-xml:unmarshalling-transformer, I defined a queue channel actionInstructionTransformed.
Now let's go to the QueueChannel and PollingConsumer definitions!
On the other hand, a channel adapter connected to a channel that implements the org.springframework.messaging.PollableChannel interface (e.g. a QueueChannel) will produce an instance of PollingConsumer.
And pay attention that @Poller (PollerMetadata) has a taskExecutor option.
By default the TaskScheduler asks the QueueChannel for data periodically, according to the trigger configuration. If that is a PeriodicTrigger with default options such as fixedRate = false, the next poll only happens after the previous one has finished. That's why you see only one thread.
So, configure a taskExecutor and the messages from that queue will be processed in parallel.
The concurrency on the DefaultMessageListenerContainer has no effect here, because in the end you place all those messages into the QueueChannel, and from there a new threading model takes over, based on the @Poller configuration.
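For example, the customPoller bean could be given a task executor. A rough sketch (written in Scala for consistency with the rest of this page; the pool size and poll interval are illustrative, and the equivalent Java configuration has the same shape):

import org.springframework.context.annotation.{Bean, Configuration}
import org.springframework.integration.scheduling.PollerMetadata
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor
import org.springframework.scheduling.support.PeriodicTrigger

@Configuration
class CustomPollerConfig {

  @Bean
  def pollerExecutor(): ThreadPoolTaskExecutor = {
    val executor = new ThreadPoolTaskExecutor()
    executor.setCorePoolSize(5) // roughly match maxConcurrentConsumers
    executor
  }

  @Bean(name = Array("customPoller"))
  def customPoller(pollerExecutor: ThreadPoolTaskExecutor): PollerMetadata = {
    val metadata = new PollerMetadata()
    metadata.setTrigger(new PeriodicTrigger(100)) // poll the QueueChannel every 100 ms
    metadata.setTaskExecutor(pollerExecutor)      // hand each poll off to the pool
    metadata
  }
}

With a task executor in place, each poll is handed off to the pool, so long-running handlers no longer serialize behind the single scheduler thread.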

How to prevent other workers from accessing a message which is being currently processed?

I am working on a project that will require multiple workers to access the same queue to get information about a file which they will manipulate. Files range in size from mere megabytes to hundreds of gigabytes. For this reason, a visibility timeout doesn't seem to make sense, because I cannot be certain how long processing will take. I have thought of a couple of approaches, but if there is a better way, please let me know:
The message is deleted from the original queue and put into a ‘waiting’ queue. When the program finishes processing the file, it deletes the message; otherwise the message is deleted from the waiting queue and put back into the original queue.
The message id is checked against a database. If the message id is found, it is ignored. Otherwise, the program starts processing the message and inserts the message id into the database.
Thanks in advance!
Use the default-provided SQS timeout but take advantage of ChangeMessageVisibility.
You can specify the timeout in several ways:
When the queue is created (default timeout)
When the message is retrieved
By having the worker call back to SQS and extend the timeout
If you are worried that you do not know the appropriate processing time, use a default value that is good for most situations, but don't make it so big that things become unnecessarily delayed.
Then, modify your workers to make a ChangeMessageVisibility call to SQS periodically to extend the timeout. If a worker dies, the message stops being extended and it will reappear on the queue to be processed by another worker.
See: MessageVisibility documentation
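For illustration, a minimal sketch of that heartbeat pattern, assuming the AWS SDK for Java (v1) used from Scala; the queue URL, the timings and processFile are placeholders:

import com.amazonaws.services.sqs.AmazonSQSClientBuilder
import com.amazonaws.services.sqs.model.ReceiveMessageRequest
import java.util.concurrent.{Executors, TimeUnit}

object VisibilityHeartbeat extends App {
  val sqs      = AmazonSQSClientBuilder.defaultClient()
  val queueUrl = "https://sqs.us-east-1.amazonaws.com/123456789012/files" // placeholder

  val messages =
    sqs.receiveMessage(new ReceiveMessageRequest(queueUrl).withMaxNumberOfMessages(1)).getMessages

  if (!messages.isEmpty) {
    val message   = messages.get(0)
    val scheduler = Executors.newSingleThreadScheduledExecutor()

    // Every 4 minutes, push the visibility timeout out by another 5 minutes,
    // so the message stays hidden only while this worker is still alive.
    val heartbeat = new Runnable {
      def run(): Unit = sqs.changeMessageVisibility(queueUrl, message.getReceiptHandle, 300)
    }
    scheduler.scheduleAtFixedRate(heartbeat, 4, 4, TimeUnit.MINUTES)

    try {
      processFile(message.getBody)                          // the long-running work
      sqs.deleteMessage(queueUrl, message.getReceiptHandle) // finished: remove it for good
    } finally {
      scheduler.shutdownNow() // stop extending; if the worker died instead, the timeout simply lapses
    }
  }

  def processFile(body: String): Unit = () // placeholder for the real processing
}

If the worker crashes, the heartbeat stops, the remaining visibility timeout runs out, and the message reappears for another worker, exactly as described above.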

Check if Kafka Queue is Empty

Right now I have functionality that writes a couple hundred messages onto a Kafka queue. But when all of those messages have been consumed, I also need to execute additional functionality. Is there a way to place a listener on a Kafka queue to get notified when it has been emptied?
You could solve this in two ways, I think (sketches of both follow below):
Kafka's Fetch Response contains a HighwaterMarkOffset, which is essentially the offset of the last message in a partition. You could check whether your message has that offset, and if so, you've reached the end. However, this won't work if the producer and consumer are running at the same time: the consumer can simply consume messages faster and thus stop earlier than you need.
Send a "poison pill" message - say you need to produce 100 messages. Then your producer sends these 100 messages + 1 special message (some UUID for example, but be sure it never appears under normal circumstances in your logic) that would mean "the end". On consumer side you would check whether the received message is a poison pill and shutdown if it is.

Akka's TestProbe expectMsg to match if expected message is among sent

I have a test for a particular actor. This actor depends on some other actors, so I use TestProbe() to test it in isolation.
My problem is that I receive more messages than I am interested in for this particular test. For example:
val a = TestProbe()
val b = TestProbe()
val actor = TestActorRef(new MyActor(a.ref, b.ref))
actor ! Message(1, 2)
b.expectMsg(3)
The test fails because, while being created, MyActor sends some kind of "registration" message to the actors passed in its constructor.
The message 3 does arrive eventually, but the assertion fails because it is not the first message to arrive.
I would like to avoid asserting messages I don't need for a test; they can change, and they are out of scope for that particular test anyway.
Since TestProbe does not contain such a method, I suspect there may be something wrong with my test setup (or rather with my project architecture). I see there are methods like fishForMessage, but they all require an explicit time parameter, which seems irrelevant since my whole test is purely synchronous.
Is there any way to write such a test, where the desired message just has to be among all those received? If not, how can my setup be improved to be more easily testable?
fishForMessage actually fits. All of these assertions, including expectMsg, are asynchronous; expectMsg just uses the preconfigured default timeout (scaled by timeFactor).
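For the example in the question, that would look roughly like this (the duration is illustrative; everything that is not the message under test is simply swallowed):

import scala.concurrent.duration._

b.fishForMessage(3.seconds) {
  case 3 => true    // the message we actually care about; fishing stops here
  case _ => false   // the "registration" message and anything else is ignored
}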
TestActorRef only guarantees that the CallingThreadDispatcher will be used to send messages and execute futures (if they use the dispatcher from the test actor), so things run sequentially only as long as they use context.dispatcher. Nothing stops some code inside your MyActor from using another dispatcher to send a response, so all checks still have to be asynchronous; you just can't get rid of that.

What happens to running processes on a continuous Azure WebJob when website is redeployed?

I've read about graceful shutdowns here using the WEBJOBS_SHUTDOWN_FILE and here using Cancellation Tokens, so I understand the premise of graceful shutdowns. However, I'm not sure how they will affect WebJobs that are in the middle of processing a queue message.
So here's the scenario:
I have a WebJob with functions listening to queues.
Message is added to Queue and job begins processing.
While processing, someone pushes to develop, triggering a redeploy.
Assuming I have my WebJobs hooked up to deploy on git pushes, this deploy will also trigger the WebJobs to be updated, which (as far as I understand) will kick off some sort of shutdown workflow in the jobs. So I have a few questions stemming from that.
Will jobs in the middle of processing a queue message finish processing it before the job quits? Or is any shutdown notification essentially treated as "this bitch is about to shut down; if you don't have anything to handle it, you're SOL"?
If we are SOL, is our best option for handling shutdowns essentially to wrap anything you're doing in the equivalent of DB transactions and implement your shutdown handler in such a way that all changes are rolled back on shutdown?
If a queue message is in the middle of being processed and the WebJob shuts down, will that message be requeued? If not, does that mean that my shutdown handler needs to handle requeuing that message?
Is it possible for functions listening to queues to grab any more queue messages after the job has been notified that it needs to shut down?
Any guidance here is greatly appreciated! Also, if anyone has any other useful links on how to handle job shutdowns besides the ones I mentioned, it would be great if you could share those.
After no small amount of testing, I think I've found the answers to my questions and I hope someone else can gain some insight from my experience.
NOTE: All of these scenarios were tested using .NET console apps and Azure queues, so I'm not sure how blobs, table storage, or different WebJob file types would behave in these scenarios.
1. After a job has been marked to exit, the triggered functions that are running have a configured grace period (5 seconds by default, but I think that is configurable with a settings.job file; a sample follows after this list) to finish before they are exited. If they do not finish within the grace period, the function quits. Main() (or whichever file you declared host.RunAndBlock() in), however, will keep running any code after host.RunAndBlock() for up to the amount of time remaining in the grace period (I'm not sure how that would work if you used an infinite loop instead of RunAndBlock). As far as handling the quit in your functions goes, you can essentially "listen" to the CancellationToken that you can pass into your triggered functions, check IsCancellationRequested, and handle it accordingly. Also, you are not SOL if you don't handle the quits yourself. Huzzah! See point 3.
2. While you are not SOL if you don't handle the quit (see point 3), I do think it is a good idea to wrap all of your jobs in transactions that you don't commit until you're absolutely sure the job has run its course. That way, if your function exits mid-process, you're less likely to have to worry about corrupted data. I can think of a couple of scenarios where you might want to commit transactions as you go (batch jobs, for instance), but then you would need to structure your data or logic so that previously processed entities aren't reprocessed after the job restarts.
3. You are not in trouble if you don't handle job quits yourself. My understanding of what's going on under the covers is virtually non-existent, but I am quite sure of the results. If a function is in the middle of processing a queue message and is forced to quit before it can finish, HAVE NO FEAR! When the job grabs a message to process, it essentially hides it on the queue for a certain amount of time. If your function quits while processing the message, that message "becomes visible" again after x amount of time, and it will be grabbed again and run against the potentially updated code that was just deployed.
4. I have about 90% confidence in my findings here, because testing this involved quickly switching between windows while not being totally sure what was going on with certain pieces. But here's what I found: on the off chance that a queue receives a new message during the grace period before a job quits, I THINK one of two things can happen. If the function doesn't poll that queue before the job quits, the message stays on the queue and is grabbed when the job restarts. If the function DOES grab the message, it is treated the same as any other interrupted message: it "becomes visible" on the queue again and is rerun when the job restarts.
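For reference on point 1: I believe the grace period can be lengthened by dropping a settings.job file next to the WebJob executable, roughly like this (60 seconds instead of the default 5; do verify the setting name against the current docs):

{ "stopping_wait_time": 60 }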
That pretty much sums it up. I hope other people will find this useful. Let me know if you want any of this expounded on and I'll be happy to try. Or if I'm full of it and you have lots of corrections, those are probably more welcome!