Spring Integration Multiple consumers not processing concurrently - concurrency

I am using Spring Integration with ActiveMQ. I defined a DefaultMessageListenerContainer with maxConcurrentConsumers = 5. It is referenced in a . After an int-xml:validating-filter and an int-xml:unmarshalling-transformer, I defined a queue channel actionInstructionTransformed. And I have got a poller for this queue channel. When I start my application, in the ActiveMQ console, I can see that a connection is created and inside five sessions.
Now, I have got a #MessageEndpoint with a method annotated
#ServiceActivator(inputChannel = "actionInstructionTransformed", poller = #Poller(value = "customPoller")).
I have got a log statement at the method entrance. Processing of each message is long (several minutes). In my logs, I can see that thread-1 starts the processing and then I can only see thread-1 outputs. Only when thread-1 has finished processing 1 message, I can see thread-2 starts processing the next message, etc. I do NOT have any synchronized block inside my class annotated #MessageEndpoint. I have not managed to get thread-1, thread-2, etc process messages concurrently.
Has anybody experienced something similar?

Look, you say:
After an int-xml:validating-filter and an int-xml:unmarshalling-transformer, I defined a queue channel actionInstructionTransformed.
Now let's go to the QueueChannel and PollingConsumer definitions!
On the other hand, a channel adapter connected to a channel that implements the org.springframework.messaging.PollableChannel interface (e.g. a QueueChannel) will produce an instance of PollingConsumer.
And pay attention that #Poller (PollerMetadata) has taskExecutor option.
By default the TaskScedhuler ask QueueChannel for data periodically according to the trigger configuration. If that is PeriodicTrigger with default options like fixedRate = false, the next poll really happens after the previous one. That's why you see only one Thread.
So, try to configure taskExecutor and your messages from that queue will go in parallel.
The concurrency on the DefaultMessageListenerContainer does not have effect. Because in the end you place all those messages to the QueueChannel. And here a new Threading model starts to work based on the #Poller configuration.

Related

DDD - Concurrency and Command retrying with side-effects

I am developing an event-sourced Electric Vehicle Charging Station Management System, which is connected to several Charging Stations. In this domain, I've come up with an aggregate for the Charging Station, which includes the internal state of the Charging Station(whether it is network-connected, if a car is charging using the station's connectors).
The station notifies me about its state through messages defined in a standardized protocol:
Heartbeat: whether the station is still "alive"
StatusNotification: whether the station has encountered an error(under voltage), or if everything is correct
And my server can send commands to this station:
RemoteStartTransaction: tells the station to unlock and reserve one of its connectors, for a car to charge using the connector.
I've developed an Aggregate for this Charging Station. It contains the internal entities of its connector, whether it's charging or not, if it has a problem in the power system, ...
And the Aggregate, which its memory representation resides in the server that I control, not in the Charging Station itself, has a StationClient service, which is responsible for sending these commands to the physical Charging Station(pseudocode):
class StationAggregate {
stationClient: StationClient
URL: string
connector: Connector[]
unlock(connectorId) {
if this.connectors.find(connectorId).isAvailableToBeUnlocked() {
return ErrorConnectorNotAvailable
}
error = this.stationClient.sendRemoteStartTransaction(this.URL, connectorId)
if error {
return ErrorStationRejectedUnlock
}
this.applyEvents([
StationUnlockedEvent(connectorId, now())
])
return Ok
}
receiveHeartbeat(timestamp) {
this.applyEvents([
StationSentHeartbeat(timestamp)
])
return Ok
}
}
I am using a optimistic concurrency, which means that, I load the Aggregate from a list of events, and I store the current version of the Aggregate in its memory representation: StationAggregate in version #2032, when a command is successfully processed and event(s) applied, it would the in version #2033, for example. In that way, I can put a unique constraint on the (StationID, Version) tuple on my persistence layer, and guarantee that only one event is persisted.
If by any chance, occurs a receival of a Heartbeat message, and the receival of a Unlock command. In both threads, they would load the StationAggregate and would be both in version X, in the case of the Heartbeat receival, there would be no side-effects, but in the case of the Unlock command, there would be a side-effect that tells the physical Charging Station to be unlocked. However as I'm using optimistic concurrency, that StationUnlocked event could be rejected from the persistence layer. I don't know how I could handle that, as I can't retry the command, because it its inherently not idempotent(as the physical Station would reject the second request)
I don't know if I'm modelling something wrong, or if it's really a hard domain to model.
I am not sure I fully understand the problem, but the idea of optimistic concurrency is to prevent writes in case of a race condition. Versions are used to ensure that your write operation has the version that is +1 from the version you've got from the database before executing the command.
So, in case there's a parallel write that won and you got the wrong version exception back from the event store, you retry the command execution entirely, meaning you read the stream again and by doing so you get the latest state with the new version. Then, you give the command to the aggregate, which decides if it makes sense to perform the operation or not.
The issue is not particularly related to Event Sourcing, it is just as relevant for any persistence and it is resolved in the same way.
Event Sourcing could bring you additional benefits since you know what happened. Imagine that by accident you got the Unlock command twice. When you got the "wrong version" back from the store, you can read the last event and decide if the command has already been executed. It can be done logically (there's no need to unlock if it's already unlocked, by the same customer), technically (put the command id to the event metadata and compare), or both ways.
When handling duplicate commands, it makes sense to ensure a decent level of idempotence of the command handling, ignore the duplicate and return OK instead of failing to the user's face.
Another observation that I can deduce from the very limited amount of information about the domain, is that heartbeats are telemetry and locking and unlocking are business. I don't think it makes a lot of sense to combine those two distinctly different things in one domain object.
Update, following the discussion in comments:
What you got with sending the command to the station at the same time as producing the event, is the variation of two-phase commit. Since it's not executed in a transaction, any of the two operations could fail and lead the system to an inconsistent state. You either don't know if the station got the command to unlock itself if the command failed to send, or you don't know that it's unlocked if the event persistence failed. You only got as far as the second operation, but the first case could happen too.
There are quite a few ways to solve it.
First, solving it entirely technical. With MassTransit, it's quite easy to fix using the Outbox. It will not send any outgoing messages until the consumer of the original message is fully completed its work. Therefore, if the consumer of the Unlock command fails to persist the event, the command will not be sent. Then, the retry filter would engage and the whole operation would be executed again and you already get out of the race condition, so the operation would be properly completed.
But it won't solve the issue when your command to the physical station fails to send (I reckon it is an edge case).
This issue can also be easily solved and here Event Sourcing is helpful. You'd need to convert sending the command to the station from the original (user-driven) command consumer to the subscriber. You subscribe to the event stream of StationUnlocked event and let the subscriber send commands to the station. With that, you would only send commands to the station if the event was persisted and you can retry sending the command as many times as you'd need.
Finally, you can solve it in a more meaningful way and change the semantics. I already mentioned that heartbeats are telemetry messages. I could expect the station also to respond to lock and unlock commands, telling you if it actually did what you asked.
You can use the station telemetry to create a representation of the physical station, which is not a part of the aggregate. In fact, it's more like an ACL to the physical world, represented as a read model.
When you have such a mirror of the physical station on your side, when you execute the Unlock command in your domain, you can engage a domain server to consult with the current station state and make a decision. If you find out that the station is already unlocked and the session id matches (yes, I remember our previous discussion :)) - you return OK and safely ignore the command. If it's locked - you proceed. If it's unlocked and the session id doesn't match - it's obviously an error and you need to do something else.
In this last option, you would clearly separate telemetry processing from the business so you won't have heartbeats impact your domain model, so you really won't have the versioning issue. You also would always have a place to look at to understand what is the current state of the physical station.

How to implement long running gRPC async streaming data updates in C++ server

I'm creating an async gRPC server in C++. One of the methods streams data from the server to clients - it's used to send data updates to clients. The frequency of the data updates isn't predictable. They could be nearly continuous or as infrequent as once per hour. The model used in the gRPC example with the "CallData" class and the CREATE/PROCESS/FINISH states doesn't seem like it would work very well for that. I've seen an example that shows how to create a 'polling' loop that sleeps for some time and then wakes up to check for new data, but that doesn't seem very efficient.
Is there another way to do this? If I use the "CallData" method can it block in the 'PROCESS' state until there's data (which probably wouldn't be my first choice)? Or better, can I structure my code so I can notify a gRPC handler when data is available?
Any ideas or examples would be appreciated.
In a server-side streaming example, you probably need more states, because you need to track whether there is currently a write already in progress. I would add two states, one called WRITE_PENDING that is used when a write is in progress, and another called WRITABLE that is used when a new message can be sent immediately. When a new message is produced, if you are in state WRITABLE, you can send immediately and go into state WRITE_PENDING, but if you are in state WRITE_PENDING, then the newly produced message needs to go into a queue to be sent after the current write finishes. When a write finishes, if the queue is non-empty, you can grab the next message from the queue and immediately start a write for it; otherwise, you can just go into state WRITABLE and wait for another message to be produced.
There should be no need to block here, and you probably don't want to do that anyway, because it would tie up a thread that should otherwise be polling the completion queue. If all of your threads wind up blocked that way, you will be blind to new events (such as new calls coming in).
An alternative here would be to use the C++ sync API, which is much easier to use. In that case, you can simply write straight-line blocking code. But the cost is that it creates one thread on the server for each in-progress call, so it may not be feasible, depending on the amount of traffic you're handling.
I hope this information is helpful!

Akka wait till all messages are processed in whole ActorSystem

is there any way to wait till all messages in mailboxes(and stashes) are processed in whole ActorSystem?
I'm trying to develop architecture of integration tests for system based on Akka. And it would be convenient to have method something like actorSystem.awaitAllWorkDone(), in this case arbitrary test would look like:
Start whole environment consisting of many actors
Send message triggering functionality being tested
Call actorSystem.awaitAllWorkDone(). Which waits till all messages are processed, including message sent from step 2 and all inner communication messages which it causes. For example message from step 2 fires 10 other messages sent to other actors, so it should wait for them too.
Now the system in "consistent" state and I can verify it's state. Like verify that all required changes were made in db or all required side-effects were made.
Actually, step 3 can be replaced with waiting for message specific for this test which indicates that all inner-communication are completed. But whole point is to have one "await" method for any test case, and not to searching which message for this test case is indicator that all inner-communication are completed.
Anybody have an idea how to achieve this?

How to prevent other workers from accessing a message which is being currently processed?

I am working on a project that will require multiple workers to access the same queue to get information about a file which they will manipulate. Files are ranging from size, from mere megabytes to hundreds of gigabytes. For this reason, a visibility timeout doesn't seem to make sense because I cannot be certain how long it will take. I have though of a couple of ways but if there is a better way, please let me know.
The message is deleted from the original queue and put into a
‘waiting’ queue. When the program finished processing the file, it
deletes it, otherwise the message is deleted from the queue and put
back into the original queue.
The message id is checked with a database. If the message id is
found, it is ignored. Otherwise the program starts processing the
message and inserts the message id into the database.
Thanks in advance!
Use the default-provided SQS timeout but take advantage of ChangeMessageVisibility.
You can specify the timeout in several ways:
When the queue is created (default timeout)
When the message is retrieved
By having the worker call back to SQS and extend the timeout
If you are worried that you do not know the appropriate processing time, use a default value that is good for most situations, but don't make it so big that things become unnecessarily delayed.
Then, modify your workers to make a ChangeMessageVisiblity call to SQS periodically to extend the timeout. If a worker dies, the message stops being extended and it will reappear on the queue to be processed by another worker.
See: MessageVisibility documentation

How to design a state machine in face of non-blocking I/O?

I'm using Qt framework which has by default non-blocking I/O to develop an application navigating through several web pages (online stores) and carrying out different actions on these pages. I'm "mapping" specific web page to a state machine which I use to navigate through this page.
This state machine has these transitions;
Connect, LogIn, Query, LogOut, Disconnect
and these states;
Start, Connecting, Connected, LoggingIn, LoggedIn, Querying, QueryDone, LoggingOut, LoggedOut, Disconnecting, Disconnected
Transitions from *ing to *ed states (Connecting->Connected), are due to LoadFinished asynchronous network events received from network object when currently requested url is loaded. Transitions from *ed to *ing states (Connected->LoggingIn) are due to events send by me.
I want to be able to send several events (commands) to this machine (like Connect, LogIn, Query("productA"), Query("productB"), LogOut, LogIn, Query("productC"), LogOut, Disconnect) at once and have it process them. I don't want to block waiting for the machine to finish processing all events I sent to it. The problem is they have to be interleaved with the above mentioned network events informing machine about the url being downloaded. Without interleaving machine can't advance its state (and process my events) because advancing from *ing to *ed occurs only after receiving network type of event.
How can I achieve my design goal?
EDIT
The state machine I'm using has its own event loop and events are not queued in it so could be missed by machine if they come when the machine is busy.
Network I/O events are not posted directly to neither the state machine nor the event queue I'm using. They are posted to my code (handler) and I have to handle them. I can forward them as I wish but please have in mind remark no. 1.
Take a look at my answer to this question where I described my current design in details. The question is if and how can I improve this design by making it
More robust
Simpler
Sounds like you want the state machine to have an event queue. Queue up the events, start processing the first one, and when that completes pull the next event off the queue and start on that. So instead of the state machine being driven by the client code directly, it's driven by the queue.
This means that any logic which involves using the result of one transition in the next one has to be in the machine. For example, if the "login complete" page tells you where to go next. If that's not possible, then the event could perhaps include a callback which the machine can call, to return whatever it needs to know.
Asking this question I already had a working design which I didn't want to write about not to skew answers in any direction :) I'm going to describe in this pseudo answer what the design I have is.
In addition to the state machine I have a queue of events. Instead of posting events directly to the machine I'm placing them in the queue. There is however problem with network events which are asynchronous and come in any moment. If the queue is not empty and a network event comes I can't place it in the queue because the machine will be stuck waiting for it before processing events already in the queue. And the machine will wait forever because this network event is waiting behind all events placed in the queue earlier.
To overcome this problem I have two types of messages; normal and priority ones. Normal ones are those send by me and priority ones are all network ones. When I get network event I don't place it in the queue but instead I send it directly to the machine. This way it can finish its current task and progress to the next state before pulling the next event from the queue of events.
It works designed this way only because there is exactly 1:1 interleave of my events and network events. Because of this when the machine is waiting for a network event it's not busy doing anything (so it's ready to accept it and does not miss it) and vice versa - when the machine waits for my task it's only waiting for my task and not another network one.
I asked this question in hope for some more simple design than what I have now.
Strictly speaking, you can't. Because you only have state "Connecting", you don't know whether you need top login afterwards. You'd have to introduce a state "ConnectingWithIntentToLogin" to represent the result of a "Connect, then Login" event from the Start state.
Naturally there will be a lot of overlap between the "Connecting" and the "ConnectingWithIntentToLogin" states. This is most easily achieved by a state machine architecture that supports state hierarchies.
--- edit ---
Reading your later reactions, it's now clear what your actual problem is.
You do need extra state, obviously, whether that's ingrained in the FSM or outside it in a separate queue. Let's follow the model you prefer, with extra events in a queue. The rick here is that you're wondering how to "interleave" those queued events vis-a-vis the realtime events. You don't - events from the queue are actively extracted when entering specific states. In your case, those would be the "*ed" states like "Connected". Only when the queue is empty would you stay in the "Connected" state.
If you don't want to block, that means you don't care about the network replies. If on the other hand the replies interest you, you have to block waiting for them. Trying to design your FSM otherwise will quickly lead to your automaton's size reaching infinity.
How about moving the state machine to a different thread, i. e. QThread. I would implent a input queue in the state machine so I could send queries non blocking and a output queue to read the results of the queries. You could even call back a slotted function in your main thread via connect(...) if a result of a query arrives, Qt is thread safe in this regard.
This way your state machine could block as long as it needs without blocking your main program.
Sounds like you just want to do a list of blocking I/O in the background.
So have a thread execute:
while( !commands.empty() )
{
command = command.pop_back();
switch( command )
{
Connect:
DoBlockingConnect();
break;
...
}
}
NotifySenderDone();