Difference between a shared variable and a channel - clojure

Apart from facilitating communication between processes, how does using a channel differ from using some form of shared state, an atom for instance?

They are very different:
An atom is a wrapper around a value so that future values are derived by function application (and optionally, you can pass a validation function). From the Clojure reference pages: "Atoms are an efficient way to represent some state that will never need to be coordinated with any other, and for which you wish to make synchronous changes".
Channels were introduced in a talk that described them using a conveyor belt analogy: you put stuff on one end and it arrives at the consumer end. You can go and grab something from the channel if there is something on it (or wait until an item arrives).
You could use a sequence inside an atom as a substitute for a channel, but it would be a poor replacement: you would most likely end up reimplementing queuing, consumption, and coordination between consumers yourself.

Related

At which point should actors' behaviors be split

I'm completely new to Akka. I am having a hard time grasping when I should split what used to be class methods/behavior into akka messages. Many examples show the messages received as one line - println("Howdy").
Let's assume I want to do the following:
Given a predefined set of regular expressions
Given an input stream of sentences from a book. Each message is a sentence.
Perform regular expression on the sentence
Increment count of matches and non-matches for the regular expression
If match, Perform HTTP post the sentence.
What is the guideline that akka experts use in their head to break this up? Would I make each step here a separate message rather than several method calls? In my head, the only things I would use an akka message for would be #1 (each message) and #6 (the blocking http call). That would make my handling of each sentence actually perform a decent amount of work (but non-blocking work). Would it be similar to how I decide whether to use async, which for me is only when there is a chance of a blocking operation?
I assume that the state you want to track in your actor is the number of matches per regular expression.
In this case, your initial approach is valid as long as you don't have a lot of regular expressions. If you have a lot of them, and every sentence goes through every expression, you will perform a lot of work on the actor thread. This work is non-blocking in the sense that there is no I/O, but it actually prevents the progress of other messages sent to this actor, so in that sense it is blocking. Actors are single-threaded, so if you have a lot of incoming sentences the actor's mailbox will start to grow. If you use an unbounded mailbox (which is the default) you'll eventually go OOM.
One solution would be to dispatch regex matching to a Future. However, you can't share actor state (which is the match count per regex) with that future, because (in the general case) it will cause race conditions. To work around this, have the future send its result back to the actor as another message containing the counts that need to be updated.
import akka.actor.Actor
import akka.pattern.pipe
import scala.collection.mutable
import scala.concurrent.Future
import scala.util.matching.Regex

case class ProcessSentence(s: String)
case class ProcessParseResult(hits: mutable.Map[Regex, Int], s: String)
case class Publish(s: String)

class ParseActor extends Actor {
  import context.dispatcher // ExecutionContext for the Futures and pipeTo

  val regexHits = mutable.Map("\\s+".r -> 0, "foo*".r -> 0)

  def receive = {
    case ProcessSentence(s) =>
      // run the (potentially expensive) matching off the actor's thread
      parseSentence(s, regexHits.keys.toSeq).pipeTo(self)
    case ProcessParseResult(update, s) =>
      // update regexHits map here, safely on the actor's own thread
      if (update.values.sum > 0)
        self ! Publish(s)
    case Publish(s) =>
      Future { /* send http request */ }
  }

  def parseSentence(s: String, regexes: Seq[Regex]): Future[ProcessParseResult] =
    Future { /* match logic */ ProcessParseResult(mutable.Map.empty, s) }
}
The way I recommend approaching designing with Akka is to first identify the state in your problem.
Let's start there:
If you have state, evaluate if an actor is a simpler approach than synchronization and locking
If you don't have state, evaluate if you actually need to use an actor
If you have state, then an actor may be a good fit as it can help ensure you safely handle concurrent access to the state in the actor, and its fault tolerance mechanisms will help your application recover if that state becomes corrupt causing application errors.
In your case, a simple counter or two can be handled with Java's atomic integers, so I would actually recommend that you use a basic class instead of an actor. It will be much simpler than using an actor. If you want to return a future, you can use Java 8's CompletableFuture or Scala's scala.concurrent.Future, and the result will be simpler than using an actor.
With all of that said, if you DO want to use an actor, the design might not warrant multiple actors because you only have one piece of state.
As a general rule, your actors should have only one responsibility - much like good object oriented design. While you might accept a message in your one single actor, you may still decide to split the logic into multiple classes. So you might have your 'RegexActor' and then a 'MatchBehavior' implemented with the strategy pattern, which the RegexActor is passed on construction.
All in, I don't think you need to build a bunch of actors - actors give you asynchronous boundaries between things but you will still benefit from using good OO in the behaviours that the actor has. The actor has one message - that of handling lines of the book - but it has a few behaviours that can be composed in several different classes. I would use basic classes for that behaviour and have the actor delegate to those other classes after it receives the message.
You want things to be simple - there is a cost in lost type safety and added code when using actors, so I recommend you make sure there is a good reason for using them: either state that exists in a concurrent environment, or distribution.

What is the best architecture to frequently communicate values between multiple threads?

I am writing an application in C++14 that consists of a master thread and multiple slave threads. The master thread coordinates the slave threads which coordinately perform a search, each exploring a part of the search space. A slave thread sometimes encounters a bound on the search. Then it communicates this bound to the master thread which sends the bound to all other slave threads so that they can possibly narrow their searches.
A slave thread must very frequently check whether there is a new bound available, possibly at the entrance of a loop.
What would be the best way to communicate the bound to the slave threads? I can think of using std::atomic<int>, but I am afraid of the performance implications this has whenever the variable is read inside the loop.
The simplest way here is IMO to not overthink this. Just use a std::mutex for each thread, protecting a std::queue that the boundary information is in. Have the main thread wait on a std::condition_variable that each child can signal: a child locks the mutex, writes to its "new boundary" queue, then signals the cv; the main thread then wakes up and copies the value to each child one at a time. As you said in your question, at the top of their loops, the child threads can check their thread-specific queue to see if there are additional bounding conditions.
You actually don't NEED the "main thread" in this. You could have the children write to all other children's queues directly (still mutex-protected), as long as you're careful to avoid deadlock, it would work that way too.
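A minimal sketch of such a per-thread, mutex-protected mailbox (the names and the int bound type are illustrative assumptions, not taken from the question):

#include <condition_variable>
#include <mutex>
#include <queue>

// One mailbox per slave thread. The master (or other slaves) push new
// bounds; the owning slave polls cheaply at the top of its search loop.
struct BoundMailbox {
    std::mutex m;
    std::condition_variable cv;   // handy if some thread wants to block on updates
    std::queue<int> bounds;

    void push(int bound) {
        {
            std::lock_guard<std::mutex> lock(m);
            bounds.push(bound);
        }
        cv.notify_one();
    }

    // Non-blocking check used inside the search loop; returns false if empty.
    bool poll(int& out) {
        std::lock_guard<std::mutex> lock(m);
        if (bounds.empty()) return false;
        out = bounds.front();
        bounds.pop();
        return true;
    }
};

Each slave owns one mailbox, so contention stays low: the only threads that ever touch a given mutex are its owner and whoever is delivering a new bound.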
All of these classes can be seen in the thread support library, with decent documentation here.
Yes, there are interrupt-based ways of doing things, but in this case polling is relatively cheap because it's not a lot of threads hammering on one mutex but mostly thread-specific mutexes, and mutexes aren't all that expensive to lock, check, and unlock quickly. You're not "holding" on to them for long periods, and thus it's OK. It's a bit of a test really: do you NEED the additional complexity of lock-free? If it's only a dozen (or fewer) threads, then probably not.
Basically you could make a bet with your architecture that a single write to a primitive datatype is atomic. As you only have one writer, your program would not break if you use the volatile keyword to prevent compiler optimizations that might perform updates to it only in local caches.
However everybody serious about doing things right(tm) will tell you otherwise. Have a look at this article to get a pretty good risk assessment: http://preshing.com/20130618/atomic-vs-non-atomic-operations/
So if you want to be on the safe side, which I recommend, you need to follow the C++ standard. As the C++ standard does not guarantee any atomicity even for the simplest operations, you are stuck with using std::atomic. But honestly, I don't think it is too bad. Sure there is some synchronization cost involved, but you can balance out the reading frequency with the benefit of knowing the new boundary early.
To prevent polling the atomic variable, you could use the POSIX signal mechanism to notify slave threads of an update (make sure it works with the platform you are programming for). Whether that benefits performance remains to be seen.
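For what it's worth, the std::atomic variant itself is tiny; here is a hedged sketch (illustrative names, not from the question). On mainstream platforms std::atomic<int> is lock-free, so the per-iteration cost is essentially a plain load:

#include <atomic>
#include <climits>

std::atomic<int> best_bound{INT_MAX};   // written by whichever thread finds a tighter bound

void slave_search_loop() {
    while (true /* until this slave's part of the search space is exhausted */) {
        // Relaxed ordering is usually enough when only the bound value itself is
        // shared; use the default (sequentially consistent) operations if in doubt.
        int bound = best_bound.load(std::memory_order_relaxed);
        (void)bound;   // ... prune the search using 'bound' ...

        // When a tighter bound is found locally:
        // best_bound.store(new_bound, std::memory_order_relaxed);
    }
}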
This is actually very simple. You only have to be aware of how things work to be confident the simple solution is not broken. So, what you need is two things:
1. Be sure the variable is written/read to/from memory every time you access it.
2. Be sure you read it in an atomic way, which means you have to read the full value in one go, or if it is not done naturally, have a cheap test to verify it.
To address #1, you have to declare it volatile. Make sure the volatile keyword is applied to the variable itself, not its pointer or anything like that.
To address #2, it depends on the type. On x86/64, accesses to integer types are atomic as long as they are aligned to their size. That is, int32_t has to be aligned to a 4-byte boundary, and int64_t has to be aligned to an 8-byte boundary.
So you may have something like this:
struct Params {
volatile uint64_t bound __attribute__((aligned(8)));
};
If your bounds variable is more complex (a struct) but still fits in 64 bits, you may union it with uint64_t and use the same attribute and volatile as above.
If it's too big for 64 bits, you will need some sort of a lock to ensure you do not read a half-stale value. The best lock for your circumstances (single writer, multiple readers) is a sequence lock. A sequence lock is simply a volatile int, like above, that serves as the version of the data. Its value starts from 0 and advances by 2 on every update. You increment it by 1 before updating the protected value, and again afterwards. The net result is that even numbers are stable states and odd numbers are transient (value updating). In the readers you do this:
1. Read the version. If not changed - return
2. Read till you get an even number
3. Read the protected variable
4. Read the version again. If you get the same number as before - you're good
5. Otherwise - back to step 2
This is actually one of the topics in my next article. I'll implement that in C++ and let you know. Meanwhile, you can look at the seqlock in the Linux kernel; a minimal sketch also follows below.
Another word of caution - you need compiler barriers between your memory accesses so that the compiler does not reorder things it really should not. This is how you do it in gcc:
asm volatile ("":::"memory");
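Here is a minimal sketch of that sequence lock, following the steps above, written with C++ atomics and fences instead of raw volatile (the Bound payload and function names are illustrative; a strictly standard-conforming version would also make the payload fields relaxed atomics to avoid a formal data race):

#include <atomic>

struct Bound { long lower; long upper; };     // illustrative payload, wider than one word

std::atomic<unsigned> version{0};             // even = stable, odd = update in progress
Bound shared_bound;

// Single writer: go odd, write the data, go even again.
void publish_bound(const Bound& b) {
    unsigned v = version.load(std::memory_order_relaxed);
    version.store(v + 1, std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_release);
    shared_bound = b;
    version.store(v + 2, std::memory_order_release);
}

// Readers retry until they see the same even version before and after the copy.
Bound read_bound() {
    for (;;) {
        unsigned v1 = version.load(std::memory_order_acquire);
        if (v1 & 1u) continue;                // writer in progress, try again
        Bound copy = shared_bound;
        std::atomic_thread_fence(std::memory_order_acquire);
        unsigned v2 = version.load(std::memory_order_relaxed);
        if (v1 == v2) return copy;            // stable snapshot
    }
}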

State machine representation

I want to implement the GUI as a state machine. I think there are some benefits and some drawbacks of doing this, but that is not the topic of this question.
After some reading about this I found several ways of modeling a state machine in C++. I am stuck on two of them, but I don't know which method may fit better for GUI modeling.
Represent the State Machine as a list of states with the following methods:
OnEvent(...);
OnEnterState(...);
OnExitState(...);
From StateMachine::OnEvent(...) I forward the event to CurrentState::OnEvent(...), and there the decision whether to make a transition is made. On a transition I call CurrentState::OnExitState(...), then NewState::OnEnterState(), and set CurrentState = NewState;
With this approach the state will be tightly coupled with its actions, but a State might get complicated when I can go from one state to multiple states and have to take different actions for different transitions.
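Roughly, I imagine the first approach looking something like this (an illustrative sketch only):

struct Event { int id; /* payload */ };

class State {
public:
    virtual ~State() = default;
    virtual void OnEnterState() {}
    virtual void OnExitState() {}
    // Returns the state to transition to, or nullptr to stay in the current state.
    virtual State* OnEvent(const Event& e) = 0;
};

class StateMachine {
public:
    explicit StateMachine(State* initial) : current(initial) { current->OnEnterState(); }

    void OnEvent(const Event& e) {
        if (State* next = current->OnEvent(e)) {   // the state decides the transition
            current->OnExitState();
            current = next;
            current->OnEnterState();
        }
    }

private:
    State* current;
};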
Represent the state machine as a list of transitions with the following properties:
InitialState
FinalState
OnEvent(...)
DoTransition(...)
From StateMachine::OnEvent(...) I forward the event to all transitions whose InitialState has the same value as CurrentState in the state machine. If the transition condition is met, the loop stops, the DoTransition method is called, and CurrentState is set to Transition::FinalState.
With this approach each Transition will be very simple, but the number of transitions might get very high. It will also become harder to track what actions will be done when one state receives an event.
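And the second approach, sketched the way I currently picture it (again, illustrative only):

#include <functional>
#include <vector>

struct Transition {
    int initialState;
    int finalState;
    std::function<bool(int /*event*/)> onEvent;   // does this event fire the transition?
    std::function<void()> doTransition;
};

class TransitionMachine {
public:
    TransitionMachine(int initialState, std::vector<Transition> ts)
        : current(initialState), transitions(std::move(ts)) {}

    void OnEvent(int event) {
        for (const Transition& t : transitions) {
            if (t.initialState == current && t.onEvent(event)) {
                t.doTransition();
                current = t.finalState;           // first matching transition wins
                break;
            }
        }
    }

private:
    int current;
    std::vector<Transition> transitions;
};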
What approach do you think is better for GUI modeling? Do you know of other representations that may fit my problem better?
Here is a third option:
Represent the state machine as a transition matrix
Matrix column index represents a state
Matrix row index represents a symbol (see below)
Each matrix cell holds the state the machine should transition to. This can be either a new state or the same state
Every state has OnEvent method which returns a symbol
From StateMachine::OnEvent(...) events are forwarded to State::OnEvent, which returns a symbol - the result of execution. Based on the current state and the returned symbol, the StateMachine then decides whether
Transition to different state must be made, or
Current state is preserved
Optionally, if a transition is made, OnExitState and OnEnterState are called for the corresponding states
Example matrix for 3 states and 3 symbols
0 1 2
1 2 0
2 0 1
In this example, if the machine is in any of the states (0, 1, 2) and State::OnEvent returns symbol 0 (the first row in the matrix), it stays in the same state
The second row says that if the current state is 0 and the returned symbol is 1, a transition is made to state 1; for state 1 -> state 2, and for state 2 -> state 0.
Similarly, the third row says that for symbol 2: state 0 -> state 2, state 1 -> state 0, state 2 -> state 1
The point of this being:
Number of symbols will likely be much lower than that of states.
States are not aware of each other
All transitions are controlled from one point, so the moment you want to handle the symbol DB_ERROR differently from NETWORK_ERROR you just change the transition table and don't touch the states' implementation.
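A compact sketch of this matrix approach, using the same 3x3 table as the example above (the Symbol names are illustrative, not from the question):

#include <array>

enum class Symbol { Ok = 0, DbError = 1, NetworkError = 2 };

constexpr int kNumStates = 3;
constexpr int kNumSymbols = 3;

// kTransitions[symbol][currentState] -> next state (the example matrix above)
constexpr std::array<std::array<int, kNumStates>, kNumSymbols> kTransitions{{
    {{0, 1, 2}},   // symbol 0: stay in the same state
    {{1, 2, 0}},   // symbol 1
    {{2, 0, 1}},   // symbol 2
}};

class MatrixStateMachine {
public:
    // 'symbol' is what the current state's OnEvent returned for the incoming event.
    void Apply(Symbol symbol) {
        int next = kTransitions[static_cast<int>(symbol)][current];
        if (next != current) {
            // optionally: OnExitState for 'current' and OnEnterState for 'next'
            current = next;
        }
    }

private:
    int current = 0;
};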
I don't know if this is the kind of answer you are expecting, but I tend to deal with such state machines in a straightforward way.
Use a state variable of an enumerated type (the possible states). In every event handler of the GUI, test the state value, for instance using a switch statement. Do whatever processing there needs to be accordingly and set the next value of the state.
Lightweight and flexible. Keeping the code regular makes it readable and "formal".
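A minimal sketch of that style, assuming a couple of illustrative states and one GUI event handler (the names are made up for the example):

enum class UiState { Idle, Editing, Saving };

class Dialog {
public:
    // One of the GUI event handlers; every handler switches on the current state.
    void OnSaveClicked() {
        switch (state) {
            case UiState::Idle:
                break;                       // nothing to save
            case UiState::Editing:
                startSave();                 // helper method keeps the case short
                state = UiState::Saving;
                break;
            case UiState::Saving:
                break;                       // already saving, ignore the click
        }
    }

private:
    void startSave() { /* ... */ }
    UiState state = UiState::Idle;
};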
I'd personally prefer the first method you described. I find the second one quite counter-intuitive and overly complicated. Having one class for each state is simple and easy; if you then set the correct event handlers in OnEnterState and remove them in OnExitState, your code will be clean and everything will be self-contained in the corresponding state, making for an easy read.
You will also avoid huge switch statements to select the right event handler or procedure to call, as everything a state does is perfectly visible inside the state itself, which keeps the state machine code short and simple.
Last but not least, this way of coding is an exact translation from the state machine diagram to whatever language you'll use.
I prefer a really simple approach for this kind of code.
An enumeration of states.
Each event handler checks the current state before deciding what action to take. Actions are just composite blocks inside a switch statement or if chain, and set the next state.
When actions become more than a couple lines long or need to be reused, refactor as calls to separate helper methods.
This way there's no extra state machine management metadata structures and no code to manage that metadata. Just your business data and transition logic. And actions can directly inspect and modify all member variables, including the current state.
The downside is that you can't add data members local to one single state, which is not a real problem unless you have a really large number of states.
I find it also leads to more robust design if you always configure all UI attributes on entry to each state, instead of making assumptions about the previous setting and creating state-exit behaviors to restore invariants before state transitions. This applies regardless of what scheme you use for implementing transitions.
You can also consider modelling the desired behaviour using a Petri net. This would be preferable if you want to implement a more complex behaviour, since it allows you to determine exactly all possible scenarios and prevent deadlocks.
This library might be useful to implement a state machine to control your GUI: PTN Engine

Threading access to various buffers

I'm trying to figure out the best way to do this, but I'm getting a bit stuck in figuring out exactly what it is that I'm trying to do, so I'm going to explain what it is, what I'm thinking I want to do, and where I'm getting stuck.
I am working on a program that has a single array (an image, really) onto which a large number of objects can be placed each frame. Each object is completely independent of all the others; the only interaction is in the output, since in theory two of these objects can be placed at the same location on the array. I'm trying to increase the efficiency of placing the objects on the image, so that I can place more objects. In order to do that, I want to thread the problem.
The first step that I have taken towards threading it involves simply mutex-protecting the array. All operations which place an object on the array call the same function, so I only have to put the mutex lock in one place. So far it is working, but I am not seeing the improvement I had hoped for. I am hypothesizing that this is because, most of the time, the limiting factor is the image write itself.
What I'm thinking I need to do next is to have multiple image buffers that I'm writing to, and to combine them when all of the operations are done. I should say that obscuration is not a problem; all that needs to be done is to simply add the pixel counts together. However, I'm struggling to figure out what mechanism I need to use in order to do this. I have looked at semaphores, but while I can see that they would limit the number of buffers, I can envision a situation in which two or more threads would be trying to write to the same buffer at the same time, potentially leading to inaccuracies.
I need a solution that does not involve any new non-standard libraries. I am more than willing to build the solution, but I would very much appreciate a few pointers in the right direction, as I'm currently just wandering around in the dark...
To help visualize this, imagine that I am told to place, say, balls at various locations on the image array. I am told to place the balls each frame, with a given brightness, location, and size. The exact location of the balls is dependent on the physics from the previous frame. All of the balls must be placed on a final image array, as quickly as they possibly can be. For the purpose of this example, if two balls are on top of each other, the brightness can simply be added together, thus there is no need to figure out if one is blocking the other. Also, no using GPU cards;-)
Pseudo-code would look like this (assuming that some logical object is given for location, brightness, and size). Also assume that isValidPoint simply determines whether the point should be on the circle, given the location and radius of said circle.
global output_array[x_arrLimit * y_arrLimit]

void update_ball(int ball_num)
{
    calc_ball_location(ball_num, *location, *brightness, *size); // location, brightness, size all set inside the function
    place_ball(location, brightness, size)
}

void place_ball(location, brightness, size)
{
    get_bounds(location, size, *xlims, *ylims)
    for (int x = xlims.min; x < xlims.max; x++)
    {
        for (int y = ylims.min; y < ylims.max; y++)
        {
            if (isValidPoint(location, size, x, y))
            {
                output_array(x, y) += brightness;  // overlapping balls simply add brightness
            }
        }
    }
}
The reason you're not seeing any speed up with the current design is that, with a single mutex for the entire buffer, you might as well not bother with threading, as all the objects have to be added serially anyway (unless there's significant processing being done to determine what to add, but it doesn't sound like that's the case). Depending on what it takes to "add an object to the buffer" (do you use scan-line algorithms, flood fill, or something else), you might consider having one mutex per row or a range of rows, or divide the image into rectangular tiles with one mutex per region or something. That would allow multiple threads to add to the image at the same time as long as they're not trying to update the same regions.
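For illustration, one mutex per band of rows might look like this (a sketch; the class name and the fixed band height are my own assumptions, not from the question):

#include <mutex>
#include <vector>

class TiledImage {
public:
    TiledImage(int width, int height, int bandHeight)
        : width(width), bandHeight(bandHeight),
          pixels(static_cast<size_t>(width) * height, 0.0f),
          bandLocks((height + bandHeight - 1) / bandHeight) {}

    // Threads only contend when they touch the same band of rows.
    void addBrightness(int x, int y, float brightness) {
        std::lock_guard<std::mutex> lock(bandLocks[y / bandHeight]);
        pixels[static_cast<size_t>(y) * width + x] += brightness;
    }

private:
    int width, bandHeight;
    std::vector<float> pixels;
    std::vector<std::mutex> bandLocks;   // one mutex per band of rows
};

In practice you would lock once per ball for the rows it covers rather than once per pixel, to keep the locking overhead down.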
OK, so you have an image member in some object. Add the, no doubt complex, code to add other images/objects to it, manipulate it, whatever. Aggregate in all the other objects that may be involved, add a command enum to tell threads what operation to do, and an 'OnCompletion' event to call when done.
Queue it to a pool of threads hanging on the end of a producer-consumer queue. Some thread will get the *object, perform the operation on the image/set and then call the event (passing the completed *object as a parameter). In the event handler you can do what you like, according to the needs of your app. Maybe you will add the processed images into a (thread-safe!) vector or other container, or queue them off to some other thread - whatever.
If the order of processing the images must be preserved, (eg. video stream), you could add an incrementing sequence-number to each object that is submitted to the pool, so enabling your 'OnComplete' handler to queue up 'later' images until all earlier ones have come in.
Since no two threads ever work on the same image, you need no locking while processing. The only locks you should (or may) need are those internal to the queues, and they only lock for the time taken to push/pop object pointers to/from the queue - contention will be very rare.
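A bare-bones version of such a producer-consumer queue might look like this (a sketch with illustrative names; a real one would also need a way to shut the pool down cleanly):

#include <condition_variable>
#include <mutex>
#include <queue>

template <typename Task>
class WorkQueue {
public:
    void push(Task* task) {
        {
            std::lock_guard<std::mutex> lock(m);
            tasks.push(task);
        }
        cv.notify_one();
    }

    // Pool threads block here until work arrives.
    Task* pop() {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [this] { return !tasks.empty(); });
        Task* t = tasks.front();
        tasks.pop();
        return t;
    }

private:
    std::mutex m;
    std::condition_variable cv;
    std::queue<Task*> tasks;
};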

How does LMAX's disruptor pattern work?

I am trying to understand the disruptor pattern. I have watched the InfoQ video and tried to read their paper. I understand there is a ring buffer involved, and that it is initialized as an extremely large array to take advantage of cache locality and eliminate allocation of new memory.
It sounds like there are one or more atomic integers which keep track of positions. Each 'event' seems to get a unique id, and its position in the ring is found by taking its modulus with respect to the size of the ring, etc., etc.
Unfortunately, I don't have an intuitive sense of how it works. I have done many trading applications and studied the actor model, looked at SEDA, etc.
In their presentation they mentioned that this pattern is basically how routers work; however I haven't found any good descriptions of how routers work either.
Are there some good pointers to a better explanation?
The Google Code project does reference a technical paper on the implementation of the ring buffer; however, it is a bit dry, academic and tough going for someone wanting to learn how it works. However, there are some blog posts that have started to explain the internals in a more readable way: an explanation of the ring buffer that is the core of the disruptor pattern, a description of the consumer barriers (the part related to reading from the disruptor), and some information on handling multiple producers.
The simplest description of the Disruptor is: It is a way of sending messages between threads in the most efficient manner possible. It can be used as an alternative to a queue, but it also shares a number of features with SEDA and Actors.
Compared to Queues:
The Disruptor provides the ability to pass a message on to other threads, waking them up if required (similar to a BlockingQueue). However, there are 3 distinct differences.
The user of the Disruptor defines how messages are stored by extending the Entry class and providing a factory to do the preallocation. This allows for either memory reuse (copying), or the Entry could contain a reference to another object.
Putting messages into the Disruptor is a 2-phase process: first a slot is claimed in the ring buffer, which provides the user with the Entry that can be filled with the appropriate data; then the entry must be committed. This 2-phase approach is necessary to allow for the flexible use of memory mentioned above. It is the commit that makes the message visible to the consumer threads.
It is the responsibility of the consumer to keep track of the messages that have been consumed from the ring buffer. Moving this responsibility away from the ring buffer itself helped reduce the amount of write contention as each thread maintains its own counter.
Compared to Actors
The Actor model is closer to the Disruptor than most other programming models, especially if you use the BatchConsumer/BatchHandler classes that are provided. These classes hide all of the complexities of maintaining the consumed sequence numbers and provide a set of simple callbacks when important events occur. However, there are a couple of subtle differences.
The Disruptor uses a 1 thread - 1 consumer model, whereas Actors use an N:M model, i.e. you can have as many actors as you like and they will be distributed across a fixed number of threads (generally 1 per core).
The BatchHandler interface provides an additional (and very important) callback onEndOfBatch(). This allows slow consumers, e.g. those doing I/O, to batch events together to improve throughput. It is possible to do batching in other Actor frameworks, however as nearly all other frameworks don't provide a callback at the end of the batch, you need to use a timeout to determine the end of the batch, resulting in poor latency.
Compared to SEDA
LMAX built the Disruptor pattern to replace a SEDA based approach.
The main improvement that it provided over SEDA was the ability to do work in parallel. To do this the Disruptor supports multi-casting the same messages (in the same order) to multiple consumers. This avoids the need for fork stages in the pipeline.
We also allow consumers to wait on the results of other consumers without having to put another queuing stage between them. A consumer can simply watch the sequence number of a consumer that it is dependent on. This avoids the need for join stages in the pipeline.
Compared to Memory Barriers
Another way to think about it is as a structured, ordered memory barrier, where the producer barrier forms the write barrier and the consumer barrier forms the read barrier.
First we'd like to understand the programming model it offers.
There are one or more writers. There are one or more readers. There is a line of entries, totally ordered from old to new (pictured as left to right). Writers can add new entries on the right end. Every reader reads entries sequentially from left to right. Readers can't read past writers, obviously.
There is no concept of entry deletion. I use "reader" instead of "consumer" to avoid the image of entries being consumed. However we understand that entries on the left of the last reader become useless.
Generally readers can read concurrently and independently. However, we can declare dependencies among readers. Reader dependencies can form an arbitrary acyclic graph. If reader B depends on reader A, reader B can't read past reader A.
Reader dependencies arise because reader A can annotate an entry, and reader B depends on that annotation. For example, A does some calculation on an entry and stores the result in field a in the entry. A then moves on, and now B can read the entry and the value of a that A stored. If reader C does not depend on A, C should not attempt to read a.
This is indeed an interesting programming model. Regardless of the performance, the model alone can benefit lots of applications.
Of course, LMAX's main goal is performance. It uses a pre-allocated ring of entries. The ring is large enough, but it's bounded so that the system will not be loaded beyond design capacity. If the ring is full, writer(s) will wait until the slowest readers advance and make room.
Entry objects are pre-allocated and live forever, to reduce garbage collection cost. We don't insert new entry objects or delete old entry objects; instead, a writer asks for a pre-existing entry, populates its fields, and notifies readers. This apparent 2-phase action is really simply an atomic action:
setNewEntry(EntryPopulator);
interface EntryPopulator{ void populate(Entry existingEntry); }
Pre-allocating entries also means adjacent entries (very likely) sit in adjacent memory cells, and because readers read entries sequentially, this is important for utilizing CPU caches.
And there are lots of efforts to avoid locks, CAS, and even memory barriers (e.g. using a non-volatile sequence variable if there's only one writer).
For developers of readers: different annotating readers should write to different fields, to avoid write contention. (Actually they should write to different cache lines.) An annotating reader should not touch anything that other non-dependent readers may read. This is why I say these readers annotate entries, instead of modifying entries.
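To make the claim/commit idea concrete, here is a minimal single-producer, single-consumer ring in C++ with pre-allocated entries and monotonically increasing sequence numbers. It is a sketch of the idea only, not the Disruptor's actual API, and it ignores multiple producers, reader dependencies and wait strategies:

#include <atomic>
#include <cstddef>
#include <vector>

struct Entry { long value; };                    // pre-allocated and reused forever

class SpscRing {
public:
    explicit SpscRing(std::size_t sizePow2)      // size must be a power of two
        : mask(static_cast<long>(sizePow2) - 1), entries(sizePow2) {}

    // Producer: claim the next slot (waiting if the ring is full), fill it, then commit.
    Entry& claim() {
        long next = produced.load(std::memory_order_relaxed) + 1;
        while (next - consumed.load(std::memory_order_acquire) > static_cast<long>(entries.size()))
            ;                                    // busy-wait until the slowest reader makes room
        return entries[next & mask];
    }
    void commit() {                              // makes the entry visible to the consumer
        produced.fetch_add(1, std::memory_order_release);
    }

    // Consumer: wait for the next sequence, read the entry, then mark it consumed.
    Entry& next() {
        long want = consumed.load(std::memory_order_relaxed) + 1;
        while (produced.load(std::memory_order_acquire) < want)
            ;                                    // busy-wait until the producer commits
        return entries[want & mask];
    }
    void done() { consumed.fetch_add(1, std::memory_order_release); }

private:
    const long mask;
    std::vector<Entry> entries;
    std::atomic<long> produced{0};               // highest committed sequence
    std::atomic<long> consumed{0};               // highest consumed sequence
};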
Martin Fowler has written an article about LMAX and the disruptor pattern, The LMAX Architecture, which may clarify it further.
I actually took the time to study the actual source, out of sheer curiosity, and the idea behind it is quite simple. The most recent version at the time of writing this post is 3.2.1.
There is a buffer storing pre-allocated events that will hold the data for consumers to read.
The buffer is backed by an array of flags (an integer array) of its length that describes the availability of the buffer slots (see further for details). The array is accessed like a java.util.concurrent.atomic.AtomicIntegerArray, so for the purpose of this explanation you may as well assume it to be one.
There can be any number of producers. When a producer wants to write to the buffer, a long number is generated (as in calling AtomicLong#getAndIncrement; the Disruptor actually uses its own implementation, but it works in the same manner). Let's call this generated long a producerCallId. In a similar manner, a consumerCallId is generated when a consumer ENDS reading a slot from the buffer. The most recent consumerCallId is accessed.
(If there are many consumers, the call with the lowest id is chosen.)
These ids are then compared, and if the difference between the two is less than the buffer size, the producer is allowed to write.
(If the producerCallId is greater than the recent consumerCallId + bufferSize, it means that the buffer is full, and the producer is forced to busy-wait until a spot becomes available.)
The producer is then assigned the slot in the buffer based on its callId (which is producerCallId modulo bufferSize, but since the bufferSize is always a power of 2 (a limit enforced on buffer creation), the actual operation used is producerCallId & (bufferSize - 1)). It is then free to modify the event in that slot.
(The actual algorithm is a bit more complicated, involving caching recent consumerId in a separate atomic reference, for optimisation purposes.)
When the event has been modified, the change is "published": the respective slot in the flag array is filled with the updated flag. The flag value is the number of the lap around the ring (producerCallId divided by bufferSize; again, since bufferSize is a power of 2, the actual operation is a right shift).
In a similar manner there can be any number of consumers. Each time a consumer wants to access the buffer, a consumerCallId is generated (depending on how the consumers were added to the disruptor, the atomic used in id generation may be shared or separate for each of them). This consumerCallId is then compared to the most recent producerCallId, and if it is the lesser of the two, the reader is allowed to progress.
(Similarly, if the producerCallId is equal to the consumerCallId, it means that the buffer is empty and the consumer is forced to wait. The manner of waiting is defined by a WaitStrategy during disruptor creation.)
For individual consumers (the ones with their own id generator), the next thing checked is the ability to batch consume. The slots in the buffer are examined in order, from the one corresponding to the consumerCallId (the index is determined in the same manner as for producers) to the one corresponding to the recent producerCallId.
They are examined in a loop by comparing the flag value written in the flag array against a flag value generated for the consumerCallId. If the flags match, it means that the producers filling the slots have committed their changes. If not, the loop is broken and the highest committed changeId is returned. The slots from the consumerCallId up to the received changeId can be consumed in a batch.
If a group of consumers read together (the ones with shared id generator), each one only takes a single callId, and only the slot for that single callId is checked and returned.
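The slot and flag arithmetic described above boils down to a couple of bit operations; roughly (illustrative names, not the Disruptor's actual fields):

// bufferSize must be a power of two, e.g. 1024
long slotIndex(long callId, long bufferSize) {
    return callId & (bufferSize - 1);       // same as callId % bufferSize
}

long flagValue(long callId, int log2BufferSize) {
    return callId >> log2BufferSize;        // same as callId / bufferSize: which lap of the ring
}

A consumer can batch ahead slot by slot as long as each slot's stored flag equals the flag computed for the consumer's own call id.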
From this article:
The disruptor pattern is a batching queue backed up by a circular array (i.e. the ring buffer) filled with pre-allocated transfer objects, which uses memory-barriers to synchronize producers and consumers through sequences.
Memory-barriers are kind of hard to explain and Trisha's blog has done the best attempt in my opinion with this post: http://mechanitis.blogspot.com/2011/08/dissecting-disruptor-why-its-so-fast.html
But if you don't want to dive into the low-level details you can just know that memory-barriers in Java are implemented through the volatile keyword or through the java.util.concurrent.AtomicLong. The disruptor pattern sequences are AtomicLongs and are communicated back and forth among producers and consumers through memory-barriers instead of locks.
I find it easier to understand a concept through code, so the code below is a simple helloworld from CoralQueue, which is a disruptor pattern implementation done by CoralBlocks with which I am affiliated. In the code below you can see how the disruptor pattern implements batching and how the ring-buffer (i.e. circular array) allows for garbage-free communication between two threads:
package com.coralblocks.coralqueue.sample.queue;

import com.coralblocks.coralqueue.AtomicQueue;
import com.coralblocks.coralqueue.Queue;
import com.coralblocks.coralqueue.util.MutableLong;

public class Sample {

    public static void main(String[] args) throws InterruptedException {

        final Queue<MutableLong> queue = new AtomicQueue<MutableLong>(1024, MutableLong.class);

        Thread consumer = new Thread() {

            @Override
            public void run() {
                boolean running = true;
                while (running) {
                    long avail;
                    while ((avail = queue.availableToPoll()) == 0); // busy spin
                    for (int i = 0; i < avail; i++) {
                        MutableLong ml = queue.poll();
                        if (ml.get() == -1) {
                            running = false;
                        } else {
                            System.out.println(ml.get());
                        }
                    }
                    queue.donePolling();
                }
            }
        };

        consumer.start();

        MutableLong ml;

        for (int i = 0; i < 10; i++) {
            while ((ml = queue.nextToDispatch()) == null); // busy spin
            ml.set(System.nanoTime());
            queue.flush();
        }

        // send a message to stop consumer...
        while ((ml = queue.nextToDispatch()) == null); // busy spin
        ml.set(-1);
        queue.flush();

        consumer.join(); // wait for the consumer thread to die...
    }
}