How to understand linearizability in a distributed system?

All the literature I know of regarding linearizability explains it as something related to real time, along these lines:
The execution history can be reordered to be equivalent to some legal sequential history s.
The history s must be consistent with the real time order.
The concept of linearizability was originally defined in this paper, which considers the correctness of concurrent objects accessed in a multi-processor environment, so "real time" there can be understood as the wall clock of the computer.
But in a distributed system the clocks are not synchronized, so how should the real-time order be understood?

Related

Logical Time, Lamport Timestamps and Vector Clocks in distributed systems

I am currently studying distributed systems for an exam. I think I have understood all the principles so far, but I go crazy when it comes to logical time, Lamport timestamps, and vector clocks. I just can't get an overview of the topics and the connections between them. Here is the index card I would like to write:
Logical Time
enables coordination of events without a physical clock
achieved by: ordering events into "happened before" relations
each process knows the order of its local events, as a timestamp is assigned to every event
these timestamps can be Lamport timestamps:
or vector clocks:
I just don't know what to put under Lamport timestamps and vector clocks, and generally what enables what, what implements what, and why one can make concurrency visible and the other cannot. Can someone help me get an overview?
Looking up definitions on Wikipedia and reading scientific papers has not helped that much...
In distributed systems it is very hard to keep wall clocks in sync, and different applications have different tolerances for clock drift. For example, some distributed systems use the clock to resolve conflicts (last update wins) and accept the risk that clocks might not be in sync, but in other cases there is no such tolerance.
As you correctly said, logical clocks were invented to address timing in distributed systems. There are two main problems to solve: a) events need to be totally ordered, and b) concurrent events need to be detected. Lamport timestamps address the total-order problem; vector clocks help to detect concurrent events. A minimal sketch of both follows the index cards below.
So rather than one index card, I would actually create several of them:
Global/Wall clock
allows events to be ordered globally, but the risk is that clocks will get out of sync
worth exploring how Cassandra handles its Last Update Wins strategy
Lamport timestamp
this is a logical clock
it allows a total order of events
(all other info on Lamport timestamps)
Vector clock
this is a logical clock
it allows detecting concurrent events
(...note what "concurrent" means)
all other info on vector clock
not to be confused with version vector
Version vector
...
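Here is a minimal, illustrative C++ sketch of the two mechanisms (the names LamportClock, VectorClock, happenedBefore, and concurrent are my own, not from any library): a Lamport clock gives every event a single number that respects the happened-before order (useful for a total order, with ties broken by process id), while a vector clock keeps one counter per process and can therefore tell whether two events are causally ordered or concurrent.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Lamport timestamp: one counter per process. Respects happened-before,
// but two unrelated (concurrent) events still get comparable numbers,
// so it cannot *detect* concurrency.
struct LamportClock {
    std::uint64_t time = 0;
    std::uint64_t localEvent() { return ++time; }            // internal event or send
    std::uint64_t onReceive(std::uint64_t msgTime) {         // message carries sender's time
        time = std::max(time, msgTime) + 1;
        return time;
    }
};

// Vector clock: one counter per process, all carried on every message.
struct VectorClock {
    std::vector<std::uint64_t> v;
    std::size_t self;                                        // index of the owning process
    VectorClock(std::size_t n, std::size_t selfIndex) : v(n, 0), self(selfIndex) {}
    void localEvent() { ++v[self]; }
    void onReceive(const std::vector<std::uint64_t>& msg) {  // element-wise max, then tick
        for (std::size_t i = 0; i < v.size(); ++i) v[i] = std::max(v[i], msg[i]);
        ++v[self];
    }
};

// a happened before b iff a <= b in every component and a != b
bool happenedBefore(const std::vector<std::uint64_t>& a, const std::vector<std::uint64_t>& b) {
    bool strictlyLess = false;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (a[i] > b[i]) return false;
        if (a[i] < b[i]) strictlyLess = true;
    }
    return strictlyLess;
}

// two events are concurrent iff neither happened before the other
bool concurrent(const std::vector<std::uint64_t>& a, const std::vector<std::uint64_t>& b) {
    return !happenedBefore(a, b) && !happenedBefore(b, a);
}
```

The last two functions are exactly what a single Lamport integer cannot give you: any two Lamport timestamps are comparable, so concurrency stays invisible.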

Is callgrind profiling influenced by other processes?

I'd like to profile my application using Callgrind. Since it takes a very long time, in the meantime I carry on with web browsing, compiling, and other intensive tasks on the same machine.
Am I biasing the profiling results? I expect that, since Valgrind uses a simulated CPU, external processes should not interfere with the Valgrind run. Am I right?
By default, Callgrind does not record anything related to time, so you can expect all collected metrics to (mostly) be independent of other processes on the machine. As the Callgrind manual states,
By default, the collected data consists of the number of instructions executed, their relationship to source lines, the caller/callee relationship between functions, and the numbers of such calls.
As such, the metrics Callgrind reports should only depend on which instructions the program executes on the (simulated) CPU, not on how much time those instructions take. Indeed, the output of Callgrind can at times be somewhat misleading, as the simulated CPU might behave differently from the real one (particularly when it comes to branch prediction).
The Callgrind paper presented at ICCS 2004 is very clear about this as well:
We note that the simulation is not able to predict consumed wall clock time, as this would need a detailed simulation of the microarchitecture.
In any case, however, the simulated CPU is unaffected by what the real CPU is doing.
The reason is straightforward.
Like you said, your program is not executed on your machine at all.
Instead, at runtime, Valgrind dynamically translates your program: it disassembles the binary into "UCode" for a simulated machine, adds analysis code (called instrumentation), then generates binary code that executes the simulation.
The addition of analysis code is what makes instruction counting (in Callgrind), memory checking (in Memcheck), and all other plugins possible.
Therein lies the twist, however.
Naturally there are limits to how isolated the program can run in such a dynamic simulation.
First, your program might interact with other programs.
While the time spent for doing so is irrelevant (as it is not accounted for), the return codes of inter-process communication can certainly change, depending on what else is going on in the system.
Second, most system calls need to be run untranslated, and their return codes can change as well -- leading to different execution paths in your program and, thus, slightly different metrics being collected. (As an aside, Callgrind offers an option to record the wall-clock time spent in syscalls, which will always be affected by whatever else is going on in the system.)
More details about these restrictions can be found in the PhD Dissertation of Nicholas Nethercote ("Dynamic Binary Analysis and Instrumentation").
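As a practical aside: if what you worry about is unrelated phases of your own program (rather than other processes) diluting the counts, Callgrind's client requests let you restrict collection to a region of interest. A minimal sketch, assuming the Valgrind development headers are installed (outside Valgrind the macros expand to no-ops); typically run with valgrind --tool=callgrind --collect-atstart=no ./prog:

```cpp
#include <valgrind/callgrind.h>   // client-request macros; no-ops outside Valgrind
#include <numeric>
#include <vector>

// Hypothetical workload standing in for the code you actually care about.
static long hot_path() {
    std::vector<long> v(1'000'000);
    std::iota(v.begin(), v.end(), 0);
    return std::accumulate(v.begin(), v.end(), 0L);
}

int main() {
    // With --collect-atstart=no, nothing is counted until the first toggle.
    CALLGRIND_TOGGLE_COLLECT;     // start attributing instruction counts here
    long sum = hot_path();
    CALLGRIND_TOGGLE_COLLECT;     // stop attributing

    CALLGRIND_DUMP_STATS;         // flush the counters collected so far
    return sum == 0;              // use the result so it is not optimized away
}
```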

Does "real-time" constraints prevent the use of a task scheduler?

By task scheduler I mean any implementation of a worker thread pool that distributes work to its threads according to whatever algorithm it was designed with (like Intel TBB).
I know that "real-time" constraints imply that work gets done in predictable time (I'm not talking about speed). So my guess is that using a task scheduler, which, as far as I know, can't guarantee that some task will be executed before a given time, makes the application impossible to use in these constraints.
Or am I missing something? Is there a way to have both? Maybe by forcing assumptions on the quantity of data that can be processed? Or maybe there are predictable task schedulers?
I'm talking about "hard" real time constraints, not soft real time (like video games).
To clarify:
It is known that there are features of C++ that cannot be used in this kind of context: new, delete, throw, dynamic_cast. They are not predictable (you don't know how much time can be spent on one of these operations; it depends on too many parameters that are not even known before execution).
You can't really use them in real time contexts.
What I am asking is: do task schedulers have the same unpredictability that would make them unusable in real-time applications?
Yes, it can be done, but no it's not trivial, and yes there are limits.
You can write the scheduler to guarantee (for example) that an interrupt handler, exception handler, etc., is invoked within a fixed period of time from when it occurs. You can guarantee that any given thread will (for example) get at least X milliseconds of CPU time out of any given second (or suitable fraction of a second).
To enforce the latter, you generally need admittance criteria -- an ability for the scheduler to say "sorry, but I can't schedule this as a real-time thread, because the CPU is already under too much load."
In other cases, all it does is guarantee that at least (say) 99% of CPU time will be given to the real-time tasks (if any exist), and it's up to whoever designs the system on top of that to schedule few enough real-time tasks that they all finish quickly enough.
I feel obliged to add that the "hardness" of real-time requirements is almost entirely orthogonal to the response speed needed. Rather, it's almost entirely about the seriousness of the consequences of being late.
Just for example, consider a nuclear power plant. For a lot of what happens, you're dealing with time periods on the order of minutes, or in some cases even hours. Filling a particular chamber with, say, half a million gallons of water just isn't going to happen in microseconds or milliseconds.
At the same time, the consequences of a late answer can be huge -- quite possibly causing not just a few deaths, as hospital equipment might, but potentially hundreds or even thousands of deaths, hundreds of millions in damage, etc. As such, it's about as "hard" as real-time requirements get, even though the deadlines are unusually "loose" by most typical standards.
In the other direction, digital audio playback has much tighter limits. Delays or dropouts can be quite audible down to a fraction of a millisecond in some cases. At the same time, unless you're providing sound processing for a large concert (or something on that order) the consequences of a dropout will generally be a moment's minor annoyance on the part of a user.
Of course, it's also possible to combine the two -- for an obvious example, in high-frequency trading, deadlines may well be on the order of microseconds (or so), and the loss from missing a deadline could easily be millions or tens of millions of (dollars|euros|pounds|etc.).
The term real-time is quite flexible. "Hard real-time" tends to mean things where a few tens of microseconds make the difference between "works right" and "doesn't work right". Not all "real-time" systems require that sort of real-time-ness.
I once worked on a radio-base-station for mobile phones. One of the devices on the board had an interrupt that fired every 2-something milliseconds. For correct operation (not losing calls), we had to deal with the interrupt, that is, do the work inside the interrupt and write the hardware registers with the new values, within 100 microseconds - if we missed, there would be dropped calls. If the interrupt wasn't taken after 160 microseconds, the system would reboot. That is "hard real-time", especially as the processor was just running at a few tens of MHz.
If you produce a video player, it requires real-time in the few-milliseconds range.
A "display stock prices" probably can be within the 100ms range.
For a web server it is probably acceptable to respond within 1-2 seconds without any big problems.
Also, there is a difference between systems where "a worst case worse than X means failure" - like the case above with 100 microseconds and dropped calls: that's bad if it happens more than once every few weeks, and even a few times a year is really something that should be fixed. This is called "hard real-time".
In other systems, missing your deadline means "oh well, we have to do that over again" or "a frame of video flickered a bit"; as long as it doesn't happen very often, it's probably OK. This is called "soft real-time".
A lot of modern hardware makes "hard real-time" (the tens to hundreds of microseconds range) difficult, because the graphics processor will simply stop the processor from accessing memory, or, if the processor gets hot, the stopclk pin is pulled for 100 microseconds...
Most modern OSes, such as Linux and Windows, aren't really meant to be "hard real-time". There are sections of code that disable interrupts for longer than 100 microseconds in parts of these OSes.
You can almost certainly get some good "soft real-time" (that is, missing the deadline isn't a failure, just a minor annoyance) out of a mainstream modern OS with modern hardware. It'll probably require either modifications to the OS or a dedicated real-time OS (and perhaps suitable special hardware) to make the system do hard real-time.
But only a few things in the world requires that sort of hard real-time. Often the hard real-time requirements are dealt with by hardware - for example, the next generation of radio-base-stations that I described above, had more clever hardware, so you just needed to give it the new values within the next 2-something milliseconds, and you didn't have the "mad rush to get it done in a few tens of microseconds". In a modern mobile phone, the GSM or UMTS protocol is largely dealt with by a dedicated DSP (digital signal processor).
A "hard real-time" requirement is where the system is really failing if a particular deadline (or set of deadlines) can't be met, even if the failure to meet deadlines happens only once. However, different systems have different systems have different sensitivity to the actual time that the deadline is at (as Jerry Coffin mentions). It is almost certainly possible to find cases where a commercially available general purpose OS is perfectly adequate in dealing with the real-time requirements of a hard real-time system. It is also absolutely sure that there are other cases where such hard real-time requirements are NOT possible to meet without a specialized system.
I would say that if you want sub-millisecond guarantees from the OS, then desktop Windows or Linux is not for you. This really comes down to the overall philosophy of the OS and scheduler design; building a hard real-time OS requires a lot of thought about locking and the potential for one thread to block another from running, etc.
I don't think there is ONE answer that applies to your question. Yes, you can certainly use thread pools in a system that has hard real-time requirements. You probably can't do it on a sub-millisecond basis unless there is specific support for this in the OS. And you may need dedicated threads and processes to deal with the highest-priority real-time behaviour, which are not part of the thread pool itself (a sketch of setting up such a dedicated thread follows at the end of this answer).
Sorry if this isn't saying "yes" or "no" to your question, but I think you will need to do some research into the actual behaviour of the OS and see what sort of guarantees it can give (worst case). You will also have to decide what the worst-case scenario is and what happens if you miss a deadline - are (lots of) people dying (a plane falling out of the sky), is some banker going to lose millions, are the green lights going to come on at the same time in two directions at a road crossing, or is it some bad sound coming out of a speaker?
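To make the "dedicated threads" point above concrete, here is a POSIX-specific sketch (not a recipe: raising a thread to a real-time policy normally needs elevated privileges, and the thread's own work must still have a bounded worst case) of creating one worker under the SCHED_FIFO fixed-priority policy so ordinary time-shared threads cannot preempt it:

```cpp
#include <pthread.h>
#include <sched.h>
#include <cstdio>

// Placeholder for the deadline-sensitive work; it must itself be bounded.
static void* control_loop(void*) {
    // ... poll device, compute response, write registers, sleep until next period ...
    return nullptr;
}

int main() {
    pthread_attr_t attr;
    pthread_attr_init(&attr);
    pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED); // don't inherit the caller's policy
    pthread_attr_setschedpolicy(&attr, SCHED_FIFO);              // fixed priority, runs until it blocks
    sched_param sp{};
    sp.sched_priority = 80;   // must lie within sched_get_priority_min/max(SCHED_FIFO)
    pthread_attr_setschedparam(&attr, &sp);

    pthread_t t;
    int rc = pthread_create(&t, &attr, control_loop, nullptr);
    if (rc != 0)
        std::fprintf(stderr, "pthread_create failed (rc=%d); real-time policies usually need privileges\n", rc);
    else
        pthread_join(t, nullptr);
    pthread_attr_destroy(&attr);
    return rc;
}
```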
"Real time" doesn't just mean "fast", it means that the system can respond to meet deadlines in the real world. Those deadlines depend on what you're dealing with in the real world.
Whether or not a task finishes in a particular timeframe is a characteristic of the task, not the scheduler. The scheduler might decide which task gets resources, and if a task hasn't finished by a deadline it might be stopped or have its resource usage constrained so that other tasks can meet their deadlines.
So, the answer to your question is that you need to consider the workload, deadlines and the scheduler together, and construct your system to meet your requirements. There is no magic scheduler that will take arbitrary tasks and make them complete in a predictable time.
Update:
A task scheduler can be used in real-time systems if it provides the guarantees you need. As others have said, there are task schedulers that provide those guarantees.
On the comments: The issue is the upper bound on time taken.
You can use new and delete if you overload them to have the performance characteristics you are after; the problem isn't new and delete, it is dynamic memory allocation. There is no requirement that new and delete use a general-purpose dynamic allocator; you can use them to allocate out of a statically allocated pool, sized appropriately for your workload, with deterministic behaviour.
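A minimal sketch of that approach (illustrative names, single-threaded for brevity; a real system would size the pool from its worst-case workload and add thread safety or per-thread pools): class-level operator new/delete backed by a statically allocated free list, so allocation and deallocation are O(1) and never touch the general-purpose heap.

```cpp
#include <cstddef>
#include <new>

class Message {
public:
    void* operator new(std::size_t size) {
        if (size > sizeof(Slot) || freeList == nullptr)
            throw std::bad_alloc();        // exhaustion should be made impossible by sizing
        Slot* s = freeList;
        freeList = freeList->next;
        return s;
    }
    void operator delete(void* p) noexcept {
        if (p == nullptr) return;
        Slot* s = static_cast<Slot*>(p);
        s->next = freeList;                // push the slot back onto the free list
        freeList = s;
    }
private:
    double payload[4]{};                   // example payload

    union Slot { Slot* next; unsigned char storage[64]; };  // each slot can hold one Message
    static constexpr std::size_t kSlots = 64;
    static Slot pool[kSlots];
    static Slot* freeList;
    static Slot* initPool() {              // chain the slots once, before main()
        for (std::size_t i = 0; i + 1 < kSlots; ++i) pool[i].next = &pool[i + 1];
        pool[kSlots - 1].next = nullptr;
        return &pool[0];
    }
};

Message::Slot Message::pool[Message::kSlots];
Message::Slot* Message::freeList = Message::initPool();

int main() {
    Message* m = new Message;              // served from the static pool, not the heap
    delete m;                              // O(1) return to the free list
}
```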
On dynamic_cast: I tend not to use it, but I don't think its performance is non-deterministic to the extent that it should be banned in real-time code. This is an example of the same issue: understanding worst-case performance is important.

Measuring parallel computation time for interdependent threads

I have a question concerning runtime measurements in parallel programs (I used C++, but I think the question is more general).
Some short explanations: 3 threads are running in parallel (pthreads), solving the same problem in different ways. Each thread may pass information to the other threads (e.g. partial solutions it has obtained but the others have not yet) to speed them up, depending on its own status / the information available in its own calculation. The whole process stops as soon as the first thread is finished.
Now I would like to have a single time measurement for evaluating the runtime from the start until the problem is solved. (In the end, I want to determine whether exploiting synergy effects through a parallel calculation is faster than calculating on a single thread.)
In my eyes, the problem is that (because the operating system pauses and unpauses the individual threads) the point in each thread's state at which information is passed is not deterministic. That means a certain piece of information is acquired after xxx units of CPU time on thread 1, but it cannot be controlled whether thread 2 receives this information after yyy or zzz units of CPU time spent in its own calculation. Assuming this information would have finished thread 2's calculation in either case, the runtime of thread 2 was either yyy or zzz, depending on the operating system's actions.
What can I do to obtain deterministic behaviour for runtime comparisons? Can I tell the operating system to run each thread "undisturbed" (on a multicore machine)? Is there something I can do at the implementation (C++) level?
Or are there other concepts for evaluating runtime (time gain) of such implementations?
Best regards
Martin
Any time someone uses the terms 'deterministic' and 'multicore' in the same sentence, it sets alarm bells ringing :-)
There are two big sources of non-determinism in your program: 1) the operating system, which adds noise to thread timings through OS jitter and scheduling decisions; and 2) the algorithm, because the program follows a different path depending on the order in which communication (of the partial solutions) occurs.
As a programmer, there's not much you can do about OS noise. A standard OS adds a lot of noise even for a program running on a dedicated (quiescent) node. Special purpose operating systems for compute nodes go some way to reducing this noise, for example Blue Gene systems exhibit significantly less OS noise and therefore less variation in timings.
Regarding the algorithm, you can introduce determinism into your program by adding synchronisation. If two threads synchronise, for example to exchange partial solutions, then the ordering of the computation before and after the synchronisation is deterministic. Your current code is asynchronous, as one thread 'sends' a partial solution but does not wait for it to be 'received'. You could convert this into deterministic code by dividing the computation into steps and synchronising between threads after each step (a sketch follows the list). For example, for each thread:
Compute one step
Record partial solution (if any)
Barrier - wait for all other threads
Read partial solutions from other threads
Repeat 1-4
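A rough sketch of that step/barrier structure with POSIX barriers; the compute/publish/read functions are placeholders for your real work and exchange of partial solutions:

```cpp
#include <pthread.h>

constexpr int kThreads = 3;
constexpr int kSteps   = 100;
pthread_barrier_t barrier;

// Placeholders for the real work and the exchange of partial solutions.
void compute_step(int /*id*/, int /*step*/) { /* one bounded chunk of work  */ }
void publish_partial_solution(int /*id*/)   { /* write my partial solution  */ }
void read_partial_solutions(int /*id*/)     { /* read the others' solutions */ }

void* worker(void* arg) {
    int id = *static_cast<int*>(arg);
    for (int step = 0; step < kSteps; ++step) {
        compute_step(id, step);              // 1. compute one step
        publish_partial_solution(id);        // 2. record partial solution (if any)
        pthread_barrier_wait(&barrier);      // 3. wait for all other threads
        read_partial_solutions(id);          // 4. read partial solutions from other threads
        pthread_barrier_wait(&barrier);      // keep this step's reads apart from the next step's writes
    }
    return nullptr;
}

int main() {
    pthread_barrier_init(&barrier, nullptr, kThreads);
    pthread_t threads[kThreads];
    int ids[kThreads];
    for (int i = 0; i < kThreads; ++i) {
        ids[i] = i;
        pthread_create(&threads[i], nullptr, worker, &ids[i]);
    }
    for (int i = 0; i < kThreads; ++i) pthread_join(threads[i], nullptr);
    pthread_barrier_destroy(&barrier);
}
```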
Of course, we would not expect this code to perform as well, because now each thread has to wait for all the other threads to complete their computation before proceeding to the next step.
The best approach is probably to just accept the non-determinism, and use statistical methods to compare your timings. Run the program many times for a given number of threads and record the range, mean and standard deviation of the timings. It may be enough for you to know e.g. the maximum computation time across all runs for a given number of threads, or you may need a statistical test such as Student's t-test to answer more complicated questions like 'how certain is it that increasing from 4 to 8 threads reduces the runtime?'. As DanielKO says, the fluctuations in timings are what will actually be experienced by a user, so it makes sense to measure these and quantify them statistically, rather than aiming to eliminate them altogether.
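A small sketch of that statistical approach (run_parallel_solver is a placeholder for the actual entry point): time the whole solve repeatedly and report min, mean, max, and sample standard deviation.

```cpp
#include <algorithm>
#include <chrono>
#include <cmath>
#include <cstdio>
#include <vector>

void run_parallel_solver() { /* start the 3 threads, return when the first one finishes */ }

int main() {
    constexpr int kRuns = 30;
    std::vector<double> seconds;
    for (int i = 0; i < kRuns; ++i) {
        auto start = std::chrono::steady_clock::now();
        run_parallel_solver();
        std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
        seconds.push_back(elapsed.count());
    }

    double sum = 0.0, mn = seconds.front(), mx = seconds.front();
    for (double s : seconds) { sum += s; mn = std::min(mn, s); mx = std::max(mx, s); }
    double mean = sum / kRuns;

    double var = 0.0;
    for (double s : seconds) var += (s - mean) * (s - mean);
    double sd = std::sqrt(var / (kRuns - 1));                  // sample standard deviation

    std::printf("runs=%d min=%.4fs mean=%.4fs max=%.4fs sd=%.4fs\n", kRuns, mn, mean, mx, sd);
}
```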
What's the use of such a measurement?
Suppose you could, by some contrived method, set up the OS scheduler in such a way that the threads run undisturbed (even by indirect events such as other processes using the caches, the MMU, etc.); would that be realistic for the actual usage of the parallel program?
It's pretty rare for a modern OS to let an application take control over general interrupt handling, memory management, thread scheduling, etc. Unless you are talking directly to the metal, your deterministic measurements will not only be impractical, but the users of your program will never experience them (unless they are equally close to the metal as you were when you did the measurements).
So my question is: why do you need such strict conditions for measuring your program? In the general case, just accept the fluctuations, as that is what the users will most likely see. If the speed-up of a certain algorithm/implementation is so insignificant as to be indistinguishable from the background noise, that is more useful information to me than knowing the exact speedup fraction.

What is the definition of realtime, near realtime and batch? Give examples of each?

I'm trying to get a good definition of realtime, near realtime, and batch. I am not talking about sync and async, although to me they are a different dimension. Here is what I'm thinking:
Realtime is sync web services or async web services.
Near realtime could be JMS or messaging systems or most event driven systems.
Batch to me is more of a timed system that processes when it wakes up.
Give examples of each and feel free to fix my assumptions.
https://stackoverflow.com/tags/real-time/info
Real-Time
Real-time means that the time of an activity's completion is part of its functional correctness. For example, the sqrt() function's correctness is something like
The sqrt() function is implemented correctly if, for all x >= 0, sqrt(x) = y implies y^2 == x.
In this setting, the time it takes to execute the sqrt() procedure is not part of its functional correctness. A faster algorithm may be better in some qualitative sense, but no more or less correct.
Suppose we have a mythical function called sqrtrt(), a real-time version of square root. Imagine, for instance, we need to compute the square root of velocity in order to properly execute the next brake application in an anti-lock braking system. In this setting, we might say instead:
The sqrtrt() function is implemented correctly if, for all x >= 0, sqrtrt(x) = y implies y^2 == x, and sqrtrt() returns a result in <= 275 microseconds.
In this case, the time constraint is not merely a performance parameter. If sqrtrt() fails to complete in 275 microseconds, you may be late applying the brakes, triggering either a skid or reduced braking efficiency, possibly resulting in an accident. The time constraint is part of the functional correctness of the routine. Lift this up a few layers, and you get a real-time system as one (at least partially) composed of activities that have timeliness as part of their functional correctness conditions.
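Purely as an illustration of "timeliness as part of correctness" (sqrtrt here is just std::sqrt, and checking the elapsed time after the fact only tells you whether this particular run met the deadline; it does not guarantee it):

```cpp
#include <chrono>
#include <cmath>
#include <cstdio>

double sqrtrt(double x) { return std::sqrt(x); }   // stand-in for the mythical real-time sqrt

// "Correct" means both: the numeric result is right AND it arrived in time.
bool check_sqrtrt(double x) {
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    const double y = sqrtrt(x);
    const auto elapsed = std::chrono::duration_cast<std::chrono::microseconds>(clock::now() - start);

    const bool valueOk = std::fabs(y * y - x) <= 1e-9 * (1.0 + x); // allow for floating-point error
    const bool timeOk  = elapsed.count() <= 275;                   // the 275 microsecond deadline
    return valueOk && timeOk;
}

int main() {
    std::printf("sqrtrt(2.0): %s\n", check_sqrtrt(2.0) ? "correct (value and deadline)" : "incorrect");
}
```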
Near Real-Time
A near real-time system is one in which activities' completion times, responsiveness, or perceived latency, measured against wall-clock time, are important aspects of system quality. The canonical example of this is a stock ticker system -- you want to get quotes reasonably quickly after the price changes. For most of us non-high-speed traders, what this means is that the perceived delay between data being available and our seeing it is negligible.
The difference between "real-time" and "near real-time" is both a difference in precision and magnitude. Real-time systems have time constraints that range from microseconds to hours, but those time constraints tend to be fairly precise. Near-real-time usually implies a narrower range of magnitudes -- within human perception tolerances -- but typically aren't articulated precisely.
I would claim that near-real-time systems could be called real-time systems, but that their time constraints are merely probabilistic:
The stock price will be displayed to the user within 500 ms of its change at the exchange, with probability p > 0.75.
Batch
Batch operations are those which are perceived to be large blocks of computing tasks with only macroscopic, human- or process-induced deadlines. The specific context of computation is typically not important, and a batch computation is usually a self-contained computational task. Real-time and near-real-time tasks are often strongly coupled to the physical world, and their time constraints emerge from demands from physical/real-world interactions. Batch operations, by contrast, could be computed at any time and at any place; their outputs are solely defined by the inputs provided when the batch is defined.
Original Post
I would say that real-time means that the time (rather than merely the correct output) to complete an operation is part of its correctness.
Near real-time is weasel words for wanting the same thing as real-time but not wanting to go to the discipline/effort/cost to guarantee it.
Batch is "near real-time" where you are even more tolerant of long response times.
Often these terms are used (badly, IMHO) to distinguish among human perceptions of latency/performance. People think real-time is real-fast, e.g., milliseconds or something. Near real-time is often seconds or milliseconds. Batch is a latency of seconds, minutes, hours, or even days. But I think those aren't particularly useful distinctions. If you care about timeliness, there are disciplines to help you get that.
I'm curious for feedback myself on this. Real-time and batch are well defined and covered by others (though be warned that they are terms-of-art with very specific technical meanings in some contexts). However, "near real-time" seems a lot fuzzier to me.
I favor (and have been using) "near real-time" to describe a signal-processing system which can 'keep up' on average, but lags sometimes. Think of a system processing events which only happen sporadically... Assuming it has sufficient buffering capacity and the time it takes to process an event is less than the average time between events, it can keep up.
In a signal processing context:
- Real-time seems to imply a system where processing is guaranteed to complete within a specified (short) delay after the signal has been received. A minimal buffer is needed.
- Near real-time (as I have been using it) means a system where the delay between receiving and completion of processing may get relatively large on occasion, but the system will not (except under pathological conditions) fall behind so far that the buffer gets filled up.
- Batch implies post-processing to me. The incoming signal is just saved (maybe with a bit of real-time pre-processing) and then analyzed later.
This gives the nice framework of real-time and near real-time being systems where they can (in theory) run forever while new data is being acquired... processing happens in parallel with acquisition. Batch processing happens after all the data has been collected.
Anyway, I could be conflicting with some technical definitions I'm unaware of... and I assume someone here will gleefully correct me if needed.
There are issues with all of these answers in that the definitions are flawed. For instance, "batch" simply means that transactions are grouped and sent together. Real Time implies transactional, but may also have other implications. So when you combine batch in the same attribute as real time and near real time, clarity in purpose for that attribute is lost. The definition becomes less cohesive, less clear. This would make any application created with the data more fragile. I would guess that practitioners would be better off w/ a clearly modeled taxonomy such as:
Attribute1: Batched (grouped) or individual transactions.
Attribute2: Scheduled (time-driven), event-driven.
Attribute3: Speed per transaction. For batch that would be the average speed/transaction.
Attribute4: Protocol/Technology: SOAP, REST, combination, FTP, SFTP, etc. for data movement.
Attributex: Whatever.
Attribute4 is more related to something I am doing right now, so you could throw that out or expand the list for what you are trying to achieve. For each of these attribute values, there would likely be additional, specific attributes. But to bring the information together, we need to think about what is needed to make the collective data useful. For instance, what do we need to know about batched and transactional flows to make them useful together? You might consider attributes for each that let you understand total throughput over a given time period. It seems funny how we may create conceptual, logical, and physical data models (hopefully) for our business clients, but we don't always apply that kind of thought to how we define terminology in our discussions.
Any system in which the time at which output is produced is significant. This is usually because the input corresponds to some movement in the physical environment or world, and the output has to relate to that same movement. The lag from input to output must be sufficiently small for acceptable timeliness.