Logical Time, Lamport Timestamps and Vector Clocks in distributed systems - concurrency

I am currently studying distributed systems for an exam. I think I have understood all the principles so far, but I go crazy when it comes to the topics of logical time, Lamport timestamps and vector clocks. I just can't get an overview of how these topics connect. Here is my index card, which I would like to write...
Logical Time
enables coordination of events without physical clock
achieved by: ordering events into "happened before" relations
each process knows order of local events, as timestamps are assigned to every event
these timestamps can be Lamport timestamps:
or vector clocks:
I just don't know what to put under Lamport timestamps and vector clocks, and generally what enables what, what implements what, and why one can make concurrency visible while the other cannot. Can someone help me get an overview?
Looking up definitions on Wikipedia and reading scientific papers hasn't helped that much...

In distributed systems it is very hard to keep wall clocks in sync. Different applications have different tolerance for clock drift. For example, some distributed systems use the clock to resolve conflicts (last update wins) and accept the risk that clocks might not be in sync. But in some cases there is no tolerance at all.
As you correctly said, logical clocks were invented to address timing in distributed systems. There are two main problems to solve: a) events need to be totally ordered, and b) concurrent events need to be detected. Lamport timestamps address the total-order problem; vector clocks help detect concurrent events (a small sketch of both follows the cards below).
So if I were to create an index card, I would actually create several of them:
Global/Wall clock
allows events to be ordered globally, but the risk is that clocks will get out of sync
worth exploring how Cassandra deals with the last-write-wins strategy
Lamport timestamp
this is a logical clock
it allows total order of events
(all other info on lamport timestamp)
Vector clock
this is a logical clock
it allows detecting concurrent events
(...note what "concurrent" means)
all other info on vector clock
not to be confused with version vector
Version vector
...
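To make the two cards concrete, here is a minimal sketch in C++ (the names are mine, and the number of processes is fixed at three for simplicity): a Lamport clock produces a single counter per event, so any two events can always be ordered, but you cannot tell whether they were concurrent; a vector clock keeps one counter per process, so comparing two vectors tells you "happened before", "happened after", or "concurrent".

```cpp
#include <algorithm>
#include <array>
#include <cstdint>

constexpr int N = 3; // number of processes, fixed for this sketch

// Lamport clock: one counter per process.
struct LamportClock {
    uint64_t t = 0;
    uint64_t localEvent()             { return ++t; }                    // tick on every local event
    uint64_t onReceive(uint64_t msgT) { t = std::max(t, msgT) + 1; return t; }
};

// Vector clock: one counter per process, indexed by process id.
struct VectorClock {
    std::array<uint64_t, N> v{};
    void localEvent(int pid) { ++v[pid]; }
    void onReceive(int pid, const VectorClock& msg) {
        for (int i = 0; i < N; ++i) v[i] = std::max(v[i], msg.v[i]);
        ++v[pid];
    }
};

// a "happened before" b  iff  a <= b component-wise and a != b.
bool happenedBefore(const VectorClock& a, const VectorClock& b) {
    bool strictlyLess = false;
    for (int i = 0; i < N; ++i) {
        if (a.v[i] > b.v[i]) return false;
        if (a.v[i] < b.v[i]) strictlyLess = true;
    }
    return strictlyLess;
}

// Concurrent: neither happened before the other. A Lamport timestamp
// cannot express this - it always gives you exactly one number per event.
bool concurrent(const VectorClock& a, const VectorClock& b) {
    return !happenedBefore(a, b) && !happenedBefore(b, a);
}
```

A version vector uses the same comparison rule, but is attached to replicas of a data item rather than to individual events, which is why the two are easy to confuse.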


What strategies and practices are used, when running very intense and long calculations, to ensure that hardware isn't damaged?

I have many large Fortran programs to run at work. I have access to several desktop computers, and the Fortran code takes several consecutive days to run. It's essentially running the same master module many times (let's say N times) with different parameters, something akin to Monte Carlo on steroids. In that sense the code is parallelizable; however, I don't have access to a cluster.
Within the scientific computing community, what practices and strategies are used to minimise hardware damage from heat? The machines of course have their own cooling systems (fans and heat sinks), but even so, can running intense calculations non-stop for half a week really be healthy for the life of the machines? Though maybe I'm over-thinking this?
I'm not aware of any intrinsic functions in Fortran that can pause the code to give components a break. Currently I've written a small module that keeps an eye on the system clock, with a do-while loop that "wastes time" between consecutive runs of the master module in order to dissipate heat. Is this an acceptable way of doing this? The processor is, after all, still running a while loop.
Another way would be to use a shell script or some Python code that calls the Fortran. Alternatively, are there any intrinsic routines in the compiler (gfortran) that could achieve this? What are the standard, effective and accepted practices for dealing with this?
Edit: I should mention that all machines run on Linux, specifically Ubuntu 12.04.
For an MS-DOS application I would consider the following:
- Reduce I/O operations against the HDD as much as possible, that is, keep data in memory as much as you can, or keep data on a RAM disk. A RAM disk driver is available on Microsoft's website; let me know if you can't find it and I'll look through my CD archives.
- Try to use extended memory by using a DPMI driver (DPMI - DOS Protected Mode Interface).
- Set CPU affinity to a second CPU.
- Boost the priority to High, but I wouldn't recommend boosting to Real-Time.
I think you need a hardware solution here, not a software solution. You need to increase the rate of heat exchange in the computers (new fans, water cooling, etc) and in the room (turn the thermostat way down, get some fans running, etc).
To answer the post more directly, you can use the fortran SLEEP command to pause a computation for a given number of seconds. You could use some system calls in Fortran to set the argument on the fly. But I wouldn't recommend it - you might as well just run your simulations on fewer computers.
To keep the advantages of the multiple computers, you need better heat exchange.
As long as the hardware is adequately dissipating heat and components are not operating at or beyond their "safe" temperature limits, they *should* be fine.
* Some video cards were known to run very hot, i.e. 65-105°C. Typically, electronic components have a maximum temperature rating of exactly this; beyond it, reliability degrades very quickly. Even though the manufacturer made these cards this way, they ended up with a reputation for failing (e.g. the older nVidia FX and Quadro series).
* Ubuntu likely has a "Critical temperature reached" feature where the entire system will power off if it overheats, as explained here. Windows is "blissfully ignorant." :)
* Thermal stress (large, repeated temperature variations) may contribute to component failure of ICs, capacitors, and hard disks. Over three decades of computing has taught me that adequate cooling and leaving the PC on 24/7 may actually save wear-and-tear. (A typical PC will cost around $200 USD/year in electricity, so it's more of a trade-off in terms of cost.)
* PCs must be cleaned twice a year (depending on airborne particulate constituency and concentration). Compressed air is nice for removing dust. Dust traps heat and causes failures. Operate a shop-vac while "dusting" to prevent the dust from going everywhere. Wanna see a really dusty computer?
* The CPU should be OK with its stock cooler. Check its temperature at a cold system boot-up, then again after running code for an hour or so. The fan is speed-controlled to limit temperature rise. The CPU temperature rise shouldn't be much more than about 40°C, and less would be better. But an aftermarket, better-performing CPU cooler never hurts, such as these. CPUs rarely fail unless there is a manufacturing flaw or they operate near or beyond their rated temperatures for too long, so as long as they stay cool, long calculations are fine. Typically, they stop functioning and/or reset the PC if too hot.
* Capacitors tend to fail very rapidly when overheated. It is a known issue that some capacitor vendors are "junk" and will fail prematurely, regardless of other factors. "Re-capping" is the art of fixing these components. For a full run-down on this topic, see badcaps.net. It used to be possible to re-cap a motherboard, but today's 12+ layer and RoHS (no lead) motherboards make it very difficult without specialty hot-air tools.

Linux Timing across Kernel & User Space

I'm writing a kernel module for a special camera, working through V4L2 to handle the transfer of frames to userspace code. Then I do lots of userspace stuff in the app.
Timing is very critical here, so I've been doing lots of performance profiling and plain old std::chrono::steady_clock stuff to track timing, but I've reached the point where I need to also collect timing data from the kernel side of things, so that I can analyze the entire path from hardware interrupt through V4L DQBuf to userspace.
Can anyone recommend a good way to get high-resolution timing data that would be consistent with the application's userspace data, which I could use for such comparisons? Right now I'm measuring activity in microseconds.
Ubuntu 12.04 LTS
At the lowest level, there are the rdtsc and rdtscp instructions if you're on an x86/x86-64 processor. That should provide the lowest overhead, highest possible resolution across the kernel/userspace boundary.
However, there are things you need to worry about. You need to make sure you're executing on the same core/CPU, that the process isn't being context-switched, and that the frequency isn't changing across invocations. If the CPU supports an invariant TSC (constant_tsc in /proc/cpuinfo), it's a little more reliable across CPUs/cores and frequencies.
This should provide roughly nanosecond accuracy.
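As a rough userspace sketch of reading the TSC and converting tick deltas to microseconds (x86-64, GCC/Clang on Linux; the __rdtscp intrinsic comes from <x86intrin.h>, and the calibration against CLOCK_MONOTONIC_RAW is my own choice, not anything V4L2 provides), assuming an invariant TSC:

```cpp
#include <x86intrin.h>  // __rdtscp (GCC/Clang, x86/x86-64)
#include <cstdint>
#include <cstdio>
#include <time.h>

// Read the TSC with the partially serializing variant so earlier
// instructions have completed before the counter is sampled.
static inline uint64_t rdtscp_now() {
    unsigned aux;                 // receives IA32_TSC_AUX (identifies the core)
    return __rdtscp(&aux);
}

int main() {
    // Calibrate ticks-per-second once against CLOCK_MONOTONIC_RAW.
    timespec t0{}, t1{};
    clock_gettime(CLOCK_MONOTONIC_RAW, &t0);
    uint64_t c0 = rdtscp_now();
    timespec delay{0, 100 * 1000 * 1000};   // sleep ~100 ms during calibration
    nanosleep(&delay, nullptr);
    uint64_t c1 = rdtscp_now();
    clock_gettime(CLOCK_MONOTONIC_RAW, &t1);

    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) * 1e-9;
    double ticks_per_sec = (c1 - c0) / secs;

    // Any pair of TSC samples can now be converted to microseconds.
    uint64_t a = rdtscp_now();
    uint64_t b = rdtscp_now();
    std::printf("delta = %.3f us\n", (b - a) / ticks_per_sec * 1e6);
    return 0;
}
```

TSC samples logged on the kernel side can be converted with the same calibrated frequency, which is what makes the two sides comparable.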
There are a lot of kernel-level utilities available that can gather timing-related traces for you, e.g. ptrace, ftrace, LTTng, Kprobes. Check out this link for more information.
http://elinux.org/Kernel_Trace_Systems

Does "real-time" constraints prevent the use of a task scheduler?

By task scheduler I mean any implementation of a worker-thread pool that distributes work to the threads according to whatever algorithm it was designed with (like Intel TBB).
I know that "real-time" constraints imply that work gets done in predictable time (I'm not talking about speed). So my guess is that using a task scheduler, which, as far as I know, can't guarantee that some task will be executed before a given time, makes the application impossible to use in these constraints.
Or am I missing something? Is there a way to have both? Maybe by forcing assumptions on the quantity of data that can be processed? Or maybe there are predictable task schedulers?
I'm talking about "hard" real time constraints, not soft real time (like video games).
To clarify:
It is known that there are features in C++ that are not possible to use in this kind of context: new, delete, throw, dynamic_cast. They are not predictable (you don't know how much time can be spent on one of these operations; it depends on too many parameters that are not even known before execution).
You can't really use them in real time contexts.
What I'm asking is: do task schedulers have the same unpredictability that would make them unusable in real-time applications?
Yes, it can be done, but no it's not trivial, and yes there are limits.
You can write the scheduler to guarantee (for example) that an interrupt handler, exception handler, etc. is invoked within a fixed period of time from when the event occurs. You can guarantee that any given thread will (for example) get at least X milliseconds of CPU time out of any given second (or suitable fraction of a second).
To enforce the latter, you generally need admittance criteria -- an ability for the scheduler to say "sorry, but I can't schedule this as a real-time thread, because the CPU is already under too much load."
In other cases, all it does is guarantee that at least (say) 99% of CPU time will be given to the real-time tasks (if any exist), and it's up to whoever designs the system on top of that to schedule few enough real-time tasks that this will ensure they all finish quickly enough.
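As a concrete illustration on Linux (a minimal sketch only; it needs suitable privileges or an RLIMIT_RTPRIO setting, and SCHED_FIFO on a stock kernel is still not a hard real-time guarantee), a thread can ask the kernel scheduler for a fixed-priority real-time policy, and the kernel can refuse -- a crude form of admittance:

```cpp
#include <pthread.h>
#include <sched.h>
#include <cstdio>
#include <cstring>

// Request a fixed-priority real-time policy on the calling thread.
// The call can be refused (e.g. EPERM without CAP_SYS_NICE or an
// appropriate RLIMIT_RTPRIO) - the scheduler saying "not admitted".
bool request_realtime(int priority) {
    sched_param sp{};
    sp.sched_priority = priority;   // 1..99 for SCHED_FIFO
    int rc = pthread_setschedparam(pthread_self(), SCHED_FIFO, &sp);
    if (rc != 0) {
        std::fprintf(stderr, "not admitted as real-time: %s\n", std::strerror(rc));
        return false;
    }
    return true;
}

int main() {
    if (request_realtime(50)) {
        // ... time-critical work runs here, ahead of any normal
        // (SCHED_OTHER) thread on the machine ...
    }
    return 0;
}
```

Compile with -pthread; whether this is "enough" still depends on the worst-case latencies of the kernel underneath it.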
I feel obliged to add that the "hardness" of real-time requirements is almost entirely orthogonal to the response speed needed. Rather, it's almost entirely about the seriousness of the consequences of being late.
Just for example, consider a nuclear power plant. For a lot of what happens, you're dealing with time periods on the order of minutes, or in some cases even hours. Filling a particular chamber with, say, half a million gallons of water just isn't going to happen in microseconds or milliseconds.
At the same time, the consequences of a late answer can be huge -- quite possibly causing not just a few deaths like hospital equipment could, but potentially hundreds or even thousands of deaths, hundreds of millions in damage, etc. As such, it's about as "hard" as real-time requirements get, even though the deadlines are unusually "loose" by most typical standards.
In the other direction, digital audio playback has much tighter limits. Delays or dropouts can be quite audible down to a fraction of a millisecond in some cases. At the same time, unless you're providing sound processing for a large concert (or something on that order) the consequences of a dropout will generally be a moment's minor annoyance on the part of a user.
Of course, it's also possible to combine the two -- for an obvious example, in high-frequency trading, deadlines may well be in the order of microseconds (or so) and the loss from missing a deadline could easily be millions or tens of millions of (dollars|euros|pounds|etc.)
The term real-time is quite flexible. "Hard real-time" tends to mean things where a few tens of microseconds make the difference between "works right" and "doesn't work right". Not all "real-time" systems require that sort of real-time-ness.
I once worked on a radio-base-station for mobile phones. One of the devices on the board had an interrupt that fired every 2-something milliseconds. For correct operation (not losing calls), we had to deal with the interrupt, that is, do the work inside the interrupt and write the hardware registers with the new values, within 100 microseconds - if we missed, there would be dropped calls. If the interrupt wasn't taken after 160 microseconds, the system would reboot. That is "hard real-time", especially as the processor was just running at a few tens of MHz.
If you produce a video player, it requires real-time in the few-milliseconds range.
A "display stock prices" application can probably be within the 100 ms range.
For a web server, it is probably acceptable to respond within 1-2 seconds without any big problems.
Also, there is a difference between systems where "worst case worse than X means failure" (like the case above with 100 microseconds and dropped calls - that's bad if it happens more than once every few weeks, and even a few times a year is really something that should be fixed); this is called "hard real-time".
In other systems, missing your deadline means "oh well, we have to do that over again" or "a frame of video flickered a bit"; as long as it doesn't happen very often, it's probably OK. This is called "soft real-time".
A lot of modern hardware will make "hard real-time" (the tens-to-hundreds-of-microseconds range) difficult, because the graphics processor will simply stop the processor from accessing memory, or, if the processor gets hot, the stop-clock (STPCLK) pin is pulled for 100 microseconds...
Most modern OSes, such as Linux and Windows, aren't really meant to be "hard real-time". There are sections of code that disable interrupts for longer than 100 microseconds in some parts of these OSes.
You can almost certainly get some good "soft real-time" (that is, missing the deadline isn't a failure, just a minor annoyance) out of a mainstream modern OS with modern hardware. It'll probably require either modifications to the OS or a dedicated real-time OS (and perhaps suitable special hardware) to make the system do hard real-time.
But only a few things in the world require that sort of hard real-time. Often the hard real-time requirements are dealt with by hardware - for example, the next generation of the radio base stations that I described above had more clever hardware, so you just needed to give it the new values within the next 2-something milliseconds, and you didn't have the mad rush to get it done in a few tens of microseconds. In a modern mobile phone, the GSM or UMTS protocol is largely handled by a dedicated DSP (digital signal processor).
A "hard real-time" requirement is where the system is really failing if a particular deadline (or set of deadlines) can't be met, even if the failure to meet deadlines happens only once. However, different systems have different systems have different sensitivity to the actual time that the deadline is at (as Jerry Coffin mentions). It is almost certainly possible to find cases where a commercially available general purpose OS is perfectly adequate in dealing with the real-time requirements of a hard real-time system. It is also absolutely sure that there are other cases where such hard real-time requirements are NOT possible to meet without a specialized system.
I would say that if you want sub-millisecond guarantees from the OS, then desktop Windows or Linux are not for you. This really comes down to the overall philosophy of the OS and scheduler design; building a hard real-time OS requires a lot of thought about locking and the potential for one thread to block another thread from running, etc.
I don't think there is ONE answer that applies to your question. Yes, you can certainly use thread-pools in a system that has hard real-time requirements. You probably can't do it on a sub-millisecond basis unless there is specific support for this in the OS. And you may need to have dedicated threads and processes to deal with the highest priority real-time behaviour, which is not part of the thread-pool itself.
Sorry if this isn't saying "Yes" or "No" to your question, but I think you will need to do some research into the actual behaviour of the OS and see what sort of guarantees it can give (worst case). You will also have to decide what the worst-case scenario is and what happens if you miss a deadline - are (lots of) people dying (a plane falling out of the sky), is some banker going to lose millions, are the green lights going to come on at the same time in two directions at a road crossing, or is it some bad sound coming out of a speaker?
"Real time" doesn't just mean "fast", it means that the system can respond to meet deadlines in the real world. Those deadlines depend on what you're dealing with in the real world.
Whether or not a task finishes in a particular timeframe is a characteristic of the task, not the scheduler. The scheduler might decide which task gets resources, and if a task hasn't finished by a deadline it might be stopped or have its resource usage constrained so that other tasks can meet their deadlines.
So, the answer to your question is that you need to consider the workload, deadlines and the scheduler together, and construct your system to meet your requirements. There is no magic scheduler that will take arbitrary tasks and make them complete in a predictable time.
Update:
A task scheduler can be used in real-time systems if it provides the guarantees you need. As others have said, there are task schedulers that provide those guarantees.
On the comments: The issue is the upper bound on time taken.
You can use new and delete if you overload them to have the performance characteristics you are after; the problem isn't new and delete, it is dynamic memory allocation. There is no requirement that new and delete use a general-purpose dynamic allocator, you can use them to allocate out of a statically allocated pool sized appropriately for your workload with deterministic behaviour.
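A minimal sketch of that idea, with a hypothetical fixed-size arena sized offline for the workload (a real pool would also need thread safety and a bounded-time free list rather than the no-op delete shown here):

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// A statically allocated arena sized for the worst-case workload.
// Allocation is a pointer bump: O(1), no syscalls, no unpredictable
// heap walks, so the worst-case cost is known up front.
namespace {
    alignas(std::max_align_t) unsigned char arena[1 << 20]; // 1 MiB, sized offline
    std::size_t offset = 0;                                  // not thread-safe: sketch only
}

void* operator new(std::size_t n) {
    n = (n + alignof(std::max_align_t) - 1) & ~(alignof(std::max_align_t) - 1);
    if (offset + n > sizeof(arena)) std::abort();  // out of budget: a design error, not a runtime surprise
    void* p = arena + offset;
    offset += n;
    return p;
}

void operator delete(void* /*p*/) noexcept {
    // This sketch never reuses memory; a real pool would return the
    // block to a free list with a bounded-time operation.
}

void operator delete(void* /*p*/, std::size_t) noexcept {}
```

The point is that the worst-case cost of an allocation is now known in advance, which is exactly what the real-time analysis needs.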
On dynamic_cast: I tend not to use it, but I don't think its performance is non-deterministic to the extent that it should be banned in real-time code. This is an example of the same issue: understanding worst-case performance is important.

Measuring parallel computation time for interdependent threads

I have a question concerning runtime measurements in parallel programs (I used C++ but I think the question is more general).
Some short explanations: 3 threads are running in parallel (pthreads), solving the same problem in different ways. Each thread may pass information to the other threads (e.g. partial solutions obtained by one thread but not yet by the others) to speed them up, depending on its own status / the information available in its own calculation. The whole process stops as soon as the first thread is ready.
Now I would like to have a unique time measurement for evaluating the runtime from start until the problem is solved. (In the end, I want to determine whether using synergy effects through a parallel calculation is faster than calculating on a single thread.)
In my eyes, the problem is that, because the operating system pauses and unpauses the individual threads, the point in each thread's state at which information is passed is not deterministic. That means a certain piece of information is acquired after xxx units of CPU time on thread 1, but it cannot be controlled whether thread 2 receives this information after yyy or zzz units of CPU time spent in its own calculation. Assuming that this information would have finished thread 2's calculation in any case, thread 2's runtime was either yyy or zzz, depending on the operating system's actions.
What can I do to obtain deterministic behaviour for runtime comparisons? Can I order the operating system to run each thread "undisturbed" (on a multicore machine)? Is there something I can do at the implementation (C++) level?
Or are there other concepts for evaluating runtime (time gain) of such implementations?
Best regards
Martin
Any time someone uses the terms 'deterministic' and 'multicore' in the same sentence, it sets alarm bells ringing :-)
There are two big sources of non-determinism in your program: 1) the operating system, which adds noise to thread timings through OS jitter and scheduling decisions; and 2) the algorithm, because the program follows a different path depending on the order in which communication (of the partial solutions) occurs.
As a programmer, there's not much you can do about OS noise. A standard OS adds a lot of noise even for a program running on a dedicated (quiescent) node. Special purpose operating systems for compute nodes go some way to reducing this noise, for example Blue Gene systems exhibit significantly less OS noise and therefore less variation in timings.
Regarding the algorithm, you can introduce determinism to your program by adding synchronisation. If two threads synchronise, for example to exchange partial solutions, then the ordering of the computation before and after the synchronisation is deterministic. Your current code is asynchronous, as one thread 'sends' a partial solution but does not wait for it to be 'received'. You could convert this to a deterministic code by dividing the computation into steps and synchronising between threads after each step. For example, for each thread (a sketch follows the list):
Compute one step
Record partial solution (if any)
Barrier - wait for all other threads
Read partial solutions from other threads
Repeat 1-4
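A condensed sketch of that structure, using C++20's std::barrier (a pthread_barrier_t would work the same way with the poster's pthreads setup); computeStep, absorb and isDone are placeholder stand-ins for the real solver:

```cpp
#include <atomic>
#include <barrier>
#include <thread>
#include <vector>

constexpr int kThreads = 3;

std::atomic<bool> solved{false};
std::vector<int> partial(kThreads, 0);   // per-thread partial solutions (placeholder type)
std::barrier sync_point(kThreads);

// Trivial stand-ins so the sketch compiles; replace with the real solver.
int  computeStep(int id)           { return id; }
void absorb(int /*partialResult*/) {}
bool isDone(int id)                { return id == 0; }

void worker(int id) {
    while (!solved.load()) {
        int result = computeStep(id);          // 1. compute one step
        partial[id] = result;                  // 2. record partial solution
        sync_point.arrive_and_wait();          // 3. barrier: wait for all threads
        for (int other = 0; other < kThreads; ++other)
            if (other != id) absorb(partial[other]);  // 4. read the others' partial solutions
        if (isDone(id)) solved.store(true);
        sync_point.arrive_and_wait();          // keep every thread on the same step
    }
}

int main() {
    std::vector<std::thread> threads;
    for (int id = 0; id < kThreads; ++id) threads.emplace_back(worker, id);
    for (auto& t : threads) t.join();
}
```

(Compile with -std=c++20 -pthread.)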
Of course, we would not expect this code to perform as well, because now each thread has to wait for all the other threads to complete their computation before proceeding to the next step.
The best approach is probably to just accept the non-determinism, and use statistical methods to compare your timings. Run the program many times for a given number of threads and record the range, mean and standard deviation of the timings. It may be enough for you to know e.g. the maximum computation time across all runs for a given number of threads, or you may need a statistical test such as Student's t-test to answer more complicated questions like 'how certain is it that increasing from 4 to 8 threads reduces the runtime?'. As DanielKO says, the fluctuations in timings are what will actually be experienced by a user, so it makes sense to measure these and quantify them statistically, rather than aiming to eliminate them altogether.
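A small helper along those lines, assuming the per-run wall-clock times (in seconds) have already been collected into a vector and that there are at least two runs:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

// Summarize repeated wall-clock measurements of one configuration:
// min, max, mean and sample standard deviation. Assumes runs.size() >= 2.
void summarize(const std::vector<double>& runs) {
    const auto [mn, mx] = std::minmax_element(runs.begin(), runs.end());
    double mean = 0.0;
    for (double r : runs) mean += r;
    mean /= runs.size();
    double var = 0.0;
    for (double r : runs) var += (r - mean) * (r - mean);
    var /= (runs.size() - 1);   // sample variance
    std::printf("min %.3f  max %.3f  mean %.3f  stddev %.3f\n",
                *mn, *mx, mean, std::sqrt(var));
}
```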
What's the use of such a measurement?
Suppose you can, by some contrived method, set up the OS scheduler in a way that the threads run undisturbed (even by indirect events such as other processes using caches, MMU, etc), will that be realistic for the actual usage of the parallel program?
It's pretty rare for a modern OS to let an application take control over general interrupts handling, memory management, thread scheduling, etc. Unless you are talking directly to the metal, your deterministic measurements will not only be impractical, but the users of your program will never experience them (unless they are equally close to the metal as when you did the measurements.)
So my question is, why do you need such strict conditions for measuring your program? In the general case, just accept the fluctuations, as that is what the users will most likely see. If the speed up of a certain algorithm/implementation is so insignificant as to be indistinguishable from the background noise, that's more useful information to me than knowing the actual speedup fraction.

What is the definition of realtime, near realtime and batch? Give examples of each?

I'm trying to get a good definition of realtime, near realtime and batch. I am not talking about sync and async, although to me they are different dimensions. Here is what I'm thinking:
Realtime is sync web services or async web services.
Near realtime could be JMS or messaging systems or most event driven systems.
Batch to me is more of a timed system that processes data when it wakes up.
Give examples of each and feel free to fix my assumptions.
https://stackoverflow.com/tags/real-time/info
Real-Time
Real-time means that the time of an activity's completion is part of its functional correctness. For example, the sqrt() function's correctness is something like:
The sqrt() function is implemented correctly if, for all x >= 0, sqrt(x) = y implies y^2 == x.
In this setting, the time it takes to execute the sqrt() procedure is not part of its functional correctness. A faster algorithm may be better in some qualitative sense, but no more or less correct.
Suppose we have a mythical function called sqrtrt(), a real-time version of square root. Imagine, for instance, we need to compute the square root of velocity in order to properly execute the next brake application in an anti-lock braking system. In this setting, we might say instead:
The sqrtrt() function is implemented correctly if, for all x >= 0, sqrtrt(x) = y implies y^2 == x, and sqrtrt() returns a result in <= 275 microseconds.
In this case, the time constraint is not merely a performance parameter. If sqrtrt() fails to complete in 275 microseconds, you may be late applying the brakes, triggering either a skid or reduced braking efficiency, possibly resulting in an accident. The time constraint is part of the functional correctness of the routine. Lift this up a few layers, and you get a real-time system as one (at least partially) composed of activities that have timeliness as part of their functional correctness conditions.
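Expressed as code, the deadline becomes part of the postcondition rather than a performance note (a toy sketch; the 275-microsecond figure is just the one from the example above):

```cpp
#include <cassert>
#include <chrono>
#include <cmath>

// Toy illustration: the result is only "correct" if it is also on time.
double sqrtrt(double x) {
    const auto deadline = std::chrono::microseconds(275);
    const auto start = std::chrono::steady_clock::now();

    double y = std::sqrt(x);                 // the functional part

    const auto elapsed = std::chrono::steady_clock::now() - start;
    assert(elapsed <= deadline);             // timeliness is part of correctness here
    return y;
}
```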
Near Real-Time
A near real-time system is one in which activities' completion times, responsiveness, or perceived latency when measured against wall-clock time are important aspects of system quality. The canonical example of this is a stock ticker system -- you want to get quotes reasonably quickly after the price changes. For most of us non-high-speed traders, what this means is that the perceived delay between data being available and our seeing it is negligible.
The difference between "real-time" and "near real-time" is both a difference in precision and in magnitude. Real-time systems have time constraints that range from microseconds to hours, but those time constraints tend to be fairly precise. Near real-time usually implies a narrower range of magnitudes -- within human perception tolerances -- but the constraints typically aren't articulated precisely.
I would claim that near-real-time systems could be called real-time systems, but that their time constraints are merely probabilistic:
The stock price will be displayed to the user within 500 ms of its change at the exchange, with probability p > 0.75.
Batch
Batch operations are those which are perceived to be large blocks of computing tasks with only macroscopic, human- or process-induced deadlines. The specific context of computation is typically not important, and a batch computation is usually a self-contained computational task. Real-time and near-real-time tasks are often strongly coupled to the physical world, and their time constraints emerge from demands from physical/real-world interactions. Batch operations, by contrast, could be computed at any time and at any place; their outputs are solely defined by the inputs provided when the batch is defined.
Original Post
I would say that real-time means that the time (rather than merely the correct output) to complete an operation is part of its correctness.
Near real-time is weasel words for wanting the same thing as real-time but not wanting to go to the discipline/effort/cost to guarantee it.
Batch is "near real-time" where you are even more tolerant of long response times.
Often these terms are used (badly, IMHO) to distinguish among human perceptions of latency/performance. People think real-time is real-fast, e.g., milliseconds or something. Near real-time is often seconds or milliseconds. Batch is a latency of seconds, minutes, hours, or even days. But I think those aren't particularly useful distinctions. If you care about timeliness, there are disciplines to help you get that.
I'm curious for feedback myself on this. Real-time and batch are well defined and covered by others (though be warned that they are terms-of-art with very specific technical meanings in some contexts). However, "near real-time" seems a lot fuzzier to me.
I favor (and have been using) "near real-time" to describe a signal-processing system which can 'keep up' on average, but lags sometimes. Think of a system processing events which only happen sporadically... Assuming it has sufficient buffering capacity and the time it takes to process an event is less than the average time between events, it can keep up.
In a signal processing context:
- Real-time seems to imply a system where processing is guaranteed to complete with a specified (short) delay after the signal has been received. A minimal buffer is needed.
- Near real-time (as I have been using it) means a system where the delay between receiving and completion of processing may get relatively large on occasion, but the system will not (except under pathological conditions) fall behind so far that the buffer gets filled up.
- Batch implies post-processing to me. The incoming signal is just saved (maybe with a bit of real-time pre-processing) and then analyzed later.
This gives the nice framework of real-time and near real-time being systems where they can (in theory) run forever while new data is being acquired... processing happens in parallel with acquisition. Batch processing happens after all the data has been collected.
Anyway, I could be conflicting with some technical definitions I'm unaware of... and I assume someone here will gleefully correct me if needed.
There are issues with all of these answers in that the definitions are flawed. For instance, "batch" simply means that transactions are grouped and sent together. Real-time implies transactional, but may also have other implications. So when you combine batch in the same attribute as real time and near real time, clarity of purpose for that attribute is lost. The definition becomes less cohesive, less clear. This would make any application created with the data more fragile. I would guess that practitioners would be better off with a clearly modeled taxonomy such as:
Attribute1: Batched (grouped) or individual transactions.
Attribute2: Scheduled (time-driven), event-driven.
Attribute3: Speed per transaction. For batch that would be the average speed/transaction.
Attribute4: Protocol/Technology: SOAP, REST, combination, FTP, SFTP, etc. for data movement.
Attributex: Whatever.
Attribute4 is more related to something I am doing right now, so you could throw that out or expand the list for whatever you are trying to achieve. For each of these attribute values, there would likely be additional, specific attributes. But to bring the information together, we need to think about what is needed to make the collective data useful. For instance, what do we need to know about batched and transactional flows to make them useful together? You might consider attributes for each that provide the ability to understand total throughput for a given time period. It seems funny how we may create conceptual, logical, and physical data models (hopefully) for our business clients, but we don't always apply that kind of thought to how we define terminology in our discussions.
Any system in which the time at which the output is produced is significant. This is usually because the input corresponds to some movement in the physical world, and the output has to relate to that same movement. The lag from input time to output time must be sufficiently small for acceptable timeliness.