I have been looking online at what a thread is and I do not feel like I understand it. Could someone shed some light on this? In terms of programming languages, relating it to C++ or Objective-C would be nice.
In Objective-C, I encountered
@property (nonatomic, strong) NSString *name;
The explanation for nonatomic was that it means not having to worry about multiple threads trying to access the object at the same time, so Objective-C does not have to synthesize thread-safe code. What exactly does that mean as well?
A process can consist of multiple threads of execution, which logically can be thought of as running simultaneously alongside each other. Each thread runs independently but shares the same memory and process state. A single thread can "do one thing": perform computation, interact with the network, update a UI, decode a video, etc. But a single thread cannot do all of these at once without a significant amount of extra work from the programmer. Having multiple threads in a process makes it easy for the programmer to have an application do multiple things at once (multitasking).
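For example, here is a rough C++ sketch (the same idea applies in any language; the task names are invented): spawning two threads lets two pieces of work proceed at the same time.

```cpp
#include <iostream>
#include <thread>

// Two hypothetical tasks; each could stand in for "decode a video", "talk to the network", etc.
void computeSomething() {
    long sum = 0;
    for (long i = 0; i < 1000000; ++i) sum += i;
    std::cout << "computation done: " << sum << "\n";
}

void doSomethingElse() {
    std::cout << "other work done\n";
}

int main() {
    std::thread worker(computeSomething);   // starts running right away
    std::thread helper(doSomethingElse);    // runs alongside the first thread
    worker.join();                          // wait for both to finish before exiting
    helper.join();
}
```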
Using multiple threads does introduce some new challenges, though. For example, if you have two threads that access the same variable, you can end up with a concurrency hazard in which the variable might not be completely updated by one thread before the other thread accesses it, leading to failures in the program. For properties, Objective-C generates thread-safe (atomic) accessor code by default to avoid this situation. nonatomic tells the compiler that you will never access the property from multiple threads simultaneously, so it can skip the thread-safe accessors and generate faster code. This can be useful if you are going to supply your own synchronization anyway (e.g. to keep a group of properties in sync, which atomic accessors alone cannot do for you).
If you violate the core nonatomic assumption and access a nonatomic variable from multiple threads at the same time, all hell will break loose.
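To make that concrete, here is a rough C++ analogue (this is not the actual accessor code Objective-C synthesizes, just an illustration of the same idea): two threads updating one shared variable, first without synchronization and then with a mutex standing in for the compiler-generated atomic guard.

```cpp
#include <iostream>
#include <mutex>
#include <thread>

long counter = 0;            // shared state, like a property both threads touch
std::mutex counterMutex;     // plays the role of the generated "atomic" guard

void incrementUnsafe() {     // "nonatomic": reads and writes can interleave
    for (int i = 0; i < 100000; ++i) ++counter;
}

void incrementSafe() {       // "atomic-ish": only one thread touches counter at a time
    for (int i = 0; i < 100000; ++i) {
        std::lock_guard<std::mutex> lock(counterMutex);
        ++counter;
    }
}

int main() {
    std::thread a(incrementUnsafe), b(incrementUnsafe);
    a.join(); b.join();
    std::cout << "unsafe total (often wrong): " << counter << "\n";

    counter = 0;
    std::thread c(incrementSafe), d(incrementSafe);
    c.join(); d.join();
    std::cout << "safe total (always 200000): " << counter << "\n";
}
```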
explanation for nonatomic was it means to not be worried about multiple threads trying to access the object at the same time, and objective-c does not have to synthesize thread safe code. So what does that exactly mean as well.
Imagine you are asked to write your name on a piece of paper. You're given a list of instructions someone thought would work just fine:
you find a line that's currently empty,
move your pen over it,
write your name.
All good.
Now imagine you're given a new piece of paper, but both you and someone else are asked to write your names on the same piece of paper, and you're given the old instructions, perhaps:
1) You both look at the paper and determine to write on the first line.
2) You both put pen to paper (maybe you can both do it comfortably enough - one left / one right handed).
3) You start to write an I but the other person is writing a J and it comes out looking like a U.
4) It gets worse from here....
But equally, it might be that you're paying more attention, and finish writing your name before they start looking for an empty line, or vice versa.
Threading is a lot like this... in the above example, each thread/person is keeping track of how they're progressing at the task, following their instructions very literally. Notice that if you complete only step 1, then the other person does step 1, you've already set yourselves up to write over each other's names regardless of the ordering or concurrency of the remaining steps.
In all this, you don't even have to be doing things at the same instant in time, it's just that the tracking of your tasks is independent - you're independent people with your own memory of where you are in your task. Same with threads - they're ways of tracking what to do independently, and it's optional whether they actually do things in your program at the same instant (which is possible with multi-core CPUs and multi-CPU systems).
"atomic" is used in the sense of indivisible (think: you can't cut an atom of gold in half and still have gold). Similarly, if you say write your name atomically, it means any observer is guaranteed to either witness the instant before - when no name is there - or the instant after - when your name is completely written - but they'll never see just half your name. An atomic update on a string variable is like that.
Atomic string updates don't solve the problem above... you might still clash in finding "an empty line" (in a computing context - say finding the next empty position in a container). If that process of finding an empty line is atomic, and the line is somehow marked "used" even before you've written anything on it yourself, then it means you'll never get the same line as someone else. At that stage, multiple people writing their names won't clash on the same line, but only when both the line finding and the name writing are atomic can people looking at the paper know that they're seeing completely written non-clashing names.
Making these kinds of guarantees is very useful but expensive. It means that threads must communicate and coordinate amongst themselves, agreeing "who" will go first, with others waiting as necessary.
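Here is a rough C++ sketch of the "claim a line atomically, then write on it" idea; the container and names are invented purely for illustration. An atomic counter hands out each index exactly once, so two writers can never pick the same slot:

```cpp
#include <array>
#include <atomic>
#include <iostream>
#include <string>
#include <thread>

std::array<std::string, 16> lines;        // the "piece of paper"
std::atomic<std::size_t> nextLine{0};     // the "find an empty line" step, made atomic

void writeName(const std::string& name) {
    std::size_t mine = nextLine.fetch_add(1);  // indivisibly claim a line; nobody else gets it
    lines[mine] = name;                        // now write without clashing
}

int main() {
    std::thread a(writeName, "Alice");
    std::thread b(writeName, "Bob");
    a.join(); b.join();
    std::cout << lines[0] << ", " << lines[1] << "\n";  // both names, on different lines
}
```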
I used to play with Visual Basic 5.0 as a kid. It allowed me to place lots of 'timers' on a 'form' that would seemingly run simultaneously... whereas now I'm starting to learn lower-level programming languages and everything seems to run one-thing-at-a-time.
Can someone help me grasp this concept of simultaneity, and why VB seemed to have it easily available, whereas in learning C++ so far I've not met anything that feels like it can replicate that simultaneous running of code?
Is most of the 'simultaneity' in simple Visual Basic programs actually an illusion that C++ code can easily recreate? Sorry for lacking the terminology.
edit: Thanks for the replies. They have clarified that it was indeed usually an illusion of simultaneity. To explain further what was in my mind: at my early stage in learning C++, I don't think I know how to write a program that, every 2 seconds, will increment the value of 'x'... while simultaneously, every 5 seconds, incrementing 'y'.
This question isn't appropriate for Stack Overflow, but I really like answering people's questions, so I will anyway, even though the answer will be necessarily incomplete.
What Visual Basic was doing was providing you with a special environment in which things appeared to be running simultaneously. All those timers were just entries in a data structure that told Visual Basic when it had to go act like that timer had just expired. It used the current time to decide when it should run them.
In general, Visual Basic is what controlled the flow of your program. In between all the statements you wrote, the Visual Basic runtime was doing all kinds of things behind the scenes for you. Things like checking up on the data structures it was keeping of which timer to fire when and so on.
There are two main ways of controlling the flow of your program in such a way as to make it seem like several things are happening at the same time. What Visual Basic was doing is called an 'event loop'. That's a main loop in your program that keeps track of all the things that need doing and goes and runs the code that does them at the right times. When someone clicks on a window, your program gets an event, the event loop sees that event and runs the 'clicked on a window' code when it gets control. It sort of seems like your program just responds instantly to the click, but that's not what's really happening.
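Here is a minimal sketch of such an event loop in C++, tied to the edit in the question (increment x every 2 seconds and y every 5 seconds); the timing values are just the ones the question asked for. Nothing runs simultaneously; one loop keeps checking which timer is due.

```cpp
#include <chrono>
#include <iostream>
#include <thread>

int main() {
    using clock = std::chrono::steady_clock;
    int x = 0, y = 0;
    auto nextX = clock::now() + std::chrono::seconds(2);
    auto nextY = clock::now() + std::chrono::seconds(5);

    while (true) {                       // the "event loop": one thread, many timers
        auto now = clock::now();
        if (now >= nextX) { ++x; nextX += std::chrono::seconds(2); std::cout << "x=" << x << "\n"; }
        if (now >= nextY) { ++y; nextY += std::chrono::seconds(5); std::cout << "y=" << y << "\n"; }
        std::this_thread::sleep_for(std::chrono::milliseconds(50));  // don't spin the CPU
    }
}
```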
The other way is called 'multi-threading'. In that way your program may really do several things at the same time. The operating system itself decides how to allocate CPU resources to running 'threads'. If there is a lot of competition for CPU resources, the operating system may start running one program (or thread) for a little while (a thousandth of a second or so) and then switch to running a different program (aka process (threads and processes are strongly related, but definitely distinct concepts)). The switching happens so fast, and can happen at any instant, so it seems like several things are happening at once. But, if there are enough CPU resources available (multiple cores) the computer can actually just run several things at the same time.
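And here is a rough sketch of the multi-threading approach to the same toy problem: each counter gets its own thread, and the operating system interleaves them (or truly runs them in parallel on multiple cores). The counters are atomics because the printing thread also reads them.

```cpp
#include <atomic>
#include <chrono>
#include <iostream>
#include <thread>

int main() {
    std::atomic<int> x{0}, y{0};

    std::thread tx([&] { while (true) { std::this_thread::sleep_for(std::chrono::seconds(2)); ++x; } });
    std::thread ty([&] { while (true) { std::this_thread::sleep_for(std::chrono::seconds(5)); ++y; } });

    while (true) {                       // main thread just reports progress
        std::this_thread::sleep_for(std::chrono::seconds(1));
        std::cout << "x=" << x.load() << " y=" << y.load() << "\n";
    }
    // tx/ty never finish in this toy example; a real program would signal them to stop and join()
}
```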
There are advantages and disadvantages to each approach. Actually dealing with two things modifying the same data at the exact same time and making sure everything stays consistent can be very hard, and is usually accomplished by having sections of your program in which only one thread is allowed to be running at a time, so that those modifications can't actually happen in the same instant and are 'serialized', with one thread's modification happening before another's. These are usually called 'critical sections', and access to these critical sections is typically controlled by something called a 'mutex' (aka mutual exclusion) lock.
Event driven code doesn't have that problem since typically one event must be fully processed before the code to process the next event can start. But this can lead to under-utilization of CPU resources, and it can also sometimes introduce noticeable delays in handling events as one event can't be processed until the code to process the preceding event is finished.
In short, Visual Basic was making event driven programming really easy and clean to use, and providing you with the illusion that several things were running at the same time.
These, in general, are fairly deep computer science topics, and people get their PhDs studying some of this stuff. But a working understanding of exactly how event driven code and how multi-threading code works isn't that hard.
There are also other models of concurrency out there, and hybrid models like 'co-routines' that look like threads but are really events.
I put all the most useful concept handles in quotes to emphasize them.
My OS textbook says the following in a chapter discussing concurrency:
Concurrent processes come into conflict with each other when they are competing for the use of the same resource. In its pure form, we can describe the situation as follows. Two or more processes need to access a resource during the course of their execution. Each process is unaware of the existence of other processes, and each is to be unaffected by the execution of the other processes. It follows from this that each process should leave the state of any resource that it uses unaffected.
My question specifically concerns the last sentence:
It follows from this that each process should leave the state of any resource that it uses unaffected.
This does not make sense to me. If a process is using a resource, then it must necessarily affect the state of that resource. This seems obvious, but it sounds like the sentence disagrees?
I would greatly appreciate it if the members of this site could please take the time to clarify this.
It is not clear to me in what context this was said, as you quoted only a small portion and didn't mention which book it comes from. However, I can take a shot in the dark and assume that what they meant is: a process using resource X should, once done using it, leave it unaffected. That is, if process Y decides to use a logical resource, e.g. a file, it should not write to or change the file, as this might affect process Z, which needs to use the file with its original data.
When it comes to physical resources, the statement above makes no sense... unless you provide the full quote.
From my studies I know the concepts of starvation, deadlock, fairness and other concurrency issues. However, theory differs from practice, to an extent, and real engineering tasks often involve greater detail than academic blah blah...
As a C++ developer I've been concerned about threading issues for a while...
Suppose you have a shared variable x which refers to some larger portion of the program's memory. The variable is shared between two threads A and B.
Now, if we consider read/write operations on x from both A and B threads, possibly at the same time, there is a need to synchronize those operations, right? So the access to x needs some form of synchronization which can be achieved for example by using mutexes.
Now let's consider another scenario where x is initially written by thread A, then passed to thread B (somehow), and that thread only reads x. Thread B then produces a response to x called y and passes it back to thread A (again, somehow). My question is: what synchronization primitives should I use to make this scenario thread-safe? I've read about atomics and, more importantly, memory fences - are these the tools I should rely on?
This is not a typical scenario in which there is a "critical section". Instead some data is passed between threads with no possibility of concurrent writes to the same memory location. So, after being written, the data should first be "flushed" somehow, so that the other threads can see it in a valid and consistent state before reading. What is this called in the literature? Is it "visibility"?
What about pthread_once and its Boost/std counterpart, i.e. call_once? Does it help if both x and y are passed between threads through a sort of "message queue" which is accessed by means of the "once" functionality? AFAIK it serves as a sort of memory fence, but I couldn't find any confirmation of this.
What about CPU caches and their coherency? What should I know about that from the engineering point of view? Does such knowledge help in the scenario mentioned above, or any other scenario commonly encountered in C++ development?
I know I might be mixing a lot of topics, but I'd like to better understand what the common engineering practice is so that I can reuse the already known patterns.
This question is primarily related to the situation in C++03, as this is my daily environment at work. Since my project mainly involves Linux, I may only use pthreads and Boost, including Boost.Atomic. But I'm also interested in whether anything concerning such matters has changed with the advent of C++11.
I know the question is abstract and not that precise but any input could be useful.
you have a shared variable x
That's where you've gone wrong. Threading is MUCH easier if you hand off ownership of work items using some sort of threadsafe consumer-producer queue, and from the perspective of the rest of the program, including all the business logic, nothing is shared.
Message passing also helps prevent cache collisions (because there is no true sharing -- except of the producer-consumer queue itself, and that has a trivial effect on performance if the unit of work is large -- and organizing the data into messages helps reduce false sharing).
Parallelism scales best when you separate the problem into subproblems. Small subproblems are also much easier to reason about.
You seem to already be thinking along these lines, but no, threading primitives like atomics, mutexes, and fences are not very good for applications using message passing. Find a real queue implementation (queue, circular ring, Disruptor, they go under different names but all meet the same need). The primitives will be used inside the queue implementation, but never by application code.
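As a rough illustration of the shape such a queue takes (a minimal C++11-style sketch with invented names, not a production implementation; in a C++03/Boost environment the same structure works with boost::mutex and boost::condition_variable):

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>

template <typename T>
class WorkQueue {                       // hypothetical minimal queue; not production-grade
public:
    void push(T item) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push(std::move(item));
        }
        ready_.notify_one();            // wake one waiting consumer
    }

    T pop() {                           // blocks until an item is available
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !items_.empty(); });
        T item = std::move(items_.front());
        items_.pop();
        return item;
    }

private:
    std::queue<T> items_;
    std::mutex mutex_;
    std::condition_variable ready_;
};
```

The producer thread calls push() with a whole work item, the consumer thread calls pop(), and neither ever touches the other's data directly; all the synchronization lives inside the queue.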
I am trying to use the Google perf tools (gperftools) CPU profiler for debugging performance issues on a multi-threaded program. With a single thread it takes 250 ms, while 4 threads take around 900 ms.
My program has an mmap'ed file which is shared across threads, and all operations are read only. Also my program creates a large number of objects which are not shared across threads. (Specifically my program uses the CRF++ library to do some querying.) I am trying to figure out how to make my program perform better with multiple threads. The call graph produced by the gperftools CPU profiler shows that my program spends a lot of time (around 50%) in _L_unlock_16.
Searching the web for _L_unlock_16 pointed to some bug reports with Canonical suggesting that it's associated with libpthread. But other than that I was not able to find any useful information for debugging.
A brief description of what my program does. I have a few words in a file (4). In my program I have a processWord() which processes a single word using CRF++. This processWord() is what each thread executes. My main() reads words from the file and each thread runs processWord() in parallel. If I process a single word (hence only 1 thread) it takes 250 ms, so if I process all 4 words (and hence 4 threads) I expected it to finish in about the same 250 ms; however, as I mentioned above, it's taking around 900 ms.
This is the callgraph of the execution - https://www.dropbox.com/s/o1mkh477i7e9s4m/cgout_n2.png
I want to understand why my program is spending a lot of time at _L_unlock_16 and what I can do to mitigate it.
Yet again, _L_unlock_16 is not a function of your code. Have you looked at the stack traces above that function? What are its callers when the program waits? You've said that the program wastes 50% of its time waiting inside. But which part of the program ordered that operation? Is it again from memory alloc/dealloc ops?
The function seems to come from libpthread. Does CRF++ handle threads/libpthread in any way? If yes, then maybe the library is ill-configured? Or maybe it implements some 'basic thread safety' by adding locks everywhere and simply is not built well for multithreading? What do the docs say about that?
Personally, I'd guess that it ignores threads and that you have added all the threading. I may be wrong, but if that's true, then CRF++ probably will not call that 'unlock' function at all, and the 'unlock' is somehow called from your code that orchestrates the threads/locks/queues/messages etc. Halt the program a few times and look at who called the unlock. If it really spends 50% sitting in the unlock, you will very quickly know who causes the lock to be used and you will be able to either eliminate it or at least perform a more refined investigation.
EDIT #1:
Eh.. when I said "stacktrace" I meant stacktrace, not callgraph. A callgraph may look nice in trivial cases, but in more complex ones it will be mangled and unreadable and will hide the precious details in a "compacted" form. But, fortunately, here the case looks simple enough.
Please observe the beginning: "Process word, 99x". I assume that the "99x" is the call count. Then, look at "tagger-parse": 97x. From that:
61x into rebuildFeatures from which 41x goes right into unlock and 20(13) indirectly into it
23x goes to buildLattice, from which 21x goes into unlock
I'd guess that CRF++ uses locking quite heavily. For me, it seems that you are simply observing the effects of CRF++'s internal locking. It certainly is not lockless internally.
It seems to lock at least once per "processWord". It's hard to say without looking at the code (is it open source? I've not checked..); from stack traces it would be more obvious, but IF it really locks once per "processWord", then it could even be a sort of "global lock" that protects "everything" from "all threads" and causes all jobs to serialize. Whatever. Anyway, clearly, it's CRF++'s internals that lock and wait.
If your CRF objects are really (really) not shared across threads, then remove the threading configuration flags from CRF, pray that they were sane enough not to use any static variables or global objects, add some locking of your own (if needed) at the topmost job/result level, and retry. It should now be much faster.
If the CRF objects are shared, unshare them and see above.
But if they are shared behind the scenes, then there's little you can do. Change your library to one that has better threading support, OR fix the library, OR ignore it and use it with the current performance.
The last advice may sound strange (it works slowly, right? so why ignore it?), but in fact it is the most important one, and you should try it first. If the parallel tasks have a similar "data profile", then it is very probable that they will try to hit the same locks at approximately the same moment in time. Imagine a medium-sized cache that holds words sorted by their first letter. At the top level there's an array of, say, 26 entries. Each entry has a lock and a list of words inside. If you run 100 threads that will each first check "mom", then "dad", then "son" - then all of those 100 threads will first hit and wait for each other at "M", then at "D", then at "S". Well, approximately/probably of course. But you get the idea. If the data profile were more random, they'd block each other far less. Mind that processing ONE word is a small task and that you are trying to process the same word(s). Even if the internal CRF++ locking is smart, it is just bound to hit the same areas. Try again with more dispersed data.
Add to that the fact that threading costs. If something is guarded against races with locks, then every lock/unlock costs, because at the very least a thread has to "halt and check if the lock is open" (sorry for the very imprecise wording). If the data to process is small in relation to the amount of lock checks, then adding more threads will not help and will just waste time. For checking one word, it may even happen that handling a single lock takes longer than processing the word! But if the amount of data to be processed were larger, then the cost of flipping a lock compared to processing the data might start being negligible.
Prepare a set of 100 or more words. Run and measure it on one thread. Then partition the words at random and run it on 2 and 4 threads. And measure. If not better, try at 1000 and 10000 words. The more the better, of course, keeping in mind that the test should not last till your next birthday ;)
If you notice that 10k words split over 4 threads (2500 words per thread) works about 40%, 30%, or even just 25% faster than on one thread - there you go! You simply gave it too small a job. It was tailored and optimized for bigger ones!
But, on the other hand, it may happen that 10k words split over 4 threads does not work faster, or worse, works slower - then it might indicate that the library handles multithreading very badly. Now try the other things, like stripping the threading from it or repairing it.
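Here is a rough sketch of that measurement in C++ (processWord below is just a stand-in for the CRF++-based function from the question, and the word list is made up): split the same batch of words over 1, 2, and 4 threads and compare wall-clock times.

```cpp
#include <chrono>
#include <iostream>
#include <string>
#include <thread>
#include <vector>

// Placeholder standing in for the real CRF++-based processWord() from the question.
void processWord(const std::string& word) {
    volatile std::size_t dummy = 0;
    for (std::size_t i = 0; i < 100000; ++i) dummy += word.size();
}

void runWithThreads(const std::vector<std::string>& words, unsigned threadCount) {
    auto start = std::chrono::steady_clock::now();
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < threadCount; ++t) {
        workers.emplace_back([&words, t, threadCount] {
            // each worker takes every threadCount-th word: a cheap way to spread the load
            for (std::size_t i = t; i < words.size(); i += threadCount)
                processWord(words[i]);
        });
    }
    for (auto& w : workers) w.join();
    auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(
                  std::chrono::steady_clock::now() - start).count();
    std::cout << threadCount << " thread(s): " << ms << " ms\n";
}

int main() {
    std::vector<std::string> words(10000, "example");   // a batch big enough to be worth threading
    runWithThreads(words, 1);
    runWithThreads(words, 2);
    runWithThreads(words, 4);
}
```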
In this article, What's all this fuss about Erlang?, it is said that:
" The world IS concurrent. It IS parallel. Things happen all over the
place at the same time. I could not drive my car on the highway if I
did not intuitively understand the notion of concurrency; pure
message-passing concurrency is what we do all the time."
I don't get it. I don't think this is correct: when I drive my car into a gas station, I wait for the person before me to finish filling his gas tank, as he is using (locking) the gas stand. Does anyone think what I'm saying is incorrect?
The article never says "The world does not need locks." The article says, "In Erlang, given that there is no shared state, Erlang programs have no need for locks." Locks are one way of achieving concurrency by sharing mutable state. Erlang achieves concurrency by passing messages instead of sharing state.
A gas stand is just a place to get gas. How people decide to make sure only one person is using it at a time is a separate matter. In a shared state language, you might have one gas stand instance that you lock when you want to use it. In a message passing language, you could send a message to the gas stand process "Is someone using you?" and the gas stand will reply yes or no. You can achieve the same basic goal either way.
You might be wondering, "That sounds like a lock to me!" The important distinction is that there is exactly one process responsible for each piece of state in Erlang, whereas any number of threads can influence one piece of state with mutable locked data. If the gas stand state gets corrupted with locking semantics, you don't know which thread broke it. In Erlang, you can see every message that comes into the process responsible for that data, and see which messages are damaging it. It might sound like a useless distinction, but believe me, it makes concurrency much easier to deal with.
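Here is a rough C++ approximation of that "exactly one process owns the state" idea (Erlang gives you this natively; here it is faked with one owner thread and a mailbox, and every name is invented): the other threads only send messages, and only the gas-stand thread ever touches the gas stand's state.

```cpp
#include <condition_variable>
#include <iostream>
#include <mutex>
#include <queue>
#include <string>
#include <thread>

std::queue<std::string> mailbox;        // messages sent to the gas-stand "process"
std::mutex mailboxMutex;
std::condition_variable mailboxSignal;

void send(const std::string& msg) {     // what the "car" threads do: send a message, nothing more
    std::lock_guard<std::mutex> lock(mailboxMutex);
    mailbox.push(msg);
    mailboxSignal.notify_one();
}

void gasStandProcess() {                // the only code that ever touches the gas stand's state
    int litresDispensed = 0;            // owned state: no other thread reads or writes it
    for (int handled = 0; handled < 3; ++handled) {
        std::unique_lock<std::mutex> lock(mailboxMutex);
        mailboxSignal.wait(lock, [] { return !mailbox.empty(); });
        std::string msg = mailbox.front();
        mailbox.pop();
        lock.unlock();
        litresDispensed += 40;          // handle one message at a time, in order
        std::cout << msg << " -> total " << litresDispensed << " litres\n";
    }
}

int main() {
    std::thread stand(gasStandProcess);
    std::thread carA(send, "fill car A");
    std::thread carB(send, "fill car B");
    std::thread carC(send, "fill car C");
    carA.join(); carB.join(); carC.join();
    stand.join();
}
```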
The simplest answer is that it's just an analogy. That particular paragraph is really about why concurrency matters, and why it's not as unintuitive as one might at first think coming from a procedural programming world (see? I said 'world', but I really meant something like 'background' or 'context'; it's easy to mix metaphors).
Anyway, I wouldn't read too far into that statement, I don't think it's meant to imply (nor does it explicitly say) that the world itself is lockless, just that the world is concurrent. That's where the analogy starts to veer left; just like you and I do not share state (which is mentioned), we also are not immutable. You can change your opinion, change your shirt, etc. without forking and creating a new entity with a new shirt. As mentioned elsewhere in the article, Erlang gets around some problems inherent in maintaining concurrent state by making everything immutable. We solve things by politely waiting for the guy in front of us at the gas station.
Gases etc are fluids only if you don't examine them too closely. You need a large sample if you want to describe things with continuous functions. If you look too closely the fluid approximation breaks down. Going the other way, if you zoom out far enough, you can treat (eg) grain as a fluid.
The fact that these things are in fact made up of indivisible units with quantised, deterministic behaviour does not stop fluid dynamics equations from describing them accurately on a macro level.
Are they fluids, or not? The answer is "Yes, they are fluids. Or not, depending."
There are conditions under which a model applies, and other conditions under which it does not. Failure to apprehend this leads to belief in silver bullets, and dashed hopes.