Go is a concurrent language. What does this mean?
Doesn't this mean that it is an alternative to C/C++/Java?
A concurrent language is a language that has language constructs for concurrency.
Go is a concurrent language because it has "goroutines".
Concurrency
Go provides goroutines, small lightweight threads; the name alludes to coroutines. Goroutines are created with the go statement from anonymous or named functions.
Goroutines are executed in parallel with other goroutines, including their caller. They do not necessarily run in separate threads, but a group of goroutines are multiplexed onto multiple threads — execution control is moved between them by blocking them when sending or receiving messages over channels.
It means that it is a language with features suitable for concurrent (parallel, multithreaded, etc.) programming. It has special language constructs to support this type of programming. Concurrent programming can be done in other languages (C/C++, Java, etc.) but it will (arguably) be harder and will probably result in more errors in the programs.
Here are some resources about concurrent programming from some of the principal authors of the Go programming language.
Introduction to Concurrent Programming
Resources about threaded programming in the Bell Labs CSP style
Related
Josuttis states ["Standard Library", 2nd ed, pg 1003]:
Futures allow you to block until data by another thread is provided or another thread is done. However, a future can pass data from one thread to another only once. In fact, a future's major purpose is to deal with return values or exceptions of threads.
On the other hand, a shared_future<void> can be used by multiple threads, to identify when another thread has done its job.
Also, in general, high-level concurrency features (such as futures) should be preferred to low-level ones (such as condition_variables).
Therefore, I'd like to ask: Is there any situation (requiring synchronization of multiple threads) in which a shared_future<void> won't suffice and a condition_variable is essential?
As already pointed out in the comments by @T.C. and @hlt, the use of futures/shared_futures is mostly limited in the sense that they can only be used once. So for every communication task you have to have a new future. The pros and cons are nicely explained by Scott Meyers in:
Item 39: Consider void futures for one-shot event communication.
Scott Meyers: Effective Modern C++ (emphasis mine)
His conclusion is that using promise/future pairs dodges many of the problems with the use of condition_variables, providing a nicer way of communicating one-shot events. The price to pay is that you are using dynamically allocated memory for the shared states and, more importantly, that you have to have one promise/future pair for every event that you want to communicate.
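A minimal sketch of the one-shot pattern described above, assuming a C++11 compiler. The function name run_workers_after_signal and its parameter are hypothetical, chosen only for illustration. Several threads block on copies of one shared_future<void>, and a single set_value() releases them all:

```cpp
#include <cassert>
#include <future>
#include <thread>
#include <vector>

// Starts `worker_count` threads that all block on the same one-shot event,
// fires the event once, and returns how many workers observed it.
// (Hypothetical helper, not taken from the book or the question.)
int run_workers_after_signal(int worker_count) {
    std::promise<void> ready;  // sending side of the event
    std::shared_future<void> signal = ready.get_future().share();

    std::vector<int> woke(worker_count, 0);  // one slot per worker, never shared
    std::vector<std::thread> workers;
    for (int i = 0; i < worker_count; ++i) {
        // Each worker captures its own copy of the shared_future.
        workers.emplace_back([signal, &woke, i] {
            signal.wait();  // blocks until set_value() is called
            woke[i] = 1;
        });
    }

    ready.set_value();  // fire the event; this can be done only once

    for (auto& t : workers) t.join();
    int count = 0;
    for (int w : woke) count += w;
    return count;
}
```

Calling run_workers_after_signal(4) should return 4. Signalling a second event would need a fresh promise/future pair, which is the cost mentioned above.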
While the notion of using high-level abstractions instead of low-level ones is laudable, there is a misconception here. std::future is not a high-level replacement for std::condition_variable. Instead, it is a specific high-level construct built for a specific use case of std::condition_variable, namely a one-time return of a value.
Obviously, not all uses of condition variables fit this scenario. For example, a message queue cannot be implemented with std::future, no matter how hard you try. Such a queue is another high-level construct built on low-level building blocks. So yes, shoot for high-level constructs, but do not expect a one-to-one mapping between high and low level.
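To make the message queue case concrete, here is a minimal sketch of a repeated-use queue built on std::mutex and std::condition_variable, assuming C++11. The names MessageQueue and sum_through_queue are hypothetical. Unlike a future, the same condition variable is waited on and notified once per message:

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Minimal unbounded message queue (hypothetical, for illustration only).
template <typename T>
class MessageQueue {
public:
    void push(T value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push(std::move(value));
        }
        not_empty_.notify_one();  // wake one waiting consumer
    }

    // Blocks until a message is available, then removes and returns it.
    T pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        // The predicate form of wait() guards against spurious wakeups.
        not_empty_.wait(lock, [this] { return !items_.empty(); });
        T value = std::move(items_.front());
        items_.pop();
        return value;
    }

private:
    std::mutex mutex_;
    std::condition_variable not_empty_;
    std::queue<T> items_;
};

// Sends 1..n from a producer thread and sums them on the calling thread.
int sum_through_queue(int n) {
    MessageQueue<int> queue;
    std::thread producer([&queue, n] {
        for (int i = 1; i <= n; ++i) queue.push(i);
    });
    int sum = 0;
    for (int i = 0; i < n; ++i) sum += queue.pop();
    producer.join();
    return sum;
}
```

With this sketch, sum_through_queue(100) should return 5050 regardless of how the producer and consumer are interleaved.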
In standard C++, can I group dependent threads together so that the scheduler avoids spreading them across more than one CCX on the Zen architecture?
Because inter-CCX communication is slow on the Zen architecture, I would like to group threads as described. Is there any ISO C++ mechanism for keeping threads that depend on one another in a group, so that the scheduler avoids placing them on different CCXs?
In other words, how can I explicitly tell the scheduler the level of cohesion between threads in portable code?
I have asked a related question before: Why OCaml's threading is considered as `not enough`?
No matter how "bad" ocaml's threading is, I notice some libraries say they can do real threading.
For example, Lwt
Lwt offers a new alternative. It provides very light-weight cooperative threads; "launching" a thread is a very fast operation, it does not require a new stack, a new process, or anything else. Moreover context switches are very fast. In fact, it is so easy that we will launch a thread for every system call. And composing cooperative threads will allow us to write highly asynchronous programs.
Also Jane Street's async_core provides similar things, if I am right.
But I am quite confused. Do Lwt or async_core provide threading like Java threading?
If I use them, can I utilise multiple CPUs?
In what way can I get "real threading" (just like in Java) in OCaml?
Edit
I am still confused.
Let me add a scenario:
I have a server (16 cpu cores) and a server application.
What the server application does are:
It listens to requests
For each request, it starts a computational task (let's say costs 2 minutes to finish)
When each task finishes, the task will either return the result back to the main or just send the result back to client directly
In Java, it is very easy. I create a thread pool, then for each request, I create a thread in that pool. That thread will run the computational task. This is mature in Java and it can utilize the 16 CPU cores. Am I right?
So my question is: can I do the same thing in OCaml?
The example of a parallelized server that you cite is one of those embarrassingly parallel problems that are well solved with a simple multiprocessing model, using fork. This has been doable in OCaml for decades, and yes, you will get an almost linear speedup using all the cores of your machine if you need it.
To do that using the simple primitives of the standard library, see this Chapter of the online book "Unix system programming in OCaml" (first released in 2003), and/or this chapter of the online book "Developing Applications with OCaml" (first released in 2000).
You may also want to use higher-level libraries such as Gerd Stolpmann's OCamlnet library mentioned by rafix, which provides a lot of stuff from direct helper for the usual client/server design, to lower-level multiprocess communication libraries; see the documentation.
The library Parmap is also interesting, but maybe for a slightly different use case (it is more for when you have a large array of data available all at the same time that you want to process with the same function in parallel): a drop-in replacement for Array.map or List.map (or fold) that parallelizes computations.
The closest thing you will find to real (preemptive) threading is the built-in threading library. By that I mean that your programming model will be the same but with 2 important differences:
OCaml's native threads are not lightweight like Java's.
Only a single thread executes at a time, so you cannot take advantage of multiple processors.
This makes OCaml's threads a pretty bad solution to either concurrency or parallelism so in general people avoid using them. But they still do have their uses.
Lwt and Async are very similar and provide you with a different flavour of threading: a cooperative style. Cooperative threads differ from preemptive ones in that context switching between threads is explicit in the code and blocking calls are always apparent from the type signature. The cooperative threads provided are very cheap, so they are very well suited for concurrency, but again they will not help you with parallelism (due to the limitations of OCaml's runtime).
See this for a good introduction to cooperative threading: http://janestreet.github.io/guide-async.html
EDIT: for your particular scenario I would use Parmap, if the tasks are so computationally intensive as in your example then the overhead of starting the processes from parmap should be negligible.
I heard that there are 3 kinds of concurrency.
Deterministic concurrency
Message-passing concurrency
Shared-state concurrency
I know #2 (=actor model) and #3 (=general threading), but not #1. What's that?
Deterministic concurrency is a concurrent programming model such that programs written in this model have the following property: for a given set of inputs, the output values of a program are the same for any execution schedule. This means that the outputs of the program depend solely on the inputs of the program.
There are ways to ensure this property. One of the ways is the so-called single-assignment programming where variables don't have to be initialized, but may be assigned at most once. Reading an uninitialized variable stalls until it's assigned a value (possibly by some other thread). The Mozart programming language has support for these.
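As a rough illustration in C++ rather than Mozart, a single-assignment variable can be approximated with std::promise and std::shared_future, assuming C++11. The names SingleAssignment and dataflow_sum are hypothetical. A read stalls until the variable is bound, and a second assignment is rejected with std::future_error:

```cpp
#include <cassert>
#include <future>
#include <thread>
#include <utility>

// A single-assignment ("dataflow") variable sketched on top of
// std::promise / std::shared_future. Hypothetical class, not a Mozart API.
template <typename T>
class SingleAssignment {
public:
    SingleAssignment() : value_(promise_.get_future().share()) {}

    // Binds the value; a second call throws std::future_error.
    void set(T v) { promise_.set_value(std::move(v)); }

    // Blocks until some thread has called set(), then returns the value.
    T get() const { return value_.get(); }

private:
    std::promise<T> promise_;      // declared first: initialized before value_
    std::shared_future<T> value_;
};

// Computes z = x + y with three threads. The result is the same for every
// execution schedule, because each read waits for its variable to be bound.
int dataflow_sum(int a, int b) {
    SingleAssignment<int> x, y, z;
    std::thread adder([&] { z.set(x.get() + y.get()); });  // stalls until x and y are bound
    std::thread set_y([&] { y.set(b); });
    std::thread set_x([&] { x.set(a); });
    int result = z.get();
    adder.join();
    set_y.join();
    set_x.join();
    return result;
}
```

Here dataflow_sum(2, 3) should return 5 no matter which of the three threads runs first, which is the property described above.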
Another way is to use ownership analysis to determine which threads 'own' different references, and to ensure that no 2 threads write to the reference at the same 'time', so there are no data races.
I haven't heard the term before, but coroutines come to mind. They don't provide "true" concurrency, in the sense that only one routine is executing at any particular moment, but they're concurrent in the sense that a group of interacting coroutines can all make progress without having to wait for each other to finish.
I have never come across multithreading but I hear about it everywhere. What should I know about it and when should I use it? I code mainly in c++.
Mostly, you will need to learn about MT libraries on OS on which your application needs to run. Until and unless C++0x becomes a reality (which is a long way as it looks now), there is no support from the language proper or the standard library for threads. I suggest you take a look at the POSIX standard pthreads library for *nix and Windows threads to get started.
This is my opinion, but the biggest issue with multithreading is that it is difficult. I don't mean that from an experienced programmer's point of view, I mean it conceptually. There really are a lot of difficult concurrency problems that appear once you dive into parallel programming. This is well known, and there are many approaches taken to make concurrency easier for the application developer. Functional languages have become a lot more popular because of their lack of side effects and idempotency. Some vendors choose to hide the concurrency behind APIs (like Apple's Core Animation).
Multithreaded programs can see some huge gains in performance (both in user perception and actual amount of work done), but you do have to spend time to understand the interactions that your code and data structures make.
MSDN Multithreading for Rookies article is probably worth reading. Being from Microsoft, it's written in terms of what Microsoft OSes support(ed in 1993), but most of the basic ideas apply equally to other systems, with suitable renaming of functions and such.
That is a huge subject.
A few points...
With multi-core, the importance of multi-threading is now huge. If you aren't multithreading, you aren't getting the full performance capability of the machine.
Multi-threading is hard. Communication and synchronization between threads are tricky to get right. Problems are often intermittent, hard to diagnose, and if the design isn't right for multi-threading, hard to fix.
Multi-threading is currently mostly non-portable and platform specific.
There are portable libraries with wrappers around threading APIs. Boost is one. wxWidgets (mainly a GUI library) is another. It can be done reasonably portably, but you won't have all the options you get from platform-specific APIs.
I've got an introduction to multithreading that you might find useful.
In this article there isn't a single line of code and it's not aimed at teaching the intricacies of multithreaded programming in any given programming language but to give a short introduction, focusing primarily on how and especially why and when multithreaded programming would be useful.
Here's a link to a good tutorial on POSIX threads programming (with diagrams) to get you started. While this tutorial is pthread specific, many of the concepts transfer to other systems.
To understand more about when to use threads, it helps to have a basic understanding of parallel programming. Here's a link to a tutorial on the very basics of parallel computing intended for those who are just becoming acquainted with the subject.
The other replies covered the how part, I'll briefly mention when to use multithreading.
The main alternative to multithreading is using a timer. Consider for example that you need to update a little label on your form with the existence of a file. If the file exists, you need to draw a special icon or something. Now if you use a timer with a low timeout, you can achieve basically the same thing, a function that polls if the file exists very frequently and updates your ui. No extra hassle.
But your function is doing a lot of unnecessary work, isn't it? The OS provides a "hey this file has been created" primitive that puts your thread to sleep until your file is ready. Obviously you can't use this from the ui thread or your entire application would freeze, so instead you spawn a new thread and set it to wait on the file creation event.
Now your application is using as little cpu as possible because of the fact that threads can wait on events (be it with mutexes or events). Say your file is ready however. You can't update your ui from different threads because all hell would break loose if 2 threads try to change the same bit of memory at the same time. In fact this is so bad that Windows flat out rejects your attempts to do it at all.
So now you need a synchronization mechanism of sorts to communicate with the ui one after the other (serially) so you don't step on each other's toes, but you can't code the main thread part because the ui loop is hidden deep inside Windows.
The other alternative is to use another way to communicate between threads. In this case, you might use PostMessage to post a message to the main ui loop that the file has been found and to do its job.
Now if your work can't be waited upon and can't be split nicely into little bits (for use in a short-timeout timer), all you have left is another thread and all the synchronization issues that arise from it.
It might be worth it. Or it might bite you in the ass after days and days, potentially weeks, of debugging the odd race condition you missed. It might pay off to spend a long time first to try to split it up into little bits for use with a timer. Even if you can't, the few cases where you can will outweigh the time cost.
You should know that it's hard. Some people think it's impossibly hard, that there's no practical way to verify that a program is thread safe. Dr. Hipp, author of SQLite, states that threads are evil. This article covers the problems with threads in detail.
The Chrome browser uses processes instead of threads, and tools like Stackless Python avoid hardware-supported threads in favor of interpreter-supported "micro-threads". Even things like web servers, where you'd think threading would be a perfect fit, are moving towards event driven architectures.
I myself wouldn't say it's impossible: many people have tried and succeeded. But there's no doubt writing production quality multi-threaded code is really hard. Successful multi-threaded applications tend to use only a few, predetermined threads with just a few carefully analyzed points of communication. For example a game with just two threads, physics and rendering, or a GUI app with a UI thread and background thread, and nothing else. A program that's spawning and joining threads throughout the code base will certainly have many impossible-to-find intermittent bugs.
It's particularly hard in C++, for two reasons:
The current version of the standard doesn't mention threads at all. All threading libraries are platform and implementation specific.
The scope of what's considered an atomic operation is rather narrow compared to a language like Java.
Cross-platform libraries like Boost threads mitigate this somewhat. The future C++0x will introduce some threading support. But Boost also has good interprocess communication support you could use to avoid threads altogether.
If you know nothing else about threading than that it's hard and should be treated with respect, then you know more than 99% of programmers.
If after all that, you're still interested in starting down the long hard road towards being able to write a multi-threaded C++ program that won't segfault at random, then I recommend starting with Boost threads. They're well documented, high level, and work cross platform. The concepts (mutexes, locks, futures) are the same few key concepts present in all threading libraries.
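As a first taste of the mutex and lock concepts, here is a minimal sketch of a shared counter protected by a mutex. It assumes a C++11 compiler and uses std::thread, std::mutex and std::lock_guard. Boost.Thread provides the similarly named boost::thread, boost::mutex and boost::lock_guard. The function name count_with_mutex is hypothetical:

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Increments a shared counter from several threads. The mutex serializes
// each read-modify-write so that no increment is lost.
// (Hypothetical helper, for illustration only.)
long count_with_mutex(int thread_count, int increments_per_thread) {
    long counter = 0;
    std::mutex counter_mutex;

    std::vector<std::thread> threads;
    for (int t = 0; t < thread_count; ++t) {
        threads.emplace_back([&counter, &counter_mutex, increments_per_thread] {
            for (int i = 0; i < increments_per_thread; ++i) {
                // lock_guard locks here and unlocks when it goes out of scope.
                std::lock_guard<std::mutex> lock(counter_mutex);
                ++counter;
            }
        });
    }
    for (auto& th : threads) th.join();
    return counter;
}
```

count_with_mutex(4, 10000) should return 40000. Without the lock_guard line the increments would race, and the result would be undefined and typically too small.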