Using multi core support in C / C++


I have seen in some posts that, to use multiple processor cores, one should use a multi-threading library such as Boost.Thread. Threads are usually not visible to the operating system, so how can we be sure that multi-threading will actually use multiple cores? Is there a difference between Java threads and Boost threads?

The operating system is also called a "supervisor" because it has access to everything. Since it is responsible for managing preemptive threads, it knows exactly how many a process has, and can inspect what they are doing at any time.
Java may add a layer of indirection (green threads) to make many threads look like one, depending on JVM and configuration. Boost does not do this, but instead only wraps the POSIX interface which usually communicates directly with the OS kernel.
Massively multithreaded applications may benefit from coalescing threads, so that the number of ready-to-run threads matches the number of logical CPU cores. Reducing everything to one thread may be going too far, though :v) and @Voo says that green threads are only a legacy technology. A good JVM should support true multithreading; check your configuration options. On the C++ side, there are libraries like Intel TBB and Apple GCD to help manage parallelism.
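
To illustrate the point about matching the number of runnable threads to the number of logical cores, here is a minimal sketch using only standard C++11. The function name parallel_sum is made up for this example. On mainstream implementations each std::thread maps to a kernel thread, so the OS scheduler is free to spread the workers over several cores, but nothing in the code forces that.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Sum `data` by giving each worker thread one contiguous chunk.
// Passing num_threads == 0 means "use one thread per logical core".
long long parallel_sum(const std::vector<int>& data, unsigned num_threads) {
    if (num_threads == 0) {
        // hardware_concurrency() may return 0 when the value is unknown.
        num_threads = std::max(1u, std::thread::hardware_concurrency());
    }
    std::vector<long long> partial(num_threads, 0);
    std::vector<std::thread> workers;
    const std::size_t chunk = (data.size() + num_threads - 1) / num_threads;

    for (unsigned i = 0; i < num_threads; ++i) {
        const std::size_t begin = std::min(data.size(), i * chunk);
        const std::size_t end = std::min(data.size(), begin + chunk);
        // Each worker writes only to its own slot, so no locking is needed.
        workers.emplace_back([&data, &partial, i, begin, end] {
            partial[i] = std::accumulate(data.begin() + begin,
                                         data.begin() + end, 0LL);
        });
    }
    for (auto& w : workers) w.join();
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

Calling parallel_sum(values, 0) uses one worker per logical core as reported by the standard library; whether the workers really run in parallel is still up to the OS.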

Related

Boost thread, Posix thread and STD thread, why do they offer different performances?

As far as I know,
In computer science, a thread of execution is the smallest sequence of
programmed instructions that can be managed independently by a
scheduler, which is typically a part of the operating system. The
implementation of threads and processes differs between operating
systems, but in most cases a thread is a component of a process.
Multiple threads can exist within one process, executing concurrently
and sharing resources such as memory, while different processes do not
share these resources. In particular, the threads of a process share
its executable code and the values of its variables at any given time.[1]
When I decided to write a multi-threaded program in C++, I was faced with many choices, like Boost threads, POSIX threads and std::thread.
A simple internet search turns up a performance measurement published on the boost.org website here.
My question is a bit more basic, and performance-related as well.
Basically, why do they differ in performance? Why, for example, is thread type A faster than the others? They are written by highly professional programmers and are run by powerful OSes, yet they offer different performance.
What makes them faster or slower?

The Boost documentation refers to the Fiber library, whose fibers are not actually threads. Creating what the library calls a fiber (essentially a user-space thread or coroutine, sometimes also referred to as a green thread) does not create a separate schedulable entity on the kernel side, so it can be much more efficient at creation time. Other things could be less efficient, because I/O operations necessarily become much more involved under this model (a fiber doing I/O should not block the operating system thread it runs on if other fibers could do work there).
Note that some of the coroutine implementations out there are well out of the conceptual limits of the de-facto GNU/Linux ABI and other POSIX-like operating systems, so they should be considered ugly hacks at best.

Can fibers migrate between threads?

Can a fiber created in thread A switch to another fiber created in thread B? To make the question more specific: some operating systems have fibers natively implemented (Windows fibers),
others need to implement them themselves (using setjmp/longjmp on Linux, etc.).
Libcoro, for example, wraps this all up in a single API (for Windows it's just a wrapper for native fibers, for Linux it implements them itself, etc.).
So, if it's possible to migrate fibers between threads, can you give me an example usage in windows (linux) in c/c++?
I found something about fiber migration in the Boost library documentation, but it's not specific enough about its implementation and platform dependence. I still want to understand how to do it myself using only Windows fibers, for example (or using Libcoro on Linux).
If it's not possible in a general way, why so?
I understand that fibers are meant to be used as lightweight threads for cooperative multitasking over a single thread, they have cheap context switching compared to regular threads, and they simplify the programming.
An example usage is a system with several threads, each having several fibers doing some kind of work hierarchy on their parent thread (never leaving the parent thread).
Even though it's not the intended use I still want to learn how to do it if it's possible in a general way, because I think I can optimize the work load on my job system by migrating fibers between threads.

The mentioned boost.fiber uses boost.context (callcc/continuation) to implement context switching.
Until boost-1.64 callcc was implemented in assembler only; boost-1.65 enables you to choose between assembler, Windows Fibers (Windows) or ucontext (POSIX if available; an API deprecated by POSIX).
The assembler implementation is faster than the other two (2 orders of magnitude compared to ucontext).
boost.fiber uses callcc to implement lightweight threads/fibers, and the library provides fiber schedulers that allow fibers to migrate between threads.
For instance, one provided scheduler steals fibers from other threads if its run-queue runs out of work (fibers that are ready, i.e. that can be resumed).
(So you can choose Windows Fibers and have them migrated between threads.)

How to ensure that std::thread are created in multi core?

I am using visual studio 2012. I have a module, where, I have to read a huge set of files from the hard disk after traversing their corresponding paths through an xml. For this i am doing
std::vector<std::thread> m_ThreadList;
In a while loop I am pushing back a new thread into this vector, something like
m_ThreadList.push_back(std::thread(&MyClass::Readfile, &MyClassObject, filepath,std::ref(polygon)));
My C++11 multi-threading knowledge is limited. The question that I have here is: how do I create a thread on a specific core? I know of parallel_for and parallel_for_each in VS2012, which make optimum use of the cores. But is there a way to do this using standard C++11?

As pointed out in other comments, you cannot create a thread "on a specific core", as C++ has no knowledge of such architectural details. Moreover, in the majority of cases, the operating system will be able to manage the distribution of threads among cores/processors well enough.
That said, there exist cases in which forcing a specific distribution of threads among cores can be beneficial for performance. As an example, by forcing a thread to execute on one specific core it might be possible to minimise data movement between different processor caches (which can be critical for performance in certain memory-bound scenarios).
If you want to go down this road, you will have to look into platform-specific routines. E.g., for GNU/linux with POSIX threads you will want pthread_setaffinity_np(), in FreeBSD cpuset_setaffinity(), in Windows SetThreadAffinityMask(), etc.
I have some relevant code snippets here if you are interested:
http://gitorious.org/piranhapp0x/mainline/blobs/master/src/thread_management.cpp
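
For the GNU/Linux case mentioned above, a minimal sketch could look like the following. It is Linux-specific (pthread_setaffinity_np is a non-portable GNU extension), and the helper names pin_current_thread, is_pinned_to and run_pinned are invented for this example, not taken from the linked file.

```cpp
#include <pthread.h>
#include <sched.h>
#include <thread>

// Linux-specific: restrict the calling thread to a single logical core.
// Returns true on success.
bool pin_current_thread(int core_id) {
    if (core_id < 0 || core_id >= CPU_SETSIZE) return false;
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core_id, &set);
    return pthread_setaffinity_np(pthread_self(), sizeof(cpu_set_t), &set) == 0;
}

// Returns true if the calling thread's affinity mask is exactly {core_id}.
bool is_pinned_to(int core_id) {
    cpu_set_t set;
    CPU_ZERO(&set);
    if (pthread_getaffinity_np(pthread_self(), sizeof(cpu_set_t), &set) != 0) {
        return false;
    }
    return CPU_COUNT(&set) == 1 && CPU_ISSET(core_id, &set);
}

// Start a std::thread, pin it from the inside, and report whether that worked.
bool run_pinned(int core_id) {
    bool ok = false;
    std::thread worker([&ok, core_id] {
        ok = pin_current_thread(core_id) && is_pinned_to(core_id);
        // ... the real work would go here, now running on core_id only ...
    });
    worker.join();
    return ok;
}
```

Pinning can fail, for instance when the requested core is not in the set the process is allowed to use, so the return value should be checked rather than assumed.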

I'm fairly certain that core affinity isn't included in std::thread. The assumption is that the OS is perfectly capable of making the best possible use of the cores available. In all but the most extreme of cases you're not going to beat the OS's decision, so the assumption is a fair one.
If you do go down that route then you have to add some decision making to your code to take account of machine architecture, to ensure that your decision is better than the OS's on every machine you run on. That takes a lot of effort! For starters you'll want to limit the number of threads to match the number of cores on the computer. And you don't have any knowledge of what else is going on in the machine; the OS does!
Which is why thread pools exist. They tend by default to have as many threads as there are cores, automatically set up by the language runtime. AFAIK C++11 doesn't have one of those. So the one good thing you can do to get the optimum performance is to find out how many cores there are and limit the number of threads you have to that number. Otherwise it's probably just best to trust the OS.
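
As a rough sketch of "limit the number of threads to the number of cores": instead of pushing one std::thread per file into the vector, a fixed set of workers can pull task indices from a shared counter. The name run_on_limited_threads is invented for this example, and this is not a full thread pool, just the simplest thing that bounds the thread count.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Run task(0) ... task(n_tasks - 1) on at most one worker thread per logical
// core, instead of creating one thread per task.
void run_on_limited_threads(std::size_t n_tasks,
                            const std::function<void(std::size_t)>& task) {
    unsigned n_workers = std::thread::hardware_concurrency();
    if (n_workers == 0) n_workers = 1;  // value unknown: fall back to one worker
    n_workers = static_cast<unsigned>(std::min<std::size_t>(n_workers, n_tasks));

    std::atomic<std::size_t> next{0};
    std::vector<std::thread> workers;
    for (unsigned w = 0; w < n_workers; ++w) {
        workers.emplace_back([&next, &task, n_tasks] {
            // Each worker repeatedly claims the next unprocessed index.
            for (std::size_t i = next.fetch_add(1); i < n_tasks;
                 i = next.fetch_add(1)) {
                task(i);
            }
        });
    }
    for (auto& t : workers) t.join();
}
```

In the file-reading case, task(i) would read the i-th path. Note that the sketch assumes the task does not throw; an exception escaping a std::thread terminates the program.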
Joachim Pileborg's comment is well worth paying attention to, unless the work done by each thread outweighs the I/O overhead.

As a quick overview of threading in the context of dispatching threads to cores:
Most modern OSes use kernel-level threads, or a hybrid model. With kernel-level threading, the OS "sees" all the threads in each process. This is in contrast to user-level threads (as employed by early Java "green thread" implementations), where the OS sees a single process and has no knowledge of its threads. Because, with kernel-level threading, the OS can recognise the separate threads of a process and manages their dispatch onto a given core, there is the potential for true parallelism, where multiple threads of the same process run on different cores. You, as the programmer, have no control over this when using std::thread; the OS decides. With user-level threading, all thread management is done at the user level; in the green-thread case, a library manages the "dispatch". In the case of hybrid threading, kernel threads are used, and each kernel thread runs a set of user-level threads.

How to make a threading mechanism in C++?

I know there are some threading libraries for C++, like Pthread, Boost etc out there, but how are they working? There must be an implementation of the logic somewhere.
Let's say that I would like to write my own threading mechanism in C++, not using any library, how would I start? What should I have in mind when writing it?

You'd directly call the underlying API calls in the operating system. For example, CreateThread. Naturally, this is cumbersome and platform-specific, which is why we like to use portable C++ threading libraries...
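
For comparison, here is roughly what "calling the underlying API directly" looks like on a POSIX system, where pthread_create plays the role that CreateThread plays on Windows. This is only a sketch; DoubleJob, double_job_entry and double_on_raw_thread are names made up for the example.

```cpp
#include <pthread.h>

// Arguments and result travel through a struct, because the POSIX entry point
// only accepts and returns void*.
struct DoubleJob {
    int input;
    int output;
};

// Thread entry point with the signature pthread_create() requires.
extern "C" void* double_job_entry(void* arg) {
    DoubleJob* job = static_cast<DoubleJob*>(arg);
    job->output = job->input * 2;
    return nullptr;
}

// Create a thread directly through the POSIX API, wait for it, and return
// its result. Returns -1 if the thread could not be created.
int double_on_raw_thread(int value) {
    DoubleJob job{value, 0};
    pthread_t handle;
    if (pthread_create(&handle, nullptr, double_job_entry, &job) != 0) {
        return -1;
    }
    pthread_join(handle, nullptr);  // after join, job.output is safe to read
    return job.output;
}
```

The void* plumbing and manual join are exactly the kind of boilerplate that std::thread and Boost.Thread hide.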

In C++98/03, there is no notion of a "thread", so the question cannot be answered within the language. In C++11, the answer is to use <thread>.
On the implementation side, threading is an operating system feature. The operating system already has to schedule multiple processes (i.e. separate programs), and a multi-threading OS adds to that the ability to schedule multiple threads within one process. At the very heart, the OS may or may not take advantage of having physically more than one CPU (though that also applies to simple multi-processing; and conversely you can schedule multiple threads on a single CPU). At the heart of the programming, you will need hardware support for synchronisation primitives like atomic reads/writes and atomic compare-and-swap to implement correct memory access. (This is not needed for multi-processing alone, because separate processes have distinct memory; although it will be needed by the OS itself if there are multiple physical CPUs in use.)
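
To make the compare-and-swap primitive mentioned above concrete, here is a small sketch using C++11 std::atomic, which exposes it as compare_exchange_weak. The function names are invented for the example; in real code a plain fetch_add would do the same job more simply.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Lock-free increment built on compare-and-swap: retry until no other thread
// changed the value between our read and our write.
void cas_increment(std::atomic<int>& counter) {
    int expected = counter.load();
    while (!counter.compare_exchange_weak(expected, expected + 1)) {
        // On failure, `expected` has been reloaded with the current value,
        // so the loop simply tries again with fresh data.
    }
}

// Let several threads hammer on one counter and return the final value.
int count_with_threads(int n_threads, int increments_per_thread) {
    std::atomic<int> counter{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < n_threads; ++t) {
        workers.emplace_back([&counter, increments_per_thread] {
            for (int i = 0; i < increments_per_thread; ++i) {
                cas_increment(counter);
            }
        });
    }
    for (auto& w : workers) w.join();
    return counter.load();
}
```

With a plain int instead of std::atomic<int>, the same loop would be a data race and would typically lose updates.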

Well, you need something which is able to run several threads.
If you are developing an operating system kernel on the bare metal, I think that current multi-core processors have only one core working after their power-on reset. Even the BIOS on most PCs probably keeps only one core working (and the other cores idle). So you'll need to write (assembly, non-portable) code to start the other cores.
And (as James reminded you), most of the time you are using some operating system kernel. For instance, on Linux (I don't know about Windows), threads are known by the kernel (because the tasks it is scheduling are threads) and they need to be initiated by the Linux clone(2) system call.
Often, kernel threads are quite heavy, and the system has a library (NPTL for Linux Posix threads) which may use fewer kernel threads than user threads (actually Linux NPTL is a 1:1 mapping between kernel and user threads, but on some other systems, like probably Solaris, things are different).

You can't write your own threading mechanism, unless you mean pseudo-threads like co-routines and not actual concurrently executing threads. This is because the fundamental thread mechanism is defined by the kernel and you can't change it nor implement your own. Any library you write must fall back, eventually, to the operating system.

Multithreading vs multiprocessing

I am new to this kind of programming and need your point of view.
I have to build an application but I can't get it to compute fast enough. I have already tried Intel TBB, and it is easy to use, but I have never used other libraries.
In multiprocessor programming, I am reading about OpenMP and Boost for the multithreading, but I don't know their pros and cons.
In C++, when is multi threaded programming advantageous compared to multiprocessor programming and vice versa?Which is best suited to heavy computations or launching many tasks...? What are their pros and cons when we build an application designed with them? And finally, which library is best to work with?

Multithreading means exactly that, running multiple threads. This can be done on a uni-processor system, or on a multi-processor system.
On a single-processor system, when running multiple threads, the actual observation of the computer doing multiple things at the same time (i.e., multi-tasking) is an illusion, because what's really happening under the hood is that there is a software scheduler performing time-slicing on the single CPU. So only a single task is happening at any given time, but the scheduler is switching between tasks fast enough so that you never notice that there are multiple processes, threads, etc., contending for the same CPU resource.
On a multi-processor system, the need for time-slicing is reduced. The time-slicing effect is still there, because a modern OS could have hundreds of threads contending for two or more processors, and there is typically never a 1-to-1 relationship between the number of threads and the number of processing cores available. So at some point, a thread will have to stop and another thread will start on a CPU that the two threads are sharing. This is again handled by the OS's scheduler. That being said, with a multiprocessor system, you can have two things happening at the same time, unlike with the uni-processor system.
In the end, the two paradigms are really somewhat orthogonal, in the sense that you will need multithreading whenever you want to have two or more tasks running asynchronously, but because of time-slicing, you do not necessarily need a multi-processor system to accomplish that. If you are trying to run multiple threads, and are doing a task that is highly parallel (e.g., trying to solve an integral), then yes, the more cores you can throw at a problem, the better. You won't necessarily need a 1-to-1 relationship between threads and processing cores, but at the same time, you don't want to spin off so many threads that you end up with tons of idle threads because they must wait to be scheduled on one of the available CPU cores. On the other hand, if your parallel tasks require some sequential component, i.e., a thread will be waiting for the result from another thread before it can continue, then you may be able to run more threads with some type of barrier or synchronization method, so that the threads that need to be idle are not spinning away using CPU time, and only the threads that need to run are contending for CPU resources.
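
As an illustration of the last point (a thread waiting for another thread's result without spinning), here is a minimal sketch with a C++11 condition variable. The function name and the arithmetic are made up; the pattern of mutex, flag and wait-with-predicate is the part that matters.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

// One thread produces a value; another sleeps (it does not spin) until the
// value is ready, then continues with it.
int wait_for_result_from_other_thread(int input) {
    std::mutex m;
    std::condition_variable ready_cv;
    bool ready = false;
    int intermediate = 0;
    int final_result = 0;

    std::thread consumer([&] {
        std::unique_lock<std::mutex> lock(m);
        // wait() releases the lock and blocks; the predicate guards against
        // spurious wakeups and against the producer having finished already.
        ready_cv.wait(lock, [&] { return ready; });
        final_result = intermediate + 1;
    });
    std::thread producer([&] {
        {
            std::lock_guard<std::mutex> lock(m);
            intermediate = input * 2;
            ready = true;
        }
        ready_cv.notify_one();
    });

    producer.join();
    consumer.join();
    return final_result;
}
```

While the consumer is blocked in wait(), it is not runnable, so it does not compete with the producer for a core.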

There are a few important points that I believe should be added to the excellent answer by @Jason.
First, multithreading is not always an illusion even on a single processor - there are operations that do not involve the processor. These are mainly I/O - disk, network, terminal etc. The basic form for such operation is blocking or synchronous, i.e. your program waits until the operation is completed and then proceeds. While waiting, the CPU is switched to another process/thread.
If you have anything you can do during that time (e.g. background computation while waiting for user input, serving another request etc.) you have basically two options:
Use asynchronous I/O: you call a non-blocking I/O function, providing it with a callback function and telling it "call this function when you are done". The call returns immediately and the I/O operation continues in the background. You go on with the other stuff.
Use multithreading: you have a dedicated thread for each kind of task. While one waits for the blocking I/O call, the other goes on.
Both approaches are difficult programming paradigms, and each has its pros and cons.
With async I/O the program's logic is less obvious and is difficult to follow and debug. However, you avoid thread-safety issues.
With threads, the challenge is to write thread-safe programs. Thread-safety faults are nasty bugs that are quite difficult to reproduce. Over-use of locking can actually degrade performance instead of improving it.
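
A tiny sketch of the multithreading option, using std::async from C++11: the blocking call runs on its own thread while the caller keeps computing. Here slow_read is a hypothetical stand-in that only sleeps; a real program would perform a blocking read there.

```cpp
#include <chrono>
#include <future>
#include <thread>

// Stand-in for a blocking I/O call (hypothetical: it only sleeps).
int slow_read() {
    std::this_thread::sleep_for(std::chrono::milliseconds(50));
    return 7;
}

int overlap_io_and_compute() {
    // The blocking call runs on its own thread...
    std::future<int> pending = std::async(std::launch::async, slow_read);

    // ...while this thread carries on with unrelated computation.
    int computed = 0;
    for (int i = 1; i <= 10; ++i) computed += i;

    // get() blocks until the "read" has finished, then combines both results.
    return computed + pending.get();
}
```

The two pieces of work share no mutable state here, which is what keeps the example free of the thread-safety problems described above.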
(coming to the multi-processing)
Multithreading became popular on Windows because manipulating processes is quite heavy on Windows (creating a process, context-switching etc.), as opposed to threads, which are much more lightweight (at least this was the case when I worked on Win2K).
On Linux/Unix, processes are much more lightweight. Also (AFAIK) threads on Linux are actually implemented as a kind of process internally, so there is no gain in context-switching of threads vs. processes. However, with processes you need to use some form of IPC (inter-process communication), such as shared memory, pipes, message queues etc.
On a lighter note, look at the SQLite FAQ, which declares "Threads are evil"! :)
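
To show what the IPC requirement means in practice, here is a small POSIX-only sketch: a child process created with fork() computes a value and has to send it back through a pipe, because it does not share memory with its parent. The function name add_in_child_process is invented for this example.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Compute a + b in a child process and send the result back through a pipe.
// Returns false if any system call fails.
bool add_in_child_process(int a, int b, int& result) {
    int fds[2];
    if (pipe(fds) != 0) return false;

    const pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return false;
    }
    if (pid == 0) {
        // Child: it has its own copy of memory, so the result must be sent
        // explicitly to the parent.
        close(fds[0]);
        const int sum = a + b;
        const ssize_t written = write(fds[1], &sum, sizeof sum);
        _exit(written == static_cast<ssize_t>(sizeof sum) ? 0 : 1);
    }

    // Parent: read the child's answer, then reap the child.
    close(fds[1]);
    int received = 0;
    const ssize_t got = read(fds[0], &received, sizeof received);
    close(fds[0]);
    int status = 0;
    waitpid(pid, &status, 0);
    if (got != static_cast<ssize_t>(sizeof received)) return false;
    result = received;
    return true;
}
```

With threads, the same computation could simply write into a variable of the parent; with processes, every exchanged value has to cross a channel like this one.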

To answer the first question:
The best approach is to just use multithreading techniques in your code until you get to the point where even that doesn't give you enough benefit. Assume the OS will handle delegation to multiple processors if they're available.
If you actually are working on a problem where multithreading isn't enough, even with multiple processors (or if you're running on an OS that isn't using its multiple processors), then you can worry about discovering how to get more power. Which might mean spawning processes across a network to other machines.
I haven't used TBB, but I have used IPP and found it to be efficient and well-designed. Boost is portable.

Just wanted to mention that the Flow-Based Programming ( http://www.jpaulmorrison.com/fbp ) paradigm is a naturally multiprogramming/multiprocessing approach to application development. It provides a consistent application view from high level to low level. The Java and C# implementations take advantage of all the processors on your machine, but the older C++ implementation only uses one processor. However, it could fairly easily be extended to use BOOST (or pthreads, I assume) by the use of locking on connections. I had started converting it to use fibers, but I'm not sure if there's any point in continuing on this route. :-) Feedback would be appreciated. BTW The Java and C# implementations can even intercommunicate using sockets.
