Is concurrent programming the same as parallel programming? - concurrency

Are they both the same thing? Looking just at what concurrent or parallel means in geometry, I'd definitely say no:
In geometry, two or more lines are said to be concurrent if they intersect at a single point.
and
Two lines in a plane that do not
intersect or meet are called parallel
lines.
Again, in programming, do they have the same meaning? If yes...Why?
Thanks

I agree that the geometry vocabulary is in conflict. Think of train tracks instead: Two trains which are on parallel tracks can run independently and simultaneously with little or no interaction. These trains run concurrently, in parallel.
The basic usage difficulty is that "concurrent" can mean "at the same time" (with the trains or code) or "at the same place" (with the geometric lines). For many practical purposes (trains, thread resources) these two notions are directly in conflict.
Natural language is supposed to be silly, ambiguous, and confusing. But we're programmers. We can take refuge in the clarity, simplicity, and elegance of our formal programming languages. Like perl.

From Wikipedia:
Concurrent computing is a form of
computing in which programs are
designed as collections of interacting
computational processes that may be
executed in parallel.
Basically, programs can be written as concurrent programs if they are made up of smaller interacting processes. Parallel programming is actually doing these processes at the same time.
So I suppose that concurrent programming is really a style that lends itself to processes being executed in parallel to improve performance.

No, concurrent is definitely different from parallel. Here is exactly how.
Concurrency refers to the sharing of resources in the same time frame. As an example, several processes may share the same CPU or share memory or an I/O device.
Now, by definition two processes are concurrent if and only if the second starts execution before the first has terminated (on the same CPU). If the two processes both run on the same - say for now - single-core CPU, the processes are concurrent but not parallel: in this case, parallelism is only virtual and refers to the OS doing timesharing. The OS seems to be executing several processes simultaneously. If there is only one single-core CPU, only one instruction from only one process can be executing at any particular time. Since the human time scale is billions of times slower than that of modern computers, the OS can rapidly switch between processes to give the appearance of several processes executing at the same time.
If you instead run the two processes on two different CPUs, the processes are parallel: there is no sharing in the same time frame, because each process runs on its own CPU. The parallelism in this case is not virtual but physical. It is worth noting here that running on different cores of the same multi-core CPU still cannot be classified as fully parallel, because the processes will share some of the same CPU caches and will even contend for them.
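To make the distinction concrete, here is a minimal sketch in C++ (the names count_to and run_two_tasks are made up for illustration). The two threads are concurrent by the definition above, since the second starts before the first has finished. Whether they also run in parallel is not visible in the code at all: it depends on whether the OS places them on different cores.

```cpp
#include <cassert>
#include <cstdint>
#include <thread>

// Hypothetical stand-in for real work: counts up to n in a loop.
std::uint64_t count_to(std::uint64_t n) {
    std::uint64_t total = 0;
    for (std::uint64_t i = 0; i < n; ++i) total += 1;
    return total;
}

// Starts two independent tasks. They are concurrent (their lifetimes overlap).
// On a single-core machine the OS time-slices them; on a multi-core machine
// they may run physically in parallel. The code is the same either way.
std::uint64_t run_two_tasks(std::uint64_t n) {
    std::uint64_t a = 0, b = 0;
    std::thread t1([&a, n] { a = count_to(n); });  // writes only a
    std::thread t2([&b, n] { b = count_to(n); });  // writes only b
    t1.join();
    t2.join();
    assert(a == n && b == n);  // each task touched only its own variable
    return a + b;
}
```

Because the tasks share no data, the result is the same on one core or many; only the elapsed time differs.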

Finding out how many threads make sense to create

I have a calculation task on a large amount of data, so it is quite easy to parallelize. The next question is how many threads it makes sense to create. Of course I can measure the time for different numbers of threads on my machine, but what if the program will be run on different machines, where I can't really make manual measurements? Is just getting the number of threads from std::thread::hardware_concurrency() good enough, or are there other ways?
That function (std::thread::hardware_concurrency()) gives you the number of hardware threads, that is, the logical core count including hyperthreading. It is only a hint and may return 0 if the value cannot be determined.
If your program does intensive number crunching I would say using only physical cores and setting processor affinity is the best choice.
You can find out the current processor topology with the hwloc library, which works on most platforms.
You may find a comprehensible explanation (though a bit old) here.
If there is a lot of I/O then you may run two threads per processor, so one can process data while the other is waiting for input, or add one extra thread without affinity so it can take processor time while the others are waiting for I/O. This is a very rough estimate, though: better to measure on your machine.
If you can test on other processors, you may use a different strategy for each processor.
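As a starting point before measuring, something like the following sketch could be used. The function name pick_thread_count and the fallback value of 2 are my own assumptions, not anything from the standard library; the only library call is std::thread::hardware_concurrency(), which is allowed to return 0.

```cpp
#include <algorithm>
#include <cassert>
#include <thread>

// Hypothetical helper: choose a worker count for work_items independent items.
// fallback is an assumed default used when the hardware hint is unavailable.
unsigned pick_thread_count(unsigned work_items, unsigned fallback = 2) {
    assert(fallback > 0);
    const unsigned hint = std::thread::hardware_concurrency();  // may be 0
    const unsigned threads = (hint != 0) ? hint : fallback;
    // Never start more threads than there are items, and always at least one.
    return std::max(1u, std::min(threads, work_items));
}
```

Note that this counts logical cores. Restricting to physical cores, as suggested above for number crunching, needs topology information from something like hwloc, which this sketch does not attempt.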

When is it better to use CPU- or I/O-intensive code in child processes [ C++ ]

I got an exam question and didn't know the answer.
The task was:
A programmer would like to create a very fast application, hence it organizes its software in 13 processes (a parent and 12 children), all running in parallel.
Child threads are very :
I/O-intensive, hence make very frequent use of system calls (read/write from files/pipes/sockets, write on standard output, etc.)
CPU intensive, hence make very frequent use of system calls:
Describe when this would be a good choice, when this would be a bad choice, and motivate your answer.
Numbers 1 and 2 are different questions, so the answer should cover both: the good and bad sides of I/O-intensive children, and the good and bad sides of CPU-intensive children.
* Sorry for the inappropriate title, I changed it.
* "Child threads" was on the exam paper, so I copied it. I think my professor meant to write "processes".
Thank you
Multi-threading and multi-processing work best when the problem is embarrassingly parallel, so that each thread or process does work unrelated to the others, and worst when the threads try to share the same resource.
I/O-intensive, hence make very frequent use of system calls
(read/write from files/pipes/sockets, write on standard output,
etc.)
Having such processes is a good idea when each process does its I/O (reads from and/or writes to) on a different medium. It is a bad idea when they all try to use the same medium and so end up waiting for each other.
On older media, where the device has to move its read/write heads between several tracks, this can seriously hinder performance.
CPU intensive, hence make very frequent use of system calls
Here the limitation is how many processor cores the system has. The best case is one CPU-intensive process per processor core. So it is worst when these processes share the same core and best when each runs on a different core.
The overheads from creating processes, frequent context switching between processes, and communication between processes on the same core actually make multiprocessing on a single core perform worse than the same calculations done by a single process (assuming both are in the hands of a master).
Often multiprocessing is not done only for performance reasons. The more frequent reason is that an architecture consisting of smaller modules is cleaner, and the quality of such modules is simpler to test and ensure in isolation.
My professor helped me with the answer.
Actually, with CPU-intensive code it is better to have 13 cores in the PC.
With I/O-intensive code the problem is the time spent waiting, so it is better to have fewer cores and use them fully.

The meaning of multithreading on a single-core CPU

Once I thought the only occasion multiple threads should be used is when IO processing is needed.
But I heard it's also useful without IO processing, because it helps to occupy more CPU resources.
In my understanding, this would be because a process with more threads is given more CPU time.
Is this why multiple threads help improve performance even on single core?
One possible reason you can see greater performance from multiple threads on a single CPU is that CPUs tend to be really good at instruction reordering and making use of instruction-level parallelism. Threads have fewer data and control dependencies with respect to one another than any two sequential instructions within a single thread, and therefore they offer more possibilities for the CPU and OS-level schedulers and re-ordering mechanisms to be very clever.
Don't forget that things like "reads and writes in memory" are still "I/O" when viewed in a particular way. These are relatively slow operations, and much of the pipelining in modern CPUs is used to hide memory latency - having multiple threads executing at once can be useful for filling up time that would otherwise have to be filled with delay slots where there are data hazards within a single thread.
That said, threads are often not a good solution to increase performance, and can have precisely the opposite effect. It can be very easy to saturate all available memory bandwidth using a single thread on some problems.

Multithreading efficiency in C++

I am trying to learn threading in C++, and just had a few questions about it (more specifically <thread>).
Let's say the machine this code will run on has 4 cores, should I split up an operation into 4 threads? If I were to create 8 threads instead of 4, would this run slower on a 4 core machine? What if the processor has hyperthreading, should I try and make the threads match the number of physical cores or logical cores?
Should I just not worry about the number of cores a machine has, and try to create as many threads as possible?
I apologize if these questions have already been answered; I've been looking for information about threading with <thread>, which was introduced in C++11, so I haven't been able to find too much about it.
The program in question is going to run many independent simulations.
If anybody has any insight about <thread> or just multithreading in general, I would be glad to hear it.
If you are performing pure calculations with no I/O - and those calculations are freestanding and not relying on results from other calculations happening in another thread, the maximum number of such threads should be the number of cores (possibly one or two less if the system is also loaded with other tasks).
If you are doing network I/O or similar, more threads are certainly a possibility.
If you are doing disk I/O, a single thread reading from the disk is often best, because disk reads from multiple threads lead to moving the read/write head around on the disk, which just makes things slower.
If you're using threads to make the code simpler, then the number of threads will probably depend on what you are doing.
It also depends on how "freestanding" each thread is. If they need to share data in complex ways, the sharing/waiting for other thread/etc, may well make it slower with more threads.
And as others have said, try to make your framework for this flexible and test different options. Preferably on multiple machines (unless you only have one kind of machine that you will ever run your code on).
There is no such thing as <threads.h>, you mean <thread>, the thread support library introduced in C++11.
The only answer to your question is "test and see". You can make your code flexible enough, so that it can be run by passing an N parameter (where N is the desired number of threads).
If you are CPU-bound, the answer will be very different from the case when you are IO bound.
So, test and see! For your reference, this link can be helpful. And if you are serious, then go ahead and get this book. Multithreading, concurrency, and the like are hairy topics.
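To illustrate the "pass N as a parameter" idea, here is a minimal sketch. The function parallel_sum and its chunking scheme are hypothetical, picked only because summing is easy to check; your simulations would go where std::accumulate is. The point is that the thread count is an argument, so you can time the same binary with N = 1, 2, 4, 8 and so on.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical example: sum data using n_threads workers, one contiguous
// chunk per worker. n_threads is a plain parameter so it can be benchmarked.
long long parallel_sum(const std::vector<int>& data, std::size_t n_threads) {
    assert(n_threads > 0);
    std::vector<long long> partial(n_threads, 0);  // one result slot per worker
    std::vector<std::thread> workers;
    const std::size_t chunk = (data.size() + n_threads - 1) / n_threads;  // ceiling
    for (std::size_t t = 0; t < n_threads; ++t) {
        const std::size_t begin = std::min(t * chunk, data.size());
        const std::size_t end = std::min(begin + chunk, data.size());
        workers.emplace_back([&data, &partial, t, begin, end] {
            // Each worker reads its own range and writes only partial[t].
            partial[t] = std::accumulate(data.begin() + begin,
                                         data.begin() + end, 0LL);
        });
    }
    for (auto& w : workers) w.join();
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

A trivial sum like this is probably limited by memory bandwidth rather than CPU, so don't expect it to scale with N; that is exactly the kind of thing the measurement will tell you.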
Let's say the machine this code will run on has 4 cores, should I split up an operation into 4 threads?
If some portions of your code can be run in parallel, then yes it can be made to go faster, but this is very tricky to do since loading threads and switching data between them takes a ton of time.
If I were to create 8 threads instead of 4, would this run slower on a 4 core machine?
It depends on the context switching it has to do. Sometimes the execution will switch between threads very often and sometimes it will not but this is very difficult to control. It will not in any case run faster than 4 threads doing the same work.
What if the processor has hyperthreading, should I try and make the threads match the number of physical cores or logical cores?
Hyperthreading works nearly the same as having more cores. When you will notice the differences between a real core and an execution core, you will have enough knowledge to work around the caveats.
Should I just not worry about the number of cores a machine has, and try to create as many threads as possible?
NO, threads are hard to manage, avoid them as much as you can.
The program in question is going to run many independent simulations.
You should look into OpenMP. It is an API (compiler directives plus a runtime library) for C, C++, and Fortran, made to parallelize computation when your program can be split up. Do not confuse parallel with concurrent. Concurrent is simply multiple threads working together, while parallel is made specifically to speed up your application. Maybe OpenMP is overkill for your thing, but it is a good thing to know when you are approaching parallel computing.
Don't think of the number of threads you need in comparison to the machine you're running on. Threading is valuable any time you have a process in which:
A: There is some very slow operation, that the rest of the process need not wait for.
B: Certain functions can run faster than one another and don't need to be executed inline.
C: There is a lot of non-order-dependent I/O going on (web servers).
These are just a few of the obvious examples of when launching a thread makes sense. The number of threads you launch is therefore more dependent on the number of these scenarios that pop up in your code than on the architecture you expect to run on. In fact, unless you're running a process that really, really needs to be optimized, it is likely that you can only eke out a few percentage points of additional performance by benchmarking the number of threads you launch against your architecture, and on modern computers this number shouldn't vary much at all.
Let's take the I/O example, as it is the scenario that will see the most benefit. Let's assume that some program needs to interact with 200 users over the network. Network I/O is very, very slow: thousands of times slower than the CPU. If we were to handle each user in turn we would waste thousands of processor cycles just waiting for data to come from the first user. Could we not have been processing information from more than one user at a time? In this case, since we have roughly 200 users, and we know the data we're waiting for arrives thousands of times slower than we can handle it (assuming we have a minimal amount of processing to do on this data), we should launch as many threads as the operating system allows. A web server that takes advantage of threading can serve hundreds more people per second than one that does not.
Now, let's consider a less I/O-intensive example, where say we have several functions that execute in turn, but are independent of one another, and some of them might run faster, say because there is disk I/O in one and no disk I/O in another. In this case, our I/O is still fairly fast, but we will certainly waste processing time waiting for the disk to catch up. As such we can launch a few threads, just to take advantage of our processing power and minimize wasted cycles. However, if we launch as many threads as the operating system allows we are likely to cause memory management issues for branch predictors, etc., and launching too many threads in this case is actually suboptimal and can slow the program down. Note that in this, I never mentioned how many cores the machine has! Not that optimizing for different architectures isn't valuable, but if you optimize for one architecture you are likely very close to optimal for most, assuming, again, that you're dealing with reasonably modern processors.
I think most people would say that large-scale threading projects are better supported by languages other than C++ (Go, Scala, CUDA). Task parallelism, as opposed to data parallelism, works better in C++. I would say that you should create as many threads as you have tasks to dole out, but if data parallelism is more related to your problem, consider maybe using CUDA and linking to the rest of your project at a later time.
NOTE: if you look at some sort of system monitor you will notice that there are likely far more than 8 threads running. I looked at my computer and it had hundreds of threads running at once, so don't worry too much about the overhead. The main reason I chose to mention the other languages is that managing many threads in C++ or C tends to be very difficult and error-prone. I did not mention them because the C++ program will run slower (which, unless you use CUDA, it probably won't).
In regards to hyper-threading let me comment on what I have found from experience.
In large dense matrix multiplication hyper-threading actually gives worse performance. For example Eigen and MKL both use OpenMP (at least the way I have used them) and get better results on my system which has four cores and hyper-threading using only four threads instead of eight. Also, in my own GEMM code which gets better performance than Eigen I also get better results using four threads instead of eight.
However, in my Mandelbrot drawing code I get a big performance increase using hyper-threading with OpenMP (eight threads instead of four). The general trend (so far) seems to be that if the code works well using schedule(static) in OpenMP then hyper-threading does not help and may even be worse. If the code works better using schedule(dynamic) then hyper-threading may help.
In other words, my observation so far is that if the run time of each thread can vary a lot hyper-threading can help. If the run time of each thread is constant then it may even make performance worse. But YOU have to test and see for each case.
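For anyone unfamiliar with the schedule clause, here is a small sketch of the kind of loop I mean. It is not my actual Mandelbrot code; the function names and the mapping from x to the real axis are made up for illustration. It assumes you compile with OpenMP enabled (e.g. -fopenmp with GCC or Clang). Without that flag the pragma is ignored and the loop simply runs serially.

```cpp
#include <cassert>
#include <complex>
#include <vector>

// Iterations until |z| exceeds 2, capped at max_iter. The cost differs a lot
// from point to point, which is what makes static scheduling unbalanced.
int mandelbrot_iterations(std::complex<double> c, int max_iter) {
    std::complex<double> z(0.0, 0.0);
    int i = 0;
    while (i < max_iter && std::norm(z) <= 4.0) {  // norm is |z| squared
        z = z * z + c;
        ++i;
    }
    return i;
}

// Hypothetical helper: one image row at imaginary coordinate y, with x mapped
// onto the real interval [-2, 1).
std::vector<int> render_row(double y, int width, int max_iter) {
    assert(width > 0 && max_iter > 0);
    std::vector<int> row(width);
    // dynamic: threads grab iterations as they finish, evening out the load.
    // Swap in schedule(static) to compare on your own machine.
    #pragma omp parallel for schedule(dynamic)
    for (int x = 0; x < width; ++x) {
        const double re = -2.0 + 3.0 * x / width;
        row[x] = mandelbrot_iterations({re, y}, max_iter);
    }
    return row;
}
```

Running it with OMP_NUM_THREADS set to the physical and then the logical core count is an easy way to repeat the comparison described above.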

Executing C++ program on multiple processor machine

I developed a program in C++ for research purposes. It takes several days to complete.
Now I am executing it on our lab's 8-core server machine to get results quickly, but I see that the machine assigns only one processor to my program and it stays at 13% processor usage (even though I set the process priority to high and the affinity to all 8 cores).
(It is a simple object-oriented program without any parallelism or multithreading.)
How can I get the true benefit of the powerful server machine?
Thanks in advance.
Partition your code into chunks you can execute in parallel.
You need to go read about data parallelism and task parallelism.
Then you can use OpenMP or MPI to break up your program.
(It is a simple object-oriented program without any parallelism or multithreading.)
How can I get the true benefit of the powerful server machine?
By using more threads. No matter how powerful the computer is, it cannot spread a thread across more than one processor. Find independent portions of your program and run them in parallel.
C++0x threads
Boost threads
OpenMP
I personally consider OpenMP a toy. You should probably go with one of the other two.
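As a rough sketch of what "run independent portions in parallel" can look like with the standard library, here is one way using std::async. The simulate function is a hypothetical stand-in for one independent piece of your research code, and run_all is a name I made up; only std::async, std::launch::async and std::future are real library facilities.

```cpp
#include <cassert>
#include <future>
#include <vector>

// Hypothetical stand-in for one independent, long-running computation.
double simulate(int seed) {
    double x = seed;
    for (int i = 0; i < 1000; ++i) x = 0.5 * x + 1.0;  // converges toward 2.0
    return x;
}

// Launches every simulation as its own asynchronous task, then collects the
// results in launch order.
std::vector<double> run_all(int count) {
    assert(count >= 0);
    std::vector<std::future<double>> pending;
    for (int seed = 0; seed < count; ++seed)
        pending.push_back(std::async(std::launch::async, simulate, seed));
    std::vector<double> results;
    for (auto& f : pending) results.push_back(f.get());  // blocks until ready
    return results;
}
```

With std::launch::async each task gets its own thread, so for a large number of portions you would want to launch them in batches of roughly the core count rather than all at once.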
You have to exploit parallelism explicitly by splitting your code into multiple tasks that can be executed independently, and then either use thread primitives directly or a higher-level parallelization framework, such as OpenMP.
If you don't want to make your program itself use multithreaded libraries or techniques, you might be able to try breaking your work up into several independent chunks. Then run multiple copies of your program...each being assigned to a different chunk, specified by getting different command-line parameters.
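A sketch of how each copy could work out which chunk is its own, assuming the work is a list of total items and each copy is told how many copies there are and which one it is. The helper name chunk_range and that command-line convention are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical helper: half-open range [first, last) of item indices that
// copy number index (0-based) out of copies should process.
std::pair<std::size_t, std::size_t> chunk_range(std::size_t total,
                                                std::size_t copies,
                                                std::size_t index) {
    assert(copies > 0 && index < copies);
    const std::size_t base = total / copies;
    const std::size_t extra = total % copies;  // first extra chunks get one more
    const std::size_t first = index * base + std::min(index, extra);
    const std::size_t last = first + base + (index < extra ? 1 : 0);
    return {first, last};
}
```

Each copy would parse copies and index from argv, call chunk_range, and write its results to its own output file. On the 8-core server you would start 8 copies with indices 0 to 7 and merge the outputs afterwards.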
As for just generally improving a program's performance...there are profiling tools that can help you speed up or find the bottlenecks in memory usage, I/O, CPU:
https://stackoverflow.com/questions/tagged/c%2b%2b%20profiling
This won't help split your work across cores, but if you can get an 8x speedup in an algorithm, that might help more than multithreading would on 8 cores. Just something else to consider.