I just got started with multithreading in C++ (and multithreading in general) using the C++11 threading library, and wrote a short snippet of code.
#include <iostream>
#include <thread>
int x = 5; //variable to be affected by the race
//This function will be called from a thread
void call_from_thread1() {
for (int i = 0; i < 5; i++) {
x++;
std::cout << "In Thread 1 :" << x << std::endl;
}
}
int main() {
//Launch a thread
std::thread t1(call_from_thread1);
for (int j = 0; j < 5; j++) {
x--;
std::cout << "In Thread 0 :" << x << std::endl;
}
//Join the thread with the main thread
t1.join();
std::cout << x << std::endl;
return 0;
}
I was expecting to get different results every time (or nearly every time) I ran this program, due to the race between the two threads. However, the output is always 0, i.e. the two threads behave as if they ran sequentially. Why am I getting the same result every time, and is there any way to simulate or force a race between the two threads?
Your sample size is rather small, and somewhat self-stalls on the continuous stdout flushes. In short, you need a bigger hammer.
If you want to see a real race condition in action, consider the following. I purposely added an atomic and non-atomic counter, sending both to the threads of the sample. Some test-run results are posted after the code:
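#include <algorithm>   // std::generate_n, std::for_each
#include <atomic>
#include <functional>  // std::ref
#include <iostream>
#include <iterator>    // std::back_inserter
#include <thread>
#include <vector>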
void racer(std::atomic_int& cnt, int& val)
{
for (int i=0;i<1000000; ++i)
{
++val;
++cnt;
}
}
int main(int argc, char *argv[])
{
unsigned int N = std::thread::hardware_concurrency();
std::atomic_int cnt{0}; // ATOMIC_VAR_INIT is deprecated since C++20
int val = 0;
std::vector<std::thread> thrds;
std::generate_n(std::back_inserter(thrds), N,
[&cnt,&val](){ return std::thread(racer, std::ref(cnt), std::ref(val));});
std::for_each(thrds.begin(), thrds.end(),
[](std::thread& thrd){ thrd.join();});
std::cout << "cnt = " << cnt << std::endl;
std::cout << "val = " << val << std::endl;
return 0;
}
Some sample runs from the above code:
cnt = 4000000
val = 1871016
cnt = 4000000
val = 1914659
cnt = 4000000
val = 2197354
Note that the atomic counter is accurate (I'm running on a dual-core i7 MacBook Air with hyper-threading, so 4 threads, hence 4 million). The same cannot be said for the non-atomic counter.
There is significant startup overhead in getting the second thread going, so it will almost always begin executing after the main thread has already finished its for loop, which by comparison takes almost no time at all. To see a race condition you need to run a computation that takes much longer, or that includes I/O or other operations that take significant time, so that the execution of the two computations actually overlaps.
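One way to make the two loops overlap, sketched here under assumptions: both threads spin on a start flag so that neither begins before the other exists, and each performs many iterations. The names `run_overlapped`, `go`, and `worker` are made up for this illustration. The counter is a `std::atomic<int>` updated with a separate load and store, so lost updates can show up without the undefined behaviour of a data race on a plain `int`.

```cpp
#include <atomic>
#include <thread>

// Hypothetical helper: two threads each try to add `iters` to a shared counter.
// The increment is deliberately split into load + store, so updates can be lost,
// but every individual access is atomic.
int run_overlapped(int iters) {
    std::atomic<int> counter{0};
    std::atomic<bool> go{false};

    auto worker = [&] {
        // Wait until both threads have been created.
        while (!go.load()) { std::this_thread::yield(); }
        for (int i = 0; i < iters; ++i) {
            int seen = counter.load();  // read
            counter.store(seen + 1);    // write back; the other thread may have written in between
        }
    };

    std::thread a(worker);
    std::thread b(worker);
    go.store(true);  // release both threads at (nearly) the same time
    a.join();
    b.join();
    return counter.load();
}
```

Calling `run_overlapped(1000000)` on a multi-core machine can return less than 2000000, and the exact value is expected to vary from run to run.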
Related
I'm messing around with multithreading in C++ and here is my code:
#include <iostream>
#include <vector>
#include <string>
#include <thread>
void read(int i);
bool isThreadEnabled;
std::thread threads[100];
int main()
{
isThreadEnabled = true; // I change this to compare the threaded vs non threaded method
if (isThreadEnabled)
{
for (int i = 0;i < 100;i++) //this for loop is what I'm confused about
{
threads[i] = std::thread(read,i);
}
for (int i = 0; i < 100; i++)
{
threads[i].join();
}
}
else
{
for (int i = 0; i < 100; i++)
{
read(i);
}
}
}
void read(int i)
{
int w = 0;
while (true) // wasting cpu cycles to actually see the difference between the threaded and non threaded
{
++w;
if (w == 100000000) break;
}
std::cout << i << std::endl;
}
In the for loop that uses threads, the console prints the values in a random order, e.g. 5, 40, 26..., which is expected and totally fine, since threads don't necessarily run in the order in which they were started.
What confuses me is that some printed values are larger than the maximum value that int i can reach (i stays below 100): values like 8000, 2032, 274... are also printed to the console, even though i never reaches those numbers. I don't understand why.
This line:
std::cout << i << std::endl;
is actually equivalent to
std::cout << i;
std::cout << std::endl;
And thus, while this is thread-safe (meaning there's no undefined behaviour), the order in which different threads execute these statements is not defined. Given two threads, the following execution is possible:
T20: std::cout << 20
T32: std::cout << 32
T20: std::cout << std::endl
T32: std::cout << std::endl
which results in 2032 in console (glued numbers) and an empty line.
The simplest (not necessarily the best) fix is to guard this line with a mutex shared by all the threads:
{
std::lock_guard lg { mutex };
std::cout << i << std::endl;
}
(the braces for a separate scope are not needed if std::cout << i << std::endl; is the last line in the function)
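A fuller sketch of the locked print, under assumptions: the mutex name `out_mutex` and the helpers `print_line` and `print_from_threads` are hypothetical, and the stream is passed in as a parameter so the same code works with `std::cout` or a string stream. It writes `'\n'` rather than `std::endl`, which only skips the explicit flush.

```cpp
#include <functional>
#include <mutex>
#include <ostream>
#include <sstream>
#include <string>
#include <thread>
#include <vector>

std::mutex out_mutex;  // assumed name: one mutex shared by every writer

// Writes the number and the newline as one unit; no other thread that also
// takes out_mutex can write between them.
void print_line(std::ostream& out, int i) {
    std::lock_guard<std::mutex> lock(out_mutex);
    out << i << '\n';
}

// Hypothetical driver: n threads each print their own index to `out`.
void print_from_threads(std::ostream& out, int n) {
    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i)
        threads.emplace_back(print_line, std::ref(out), i);
    for (auto& t : threads)
        t.join();
}
```

With `print_from_threads(std::cout, 100)` the numbers can still appear in any order, but each one sits on its own line.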
I think my program has a bug, because sometimes when I run it, it outputs a number lower than 30000, such as 29999. But sometimes it runs correctly and reaches 30000. My question is: why is this happening, and how can I fix it?
#include <iostream>
#include <thread>
using namespace std;
int counter;
int i;
void increment()
{
counter++;
}
int main()
{
counter = 0;
cout << "The value in counter is : " << counter << endl;
thread tarr[30000];
for (i = 0; i < 30000; i++)
{
tarr[i] = thread(increment);
}
for (i = 0; i < 30000; i++)
{
tarr[i].join(); //main thread waits for tarr to finish
}
cout << "After running 30,000 threads ";
cout << "the value in counter is : " << counter << endl;
return 0;
}
The problem is that counter++ can be broken down into three operations:
Load initial value to register
Increment the value
Store the new value back to memory
A thread may perform the first two steps and then be preempted, letting another thread do the same. This can lead to the following sequence:
Thread one reads counter as 5
Thread one increments its internal copy to 6
Thread two reads counter as 5
Thread two increments its internal copy to 6
Thread two writes back 6 to counter
Thread one writes back 6 to counter
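The interleaving above can be replayed deterministically in a single thread. This is only an illustration: `replay_lost_update`, `reg1`, and `reg2` are hypothetical names, with the two locals standing in for each thread's private register copy.

```cpp
// Replays the six steps listed above, one statement per step.
int replay_lost_update(int counter) {
    int reg1 = counter;  // thread one reads counter
    reg1 = reg1 + 1;     // thread one increments its copy
    int reg2 = counter;  // thread two reads the same, now stale, value
    reg2 = reg2 + 1;     // thread two increments its copy
    counter = reg2;      // thread two writes back
    counter = reg1;      // thread one writes back the same value
    return counter;      // two increments ran, but counter grew by only one
}
```

Starting from 5, the function returns 6 rather than 7.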
You should make counter std::atomic, or guard it with a std::mutex:
std::atomic<int> counter;
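Both suggested fixes could look roughly like this. It is a minimal sketch, not the asker's final program: the names `atomic_counter`, `guarded_counter`, `counter_mutex`, and `run_on_threads` are assumptions made for the example, and a `std::vector` replaces the fixed-size thread array.

```cpp
#include <atomic>
#include <mutex>
#include <thread>
#include <vector>

std::atomic<int> atomic_counter{0};  // option 1: atomic counter
int guarded_counter = 0;             // option 2: plain int ...
std::mutex counter_mutex;            // ... protected by a mutex

void increment_atomic() { ++atomic_counter; }  // one indivisible read-modify-write

void increment_guarded() {
    std::lock_guard<std::mutex> lock(counter_mutex);  // one thread at a time past this point
    ++guarded_counter;
}

// Hypothetical driver: runs `fn` once on each of `n` threads and waits for all of them.
void run_on_threads(void (*fn)(), int n) {
    std::vector<std::thread> threads;
    threads.reserve(n);
    for (int i = 0; i < n; ++i)
        threads.emplace_back(fn);
    for (auto& t : threads)
        t.join();
}
```

After `run_on_threads(increment_atomic, 30000)`, `atomic_counter` holds exactly 30000, provided the system allows that many threads to be created.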
I want to know how to properly implement a C++ program in which a function func is executed in a single thread. I want to do this because I want to test the single-core speed of my CPU. I will run this function (func) about 20 times, record the execution time of each repetition, then sum the results and compute the average execution time.
#include <thread>
int func(long long x)
{
int div = 0;
for(long i = 1; i <= x / 2; i++)
if(x % i == 0)
div++;
return div + 1;
}
int main()
{
std::thread one_thread (func,100000000);
one_thread.join();
return 0;
}
So, in this program, is func executed on a single particular core?
Here is the source code of my program:
#include <iostream>
#include <thread>
#include <iomanip>
#include <windows.h>
#include "font.h"
#include "timer.h"
using namespace std;
#define steps 20
int func(long long x)
{
int div = 0;
for(long i = 1; i <= x / 2; i++)
if(x % i == 0)
div++;
return div + 1;
}
int main()
{
SetFontConsolas(); // Set font consolas
ShowConsoleCursor(false); // Turn off the cursor
timer t;
short int number = 0;
cout << number << "%";
for(int i = 0 ; i < steps ; i++)
{
t.restart(); // start recording
std::thread one_thread (func,100000000);
one_thread.join(); // wait function return
t.stop(); // stop recording
t.record(); // save the time in vector
number += 5;
cout << "\r ";
cout << "\r" << number << "%";
}
double time = 0.0;
for(int i = 0 ; i < steps ; i++)
time += t.times[i]; // sum all recorded times
time /= steps; // get the average execution time
cout << "\nExecution time: " << fixed << setprecision(4) << time << '\n';
double score = 0.0;
score = (1.0 * 100) / time; // calculating benchmark score
cout << "Score: ";
SetColor(12);
cout << setprecision(2) << score << " pts";
SetColor(15);
cout << "\nPress any key to continue.\n";
cin.get();
return 0;
}
No, your program has at least two threads: main, and the one you've created to run func. Moreover, neither of these threads is guaranteed to execute on a particular core. Depending on the OS scheduler, they may switch cores in an unpredictable manner, though the main thread will mostly just wait. If you want to pin a thread to a particular core, you need to set its core affinity with a platform-specific method such as SetThreadAffinityMask on Windows. But you don't really need to go that deep, because nothing in your example is sensitive to core switches. There is not even any need to spawn a separate thread dedicated to the calculation.
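Since no separate thread is needed, the measurement could be done directly on the calling thread. The sketch below assumes `std::chrono::steady_clock` in place of the question's `timer` class (whose interface is not shown); `average_seconds` is a hypothetical helper name.

```cpp
#include <chrono>

// Same divisor-counting function as in the question.
int func(long long x) {
    int div = 0;
    for (long long i = 1; i <= x / 2; i++)
        if (x % i == 0)
            div++;
    return div + 1;
}

// Hypothetical helper: average wall-clock seconds per call of func(x)
// over `steps` runs, all executed on the calling thread.
double average_seconds(long long x, int steps) {
    using clock = std::chrono::steady_clock;
    volatile int sink = 0;  // stores the result so the calls are not optimized away
    double total = 0.0;
    for (int i = 0; i < steps; ++i) {
        auto start = clock::now();
        sink = func(x);
        auto stop = clock::now();
        total += std::chrono::duration<double>(stop - start).count();
    }
    (void)sink;
    return total / steps;
}
```

`average_seconds(100000000, 20)` would correspond to the 20-step benchmark in the question.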
If your program doesn't have multiple threads in the source and if the compiler does not insert automatic parallelization, the program should run on a single core (at a time).
Now depending on your compiler you can use appropriate optimization levels to ensure that it doesn't parallelize.
On the other hand, the compiler may completely eliminate the loop in the function if it can statically compute the result. That, however, doesn't seem to be the issue in your case.
I don't think any C++ compiler makes use of multiple cores behind your back. There would be large language issues in doing that. If you neither spawn threads nor use a parallel library such as MPI, the program should execute on only one core.
Can iterating over an unsorted data structure, such as an array or a tree, with multiple threads make it faster?
For example, I have a big array with unsorted data.
int array[1000];
I'm searching for array[i] == 8.
Can running:
Thread 1:
for(auto i = 0; i < 500; i++)
{
if(array[i] == 8)
std::cout << "found" << std::endl;
}
Thread 2:
for(auto i = 500; i < 1000; i++)
{
if(array[i] == 8)
std::cout << "found" << std::endl;
}
be faster than normal iteration?
#update
I've written a simple test which describes the problem better:
Searching int* array = new int[100000000];
and repeating the search 1000 times,
I got this result:
a
Number of threads = 2
End of multithread iteration
End of normal iteration
Time with 2 threads 73581
Time with 1 thread 154070
Bool values:0
0
0
Process returned 0 (0x0) execution time : 256.216 s
Press any key to continue.
What's more, when the program was running with 2 threads, the CPU usage of the process was around 90%, and when iterating with 1 thread it never exceeded 50%.
So Smeeheey and erip are right that it can make iteration faster.
Of course, it can be trickier for less trivial problems.
What I've also learned from this test is that the compiler can optimize away the loop in the main thread (when I was not printing the boolean that stores its result, the search loop in the main thread was ignored), but it will not do that for the other threads.
This is the code I used:
#include<cstdlib>
#include<thread>
#include<ctime>
#include<iostream>
#define SIZE_OF_ARRAY 100000000
#define REPEAT 1000
inline bool threadSearch(int* array){
for(auto i = 0; i < SIZE_OF_ARRAY/2; i++)
if(array[i] == 101) // there is no array[i]==101
return true;
return false;
}
int main(){
int i;
std::cin >> i; // stops program enabling to set real time priority of the process
clock_t with_multi_thread;
clock_t normal;
srand(time(NULL));
std::cout << "Number of threads = "
<< std::thread::hardware_concurrency() << std::endl;
int* array = new int[SIZE_OF_ARRAY];
bool true_if_found_t1 =false;
bool true_if_found_t2 =false;
bool true_if_found_normal =false;
for(auto i = 0; i < SIZE_OF_ARRAY; i++)
array[i] = rand()%100;
with_multi_thread=clock();
for(auto j=0; j<REPEAT; j++){
std::thread t([&](){
if(threadSearch(array))
true_if_found_t1=true;
});
std::thread u([&](){
if(threadSearch(array+SIZE_OF_ARRAY/2))
true_if_found_t2=true;
});
if(t.joinable())
t.join();
if(u.joinable())
u.join();
}
with_multi_thread=(clock()-with_multi_thread);
std::cout << "End of multithread iteration" << std::endl;
for(auto i = 0; i < SIZE_OF_ARRAY; i++)
array[i] = rand()%100;
normal=clock();
for(auto j=0; j<REPEAT; j++)
for(auto i = 0; i < SIZE_OF_ARRAY; i++)
if(array[i] == 101) // there is no array[i]==101
true_if_found_normal=true;
normal=(clock()-normal);
std::cout << "End of normal iteration" << std::endl;
std::cout << "Time with 2 threads " << with_multi_thread<<std::endl;
std::cout << "Time with 1 thread " << normal<<std::endl;
std::cout << "Bool values:" << true_if_found_t1<<std::endl
<< true_if_found_t2<<std::endl
<<true_if_found_normal<<std::endl;// showing bool values to prevent compiler from optimization
return 0;
}
The answer is yes, it can make it faster, but not necessarily. In your case, where you're iterating over pretty small arrays, the overhead of launching a new thread will likely be much higher than the benefit gained. If your array were much bigger, this overhead would shrink as a proportion of the overall runtime, and threading would eventually become worth it. Note that you will only get a speed-up if your system has more than one physical core available to it.
Additionally, note that while the code that reads the array is perfectly thread-safe in your case, the writes to std::cout from different threads can interleave (you will get very strange-looking output if you try this). Instead, perhaps each thread should return something like an integer indicating the number of instances found.
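Returning a count instead of printing might be sketched as follows. The helper name `count_in_two_threads` is an assumption for this example, and a `std::vector<int>` stands in for the raw array from the question.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: counts occurrences of `target`, one thread per half.
std::size_t count_in_two_threads(const std::vector<int>& data, int target) {
    const std::size_t mid = data.size() / 2;
    std::size_t first = 0, second = 0;  // each thread writes only its own slot

    std::thread t([&] {
        first = static_cast<std::size_t>(
            std::count(data.begin(), data.begin() + mid, target));
    });
    std::thread u([&] {
        second = static_cast<std::size_t>(
            std::count(data.begin() + mid, data.end(), target));
    });
    t.join();
    u.join();
    return first + second;  // combined only after both threads have finished
}
```

The threads only read the shared data and write to separate variables, so no locking is needed; the caller prints the result once, from a single thread.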
I am sort of new to C++, and I was wondering whether this basic (and possibly sloppy) program is actually running multiple threads at the same time or whether it's just pooling:
This is a console application built with Visual C++ 2015:
#include <string>
#include "stdafx.h"
#include <iostream>
#include <sstream>
#include <thread>
using namespace std;
#include <stdio.h>
int temp1 = 0;
int num1 = 0;
int temp2 = 0;
int num2 = 0;
void math1() {
int running_total = 23;
for (int i = 0; i < 999999999; i++)
{
running_total = 58 * running_total + i;
}
}
int math2() {
int running_total = 23;
for (int i = 0; i < 999999999; i++)
{
running_total = 58 * running_total + i;
}
return 0;
}
int main()
{
unsigned concurentThreadsSupported = std::thread::hardware_concurrency();
cout << "Current Number of CPU threads: " << concurentThreadsSupported << endl;
thread t1(math1);
thread t2(math2);
t1.join();
t2.join();
cout << "1: " << num1 << endl;
cout << "2: " << num2 << endl;
system("pause");
return 0;
}
I notice that when I run the code with thread t1(math1);thread t2(math2);t1.join();t2.join();, it uses 25% of my CPU in total for 3.5 seconds, but when I use
thread t1(math1);
t1.join();
thread t2(math2);
t2.join();
it uses ~13% of the CPU for almost 7 seconds.
Is this actually multithreading?
thread t1(math1); thread t2(math2); t1.join(); t2.join(); waits for t1 to finish while also running t2. The math1 and math2 functions do the same thing, so they'll finish at approximately the same time, which is optimal (they could just as well be one function).
As for the numbers you're seeing, you clearly have a CPU with 8 logical cores. The multithreaded version uses two hardware threads (2 / 8 = 25%), while the sequential version uses just one (1 / 8 = 12.5%). The sequential version also takes twice as long, simple as that.
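The arithmetic behind those percentages, as a tiny sketch. `cpu_share_percent` is a hypothetical helper, and it assumes each busy thread fully occupies one logical core.

```cpp
// Share of total CPU capacity used when `busy` threads each saturate
// one logical core out of `logical_cores`.
double cpu_share_percent(unsigned busy, unsigned logical_cores) {
    return 100.0 * busy / logical_cores;
}
```

With 8 logical cores, two busy threads give 25% and one gives 12.5%, matching the figures reported by the task manager.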