I'm messing around with multithreading in C++ and here is my code:
#include <iostream>
#include <vector>
#include <string>
#include <thread>
void read(int i);
bool isThreadEnabled;
std::thread threads[100];
int main()
{
isThreadEnabled = true; // I change this to compare the threaded vs non threaded method
if (isThreadEnabled)
{
for (int i = 0;i < 100;i++) //this for loop is what I'm confused about
{
threads[i] = std::thread(read,i);
}
for (int i = 0; i < 100; i++)
{
threads[i].join();
}
}
else
{
for (int i = 0; i < 100; i++)
{
read(i);
}
}
}
void read(int i)
{
int w = 0;
while (true) // wasting cpu cycles to actually see the difference between the threaded and non threaded
{
++w;
if (w == 100000000) break;
}
std::cout << i << std::endl;
}
In the for loop that uses threads, the console prints values in a random order, e.g. (5, 40, 26...), which is expected and totally fine, since threads don't run in the same order as they were started.
What confuses me is that some of the printed values are larger than the maximum value i can reach (99, since the loop stops before 100). Values like 8000, 2032, 274... are also printed to the console even though i never reaches those numbers. I don't understand why.
This line:
std::cout << i << std::endl;
is actually equivalent to
std::cout << i;
std::cout << std::endl;
Each individual write is thread safe (meaning there's no undefined behaviour), but the order in which writes from different threads interleave is unspecified. Given two threads, the following execution is possible:
T20: std::cout << 20
T32: std::cout << 32
T20: std::cout << std::endl
T32: std::cout << std::endl
which results in 2032 on the console (two numbers glued together) followed by an empty line.
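The simplest (not necessarily the best) fix is to guard this line with a mutex shared by all the threads (here mutex is a std::mutex visible to every thread, e.g. a global; the deduced std::lock_guard form needs C++17):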
{
std::lock_guard lg { mutex };
std::cout << i << std::endl;
}
(the braces for a separate scope are not needed if std::cout << i << std::endl; is the last line in the function)
Related
I am trying to create a sparse matrix from Triplets. Somehow the program below is either printing a matrix containing only zeros or I get
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc Aborted.
When I add the code at the bottom, which is commented out, it sometimes starts to work and sometimes I again get only zeros.
I don't understand this behavior. It seems to be memory related, but why should printing out stuff (after uncommenting the bottom code) change what happens at the top?
Is there something wrong with my approach? Or is there maybe something wrong with my setup? I just downloaded the latest stable release from http://eigen.tuxfamily.org.
#include <iostream>
#include <Eigen/SparseCore>
#include <Eigen/Dense>
int main()
{
int n = 10;
std::vector<Eigen::Triplet<double> > ijv;
for(int i; i < n; i++)
{
ijv.push_back(Eigen::Triplet<double>(i,i,1));
if(i < n-1)
{
ijv.push_back(Eigen::Triplet<double>(i+1,i,-0.9));
}
}
Eigen::SparseMatrix<double> X(n,n);
X.setFromTriplets(ijv.begin(), ijv.end());
std::cout << Eigen::MatrixXd(X) << std::endl;
/* std::cout << "Row\tCol\tVal" <<std::endl;
for (int k=0; k < X.outerSize(); ++k)
{
for (Eigen::SparseMatrix<double>::InnerIterator it(X,k); it; ++it)
{
std::cout << it.row() << "\t"; // row index
std::cout << it.col() << "\t"; // col index (here it is equal to k)
std::cout << it.value() << std::endl;
}
}
*/
}
Can iterating over an unsorted data structure, such as an array or a tree, with multiple threads make it faster?
For example, I have a big array with unsorted data.
int array[1000];
I'm searching for array[i] == 8
Can running:
Thread 1:
for(auto i = 0; i < 500; i++)
{
if(array[i] == 8)
std::cout << "found" << std::endl;
}
Thread 2:
for(auto i = 500; i < 1000; i++)
{
if(array[i] == 8)
std::cout << "found" << std::endl;
}
be faster than normal iteration?
#update
I've written a simple test which describes the problem better:
Searching int* array = new int[100000000];
and repeating the search 1000 times,
I got this result:
a
Number of threads = 2
End of multithread iteration
End of normal iteration
Time with 2 threads 73581
Time with 1 thread 154070
Bool values:0
0
0
Process returned 0 (0x0) execution time : 256.216 s
Press any key to continue.
What's more, when the program was running with 2 threads, the CPU usage of the process was around 90%, and when iterating with 1 thread it was never more than 50%.
So Smeeheey and erip are right that it can make iteration faster.
Of course, it can be trickier for less trivial problems.
I've also learned from this test that the compiler can optimize the loop in the main thread (when I was not printing the boolean that stores the result, the search loop in the main thread was optimized away), but it did not do that for the other threads.
This is the code I used:
#include<cstdlib>
#include<thread>
#include<ctime>
#include<iostream>
#define SIZE_OF_ARRAY 100000000
#define REPEAT 1000
inline bool threadSearch(int* array){
for(auto i = 0; i < SIZE_OF_ARRAY/2; i++)
if(array[i] == 101) // there is no array[i]==101
return true;
return false;
}
int main(){
int i;
std::cin >> i; // stops program enabling to set real time priority of the process
clock_t with_multi_thread;
clock_t normal;
srand(time(NULL));
std::cout << "Number of threads = "
<< std::thread::hardware_concurrency() << std::endl;
int* array = new int[SIZE_OF_ARRAY];
bool true_if_found_t1 =false;
bool true_if_found_t2 =false;
bool true_if_found_normal =false;
for(auto i = 0; i < SIZE_OF_ARRAY; i++)
array[i] = rand()%100;
with_multi_thread=clock();
for(auto j=0; j<REPEAT; j++){
std::thread t([&](){
if(threadSearch(array))
true_if_found_t1=true;
});
std::thread u([&](){
if(threadSearch(array+SIZE_OF_ARRAY/2))
true_if_found_t2=true;
});
if(t.joinable())
t.join();
if(u.joinable())
u.join();
}
with_multi_thread=(clock()-with_multi_thread);
std::cout << "End of multithread iteration" << std::endl;
for(auto i = 0; i < SIZE_OF_ARRAY; i++)
array[i] = rand()%100;
normal=clock();
for(auto j=0; j<REPEAT; j++)
for(auto i = 0; i < SIZE_OF_ARRAY; i++)
if(array[i] == 101) // there is no array[i]==101
true_if_found_normal=true;
normal=(clock()-normal);
std::cout << "End of normal iteration" << std::endl;
std::cout << "Time with 2 threads " << with_multi_thread<<std::endl;
std::cout << "Time with 1 thread " << normal<<std::endl;
std::cout << "Bool values:" << true_if_found_t1<<std::endl
<< true_if_found_t2<<std::endl
<<true_if_found_normal<<std::endl;// showing bool values to prevent compiler from optimization
return 0;
}
The answer is yes, it can make it faster, but not necessarily. In your case, when you're iterating over pretty small arrays, the overhead of launching a new thread is likely to be much higher than the benefit gained. If your array were much bigger, that overhead would shrink as a proportion of the overall runtime and eventually become worth paying. Note that you will only get a speed-up if your system has more than one physical core available to it.
Additionally, note that whilst the code that reads the array is perfectly thread-safe, writing to std::cout from several threads without synchronization can interleave their output (you will get very strange-looking output if you try this). Instead, perhaps each thread should return an integer indicating the number of instances found.
As you can see in the main function, I've created a group of threads that execute the exact same function with different parameters. The function simply prints out a vector's values. The problem is that these threads interfere with one another: one thread does not finish printing (cout) before another starts, and the output looks like sdkljasjdkljsad. I want the lines themselves to stay intact, in some sort of chaotic order, for example:
Thread 1 Vector[0]
Thread 2 Vector[0]
Thread 1 Vector[1]
Thread 3 Vector[0]
Thread 4 Vector[0]
Thread 2 Vector[1]
Rather than:
Thread 1 Thread 2 Vector[0] Vector[0]
Thread 2 Vector[1]
Thread 1 Thread 4 Vector[1] Thread 3 Vector[0] Vector[1]
How can I solve this problem? P.S. The data file is simply a list of player names with weight and bench press, one player per line. I transform these into strings and place them in a vector (yeah, sounds dumb, but I'm just fulfilling a task).
#include "stdafx.h"
#include <iostream>
#include <string>
#include <fstream>
#include <vector>
#include <string>
#include <thread>
#include <sstream>
#include <iomanip>
#include <boost/thread.hpp>
#include <boost/bind.hpp>
using namespace std;
vector<string> Kategorijos;
vector< vector<string> > Zaidejai;
ifstream duom("duom.txt");
string precision(double a) {
ostringstream out;
out << setprecision(6) << a;
return out.str();
}
void read() {
string tempKat;
int tempZaidSk;
vector<string> tempZaid;
string vardas;
int svoris;
double pakeltasSvoris;
while (duom >> tempKat >> tempZaidSk) {
Kategorijos.push_back(tempKat);
for (int i = 0; i < tempZaidSk; i++) {
duom >> vardas >> svoris >> pakeltasSvoris;
tempZaid.push_back(vardas + " " + to_string(svoris) + " " + precision(pakeltasSvoris));
}
Zaidejai.push_back(tempZaid);
tempZaid.clear();
}
duom.close();
}
void writethreads(int a) {
int pNr = a+1;
for (int i = 0; i < (int)Zaidejai[a].size(); i++) {
cout << endl << "Proceso nr: " << pNr << " " << i << ": " << Zaidejai[a][i] ;
}
}
void print() {
for (int i = 0; i < (int)Kategorijos.size(); i++) {
cout << "*** " << Kategorijos[i] << " ***" << endl;
for (int j = 0; j < (int)Zaidejai[i].size(); j++) {
cout << j+1<<") "<< Zaidejai[i][j] << endl;
}
cout << endl;
}
cout << "-------------------------------------------------------------------" << endl;
}
int main()
{
read();
print();
boost::thread_group threads;
for (int i = 0; i < (int)Kategorijos.size(); i++) {
threads.create_thread(boost::bind(writethreads, i));
}
threads.join_all();
system("pause");
return 0;
}
Welcome to the problem of thread synchronization! When only one thread can use a resource at a time, the lock you use to control that resource is a mutex. You can also have each thread store its data and output it at the end, or you can have the threads sync up at a barrier.
You can synchronise the console writes with an appropriate mutex. But in this case, where the work is just console output, maybe don't use threads at all. Otherwise, send the printing to a dedicated thread that deals with it.
The alternative to chaining the usual overloaded operator << on cout is to write the content to a local buffer or stringstream (including the newline) and then, with a single function call, write that to the console. A single function call makes it far more likely that the console receives one buffer's contents at a time.
I just got started with the C++11 threading library (and multithreading in general), and wrote a short snippet of code.
#include <iostream>
#include <thread>
int x = 5; //variable to be affected by race
//This function will be called from a thread
void call_from_thread1() {
for (int i = 0; i < 5; i++) {
x++;
std::cout << "In Thread 1 :" << x << std::endl;
}
}
int main() {
//Launch a thread
std::thread t1(call_from_thread1);
for (int j = 0; j < 5; j++) {
x--;
std::cout << "In Thread 0 :" << x << std::endl;
}
//Join the thread with the main thread
t1.join();
std::cout << x << std::endl;
return 0;
}
I was expecting to get different results every time (or nearly every time) I ran this program, due to the race between the two threads. However, the output is always: 0, i.e. the two threads run as if they ran sequentially. Why am I getting the same result, and is there any way to simulate or force a race between two threads?
Your sample size is rather small, and the loop somewhat self-stalls on the continuous stdout flushes. In short, you need a bigger hammer.
If you want to see a real race condition in action, consider the following. I purposely added an atomic and a non-atomic counter, passing both to the threads of the sample. Some test-run results are posted after the code:
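#include <algorithm>  // std::generate_n, std::for_each
#include <atomic>
#include <functional> // std::ref
#include <iostream>
#include <iterator>   // std::back_inserter
#include <thread>
#include <vector>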
void racer(std::atomic_int& cnt, int& val)
{
for (int i=0;i<1000000; ++i)
{
++val;
++cnt;
}
}
int main(int argc, char *argv[])
{
unsigned int N = std::thread::hardware_concurrency();
std::atomic_int cnt = ATOMIC_VAR_INIT(0);
int val = 0;
std::vector<std::thread> thrds;
std::generate_n(std::back_inserter(thrds), N,
[&cnt,&val](){ return std::thread(racer, std::ref(cnt), std::ref(val));});
std::for_each(thrds.begin(), thrds.end(),
[](std::thread& thrd){ thrd.join();});
std::cout << "cnt = " << cnt << std::endl;
std::cout << "val = " << val << std::endl;
return 0;
}
Some sample runs from the above code:
cnt = 4000000
val = 1871016
cnt = 4000000
val = 1914659
cnt = 4000000
val = 2197354
Note that the atomic counter is accurate (I'm running on a dual-core i7 MacBook Air with hyper-threading, so 4 threads, thus 4 million). The same cannot be said for the non-atomic counter.
There will be significant startup overhead to get the second thread going, so its execution will almost always begin after the first thread has finished the for loop, which by comparison takes almost no time at all. To see a race condition you will need to run a computation that takes much longer, or that includes I/O or other operations that take significant time, so that the execution of the two computations actually overlaps.
I'm trying to write a small piece of code that just makes using CreateThread() slightly more clean looking. I can't say that I'm really intending on using it, but I thought it would be a fun, small project for a newer programmer like myself. Here is what I have so far:
#include <iostream>
#include <windows.h>
using namespace std;
void _noarg_thread_create( void(*f)() )
{
cout << "Thread created...\n" << endl;
Sleep(10);
CreateThread(NULL, 0, (LPTHREAD_START_ROUTINE)f, NULL, 0, NULL);
}
template <typename T>
void _arg_thread_create( void(*f)(T), T* parameter)
{
cout << "Thread created...\n" << endl;
Sleep(10);
CreateThread(NULL, 0, (LPTHREAD_START_ROUTINE)*f, parameter, 0, NULL);
}
void printnums(int x)
{
for(int i = x; i < 100; i++)
{
cout << i << endl;
}
}
void printnumsnoarg()
{
for(int i = 0; i < 100; i++)
{
cout << i << endl;
}
}
int main()
{
cin.get();
_noarg_thread_create( &printnumsnoarg );
cin.get();
int x = 14;
_arg_thread_create( &printnums, &x );
cin.get();
}
Basically I have two functions that will call two different presets for CreateThread: one for when a parameter is needed in the thread, and one for when no parameter is needed in the thread. I can compile this with the g++ compiler (cygwin), and it runs without any errors. The first thread gets created properly and it prints out the numbers 0-99 as expected. However, the second thread does not print out any numbers (with this code, it should print 14-99). My output looks like this:
<start of output>
$ ./a.exe
Thread created...
0
1
2
3
.
.
.
97
98
99
Thread Created...
<end of output>
Any ideas why the second thread isn't working right?
You seem to have missed that CreateThread passes a pointer to your printnums(int x) function: the thread receives the address of x, not its value. As the address of x in your main function is a number a lot bigger than 100, your loop never runs.
You should try to change printnums to:
void printnums(int *x)
{
for(int i = *x; i < 100; i++)
{
cout << i << endl;
}
}
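With that change (and with the template's first parameter declared as void(*f)(T*), so that T still deduces to int from both arguments), I guess everything will work as expected.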