In the following code, threadCount is one of 1, 2, 3, 4. In the output, the string part prints fine, but the number is randomly missing and sometimes gets appended to a later line.
void *SPWork(void *t)
{
    int* threadC = (int*)t;
    int threadCount = *threadC;
    cout << "\n Thread count" << threadCount << endl;
    cout << flush;

    long long int i, adjustedIterationCount;
    adjustedIterationCount = 100/(threadCount);
    for (i = 0; i < adjustedIterationCount; i++)
    {
        i++;
    }
    pthread_exit((void*) t);
}
Output
......
.....
Thread count1
Thread count1
Thread count2
Thread count1
Thread count
Thread count
Thread count234
.....
.....
Notice that in the last line the thread value is 234, but that value can never be 234. In the previous two lines the value didn't get appended, so the 2 and 3 ended up on this line.
I know it has something to do with flushing or appending "\n"; I've tried many combinations, but the issue persists.
N.B. This is a worker function of a pthread; the compiler flags are "-g -Wall -O3 -lpthread".
While the standard streams are guaranteed to be thread-safe, there is no guarantee that the output won't be interleaved. If you want to print to a standard stream from multiple threads in a predictable way, you will need to do some synchronization yourself:
#include <mutex>

std::mutex cout_mutex;

void *SPWork(void *t)
{
    //...
    {
        std::lock_guard<std::mutex> guard(cout_mutex);
        std::cout << "\n Thread count" << threadCount << std::endl;
    }
    //...
}
There is no requirement that your calls to cout be atomic. If you need them to be, you can simply protect the code (just the output code) with a mutex.
In addition, injecting std::endl into the stream already flushes the data so there's little point in following that with a std::flush.
So, in its simplest form:
pthread_mutex_lock(&myMutex);
std::cout << "\n Thread count" << threadCount << std::endl;
pthread_mutex_unlock(&myMutex);
Note that, for recent C++ implementations, it's probably better to use std::mutex and std::lock_guard, since they guarantee correct clean-up (see other answers for this). Since you have pthread_exit() in your code, I assume you're limited to the POSIX threading model.
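For completeness, here is a minimal sketch of the same RAII idea applied to a pthread_mutex_t (PthreadGuard and myMutex are illustrative names, not a standard API):

#include <pthread.h>

pthread_mutex_t myMutex = PTHREAD_MUTEX_INITIALIZER;

// Illustrative RAII wrapper: the destructor unlocks even if the
// guarded code returns or throws before reaching the unlock.
struct PthreadGuard {
    pthread_mutex_t &m;
    explicit PthreadGuard(pthread_mutex_t &mtx) : m(mtx) { pthread_mutex_lock(&m); }
    ~PthreadGuard() { pthread_mutex_unlock(&m); }
    PthreadGuard(const PthreadGuard&) = delete;
    PthreadGuard& operator=(const PthreadGuard&) = delete;
};

SPWork can then wrap just the output statement in a PthreadGuard scope and get the same clean-up guarantee as std::lock_guard.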
Related
I know that std::map is not thread-safe when one thread reads and another writes. But is it OK to insert from multiple threads?
void writeMap()
{
    for (int i = 0; i < 1000; i++)
    {
        long long random_variable = (std::rand()) % 1000;
        std::cout << "Thread ID -> " << std::this_thread::get_id() << " with looping index " << i << std::endl;
        k1map.insert(std::make_pair(i, new p(i)));
    }
}

int main()
{
    std::srand((int)std::time(0));
    for (int i = 0; i < 1000; ++i)
    {
        long long random_variable = (std::rand()) % 1000;
        std::thread t(writeMap);
        std::cout << "Thread created " << t.get_id() << std::endl;
        t.detach();
    }
    return 0;
}
Code like this runs normally no matter how many times I try it.
The program is complex and, to some extent, works like magic (LOL).
The same code gives different results with different toolchains.
Before, I used VS2013, and it was always right.
But on VS2019 and Linux, the result of the same code is wrong.
Maybe VS2013's implementation of std::map does something special.
No, std::map::insert is not thread-safe.
Most standard library types are thread-safe only if you use separate object instances in separate threads. Take a look at the thread-safety section of the container's docs.
As #NutCracker has mentioned, std::map::insert is not thread-safe.
But if the posted code seems to work, I think the reason is that the map is filled very quickly by one thread, and as a result the other threads no longer modify it.
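If you do need several threads inserting into the same map, a minimal sketch is to serialize the inserts with a mutex. The declarations of k1map and p below are guesses at the question's types, and k1map_mutex is an illustrative name:

#include <map>
#include <mutex>
#include <utility>

struct p { int v; explicit p(int i) : v(i) {} };  // stand-in for the question's p

std::map<int, p*> k1map;   // assumed shape of the question's map
std::mutex k1map_mutex;    // illustrative name

void writeMap()
{
    for (int i = 0; i < 1000; i++)
    {
        // Only one thread at a time may touch the map.
        std::lock_guard<std::mutex> guard(k1map_mutex);
        k1map.insert(std::make_pair(i, new p(i)));
    }
}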
I am trying to create a proof of concept for inter-thread communication by means of shared state: the main thread creates worker threads giving each a separate vector by reference, lets each do its work and fill its vector with results, and finally collects the results.
However, weird things are happening for which I can't find an explanation other than some race between the initialization of the vectors and the launch of the worker threads. Here is the code.
#include <iostream>
#include <vector>
#include <thread>

class Case {
public:
    int val;
    Case(int i) : val(i) {}
};

void
run_thread(std::vector<Case*> &case_list, int idx)
{
    std::cout << "size in thread " << idx << ": " << case_list.size() << '\n';
    for (int i = 0; i < 10; i++) {
        case_list.push_back(new Case(i));
    }
}

int
main(int argc, char **argv)
{
    int nthrd = 3;
    std::vector<std::thread> threads;
    std::vector<std::vector<Case*>> case_lists;

    for (int i = 0; i < nthrd; i++) {
        case_lists.push_back(std::vector<Case*>());
        std::cout << "size of " << i << " in main:" << case_lists[i].size() << '\n';
        threads.push_back(std::thread(run_thread, std::ref(case_lists[i]), i));
    }
    std::cout << "All threads lauched.\n";

    for (int i = 0; i < nthrd; i++) {
        threads[i].join();
        for (const auto cp : case_lists[i]) {
            std::cout << cp->val << '\n';
        }
    }
    return 0;
}
Tested on repl.it (gcc 4.6.3), the program gives the following result:
size of 0 in main:0
size of 1 in main:0
size of 2 in main:0
All threads lauched.
size in thread 0: 18446744073705569740
size in thread 2: 0
size in thread 1: 0
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
exit status -1
On my computer, besides something like the above, I also get:
Segmentation fault (core dumped)
It appears thread 0 is getting a vector that hasn't been initialized, although the vector appears properly initialized in main.
To isolate the problem, I have tried going single threaded by changing the line:
threads.push_back( std::thread( run_thread, std::ref(case_lists[i]), i) );
to
run_thread(case_lists[i], i);
and commenting out:
threads[i].join();
Now the program runs as expected, with the "threads" running one after another before the main collects the results.
My question is: what is wrong with the multi-threaded version above?
References (and iterators) for a vector are invalidated any time the capacity of the vector changes. The exact rules for overallocation vary by implementation, but odds are, you've got at least one capacity change between the first push_back and the last, and all the references made before that final capacity increase are garbage the moment it occurs, invoking undefined behavior.
Either reserve your total vector size up front (so push_backs don't cause capacity increases), initialize the whole vector to the final size up front (so no resizes occur at all), or have one loop populate completely, then launch the threads (so all resizes occur before you extract any references). The simplest fix here would be to initialize it to the final size, changing:
std::vector<std::vector<Case*>> case_lists;
for (int i = 0; i < nthrd; i++) {
    case_lists.push_back(std::vector<Case*>());
    std::cout << "size of " << i << " in main:" << case_lists[i].size() << '\n';
    threads.push_back(std::thread(run_thread, std::ref(case_lists[i]), i));
}
to:
std::vector<std::vector<Case*>> case_lists(nthrd); // Default initialize nthrd elements up front
for (int i = 0; i < nthrd; i++) {
    // No push_back needed
    std::cout << "size of " << i << " in main:" << case_lists[i].size() << '\n';
    threads.push_back(std::thread(run_thread, std::ref(case_lists[i]), i));
}
You might be thinking that vectors would overallocate fairly aggressively, but at least on many popular compilers, this is not the case; both gcc and clang follow a strict doubling pattern, so the first three insertions reallocate every time (capacity goes from 1, to 2, to 4); the reference to the first element is invalidated by the insertion of the second, and the reference to the second is invalidated by the insertion of the third.
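If you prefer to keep the push_back loop, the reserve-based alternative mentioned above would look like this (reusing nthrd, threads, and run_thread from the question); since capacity is fixed up front and at most nthrd elements are pushed, no push_back ever reallocates and the references stay valid:

std::vector<std::vector<Case*>> case_lists;
case_lists.reserve(nthrd); // capacity set once; the push_backs below never reallocate
for (int i = 0; i < nthrd; i++) {
    case_lists.push_back(std::vector<Case*>());
    threads.push_back(std::thread(run_thread, std::ref(case_lists[i]), i));
}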
I'm a beginner at using threads in C++. I've read the basics about std::thread and mutexes, and I think I understand the purpose of using mutexes.
I decided to check if threads are really so dangerous without mutexes (Well I believe books but prefer to see it with my own eyes). As a testcase of "what I shouldn't do in future" I created 2 versions of the same concept: there are 2 threads, one of them increments a number several times (NUMBER_OF_ITERATIONS), another one decrements the same number the same number of times, so we expect to see the same number after the code is executed as before it. The code is attached.
At first I run 2 threads which do it in unsafe manner - without any mutexes, just to see what can happen. And after this part is finished I run 2 threads which do the same thing but in safe manner (with mutexes).
Expected results: without mutexes the result can differ from the initial value, because the data can be corrupted when two threads work with it simultaneously. This is especially likely for a huge NUMBER_OF_ITERATIONS, because the probability of corrupting the data is higher. That result I can understand.
I also measured the time spent by both the "safe" and "unsafe" parts. For a huge number of iterations the safe part takes much more time than the unsafe one, as I expected: there is some time spent on the mutex checks. But for small numbers of iterations (400, 4000) the safe part's execution time is less than the unsafe one's. Why is that possible? Is it something the operating system does? Or is there some compiler optimization I'm not aware of? I spent some time thinking about it and decided to ask here.
I use Windows and the MSVS12 compiler.
Thus the question is: why can the safe part execute faster than the unsafe one (for small NUMBER_OF_ITERATIONS < 1000*n)?
And another: why does it depend on NUMBER_OF_ITERATIONS: for smaller values (4000) the "safe" part with mutexes is faster, but for huge ones (400000) the "safe" part is slower?
main.cpp
#include <iostream>
#include <vector>
#include <thread>
#include <mutex>
#include <windows.h>

/// change number of iterations for different results
const long long NUMBER_OF_ITERATIONS = 400;

/// time check counter
class Counter {
    double PCFreq_ = 0.0;
    __int64 CounterStart_ = 0;
public:
    Counter() {
        LARGE_INTEGER li;
        if (!QueryPerformanceFrequency(&li))
            std::cerr << "QueryPerformanceFrequency failed!\n";
        PCFreq_ = double(li.QuadPart) / 1000.0;
        QueryPerformanceCounter(&li);
        CounterStart_ = li.QuadPart;
    }
    double GetCounter() {
        LARGE_INTEGER li;
        QueryPerformanceCounter(&li);
        return double(li.QuadPart - CounterStart_) / PCFreq_;
    }
};

/// "dangerous" functions for unsafe threads: increment and decrement number
void incr(long long* j) {
    for (long long i = 0; i < NUMBER_OF_ITERATIONS; i++) (*j)++;
    std::cout << "incr finished" << std::endl;
}

void decr(long long* j) {
    for (long long i = 0; i < NUMBER_OF_ITERATIONS; i++) (*j)--;
    std::cout << "decr finished" << std::endl;
}

/// class for safe thread operations with increment and decrement
template<typename T>
class Safe_number {
public:
    Safe_number(int i) { number_ = T(i); }
    Safe_number(long long i) { number_ = T(i); }
    bool inc() {
        if (m_.try_lock()) {
            number_++;
            m_.unlock();
            return true;
        }
        else
            return false;
    }
    bool dec() {
        if (m_.try_lock()) {
            number_--;
            m_.unlock();
            return true;
        }
        else
            return false;
    }
    T val() { return number_; }
private:
    T number_;
    std::mutex m_;
};

template<typename T>
void incr(Safe_number<T>* n) {
    long long i = 0;
    while (i < NUMBER_OF_ITERATIONS) {
        if (n->inc()) i++;
    }
    std::cout << "incr <T> finished" << std::endl;
}

template<typename T>
void decr(Safe_number<T>* n) {
    long long i = 0;
    while (i < NUMBER_OF_ITERATIONS) {
        if (n->dec()) i++;
    }
    std::cout << "decr <T> finished" << std::endl;
}

using namespace std;

// run increments and decrements of the same number
// in threads in "safe" and "unsafe" way
int main()
{
    // init numbers to 0
    long long number = 0;
    Safe_number<long long> sNum(number);
    Counter cnt; // init time counter

    // run 2 unsafe threads for ++ and --
    std::thread t1(incr, &number);
    std::thread t2(decr, &number);
    t1.join();
    t2.join();
    // check time of execution of unsafe part
    double time1 = cnt.GetCounter();
    cout << "finished first thr" << endl;

    // run 2 safe threads for ++ and --, now we expect final value 0
    std::thread t3(incr<long long>, &sNum);
    std::thread t4(decr<long long>, &sNum);
    t3.join();
    t4.join();
    // check time of execution of safe part
    double time2 = cnt.GetCounter() - time1;

    cout << "unsafe part, number = " << number << " time1 = " << time1 << endl;
    cout << "safe part, Safe number = " << sNum.val() << " time2 = " << time2 << endl << endl;
    return 0;
}
You should not draw conclusions about the speed of any given algorithm if the input size is very small. What defines "very small" can be kind of arbitrary, but on modern hardware, under usual conditions, "small" can refer to any collection size less than a few hundred thousand objects, and "large" can refer to any collection larger than that.
Obviously, Your Mileage May Vary.
In this case, the overhead of constructing threads, which is usually slow and also rather inconsistent, could be a larger factor in the speed of your code than what the actual algorithm is doing. It's also possible that the compiler can perform some powerful optimizations on smaller input sizes (which it can know about, since the input size is hard-coded into the program) that it cannot perform on larger inputs.
The broader point is that you should always prefer larger inputs when testing algorithm speed, and have the program repeat its tests (preferably in random order!) to smooth out irregularities in the timings.
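As a rough illustration of that advice, here is a minimal, portable timing sketch using std::chrono; time_ms is a hypothetical helper, and the lambda body stands in for one variant of the benchmark above:

#include <chrono>
#include <iostream>

// Hypothetical helper: time one call of f in milliseconds.
template <typename F>
double time_ms(F&& f)
{
    auto start = std::chrono::steady_clock::now();
    f();
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(end - start).count();
}

int main()
{
    double total = 0.0;
    for (int rep = 0; rep < 100; ++rep)   // repeat to smooth out noise
        total += time_ms([] { /* run one variant of the benchmark here */ });
    std::cout << "average: " << total / 100 << " ms\n";
}

Averaging many repetitions makes one-off costs like thread construction far less likely to dominate the measurement.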
I have a search problem which I want to parallelize. If one thread has found a solution, I want all other threads to stop. Otherwise, if all threads exit regularly, I know that there is no solution.
The following code (which demonstrates my cancelling strategy) seems to work, but I'm not sure if it is safe and the most efficient variant:
#include <iostream>
#include <thread>
#include <cstdint>
#include <chrono>

using namespace std;

struct action {
    uint64_t* ii;

    action(uint64_t *ii) : ii(ii) {};

    void operator()() {
        uint64_t k = 0;
        for (; k < *ii; ++k) {
            // do something useful
        }
        cout << "counted to " << k << " in 2 seconds" << endl;
    }

    void cancel() {
        *ii = 0;
    }
};

int main(int argc, char** argv) {
    uint64_t ii = 1000000000;
    action a{&ii};
    thread t(a);
    cout << "start sleeping" << endl;
    this_thread::sleep_for(chrono::milliseconds(2000));
    cout << "finished sleeping" << endl;
    a.cancel();
    cout << "cancelled" << endl;
    t.join();
    cout << "joined" << endl;
}
Can I be sure that the value ii points to always gets properly reloaded? Is there a more efficient variant that doesn't require a dereference at every step? I tried making the upper bound of the loop a member variable, but since the constructor of thread copies the instance of action, I wouldn't have access to that member later.
Also: if my code is exception-safe and does no I/O (and I am sure that my platform is Linux), is there a reason not to use pthread_cancel on the native thread?
No, there's no guarantee that this will do anything sensible. The code has one thread reading the value of ii and another thread writing to it, without any synchronization. The result is that the behavior of the program is undefined.
I'd just add a flag to the class:
std::atomic<bool> time_to_stop;
The constructor of action should set that to false, and the cancel member function should set it to true. Then change the loop to look at that value:
for(; !time_to_stop && k < *ii; ++k)
You might, instead, make ii atomic. That would work, but it wouldn't be as clear as having a named member to look at.
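Putting that together, a sketch of the modified action (keeping the question's structure; note that a std::atomic member makes the struct non-copyable, so the thread must now receive the action via std::ref instead of by copy, which also sidesteps the copying issue raised in the question):

#include <atomic>
#include <cstdint>
#include <thread>

struct action {
    uint64_t* ii;
    std::atomic<bool> time_to_stop;

    action(uint64_t *ii) : ii(ii), time_to_stop(false) {}

    void operator()() {
        uint64_t k = 0;
        // Check the flag on every iteration; cancel() makes it true.
        for (; !time_to_stop && k < *ii; ++k) {
            // do something useful
        }
    }

    void cancel() { time_to_stop = true; }
};

// Usage: action a{&ii}; std::thread t(std::ref(a)); ... a.cancel(); t.join();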
First off, there is no reason to make ii a pointer; you can have it as a plain uint64_t.
Secondly, if you have multiple threads and at least one of them writes to a shared variable, you need some sort of synchronization. In this case you could just use std::atomic<uint64_t> to get it; otherwise you would have to use a mutex or some sort of memory fence.
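Under that approach, a minimal sketch (dropping the pointer entirely, as suggested):

#include <atomic>
#include <cstdint>

struct action {
    std::atomic<uint64_t> ii{1000000000};  // the bound itself, no pointer

    void operator()() {
        // ii is re-read each iteration, with the required synchronization.
        for (uint64_t k = 0; k < ii; ++k) {
            // do something useful
        }
    }

    void cancel() { ii = 0; }
};

// As with the flag version, std::atomic makes action non-copyable,
// so launch it with std::thread t(std::ref(a));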
My code:
#include <iostream>
#include <thread>

void function_1()
{
    std::cout << "Thread t1 started!\n";
    for (int j = 0; j > -100; j--) {
        std::cout << "t1 says: " << j << "\n";
    }
}

int main()
{
    std::thread t1(function_1); // t1 starts running
    for (int i = 0; i < 100; i++) {
        std::cout << "from main: " << i << "\n";
    }
    t1.join(); // main thread waits for t1 to finish
    return 0;
}
I create a thread that prints numbers in decreasing order while main prints in increasing order.
Sample output here. Why is my code printing garbage?
Both threads are outputting at the same time, thereby scrambling your output.
You need some kind of thread synchronization mechanism on the printing part.
See this answer for an example using a std::mutex combined with std::lock_guard for cout.
It's not "garbage" — it's the output you asked for! It's just jumbled up, because you have used a grand total of zero synchronisation mechanisms to prevent individual std::cout << ... << std::endl lines (which are not atomic) from being interrupted by similar lines (which are still not atomic) in the other thread.
Traditionally we'd lock a mutex around each of those lines.
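A minimal sketch of that traditional fix applied to the question's code (print_mutex is an illustrative name); each full line is now written under the lock, so lines from the two threads can no longer interleave mid-line:

#include <iostream>
#include <mutex>
#include <thread>

std::mutex print_mutex;   // illustrative name

void function_1()
{
    for (int j = 0; j > -100; j--) {
        std::lock_guard<std::mutex> guard(print_mutex);  // one full line at a time
        std::cout << "t1 says: " << j << "\n";
    }
}

int main()
{
    std::thread t1(function_1);
    for (int i = 0; i < 100; i++) {
        std::lock_guard<std::mutex> guard(print_mutex);  // same lock as function_1
        std::cout << "from main: " << i << "\n";
    }
    t1.join();
    return 0;
}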