Why is my program printing garbage? - c++

My code:
#include <iostream>
#include <thread>
void function_1()
{
std::cout << "Thread t1 started!\n";
for (int j=0; j>-100; j--) {
std::cout << "t1 says: " << j << "\n";
}
}
int main()
{
std::thread t1(function_1); // t1 starts running
for (int i=0; i<100; i++) {
std::cout << "from main: " << i << "\n";
}
t1.join(); // main thread waits for t1 to finish
return 0;
}
I create a thread that prints numbers in decreasing order while main prints in increasing order.
Sample output here. Why is my code printing garbage?

Both threads are outputting at the same time, thereby scrambling your output.
You need some kind of thread synchronization mechanism on the printing part.
See this answer for an example using a std::mutex combined with std::lock_guard for cout.

It's not "garbage" — it's the output you asked for! It's just jumbled up, because you have used a grand total of zero synchronisation mechanisms to prevent individual std::cout << ... << std::endl lines (which are not atomic) from being interrupted by similar lines (which are still not atomic) in the other thread.
Traditionally we'd lock a mutex around each of those lines.
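As a rough sketch of that locking, the two loops from the question could share one mutex that is held for each complete line. The names out_mutex, count_lines and run_both are made up for this illustration, and a std::ostringstream stands in for std::cout so the result can be inspected; the same pattern should apply to std::cout.

```cpp
#include <mutex>
#include <ostream>
#include <sstream>
#include <string>
#include <thread>

// Hypothetical names, for illustration only.
std::mutex out_mutex; // shared by every thread that writes to the stream

// Writes `steps` lines to `out`, holding the lock for one whole line at a time.
void count_lines(std::ostream& out, const char* who, int step, int steps)
{
    for (int n = 0; n < steps; ++n) {
        std::lock_guard<std::mutex> lock(out_mutex); // released at the end of each iteration
        out << who << n * step << "\n";
    }
}

// Runs the two loops from the question concurrently and returns all output.
std::string run_both(int steps)
{
    std::ostringstream out; // assumption: stands in for std::cout
    std::thread t1([&out, steps] { count_lines(out, "t1 says: ", -1, steps); });
    count_lines(out, "from main: ", 1, steps);
    t1.join(); // wait for t1 before reading the result
    return out.str();
}
```

The lines of the two threads can still alternate in any order, but each line comes out whole.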

Related

threaded for loop reaching values it's not supposed to reach

I'm messing around with multithreading in c++ and here is my code:
#include <iostream>
#include <vector>
#include <string>
#include <thread>
void read(int i);
bool isThreadEnabled;
std::thread threads[100];
int main()
{
isThreadEnabled = true; // I change this to compare the threaded vs non threaded method
if (isThreadEnabled)
{
for (int i = 0;i < 100;i++) //this for loop is what I'm confused about
{
threads[i] = std::thread(read,i);
}
for (int i = 0; i < 100; i++)
{
threads[i].join();
}
}
else
{
for (int i = 0; i < 100; i++)
{
read(i);
}
}
}
void read(int i)
{
int w = 0;
while (true) // wasting cpu cycles to actually see the difference between the threaded and non threaded
{
++w;
if (w == 100000000) break;
}
std::cout << i << std::endl;
}
In the for loop that uses threads, the console prints the values in a random order (e.g. 5, 40, 26...), which is expected and totally fine, since threads don't run in the order in which they were started.
What confuses me is that some of the printed values are larger than the maximum value that i can reach (99): values like 8000, 2032, 274... are also printed to the console, even though i never reaches those numbers. I don't understand why.
This line:
std::cout << i << std::endl;
is actually equivalent to
std::cout << i;
std::cout << std::endl;
So while this is thread safe (meaning there's no undefined behaviour), the order in which the individual writes from different threads execute is unspecified. Given two threads, the following execution is possible:
T20: std::cout << 20
T32: std::cout << 32
T20: std::cout << std::endl
T32: std::cout << std::endl
which results in 2032 in console (glued numbers) and an empty line.
The simplest (not necessarily the best) fix for that is to wrap this line with a shared mutex:
{
std::lock_guard lg { mutex };
std::cout << i << std::endl;
}
(the brackets for a separate scope are not needed if the std::cout << i << std::endl; is the last line in the function)

how to generate many threads by std::thread?

As we all know, we can create one thread with std::thread t1(func);
But how can we create 20 threads using a vector?
An example solution would be:
std::vector<std::thread> my_threads{};
my_threads.reserve(20);
for(int i = 0; i < 20; i++)
my_threads.emplace_back([i]{
std::cout << "[" << i << "] Going to sleep\n";
std::this_thread::sleep_for(std::chrono::seconds{1});
std::cout << "[" << i << "] Hey I'm back :)\n";
});
for(auto& thread : my_threads)
if(thread.joinable())
thread.join();
Pay attention to the last three lines.
If a std::thread is destroyed while it is still joinable (neither joined nor detached), its destructor calls std::terminate and your program aborts.
This prevents your application from leaking unmanaged threads.
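The same launch-then-join pattern can be wrapped in a function that also collects a result from each thread. This is only a sketch: run_workers is a made-up helper, and squaring the index is a placeholder for real work.

```cpp
#include <thread>
#include <vector>

// Hypothetical helper: runs `count` workers, each writing to its own slot.
std::vector<int> run_workers(int count)
{
    std::vector<int> results(count);    // sized up front, so elements never move
    std::vector<std::thread> workers;
    workers.reserve(count);

    for (int i = 0; i < count; ++i)
        workers.emplace_back([&results, i] { results[i] = i * i; }); // placeholder work

    // Every std::thread must be joined (or detached) before it is destroyed;
    // otherwise its destructor calls std::terminate.
    for (auto& worker : workers)
        if (worker.joinable())
            worker.join();

    return results; // safe to read: all workers have finished
}
```

Because each worker touches only its own element and the vector is never resized while the threads run, no mutex is needed here.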

C++11 vector argument to thread appears uninitialized

I am trying to create a proof of concept for inter-thread communication by means of shared state: the main thread creates worker threads giving each a separate vector by reference, lets each do its work and fill its vector with results, and finally collects the results.
However, weird things are happening for which I can't find an explanation other than some race between the initialization of the vectors and the launch of the worker threads. Here is the code.
#include <iostream>
#include <vector>
#include <thread>
class Case {
public:
int val;
Case(int i):val(i) {}
};
void
run_thread (std::vector<Case*> &case_list, int idx)
{
std::cout << "size in thread " << idx <<": " << case_list.size() << '\n';
for (int i=0; i<10; i++) {
case_list.push_back(new Case(i));
}
}
int
main(int argc, char **argv)
{
int nthrd = 3;
std::vector<std::thread> threads;
std::vector<std::vector<Case*>> case_lists;
for (int i=0; i<nthrd; i++) {
case_lists.push_back(std::vector<Case*>());
std::cout << "size of " << i << " in main:" << case_lists[i].size() << '\n';
threads.push_back( std::thread( run_thread, std::ref(case_lists[i]), i) );
}
std::cout << "All threads lauched.\n";
for (int i=0; i<nthrd; i++) {
threads[i].join();
for (const auto cp:case_lists[i]) {
std::cout << cp->val << '\n';
}
}
return 0;
}
Tested on repl.it (gcc 4.6.3), the program gives the following result:
size of 0 in main:0
size of 1 in main:0
size of 2 in main:0
All threads lauched.
size in thread 0: 18446744073705569740
size in thread 2: 0
size in thread 1: 0
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
exit status -1
On my computer, besides something like the above, I also get:
Segmentation fault (core dumped)
It appears thread 0 is getting a vector that hasn't been initialized, although the vector appears properly initialized in main.
To isolate the problem, I have tried going single threaded by changing the line:
threads.push_back( std::thread( run_thread, std::ref(case_lists[i]), i) );
to
run_thread(case_lists[i], i);
and commenting out:
threads[i].join();
Now the program runs as expected, with the "threads" running one after another before the main collects the results.
My question is: what is wrong with the multi-threaded version above?
References (and iterators) for a vector are invalidated any time the capacity of the vector changes. The exact rules for overallocation vary by implementation, but odds are, you've got at least one capacity change between the first push_back and the last, and all the references made before that final capacity increase are garbage the moment it occurs, invoking undefined behavior.
Either reserve your total vector size up front (so push_backs don't cause capacity increases), initialize the whole vector to the final size up front (so no resizes occur at all), or have one loop populate completely, then launch the threads (so all resizes occur before you extract any references). The simplest fix here would be to initialize it to the final size, changing:
std::vector<std::vector<Case*>> case_lists;
for (int i=0; i<nthrd; i++) {
case_lists.push_back(std::vector<Case*>());
std::cout << "size of " << i << " in main:" << case_lists[i].size() << '\n';
threads.push_back( std::thread( run_thread, std::ref(case_lists[i]), i) );
}
to:
std::vector<std::vector<Case*>> case_lists(nthrd); // Default initialize nthrd elements up front
for (int i=0; i<nthrd; i++) {
// No push_back needed
std::cout << "size of " << i << " in main:" << case_lists[i].size() << '\n';
threads.push_back( std::thread( run_thread, std::ref(case_lists[i]), i) );
}
You might be thinking that vectors would overallocate fairly aggressively, but at least on many popular compilers, this is not the case; both gcc and clang follow a strict doubling pattern, so the first three insertions reallocate every time (capacity goes from 1, to 2, to 4); the reference to the first element is invalidated by the insertion of the second, and the reference to the second is invalidated by the insertion of the third.
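To see the invalidation directly, one can record the address of the first element and compare it after more elements have been appended. The helper below is a hypothetical sketch, not part of the question's code; the address is stored as an integer so that no invalidated pointer is used afterwards.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: reports whether element 0 changed address while
// `count` elements were appended one at a time.
bool first_element_moved(bool reserve_first, int count)
{
    std::vector<int> values;
    if (reserve_first)
        values.reserve(count);  // capacity is fixed before any address is taken

    values.push_back(0);
    // Keep the address as an integer so it can be compared after a reallocation.
    const auto before = reinterpret_cast<std::uintptr_t>(&values[0]);

    for (int i = 1; i < count; ++i)
        values.push_back(i);    // reallocates whenever the capacity is exceeded

    const auto after = reinterpret_cast<std::uintptr_t>(&values[0]);
    return before != after;
}
```

With reserve_first set to true the function returns false, because no reallocation can occur within the reserved capacity. Without the reserve it is likely to return true on common implementations, though the exact growth policy is not specified by the standard.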

cout not printing string and variable value consistently , misaligning the output

In the following code, threadCount is one of 1, 2, 3, 4. In the output, the string part is printed perfectly, but the number is randomly missing and at times gets appended a few lines later.
void *SPWork(void *t)
{
int* threadC = (int*)t;
int threadCount = *threadC;
cout<<"\n Thread count" << threadCount << endl;
cout << flush;
long long int i, adjustedIterationCount;
adjustedIterationCount = 100/(threadCount);
for (i=0; i< adjustedIterationCount; i++)
{
i++ ;
}
pthread_exit((void*) t);
}
Output
......
.....
Thread count1
Thread count1
Thread count2
Thread count1
Thread count
Thread count
Thread count234
.....
.....
Notice that in the last line the thread value is 234, but that value will never be 234. In the previous two lines the value wasn't appended, so 2 and 3 got added to this line.
I know it has to do with flush or appending "\n", and I have tried many combinations, but the issue still persists.
N.B. This is a worker method of a pthread, compiler flags are "-g -Wall -O3 -lpthread"
While the standard streams are guaranteed to be thread-safe, there is no guarantee that the output won't be interleaved. If you want to print to a standard stream from multiple threads in a predictable way, you will need to do some synchronization yourself:
std::mutex cout_mutex;
void *SPWork(void *t)
{
//...
{
std::lock_guard<std::mutex> guard(cout_mutex);
std::cout << "\n Thread count" << threadCount << std::endl;
}
//...
}
There is no requirement that your calls to cout are an atomic operation. If you need them to be so, you can simply protect the code (just the output code) with a mutex.
In addition, injecting std::endl into the stream already flushes the data so there's little point in following that with a std::flush.
So, in its simplest form:
pthread_mutex_lock(&myMutex);
std::cout << "\n Thread count" << threadCount << std::endl;
pthread_mutex_unlock(&myMutex);
Note that, for recent C++ implementations, it's probably better to use std::mutex and std::lock_guard since they can guarantee correct clean up (see other answers for this). Since you have pthread_exit() in your code, I assume you're limited to the POSIX threading model.
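The clean-up guarantee can be illustrated with a small sketch. To keep the result deterministic, it uses a stand-in lock type (CountingLock, a hypothetical name) that only counts calls; std::lock_guard accepts any type with lock() and unlock(), and the behaviour should be the same with std::mutex.

```cpp
#include <mutex>
#include <stdexcept>

// Hypothetical stand-in for a mutex: records how often it is locked and unlocked.
struct CountingLock {
    int locks = 0;
    int unlocks = 0;
    void lock() { ++locks; }
    void unlock() { ++unlocks; }
};

// Throws while the guard holds the lock and reports whether every lock()
// was matched by an unlock().
bool released_after_throw()
{
    CountingLock m;
    try {
        std::lock_guard<CountingLock> guard(m);
        throw std::runtime_error("failure while holding the lock");
    } catch (const std::runtime_error&) {
        // guard's destructor has already run during stack unwinding
    }
    return m.locks == 1 && m.unlocks == 1;
}
```

With a manual pthread_mutex_lock / pthread_mutex_unlock pair, an exception or early return between the two calls would skip the unlock unless it is handled explicitly.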

Forcing race between threads using C++11 threads

I just got started with the C++11 threading library (and multithreading in general), and wrote a short snippet of code.
#include <iostream>
#include <thread>
int x = 5; //variable to be affected by race
//This function will be called from a thread
void call_from_thread1() {
for (int i = 0; i < 5; i++) {
x++;
std::cout << "In Thread 1 :" << x << std::endl;
}
}
int main() {
//Launch a thread
std::thread t1(call_from_thread1);
for (int j = 0; j < 5; j++) {
x--;
std::cout << "In Thread 0 :" << x << std::endl;
}
//Join the thread with the main thread
t1.join();
std::cout << x << std::endl;
return 0;
}
I was expecting to get different results every time (or nearly every time) I ran this program, due to the race between the two threads. However, the output is always: 0, i.e. the two threads behave as if they ran sequentially. Why am I getting the same result, and is there any way to simulate or force a race between two threads?
Your sample size is rather small, and somewhat self-stalls on the continuous stdout flushes. In short, you need a bigger hammer.
If you want to see a real race condition in action, consider the following. I purposely added an atomic and non-atomic counter, sending both to the threads of the sample. Some test-run results are posted after the code:
#include <algorithm> // std::generate_n, std::for_each
#include <atomic>
#include <iostream>
#include <iterator>  // std::back_inserter
#include <thread>
#include <vector>
void racer(std::atomic_int& cnt, int& val)
{
for (int i=0;i<1000000; ++i)
{
++val;
++cnt;
}
}
int main(int argc, char *argv[])
{
unsigned int N = std::thread::hardware_concurrency();
std::atomic_int cnt{0}; // direct initialization; ATOMIC_VAR_INIT is deprecated in newer standards
int val = 0;
std::vector<std::thread> thrds;
std::generate_n(std::back_inserter(thrds), N,
[&cnt,&val](){ return std::thread(racer, std::ref(cnt), std::ref(val));});
std::for_each(thrds.begin(), thrds.end(),
[](std::thread& thrd){ thrd.join();});
std::cout << "cnt = " << cnt << std::endl;
std::cout << "val = " << val << std::endl;
return 0;
}
Some sample runs from the above code:
cnt = 4000000
val = 1871016
cnt = 4000000
val = 1914659
cnt = 4000000
val = 2197354
Note that the atomic counter is accurate (I'm running on a dual-core i7 MacBook Air laptop with hyper-threading, so 4x threads, thus 4 million). The same cannot be said for the non-atomic counter.
There will be significant startup overhead to get the second thread going, so its execution will almost always begin after the first thread has finished the for loop, which by comparison will take almost no time at all. To see a race condition you will need to run a computation that takes much longer, or includes i/o or other operations that take significant time, so that the execution of the two computations actually overlap.
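One way to make the computations overlap despite the startup cost is to hold every thread at a start flag until all of them exist. The sketch below is an assumption-laden illustration, not the code from the answers above: run_race, Counts and go are made-up names, and the racy counter is built from a separate atomic load and store rather than a plain int, so updates can still be lost without the program having a data race.

```cpp
#include <atomic>
#include <thread>
#include <vector>

struct Counts {
    long exact; // incremented with a single atomic read-modify-write
    long lossy; // incremented with a separate load and store
};

// Hypothetical helper: `threads` workers each perform `iterations` increments.
Counts run_race(int threads, long iterations)
{
    std::atomic<bool> go{false};
    std::atomic<long> exact{0};
    std::atomic<long> lossy{0};

    std::vector<std::thread> workers;
    workers.reserve(threads);
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&] {
            while (!go.load())                 // wait until every thread has been created
                std::this_thread::yield();
            for (long i = 0; i < iterations; ++i) {
                exact.fetch_add(1);            // indivisible: never loses an update
                lossy.store(lossy.load() + 1); // two steps: another thread can slip in between
            }
        });
    }
    go.store(true);                            // release all workers at once
    for (auto& worker : workers)
        worker.join();

    return Counts{exact.load(), lossy.load()};
}
```

After run_race(4, 1000000), exact is always 4000000, while lossy is at most that and on a multi-core machine usually lower; how much lower varies from run to run.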