Unpredictable C++ sleep/wait behavior

I must be doing something stupid because I am getting the weirdest behavior from this simple sleep code. Originally I was using std::this_thread::sleep_for and got the same results but assumed it must have been some thread strangeness. However, I am getting the same seemingly out-of-order waiting with the code below. Same results with clang++ or g++. I am on Debian and compiling at the command line.
Expected behavior:
Shutting down in 3... [wait one second] 2... [wait one second] 1... [wait one second; program exit]
Actual behavior:
[3 second long wait] Shutting down in 3... 2... 1... [program exit]
#include <chrono>
#include <iostream>
void Sleep(int i) {
    auto start = std::chrono::high_resolution_clock::now();
    auto now = std::chrono::high_resolution_clock::now();
    while (std::chrono::duration_cast<std::chrono::seconds>(now - start).count() < i)
        now = std::chrono::high_resolution_clock::now();
}
void ShutdownCountdown(int i) {
    if (i <= 0) return;
    std::cout << "Shutting down in ";
    for (; i != 0; --i) {
        std::cout << i << "... ";
        Sleep(1);
    }
    std::cout << std::endl << std::endl;
}
int main(int argc, char *argv[]) {
    ShutdownCountdown(3);
    return 0;
}

Output written to std::cout is buffered. When stdout is a terminal it is usually line-buffered, so nothing appears until a newline is written or the stream is flushed. Since you don't output a newline, you need to flush explicitly to get the output printed:
std::cout << i << "... " << std::flush;
Unrelated, but also note the CPU getting a bit hot when you run your program: your Sleep() is a busy loop that keeps one core fully loaded for the whole wait. To save energy, consider changing the busy loop back to a real sleep:
for (; i != 0; --i) {
    std::cout << i << "... " << std::flush;
    std::this_thread::sleep_for(1s);
}
The nifty "1s" syntax needs C++14 and using namespace std::chrono_literals; near the beginning of the program (std::this_thread::sleep_for also needs #include <thread>).

#include <chrono>
#include <iostream>
void Sleep(int i) {
    auto start = std::chrono::high_resolution_clock::now();
    auto now = std::chrono::high_resolution_clock::now();
    while (std::chrono::duration_cast<std::chrono::seconds>(now - start).count() < i)
        now = std::chrono::high_resolution_clock::now();
}
void ShutdownCountdown(int i) {
    if (i <= 0) return;
    std::cout << "Shutting down in " << std::flush;
    for (; i != 0; --i) {
        std::cout << i << "... " << std::flush;
        Sleep(1);
    }
    std::cout << std::endl << std::endl;
}
int main(int argc, char *argv[]) {
    ShutdownCountdown(3);
    return 0;
}

Related

The function called by std::async is not executed immediately?

#include <chrono>
#include <ctime>
#include <future>
#include <iostream>
#include <thread>
auto gClock = clock();
char threadPool(char c) {
    std::cout << "enter thread :" << c << " cost time:" << clock() - gClock << std::endl;
    std::this_thread::sleep_for(std::chrono::seconds(2));
    for (int i = 0; i < 10; i++)
        std::cout << c;
    std::cout << std::endl;
    return c;
}
void fnTestAsync() {
    auto begin = clock();
    std::future<char> futures[10];
    for (int i = 0; i < 10; ++i) {
        futures[i] = std::async(std::launch::async, threadPool, 'a' + i);
    }
    for (int i = 0; i < 10; ++i) {
        std::cout << futures[i].get() << " back ,cost time: " << clock() - begin << std::endl;
    }
    std::cout << "fnTestAsync: " << clock() - begin << std::endl;
}
int main() {
    std::thread testAsync(fnTestAsync);
    testAsync.detach();
    std::this_thread::sleep_for(std::chrono::seconds(10));
    return 0;
}
I'm trying to get these 10 threads to execute together and all return right after a two-second delay, but when I print the time spent I find it takes about 2900 ms, much longer than the 2000 ms I expected.
What is the cause of this increase?
How should I fix it?

Pinning pthread to certain core on Odroid XU3 running Ubuntu in C++

I am dividing the workload between big and LITTLE cores on the ARM-based Odroid XU3 platform by assigning varying workloads to pthreads and then pinning these pthreads to certain cores.
So far I have used lstopo in the terminal to check for available cores. Then I wrote a small test program in C++ based on this blog post by Eli Bendersky
https://eli.thegreenplace.net/2016/c11-threads-affinity-and-hyperthreading/
and the answer to
how to set CPU affinity of a particular pthread?
in which I tried to pin pthreads to big or LITTLE cores with pthread_setaffinity_np(). I then verify this by spot-checking with sched_getcpu().
On my homogeneous Intel-based laptop the test program reported that the threads were pinned as expected, but on the Odroid XU3 platform the same pthreads disregarded the call to pthread_setaffinity_np(). I suppose the pthreads were assigned to cores as the Linux scheduler saw fit. Right now I am stuck with Ubuntu 16.04.6, but this might change.
The end goal is to find an optimal static workload distribution between big and LITTLE cores. Before I conclude that my only option is to start digging into the Linux kernel source and writing new modules (as was suggested here to someone who wanted to prevent preemption), is there some other API or method I could try for pinning a pthread to a single core (using pthreads)?
The test program:
#include <algorithm>
#include <chrono>
#include <iostream>
#include <iomanip>
#include <mutex>
#include <pthread.h>
#include <vector>
#include <sys/types.h>
#include <unistd.h>
#include <sys/syscall.h>
pthread_mutex_t cout_mutex;
struct thread_args
{
int i;
const char** argv;
};
int stick_this_thread_to_core(int core_id) {
int num_cores = sysconf(_SC_NPROCESSORS_ONLN);
if (core_id < 0 || core_id >= num_cores)
return EINVAL;
cpu_set_t cpuset;
CPU_ZERO(&cpuset);
CPU_SET(core_id, &cpuset);
pthread_t current_thread = pthread_self();
return pthread_setaffinity_np(current_thread, sizeof(cpu_set_t), &cpuset);
}
void print_vector(std::vector<int> cpus)
{
std::cout << "Executed on CPUs: ";
for(const auto& cpu : cpus)
{
std::cout << cpu << ", ";
}
}
void log_cpu(int current_cpu, std::vector<int> &cpus)
{
bool flag_exist {false};
for(int old_cpu : cpus)
{
if(old_cpu == current_cpu)
flag_exist = true;
}
if(!flag_exist)
cpus.push_back(current_cpu);
}
void* thread_func(void* args)
{
thread_args* arguments = (thread_args*) args;
const char** argv = arguments->argv;
int i = arguments->i;
int err = stick_this_thread_to_core(i);
if(err != 0)
std::cout << "Set affinity error: " << err << "\n";
std::vector<int> cpus {};
using hires_clock = std::chrono::high_resolution_clock;
using duration_ms = std::chrono::duration<double, std::milli>;
auto t1 = hires_clock::now();
long long int count = 0;
while(count != std::stoll(argv[2]))
{
log_cpu(sched_getcpu(), cpus);
count++;
}
auto t2 = hires_clock::now();
pthread_mutex_lock(&cout_mutex);
pid_t x = syscall(__NR_gettid);
std::cout << "Thread #" << x << ": on CPU " << std::setw(2);
std::cout << sched_getcpu();
std::cout << std::setw(0) << " ";
std::cout << "elapsed: " << std::setw(12) << duration_ms(t2 - t1).count();
std::cout << std::setw(0) << " ms " << "requested cpu: ";
std::cout << std::setw(2) << i << std::setw(0) << " ";
print_vector(cpus);
std::cout << "\n";
pthread_mutex_unlock(&cout_mutex);
return 0;
}
int main(int argc, const char** argv) {
if(argc != 3)
{
std::cout << "Usage: ./bin num_cpus num_iterations" << std::endl;
return 0;
}
int num_threads = std::stoi(argv[1]);
pthread_t threads[num_threads];
thread_args arguments[num_threads];
pthread_mutex_init(&cout_mutex, NULL);
for(int i = 0; i < num_threads; i++)
{
arguments[i].i = i;
arguments[i].argv = argv;
}
for(int i = 0; i < num_threads; i++)
{
void* arg = &arguments[i];
if(pthread_create(&threads[i], NULL, thread_func, arg) != 0)
std::cout << "pthread create error\n";
}
for(int i = 0; i < num_threads; i++)
{
if(pthread_join(threads[i], NULL) != 0)
std::cout << "pthread join error\n";
}
pthread_mutex_destroy(&cout_mutex);
return 0;
}

C++ thread request

I'm new to C++. In my application, there is a method getOnlineStatus():
int getOnlineStatus(int num);
This method comes from a third-party DLL and can't be modified.
I call this method to check the status of a number, like this:
int num = 123456;
for (int i = 0; i < 10000000; i++) {
    num = num + 1;
    int nRet = getOnlineStatus(num);
    if (nRet > 0) {
        cout << num << "status online" << endl;
    }
    else if (nRet == 0) {
        cout << num << "status offline" << endl;
    }
    else {
        cout << num << "check fail" << endl;
    }
}
But every call takes 2 seconds to return nRet, so checking lots of numbers takes a long time.
I also tried std::async, but it doesn't help: it still takes 2 seconds per result, one after another.
int num = 123456;
for (int i = 0; i < 10000000; i++) {
    num = num + 1;
    future<int> fuRes = std::async(std::launch::async, getOnlineStatus, num);
    int result = fuRes.get();
    if (result > 0) {
        cout << num << "status online" << endl;
    }
    else if (result == 0) {
        cout << num << "status offline" << endl;
    }
    else {
        cout << num << "check fail" << endl;
    }
}
Is there any way to open multiple threads to make it show results faster?
This largely depends on your third-party DLL: does it even support requests from multiple threads? And if it does, do those requests use shared resources, like the same internet connection or socket?
If you simplify your question and assume that getOnlineStatus() just sleeps for 2 seconds, then yes, you can greatly benefit from issuing multiple requests on different threads and waiting in parallel.
Here is how you can simply set up a reasonable number of threads to share the workload:
#include <chrono>
#include <cstdlib>
#include <iostream>
#include <thread>
#include <vector>
int status[10'000]{};
// Stand-in for the DLL call: waits 1 second, then returns a random "status".
// (rand() is not guaranteed to be thread-safe; good enough for a timing demo.)
int getOnlineStatus(int n) {
    std::this_thread::sleep_for(std::chrono::seconds(1));
    return rand();
}
// Each worker checks the numbers in [low, high).
void getStatus(int low, int high) {
    for (int i = low; i < high; i++) {
        status[i] = getOnlineStatus(i);
    }
}
int main()
{
    srand(0);
    // hardware_concurrency() may return 0 when the value is unknown.
    const unsigned hw = std::thread::hardware_concurrency();
    const int count = hw != 0 ? static_cast<int>(hw) : 4;
    auto start = std::chrono::high_resolution_clock::now();
    std::vector<std::thread> threads;
    for (int i = 0, low = 0, high = 10; i < count; ++i, low += 10, high += 10)
        threads.emplace_back(getStatus, low, high);
    for (auto& thread : threads)
        thread.join();
    auto stop = std::chrono::high_resolution_clock::now();
    std::cout << count << " threads: " << std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count() << " ms" << std::endl;
    start = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < 10 * count; ++i)
        status[i] = getOnlineStatus(i);
    stop = std::chrono::high_resolution_clock::now();
    std::cout << "single thread: " << std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count() << " ms" << std::endl;
}
I get this result:
12 threads: 10075 ms
single thread: 120720 ms
NOTE: if those worker threads really do nothing but wait, you can run many more of them than there are hardware threads, reducing the total time significantly.

C++ Condition variable to signal end of detached thread execution stalls

I have some code which I'm working on where a detached thread is spawned, does some work, and then should wait for a signal from main() before sending another signal back to main indicating that the thread has quit.
I'm fairly new to condition variables, however I have worked with some multi thread code before. (Mostly mutexes.)
This is what I tried to implement, but it doesn't behave the way I would have expected. (Likely I misunderstood something.)
The idea behind this is to pass a struct containing two flags to each detached thread. The first flag indicates that main() says "it is ok to exit, and drop off the end of the thread function". The second flag is set by the thread itself and signals to main() that the thread has indeed exited. (It's just to confirm the signal from main() is received OK and to send something back.)
#include <cstdlib> // std::atoi
#include <iostream>
#include <thread>
#include <vector>
#include <random>
#include <future>
#include <condition_variable>
#include <mutex>
struct ThreadStruct
{
int id;
std::condition_variable cv;
std::mutex m;
int ok_to_exit;
int exit_confirm;
};
void Pause()
{
std::cout << "Press enter to continue" << std::endl;
std::cin.get();
}
void detachedThread(ThreadStruct* threadData)
{
std::cout << "START: Detached Thread " << threadData->id << std::endl;
// Performs some arbitrary amount of work.
for(int i = 0; i < 100000; ++ i);
std::cout << "FINISH: Detached thread " << threadData->id << std::endl;
std::unique_lock<std::mutex> lock(threadData->m);
std::cout << "WAIT: Detached thread " << threadData->id << std::endl;
threadData->cv.wait(lock, [threadData]{return threadData->ok_to_exit == 1;});
std::cout << "EXIT: Detached thread " << threadData->id << std::endl;
threadData->exit_confirm = 1;
}
int main(int argc, char** argv)
{
int totalThreadCount = 1;
ThreadStruct* perThreadData = new ThreadStruct[totalThreadCount];
std::cout << "Main thread starting " << totalThreadCount << " thread(s)" << std::endl;
for(int i = totalThreadCount - 1; i >= 0; --i)
{
perThreadData[i].id = i;
perThreadData[i].ok_to_exit = 0;
perThreadData[i].exit_confirm = 0;
std::thread t(detachedThread, &perThreadData[i]);
t.detach();
}
for(int i{0}; i < totalThreadCount; ++i)
{
ThreadStruct *threadData = &perThreadData[i];
std::cout << "Waiting for lock - main() thread" << std::endl;
std::unique_lock<std::mutex> lock(perThreadData[i].m);
std::cout << "Lock obtained - main() thread" << std::endl;
perThreadData[i].cv.wait(lock);
threadData->ok_to_exit = 1;
// added after comment from Sergey
threadData->cv.notify_all();
std::cout << "Done - main() thread" << std::endl;
}
for(int i{0}; i < totalThreadCount; ++i)
{
std::size_t thread_index = i;
ThreadStruct& threadData = perThreadData[thread_index];
std::unique_lock<std::mutex> lock(threadData.m);
std::cout << "i=" << i << std::endl;
int &exit_confirm = threadData.exit_confirm;
threadData.cv.wait(lock, [exit_confirm]{return exit_confirm == 1;});
std::cout << "i=" << i << " finished!" << std::endl;
}
Pause();
return 0;
}
This runs to the line:
WAIT: Detached thread 0
but the detached thread never quits. What have I done wrong?
Edit: Further experimentation - is this helpful?
I thought it might be helpful to simplify things by removing a step. In the example below, main() does not signal to the detached thread, it just waits for a signal from the detached thread.
But again, this code hangs after printing DROP. This means the detached thread exits OK, but main() doesn't know about it.
#include <cstdlib> // std::atoi
#include <iostream>
#include <thread>
#include <vector>
#include <random>
#include <future>
#include <condition_variable>
#include <mutex>
struct ThreadStruct
{
int id;
std::condition_variable cv;
std::mutex m;
int ok_to_exit;
int exit_confirm;
};
void Pause()
{
std::cout << "Press enter to continue" << std::endl;
std::cin.get();
}
void detachedThread(ThreadStruct* threadData)
{
std::cout << "START: Detached Thread " << threadData->id << std::endl;
// Performs some arbitrary amount of work.
for(int i = 0; i < 100000; ++ i);
std::cout << "FINISH: Detached thread " << threadData->id << std::endl;
std::unique_lock<std::mutex> lock(threadData->m);
std::cout << "EXIT: Detached thread " << threadData->id << std::endl;
threadData->exit_confirm = 1;
threadData->cv.notify_all();
std::cout << "DROP" << std::endl;
}
int main(int argc, char** argv)
{
int totalThreadCount = 1;
ThreadStruct* perThreadData = new ThreadStruct[totalThreadCount];
std::cout << "Main thread starting " << totalThreadCount << " thread(s)" << std::endl;
for(int i = totalThreadCount - 1; i >= 0; --i)
{
perThreadData[i].id = i;
perThreadData[i].ok_to_exit = 0;
perThreadData[i].exit_confirm = 0;
std::thread t(detachedThread, &perThreadData[i]);
t.detach();
}
for(int i{0}; i < totalThreadCount; ++i)
{
std::size_t thread_index = i;
ThreadStruct& threadData = perThreadData[thread_index];
std::cout << "Waiting for mutex" << std::endl;
std::unique_lock<std::mutex> lock(threadData.m);
std::cout << "i=" << i << std::endl;
int &exit_confirm = threadData.exit_confirm;
threadData.cv.wait(lock, [exit_confirm]{return exit_confirm == 1;});
std::cout << "i=" << i << " finished!" << std::endl;
}
Pause();
return 0;
}
Your lambda captures by value, so it works on a copy taken when the lambda is created and will never see the changes made to exit_confirm.
Capture by-reference instead:
int& exit_confirm = threadData.exit_confirm;
threadData.cv.wait(lock, [&exit_confirm] { return exit_confirm == 1; });
// ^
// | capture by-reference
You also need to delete[] what you new[], so do
delete[] perThreadData;
when you're done with the structs.
I also noticed a heap use-after-free, but that magically went away when I made some simplifications to the code. I didn't investigate that further.
Some suggestions:
Move the code that deals with ThreadStruct's member variables and locks into the ThreadStruct class. That usually makes it simpler to read and maintain.
Remove unused variables and headers.
Don't use new[]/delete[]. For this example, you could use a std::vector<ThreadStruct> instead.
Don't detach() at all - I haven't done anything about that below, but I suggest using join() (on attached threads) to do the final synchronization. That's what it's there for.
#include <condition_variable>
#include <iostream>
#include <mutex>
#include <thread>
#include <vector>
struct ThreadStruct {
    int id;
    // move this function into the ThreadStruct class
    void detachedThread() {
        std::cout << "START: Detached Thread " << id << std::endl;
        // Performs some arbitrary amount of work (optimized away here)
        std::cout << "FINISH: Detached thread " << id << std::endl;
        std::lock_guard<std::mutex> lock(m);
        std::cout << "EXIT: Detached thread " << id << std::endl;
        exit_confirm = 1;
        cv.notify_all();
        std::cout << "DROP" << std::endl;
    }
    // add support functions instead of doing these things in your normal code
    void wait_for_exit_confirm() {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [this] { return exit_confirm == 1; });
    }
    void spawn_detached() {
        std::thread(&ThreadStruct::detachedThread, this).detach();
    }
private:
    std::condition_variable cv;
    std::mutex m;
    int exit_confirm = 0; // initialize
};
With the above, main becomes a little cleaner:
int main() {
    int totalThreadCount = 1;
    std::vector<ThreadStruct> perThreadData(totalThreadCount);
    std::cout << "Main thread starting " << perThreadData.size() << " thread(s)\n";
    int i = 0;
    for (auto& threadData : perThreadData) {
        threadData.id = i++;
        threadData.spawn_detached();
    }
    for (auto& threadData : perThreadData) {
        std::cout << "Waiting for mutex" << std::endl;
        std::cout << "i=" << threadData.id << std::endl;
        threadData.wait_for_exit_confirm();
        std::cout << "i=" << threadData.id << " finished!" << std::endl;
    }
    std::cout << "Press enter to continue" << std::endl;
    std::cin.get();
}
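For future reference: I fixed the original MWE posted in the question. There were two issues:
not capturing the local variable in the lambda by reference (see the other answer)
one wait() call too many in main() (the unconditional cv.wait(lock) waited for a notification that never came)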
#include <cstdlib> // std::atoi
#include <iostream>
#include <thread>
#include <vector>
#include <random>
#include <future>
#include <condition_variable>
#include <mutex>
struct ThreadStruct
{
int id;
std::condition_variable cv;
std::mutex m;
int ok_to_exit;
int exit_confirm;
};
void Pause()
{
std::cout << "Press enter to continue" << std::endl;
std::cin.get();
}
void detachedThread(ThreadStruct* threadData)
{
std::cout << "START: Detached Thread " << threadData->id << std::endl;
// Performs some arbitrary amount of work.
for (int i = 0; i < 100000; ++i);
std::cout << "FINISH: Detached thread " << threadData->id << std::endl;
std::unique_lock<std::mutex> lock(threadData->m);
std::cout << "WAIT: Detached thread " << threadData->id << std::endl;
threadData->cv.wait(lock, [&threadData]{return threadData->ok_to_exit == 1;});
std::cout << "EXIT: Detached thread " << threadData->id << std::endl;
threadData->exit_confirm = 1;
threadData->cv.notify_all();
std::cout << "DROP" << std::endl;
}
int main(int argc, char** argv)
{
int totalThreadCount = 1;
ThreadStruct* perThreadData = new ThreadStruct[totalThreadCount];
std::cout << "Main thread starting " << totalThreadCount << " thread(s)" << std::endl;
for (int i = totalThreadCount - 1; i >= 0; --i)
{
perThreadData[i].id = i;
perThreadData[i].ok_to_exit = 0;
perThreadData[i].exit_confirm = 0;
std::thread t(detachedThread, &perThreadData[i]);
t.detach();
}
for(int i{0}; i < totalThreadCount; ++ i)
{
ThreadStruct *threadData = &perThreadData[i];
std::cout << "Waiting for lock - main() thread" << std::endl;
std::unique_lock<std::mutex> lock(perThreadData[i].m);
std::cout << "Lock obtained - main() thread" << std::endl;
//perThreadData[i].cv.wait(lock, [&threadData]{return threadData->ok_to_exit == 1;});
std::cout << "Wait complete" << std::endl;
threadData->ok_to_exit = 1;
threadData->cv.notify_all();
std::cout << "Done - main() thread" << std::endl;
}
for (int i{ 0 }; i < totalThreadCount; ++i)
{
std::size_t thread_index = i;
ThreadStruct& threadData = perThreadData[thread_index];
std::cout << "Waiting for mutex" << std::endl;
std::unique_lock<std::mutex> lock(threadData.m);
std::cout << "i=" << i << std::endl;
int& exit_confirm = threadData.exit_confirm;
threadData.cv.wait(lock, [&exit_confirm] {return exit_confirm == 1; });
std::cout << "i=" << i << " finished!" << std::endl;
}
Pause();
return 0;
}

How to run multiple threads created by loop simultaneous using boost.thread?

I'm learning the basics of Boost.Thread. So far, I can create threads manually, one by one, and they run at the same time. However, when I create them in a loop, they run sequentially rather than concurrently.
#include <iostream>
#include <boost/thread.hpp>
void workerFunc()
{
    boost::posix_time::seconds workTime(3);
    std::cout << "Worker: Running" << '\n';
    boost::this_thread::sleep(workTime);
    std::cout << "Worker: Finished" << '\n';
}
int main()
{
    std::cout << "main: startup" << '\n';
    boost::thread workerThread(workerFunc);
    std::cout << "main: waiting for thread" << '\n';
    //these are ok
    boost::thread t(workerFunc), t2(workerFunc), t3(workerFunc), t4(workerFunc);
    t.join();
    t2.join();
    t3.join();
    t4.join();
    //these are not
    for (int i = 0; i < 2; ++i)
    {
        boost::thread z(workerFunc);
        z.join();
    }
    std::cout << "main:done" << '\n';
    return 0;
}
for (int i = 0; i < 2; ++i)
{
    boost::thread z(workerFunc);
    z.join();
}
You are starting your thread and then immediately waiting for it to complete!
EDIT
One of several alternative hacks besides thread groups.
std::vector<boost::thread *> z;
for (int i = 0; i < 2; ++i)
    z.push_back(new boost::thread(workerFunc));
for (int i = 0; i < 2; ++i)
{
    z[i]->join();
    delete z[i];
}
OK, I found the answer through someone else's question (and learned about their problem as well):
How to make boost::thread_group execute a fixed number of parallel threads
Use shared_ptr
#include <iostream>
#include <memory>
#include <vector>
#include <boost/thread.hpp>
void workerFunc()
{
    boost::posix_time::seconds workTime(3);
    std::cout << "Worker: Running" << '\n';
    boost::this_thread::sleep(workTime);
    std::cout << "Worker: Finished" << '\n';
}
int main()
{
    std::cout << "main: startup" << '\n';
    std::vector<std::shared_ptr<boost::thread>> z;
    // start all threads first...
    for (int i = 0; i < 2; ++i) {
        z.push_back(std::make_shared<boost::thread>(workerFunc));
    }
    // ...then wait for all of them
    for (auto& t : z) {
        t->join();
    }
    std::cout << "main:done" << '\n';
    return 0;
}
Execute it
# g++ e.cpp -lboost_thread && ./a.out
main: startup
Worker: Running
Worker: Running
Worker: Finished
Worker: Finished
main:done