I am experimenting with a mutex lock across independent threads. The requirement is that many threads run independently and access/update a common resource. To ensure that the resource is updated by only one task at a time, I used a mutex. However, this is not working.
I have pasted code, a representation of what I am trying to do below:
#include <iostream>
#include <map>
#include <string>
#include <chrono>
#include <thread>
#include <mutex>
#include <unistd.h>
std::mutex mt;
static int iMem = 0;
int maxITr = 1000;
void renum()
{
// Ensure that only 1 task will update the variable
mt.lock();
int tmpMem = iMem;
usleep(100); // Make the system sleep/induce delay
iMem = tmpMem + 1;
mt.unlock();
printf("iMem = %d\n", iMem);
}
int main()
{
for (int i = 0; i < maxITr; i++) {
std::thread mth(renum);
mth.detach(); // Run each task in an independent thread
}
return 0;
}
but this is terminating with the below error:
terminate called after throwing an instance of 'std::system_error'
what(): Resource temporarily unavailable
I want to know if the usage of <thread>.detach() is correct above? If I use .join() it works, but I want each thread to run independently and not wait for the thread to finish.
I also want to know what is the best way to achieve the above logic.
Try this (it also needs #include <vector> at the top):
int main()
{
std::vector<std::thread> mths;
mths.reserve(maxITr);
for (int i = 0; i < maxITr; i++) {
mths.emplace_back(renum);
}
for (auto& mth : mths) {
mth.join();
}
}
This way, you retain control of the threads (by not calling detach()), and you can join them all at the end, so you know they have completed their tasks.
I am trying to set the name of my threads for ease of profiling (in ps, top, etc.).
I usually use pthread_setname_np(pthread_self(), <THREAD_NAME>) and it works like a charm.
But the one place I cannot understand is how to do it when I'm using lambda threads.
Here's an example on what I'm trying to do.
#include <iostream>
#include <thread>
#include <vector>
#include <algorithm>
#include <pthread.h>
int main()
{
// vector container stores threads
std::vector<std::thread> workers;
for (int i = 0; i < 5; i++) {
workers.push_back(std::thread([]()
{
std::cout << "thread function\n";
}));
}
std::cout << "main thread\n";
std::for_each(workers.begin(), workers.end(), [](std::thread& t)
{
//I want to set the thread name for profiling
std::string s = "mythread";
auto handle = t.native_handle();
pthread_setname_np(handle, s.c_str());
t.join();
});
return 0;
}
This does not work. If I do top -H -p <pid> in Linux I cannot see those threads named.
Thanks
I have a question about boost::io_service.
I have a set of tasks that I can run concurrently. After running all of them, I need to run another set of tasks concurrently. However, the first set has to be completed before the second set starts. This means I need to make sure that all the jobs submitted to io_service are completed before I start scheduling the second set.
I can implement it by keeping some kind of counter and adding a busy loop, but that does not look very efficient. So I wanted to check whether someone has a better idea. The following is dummy code that I was using to experiment.
Thank you in advance!
#include <cstdio>
#include <iostream>
#include <unistd.h>
#include <boost/asio/io_service.hpp>
#include <boost/bind.hpp>
#include <boost/thread/thread.hpp>
const size_t numTasks = 100000;
void print_counter(const size_t id)
{
if (id + 1 == numTasks) {
printf("sleeping for %zu\n", id); // %zu is the format for size_t
sleep(15);
}
printf("%zu\n", id);
}
int main(int argc, char** argv)
{
using namespace std;
using namespace boost;
asio::io_service io_service;
asio::io_service::work work(io_service);
const size_t numWorker = 4;
boost::thread_group workers;
for(size_t i = 0; i < numWorker; ++i) {
workers.create_thread(boost::bind(&asio::io_service::run, &io_service));
}
for(size_t i = 0; i < numTasks; ++i) {
io_service.post(boost::bind(print_counter, i));
}
// TODO: wait until all the tasks are done above
for(size_t i = 0; i < numTasks; ++i) {
io_service.post(boost::bind(print_counter, i));
}
// TODO: wait until all the tasks are done above
// ...
// Finally stop the service
io_service.stop();
workers.join_all();
return 0;
}
Your main problem is that all sets of tasks are processed by the same instance of io_service. The function io_service::run returns when there are no more tasks to process, unless a work object keeps it alive. The destructor of io_service::work informs the io_service object that run may return once there are no pending tasks in the queue. So you can post all tasks from the first set, destroy work, and wait until io_service::run returns; then create a work object again, post the tasks from the next set, destroy work, and so on. To do this, write a helper class, which may look something like this:
class TasksWaiter
{
public:
TasksWaiter(int numOfThreads)
{
work = std::make_unique<boost::asio::io_service::work>(io_service);
for(size_t i = 0; i < numOfThreads; ++i) {
workers.create_thread(boost::bind(&boost::asio::io_service::run, &io_service));
}
}
~TasksWaiter() {
work.reset();
workers.join_all();
}
template<class F>
void post(F f) {
io_service.post(f);
}
boost::thread_group workers;
boost::asio::io_service io_service;
std::unique_ptr<boost::asio::io_service::work> work;
};
int main()
{
{
TasksWaiter w1{4};
for (int i = 0; i < numTasks; ++i)
w1.post(boost::bind(print_counter,i));
// work in w1 is destroyed, then io_service::run ends
// when there are no tasks to be performed
}
printf("wait here");
{
TasksWaiter w1{4};
for (int i = 0; i < numTasks; ++i)
w1.post(boost::bind(print_counter,i));
}
}
A few remarks:
In the constructor, the pool of threads is created.
In the destructor, work is destroyed, so io_service::run returns only once there are no pending tasks.
The functionality of the destructor can be wrapped in a member function, e.g. wait; then you don't need a {} scope to wait for your tasks.
From the io_service::run documentation:
The run() function blocks until all work has finished and there are no more handlers to be dispatched, or until the io_context has been stopped.
Also, from the io_context::work constructor documentation:
The constructor is used to inform the io_context that some work has begun. This ensures that the io_context object's run() function will not exit while the work is underway.
[Emphasis mine]
In short, if the run function returns and stopped() returns false, then all work has finished.
My code acquires images and processes them. Performance is critical for my code, so I've tried my hand at multi-threading. Currently, I've only made the acquiring part a separate thread. I'm implementing a simple FIFO buffer using std::queue that stores the acquired images. The acquisition function, AcquireImages, writes raw image data to this buffer indefinitely until user interruption. The processing function, ProcessImages, reads the buffer and processes the image data (currently in the main thread, but I'm planning to make this a separate thread as well once I've ironed out the issues). Here's my code (modified to form an MCV example):
#include <iostream>
#include <vector>
#include <queue>
#include <atomic>
#include <thread>
#define NUM_CAMERAS 2
void AcquireImages(std::queue<unsigned char*> &rawImageQueue, std::atomic<bool> &quit)
{
unsigned char* rawImage{};
while (!quit)
{
for (int camera = 0; camera < NUM_CAMERAS; camera++)
{
switch (camera)
{
case 0:
rawImage = (unsigned char*)"Cam0Image";
break;
case 1:
rawImage = (unsigned char*)"Cam1Image";
break;
default:
break;
}
rawImageQueue.push(std::move(rawImage));
}
}
}
int ProcessImages(const std::vector<unsigned char*> &rawImageVec, const int count)
{
// Do something to the raw image vector
if (count > 10)
{
return 1;
}
else
{
return 0;
} // In my application, this function only returns non-zero upon user interception.
}
int main()
{
// Preparation
std::vector<unsigned char*> rawImageVec;
rawImageVec.reserve(NUM_CAMERAS);
std::queue<unsigned char*> rawImageQueue;
int count{};
const unsigned int nThreads = 1; // this might grow later
std::atomic<bool> loopFlags[nThreads];
std::thread threads[nThreads];
// Start threads
for (int i = 0; i < nThreads; i++) {
loopFlags[i] = false;
threads[i] = std::thread(AcquireImages, rawImageQueue, ref(loopFlags[i]));
}
// Process images
while (true)
{
// Process the images
for (int cam{}; cam < NUM_CAMERAS; ++cam)
{
rawImageVec.push_back(rawImageQueue.front());
rawImageQueue.pop();
}
int processResult = ProcessImages(move(rawImageVec), count);
if (processResult)
{
std::cout << "Leaving while loop.\n"; // In my application this is triggered by the user
break;
}
rawImageVec.clear();
++count;
}
// Shutdown other threads
for (auto & flag : loopFlags) {
flag = true;
}
// Wait for threads to actually finish.
for (auto& thread : threads) {
thread.join();
}
return 0;
}
Some of you may have already noticed my blunder. What I know is that this program throws an exception at rawImageVec.push_back(rawImageQueue.front());.
The output after throwing the exception reads as follows:
Debug Assertion Failed!
Program: C:\WINDOWS\SYSTEM32\MSVCP140D.dll
File: c:\program files (x86)\microsoft visual studio 14.0\vc\include\deque
Line: 329
Expression: deque iterator not dereferencable
I understand the cause of the issue is probably that I'm reading something that is shared with another thread (Am I correct?). How do I resolve this?
I followed Praetorian's advice in the comments. After adding a check to see if rawImageQueue is empty, I see that it's always empty. I'm not sure what's causing this.
Here is a generalized example of producer/consumer on a shared queue. The idea is that if you're writing and reading from a data structure, you need some kind of protection around accesses.
For this, the below example uses condition variables and a mutex.
#include <algorithm> // std::for_each
#include <thread>
#include <iostream>
#include <chrono>
#include <queue>
#include <mutex>
#include <vector>
#include <condition_variable>
using namespace std::chrono_literals;
using std::vector;
using std::thread;
using std::unique_lock;
using std::mutex;
using std::condition_variable;
using std::queue;
class WorkQueue
{
condition_variable work_available;
mutex work_mutex;
queue<int> work;
public:
void push_work(int item)
{
unique_lock<mutex> lock(work_mutex);
bool was_empty = work.empty();
work.push(item);
lock.unlock();
if (was_empty)
{
work_available.notify_one();
}
}
int wait_and_pop()
{
unique_lock<mutex> lock(work_mutex);
while (work.empty())
{
work_available.wait(lock);
}
int tmp = work.front();
work.pop();
return tmp;
}
};
int main() {
WorkQueue work_queue;
auto producer = [&]() {
while (true) {
work_queue.push_work(10);
std::this_thread::sleep_for(2ms);
}
};
vector<thread> producers;
producers.push_back(std::thread(producer));
producers.push_back(std::thread(producer));
producers.push_back(std::thread(producer));
producers.push_back(std::thread(producer));
std::thread consumer([&]() {
while (true)
{
int work_to_do = work_queue.wait_and_pop();
std::cout << "Got some work: " << work_to_do << std::endl;
}
});
std::for_each(producers.begin(), producers.end(), [](thread &p) {
p.join();
});
consumer.join();
}
Your case is relatively simple, as it seems you have just one producer and one consumer. Also, image processing sounds quite slow (slow enough not to worry about thread contention), and you're switching from a single-threaded version, so there is probably no need to bother with highly efficient lock-free implementations.
I'd recommend studying this pseudocode: https://en.wikipedia.org/wiki/Producer%E2%80%93consumer_problem#Using_monitors, and then learning about condition variables if you need to: http://en.cppreference.com/w/cpp/thread/condition_variable.
I've been doing pretty basic stuff with std::thread for no particular reason other than to learn it. I thought the simple example I created, in which a few threads operate on the same data and lock it before doing so, worked just fine. Then I realized that every time I run it the returned value is different. The values are very close to each other, but I am pretty sure they should be equal. Some of the values I have received:
21.692524
21.699258
21.678871
21.705947
21.685744
Am I doing something wrong or maybe there is underlying reason for that behaviour?
#include <string>
#include <iostream>
#include <thread>
#include <math.h>
#include <time.h>
#include <windows.h>
#include <mutex>
using namespace std;
mutex mtx;
mutex mtx2;
int currentValue = 1;
double suma = 0;
int assignPart() {
mtx.lock();
int localValue = currentValue;
currentValue+=10000000;
mtx.unlock();
return localValue;
}
void calculatePart()
{
int value;
double sumaLokalna = 0;
while(currentValue<1500000000){
value = assignPart();
for(double i=value;i<(value+10000000);i++){
sumaLokalna = sumaLokalna + (1/(i));
}
mtx2.lock();
suma+=sumaLokalna;
mtx2.unlock();
sumaLokalna = 0;
}
}
int main()
{
clock_t startTime = clock();
// Constructs the new thread and runs it. Does not block execution.
thread watek(calculatePart);
thread watek2(calculatePart);
thread watek3(calculatePart);
thread watek4(calculatePart);
while(currentValue<1500000000){
Sleep(100);
printf("%-12d %-12lf \n",currentValue, suma);
}
watek.join();
watek2.join();
watek3.join();
watek4.join();
cout << double( clock() - startTime ) / (double)CLOCKS_PER_SEC<< " seconds." << endl;
//Makes the main thread wait for the new thread to finish execution, therefore blocks its own execution.
}
Your loop
while(currentValue<1500000000){
Sleep(100);
printf("%-12d %-12lf \n",currentValue, suma);
}
is printing intermediate results, but you're not printing the final result.
To print the final result, add the line
printf("%-12d %-12lf \n",currentValue, suma);
after joining the threads.
I'm experimenting with locking data on Windows vs Linux.
The code I'm using for testing looks something like this:
#include <mutex>
#include <time.h>
#include <iostream>
#include <vector>
#include <thread>
#include <memory> // shared_ptr
using namespace std;
mutex m;
unsigned long long dd = 0;
void RunTest()
{
for(int i = 0; i < 100000000; i++)
{
unique_lock<mutex> lck{m};
//boost::mutex::scoped_lock guard(m1);
dd++;
}
}
int main(int argc, char *argv[])
{
clock_t tStart = clock();
int tCount = 0;
vector<shared_ptr<thread>> threads;
for(int i = 0; i < 10;i++)
{
threads.push_back(shared_ptr<thread>{new thread(RunTest)});
}
RunTest();
for(auto t:threads)
{
t->join();
}
cout << ((double)(clock() - tStart)/CLOCKS_PER_SEC) << endl;
return 0; //v.size();
}
I'm testing g++ -O3 vs Visual Studio 2013 compiled release mode.
When I use unique_lock<mutex> for sync, Linux beats Windows in most scenarios, sometimes significantly.
But when I use Windows' CRITICAL_SECTION, the situation reverses, and the Windows code becomes much faster than the Linux code, especially as the thread count increases.
Here's the code I'm using for windows' critical section testing:
#include <stdafx.h>
#include <mutex>
#include <time.h>
#include <iostream>
//#include <boost/mutex>
#include <vector>
#include <thread>
#include<memory>
#include <Windows.h>
using namespace std;
mutex m;
unsigned long long dd = 0;
CRITICAL_SECTION critSec;
void RunTest()
{
for (int i = 0; i < 100000000; i++)
{
//unique_lock<mutex> lck{ m };
EnterCriticalSection(&critSec);
dd++;
LeaveCriticalSection(&critSec);
}
}
int _tmain(int argc, _TCHAR* argv[])
{
InitializeCriticalSection(&critSec);
clock_t tStart = clock();
int tCount = 0;
vector<shared_ptr<thread>> threads;
for (int i = 0; i < 10; i++)
{
threads.push_back(shared_ptr<thread>{new thread(RunTest)});
}
RunTest();
for (auto t : threads)
{
t->join();
}
cout << ((double)(clock() - tStart) / CLOCKS_PER_SEC) << endl;
DeleteCriticalSection(&critSec);
return 0;
}
My understanding is that this happens because critical sections are process-specific.
Most of the synchronization I'll be doing will be inside a single process.
Is there anything on Linux that is faster than a mutex or Windows' critical section?
First, your code has a huge contention problem, so it does not reflect any realistic situation and is not suitable as a benchmark. Most mutex implementations are optimized for the case where a lock can be acquired without waiting. In the other case, i.e. high contention, which involves blocking a thread, the overhead of the mutex itself becomes insignificant, and you should redesign the system to get a decent improvement: split the data across multiple locks, use a lock-free algorithm, or use transactional memory (available as the TSX extension in some Haswell processors, or as a software implementation).
Now, to explain the difference: a CriticalSection on Windows actually spins for a short time before resorting to a thread-blocking mutex. Since blocking a thread involves an order of magnitude more overhead, in low-contention situations a spinlock may greatly reduce the chance of incurring that overhead. (Note that in high-contention situations a spinlock actually makes things worse.)
On Linux, you may want to look into the fast userspace mutex, or futex, which adopts a similar idea.