Trouble with multiple std::threads and main program execution

I have been struggling for days to come up with a mechanism for launching a few timers without blocking the main program execution. I have tried combinations of .join() and .detach(), wait_until(), etc.
What I have is a vector of std::thread and I want to:
execute the first position
wait for it to finish
execute the next position
wait for it to finish
meanwhile the rest of my app is running along, users clicking things, etc. Everything I come up with seems to either:
block the main program from running while the timers are going
or
detach from the main thread, but then the timers run concurrently, whereas I want each one to start only after the previous one has finished.
I even posted: C++11 std::threads and waiting for threads to finish, but got no resolution that I could make sense of either.
Should I be using std::launch::async maybe?
EDIT: I am not sure why this is so hard for me to grasp. I mean video games do this all the time. Take Tiny Tower for example. You stock your floors and each one of those operations has a delay from when you start the stock, until when that item is stocked and it triggers a HUD that pops up and says, "Floor is now stocked". Meanwhile the whole game stays running for you to do other things. I must be dense because I cannot figure this out.

This snippet of code will execute a std::vector of nullary tasks in a separate thread.
typedef std::vector<std::function< void() >> task_list;
typedef std::chrono::high_resolution_clock::duration timing;
typedef std::vector< timing > timing_result;
timing_result do_tasks( task_list list ) {
timing_result retval;
for (auto&& task: list) {
std::chrono::high_resolution_clock::time_point start = std::chrono::high_resolution_clock::now();
task();
std::chrono::high_resolution_clock::time_point end = std::chrono::high_resolution_clock::now();
retval.push_back( end-start );
}
return retval;
}
std::future<timing_result> execute_tasks_in_order_elsewhere( task_list list ) {
return std::async( std::launch::async, do_tasks, std::move(list) );
}
This should run each of the tasks in series outside the main thread, and return a std::future that contains the timing results.
If you want the timing results in smaller chunks (i.e., before they are all ready), you'll have to do more work. I'd start with std::packaged_task and return a std::vector<std::future< timing >> and go from there.
The above code is untested/uncompiled, but shouldn't have any fundamental flaws.
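As a rough sketch of the std::packaged_task idea mentioned above: each task is wrapped so that running it yields its elapsed time, and the caller gets one future per task. The names running_tasks and start_tasks_in_order are hypothetical, and the code assumes C++14 for the init-captures.

```cpp
#include <chrono>
#include <functional>
#include <future>
#include <utility>
#include <vector>

using timing = std::chrono::high_resolution_clock::duration;

// Hypothetical helper type: per-task futures plus a handle for the helper thread.
struct running_tasks {
    std::vector<std::future<timing>> timings;  // timings[i] is ready once task i has finished
    std::future<void> all_done;                // ready after the last task has finished
};

running_tasks start_tasks_in_order(std::vector<std::function<void()>> list) {
    std::vector<std::packaged_task<timing()>> jobs;
    std::vector<std::future<timing>> timings;
    for (auto& task : list) {
        // Wrap each task so that running it returns how long it took.
        jobs.emplace_back([task = std::move(task)] {
            auto start = std::chrono::high_resolution_clock::now();
            task();
            return std::chrono::high_resolution_clock::now() - start;
        });
        timings.push_back(jobs.back().get_future());
    }
    // One helper thread runs the wrapped tasks strictly in order.
    std::future<void> all_done = std::async(
        std::launch::async,
        [jobs = std::move(jobs)]() mutable {
            for (auto& job : jobs) job();
        });
    return running_tasks{std::move(timings), std::move(all_done)};
}
```

Each timings[i] can be waited on or polled individually. Because all_done comes from std::async, destroying it blocks until the helper thread has finished, so the returned object should be kept alive for as long as the tasks are meant to run in the background.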
You'll note that the above does not use std::thread. std::thread is a low level tool that you should build tools on top of, not something you should use directly (it is quite fragile due to the requirement that it be joined or detached prior to destruction, among other things).
While std::async is nothing to write home about, it is great for quick-and-dirty multiple threading, where you want to take a serial task and do it "somewhere else". The lack of decent signaling via std::future makes it less than completely general (and is a reason why you might want to write higher level abstractions around std::thread).
Here is one that will run a sequence of tasks with a minimum amount of delay between them:
#include <chrono>
#include <iostream>
#include <vector>
#include <functional>
#include <thread>
#include <future>
typedef std::chrono::high_resolution_clock::duration duration;
typedef std::chrono::high_resolution_clock::time_point time_point;
typedef std::vector<std::pair<duration, std::function< void() >>> delayed_task_list;
void do_delayed_tasks( delayed_task_list list ) {
time_point start = std::chrono::high_resolution_clock::now();
time_point last = start;
for (auto&& task: list) {
time_point next = last + task.first;
duration wait_for = next - std::chrono::high_resolution_clock::now();
std::this_thread::sleep_for( wait_for );
task.second();
last = next;
}
}
std::future<void> execute_delayed_tasks_in_order_elsewhere( delayed_task_list list ) {
return std::async( std::launch::async, do_delayed_tasks, std::move(list) );
}
int main() {
delayed_task_list meh;
meh.emplace_back( duration(), []{ std::cout << "hello world\n"; } );
std::future<void> f = execute_delayed_tasks_in_order_elsewhere( meh );
f.wait(); // wait for the task list to complete: you can instead store the `future`
}
which should make the helper async thread sleep for (at least as long as) the durations you use before running each task. As written, time taken to execute each task is not counted towards the delays, so if the tasks take longer than the delays, you'll end up with the tasks running with next to no delay between them. Changing that should be easy, if you want to.
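To keep the rest of the program running while the helper thread works, the stored future can be polled instead of waited on. The helper below is a minimal sketch; is_ready is a hypothetical name, and it assumes the future is valid and came from std::launch::async (a deferred future never reports ready this way).

```cpp
#include <chrono>
#include <future>

// Returns true once the asynchronous work behind `f` has finished; never blocks.
// Precondition (assumed): f.valid() and the work was started with std::launch::async.
template <class T>
bool is_ready(std::future<T> const& f) {
    return f.wait_for(std::chrono::seconds(0)) == std::future_status::ready;
}
```

A main loop could call is_ready(f) once per iteration and, when it returns true, call f.get() and show the "Floor is now stocked" style notification, all without blocking user input in between.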

Your trouble is understandable, because what you need in order to have timers that don't block your event loop, is an event loop, and C++ doesn't yet have a standard one. You need to use other frameworks (such as Qt, Boost.Asio(?) or non-portable APIs (select(), etc)) to write event loops.

Related

Generating interrupt each 100 microsecond on windows

I want to generate an interrupt every 100 microseconds on Windows. I couldn't do this directly, because Windows does not guarantee interrupts at intervals shorter than 500 microseconds. So I create 2 threads. One of them is a timer counter (using QueryPerformanceCounter), the other does the actual work. When the timer counter reaches 100 microseconds, it changes the state of the other thread (the actual work). But I have a problem with race conditions, because I don't want the threads to wait for each other; they must always run. So what I actually need is interrupts. How do I write such a fast interrupt on Windows with C++?

To avoid having two threads communicating when you have these short time windows, I'd put both the work and the timer in a loop in one thread.
Take a sample of the clock when the thread starts and add 100μs to that each loop.
Sleep until the calculated time occurs. Normally, one would use std::this_thread::sleep_until to do such a sleep, but in this case, when the naps are so short, it often becomes a little too inaccurate, so I suggest busy-waiting in a tight loop that just checks the time.
Do your work.
In this example a worker thread runs for 10s without doing any real work. On my machine I could add work consisting of ~3000 additions in the slot where you are supposed to do your work before the whole loop started taking more than 100μs, so you'd better do what you aim to do really fast.
Example:
#include <atomic>
#include <chrono>
#include <iostream>
#include <thread>
using namespace std::chrono_literals;
static std::atomic<bool> running = true;
using myclock = std::chrono::steady_clock;
void worker() {
int loops = 0;
auto sleeper = myclock::now();
while(running) {
++loops; // count loops to check that it's good enough afterwards
// add 100us to the sleeper time_point
sleeper += 100us;
// busy-wait until it's time to do some work
while(myclock::now() < sleeper);
// do your work here
}
std::cout << loops << " (should be ~100000)\n";
}
int main() {
auto th = std::thread(worker);
// let the thread work for 10 seconds
std::this_thread::sleep_for(10s);
running = false;
th.join();
}
Possible output:
99996 (should be ~100000)
It takes a few clock cycles to get the thread started so don't worry about the number of loops not being exactly on target. Double the time the thread runs and you should still stay close to the target number of loops. What matters is that it's pretty good (but not realtime-good) once it's started running.

How to maintain certain frame rate in different threads

I have two different computational tasks that have to execute at certain frequencies. One has to be performed every 1ms and the other every 13.3ms. The tasks share some data.
I am having a hard time figuring out how to schedule these tasks and how to share data between them. One way that I thought might work is to create two threads, one for each task.
The first task is relatively simpler and can be handled in 1ms itself. But, when the second task (that is relatively more time-consuming) is going to launch, it will make a copy of the data that was just used by task 1, and continue to work on them.
Do you think this would work? How can it be done in c++?

There are multiple ways to do that in C++.
One simple way is to have 2 threads, as you described. Each thread does its action and then sleeps till the next period start. A working example:
#include <functional>
#include <iostream>
#include <chrono>
#include <thread>
#include <atomic>
#include <mutex>
std::mutex mutex;
std::atomic<bool> stop = {false};
unsigned last_result = 0; // Whatever thread_1ms produces.
void thread_1ms_action() {
// Do the work.
// Update the last result.
{
std::unique_lock<std::mutex> lock(mutex);
++last_result;
}
}
void thread_1333us_action() {
// Copy thread_1ms result.
unsigned last_result_copy;
{
std::unique_lock<std::mutex> lock(mutex);
last_result_copy = last_result;
}
// Do the work.
std::cout << last_result_copy << '\n';
}
void periodic_action_thread(std::chrono::microseconds period, std::function<void()> const& action) {
auto const start = std::chrono::steady_clock::now();
while(!stop.load(std::memory_order_relaxed)) {
// Do the work.
action();
// Wait till the next period start.
auto now = std::chrono::steady_clock::now();
auto iterations = (now - start) / period;
auto next_start = start + (iterations + 1) * period;
std::this_thread::sleep_until(next_start);
}
}
int main() {
std::thread a(periodic_action_thread, std::chrono::milliseconds(1), thread_1ms_action);
std::thread b(periodic_action_thread, std::chrono::microseconds(13333), thread_1333us_action);
std::this_thread::sleep_for(std::chrono::seconds(1));
stop = true;
a.join();
b.join();
}
If executing an action takes longer than one period to execute, then it sleeps till the next period start (skips one or more periods). I.e. each Nth action happens exactly at start_time + N * period, so that there is no time drift regardless of how long it takes to perform the action.
All access to the shared data is protected by the mutex.
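The scheduling arithmetic in periodic_action_thread can be looked at in isolation. The function below is a small sketch of the same calculation; next_period_start is a hypothetical name, and it assumes steady_clock ticks at least as finely as microseconds.

```cpp
#include <chrono>

using steady = std::chrono::steady_clock;

// Given the loop's start time, the current time and the period, return the
// start of the next period that has not begun yet.
steady::time_point next_period_start(steady::time_point start,
                                     steady::time_point now,
                                     std::chrono::microseconds period) {
    // Integer division: number of whole periods that have already elapsed.
    auto iterations = (now - start) / period;
    return start + (iterations + 1) * period;
}
```

For example, with a 1000us period, an action that finishes 2500us after start yields a wake-up at start + 3000us: the periods beginning at 1000us and 2000us are skipped rather than run late, which is why there is no drift.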

So I'm thinking that task1 needs to make the copy, because it knows when it is safe to do so. Here is one simplistic model:
Shared:
atomic<Result*> latestResult = {nullptr};
Task1:
Perform calculation
Result* pNewResult = new Result
Copy result to pNewResult
pNewResult = latestResult.exchange(pNewResult)
if (pNewResult)
delete pNewResult; // Task2 didn't take it!
Task2:
Result* pNewResult = latestResult.exchange(nullptr);
if (pNewResult)
process result
delete pNewResult;
In this model task1 and task2 only ever contend when exchanging a simple atomic pointer, which is quite painless.
Note that this makes many assumptions about your calculation. Could your task1 usefully calculate the result straight into the buffer, for example? Also note that at the start Task2 may find the pointer is still null.
Also it inefficiently new()s the buffers. You need 3 buffers to ensure there is never any significant contention between the tasks, but you could just manage three buffer pointers under mutexes, such that Task 1 will have a set of data ready, and be writing another set of data, while task 2 is reading from a third set.
Note that even if you have task 2 copy the buffer, Task 1 still needs 2 buffers to avoid stalls.
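The pointer hand-off model above can be sketched in C++ with std::atomic<Result*>::exchange. Result here is a stand-in payload and LatestResult a hypothetical wrapper name; like the model it illustrates, this sketch still allocates a buffer per result.

```cpp
#include <atomic>
#include <memory>

struct Result { int value = 0; };  // stand-in for whatever Task 1 produces

class LatestResult {
public:
    ~LatestResult() { delete slot_.exchange(nullptr); }

    // Task 1: publish a fresh result; an older one nobody took is discarded.
    void publish(std::unique_ptr<Result> fresh) {
        Result* old = slot_.exchange(fresh.release());
        delete old;  // non-null only if Task 2 didn't take it
    }

    // Task 2: take the newest result, or get nullptr if nothing new was published.
    std::unique_ptr<Result> take() {
        return std::unique_ptr<Result>(slot_.exchange(nullptr));
    }

private:
    std::atomic<Result*> slot_{nullptr};
};
```

Task 2 must handle a null return, both at start-up and whenever it runs again before Task 1 has published anything new.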

You can use C++ thread facilities such as std::thread and clocks such as steady_clock, as described in the previous answer, but whether this solution works depends strongly on the platform your code is running on.
1ms and 13.3ms are pretty short time intervals, and if your code is running on a non-real-time OS such as Windows or non-RTOS Linux, there is no guarantee that the OS scheduler will wake up your threads at exact times.
C++11 has the class high_resolution_clock, which should use a high-resolution timer if your platform supports one, but that still depends on the implementation of this class. A bigger problem than the timer is the C++ wait functions: neither sleep_until nor sleep_for guarantees that it will wake up your thread at the specified time. Here is the quote from the C++ documentation for sleep_for:
"Blocks the execution of the current thread for at least the specified sleep_duration."
Fortunately, most operating systems have special facilities, such as Windows Multimedia Timers, that you can use if your threads are not woken up at the expected times.
For more details, see: Precise thread sleep needed. Max 1ms error
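One way to see how far a given platform overshoots is simply to measure it. The function below is a small sketch (sleep_overshoot is a hypothetical name) that reports how much longer than requested a single sleep_for call actually blocked.

```cpp
#include <chrono>
#include <thread>

// Returns how much longer than `requested` one sleep_for call actually blocked.
std::chrono::nanoseconds sleep_overshoot(std::chrono::microseconds requested) {
    auto const start = std::chrono::steady_clock::now();
    std::this_thread::sleep_for(requested);
    auto const elapsed = std::chrono::steady_clock::now() - start;
    return std::chrono::duration_cast<std::chrono::nanoseconds>(elapsed - requested);
}
```

Calling it many times with the 1ms period and looking at the worst case, not the average, gives a rough idea of whether plain sleeps are good enough on the target machine.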

How much time it takes for a thread waiting with pthread_cond_wait to wake after being signaled? how can I estimate this time?

I'm writing a C++ ThreadPool implementation and using pthread_cond_wait in my worker's main function. I was wondering how much time will pass from signaling the condition variable until the thread/threads waiting on it will wake up.
do you have any idea of how can I estimate/calculate this time?
Thank you very much.

It depends on the cost of a context switch, which in turn depends on:
the OS
the CPU
whether it is a thread or a different process
the load of the machine
whether the switch is to the same core it last ran on
the working set size
the time since it last ran
Linux best case, i7, 1100ns, thread in same process, same core as it ran in last, ran as the last thread, no load, working set 1 byte.
Bad case, flushed from cache, different core, different process, just expect 30µs of CPU overhead.
Where does the cost go:
Save last process context 70-400 cycles,
load new context 100-400 cycles
if different process, flush TLB, reload 3 to 5 page walks, which potentially could be from memory taking ~300 cycles each. Plus a few page walks if more than one page is touched, including instructions and data.
OS overhead, we all like the nice statistics, for example add 1 to context switch counter.
Scheduling overhead, which task to run next
potential cache misses on new core ~12 cycles per cache line on own L2 cache, and downhill from there the farther away the data is and the more there is of it.

As mentioned, the time it takes for a condition variable to react depends on many factors. One option is to actually measure it: start a thread that waits on a condition variable. Then another thread, the one that signals the condition variable, takes a timestamp right before signaling it. The thread that waits on the variable also takes a timestamp the moment it wakes up. Simple as that. This gives you a rough approximation of the time it takes for the thread to notice the signaled condition.
#include <mutex>
#include <condition_variable>
#include <thread>
#include <chrono>
#include <stdio.h>
typedef std::chrono::time_point<std::chrono::high_resolution_clock> timep;
int main()
{
std::mutex mx;
std::condition_variable cv;
timep t0, t1;
bool done = false;
std::thread th([&]() {
while (!done)
{
std::unique_lock lock(mx);
cv.wait(lock);
t1 = std::chrono::high_resolution_clock::now();
}
});
for (int i = 0; i < 25; ++i) // measure 25 times
{
std::this_thread::sleep_for(std::chrono::milliseconds(10));
t0 = std::chrono::high_resolution_clock::now();
cv.notify_one();
std::this_thread::sleep_for(std::chrono::milliseconds(10));
std::unique_lock lock(mx);
printf("test#%-2d: cv reaction time: %6.3f micro\n", i,
1000000 * std::chrono::duration<double>(t1 - t0).count());
}
{
std::unique_lock lock(mx);
done = true;
}
cv.notify_one();
th.join();
}
Try it on coliru, it produced this output:
test#0 : cv reaction time: 50.488 micro
test#1 : cv reaction time: 55.057 micro
test#2 : cv reaction time: 53.765 micro
test#3 : cv reaction time: 50.973 micro
test#4 : cv reaction time: 51.015 micro
test#5 : cv reaction time: 57.166 micro
and so on...
On my windows 11 laptop I got values roughly 5-10x faster (5-10 microseconds).

Synchronizing very fast threads

In the following example (an idealized "game") there are two threads: the main thread, which updates data, and RenderThread, which "renders" it to the screen. What I need is for those two to be synchronized. I cannot afford to run several update iterations without running a render for every single one of them.
I use a condition_variable to sync those two, so ideally the faster thread will spend some time waiting for the slower one. However, condition variables don't seem to do the job if one of the threads completes an iteration in a very small amount of time. It seems to quickly reacquire the lock of the mutex before wait in the other thread is able to acquire it, even though notify_one is called.
#include <iostream>
#include <thread>
#include <chrono>
#include <atomic>
#include <functional>
#include <mutex>
#include <condition_variable>
using namespace std;
bool isMultiThreaded = true;
struct RenderThread
{
RenderThread()
{
end = false;
drawing = false;
readyToDraw = false;
}
void Run()
{
while (!end)
{
DoJob();
}
}
void DoJob()
{
unique_lock<mutex> lk(renderReadyMutex);
renderReady.wait(lk, [this](){ return readyToDraw; });
drawing = true;
// RENDER DATA
this_thread::sleep_for(chrono::milliseconds(15)); // simulated render time
cout << "frame " << count << ": " << frame << endl;
++count;
drawing = false;
readyToDraw = false;
lk.unlock();
renderReady.notify_one();
}
atomic<bool> end;
mutex renderReadyMutex;
condition_variable renderReady;
//mutex frame_mutex;
int frame = -10;
int count = 0;
bool readyToDraw;
bool drawing;
};
struct UpdateThread
{
UpdateThread(RenderThread& rt)
: m_rt(rt)
{}
void Run()
{
this_thread::sleep_for(chrono::milliseconds(500));
for (int i = 0; i < 20; ++i)
{
// DO GAME UPDATE
// when this is uncommented everything is fine
// this_thread::sleep_for(chrono::milliseconds(10)); // simulated update time
// PREPARE RENDER THREAD
unique_lock<mutex> lk(m_rt.renderReadyMutex);
m_rt.renderReady.wait(lk, [this](){ return !m_rt.drawing; });
m_rt.readyToDraw = true;
// SUPPLY RENDER THREAD WITH DATA TO RENDER
m_rt.frame = i;
lk.unlock();
m_rt.renderReady.notify_one();
if (!isMultiThreaded)
m_rt.DoJob();
}
m_rt.end = true;
}
RenderThread& m_rt;
};
int main()
{
auto start = chrono::high_resolution_clock::now();
RenderThread rt;
UpdateThread u(rt);
thread* rendering = nullptr;
if (isMultiThreaded)
rendering = new thread(bind(&RenderThread::Run, &rt));
u.Run();
if (rendering)
rendering->join();
auto duration = chrono::high_resolution_clock::now() - start;
cout << "Duration: " << double(chrono::duration_cast<chrono::microseconds>(duration).count())/1000 << endl;
return 0;
}
Here is the source of this small example code, and as you can see, even on ideone's run the output is frame 0: 19 (this means that the render thread has completed a single iteration, while the update thread has completed all 20 of its own).
If we uncomment the simulated update time (the commented-out sleep_for in UpdateThread::Run, line 75 of the original listing), everything runs fine. Every update iteration has an associated render iteration.
Is there a way to really truly sync those threads, even if one of them completes an iteration in mere nanoseconds, but also without having a performance penalty if they both take some reasonable amount of milliseconds to complete?

If I understand correctly, you want the 2 threads to work alternately: the updater waits until the renderer finishes before iterating again, and the renderer waits until the updater finishes before iterating again. Part of the computation could be parallel, but the number of iterations must be the same for both.
You need 2 locks:
one for the updating
one for the rendering
Updater:
wait (renderingLk)
update
signal(updaterLk)
Renderer:
wait (updaterLk)
render
signal(renderingLk)
EDITED:
Even if it looks simple, there are several problems to solve:
Allowing part of the calculations to be made in parallel: in the above snippet, update and render will not run in parallel but sequentially, so there is no benefit to having multiple threads. In a real solution, some of the calculation should be done before the wait, and only the copy of the new values needs to happen between the wait and the signal. The same goes for rendering: all the rendering should be done after the signal, with only the fetching of the values between the wait and the signal.
The implementation also needs to take care of the initial state, so that no rendering is performed before the first update.
The termination of both threads, so that neither stays locked or loops infinitely after the other terminates.
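The wait/signal scheme above can be sketched with one mutex, one condition variable and a turn flag standing in for the two locks. This is only an illustration under simplifying assumptions: run_lockstep is a hypothetical name, both sides run a fixed number of iterations (which sidesteps the termination problem), and all work happens inside the critical section, so nothing runs in parallel.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Runs `iterations` update/render pairs in strict alternation and returns
// the frame numbers the renderer saw.
std::vector<int> run_lockstep(int iterations) {
    std::mutex m;
    std::condition_variable cv;
    bool frame_ready = false;  // true: renderer's turn, false: updater's turn
    int frame = -1;
    std::vector<int> rendered;

    std::thread renderer([&] {
        for (int i = 0; i < iterations; ++i) {
            std::unique_lock<std::mutex> lk(m);
            cv.wait(lk, [&] { return frame_ready; });  // nothing to render before the first update
            rendered.push_back(frame);                 // "render" the shared data
            frame_ready = false;                       // hand the turn back to the updater
            lk.unlock();
            cv.notify_one();
        }
    });

    for (int i = 0; i < iterations; ++i) {
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, [&] { return !frame_ready; });     // wait until the last frame was rendered
        frame = i;                                     // "update" the shared data
        frame_ready = true;                            // hand the turn to the renderer
        lk.unlock();
        cv.notify_one();
    }
    renderer.join();
    return rendered;
}
```

Because the flag is set by the thread that hands over the turn, a fast updater cannot slip in a second iteration before the renderer has consumed the first, no matter how short an iteration is.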

I think a mutex (alone) is not the right tool for the job. You might want to consider using a semaphore (or something similar) instead. What you describe sounds a lot like a producer/consumer problem, i.e., one process is allowed to run once every time another process has finished a task. Therefore you might also have a look at producer/consumer patterns. For example, this series might give you some ideas:
A multi-threaded Producer Consumer with C++11
There a std::mutex is combined with a std::condition_variable to mimic the behavior of a semaphore. An approach that appears quite reasonable. You would probably not count up and down but rather toggle true and false a variable with needs redraw semantics.
For reference:
http://en.cppreference.com/w/cpp/thread/condition_variable
C++0x has no semaphores? How to synchronize threads?

This is because you use a separate drawing variable that is only set when the rendering thread reacquires the mutex after a wait, which may be too late. The problem disappears when the drawing variable is removed and the check for wait in the update thread is replaced with !m_rt.readyToDraw (which is already set by the update thread and hence not susceptible to the logical race).
Modified code and results
That said, since the threads do not work in parallel, I don't really get the point of having two threads. Unless you should choose to implement double (or even triple) buffering later.

A technique often used in computer graphics is to use a double-buffer. Instead of having the renderer and the producer operate on the same data in memory, each one has its own buffer. This is implemented by using two independent buffers, and switch them when needed. The producer updates one buffer, and when it is done, it switches the buffer and fills the second buffer with the next data. Now, while the producer is processing the second buffer, the renderer works with the first one and displays it.
You could use this technique by letting the renderer lock the swap operation such that the producer may have to wait until rendering is finished.
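A minimal sketch of that idea, assuming a single producer thread and a single renderer thread; DoubleBuffer and its member names are hypothetical, and std::vector<int> stands in for the frame data.

```cpp
#include <array>
#include <mutex>
#include <vector>

class DoubleBuffer {
public:
    // Producer only: the buffer currently being filled. No lock is needed,
    // because front_ changes only in swap(), which the producer itself calls.
    std::vector<int>& back() { return buffers_[1 - front_]; }

    // Producer only: publish the back buffer. Blocks while a render is in progress.
    void swap() {
        std::lock_guard<std::mutex> lk(m_);
        front_ = 1 - front_;
    }

    // Renderer only: run `draw` on the front buffer. The lock is held for the
    // whole call, so the buffers cannot be swapped mid-render.
    template <class F>
    void render(F&& draw) {
        std::lock_guard<std::mutex> lk(m_);
        draw(buffers_[front_]);
    }

private:
    std::mutex m_;
    std::array<std::vector<int>, 2> buffers_;
    int front_ = 0;
};
```

On its own this does not guarantee one render per update: the renderer may draw the same front buffer twice, or the producer may swap twice between renders. Combining it with a ready flag and a condition variable restores the one-to-one pairing.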

Reusing thread in loop c++

I need to parallelize some tasks in a C++ program and am completely new to parallel programming. I've made some progress through internet searches so far, but am a bit stuck now. I'd like to reuse some threads in a loop, but clearly don't know how to do what I'm trying for.
I am acquiring data from two ADC cards on the computer (acquired in parallel), then I need to perform some operations on the collected data (processed in parallel) while collecting the next batch of data. Here is some pseudocode to illustrate
//Acquire some data, wait for all the data to be acquired before proceeding
std::thread acq1(AcquireData, boardHandle1, memoryAddress1a);
std::thread acq2(AcquireData, boardHandle2, memoryAddress2a);
acq1.join();
acq2.join();
while(user doesn't interrupt)
{
//Process first batch of data while acquiring new data
std::thread proc1(ProcessData,memoryAddress1a);
std::thread proc2(ProcessData,memoryAddress2a);
acq1(AcquireData, boardHandle1, memoryAddress1b);
acq2(AcquireData, boardHandle2, memoryAddress2b);
acq1.join();
acq2.join();
proc1.join();
proc2.join();
/*Proceed in this manner, alternating which memory address
is written to and being processed until the user interrupts the program.*/
}
That's the main gist of it. The next run of the loop would write to the "a" memory addresses while processing the "b" data and continue to alternate (I can get the code to do that, just took it out to prevent cluttering up the problem).
Anyway, the problem (as I'm sure some people can already tell) is that the second time I try to use acq1 and acq2, the compiler (VS2012) says "IntelliSense: call of an object of a class type without appropriate operator() or conversion functions to pointer-to-function type". Likewise, if I put std::thread in front of acq1 and acq2 again, it says " error C2374: 'acq1' : redefinition; multiple initialization".
So the question is, can I reassign threads to a new task when they have completed their previous task? I always wait for the previous use of the thread to end before calling it again, but I don't know how to reassign the thread, and since it's in a loop, I can't make a new thread each time (or if I could, that seems wasteful and unnecessary, but I could be mistaken).
Thanks in advance

The easiest way is to use a waitable queue of std::function objects. Like this:
#include <iostream>
#include <thread>
#include <mutex>
#include <condition_variable>
#include <queue>
#include <functional>
#include <chrono>
#include <vector>
class ThreadPool
{
public:
ThreadPool (int threads) : shutdown_ (false)
{
// Create the specified number of threads
threads_.reserve (threads);
for (int i = 0; i < threads; ++i)
threads_.emplace_back (std::bind (&ThreadPool::threadEntry, this, i));
}
~ThreadPool ()
{
{
// Unblock any threads and tell them to stop
std::unique_lock <std::mutex> l (lock_);
shutdown_ = true;
condVar_.notify_all();
}
// Wait for all threads to stop
std::cerr << "Joining threads" << std::endl;
for (auto& thread : threads_)
thread.join();
}
void doJob (std::function <void (void)> func)
{
// Place a job on the queue and unblock a thread
std::unique_lock <std::mutex> l (lock_);
jobs_.emplace (std::move (func));
condVar_.notify_one();
}
protected:
void threadEntry (int i)
{
std::function <void (void)> job;
while (1)
{
{
std::unique_lock <std::mutex> l (lock_);
while (! shutdown_ && jobs_.empty())
condVar_.wait (l);
if (jobs_.empty ())
{
// No jobs to do and we are shutting down
std::cerr << "Thread " << i << " terminates" << std::endl;
return;
}
std::cerr << "Thread " << i << " does a job" << std::endl;
job = std::move (jobs_.front ());
jobs_.pop();
}
// Do the job without holding any locks
job ();
}
}
std::mutex lock_;
std::condition_variable condVar_;
bool shutdown_;
std::queue <std::function <void (void)>> jobs_;
std::vector <std::thread> threads_;
};
void silly (int n)
{
// A silly job for demonstration purposes
std::cerr << "Sleeping for " << n << " seconds" << std::endl;
std::this_thread::sleep_for (std::chrono::seconds (n));
}
int main()
{
// Create two threads
ThreadPool p (2);
// Assign them 4 jobs
p.doJob (std::bind (silly, 1));
p.doJob (std::bind (silly, 2));
p.doJob (std::bind (silly, 3));
p.doJob (std::bind (silly, 4));
}

The std::thread class is designed to execute exactly one task (the one you give it in the constructor) and then end. If you want to do more work, you'll need a new thread. As of C++11, that's all we have. Thread pools didn't make it into the standard. (I'm uncertain what C++14 has to say about them.)
Fortunately, you can easily implement the required logic yourself. Here is the large-scale picture:
Start n worker threads that all do the following:
Repeat while there is more work to do:
Grab the next task t (possibly waiting until one becomes ready).
Process t.
Keep inserting new tasks in the processing queue.
Tell the worker threads that there is nothing more to do.
Wait for the worker threads to finish.
The most difficult part here (which is still fairly easy) is properly designing the work queue. Usually, a synchronized linked list (from the STL) will do for this. Synchronized means that any thread that wishes to manipulate the queue must only do so after it has acquired a std::mutex so to avoid race conditions. If a worker thread finds the list empty, it has to wait until there is some work again. You can use a std::condition_variable for this. Each time a new task is inserted into the queue, the inserting thread notifies a thread that waits on the condition variable and will therefore stop blocking and eventually start processing the new task.
The second not-so-trivial part is how to signal to the worker threads that there is no more work to do. Clearly, you can set some global flag but if a worker is blocked waiting at the queue, it won't realize any time soon. One solution could be to notify_all() threads and have them check the flag each time they are notified. Another option is to insert some distinct “toxic” item into the queue. If a worker encounters this item, it quits itself.
Representing a queue of tasks is straight-forward using your self-defined task objects or simply lambdas.
All of the above are C++11 features. If you are stuck with an earlier version, you'll need to resort to third-party libraries that provide multi-threading for your particular platform.
While none of this is rocket science, it is still easy to get wrong the first time. And unfortunately, concurrency-related bugs are among the most difficult to debug. Starting by spending a few hours reading through the relevant sections of a good book or working through a tutorial can quickly pay off.
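The outline above, using the "toxic item" variant for shutdown, might look roughly like the sketch below. TaskQueue and run_all are hypothetical names, and an empty std::function is assumed to be the toxic item, so real tasks must never be empty.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// A synchronized queue: every access happens with the mutex held.
class TaskQueue {
public:
    void push(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lk(m_);
            q_.push(std::move(task));
        }
        cv_.notify_one();  // wake one worker blocked in pop()
    }
    std::function<void()> pop() {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this] { return !q_.empty(); });  // block while there is no work
        std::function<void()> task = std::move(q_.front());
        q_.pop();
        return task;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> q_;
};

// Runs all tasks on n worker threads and returns once every task has finished.
void run_all(std::vector<std::function<void()>> tasks, int n) {
    TaskQueue queue;
    std::vector<std::thread> workers;
    for (int i = 0; i < n; ++i)
        workers.emplace_back([&queue] {
            // An empty function converts to false, which ends this worker.
            while (std::function<void()> task = queue.pop()) task();
        });
    for (auto& t : tasks) queue.push(std::move(t));
    for (int i = 0; i < n; ++i) queue.push(nullptr);  // one toxic item per worker
    for (auto& w : workers) w.join();
}
```

Since the toxic items are queued after the real tasks and each worker consumes exactly one of them, every real task is processed before the last worker exits.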

This
std::thread acq1(...)
is a call to a constructor, constructing a new object called acq1.
This
acq1(...)
is the application of the () operator to the existing object acq1. Since no such operator is defined for std::thread, the compiler complains.
As far as I know, you may not reuse std::threads. You construct and start them, join with them, and throw them away.

Well, it depends if you consider moving a reassigning or not. You can move a thread but not make a copy of it.
The code below creates a new pair of threads each iteration and moves them in place of the old threads. I imagine this should work, because the new thread objects will be temporaries.
while(user doesn't interrupt)
{
//Process first batch of data while acquiring new data
std::thread proc1(ProcessData,memoryAddress1a);
std::thread proc2(ProcessData,memoryAddress2a);
acq1 = std::thread(AcquireData, boardHandle1, memoryAddress1b);
acq2 = std::thread(AcquireData, boardHandle2, memoryAddress2b);
acq1.join();
acq2.join();
proc1.join();
proc2.join();
/*Proceed in this manner, alternating which memory address
is written to and being processed until the user interrupts the program.*/
}
What's going on is that the object does not actually end its lifetime at the end of the iteration, because it is declared in the scope outside the loop. But a new object gets created each time and a move takes place. I don't see what is saved by this (I might be stupid), so I imagine it's exactly the same as declaring the acqs inside the loop and simply reusing the symbol. All in all ... yeah, it's about how you classify creating a temporary and moving it.
Also, this clearly starts a new thread each loop (after the previously assigned thread has ended, of course); it doesn't make a thread wait for new data and magically feed it into the processing pipeline. You would need to implement that differently, e.g. with a pool of worker threads and communication over queues.
References: operator=, (ctor).
I think the errors you get are self-explanatory, so I'll skip explaining them.

I think you need a much simpler answer for running a set of threads more than once. This is the best solution:
do{
std::vector<std::thread> thread_vector;
for (int i=0;i<nworkers;i++)
{
thread_vector.push_back(std::thread(yourFunction,Parameter1,Parameter2, ...));
}
for(std::thread& it: thread_vector)
{
it.join();
}
q++;
} while(q<NTIMES);

You also could make your own Thread class and call its run method like:
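class MyThread
{
public:
// Join on destruction: destroying a joinable std::thread calls std::terminate.
~MyThread() { join(); }
void run(std::function<void()> func) {
// Wait for the previous task first: assigning to a joinable
// std::thread also calls std::terminate.
join();
thread_ = std::thread(std::move(func));
}
void join() {
if(thread_.joinable())
thread_.join();
}
private:
std::thread thread_;
};
// Application code...
MyThread myThread;
myThread.run([&] { AcquireData(boardHandle1, memoryAddress1b); });
myThread.join();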