Atomically increment and assign to another atomic - c++

Suppose I have some global:
std::atomic_int next_free_block;
and a number of threads each with access to a
std::atomic_int child_offset;
that may be shared between threads. I would like to allocate free blocks to child offsets in a contiguous manner, that is, I want to perform the following operation atomically:
if (child_offset == 0) child_offset = next_free_block++;
Obviously the above implementation does not work as multiple threads may enter the body of the if statement and then try to assign different blocks to child_offset.
I have also considered the following:
int expected = child_offset;
int updated;
do {
if (expected != 0) break; // already assigned by another thread
updated = next_free_block++;
} while (!child_offset.compare_exchange_weak(expected, updated));
But this also doesn't work because if the CAS fails, the side effect of incrementing next_free_block remains even if nothing is assigned to child_offset. This leaves gaps in the allocation of free blocks.
I am aware that I could do this with a mutex (or some kind of spin lock) around each child_offset and potentially DCLP, but I would like to know if this is possible to implement efficiently with atomic operations.
The use case for this is as follows: I have a large tree that I'm building in parallel. The tree is an array of the following:
struct tree_page {
atomic<uint32_t> allocated;
uint32_t child_offset[8];
uint32_t nodes[1015];
};
The tree is built level by level: first the nodes at depth 0 are created, then at depth 1, etc. A separate thread is dispatched for each non-leaf node of the previous level. If no more space is left in a page, a new page is allocated from the global next_free_page (called next_free_block above), which points to the first unused page in the array of struct tree_page, and is assigned to an element of child_offset. A bit field is then set in the node word that indicates which element of the child_offset array should be used to find the node's children.
The code I am trying to write looks like this:
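// 'allocated' is the atomic<uint32_t> counter of the current page
uint32_t expected = allocated.load(std::memory_order_relaxed), updated;
do {
updated = expected + num_children;
if (updated > NODES_PER_PAGE) {
expected = -1; break; // wraps to UINT32_MAX, used as the "page full" marker
}
} while (!allocated.compare_exchange_weak(expected, updated));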
if (expected != -1) {
// successfully allocated in the same page
} else {
for (int i = 0; i < 8; ++i) {
// this is the operation I would like to be atomic
if (child_offset[i] == 0)
child_offset[i] = next_free_block++;
int offset = try_allocating_at_page(pages[child_offset[i]]);
if (offset != -1) {
// successfully allocated at child_offset i
// ...
break;
}
}
}

As far as I understand from your description, your child_offset array is initially filled with 0 and then filled with concrete values concurrently by different threads.
In this case you can atomically "tag" the value first and, if you are successful, assign a valid value. Something like this:
constexpr int INVALID_VALUE = -1;
for (int i = 0; i < 8; ++i) {
int expected = 0;
// claim the slot first; a strong CAS is used because it cannot fail spuriously
if (child_offset[i].compare_exchange_strong(expected, INVALID_VALUE)) {
child_offset[i] = next_free_block++;
}
// Not sure if this is needed in your environment, but just in case
if (child_offset[i] == INVALID_VALUE) continue;
...
}
This doesn't guarantee that all values in child_offset array will be in ascending order. But if you need that why not fill it without multithreading involved?

Related

Deadlock occurring after multiple iterations of queries in threads (multithreading)

I encounter deadlocks while executing the code snippet below as a thread.
void thread_lifecycle(
Queue<std::tuple<int64_t, int64_t, uint8_t>, QUEUE_SIZE>& query,
Queue<std::string, QUEUE_SIZE>& output_queue,
std::vector<Object>& pgs,
bool* pgs_executed, // Initialized to array false-values
std::mutex& pgs_executed_mutex,
std::atomic<uint32_t>& atomic_pgs_finished
){
bool iter_bool = false;
std::tuple<int64_t, int64_t, uint8_t> next_query;
std::string output = "";
int64_t lower, upper;
while(true) {
// Get next query
next_query = query.pop_front();
// Stop Condition reached terminate thread
if (std::get<2>(next_query) == uint8_t(-1)) break;
//Set query params
lower = std::get<0>(next_query);
upper = std::get<1>(next_query);
// Scan bool array
for (uint32_t i = 0; i < pgs.size(); i++){
// first lock for reading
pgs_executed_mutex.lock();
if (pgs_executed[i] == iter_bool) {
pgs_executed[i] = !pgs_executed[i];
// Unlock and execute the query
pgs_executed_mutex.unlock();
output = pgs.at(i).get_result(lower, upper);
// If query yielded a result, then add it to the output
if (output.length() != 0) {
output_queue.push_back(output);
}
// Inform main thread in case of last result
if (++atomic_pgs_finished >= pgs.size()) {
output_queue.push_back("LAST_RESULT_IDENTIFIER");
atomic_pgs_finished.exchange(0);
}
} else {
pgs_executed_mutex.unlock();
continue;
}
}
//finally flip for next query
iter_bool = !iter_bool;
}
}
Explained:
I have a vector of objects containing information which can be queried (similar to a table in a database). Each thread can access the objects, and all of them iterate over the vector ONCE per query to process the objects which have not been queried yet and return results, if any.
In the next query it goes through the vector again, and so on... I use the bool* array to denote the entries which are currently queried, so that the processes can synchronize and determine which query should be executed next.
If all have been executed, the last thread having possibly the last results will also return an identifier for the main thread in order to inform that all objects have been queried.
My Question:
Regarding the bool* array as well as atomic_pgs_finished: is there a scenario in which a deadlock can occur? As far as I can tell, there is no deadlock in this snippet. However, running it for a while does result in a deadlock.
I am seriously considering that a bit (byte?) has randomly flipped causing this deadlock (on ECC-RAM), so that 1 or more objects actually were not executed. Is this even possible?
Maybe another implementation could help?
Edit, Implementation of the Queue:
template<class T, size_t MaxQueueSize>
class Queue
{
std::condition_variable consumer_, producer_;
std::mutex mutex_;
using unique_lock = std::unique_lock<std::mutex>;
std::queue<T> queue_;
public:
template<class U>
void push_back(U&& item) {
unique_lock lock(mutex_);
while(MaxQueueSize == queue_.size())
producer_.wait(lock);
queue_.push(std::forward<U>(item));
consumer_.notify_one();
}
T pop_front() {
unique_lock lock(mutex_);
while(queue_.empty())
consumer_.wait(lock);
auto full = MaxQueueSize == queue_.size();
auto item = queue_.front();
queue_.pop();
if(full)
producer_.notify_all();
return item;
}
};
Thanks to @Ulrich Eckhardt, @PaulMcKenzie and everyone else in the comments for the brainstorming! I have probably found the cause of the deadlock. I tried to reduce this example even further and thought about removing atomic_pgs_finished, a variable indicating whether all pgs have been queried. Interestingly, ++atomic_pgs_finished >= pgs.size() evaluates to true not just once but multiple times, so multiple threads end up inside this specific if-clause.
I simply fixed it by using another mutex around this if-clause. Maybe someone can explain why ++atomic_pgs_finished >= pgs.size() is not atomic and yields true for multiple threads.
Below I have updated the code (mostly the same as in the question) with comments, so that it might be more understandable.
void thread_lifecycle(
Queue<std::tuple<int64_t, int64_t, uint8_t>, QUEUE_SIZE>& query, // The input queue containing queries, in my case triples
Queue<std::string, QUEUE_SIZE>& output_queue, // The Output Queue of results
std::vector<Object>& pgs, // Objects which should be queried
bool* pgs_executed, // Initialized to an array of false-values
std::mutex& pgs_executed_mutex, // a mutex, protecting pgs_executed
std::atomic<uint32_t>& atomic_pgs_finished // atomic counter to count how many have been executed (to send a end signal)
){
// Initialize variables
std::tuple<int64_t, int64_t, uint8_t> next_query;
std::string output = "";
int64_t lower, upper;
// Set the first iteration to false for the very first query
// This flips on the second iteration to reuse pgs_executed with true values and so on...
bool iter_bool = false;
// Execute as long as valid queries are received
while(true) {
// Get next query
next_query = query.pop_front();
// Stop Condition reached terminate thread
if (std::get<2>(next_query) == uint8_t(-1)) break;
// "Parse query" to query the objects in pgs
lower = std::get<0>(next_query);
upper = std::get<1>(next_query);
// Now iterate through the pgs and pgs_executed (once)
for (uint32_t i = 0; i < pgs.size(); i++){
// Lock to read and write into pgs_executed
pgs_executed_mutex.lock();
if (pgs_executed[i] == iter_bool) {
pgs_executed[i] = !pgs_executed[i];
// Unlock since we now execute the query on the object (which was not queried before)
pgs_executed_mutex.unlock();
// Query Execution
output = pgs.at(i).get_result(lower, upper);
// If the query yielded a result, then add it to the output for the main thread to read
if (output.length() != 0) {
output_queue.push_back(output);
}
// HERE THE ROOT CAUSE OF THE DEADLOCK HAPPENS
// Here I would like to inform the main thread that we executed the query on
// every object in pgs, so that it should no longer wait for other results
if (++atomic_pgs_finished >= pgs.size()) {
// In this if-clause multiple threads are present at once!
// This is not intended and causes a deadlock: "LAST_RESULT_IDENTIFIER" is
// pushed multiple times, so the main thread wrongly assumes
// that a query has finished. The main thread then simply added the next query while the
// previous one was not finished, causing threads to race each other on two queries simultaneously
// without having the same iter_bool!
output_queue.push_back("LAST_RESULT_IDENTIFIER");
atomic_pgs_finished.exchange(0);
}
// END: HERE THE ROOT CAUSE OF THE DEADLOCK HAPPENS
} else {
// This case happens when the next element in the list was already executed (by another process),
// simply unlock pgs_executed and continue with the next element in pgs
pgs_executed_mutex.unlock();
continue; // This is unnecessary and could be removed
}
}
//finally flip for the next query in order to reuse bool* (which now has trues if a second query is incoming)
iter_bool = !iter_bool;
}
}

how to implement a memory allocator

I'm trying to implement the freelist algorithm to allocate memory. The two functions I'm trying to write can be described as shown below.
// allocates a block of memory of at least size words and returns the address of that memory or 0 if no memory could be allocated.
int64_t *mymalloc(int64_t size)
// deallocates the memory stored at addr. the address will either be one allocated by mymalloc or the value 0.
void myfree(int64_t *addr)
The implementations of these functions should only use memory returned by the function pool(), whose signature is described below. Thus it cannot use the functions new, delete, malloc, calloc, realloc, etc.
// pool is a function that returns the address of a beginning
// of a block of RAM that may be used for dynamic memory
// allocation. The size of the pool in bytes is stored in the
// first word, which can be assumed to be a multiple of 8.
// When pool() is called, the first word isn't always overwritten with its size.
// Each word is an int64_t *, and so is 8 bytes.
// Assume this function works.
int64_t *pool();
I think defining some global variables like freelst, which points to the start of the freelst, may be helpful. It can be defined as
int64_t *freelst = pool();
I know that when allocating memory, there are some steps to follow:
The free list pointer should be updated accordingly.
The number of allocated blocks should be incremented.
The amount of memory allocated should be subtracted from the first word of the freelist, so that the first word always stores the size of memory available.
One needs to check if the current block of memory has been previously freed.
When deallocating memory, one needs to ensure addresses are inserted into the freelist in increasing order so that neighbours can be determined. If neighbours (which differ by 8) are free, they need to be merged, repeatedly, until no free neighbours remain, in order to reduce fragmentation. Also, the second word of the freelst should be a pointer to the next word of the free list.
Below is some code I've come up with for this problem. It's incomplete, but the basic ideas are there.
#include <iostream>
#include <cstdint>
#include "pool.h" // place where pool is defined
const int NODE_SIZE = 8;
int64_t *freelst = pool();
int64_t *start_of_pool = freelst; // just keep this fixed I guess
// assume that the pool function works.
int64_t *mymalloc(int64_t size) {
int64_t *currentBlock = freelst;
while (currentBlock) {
if (*currentBlock >= size) { // if the currentBlock is large enough, set it to this value (we're doing first fit).
break;
}
currentBlock = currentBlock + 1; // since incrementing involves moving to the address that's one word past the current one.
}
// assuming we've found a large enough block, we now have to allocate it
if (currentBlock == 0) {
return 0; // I think this should occur because not enough memory was found
}
int64_t *prev_val = freelst; // save the previous value of the freelist
freelst = freelst + 1 + *currentBlock; // assuming *currentBlock is the size of currentBlock.
*freelst -= NODE_SIZE + *currentBlock; // update the size of the freelst here (though likely this was done incorrectly)
return currentBlock + 1; // return address one word after currentBlock
// is this all if we're trying to implement a linked list using raw pointers?
// I don't think so, but I'm not sure what else to add.
}
void myfree(int64_t *p) {
if (p == 0) {
return; // of course if we're freeing a nullptr, we should return 0.
}
// assume the freelst is already in ascending order of course.
// sort the freelst in linear time by positioning the currentBlock into the right place.
// the basic idea is to use insertion sort.
// find where the address p is in the free list.
// I think another method would be to update the prevBlock as the currentBlock is being updated.
int64_t *currentBlock = freelst;
int64_t *prevBlock = freelst;
while (currentBlock != 0 && currentBlock + 1 <= p) { // comparing addresses
prevBlock = currentBlock; // so it's set to the previous block
currentBlock = (int64_t *)*(currentBlock + 1); // set it to the next address
// as a linked list, I'm thinking of doing something like:
// prevBlock = currentBlock;
// currentBlock = currentBlock->next;
}
// after exiting, either currentBlock = 0, in which case p is the largest address,
// or currentBlock + 1 > p, so it's smaller than the current address.
if (currentBlock == 0) { // then p is the largest address
if ((int64_t *)*(prevBlock + 1) != currentBlock) throw std::invalid_argument("A likely error occurred as prevBlock + 1 != currentBlock.");
*(prevBlock + 1) = (int64_t)p;
*(p + 1) = 0;
// p->next = 0
// prevBlock->next = p;
} else {
if (prevBlock == currentBlock) { // in this case currentBlock was the start of the freelst
int64_t *temp = (int64_t *)*(currentBlock + 1);
*(prevBlock + 1) = (int64_t)p; // cast so it passes type-checking
*(p + 1) = (int64_t)temp;
// here I'm trying to mimic what's done for a linked list:
// int64_t *temp = currentBlock->next;
// prevBlock->next = p;
// p->next = temp;
} else {
*(prevBlock + 1) = (int64_t)p;
*(p + 1) = (int64_t)currentBlock;
// here's what I think might be the equivalent for a linked list:
// prevBlock->next = p;
// p->next = currentBlock;
}
}
if (currentBlock != 0) { // if it not null
if (currentBlock + 1 + *currentBlock == (int64_t *)(currentBlock + 1)) { // check if currentBlock is adjacent to prevBlock
*currentBlock += *(int64_t *)*(currentBlock + 1) + NODE_SIZE;
}
// link current block to next next block
*(currentBlock + 1) = (int64_t)((int64_t *)*(currentBlock + 1) + 1);
}
// assuming sorting was done correctly, check if addresses are adjacent
if (prevBlock + 1 + *prevBlock == currentBlock) { // if you add one word plus the size of the previous block to get the
// currentBlock
if (currentBlock == 0) throw std::invalid_argument("A likely error occurred. currentBlock was 0 even though it should have been defined.");
*prevBlock += *currentBlock + NODE_SIZE; // add the sizes of both the currentBlock and previous block,
// assuming they aren't null of course.
// so currentBlock->next->size + NODE_SIZE;
// link previous block to next block
*(prevBlock + 1) = (int64_t)(currentBlock + 1);
}
}
Any help as to how to implement these functions/cases to consider that I've missed with code that deals with them would be appreciated. I can also clarify things if necessary.
I tried looking at this website for some help too, but I'm still having issues.
how to implement a memory allocator
At high level, there are essentially two ways to acquire memory for a custom allocator:
Allocate memory using an implementation defined way. The exact details depend on the target system, so first step is to find out what system you are targeting.
Or allocate memory using a standard way (standard allocator, new, malloc, static storage, ...)
Once you've acquired the memory, you need some data structure to keep track of memory that has been allocated through the allocator. You seem to have roughly described the "free list" structure, which is commonly used for this purpose.
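To make the free-list structure concrete, here is a minimal first-fit sketch with an address-ordered free list and merging of adjacent blocks. It is not the asker's code and makes several assumptions: pool() is replaced by a hypothetical static buffer of 1024 words, each block starts with one header word holding its payload size in words, a free block stores the address of the next free block in its first payload word, and pointers fit into an int64_t (64-bit target).

```cpp
#include <cstdint>

// Hypothetical stand-in for the question's pool(): a static buffer whose
// first word holds the pool size in bytes.
constexpr int64_t POOL_WORDS = 1024;
static int64_t pool_storage[POOL_WORDS];
int64_t* pool() {
    pool_storage[0] = POOL_WORDS * 8;
    return pool_storage;
}

// Block layout in words: [0] = payload size, [1..size] = payload.
// While a block is free, block[1] holds the address of the next free block.
static int64_t* free_head = nullptr;
static bool initialized = false;

static int64_t* next_of(int64_t* b) { return reinterpret_cast<int64_t*>(b[1]); }
static void set_next(int64_t* b, int64_t* n) { b[1] = reinterpret_cast<int64_t>(n); }

static void init() {
    int64_t* p = pool();          // called once, since the size word may not be rewritten
    int64_t words = p[0] / 8;
    p[0] = words - 1;             // one big free block; one word is the header
    set_next(p, nullptr);
    free_head = p;
    initialized = true;
}

int64_t* mymalloc(int64_t size) {
    if (!initialized) init();
    if (size < 1) size = 1;       // a freed block needs room for its next pointer
    int64_t* prev = nullptr;
    for (int64_t* cur = free_head; cur; prev = cur, cur = next_of(cur)) {
        if (cur[0] < size) continue;          // first fit: skip blocks that are too small
        int64_t* next = next_of(cur);
        if (cur[0] >= size + 2) {             // split off the tail as a new free block
            int64_t* rest = cur + 1 + size;
            rest[0] = cur[0] - size - 1;
            set_next(rest, next);
            cur[0] = size;
            next = rest;
        }
        if (prev) set_next(prev, next); else free_head = next;
        return cur + 1;                       // payload starts after the header
    }
    return nullptr;                           // no block is large enough
}

void myfree(int64_t* addr) {
    if (!addr) return;
    int64_t* blk = addr - 1;
    int64_t* prev = nullptr;
    int64_t* cur = free_head;
    while (cur && cur < blk) { prev = cur; cur = next_of(cur); }  // keep list sorted by address
    set_next(blk, cur);
    if (prev) set_next(prev, blk); else free_head = blk;
    if (cur && blk + 1 + blk[0] == cur) {     // merge with the following block
        blk[0] += 1 + cur[0];
        set_next(blk, next_of(cur));
    }
    if (prev && prev + 1 + prev[0] == blk) {  // merge with the preceding block
        prev[0] += 1 + blk[0];
        set_next(prev, next_of(blk));
    }
}
```

Because the list stays sorted by address, one insertion can only create adjacency with its direct predecessor and successor, so two merge checks per myfree are enough.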

Is access by pointer so expensive?

I have a Process() function that is called very heavily within my DLL (a VST plugin) loaded in a DAW (host software), such as:
for (int i = 0; i < nFrames; i++) {
// ...
for (int voiceIndex = 0; voiceIndex < PLUG_VOICES_BUFFER_SIZE; voiceIndex++) {
Voice &voice = pVoiceManager->mVoices[voiceIndex];
if (voice.mIsPlaying) {
for (int envelopeIndex = 0; envelopeIndex < ENVELOPES_CONTAINER_NUM_ENVELOPE_MANAGER; envelopeIndex++) {
Envelope &envelope = pEnvelopeManager[envelopeIndex]->mEnvelope;
envelope.Process(voice);
}
}
}
}
void Envelope::Process(Voice &voice) {
if (mIsEnabled) {
// update value
mValue[voice.mIndex] = (mBlockStartAmp[voice.mIndex] + (mBlockStep[voice.mIndex] * mBlockFraction[voice.mIndex]));
}
else {
mValue[voice.mIndex] = 0.0;
}
}
It basically takes 2% of CPU within the Host (which is nice).
Now, if I slightly change the code to this (which basically are increments and assignment):
void Envelope::Process(Voice &voice) {
if (mIsEnabled) {
// update value
mValue[voice.mIndex] = (mBlockStartAmp[voice.mIndex] + (mBlockStep[voice.mIndex] * mBlockFraction[voice.mIndex]));
// next phase
mBlockStep[voice.mIndex] += mRate;
mStep[voice.mIndex] += mRate;
}
else {
mValue[voice.mIndex] = 0.0;
}
// connectors
mOutputConnector_CV.mPolyValue[voice.mIndex] = mValue[voice.mIndex];
}
CPU usage goes up to 6-7% (note that those variables don't interact with other parts of the code, or at least I think so).
The only reason I can think of is that access through a pointer is expensive? How can I reduce this CPU usage?
Those arrays are plain double arrays (the lightest C++ container):
double mValue[PLUG_VOICES_BUFFER_SIZE];
double mBlockStartAmp[PLUG_VOICES_BUFFER_SIZE];
double mBlockFraction[PLUG_VOICES_BUFFER_SIZE];
double mBlockStep[PLUG_VOICES_BUFFER_SIZE];
double mStep[PLUG_VOICES_BUFFER_SIZE];
OutputConnector mOutputConnector_CV;
Any suggestions?
You might be thinking that "pointer arrays" are the lightest containers, but CPUs don't think in terms of containers. They just read and write values through pointers.
The problem here might very well be that you know that two containers do not overlap (there are no "sub-containers"). But the CPU might not be told that by the compiler. Writing to mBlockStep might affect mBlockFraction. The compiler doesn't have run-time values, so it needs to handle the case where it does. This will mean introducing more memory reads, and less caching of values in registers.
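One way to act on this, sketched under assumptions: read each input once into a local variable, do the arithmetic on locals, and write each output once, so the compiler does not have to re-read members after every store. EnvelopeSketch, kVoices and the member names are hypothetical stand-ins for the question's class, and whether this helps at all depends on the compiler and flags, so it should be measured.

```cpp
#include <cstddef>

constexpr std::size_t kVoices = 16;  // hypothetical stand-in for PLUG_VOICES_BUFFER_SIZE

struct EnvelopeSketch {
    bool enabled = true;
    double rate = 0.0;
    double value[kVoices] = {};
    double blockStartAmp[kVoices] = {};
    double blockFraction[kVoices] = {};
    double blockStep[kVoices] = {};
    double step[kVoices] = {};
    double output[kVoices] = {};     // stands in for mOutputConnector_CV.mPolyValue

    void process(std::size_t i) {
        double v = 0.0;
        if (enabled) {
            const double bs = blockStep[i];        // one load, kept in a register
            v = blockStartAmp[i] + bs * blockFraction[i];
            blockStep[i] = bs + rate;              // one store, no re-read needed
            step[i] += rate;
        }
        value[i] = v;    // both outputs come from the local, not from memory
        output[i] = v;
    }
};
```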
Pack all the data items in a structure and create an array of structure. I would simply use a vector.
In the Process function, get the single element out of this vector and use its members. At the cache-line/instruction level, all items would be (efficiently) brought into the local cache (L1), as the data elements (members of the struct) are contiguous. Use a reference or pointer to the struct type to avoid copying.
Try to use integer data-types unless double is needed.
EDIT:
struct VoiceInfo
{
double mValue;
...
};
VoiceInfo voices[PLUG_VOICES_BUFFER_SIZE];
// Or vector<VoiceInfo> voices;
...
void Envelope::Process(Voice &voice)
{
// Get the object (by ref/pointer)
VoiceInfo& info = voices[voice.mIndex];
// Work with reference 'info'
...
}

Problems implementing recursive best-first search in C++ based on Korf 1992

I am having two main issues implementing the algorithm described in this article in C++: properly terminating the algorithm and freeing up dynamically allocated memory without running into a seg fault.
Here is the pseudocode provided in the article:
RBFS (node: N, value: V, bound: B)
IF f(N)>B, return f(N)
IF N is a goal, EXIT algorithm
IF N has no children, RETURN infinity
FOR each child Ni of N,
IF f(N) < V AND f(Ni) < V THEN F[i] := V
ELSE F[i] := f(Ni)
sort Ni and F[i] in increasing order of F[i]
IF only one child, F[2] := infinity
WHILE (F[1] <= B)
F[1] := RBFS(N1, F[1], MIN(B, F[2]))
insert N1 and F[1] in sorted order
return F[1]
Here, f(Ni) refers to the "computed" function value, whereas F[i] refers to the currently stored value of f(Ni).
Here is my C++ implementation, in which I had to use a global variable to keep track of whether the goal had been reached or not (note, I am trying to maximize my f(n) value as opposed to minimizing, so I reversed inequalities, orders, min/max values, etc.):
bool goal_found = false;
bool state_cmp(FlowState *lhs, FlowState *rhs)
{
return (lhs->value > rhs->value);
}
int _rbfs(FlowState *state, int value, int bound)
{
if (state->value < bound) // Returning if the state value is less than bound
{
int value = state->value;
delete state;
return value;
}
if (state->is_goal()) // Check if the goal has been reached
{
cout << "Solved the puzzle!" << endl;
goal_found = true; // Modify the global variable to exit the recursion
return state->value;
}
vector<FlowState*> children = state->children();
if (children.empty())
{
//delete state; // Deleting this state seems to result in a corrupted state elsewhere
return INT_MIN;
}
int n = 0; // Count the number of children
for (const auto& child: children)
{
if (state->value < value && child->value < value)
child->value = value;
else
child->update_value(); // Equivalent of setting stored value to static value (F[i] := f(Ni))
++n;
}
sort(children.begin(), children.end(), state_cmp);
while (children.front()->value >= bound && !goal_found)
{// Loop depends on the global goal_found variable since this is where the recursive calls happen
if (children.size() < 2)
children.front()->set_value(_rbfs(children.front(), children.front()->value, bound));
else
children.front()->set_value(_rbfs(children.front(), children.front()->value, max(children[1]->value, bound)));
}
// Free children except the front
int i;
for (i = 1; i < n; ++i)
delete children[i];
state->child = children.front(); // Records the path
return state->child->value;
}
void rbfs(FlowState* initial_state)
{
// This is the actual function I invoke to call the algorithm
_rbfs(initial_state, initial_state->get_value(), INT_MIN);
print_path(initial_state);
}
My main questions are:
Is there a way to terminate this function other than using a global variable (bool goal_found), without a complete re-implementation? Recursive algorithms usually have some kind of base case to terminate the function, but I am not seeing an obvious way of doing that.
I can't seem to delete the dead-end state (when the state has no children) without running into a segmentation fault, but not deleting it results in unfreed memory (each state object was dynamically allocated). How can I modify this code to ensure that I've freed all of the states that pass through it?
I ran the program with gdb to see what was going on, and it appears that after deleting the dead-end state, the next state that is recursively called is not actually NULL, but appears to be corrupted. It has an address, but the data it contains is all junk. Not deleting that node lets the program terminate just fine, but then many states aren't getting freed. In addition, I had originally used the classical, iterative best-first search (but it takes up far too much memory for my case, and is much slower), and in that case, all dynamically allocated states were properly freed so the issue is in this code somewhere (and yes, I am freeing each of the states on the path in main() after calling rbfs).
In your code, you have
children.front()->set_value(_rbfs(children.front(), ...
where state inside of _rbfs is thus children.front().
And in _rbfs, you sometimes delete state. So children.front() can be deleted and then called with ->set_value. There's your problem.
Is there any reason why you are calling delete at all?
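A sketch of the pattern that avoids both problems, under stated assumptions: this is not RBFS itself but a simplified recursive search that only illustrates ownership and termination. State, Result and search are hypothetical names. The parent owns its children through std::unique_ptr, the recursive call never deletes its argument, and "goal found" travels up through the return value instead of a global.

```cpp
#include <algorithm>
#include <climits>
#include <memory>
#include <vector>

struct State {
    int value = 0;
    bool goal = false;
    std::vector<std::unique_ptr<State>> kids;  // the parent owns its children
};

struct Result {
    int value;
    bool goal_found;  // replaces the global flag
};

// Simplified search: visits children in decreasing order of value.
Result search(State& s, int bound) {
    if (s.value < bound) return {s.value, false};   // pruned; the caller still owns s
    if (s.goal) return {s.value, true};
    if (s.kids.empty()) return {INT_MIN, false};    // dead end; nothing is deleted here
    std::sort(s.kids.begin(), s.kids.end(),
              [](const std::unique_ptr<State>& a, const std::unique_ptr<State>& b) {
                  return a->value > b->value;
              });
    int best = INT_MIN;
    for (auto& kid : s.kids) {
        Result r = search(*kid, bound);
        kid->value = r.value;          // safe: the child is still alive
        if (r.goal_found) return r;    // propagate termination to the caller
        best = std::max(best, r.value);
    }
    return {best, false};
}
```

All states are released when the root goes out of scope, so no manual delete is needed on any path.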

Debug assertion failed: Subscript out of range with std::vector

I'm trying to fix this problem which seems like I am accessing at an out of range index, but VS fails to stop where the error occurred leaving me confused about what's causing this.
The Error:
Debug Assertion Failed! Program: .... File: c:\program files\microsoft visual studio 10.0\vc\include\vector Line: 1440 Expression: String subscript out of range
What the program does:
There are two threads:
Thread 1:
The first thread looks (amongst other things) for changes in the current window using GetForegroundWindow(). The check does not happen in a loop but whenever a WH_MOUSE_LL event is triggered. The data is split into structs of fixed size so that it can be sent to a server over TCP. The first thread records the data (window title) into an std::list in the current struct.
if(change_in_window)
{
GetWindowTextW(hActWin,wTitle,256);
std::wstring title(wTitle);
current_struct->titles.push_back(title);
}
Thread 2:
The second thread looks for structs that have not been sent yet and puts their content into char buffers so that they can be sent over TCP. While I do not know exactly where the error is, judging from the type of error it has to do with either a string or a list, and this is the only code in my whole application using lists/strings (the rest are conventional arrays). Also, commenting out the if block as mentioned in the code comments stops the error from happening.
BOOL SendStruct(DATABLOCK data_block,bool sycn)
{
[..]
int _size = 0;
// Important note, when this if block is commented the error ceases to exist, so it has something to do with the following block
if(!data_block.titles.empty()) //check if std::list is empty
{
for (std::list<std::wstring>::iterator itr = data_block.titles.begin(); itr != data_block.titles.end() ; itr++) {
_size += (((*itr).size()+1) * 2);
} //calculate size required. Note the +1 is for an extra character between every title
wchar_t* wnd_wbuffer = new wchar_t[_size/2](); //allocate space
int _last = 0;
//loop through every string and every char of a string and write them down
for (std::list<std::wstring>::iterator itr = data_block.titles.begin(); itr != data_block.titles.end(); itr++)
{
for(unsigned int i = 0; i <= (itr->size()-1); i++)
{
wnd_wbuffer[i+_last] = (*itr)[i] ;
}
wnd_wbuffer[_last+itr->size()] = 0x00A6; // separator
_last += itr->size()+1;
}
unsigned char* wnd_buffer = new unsigned char[_size];
wnd_buffer = (unsigned char*)wnd_wbuffer;
h_io->header_w_size = _size;
h_io->header_io_wnd = 1;
Connect(mode,*header,conn,buffer_in_bytes,wnd_buffer,_size);
delete wnd_wbuffer;
}
else
[..]
return true;
}
My attempt at thread synchronization:
There is a pointer to the first data_block created (db_main)
pointer to the current data_block (db_cur)
//datablock format
typedef struct _DATABLOCK
{
[..]
int logs[512];
std::list<std::wstring> titles;
bool bPrsd; // has this datablock been sent true/false
bool bFull; // is logs[512] full true/false
[..]
struct _DATABLOCK *next;
} DATABLOCK;
//This is what thread 1 does when it needs to register a mouse press and it is called like this:
if(change_in_window)
{
GetWindowTextW(hActWin,wTitle,256);
std::wstring title(wTitle);
current_struct->titles.push_back(title);
}
RegisterMousePress(args);
[..]
//pseudo-code to simplify things , although original function does the exact same thing.
RegisterMousePress()
{
if(it_is_full)
{
db_cur->bFull= true;
if(does db_main exist)
{
db_main = new DATABLOCK;
db_main = db_cur;
db_main->next = NULL;
}
else
{
db_cur->next = new DATABLOCK;
db_cur = db_cur->next;
db_cur->next = NULL;
}
SetEvent(eProcessed); //tell thread 2 there is at least one datablock ready
}
else
{
write_to_it();
}
}
//this is the actual code and entry point of thread 2 and my attempt at synchronization
DWORD WINAPI InitQueueThread(void* Param)
{
DWORD rc;
DATABLOCK* k;
SockWClient writer;
k = db_main;
while(true)
{
rc=WaitForSingleObject(eProcessed,INFINITE);
if (rc== WAIT_OBJECT_0)
{
do
{
if(k->bPrsd)
{
continue;
}
else
{
if(!k)
{break;}
k->bPrsd = TRUE;
#ifdef DEBUG_NET
SendStruct(...);
#endif
}
if(k->next == NULL || k->next->bPrsd ==TRUE || !(k->next->bFull))
{
ResetEvent(eProcessed);
break;
}
} while (k = k->next); // next element after each loop
}
}
return 1;
}
Details:
Now something makes me believe that the error is not in there, because the subscript error is very rare. I have only been able to reproduce it with 100% chance when pressing Mouse_Down+Wnd+Tab to scroll through windows and keeping it pressed for some time (although it has certainly happened in other cases as well). I avoid posting the whole code because it's a bit large and confusion is unavoidable. If the error is not here I will edit the post and add more code.
Thanks in advance
There does not appear to be any thread synchronization here. If one thread reads from the structure while the other writes, it might be read during initialization, with a non-empty list containing an empty string (or something invalid, in between).
If there isn't a mutex or semaphore outside the posted function, that is likely the problem.
All the size calculations appear to be valid for Windows, although I didn't attempt to run it. Two things are a bit odd, though: the loop condition i <= (itr->size()-1), which uses <= with -1 instead of a plain <, and the literal 2 instead of sizeof(wchar_t) in new wchar_t[_size/2]();.
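One concrete consequence of the <= size()-1 form, offered as a possible contributor rather than a confirmed diagnosis: size() is unsigned, so for an empty title size()-1 wraps around to a huge value, the loop keeps running, and (*itr)[i] indexes past the end of the string. GetWindowTextW can yield an empty title for windows without one. A sketch of the same flattening with a plain < and a container that manages its own memory (join_titles is a hypothetical name; the separator 0x00A6 is taken from the question):

```cpp
#include <cstddef>
#include <list>
#include <string>
#include <vector>

// Concatenates all titles, appending `sep` after each one.
std::vector<wchar_t> join_titles(const std::list<std::wstring>& titles,
                                 wchar_t sep = 0x00A6) {
    std::vector<wchar_t> out;
    for (const std::wstring& t : titles) {
        // `i < t.size()` is simply false for an empty title, so nothing is read.
        for (std::size_t i = 0; i < t.size(); ++i) {
            out.push_back(t[i]);
        }
        out.push_back(sep);
    }
    return out;
}
```

The byte count to send would then be out.size() * sizeof(wchar_t), and no new[] / delete[] pair is needed.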
The problem with your code is that while thread 2 correctly waits for the data and thread 1 correctly notifies it about them, nothing prevents thread 1 from modifying the data while thread 2 is still processing them. The typical device used to solve this problem is the monitor pattern.
It consists of one mutex (used to protect the data, held any time you access them) and a condition variable (= Event in Windows terms), which conveys the information about new data to the consumer.
The producer would normally obtain the mutex, produce the data, release the mutex, then fire the event.
The consumer is trickier: it has to obtain the mutex, check whether new data has become available and, if not, wait for the Event using the SignalObjectAndWait function, which temporarily releases the mutex. It then processes the newly acquired data and finally releases the mutex.