I am trying to process linked list data in parallel with OpenMP in C++. I'm pretty new to OpenMP and pretty rusty with C++. What I want to do is get several threads to break up the linked list, and output the data of the Nodes in their particular range. I don't care about the order in which the output occurs. If I can get this working, I want to replace the simple output with some actual processing of the Node data.
I've found several things on the internet (including a few questions on this site), and from what I found I cobbled together code like this:
#include <iostream>
#include <omp.h>
// various and sundry other stuff ...
struct Node {
int data;
Node* next;
};
int main() {
struct Node *newHead;
struct Node *head = new Node;
struct Node *currNode;
int n;
int tid;
//create a bunch of Nodes in linked list with "data" ...
// traverse the linked list:
// examine data
#pragma omp parallel private(tid)
{
currNode = head;
tid=omp_get_thread_num();
#pragma omp single
{
while (currNode) {
#pragma omp task firstprivate(currNode)
{
cout << "Node data: " << currNode->data << " " << tid << "\n";
} // end of pragma omp task
currNode = currNode->next;
} // end of while
} //end of pragma omp single
} // end of pragma omp parallel
// clean up etc. ...
} // end of main
So I run:
>: export OMP_NUM_THREADS=6
>: g++ -fopenmp ll_code.cpp
>: ./a.out
And the output is:
Node data: 5 0
Node data: 10 0
Node data: 20 0
Node data: 30 0
Node data: 35 0
Node data: 40 0
Node data: 45 0
Node data: 50 0
Node data: 55 0
Node data: 60 0
Node data: 65 0
Node data: 70 0
Node data: 75 0
So, tid is always 0. And that means, unless I'm really misunderstanding something, only one thread did anything with the linked list, and so the linked list was not traversed in parallel at all.
When I get rid of single, the code fails with a seg fault. I have tried moving a few variables in and out of the OpenMP directive scopes, with no change. Changing the number of threads has no effect. How can this be made to work?
A secondary question: Some sites say the firstprivate(currNode) is necessary and others say currNode is firstprivate by default. Who is right?
You certainly can traverse a linked list using multiple threads, but it will actually be slower than using a single thread.
The reason is that, to know the address of a node N != 0, you must know the address of node N-1.
Assume now that you have N threads, each responsible for starting at position i. The above paragraph implies that thread i will depend on the result of thread i-1, which in turn will depend on the result of thread i-2, and so on.
What you end up with is a serial traversal anyway. But now, instead of a simple loop, you have to synchronize the threads too, which makes things inherently slower.
But if you're trying to do some heavy processing per node that would benefit from being run in parallel, then yes, you're taking the right approach. Your tid never changes because it is captured when the task is created, so every task reports the thread that ran the single block. Call omp_get_thread_num() inside the task instead:
#include <iostream>
#include <omp.h>

struct Node {
    int data;
    Node* next = nullptr;  // the last node must terminate the list
};

int main() {
    Node* head = new Node;
    Node* currNode = head;
    head->data = 0;
    for (int i = 1; i < 10; ++i) {
        currNode->next = new Node;
        currNode = currNode->next;
        currNode->data = i;
    }

    // traverse the linked list:
    // examine data
    #pragma omp parallel
    {
        #pragma omp single
        {
            // Only the thread running the single block walks the list.
            currNode = head;
            while (currNode) {
                // currNode is shared here, so each task needs its own copy.
                #pragma omp task firstprivate(currNode)
                {
                    #pragma omp critical (cout)
                    std::cout << "Node data: " << currNode->data << " " << omp_get_thread_num() << "\n";
                }
                currNode = currNode->next;
            }
        }
    }
}
Possible output:
Node data: 0 4
Node data: 6 4
Node data: 7 4
Node data: 8 4
Node data: 9 4
Node data: 1 3
Node data: 2 5
Node data: 3 2
Node data: 4 1
Node data: 5 0
Finally, for a more idiomatic approach, consider using a std::forward_list:
#include <forward_list>
#include <iostream>
#include <omp.h>

int main() {
    std::forward_list<int> list;
    for (int i = 0; i < 10; ++i) list.push_front(i);
    #pragma omp parallel
    #pragma omp single
    for (auto data : list) {
        #pragma omp task firstprivate(data)
        #pragma omp critical (cout)
        std::cout << "Node data: " << data << " " << omp_get_thread_num() << "\n";
    }
}
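If the per-node work is independent, another option (not what the task-based code above does) is to make the serial part explicit: walk the list once to collect the node pointers, then let OpenMP split the index range. A minimal sketch, assuming a hypothetical process() function that stands in for your real processing and a made-up helper name process_list:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Node {
    int data;
    Node* next;
};

// Hypothetical stand-in for the "actual processing" of one node.
inline long process(const Node& n) {
    return static_cast<long>(n.data) * n.data;
}

// Walk the list once (serial), then process the nodes in parallel by index.
std::vector<long> process_list(Node* head) {
    std::vector<Node*> nodes;
    for (Node* p = head; p != nullptr; p = p->next) {
        nodes.push_back(p);
    }

    std::vector<long> results(nodes.size());
    const long n = static_cast<long>(nodes.size());
    // Each iteration writes only its own slot, so no locking is needed.
    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        assert(nodes[i] != nullptr);
        results[i] = process(*nodes[i]);
    }
    return results;
}
```

Calling process_list(head) returns one result per node, in list order. Compiled without -fopenmp the pragma is ignored and the loop simply runs serially.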
I am trying to build a huge binary search tree:
class Node
{
public:
int value;
shared_ptr<Node> left;
Node* right;
Node(int v):value(v){}
void addLeft(){
static int i;
shared_ptr<Node> node=make_shared<Node>(i);
left=node;
cout<<i++<<endl;
if(i<60000)
node->addLeft();
}
};
int main(){
shared_ptr<Node>root=make_shared<Node>(9);
root->addLeft();
return 0;
}
I get a seg fault when running this code. Valgrind gives this report:
==17373== Stack overflow in thread #1: can't grow stack to 0xffe801000
Any clue on how to build the BST without overflowing the RAM space?
Any help is much appreciated
Exceeding the stack is not the same as exceeding your RAM. Function calls accumulate on the stack; the problem is that you are trying to place 60000 nested function calls and their variables on it. Convert your function to a loop and you will be fine. It will even get rid of that terrible static int i.
Here is a version of your function using a for loop with no recursion.
void addLeft()
{
    left = std::make_shared<Node>(0);
    // tail is the last element to have been added to the tree
    std::shared_ptr<Node> tail = left;
    std::cout << 0 << std::endl;
    // Add nodes 1 to 59999, the same values the recursive version produced
    for (int i = 1; i < 60000; ++i)
    {
        std::cout << i << std::endl;
        tail->left = std::make_shared<Node>(i);
        tail = tail->left;
    }
}
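One caveat with a chain this long: when the last shared_ptr to the root goes away, each node's destructor releases its left child, which nests one destructor call per node and can run into the same kind of stack limit. A minimal sketch of building and tearing down the chain without recursion, using a trimmed-down Node (only value and left) and hypothetical helper names build_chain, destroy_chain and chain_length:

```cpp
#include <cassert>
#include <memory>
#include <utility>

struct Node {
    int value;
    std::shared_ptr<Node> left;
    explicit Node(int v) : value(v) {}
};

// Build a left-leaning chain holding the values 0..count-1 with a loop.
std::shared_ptr<Node> build_chain(int count) {
    assert(count > 0);
    std::shared_ptr<Node> root = std::make_shared<Node>(0);
    Node* tail = root.get();
    for (int i = 1; i < count; ++i) {
        tail->left = std::make_shared<Node>(i);
        tail = tail->left.get();
    }
    return root;
}

// Release the chain one node at a time so destructors never nest deeply.
void destroy_chain(std::shared_ptr<Node>& root) {
    while (root) {
        // Detach the child first; the old root then dies with an empty left.
        std::shared_ptr<Node> next = std::move(root->left);
        root = std::move(next);
    }
}

// Count the nodes by walking the left pointers.
int chain_length(const std::shared_ptr<Node>& root) {
    int n = 0;
    for (const Node* p = root.get(); p != nullptr; p = p->left.get()) {
        ++n;
    }
    return n;
}
```

Call destroy_chain(root) before root goes out of scope. This only works as intended while nothing else holds a shared_ptr to the interior nodes.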
I am learning the OpenMP parallel processing library in C++. I felt that I had grasped the basic concepts, and tried to test my knowledge by implementing a linked list queue. I wanted to consume the queue from multiple threads.
The challenge here is not to consume the same node twice. So I was considering sharing the queue between threads but allowing only a single thread at a time to update it (go to the next node in the queue). For this purpose, I could use critical or a lock. However, even without them, it somehow seems to work perfectly. No race condition has occurred.
#include <iostream>
#include <omp.h>
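#include <cstdio>   // printf
#include <cstdlib>  // srand, rand
#include <unistd.h> // sleep; <zconf.h> is a zlib header and not the right source for it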
struct Node {
int data;
struct Node* next = NULL;
Node() {}
Node(int data) {
this->data = data;
}
Node(int data, Node* node) {
this->data = data;
this->next = node;
}
};
void processNode(Node *pNode);
struct Queue {
Node *head = NULL, *tail = NULL;
Queue& add(int data) {
add(new Node(data));
return *this;
}
void add(Node *node) {
if (head == NULL) {
head = node;
tail = node;
} else {
tail->next = node;
tail = node;
}
}
Node* remove() {
Node *node;
node = head;
if (head != NULL)
head = head->next;
return node;
}
};
int main() {
srand(12);
Queue queue;
for (int i = 0; i < 6; ++i) {
queue.add(i);
}
double timer_started = omp_get_wtime();
omp_set_num_threads(3);
#pragma omp parallel
{
Node *n;
while ((n = queue.remove()) != NULL) {
double started = omp_get_wtime();
processNode(n);
double elapsed = omp_get_wtime() - started;
printf("Thread id: %d data: %d, took: %f \n", omp_get_thread_num(), n->data, elapsed);
}
}
double elapsed = omp_get_wtime() - timer_started;
std::cout << "end. took " << elapsed << " in total " << std::endl;
return 0;
}
void processNode(Node *node) {
int r = rand() % 3 + 1; // between 1 and 3
sleep(r);
}
Output looks like this:
Thread id: 0 data: 0, took: 1.000136
Thread id: 2 data: 2, took: 1.000127
Thread id: 2 data: 4, took: 1.000208
Thread id: 1 data: 1, took: 3.001371
Thread id: 0 data: 3, took: 2.001041
Thread id: 2 data: 5, took: 2.004960
end. took 4.00583 in total
I've run this many times with different numbers of threads, but I couldn't trigger a race condition or any wrong result. I expected that two different threads could invoke 'remove' and process a single node twice. But it did not happen. Why?
https://github.com/muatik/openmp-examples/blob/master/linkedlist/main.cpp
First and foremost, you can never prove multi-threaded code to be correct through testing. Your hunch that you need a lock / critical section is correct.
Your test is particularly easy on the queue. The following breaks your queue quickly:
for (int i = 0; i < 10000; ++i) {
    queue.add(i);
}
double timer_started = omp_get_wtime();
#pragma omp parallel
{
    size_t counter = 0;
    Node *n;
    while ((n = queue.remove()) != NULL) {
        processNode(n);
        counter++;
    }
    #pragma omp critical
    std::cout << "Thread " << omp_get_thread_num() << " processed " << counter << " nodes." << std::endl;
}

void processNode(Node *node) {}
This shows, for example, the following interesting result:
Thread 1 processed 11133 nodes.
Thread 0 processed 9039 nodes.
The two counts add up to 20172 although only 10000 nodes were queued, so many nodes were handed out more than once. But again, even if a queue runs correctly a million times with this test code, that doesn't mean the queue is implemented correctly.
In particular, it is not sufficient to protect just remove; you must properly protect each and every read and write of the queue data. To get an idea of how difficult it is to get this right, watch this excellent talk by Herb Sutter.
Generally, I recommend using an existing parallel data structure, for example from Boost.Lockfree.
Unfortunately, however, OpenMP and the C++11 lock / atomic primitives don't officially play well together. So strictly speaking, if you use OpenMP, you should stick to OpenMP synchronization primitives or libraries that use them.
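As a rough sketch of what "protecting every read and write" can look like with OpenMP's own primitives, the version below wraps both add and remove in the same named critical section. The names queue_access and drain_and_count are made up for this example, and it trades speed for simplicity, since all threads serialize on one lock:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Node {
    int data;
    Node* next = nullptr;
    explicit Node(int d) : data(d) {}
};

struct Queue {
    Node* head = nullptr;
    Node* tail = nullptr;

    void add(int data) {
        Node* node = new Node(data);
        // Same critical-section name as in remove(), so the two exclude each other.
        #pragma omp critical (queue_access)
        {
            if (head == nullptr) {
                head = node;
                tail = node;
            } else {
                tail->next = node;
                tail = node;
            }
        }
    }

    Node* remove() {
        Node* node = nullptr;
        #pragma omp critical (queue_access)
        {
            node = head;
            if (head != nullptr) head = head->next;
        }
        return node;
    }
};

// Drain the queue from all threads and count how often each value was seen.
// Assumes the queue holds values in the range [0, values).
std::vector<int> drain_and_count(Queue& queue, std::size_t values) {
    std::vector<int> seen(values, 0);
    int* counts = seen.data();
    #pragma omp parallel
    {
        Node* n;
        while ((n = queue.remove()) != nullptr) {
            assert(n->data >= 0 && static_cast<std::size_t>(n->data) < values);
            // Atomic so the check stays meaningful even if a node were duplicated.
            #pragma omp atomic
            ++counts[n->data];
            delete n;
        }
    }
    return seen;
}
```

With the critical sections in place, every entry of the returned vector should be exactly 1 for a queue filled with 0..values-1. As said above, a passing run is still not a proof of correctness.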
I am a fairly experienced C# programmer and am trying to help out a friend with a C++ application that creates a Stack object. It has been well over 13 years since I've even seen C++ and I am having a damn fine time trying to recall the proper way to do this. It took me a bit to get up to speed on the Header/CPP distinction again, so there may even be issues in there. Here is my problem:
//Stack.h
#ifndef __STACK_INCLUDED__
#define __STACK_INCLUDED__
#include "Node.h"
class Stack
{
private:
/// Going to be the pointer to our top node
Node* m_topNode;
/// Running count of elements
int m_count;
public:
///Constructor
Stack();
///Allows us to retrieve the top value from the stack
/// and remove it from the stack
int Pop();
.
.
.
};
#endif
Below is the CPP that matches the header. The output in here is JUST for debugging at the moment. I am also fully qualifying everything because I was not sure whether that was causing issues with the pointers and loss of references.
//Stack.cpp
#include "stdafx.h"
#include "Stack.h"
#include <iostream>
Stack::Stack(){
m_count = 0;
m_topNode = NULL;
}
void Stack::Push(int Value){
std::cout << "\nPushing Value: ";
std::cout << Value;
std::cout << "\n";
if ( Stack::m_topNode )
{
std::cout << "TopNode Value: ";
std::cout << Stack::m_topNode->data;
std::cout << "\n";
}
std::cout << "\n";
Node newNode(Value, NULL, Stack::m_topNode);
Stack::m_topNode = &newNode;
Stack::m_count++;
}
The node class is a pretty simple entity. Just needs to store a value and the pointers on either side. I know I don't need to track in both directions for a Stack but I wanted to make this something that was easily changed to a Queue or similar construct.
//Node.h
#ifndef __NODE_INCLUDED__
#define __NODE_INCLUDED__
class Node
{
private:
public:
///Constructor allows us to specify all values.
/// In a stack I expect NextNode to be NULL
Node(int Value,Node* NextNode, Node* PreviousNode);
///Pointer to the next node
Node* Next;
///Pointer to the previous node
Node* Prev;
///Value to be stored
int data;
};
#endif
Very simple implementation:
//Node.cpp
#include "stdafx.h"
#include "Node.h"
Node::Node(int Value, Node* NextNode, Node* PreviousNode){
data = Value;
Next = NextNode;
Prev = PreviousNode;
}
My main just sends 2 values to the stack via Push right now so I can see what gets printed:
#include "stdafx.h"
#include "Node.h"
#include "Stack.h"
using namespace std;
int main(){
Stack s = Stack();
for ( int i = 0; i < 2; i++ ){
s.Push(i * 10);
}
int blah;
cin >> blah; //Stall screen
return 0;
}
Here is the Output:
Pushing Value: 0
<blank line>
Pushing Value: 10
TopNode Value: -858993460
When I hit Node newNode(Value, NULL, Stack::m_topNode) in the debugger I can see it tracking the proper value in the current node, but m_topNode references a really odd value. I'm hoping it's very obvious that I'm doing something dumb as I don't remember this being this tricky when I did it years ago. Appreciate any help/insight to my incorrect manners.
Node newNode(Value, NULL, Stack::m_topNode);
Stack::m_topNode = &newNode;
Stack::m_count++;
This is your problem. You allocate the new node as a local variable on the call stack, and then put a pointer to it into the linked list of nodes. This pointer becomes invalid as soon as Push returns, and all hell breaks loose. ;)
You need to allocate the node with new.
As stated by Norwæ, you need to allocate your newNode with "new". If you don't, newNode is a local (automatic) variable that is destroyed at the end of the Push function, leaving m_topNode pointing at a dead object.
You also don't need the "Stack::" prefix on your private members. The qualification is legal inside a member function but redundant; it is normally only needed to name static members and functions from outside the class. Replace "Stack::m_topNode" with "m_topNode", and "Stack::m_count" with "m_count".
Here is a working Push function:
void Stack::Push(int Value){
    std::cout << "\nPushing Value: ";
    std::cout << Value;
    std::cout << "\n";
    if ( m_topNode )
    {
        std::cout << "TopNode Value: ";
        std::cout << m_topNode->data;
        std::cout << "\n";
    }
    std::cout << "\n";
    Node * newNode = new Node(Value, NULL, m_topNode);
    m_topNode = newNode;
    m_count++;
}
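For completeness, here is a minimal sketch of the whole stack with heap-allocated nodes, including the matching delete in Pop and in the destructor. It simplifies Node to a single prev pointer, and Count() is a name made up for this example rather than something from the original header:

```cpp
#include <cassert>

struct Node {
    int data;
    Node* prev;  // the node below this one on the stack
    Node(int value, Node* previous) : data(value), prev(previous) {}
};

class Stack {
    Node* m_topNode = nullptr;
    int m_count = 0;

public:
    Stack() = default;
    // Copying would make two stacks delete the same nodes, so forbid it.
    Stack(const Stack&) = delete;
    Stack& operator=(const Stack&) = delete;
    ~Stack() {
        while (m_topNode != nullptr) Pop();
    }

    void Push(int value) {
        // The node lives on the heap, so it survives the end of Push.
        m_topNode = new Node(value, m_topNode);
        ++m_count;
    }

    int Pop() {
        assert(m_topNode != nullptr);  // popping an empty stack is a caller bug
        Node* old = m_topNode;
        int value = old->data;
        m_topNode = old->prev;
        delete old;
        --m_count;
        return value;
    }

    int Count() const { return m_count; }
};
```

Pushing 0 and then 10 and popping twice returns 10 first and 0 second, which is the last-in, first-out order you would expect.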
This line:
std::cout << Stack::m_topNode->data;
happens before
Node newNode(Value, NULL, Stack::m_topNode);
Stack::m_topNode = &newNode;
Stack::m_count++;
So you're trying to print an uninitialized value. Reverse these and see what happens.
Suppose I have a set of 10000 points that are randomly connected to each other. For example, let's take 10 points, connected as in the picture:
Definition of Similar Points:
Points that have the same number of links are called similar points. From the picture we can see:
Node 1 is connected with node [2] and [10]
Node 2 is connected with node [1],[3],[4],[5],[6],[7],[8]
Node 3 is connected with only node [2]
Node 4 is connected with only node [2]
Node 5 is connected with only node [2]
Node 6 is connected with only node [2]
Node 7 is connected with only node [2]
Node 8 is connected with node [2] and [9]
Node 9 is connected with only node [8]
Node 10 is connected with only node [1]
So according to the definition, nodes 3, 4, 5, 6, 7, 9 and 10 are similar because each of them has only one link.
Likewise, nodes 1 and 8 are similar because each of them has two links.
My Problem
Now I want to calculate the sum of the links of similar points. For example:
Nodes 1 and 8 are similar.
For node 1:
It is connected to Node 2 (which has 7 links)
And also connected to Node 10 (which has 1 link )
For node 8:
It is connected to Node 2 (which has 7 links)
And also connected to Node 9 (which has 1 link )
So for the group with two links, the total number of links should be 7+1+7+1 = 16.
In the same way, I would like to calculate the total links for the other groups of similar points.
My Code
Here is my code. It gives the total number of links for each of the points.
#include <cstdlib>
#include <cmath>
#include <fstream>
#include <iostream>
#include <vector>
using namespace std;
struct Node {
vector< int > links_to;
Node(void){};
Node(int first_link){
links_to.push_back(first_link);
};
};
class Links : public vector<Node> {
public:
void CreateLinks(int n,int m);
void OutputNodes();
};
int RandGenerate(int max) {
return int(drand48()*double(max));
}
void CreateRandom(int *nums,int m,int max) {
bool clear;
for(int i=0;i<m;i++) {
clear=true;
while(clear) {
clear=false;
nums[i]=RandGenerate(max);
for(int j=0;j<i;j++) {
if(nums[i]==nums[j]){
clear=true;break;
}
}
}
}
}
void Links::CreateLinks(int n,int m) {
clear();
for(int i=0;i<m;i++) {
push_back(Node());
}
int edge_targets[m],nums[m];
for(int i=0;i<m;i++) {
edge_targets[i]=i;
}
vector<int> repeated_nodes;
int source=m;
while(source<n) {
push_back(Node());
Node &node=*(end()-1);
for(int i=0;i<m;i++) {
node.links_to.push_back(edge_targets[i]);
at(edge_targets[i]).links_to.push_back(source);
repeated_nodes.push_back(edge_targets[i]);
repeated_nodes.push_back(source);
}
CreateRandom(nums,m,repeated_nodes.size());
for(int i=0;i<m;i++) {
edge_targets[i]=repeated_nodes[nums[i]];
}
source++;
}
}
void Links::OutputNodes() {
for(int i=0;i<size();i++){
cout<<endl;
for(int j=0;j<at(i).links_to.size();j++){
cout<<"Node "<<(i+1)<<" is connected with ["<<(at(i).links_to[j]+1)<<"]"<<endl;
}
cout<<"For Node: "<<(i+1)<<"\t"<<"Total links: "<<at(i).links_to.size()<<endl;
}
}
int main() {
srand48(46574621);
Links network;
network.CreateLinks(10,1); //(nodes,minimum value of link)
network.OutputNodes();
return 0;
}
Which generates output like this:
Node 1 is connected with [2]
Node 1 is connected with [10]
For Node: 1 Total links: 2
Node 2 is connected with [1]
Node 2 is connected with [3]
Node 2 is connected with [4]
Node 2 is connected with [5]
Node 2 is connected with [6]
Node 2 is connected with [7]
Node 2 is connected with [8]
For Node: 2 Total links: 7
Node 3 is connected with [2]
For Node: 3 Total links: 1
Node 4 is connected with [2]
For Node: 4 Total links: 1 ... etc
I would like to add a function that groups the similar points and outputs the total links for each group. How can I do that?
Updated in response to the answer of Pixelchemist
Let's say I store the data in a file named "MyLinks.txt" like this:
1 2
1 10
2 1
2 3
2 4
2 5
2 6
2 7
2 8...etc
And get the input from the file. Here is the code:
int main (void)
{
ifstream inputFile("MyLinks.txt");
double Temp[2];
Links links_object;
while (true) {
for (unsigned i = 0; i < 2; i++){
inputFile>>Temp[i];
}
for (size_t i(0u); i<10; ++i)
{
links_object.add(Node());
}
links_object.link_nodes(Temp[0], Temp[1]);
/*
links_object.link_nodes(0u, 9u);
links_object.link_nodes(1u, 2u);
links_object.link_nodes(1u, 3u);
links_object.link_nodes(1u, 4u);
links_object.link_nodes(1u, 5u);
links_object.link_nodes(1u, 6u);
links_object.link_nodes(1u, 7u);
links_object.link_nodes(7u, 8u);
*/
}
std::vector<size_t> linksum;
for (auto const & node : links_object.nodes())
{
size_t const linksum_index(node.links().size()-1u);
if (linksum.size() < node.links().size())
{
size_t const nls(node.links().size());
for (size_t i(linksum.size()); i<nls; ++i)
{
linksum.push_back(0u);
}
}
for (auto linked : node.links())
{
linksum[linksum_index] += linked->links().size();
}
}
for (size_t i(0u); i<linksum.size(); ++i)
{
std::cout << "Sum of secondary links with " << i+1;
std::cout << "-link nodes is: " << linksum[i] << std::endl;
}
}
I updated my code to store the 'connection' results in a text file and read the values back from it. But now it gives me a segmentation fault. How can I fix it?
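A few things in the reading loop look like likely causes, though it is hard to be sure without the actual file: while (true) never stops when the input runs out, ten new nodes are added on every pass (each add can reallocate the vector and leave the Node pointers stored by link_nodes dangling), and the ids in the file start at 1 while link_nodes expects 0-based indices. A minimal sketch of just the reading part, assuming whitespace-separated 1-based pairs and assuming, as the sample suggests, that each link may be listed from both ends. The names Edge, read_edges, read_edges_from_text and node_count are made up for this example:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <istream>
#include <set>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

typedef std::pair<std::size_t, std::size_t> Edge;

// Read 1-based id pairs until the stream ends or a pair cannot be parsed.
// Returns each undirected link once, as a sorted list of 0-based pairs.
std::vector<Edge> read_edges(std::istream& in) {
    std::set<Edge> unique;
    std::size_t a = 0, b = 0;
    while (in >> a >> b) {            // stops cleanly at end of file or bad input
        if (a == 0 || b == 0) break;  // ids in the file are assumed to start at 1
        unique.insert(Edge(std::min(a, b) - 1, std::max(a, b) - 1));
    }
    return std::vector<Edge>(unique.begin(), unique.end());
}

// Convenience wrapper for text that is already in memory.
std::vector<Edge> read_edges_from_text(const std::string& text) {
    std::istringstream in(text);
    return read_edges(in);
}

// Highest id plus one: how many nodes must exist before any linking starts.
std::size_t node_count(const std::vector<Edge>& edges) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < edges.size(); ++i) {
        count = std::max(count, std::max(edges[i].first, edges[i].second) + 1);
    }
    assert(count > 0 || edges.empty());
    return count;
}
```

With this, you would add node_count(edges) nodes once, before the first call to link_nodes, and then call link_nodes(edge.first, edge.second) for every edge, so the vector no longer grows while pointers into it are in use.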
I would use a map. The number of links would be the key and its value would be a vector containing the IDs of nodes with that number of links.
typedef std::map<size_t, std::vector<size_t> > SimilarNodeMap;
SimilarNodeMap myMap;
... // fill up the map
for (SimilarNodeMap::iterator it = myMap.begin(); it != myMap.end(); ++it)
{
    std::cout << "Nodes with " << it->first << " links: ";
    for ( size_t i = 0; i < it->second.size(); ++i )
    {
        std::cout << it->second.at(i) << std::endl;
    }
}
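To make the "fill up the map" step concrete, here is a minimal sketch. It assumes the graph is available as 0-based adjacency lists; the names Adjacency, make_adjacency, group_by_links and neighbour_link_sums are made up for this example:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// adjacency[i] lists the 0-based ids of the nodes linked to node i.
typedef std::vector<std::vector<std::size_t> > Adjacency;

// Build undirected adjacency lists from 0-based edge pairs.
Adjacency make_adjacency(std::size_t node_count,
                         const std::vector<std::pair<std::size_t, std::size_t> >& edges) {
    Adjacency adjacency(node_count);
    for (std::size_t e = 0; e < edges.size(); ++e) {
        assert(edges[e].first < node_count && edges[e].second < node_count);
        adjacency[edges[e].first].push_back(edges[e].second);
        adjacency[edges[e].second].push_back(edges[e].first);
    }
    return adjacency;
}

// Key: number of links. Value: ids of the nodes with that many links.
std::map<std::size_t, std::vector<std::size_t> > group_by_links(const Adjacency& adjacency) {
    std::map<std::size_t, std::vector<std::size_t> > groups;
    for (std::size_t i = 0; i < adjacency.size(); ++i) {
        groups[adjacency[i].size()].push_back(i);
    }
    return groups;
}

// For every group, add up the link counts of all neighbours of its members.
std::map<std::size_t, std::size_t> neighbour_link_sums(const Adjacency& adjacency) {
    std::map<std::size_t, std::size_t> sums;
    for (std::size_t i = 0; i < adjacency.size(); ++i) {
        std::size_t& sum = sums[adjacency[i].size()];
        for (std::size_t j = 0; j < adjacency[i].size(); ++j) {
            sum += adjacency[adjacency[i][j]].size();
        }
    }
    return sums;
}
```

For the 10-node example from the question (shifted to 0-based ids), neighbour_link_sums gives 39 for the one-link group, 16 for the two-link group and 9 for the seven-link group.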
You can go through the nodes that are part of the "pair" and put them into a list. If an element you are trying to add is already in the list, don't add it (e.g. check with an if statement). Then, after going through all the elements, check the list size; that should be your number of links.
Correct me if this isn't what you are asking.
I'm sure there is a better way to do this. The complexity of this is O(n^2) time, I believe.
I'd use a std::vector<size_t> where the index of the vector is the number of links of the respective node type.
You iterate over all of your nodes and increase the std::vector<size_t> entry corresponding to the number of links of this node by the number of links of all nodes that are linked to the current one.
This code:
#include <cstddef>
#include <iostream>
#include <vector>
#include <stdexcept>
class Node
{
std::vector< Node const * > m_links;
public:
Node(void) { }
void link_to (Node const &n)
{
m_links.push_back(&n);
}
std::vector< Node const * > const & links (void) const
{
return m_links;
}
};
class Links
{
std::vector<Node> m_nodes;
public:
void add (Node const &node) { m_nodes.push_back(node); }
void link_nodes (size_t node_a, size_t node_b)
{
size_t ns(m_nodes.size());
if (node_a >= ns || node_b >= ns)
{
throw std::logic_error("Requested invalid link.");
}
m_nodes[node_a].link_to(m_nodes[node_b]);
m_nodes[node_b].link_to(m_nodes[node_a]);
}
std::vector<Node> const & nodes (void) const
{
return m_nodes;
}
};
int main (void)
{
Links links_object;
for (size_t i(0u); i<10; ++i)
{
links_object.add(Node());
}
links_object.link_nodes(0u, 1u);
links_object.link_nodes(0u, 9u);
links_object.link_nodes(1u, 2u);
links_object.link_nodes(1u, 3u);
links_object.link_nodes(1u, 4u);
links_object.link_nodes(1u, 5u);
links_object.link_nodes(1u, 6u);
links_object.link_nodes(1u, 7u);
links_object.link_nodes(7u, 8u);
std::vector<size_t> linksum;
for (auto const & node : links_object.nodes())
{
size_t const linksum_index(node.links().size()-1u);
if (linksum.size() < node.links().size())
{
size_t const nls(node.links().size());
for (size_t i(linksum.size()); i<nls; ++i)
{
linksum.push_back(0u);
}
}
for (auto linked : node.links())
{
linksum[linksum_index] += linked->links().size();
}
}
for (size_t i(0u); i<linksum.size(); ++i)
{
std::cout << "Sum of secondary links with " << i+1;
std::cout << "-link nodes is: " << linksum[i] << std::endl;
}
}
Prints:
Sum of secondary links with 1-link nodes is: 39
Sum of secondary links with 2-link nodes is: 16
Sum of secondary links with 3-link nodes is: 0
Sum of secondary links with 4-link nodes is: 0
Sum of secondary links with 5-link nodes is: 0
Sum of secondary links with 6-link nodes is: 0
Sum of secondary links with 7-link nodes is: 9
You should get the idea.
You might iterate over all nodes and count.
Pseudo code:
std::map<std::size_t, std::size_t> counter;
for each node
++counter[node.links().size()]
I'm trying to implement a high performance blocking queue backed by a circular buffer on top of pthreads, semaphore.h and gcc atomic builtins. The queue needs to handle multiple simultaneous readers and writers from different threads.
I've isolated some sort of race condition, and I'm not sure if it's a faulty assumption about the behavior of some of the atomic operations and semaphores, or whether my design is fundamentally flawed.
I've extracted and simplified it to the below standalone example. I would expect that this program never returns. It does however return after a few hundred thousand iterations with corruption detected in the queue.
In the below example (for exposition) it doesn't actually store anything, it just sets to 1 a cell that would hold the actual data, and 0 to represent an empty cell. There is a counting semaphore (vacancies) representing the number of vacant cells, and another counting semaphore (occupants) representing the number of occupied cells.
Writers do the following:
decrement vacancies
atomically get next head index (mod queue size)
write to it
increment occupants
Readers do the opposite:
decrement occupants
atomically get next tail index (mod queue size)
read from it
increment vacancies
I would expect that given the above, precisely one thread can be reading or writing any given cell at one time.
Any ideas about why it doesn't work or debugging strategies appreciated. Code and output below...
#include <pthread.h>
#include <stdlib.h>
#include <semaphore.h>
#include <iostream>
using namespace std;
#define QUEUE_CAPACITY 8 // must be power of 2
#define NUM_THREADS 2
struct CountingSemaphore
{
sem_t m;
CountingSemaphore(unsigned int initial) { sem_init(&m, 0, initial); }
void post() { sem_post(&m); }
void wait() { sem_wait(&m); }
~CountingSemaphore() { sem_destroy(&m); }
};
struct BlockingQueue
{
unsigned int head; // (head % capacity) is next head position
unsigned int tail; // (tail % capacity) is next tail position
CountingSemaphore vacancies; // how many cells are vacant
CountingSemaphore occupants; // how many cells are occupied
int cell[QUEUE_CAPACITY];
// (cell[x] == 1) means occupied
// (cell[x] == 0) means vacant
BlockingQueue() :
head(0),
tail(0),
vacancies(QUEUE_CAPACITY),
occupants(0)
{
for (size_t i = 0; i < QUEUE_CAPACITY; i++)
cell[i] = 0;
}
// put an item in the queue
void put()
{
vacancies.wait();
// atomic post increment
set(__sync_fetch_and_add(&head, 1) % QUEUE_CAPACITY);
occupants.post();
}
// take an item from the queue
void take()
{
occupants.wait();
// atomic post increment
get(__sync_fetch_and_add(&tail, 1) % QUEUE_CAPACITY);
vacancies.post();
}
// set cell i
void set(unsigned int i)
{
// atomic compare and assign
if (!__sync_bool_compare_and_swap(&cell[i], 0, 1))
{
corrupt("set", i);
exit(-1);
}
}
// get cell i
void get(unsigned int i)
{
// atomic compare and assign
if (!__sync_bool_compare_and_swap(&cell[i], 1, 0))
{
corrupt("get", i);
exit(-1);
}
}
// corruption detected
void corrupt(const char* action, unsigned int i)
{
static CountingSemaphore sem(1);
sem.wait();
cerr << "corruption detected" << endl;
cerr << "action = " << action << endl;
cerr << "i = " << i << endl;
cerr << "head = " << head << endl;
cerr << "tail = " << tail << endl;
for (unsigned int j = 0; j < QUEUE_CAPACITY; j++)
cerr << "cell[" << j << "] = " << cell[j] << endl;
}
};
BlockingQueue q;
// keep posting to the queue forever
void* Source(void*)
{
while (true)
q.put();
return 0;
}
// keep taking from the queue forever
void* Sink(void*)
{
while (true)
q.take();
return 0;
}
int main()
{
pthread_t id;
// start some pthreads to run Source function
for (int i = 0; i < NUM_THREADS; i++)
if (pthread_create(&id, NULL, &Source, 0))
abort();
// start some pthreads to run Sink function
for (int i = 0; i < NUM_THREADS; i++)
if (pthread_create(&id, NULL, &Sink, 0))
abort();
while (true);
}
Compile the above as follows:
$ g++ -pthread AboveCode.cpp
$ ./a.out
The output is different every time, but here is one example:
corruption detected
action = get
i = 6
head = 122685
tail = 122685
cell[0] = 0
cell[1] = 0
cell[2] = 1
cell[3] = 0
cell[4] = 1
cell[5] = 0
cell[6] = 1
cell[7] = 1
My system is Ubuntu 11.10 on Intel Core 2:
$ uname -a
Linux 3.0.0-14-generic #23-Ubuntu SMP \
Mon Nov 21 20:28:43 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux
$ cat /proc/cpuinfo | grep Intel
model name : Intel(R) Core(TM)2 Quad CPU Q9300 # 2.50GHz
$ g++ --version
g++ (Ubuntu/Linaro 4.6.1-9ubuntu3) 4.6.1
Thanks,
Andrew.
One possible scenario, traced step by step for two writer threads (W0, W1) and one reader thread (R0). W0 entered put() earlier than W1, was interrupted by the OS or hardware, and finished later.
       W0 (core 0)            W1 (core 1)             R0
t0     ---                    ---                     blocked on occupants.wait() in take()
t1     entered put()          ---                     ---
t2     vacancies.wait()       entered put()           ---
t3     got new_head = 1       vacancies.wait()        ---
t4     <interrupted by OS>    got new_head = 2        ---
t5                            written 1 at cell[2]    ---
t6                            occupants.post();       ---
t7                            exited put()            woken up
t8                            ---                     got new_tail = 1
t9     <still in interrupt>   ---                     read 0 from cell[1] !! corruption !!
t10    written 1 at cell[1]
t11    occupants.post();
t12    exited put()
From a design point of view, I would consider the whole queue as a shared resource and protect it with a single mutex.
Writers do the following:
take the mutex
write to the queue (including handling of indexes)
free the mutex
Readers do the following:
take the mutex
read from the queue (including handling of indexes)
free the mutex
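The single-mutex design can be sketched roughly like this. It swaps the pthreads and semaphore calls from the question for the C++11 standard library (std::mutex and std::condition_variable), stores plain ints, and uses a made-up capacity constant kCapacity, so treat it as an illustration rather than a tuned implementation:

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>

// Bounded FIFO of ints; every access to the indexes and cells holds m_mutex.
class BlockingQueue {
    static constexpr std::size_t kCapacity = 8;  // illustrative size
    int m_cell[kCapacity];
    std::size_t m_head = 0;   // next position to write
    std::size_t m_tail = 0;   // next position to read
    std::size_t m_count = 0;  // number of occupied cells
    std::mutex m_mutex;
    std::condition_variable m_not_full;
    std::condition_variable m_not_empty;

public:
    // Blocks while the queue is full.
    void put(int value) {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_not_full.wait(lock, [this] { return m_count < kCapacity; });
        assert(m_count < kCapacity);
        m_cell[m_head] = value;  // index update and write are one atomic step
        m_head = (m_head + 1) % kCapacity;
        ++m_count;
        lock.unlock();
        m_not_empty.notify_one();
    }

    // Blocks while the queue is empty.
    int take() {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_not_empty.wait(lock, [this] { return m_count > 0; });
        assert(m_count > 0);
        int value = m_cell[m_tail];
        m_tail = (m_tail + 1) % kCapacity;
        --m_count;
        lock.unlock();
        m_not_full.notify_one();
        return value;
    }

    std::size_t size() {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_count;
    }
};
```

Because the index update and the cell access happen under the same lock, a reader can no longer be handed a cell whose writer has reserved it but not yet filled it, which is the situation traced in the table above.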
I have a theory. It's a circular queue so one reading thread may be getting lapped. Say a reader takes index 0. Before it does anything it loses the CPU. Another reader thread takes index 1, then 2, then 3 ... then 7, then 0. The first reader wakes up and both threads think they have exclusive access to index 0. Not sure how to prove it. Hope that helps.