C++: divide a loop into several threads - c++

First at all I'm a complete beginner in C++, that's why I apologize if this question may be stupid (or may not have any sense) but I have absolutely no clues about what can I do in my situation.
So, I've been trying to learn about multi-threading lately because I thought it'd be a way much better to use threads instead of a simple loop running across a whole big content (which is the content of a file) to gain actually more speed.
Here's the code I actually have (not complete but it doesn't matters in this case):
int end = 40000;
std::string content; // this variable is filled before calling the function "another_function", don't mind it
// this function is completely useless, this is just for testing purposes
void dummy_function(int *idx, int value_to_divide) {
std::list<int> test;
for (; *idx <= ending / div; *idx++) {
int count = 100 + *i;
test.insert(test.end(), { 1, 2, 3, count });
std::cout << content.c_str() + *i << endl;
test.clear();
}
}
void another_function() {
int idx = 0;
std::vector<std::thread> th;
for (int j = 4; j != 0; j--) {
th.push_back(std::thread(&dummy_function, &idx + ((j != 4) ? (end / (j + 1)) : 0), j));
}
for (auto& thread:th)
thread.join();
}
How I see what I did is that I divide the reading of the content variable into 4 several threads (to make them stopping at the same time as they should have the same length), the first one starting at 0 (beginning), the 2nd one is starting at (i + (end / 3)),
et cetera...
But, it does segfault when the 1st thread stop and in fact the other threads are not even starting where I thought they would start, so maybe I didn't even understand the concept of threading in first place (as I said I'm a beginner in this lol).
I've heard about something called "Safe Queue" or "Safe Threading" (using mutex and stuff related to) but it seems that I didn't understand how to use it in my case.
Is someone able to explain how I could make these threads running in parallel and "safely" (so)?
Thank you :)

This isn't a threading problem. It's far more basic than that.
Here's your code:
void another_function() {
int idx = 0;
std::vector<std::thread> th;
for (int j = 4; j != 0; j--) {
th.push_back(std::thread(&dummy_function, &idx + ((j != 4) ? (end / (j + 1)) : 0), j));
}
for (auto& thread:th)
thread.join();
}
idx is a single integer on the stack. You're passing the address of idx plus some offset that you're calculating. And then your function assumes that's a good address.
But it's NOT a good address. It points out to who knows what. You don't have 40000 integers allocated anywhere. You have exactly 1.
Now, I'm not sure what else you're trying to do, but you could change idx slightly:
int idx[40000];
And then I wouldn't do the for-loop quite the same. Instead:
for (int j = 4; j != 0; j--) {
int offset = j != 4 ? end / (j + 1) : 0;
th.push_back(std::thread(&dummy_function, &idx[offset], j));
}
Or something like that. I don't think the method you're calling is going to be happy, though, and you would do better to instead pass in the entire array and then a start + length or start + stop values.
But your main problem is because you're running rampant over random memory.

Related

Array index loop to begin instead of mem access error

I'm currently developping a vision system which is controlled/moonitored via a computer application that I'm writting in C/C++. If you are wondering why I,m writting in C/C++ its basically that my vision sensor (LMI Gocator 2490) has functions developped in C and that I'm using opencv (C++) to analyse its data. Long story short, I've had 1 informatics course which was in C and I'm bad at C++.
I'm wondering if there is a simple way, using or writting a function for example, to retrieve an array index that loops back to its begining instead of trying to read write in an index that isn't part of the array which also causes mem access violation. For example:
long a[4];
for (size_t i = 0; i < sizeof(a) / sizeof(long); i++)
{
if (a[i] && a[i + 1]) //Define some conditions regarding a[i] and a[i+1]
{
//Process code
}
}
So here, its easy to understand that when i will reach the value of 3, I will get mem access violation. What I would like to do, is to have some sort of mecanism which would return a array index that "looped" back to begining. In the case shown above, i+1 where i = 3 would be 0. If I were to put i+2, I would like to get 1, and so on...
This might be silly but it would save me the pain of writting 4 times the same logic in some switch/case statement inside a for-loop which then I'd need to apply modifications in all the 4 logics when debugging... Thank's!
use modulus to wrap when reaching the upper bound
long a[4];
size_t sz = sizeof(a) / sizeof(long);
for (size_t i = 0; i < sz; i++)
{
if (a[i] && a[(i + 1) % sz]) //Define some conditions regarding a[i] and a[i+1]
{
//Process code
}
}
maybe not super-optimal performance wise, though (well, with a value of 4, the compiler can trade modulus for mask so it would be okay).
In the general case, avoid a division, and check if overruns then choose 0 if it does
for (size_t i = 0; i < sz; i++)
{
size_t j = i < sz-1 ? i+1 : 0;
if (a[i] && a[j]) //Define some conditions regarding a[i] and a[i+1]
{
//Process code
}
}
(the code will work in C and C++)

Fill an array from different threads concurrently c++

First of all, I think it is important to say that I am new to multithreading and know very little about it. I was trying to write some programs in C++ using threads and ran into a problem (question) that I will try to explain to you now:
I wanted to use several threads to fill an array, here is my code:
static const int num_threads = 5;
int A[50], n;
//------------------------------------------------------------
void ThreadFunc(int tid)
{
for (int q = 0; q < 5; q++)
{
A[n] = tid;
n++;
}
}
//------------------------------------------------------------
int main()
{
thread t[num_threads];
n = 0;
for (int i = 0; i < num_threads; i++)
{
t[i] = thread(ThreadFunc, i);
}
for (int i = 0; i < num_threads; i++)
{
t[i].join();
}
for (int i = 0; i < n; i++)
cout << A[i] << endl;
return 0;
}
As a result of this program I get:
0
0
0
0
0
1
1
1
1
1
2
2
2
2
2
and so on.
As I understand, the second thread starts writing elements to an array only when the first thread finishes writing all elements to an array.
The question is why threads dont't work concurrently? I mean why don't I get something like that:
0
1
2
0
3
1
4
and so on.
Is there any way to solve this problem?
Thank you in advance.
Since n is accessed from more than one thread, those accesses need to be synchronized so that changes made in one thread don't conflict with changes made in another. There are (at least) two ways to do this.
First, you can make n an atomic variable. Just change its definition, and do the increment where the value is used:
std::atomic<int> n;
...
A[n++] = tid;
Or you can wrap all the accesses inside a critical section:
std::mutex mtx;
int next_n() {
std::unique_lock<std::mutex> lock(mtx);
return n++;
}
And in each thread, instead of directly incrementing n, call that function:
A[next_n()] = tid;
This is much slower than the atomic access, so not appropriate here. In more complex situations it will be the right solution.
The worker function is so short, i.e., finishes executing so quickly, that it's possible that each thread is completing before the next one even starts. Also, you may need to link with a thread library to get real threads, e.g., -lpthread. Even with that, the results you're getting are purely by chance and could appear in any order.
There are two corrections you need to make for your program to be properly synchronized. Change:
int n;
// ...
A[n] = tid; n++;
to
std::atomic_int n;
// ...
A[n++] = tid;
Often it's preferable to avoid synchronization issues altogether and split the workload across threads. Since the work done per iteration is the same here, it's as easy as dividing the work evenly:
void ThreadFunc(int tid, int first, int last)
{
for (int i = first; i < last; i++)
A[i] = tid;
}
Inside main, modify the thread create loop:
for (int first = 0, i = 0; i < num_threads; i++) {
// possible num_threads does not evenly divide ASIZE.
int last = (i != num_threads-1) ? std::size(A)/num_threads*(i+1) : std::size(A);
t[i] = thread(ThreadFunc, i, first, last);
first = last;
}
Of course by doing this, even though the array may be written out of order, the values will be stored to the same locations every time.

threading program in C++ not faster

I have a program which reads the file line by line and then stores each possible substring of length 50 in a hash table along with its frequency. I tried to use threads in my program so that it will read 5 lines and then use five different threads to do the processing. The processing involves reading each substring of that line and putting them into hash map with frequency. But it seems there is something wrong which I could not figure out for which the program is not faster then the serial approach. Also, for large input file it is aborted. Here is the piece of code I am using
unordered_map<string, int> m;
mutex mtx;
void parseLine(char *line, int subLen){
int no_substr = strlen(line) - subLen;
for(int i = 0; i <= no_substr; i++) {
char *subStr = (char*) malloc(sizeof(char)* subLen + 1);
strncpy(subStr, line+i, subLen);
subStr[subLen]='\0';
mtx.lock();
string s(subStr);
if(m.find(s) != m.end()) m[s]++;
else {
pair<string, int> ret(s, 1);
m.insert(ret);
}
mtx.unlock();
}
}
int main(){
char **Array = (char **) malloc(sizeof(char *) * num_thread +1);
int num = 0;
while (NOT END OF FILE) {
if(num < num_th) {
if(num == 0)
for(int x = 0; x < num_th; x++)
Array[x] = (char*) malloc(sizeof(char)*strlen(line)+1);
strcpy(Array[num], line);
num++;
}
else {
vector<thread> threads;
for(int i = 0; i < num_th; i++) {
threads.push_back(thread(parseLine, Array[i]);
}
for(int i = 0; i < num_th; i++){
if(threads[i].joinable()) {
threads[i].join();
}
}
for(int x = 0; x < num_th; x++) free(seqArray[x]);
num = 0;
}
}
}
It's a myth that just by the virtue of using threads, the end result must be faster. In general, in order to take advantage of multithreading, two conditions must be met(*):
1) You actually have to have sufficient physical CPU cores, that can run the threads at the same time.
2) The threads have independent tasks to do, that they can do on their own.
From a cursory examination of the shown code, it seems to fail on the second part. It seems to me that, most of the time all of these threads will be fighting each other in order to acquire the same mutex. There's little to be gained from multithreading, in this situation.
(*) Of course, you don't always use threads for purely performance reasons. Multithreading also comes in useful in many other situations too, for example, in a program with a GUI, having a separate thread updating the GUI helps the UI working even while the main execution thread is chewing on something, for a while...

Almost same code running much slower

I am trying to solve this problem:
Given a string array words, find the maximum value of length(word[i]) * length(word[j]) where the two words do not share common letters. You may assume that each word will contain only lower case letters. If no such two words exist, return 0.
https://leetcode.com/problems/maximum-product-of-word-lengths/
You can create a bitmap of char for each word to check if they share chars in common and then calc the max product.
I have two method almost equal but the first pass checks, while the second is too slow, can you understand why?
class Solution {
public:
int maxProduct2(vector<string>& words) {
int len = words.size();
int *num = new int[len];
// compute the bit O(n)
for (int i = 0; i < len; i ++) {
int k = 0;
for (int j = 0; j < words[i].length(); j ++) {
k = k | (1 <<(char)(words[i].at(j)));
}
num[i] = k;
}
int c = 0;
// O(n^2)
for (int i = 0; i < len - 1; i ++) {
for (int j = i + 1; j < len; j ++) {
if ((num[i] & num[j]) == 0) { // if no common letters
int x = words[i].length() * words[j].length();
if (x > c) {
c = x;
}
}
}
}
delete []num;
return c;
}
int maxProduct(vector<string>& words) {
vector<int> bitmap(words.size());
for(int i=0;i<words.size();++i) {
int k = 0;
for(int j=0;j<words[i].length();++j) {
k |= 1 << (char)(words[i][j]);
}
bitmap[i] = k;
}
int maxProd = 0;
for(int i=0;i<words.size()-1;++i) {
for(int j=i+1;j<words.size();++j) {
if ( !(bitmap[i] & bitmap[j])) {
int x = words[i].length() * words[j].length();
if ( x > maxProd )
maxProd = x;
}
}
}
return maxProd;
}
};
Why the second function (maxProduct) is too slow for leetcode?
Solution
The second method does repetitive call to words.size(). If you save that in a var than it working fine
Since my comment turned out to be correct I'll turn my comment into an answer and try to explain what I think is happening.
I wrote some simple code to benchmark on my own machine with two solutions of two loops each. The only difference is the call to words.size() is inside the loop versus outside the loop. The first solution is approximately 13.87 seconds versus 16.65 seconds for the second solution. This isn't huge, but it's about 20% slower.
Even though vector.size() is a constant time operation that doesn't mean it's as fast as just checking against a variable that's already in a register. Constant time can still have large variances. When inside nested loops that adds up.
The other thing that could be happening (someone much smarter than me will probably chime in and let us know) is that you're hurting your CPU optimizations like branching and pipelining. Every time it gets to the end of the the loop it has to stop, wait for the call to size() to return, and then check the loop variable against that return value. If the cpu can look ahead and guess that j is still going to be less than len because it hasn't seen len change (len isn't even inside the loop!) it can make a good branch prediction each time and not have to wait.

c++ counting sort

I tried to write a countingsort, but there's some problem with it.
here's the code:
int *countSort(int* start, int* end, int maxvalue)
{
int *B = new int[(int)(end-start)];
int *C = new int[maxvalue];
for (int i = 0; i < maxvalue; i++)
{
*(C+i) = 0;
}
for (int *i = start; i < end; i++)
{
*(C+*i) += 1;
}
for (int i = 1; i < maxvalue-1 ; i++)
{
*(C+i) += *(C+i-1);
}
for (int *i = end-1; i > start-1; i--)
{
*(B+*(C+(*i))) = *i;
*(C+(*i)) -= 1;
}
return B;
}
In the last loop it throws an exception "Acces violation writing at location: -some ram address-"
Where did I go wrong?
for (int i = 1; i < maxvalue-1 ; i++)
That's the incorrect upper bound. You want to go from 1 to maxvalue.
for (int *i = end-1; i > start-1; i--)
{
*(B+*(C+(*i))) = *i;
*(C+(*i)) -= 1;
}
This loop is also completely incorrect. I don't know what it does, but a brief mental test shows that the first iteration sets the element of B at the index of the value of the last element in the array to the number of times it shows. I guarantee that that is not correct. The last loop should be something like:
int* out = B;
int j=0;
for (int i = 0; i < maxvalue; i++) { //for each value
for(j<C[i]; j++) { //for the number of times its in the source
*out = i; //add it to the output
++out; //in the next open slot
}
}
As a final note, why are you playing with pointers like that?
*(B + i) //is the same as
B[i] //and people will hate you less
*(B+*(C+(*i))) //is the same as
B[C[*i]]
Since you're using C++ anyway, why not simplify the code (dramatically) by using std::vector instead of dynamically allocated arrays (and leaking one in the process)?
std::vector<int>countSort(int* start, int* end, int maxvalue)
{
std::vector<int> B(end-start);
std::vector<int> C(maxvalue);
for (int *i = start; i < end; i++)
++C[*i];
// etc.
Other than that, the logic you're using doesn't make sense to me. I think to get a working result, you're probably best off sitting down with a sheet of paper and working out the steps you need to use. I've left the counting part in place above, because I believe that much is correct. I don't think the rest really is. I'll even give a rather simple hint: once you've done the counting, you can generate B (your result) based only on what you have in C -- you do not need to refer back to the original array at all. The easiest way to do it will normally use a nested loop. Also note that it's probably easier to reserve the space in B and use push_back to put the data in it, rather than setting its initial size.