C++ multi-threading: thread-safe memory allocation


I am trying to understand whether new/delete are thread-safe in C++11.
I have found conflicting answers.
I am running the short program below, and sometimes the two threads print different totals (I would expect them to always print the same one).
Is this due to issues in memory allocation? What am I missing?
I tried malloc/free and got the same behaviour.
I am compiling it with:
g++ -o out test_thread.cpp -std=c++11 -pthread
g++ (Ubuntu 4.8.4-2ubuntu1~14.04.3) 4.8.4
Thanks.
#include <cstdio> // printf
#include <cstdlib> // srand, rand, malloc, free
#include <iostream>
#include <string>
#include <thread>
void task(int id)
{
int N = 10000;
srand(100);
int j;
long tot = 0;
int *v = new int[N];
/* int *v = 0;
v = (int *) malloc (N * sizeof(int));
*/
for (j = 0; j < N; j++)
v[j] = rand();
for (j = 0; j < N; j++)
tot += v[j];
//free(v);
delete [] v;
printf("Thread #%d: total %ld\n", id, tot);
}
int main()
{
std::thread t1(task, 1);
std::thread t2(task, 2);
t1.join();
t2.join();
}

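rand() keeps a single hidden state that is shared between threads, and each thread also reseeds it with srand(100). Depending on how the calls from the two threads interleave, each thread draws a different sequence; that already accounts for your observations. Memory allocation is not involved.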
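One way to get matching totals is to give every thread its own generator, so nothing is shared. The following is a minimal sketch, assuming C++11 `<random>` is acceptable as a replacement for rand(); the names sum_of_randoms and run_two_tasks and the value range 0..32767 are made up for this illustration and are not from the question.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <thread>
#include <utility>
#include <vector>

// Sums n pseudo-random values drawn from a generator that is local to the
// calling thread, so no other thread can advance or reseed it.
long sum_of_randoms(unsigned seed, std::size_t n)
{
    assert(n > 0);
    std::mt19937 gen(seed);                             // thread-private state
    std::uniform_int_distribution<int> dist(0, 32767);  // range is an assumption
    std::vector<int> v(n);
    for (auto& x : v) x = dist(gen);
    long tot = 0;
    for (int x : v) tot += x;
    return tot;
}

// Runs the same task on two threads and returns both totals.
// (Hypothetical helper for this sketch.)
std::pair<long, long> run_two_tasks(unsigned seed, std::size_t n)
{
    long a = 0, b = 0;
    std::thread t1([&] { a = sum_of_randoms(seed, n); });
    std::thread t2([&] { b = sum_of_randoms(seed, n); });
    t1.join();
    t2.join();
    return std::pair<long, long>(a, b);
}
```

Calling run_two_tasks(100, 10000) should return two equal totals on every run, because each thread starts from the same seed and nobody else touches its generator. new[]/delete[] (here hidden inside std::vector) are still called from both threads, which suggests the allocator was never the issue.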

Related

MSVC compiled program is 3x slower than program, compiled with MinGW

I need to write an addon for Node.js. Since the addon has to work on Windows, I need to compile it with MSVC. After compiling, I found that the addon was slower than the original program, and after some checking I concluded that MSVC is the cause (see the results below). I don't understand why MSVC produces such a slow program. Can I tune the MSVC build to reach the speed of the MinGW build, or does MSVC simply generate less optimized code, so that this is already the best it can do?
Compiler: g++ (x86_64-win32-sjlj-rev3, Built by MinGW-W64 project)
12.1.0
Compilation: g++ -O3 main.cpp
Execution time: 18517 ms
Compiler: Microsoft (R) C/C++ Optimizing Compiler Version 19.33.31629 for x64 (Toolset v143, SDK 10.0.20348.0)
Compilation: cl /O2 main.cpp
Execution time: 58144 ms
Notes:
I added swap(word1, word2) only to prevent a MinGW optimization that results in a 0 s execution time.
If I switch from string to char[], the execution time with MinGW barely changes (1-2 s faster), but with MSVC it drops significantly, to 36 s. If I replace it with a vector of int, it drops again to 25 s.
Minimal reproducible example:
#include <cmath>
#include <string>
#include <chrono>
#include <iostream>
using namespace std;
int cached[6];
int SIZE;
void setSize(int size) {
SIZE = size;
for (int i = 0; i < 6; i++)
cached[i] = pow(3, i);
}
int getMask(const string& guess, const string& answer) {
int results[6];
bool visited[6];
for (int i = 0; i < SIZE; i++) {
if (guess[i] == answer[i]) {
results[i] = 2;
visited[i] = true;
}
else {
results[i] = 0;
visited[i] = false;
}
}
for (int i = 0; i < SIZE; i++) {
if (results[i] != 2) {
for (int j = 0; j < SIZE; j++) {
if (answer[j] == guess[i] && !visited[j]) {
results[i] = 1;
visited[j] = true;
break;
}
}
}
}
int result = results[0];
for (int i = 1; i < SIZE; i++) {
result += results[i] * cached[i];
}
return result;
}
int main() {
setSize(6);
int sum = 0;
auto t0 = chrono::steady_clock::now();
string word1 = "abcdef";
string word2 = "fedcba";
for (int i = 0; i < 30000; i++) {
for (int j = 0; j < 30000; j++) {
sum += getMask(word1, word2);
swap(word1, word2);
}
}
auto t1 = chrono::steady_clock::now();
cout << chrono::duration_cast<chrono::milliseconds>(t1 - t0).count() << "[ms]" << endl;
cout << sum << endl;
return 0;
}

SSEx with a only 1.25x speed boost

I'm new to coding and would really appreciate your advice.
Recently I've been trying some SSE coding to speed up simple calculations (addition and multiplication). I've been told to expect a speed boost of 2x or more with SSEx, but my result shows only a 1.25x boost. Is there anything wrong with my code?
I've tried declaring the input arrays as global variables to keep their addresses contiguous, and avoiding local variables in the SSE part, both in vain.
The following is the code, compiled with
g++ -mfpmath=sse -mmmx -msse -msse2 -msse4.1 -O -Wall test.c
#define N 32768
#include<stdio.h>
#include<stdlib.h>
#include<stdint.h>
#include <smmintrin.h> //sse4.1
#include <emmintrin.h> //sse2
#include <xmmintrin.h> //sse
#include <mmintrin.h> //mmx
#include <time.h>
#include <string.h>
void init_with_rand(float *array);
float input1[N];
float input2[N];
float input3[N];
float output1[N];
float output2[N];
__m128 A,B,C,MUX,SUM;
int main(void)
{
clock_t t1, t2;
int i,j;
init_with_rand(input1);
init_with_rand(input2);
init_with_rand(input3);
t1 = clock();
for(j = 0; j < 1000000; j++){
for(i = 0; i < N; i++){
output1[i] = input1[i] * input2[i] + input3[i];
}
}
t1 = clock()-t1;
printf ("It took me %ld clicks (%f seconds).\n",(long)t1,((float)t1)/CLOCKS_PER_SEC);
/////////////////////////////////////////////////////////////////////////////////
t2 = clock();
for(j = 0; j < 1000000; j++){
for(i = 0; i < N; i+=4){
A = _mm_load_ps(input1+i);
B = _mm_load_ps(input2+i);
C = _mm_load_ps(input3+i);
MUX = _mm_mul_ps(A, B);
SUM = _mm_add_ps( MUX , C);
_mm_store_ps(output2+i, SUM);
}
}
t2 = clock()-t2;
printf ("It took me %ld clicks (%f seconds).\n",(long)t2,((float)t2)/CLOCKS_PER_SEC);
printf ("Performance is increased by %f times.\n",((float)t1/(float)t2));
if(!memcmp(output1,output2,sizeof(output1))) // compare all N floats, not just the first N bytes
printf("Valid\n");
else
printf("Invalid\n");
return 0;
}
void init_with_rand(float *array)
{
int i;
for( i = 0; i < N; i++)
array[i] = static_cast <float> (rand()) / static_cast <float> (RAND_MAX);
}
Thanks for any suggestion!

C++ simple threads example

Hello, I'm studying C++ and threads. I'm new to C++, and the following code is my own, based on experience in other languages. It seems okay to me and it compiles, but when I execute it, it hangs and does nothing. Could you please tell me what I'm doing wrong?
#include <iostream>
#include <thread>
#include <vector>
#include <string>
void printLine(std::string str) {
std::cout << str << std::endl;
}
void child(int id) {
printLine("This is a thread with id: " + std::to_string(id));
}
int main() {
printLine("This is the main thread and we are about to spawn threads...");
std::vector<std::thread> threads;
for (int i = 0; i < 10; i++) {
threads[i] = std::thread(child, i);
threads[i].join();
}
printLine("Press any key to exit...");
std::getchar();
return 0;
}

The thread function itself isn't the problem here; the problem is in the loop that starts the threads:
std::vector<std::thread> threads;
for (int i = 0; i < 10; i++) {
threads[i] = std::thread(child, i);
threads[i].join();
}
threads is empty upon entering the for loop, so accessing threads[i] for any i is Undefined Behaviour.
Use push_back (or emplace_back) instead to actually add elements to the vector:
std::vector<std::thread> threads;
for (int i = 0; i < 10; i++) {
threads.push_back(std::thread(child, i));
threads[i].join();
}
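Note that the loop above joins each thread right after starting it, so the threads still run one at a time. If the goal is to have them run concurrently, a common pattern is to start them all first and join them in a second loop. A minimal sketch, assuming C++11; run_workers and the atomic counter are hypothetical and only there so the result can be checked:

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Starts `count` threads, lets them run concurrently, then joins them all.
// Returns how many workers actually ran. (Hypothetical helper.)
int run_workers(int count)
{
    assert(count >= 0);
    std::atomic<int> done(0);
    std::vector<std::thread> threads;
    threads.reserve(count);
    for (int i = 0; i < count; ++i) {
        // emplace_back constructs the std::thread directly inside the vector
        threads.emplace_back([&done] { ++done; });
    }
    // Second loop: wait for every thread only after all have been started.
    for (std::thread& t : threads) t.join();
    return done.load();
}
```

With this shape the vector always holds exactly as many elements as threads were started, so there is no out-of-range indexing either.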

C++ Strange std::bad_alloc exception

So I have the following code:
#include <iostream>
#include <vector>
#include <cmath>
using namespace std;
const int MAXN = 1000000;
int isNotPrime[MAXN];
vector<int> primes;
void sieve()
{
for(int i = 2; i <= sqrt(MAXN); ++i)
{
if(isNotPrime[i]) continue;
for(int j = i*i; j <= MAXN; j += i)
{
isNotPrime[j] = true;
}
}
for(int i = 2; i <= MAXN; ++i)
{
if(!isNotPrime[i])
{
primes.push_back(i);
}
}
}
int main()
{
ios::sync_with_stdio(false);
sieve();
return 0;
}
What I cannot understand is why my program throws a std::bad_alloc exception when it runs. Even more mind-boggling, when I swap the lines int isNotPrime[MAXN]; and vector<int> primes; the program executes as intended.
Swapped like this:
vector<int> primes;
int isNotPrime[MAXN];

The problem is here:
for(int i = 2; i <= MAXN; ++i)
The check should be i < MAXN instead. (Or, make the array have size MAXN + 1.) The inner loop, for(int j = i*i; j <= MAXN; j += i), has the same off-by-one bound.
At some point isNotPrime[MAXN] = true; executes (with i = 2, j reaches exactly MAXN), which writes past the end of the array and causes undefined behaviour. In practice, this overwrites some internal field of the next variable (primes), which confuses the std::vector implementation, probably causing it to request a lot of memory.
This also explains why switching the variable order "fixes" it, because now you're scribbling over something else instead of primes.
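For reference, here is a sketch of the "size MAXN + 1" variant of the fix. It is not the asker's program: it assumes the globals can be replaced by a function, and the name sieve and its limit parameter are invented for this example.

```cpp
#include <cassert>
#include <vector>

// Returns all primes <= limit. The flag array has limit + 1 entries, so
// index `limit` itself is valid and both loops may safely use `<= limit`.
std::vector<int> sieve(int limit)
{
    assert(limit >= 2);
    std::vector<bool> is_composite(limit + 1, false);
    // long long keeps i * i from overflowing for large limits
    for (long long i = 2; i * i <= limit; ++i) {
        if (is_composite[i]) continue;
        for (long long j = i * i; j <= limit; j += i)
            is_composite[j] = true;
    }
    std::vector<int> primes;
    for (int i = 2; i <= limit; ++i)
        if (!is_composite[i]) primes.push_back(i);
    return primes;
}
```

Using a std::vector for the flags also means an out-of-range index can be caught with at() or a debug-mode standard library, instead of silently corrupting a neighbouring global.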

Segmentation fault / glibc detected when creating shared library

Edit: I tried with gcc-4.8.1 and still get the same error.
I am trying to implement a simple matrix multiplication example using pthreads via a shared library, but I get this error when I try to create the shared library:
g++ -shared -o libMatmul.so matmul.o
collect2: ld terminated with signal 11 [Segmentation fault], core dumped
Here is the code I am using:
matmul.h:
#ifndef matmul_h__
#define matmul_h__
#define SIZE 10
typedef struct {
int dim;
int slice;
} matThread;
int num_thrd;
int A[SIZE][SIZE], B[SIZE][SIZE], C[SIZE][SIZE];
int m[SIZE][SIZE];
extern void init_matrix(int m[SIZE][SIZE]);
extern void print_matrix(int m[SIZE][SIZE]);
extern void* multiply(void* matThread);
#endif
matmul.c:
extern "C"
{
#include <pthread.h>
#include <unistd.h>
}
#include <iostream>
#include "matmul.h"
using namespace std ;
matThread* s=NULL;
// initialize a matrix
void init_matrix(int m[SIZE][SIZE])
{
int i, j, val = 0;
for (i = 0; i < SIZE; i++)
for (j = 0; j < SIZE; j++)
m[i][j] = val++;
}
void print_matrix(int m[SIZE][SIZE])
{
int i, j;
for (i = 0; i < SIZE; i++) {
cout<<"\n\t|" ;
for (j = 0; j < SIZE; j++)
cout<<m[i][j] ;
cout<<"|";
}
}
// thread function: taking "slice" as its argument
void* multiply(void* param)
{
matThread* s = (matThread*)param; // retrive the slice info
int slice1=s->slice;
int D= s->dim=10;
int from = (slice1 * D)/num_thrd; // note that this 'slicing' works fine
int to = ((slice1+1) * D)/num_thrd; // even if SIZE is not divisible by num_thrd
int i,j,k;
cout<<"computing slice " << slice1<<" from row "<< from<< " to " <<to-1<<endl;
for (i = from; i < to; i++)
{
for (j = 0; j < D; j++)
{
C[i][j] = 0;
for ( k = 0; k < D; k++)
C[i][j] += A[i][k]*B[k][j];
}
}
cout<<" finished slice "<<slice1<<endl;
return NULL;
}
main.c:
extern "C"
{
#include <pthread.h>
#include <unistd.h>
}
#include <iostream>
#include "matmul.h"
using namespace std;
// Size by SIZE matrices
// number of threads
matThread* parm=NULL;
int main(int argc, char* argv[])
{
pthread_t* thread; // pointer to a group of threads
int i;
if (argc!=2)
{
cout<<"Usage:"<< argv[0]<<" number_of_threads"<<endl;
exit(-1);
}
num_thrd = atoi(argv[1]);
init_matrix(A);
init_matrix(B);
thread = (pthread_t*) malloc(num_thrd*sizeof(pthread_t));
matThread *parm = new matThread();
for (i = 0; i < num_thrd; i++)
{
parm->slice=i;
// creates each thread working on its own slice of i
if (pthread_create (&thread[i], NULL, multiply, (void*)parm) != 0)
{
cerr<<"Can't create thread"<<endl;
free(thread);
exit(-1);
}
}
for (i = 1; i < num_thrd; i++)
pthread_join (thread[i], NULL);
cout<<"\n\n";
print_matrix(A);
cout<<"\n\n\t *"<<endl;
print_matrix(B);
cout<<"\n\n\t="<<endl;
print_matrix(C);
cout<<"\n\n";
free(thread);
return 0;
}
The commands that I use are:
g++ -c -Wall -fPIC matmul.cpp -o matmul.o and
g++ -shared -o libMatmul.so matmul.o
The code might look a little off because I am passing SIZE (dim) in a struct when it's already a #define, but this is how I want it to be implemented. It's a test program for a bigger project that I am working on.
Any help is greatly appreciated! Thanks in advance.

First, you're mixing a lot of C and C++ idioms (calling free and new, for instance), and you're not using any C++ library/STL features (like a std::vector or std::list instead of a C array). Your code is 'technically' valid (minus some bugs), but it's not good practice to mix C and C++ like that: there are many small idiosyncratic differences between the two (syntax, compilation and linkage, for example) that add confusion when the intent isn't explicitly clear.
That being said, I've made some changes to your code to make it C++98 compatible (and fix the bugs):
start matmul.h:
#ifndef matmul_h__
#define matmul_h__
#define SIZE 10
#include <pthread.h>
typedef struct matThread {
int slice;
int dim;
pthread_t handle;
matThread() : slice(0), dim(0), handle(0) {}
matThread(int s) : slice(s), dim(0), handle(0) {}
matThread(int s, int d) : slice(s), dim(d), handle(0) {}
} matThread;
// explicitly define as extern (for clarity)
extern int num_thrd;
extern int A[SIZE][SIZE];
extern int B[SIZE][SIZE];
extern int C[SIZE][SIZE];
extern void init_matrix(int m[][SIZE]);
extern void print_matrix(int m[][SIZE]);
extern void* multiply(void* matThread);
#endif
start matmul.cpp:
#include <iostream> // <stdio.h>
#include "matmul.h"
int num_thrd = 1;
int A[SIZE][SIZE];
int B[SIZE][SIZE];
int C[SIZE][SIZE];
// initialize a matrix
void init_matrix(int m[][SIZE])
{
int i, j, val;
for (i = 0, val = -1; i < SIZE; i++) {
for (j = 0; j < SIZE; j++) {
m[i][j] = ++val;
}
}
}
void print_matrix(int m[][SIZE])
{
int i, j;
for (i = 0; i < SIZE; i++) {
std::cout << "\n\t|"; // printf
for (j = 0; j < SIZE; j++) {
std::cout << m[i][j];
}
std::cout << "|"; // printf
}
}
// thread function: taking "slice" as its argument
void* multiply(void* param)
{
matThread* s = (matThread*)param; // retrive the slice info
int slice1 = s->slice;
int D = s->dim = 10;
int from = (slice1 * D) / num_thrd; // note that this 'slicing' works fine
int to = ((slice1+1) * D) / num_thrd; // even if SIZE is not divisible by num_thrd
int i, j, k;
std::cout << "computing slice " << slice1 << " from row " << from << " to " << (to-1) << std::endl; // printf
for (i = from; i < to; i++) {
for (j = 0; j < D; j++) {
C[i][j] = 0;
for ( k = 0; k < D; k++) {
C[i][j] += A[i][k]*B[k][j];
}
}
}
std::cout << " finished slice " << slice1 << std::endl; // printf
return NULL;
}
start main.cpp:
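#include <cstdio> // printf
#include <cstdlib> // atoi .. if C++11, you could use std::stoi in <string>
#include <iostream>
#include <vector>
#include "matmul.h"
int main(int argc, char** argv)
{
if (argc != 2) {
std::cout << "Usage: " << argv[0] << " number_of_threads" << std::endl;
return -1;
} else {
num_thrd = std::atoi(argv[1]);
}
if (num_thrd < 1) return -1; // reject zero, negative, or non-numeric input
std::vector<matThread> mt(num_thrd); // a variable-length array here would be a GCC extension, not standard C++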
int i = 0;
init_matrix(A);
init_matrix(B);
for (i = 0; i < num_thrd; i++) {
mt[i].slice = i;
// creates each thread working on its own slice of i
if (pthread_create(&mt[i].handle, NULL, &multiply, static_cast<void*>(&mt[i])) != 0) {
printf("Can't create thread\n");
return -1;
}
}
for (i = 0; i < num_thrd; i++) {
pthread_join(mt[i].handle, NULL);
}
std::cout << "\n\n";
print_matrix(A);
std::cout << "\n\n\t *\n";
print_matrix(B);
std::cout << "\n\n\t=\n";
print_matrix(C);
std::cout << "\n\n";
return 0;
}
To compile and use it you'll need to do the following commands:
g++ -c -Wall -fPIC matmul.cpp -o matmul.o
g++ -shared -Wl,-soname,libMatmul.so -o libMatmul.so.1 matmul.o
ln /full/path/to/libMatmul.so.1 /usr/lib/libMatmul.so
g++ main.cpp -o matmul -Wall -L. -lMatmul -pthread
Note that for your system to be able to find and link against the shared library you've just created, you'll need to ensure it's in your distro's lib folder (like /usr/lib/). You can copy/move it over, create a link to it (or a sym link via ln -s if you can't do hard links), and if you don't want to copy/move/link it, you can also ensure your LD_LIBRARY_PATH is properly set to include the build directory.
As I said, your code is NOT inherently C++ aside from the few print statements (std::cout, etc.); by changing those (std::cout to printf, plus some other minor things) you could compile it as standard C99. I'm not 100% sure how the rest of your shared library will be designed, so I didn't change the structure of the lib code (i.e. the functions you have). If you wanted this code to be 'more C++' (classes/namespaces, STL, etc.), you'd basically need to redesign it, but given the context, I don't think that's necessary unless you have a specific need for it.
I hope that can help.

Shouldn't the join loop
for (i = 1; i < num_thrd; i++)
be
for (i = 0; i < num_thrd; i++)
You created num_thrd threads but never joined thread 0, so main may read C before that thread has finished, which is a race condition.
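The question's main.c also hands the same parm object to every thread while overwriting parm->slice in the loop, so a thread may read a slice number meant for a later thread. Here is a sketch of how both issues could be avoided, assuming std::thread (C++11) may be used instead of raw pthreads; DIM, Matrix, multiply_rows and multiply_parallel are names invented for this example.

```cpp
#include <cassert>
#include <thread>
#include <vector>

const int DIM = 10;            // matrix dimension, like SIZE in the question
typedef int Matrix[DIM][DIM];

// Computes rows [from, to) of c = a * b.
void multiply_rows(const Matrix& a, const Matrix& b, Matrix& c, int from, int to)
{
    for (int i = from; i < to; ++i)
        for (int j = 0; j < DIM; ++j) {
            int sum = 0;
            for (int k = 0; k < DIM; ++k) sum += a[i][k] * b[k][j];
            c[i][j] = sum;
        }
}

// Splits the rows across num_threads threads and waits for all of them.
void multiply_parallel(const Matrix& a, const Matrix& b, Matrix& c, int num_threads)
{
    assert(num_threads > 0);
    std::vector<std::thread> workers;
    for (int s = 0; s < num_threads; ++s) {
        int from = (s * DIM) / num_threads;
        int to = ((s + 1) * DIM) / num_threads;
        // from/to are captured by value: every thread owns its slice bounds
        workers.emplace_back([&a, &b, &c, from, to] { multiply_rows(a, b, c, from, to); });
    }
    // Join every thread, including the first one, before c is read.
    for (std::thread& t : workers) t.join();
}
```

Each thread writes a disjoint range of rows of c, so no locking is needed; the only synchronization is the join loop, which is why it has to cover all threads.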
