I have a question about memory allocation for containers in C++.
Look at the pseudocode below for a multi-threaded application (assume it is in C++). I declare a vector object in the main method. Then I run a thread and pass this object to it. The thread runs on another processor. Now, I insert 100000 elements into the vector.
#include <thread>
#include <vector>

struct myType
{
    int a;
    int b;
};

void ThreadRoutine(std::vector<myType>& myTypeObject)
{
    // run this thread on processor P (e.g. pinned with a CPU affinity call)
    for (int i = 0; i < 100000; ++i)
        myTypeObject.push_back({i, i}); // insert 100000 elements
}

int main()
{
    std::vector<myType> myTypeObject;
    std::thread t(ThreadRoutine, std::ref(myTypeObject)); // pass myTypeObject to the thread
    t.join();
}
I want to know where the memory will be allocated for the 100000 elements:
-From the main itself
-From the thread
The reason I ask is that I want to run the thread on a different processor, and my machine is a NUMA machine. So if the memory is allocated from the thread, it will be in the local memory bank of that thread; but if the memory is allocated from main, it will come from the local memory bank of the main thread.
My intuition says that the memory is allocated only in the thread. Please let me know your thoughts.
The reallocation(s) will be triggered from ThreadRoutine() -- so by whichever thread calls that (the secondary thread in your example).
Of course, you could reserve() on the main thread before passing the vector around, if you want to avoid resizing on the secondary thread.
Related
This is more of a general question than a specific coding problem. Does C++ prevent multiple threads from allocating the same memory addresses (and if so, how)?
For example:
#include <vector>
#include <thread>

int main() {
    std::vector<int> x, y;
    std::thread do_work([&x] () {
        /* push_back a lot of ints to x */
    });
    /* push_back a lot of ints to y */
    do_work.join();
    /* do more stuff */
}
When the two vectors allocate memory because they reached their capacity, is it possible that both vectors try to allocate the same piece of memory since the heap is shared among threads? Is code like this unsafe because it potentially creates races?
Memory allocation (via malloc/new/HeapAlloc and the like) is thread-safe by default, as long as you've compiled your application against thread-safe runtimes (which you will have, unless you explicitly change that).
Each vector will get its own slice of memory whenever it resizes, but once a slice is freed, any (other) thread could end up getting it the next time an allocation occurs.
You could, however, mess things up if you replace your allocators -- for example, if you overload operator new so that you're no longer getting memory from a thread-safe source (https://en.cppreference.com/w/cpp/memory/new/operator_new).
You could also end up with a non-thread-safe version of malloc if you replace it via some sort of library preload on Linux (overriding malloc using the LD_PRELOAD mechanism).
But assuming you're not doing any of that, the default user-space allocators (new/malloc) are thread safe, and OS-level allocators (VirtualAlloc/mmap) are always thread safe.
I've read that different threads share the same memory segments apart from the stack. There is something I've been trying to understand. I have a class which creates different threads. Below is a simple example of what I am doing: creating one thread in the constructor.
When an object of this class is created, in main() for example, all the threads can access the same member variables. If every thread gets its own stack, why doesn't each thread get a copy of the member variables, rather than access to the same variables? I have looked around, and I'm trying to build a picture in my mind of what is going on in memory with the different stack frames. Many thanks in advance for any replies.
////////////////////////
class MyClass
{
public:
    MyClass();
private:
    void threadFunc();
    std::thread t1;
};

MyClass::MyClass()
{
    t1 = std::thread([this] { this->threadFunc(); });
    t1.detach();
}
/////////////////////////
void MyClass::threadFunc()
{
    // do stuff... update member variables etc.
}
I've read that different threads share the same memory segments apart
from the stack.
This is not entirely accurate. Threads share the same address space, as opposed to being sandboxed from one another the way processes are.
The stack is just some memory in your application that has been specifically reserved and is used to hold things such as function parameters, local variables, and other function-related information.
Every thread has its own stack. This means that when a particular thread is executing, it will use its own stack, to avoid trampling over other threads which might be idle or executing simultaneously on a multi-core system.
Remember that these stacks all still live inside the same address space, which means that any thread can access the contents of another thread's stack.
A simple example:
#include <iostream>
#include <thread>
void Foo(int& i)
{
// if thread t is executing this function then j will sit inside thread t's stack
// if we call this function from the main thread then j will sit inside the main stack
int j = 456;
i++; // we can see i because threads share the same address space
}
int main()
{
int i = 123; // this will sit inside the main thread's stack
std::thread t(Foo, std::ref(i)); // we pass a reference to i into the thread
t.join();
std::cout << i << '\n';
return 0;
}
An illustration (the original image is omitted): each thread's stack is just some part of the process's memory, and all of the stacks live inside the same, single address space.
I've read that different threads share the same memory segments apart from the stack. [...] If every thread gets its own stack, why doesn't each thread get a copy of the member variables rather than access to the same variable.
Hmm... the short answer is, that's the purpose for which threads were designed: multiple threads of execution, all accessing the same data. You then have threading primitives -- like mutexes -- to make sure that they don't step on each other's toes, so to speak: that you don't have concurrent writes, for example.
Thus, in the default case, threads share objects, and you have to do extra work to synchronize accesses. If you want each thread to have its own copy of an object, then you can write the code to give each thread a copy instead of the same object. Or you may consider using processes (e.g. via fork()) instead of threads.
If you have
MyClass c1;
MyClass c2;
then there will be two instances of MyClass, c1 and c2, each with its own members. The thread started during construction of c1 will have access to c1's members only and the thread started in c2 will have access to c2's members only.
If you are talking about several threads within one object (this is not obvious from your code), then note that each thread has a copy of this, which is just a pointer to your MyClass object. New (local) variables are allocated on each thread's stack and are visible only to the thread that creates them.
Let's say there are 4 consumer threads that run in a loop continuously:
void consumerLoop(int threadIndex)
{
    int myArray[100];
    while (true) {
        // ...process data...
        myArray[myIndex] += newValue;
    }
}
I have another monitor thread which does other background tasks.
I need to access the myArray for each of these threads from the monitor thread.
Assume that the loops will run for ever(so the local variables would exist) and the only operation required from the monitor thread is to read the array contents of all the threads.
One alternative is to change myArray to a global array of arrays, but I am guessing that would slow down the consumer loops.
What are the ill effects of declaring a global pointer array
int *p[4]; and assigning each element to the address of the local array, by adding a line in consumerLoop like p[threadIndex] = myArray, and then accessing p from the monitor thread?
Note: I am running it on a Linux system and the language is C++. I am not concerned about synchronization/validity of the array contents when I am accessing them from the monitor thread. Let's stay away from a discussion of locking.
If you are really interested in the performance difference, you have to measure. I would guess that there is nearly no difference.
Both approaches are correct, as long as the monitor thread doesn't access stack local variables that are invalid because the function returned.
You cannot rely on accessing myArray from a different thread, because it is a local variable.
You can either 1) use a global variable, or 2) malloc memory and pass the address to all the threads.
Please protect the critical section when all the threads rush to use the common memory.
I am optimizing a for loop with OpenMP. In each thread, a large array will be used temporarily (it is not needed once that thread finishes). Since I don't want to repeatedly allocate and delete these arrays, I plan to allocate a large block of memory and assign a part of it to each thread. To avoid conflicts, I need a unique ID for each running thread, which should not change and cannot be equal to that of another thread. So my question is: can I use the thread ID returned by the function omp_get_thread_num() for this purpose? Or is there a more efficient solution for such a memory allocation and assignment task? Thanks very much!
You can start the parallel section and then allocate your variables/memory. Everything that is declared within the parallel section is thread-private, on each thread's own stack. Example:
#pragma omp parallel
{
// every variable declared here is thread private
int * temp_array_pointer = (int *) calloc(num_elements, sizeof(int));
int temp_array_on_stack[num_elements]; // a variable-length array (a C feature, not standard C++)
#pragma omp for
for (...) {
// whatever my loop does
}
// if you used dynamic allocation
free(temp_array_pointer);
}
Once your program encounters a parallel region, that is once it hits
#pragma omp parallel
the threads (which may have been started at program initialisation, or not until the first parallel construct) become active. Inside the parallel region, any thread that allocates memory, for an array for example, obtains an allocation that only it holds a pointer to: the threads still share one address space, but the pointer variable itself is private to the thread. Unless the thread deallocates the memory, it remains allocated for the entirety of the parallel region.
If your program first, in serial, allocates memory for an array and then, on entering the parallel region, copies that array to all threads, use the firstprivate clause and let the run time take care of copying the array into the private address space of each thread.
Given all that, I don't see the point of allocating, presumably before encountering the parallel region, a large amount of memory then sharing it between threads using some roll-your-own approach to dividing it based on calculations on the thread id.
In Linux is it possible to start a process (e.g. with execve) and make it use a particular memory region as stack space?
Background:
I have a C++ program and a fast allocator that gives me "fast memory". I can use it for objects that make use of the heap, and create them in fast memory. Fine. But I also have a lot of variables living on the stack. How can I make them use the fast memory as well?
Idea: Implement a "program wrapper" that allocates fast memory and then starts the actual main program, passing a pointer to the fast memory and the program uses it as stack. Is that possible?
[Update]
The pthread setup seems to work.
With pthreads, you could use a secondary thread for your program logic, and set its stack address using pthread_attr_setstack():
NAME
    pthread_attr_setstack, pthread_attr_getstack - set/get stack
    attributes in thread attributes object

SYNOPSIS
    #include <pthread.h>

    int pthread_attr_setstack(pthread_attr_t *attr,
                              void *stackaddr, size_t stacksize);

DESCRIPTION
    The pthread_attr_setstack() function sets the stack address and
    stack size attributes of the thread attributes object referred
    to by attr to the values specified in stackaddr and stacksize,
    respectively. These attributes specify the location and size
    of the stack that should be used by a thread that is created
    using the thread attributes object attr.

    stackaddr should point to the lowest addressable byte of a
    buffer of stacksize bytes that was allocated by the caller.
    The pages of the allocated buffer should be both readable and
    writable.
What I don't follow is how you're expecting to get any performance improvement out of doing something like this (I assume the purpose of your "fast" memory is better performance).