What performance improvements are available with custom pool allocators (e.g. with respect to multi-threaded access to the pool)?
Or, put differently: is there an updated answer to this question?
"Can multithreading speed up memory allocation?" was asked more than 5 years ago, so I guess some updated answers would be helpful.
What is the performance penalty when accessing a data structure if it is located:
In the process's own memory block.
In a shared memory block (including locking, but supposing no other processes access it for a significant amount of time).
I am interested in approximate comparison values (e.g. percentages) for access, read, and write.
All your process memory is mmapped. It does not matter whether one or more processes map the same physical pages of memory; there is no difference in the speed of access in this regard.
What matters is whether the memory is located on the local or a remote NUMA node.
See the NUMA benchmarks in Challenges of Memory Management on Modern NUMA Systems.
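To make the first point concrete, here is a minimal sketch, assuming a POSIX system (the name `/demo_shm` is illustrative and error handling is omitted): once mapped, shared memory is accessed through an ordinary pointer exactly like process-private memory, so there is no per-access penalty.

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <cstring>

int main() {
    const std::size_t size = 4096;

    // Process-private anonymous mapping (ordinary memory).
    void* priv = mmap(nullptr, size, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    // Named mapping that other processes can open with the same name.
    int fd = shm_open("/demo_shm", O_CREAT | O_RDWR, 0600);
    ftruncate(fd, size);
    void* shared = mmap(nullptr, size, PROT_READ | PROT_WRITE,
                        MAP_SHARED, fd, 0);

    // Both writes are plain stores through a pointer; the CPU does not
    // treat the shared pages any differently from the private ones.
    std::memset(priv, 0xAB, size);
    std::memset(shared, 0xCD, size);

    munmap(shared, size);
    munmap(priv, size);
    close(fd);
    shm_unlink("/demo_shm");
    return 0;
}
```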
Does memory allocation in multiple threads in modern C++ compilers cause access to a global lock? How much does that vary between compilers and operating systems? How much benefit is there to putting small amounts of data in a pre-allocated global array (less clean, less convenient) instead of dynamically allocating it when needed by individual threads?
All threads share a common virtual address space, so any memory allocation from the heap (malloc or new) results in an update to that shared virtual address space. How this is implemented depends on the operating system as well as the compiler.
If the allocated memory only needs function scope and isn't too large, it can be allocated with alloca() (or _alloca()), which allocates from the stack and gives a thread- and function-local instance of that memory.
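For illustration, a sketch of that pattern follows; note that alloca() is non-standard (it lives in alloca.h on most POSIX systems and as _alloca in malloc.h on MSVC), and this snippet assumes a POSIX toolchain:

```cpp
#include <alloca.h>
#include <cstddef>

void process(std::size_t n) {
    // Only safe for small n: a large request can overflow the stack.
    int* scratch = static_cast<int*>(alloca(n * sizeof(int)));
    for (std::size_t i = 0; i < n; ++i)
        scratch[i] = static_cast<int>(i);
    // scratch is released automatically when process() returns,
    // with no heap bookkeeping or lock involved.
}
```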
In the multi-threaded programs I've written, I've used message and/or buffer "free" pools that are allocated at startup, then have the threads "allocate" and "free" the messages and/or buffers from the "free" pools.
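A minimal sketch of that "free pool" pattern might look like the following, assuming a fixed-size pool guarded by a mutex (the class and member names are illustrative, not from any particular codebase):

```cpp
#include <cstddef>
#include <mutex>
#include <vector>

class BufferPool {
public:
    BufferPool(std::size_t count, std::size_t buf_size)
        : storage_(count * buf_size), buf_size_(buf_size) {
        // Pre-allocate everything once at startup; after this,
        // no further calls to the global heap allocator are made.
        for (std::size_t i = 0; i < count; ++i)
            free_.push_back(storage_.data() + i * buf_size);
    }

    // "Allocate": pop a buffer from the free pool, or nullptr if empty.
    char* acquire() {
        std::lock_guard<std::mutex> lock(mtx_);
        if (free_.empty()) return nullptr;
        char* p = free_.back();
        free_.pop_back();
        return p;
    }

    // "Free": push the buffer back onto the free pool.
    void release(char* p) {
        std::lock_guard<std::mutex> lock(mtx_);
        free_.push_back(p);
    }

private:
    std::vector<char> storage_;  // one big block allocated at startup
    std::vector<char*> free_;    // currently unused buffers
    std::size_t buf_size_;
    std::mutex mtx_;
};
```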
I'm still new to C++, but I catch on quickly and also have experience in C#. Something I want to know is: what performance-safe steps can I take to ensure that my game runs efficiently? Also, in which scenario am I more likely to run out of memory: 2k to 3k bullet objects on the stack, or on the heap? I think the stack is generally faster, but I've heard that too much stack usage causes a stack overflow. That being said, how much is too much, exactly?
Sorry for the plethora of questions; I just want to make sure I don't design a game engine that relies on good PCs in order to run well.
Firstly, program your game safely and only worry about optimizations like memory layout after profiling & debugging.
That being said, I have to dispel the myth that the stack is faster than the heap. What matters is cache performance.
The stack generally is faster for small, quick accesses, because the stack is usually already in the cache. But when you are iterating over thousands of bullet objects on the heap, as long as you store them contiguously (e.g. std::vector, not std::list), everything should be streamed into the cache, and there should be no performance difference.
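As a sketch of what "contiguous vs node-based" means here (the Bullet struct and update functions are made up for illustration):

```cpp
#include <list>
#include <vector>

struct Bullet {
    float x, y, dx, dy;
};

void update(std::vector<Bullet>& bullets, float dt) {
    // Contiguous storage: adjacent Bullets share cache lines, and the
    // hardware prefetcher can stream them in ahead of the loop.
    for (Bullet& b : bullets) {
        b.x += b.dx * dt;
        b.y += b.dy * dt;
    }
}

void update(std::list<Bullet>& bullets, float dt) {
    // Node-based storage: each Bullet lives in its own heap node, so
    // iteration chases pointers and misses the cache far more often.
    for (Bullet& b : bullets) {
        b.x += b.dx * dt;
        b.y += b.dy * dt;
    }
}
```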
How can I allocate shared memory accessible from multiple processes using only native C++ operations? Or should I use my OS's API, as in the case of inter-thread synchronization objects such as mutexes and semaphores? (I mean, you cannot use a bool instead of a mutex; the OS has specific types for organizing synchronization.)
There is no notion of "shared memory", or even "process", in "only native C++". Those are necessarily platform-specific concepts.
You can try Boost's Interprocess library for some useful abstractions.
Basically, you need to use the OS API. But there are cross-platform libraries (e.g. Boost) which implement access to shared memory.
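As a minimal sketch using Boost.Interprocess, assuming Boost is available (the segment name "MySharedMemory" and object name "my_int" are illustrative):

```cpp
#include <boost/interprocess/managed_shared_memory.hpp>

int main() {
    using namespace boost::interprocess;

    // Remove any leftover segment from a previous run, then create one.
    shared_memory_object::remove("MySharedMemory");
    managed_shared_memory segment(create_only, "MySharedMemory", 65536);

    // Construct a named object inside the segment; another process can
    // open the same segment and look the object up by name:
    //   managed_shared_memory other(open_only, "MySharedMemory");
    //   std::pair<int*, std::size_t> res = other.find<int>("my_int");
    int* value = segment.construct<int>("my_int")(42);
    (void)value;

    shared_memory_object::remove("MySharedMemory");
    return 0;
}
```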
I just came across a new term: thread pool. I don't know what it is; can anybody offer some information about it?
What is a thread pool and how is it implemented?
Is a thread pool just a collection of threads?
A thread pool is basically a collection of threads. Whenever a task is assigned to the thread pool, an available thread accepts the task and executes it.
The advantages of a thread pool are that it controls thread creation/destruction and optimizes thread usage.
The thread pool concept is not a C++ language feature; there are many custom implementations of thread pools (using different strategies).
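As one possible sketch of such a custom implementation (illustrative only, using standard C++11 facilities): a fixed set of worker threads repeatedly pulls tasks from a shared queue protected by a mutex and a condition variable.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

class ThreadPool {
public:
    explicit ThreadPool(std::size_t n) {
        for (std::size_t i = 0; i < n; ++i)
            workers_.emplace_back([this] {
                for (;;) {
                    std::function<void()> task;
                    {
                        std::unique_lock<std::mutex> lock(mtx_);
                        // Sleep until there is work or the pool is stopping.
                        cv_.wait(lock, [this] { return stop_ || !tasks_.empty(); });
                        if (stop_ && tasks_.empty()) return;
                        task = std::move(tasks_.front());
                        tasks_.pop();
                    }
                    task();  // run the task outside the lock
                }
            });
    }

    void submit(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mtx_);
            tasks_.push(std::move(task));
        }
        cv_.notify_one();  // wake one idle worker
    }

    ~ThreadPool() {
        {
            std::lock_guard<std::mutex> lock(mtx_);
            stop_ = true;
        }
        cv_.notify_all();
        for (auto& t : workers_) t.join();
    }

private:
    std::vector<std::thread> workers_;
    std::queue<std::function<void()>> tasks_;
    std::mutex mtx_;
    std::condition_variable cv_;
    bool stop_ = false;
};
```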
You can also read about the thread pool in .NET and the threadpool library to learn more.