I understand that, if used correctly, shared memory can be faster than any other kind of IPC. My question is more specific: if I transfer many small packets, e.g. 100 bytes each, from different programs to one main program, what kind of speed difference can I expect?
The benefit of using shared memory will not be that large, because you will end up using condition variables on the shared memory (cf. pthread_condattr_setpshared; it is substantial coding work, by the way). Your logic is then governed by the OS scheduler, which is not very different from using a localhost TCP connection, which on most OSes has a separate, faster implementation than standard TCP.
If it is OK to rely entirely on a spinlock over the shared memory, then you will indeed see a substantial speedup, on the order of 3x.
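A minimal sketch of the condition-variable approach, assuming POSIX shared memory and process-shared pthreads primitives; the name "/demo_ipc" and the SharedBlock layout are invented for this example, and error handling is omitted:

    #include <pthread.h>
    #include <sys/mman.h>
    #include <fcntl.h>
    #include <unistd.h>

    struct SharedBlock {
        pthread_mutex_t mutex;
        pthread_cond_t  cond;
        int             ready;        // 1 when a packet is waiting
        char            payload[100]; // one small packet
    };

    SharedBlock* create_shared_block() {
        int fd = shm_open("/demo_ipc", O_CREAT | O_RDWR, 0600);
        ftruncate(fd, sizeof(SharedBlock));
        void* mem = mmap(nullptr, sizeof(SharedBlock),
                         PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        SharedBlock* blk = static_cast<SharedBlock*>(mem);

        // The process-shared attributes are what make these primitives
        // usable across processes rather than just across threads.
        pthread_mutexattr_t ma;
        pthread_mutexattr_init(&ma);
        pthread_mutexattr_setpshared(&ma, PTHREAD_PROCESS_SHARED);
        pthread_mutex_init(&blk->mutex, &ma);

        pthread_condattr_t ca;
        pthread_condattr_init(&ca);
        pthread_condattr_setpshared(&ca, PTHREAD_PROCESS_SHARED);
        pthread_cond_init(&blk->cond, &ca);

        blk->ready = 0;
        return blk;
    }

The consumer then blocks in pthread_cond_wait on blk->cond, and that wait is exactly where the OS scheduler takes over and the latency advantage over localhost TCP shrinks.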
All in the title. Since a coroutine just needs a sort of EIP (instruction-pointer) state, and a thread provides that, is it possible to do it? That would be a way to get a highly portable coroutine library.
You can implement coroutines / generators on top of threads or even fibers, but they would be less performant than the implementations MSVC or GCC provide.
I implemented coroutines that way (no threads, but fibers) in Delphi. You need to estimate how much stack memory you will need, and you need to create those threads, which is heavy.
So you should avoid that.
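For illustration, a minimal sketch of a thread-backed generator, assuming C++11; ThreadCoroutine and its members are names invented for this example. Every yield/next pair costs a mutex handoff and a context switch, which is why this is much slower than the compiler-supported coroutines mentioned above:

    #include <condition_variable>
    #include <functional>
    #include <iostream>
    #include <mutex>
    #include <thread>

    class ThreadCoroutine {
        std::mutex m;
        std::condition_variable cv;
        bool hasValue = false, done = false;
        int value = 0;
        std::thread worker; // declared last: it must start after the state above
    public:
        explicit ThreadCoroutine(std::function<void(ThreadCoroutine&)> body)
            : worker([this, body] {
                  body(*this);
                  std::lock_guard<std::mutex> lk(m);
                  done = true;
                  cv.notify_one();
              }) {}
        ~ThreadCoroutine() { worker.join(); } // the generator must be drained first

        void yield(int v) { // called from inside the coroutine body
            std::unique_lock<std::mutex> lk(m);
            value = v;
            hasValue = true;
            cv.notify_one();
            cv.wait(lk, [this] { return !hasValue; }); // park until consumed
        }

        bool next(int& out) { // called by the consumer; false when finished
            std::unique_lock<std::mutex> lk(m);
            cv.wait(lk, [this] { return hasValue || done; });
            if (!hasValue) return false;
            out = value;
            hasValue = false;
            cv.notify_one();
            return true;
        }
    };

    int main() {
        ThreadCoroutine gen([](ThreadCoroutine& co) {
            for (int i = 0; i < 3; ++i) co.yield(i * i);
        });
        for (int v; gen.next(v); ) std::cout << v << '\n'; // prints 0 1 4
    }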
What is the performance penalty when accessing a data structure if it is located:
In the same process memory block.
In a shared memory block (including locking, but supposing no other processes access it for a significant amount of time).
I am interested in approximate comparison values (e.g. a percentage) for access, read, and write.
All of your process memory is mmapped. It does not matter whether one or more processes map the same physical pages of memory; there is no difference in access speed in this regard.
What matters is whether the memory is located on the local or a remote NUMA node.
See the NUMA benchmarks in "Challenges of Memory Management on Modern NUMA Systems".
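As a rough way to check this on one machine, here is a minimal sketch that times summing an ordinary private mapping against a shared-memory mapping, assuming a POSIX system; the name "/bench_shm" is invented for this example, and error handling is omitted:

    #include <sys/mman.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <chrono>
    #include <cstdio>
    #include <cstring>

    static double sum_ms(volatile unsigned char* p, size_t n) {
        auto t0 = std::chrono::steady_clock::now();
        unsigned long sum = 0;
        for (size_t i = 0; i < n; ++i) sum += p[i]; // volatile: reads not elided
        auto t1 = std::chrono::steady_clock::now();
        (void)sum;
        return std::chrono::duration<double, std::milli>(t1 - t0).count();
    }

    int main() {
        const size_t N = 64 * 1024 * 1024;

        // Ordinary (private, anonymous) process memory.
        void* priv = mmap(nullptr, N, PROT_READ | PROT_WRITE,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

        // A mapping that other processes could share as well.
        int fd = shm_open("/bench_shm", O_CREAT | O_RDWR, 0600);
        ftruncate(fd, N);
        void* shrd = mmap(nullptr, N, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

        memset(priv, 1, N); // touch the pages so both regions are resident
        memset(shrd, 1, N);

        printf("private: %.2f ms\n", sum_ms(static_cast<unsigned char*>(priv), N));
        printf("shared:  %.2f ms\n", sum_ms(static_cast<unsigned char*>(shrd), N));

        shm_unlink("/bench_shm");
    }

On a single NUMA node the two timings should come out essentially identical, which is the point of the answer above.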
Is there any way to allow multithreading in a program with a loop like this one?

    int a = 100000, b = 50, c = 0;  // c must be initialized before it is read
    while (a) { c = b * a + c; a--; }

Environment: Windows 7, Code::Blocks IDE with the default MinGW C++ compiler, dual-core 4-thread i5 CPU.
This isn't a programming issue per se.
Your CPU has two cores with hyperthreading, i.e. four logical processors. The program is using 100% of one logical processor, which is reported as 25% usage in the Windows Task Manager.
You won't be able to 'increase your CPU usage' without threading.
(As an aside, the reason you see the load 'distributed between the four threads' is that the operating system is, if you like, changing its mind about which core it wants to run your program on. Such scheduling decisions can't be changed by, and won't have a noticeable impact on, individual programs.)
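A minimal sketch of the threaded version, assuming C++11 <thread>: split the loop's range across the logical processors and combine the partial sums (a long long accumulator also avoids the overflow the original int c would hit):

    #include <iostream>
    #include <thread>
    #include <vector>

    int main() {
        const long b = 50, hi = 100000;
        unsigned n = std::thread::hardware_concurrency(); // 4 on the asker's i5
        if (n == 0) n = 4; // hardware_concurrency may return 0 if unknown

        std::vector<long long> partial(n, 0);
        std::vector<std::thread> pool;

        for (unsigned t = 0; t < n; ++t) {
            pool.emplace_back([=, &partial] {
                long long local = 0;
                // Each thread handles the slice (t*hi/n, (t+1)*hi/n].
                for (long a = t * hi / n + 1; a <= (t + 1) * hi / n; ++a)
                    local += b * a;
                partial[t] = local;
            });
        }
        for (auto& th : pool) th.join();

        long long c = 0;
        for (long long p : partial) c += p;
        std::cout << c << '\n'; // same sum as the serial loop, on all cores
    }

With a workload this tiny, thread creation costs more than the loop itself; the approach only pays off for much larger iteration counts.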
In theory an in-memory circular buffer sounds like a good idea: the writes and the reads are never at the same address. However, the limiting factor is the hardware: the computer only lets us access one memory location at a time. So how can a circular buffer improve performance?
This link gives some reasons why circular buffers offer better performance than synchronized access to a single, shared data structure.
What hardware are you using that only allows access to one memory location at a time?
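For a concrete picture, here is a minimal sketch of a single-producer/single-consumer ring buffer using C++11 atomics; the class and member names are invented for this example. The head and tail indices live on separate cache lines, so in the common case the producer and the consumer touch different memory and never wait on each other:

    #include <atomic>
    #include <cstddef>

    template <typename T, size_t N> // N must be a power of two
    class SpscRing {
        T buf[N];
        alignas(64) std::atomic<size_t> head{0}; // advanced only by the consumer
        alignas(64) std::atomic<size_t> tail{0}; // advanced only by the producer
    public:
        bool push(const T& v) { // producer side
            size_t t = tail.load(std::memory_order_relaxed);
            if (t - head.load(std::memory_order_acquire) == N)
                return false; // full
            buf[t & (N - 1)] = v;
            tail.store(t + 1, std::memory_order_release);
            return true;
        }
        bool pop(T& v) { // consumer side
            size_t h = head.load(std::memory_order_relaxed);
            if (h == tail.load(std::memory_order_acquire))
                return false; // empty
            v = buf[h & (N - 1)];
            head.store(h + 1, std::memory_order_release);
            return true;
        }
    };

The hardware does serialize accesses to the same cache line, but a circular buffer is laid out precisely so that the two sides rarely hit the same line, and neither side ever blocks on a lock.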
How can I allocate shared memory accessible from multiple processes using only native C++ operations? Or should I use the OS API, as in the case of inter-thread synchronization objects such as mutexes and semaphores? (I mean, you cannot use a bool instead of a mutex; the OS has specific types for organizing synchronization.)
There is no notion of "shared memory", or even "process", in "only native C++". Those are necessarily platform-specific concepts.
You can try Boost's Interprocess library for some useful abstractions.
Basically, you need to use the OS API, but there are cross-platform libraries (e.g. Boost.Interprocess) which implement access to shared memory.
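A minimal sketch using Boost.Interprocess, which wraps the platform shared-memory APIs; the name "MySharedMemory" is chosen for this example:

    #include <boost/interprocess/shared_memory_object.hpp>
    #include <boost/interprocess/mapped_region.hpp>
    #include <cstring>

    namespace bip = boost::interprocess;

    int main() {
        // Create (or open) a named shared-memory object and size it.
        bip::shared_memory_object shm(bip::open_or_create, "MySharedMemory",
                                      bip::read_write);
        shm.truncate(1024);

        // Map it into this process's address space.
        bip::mapped_region region(shm, bip::read_write);

        // Any process that opens "MySharedMemory" sees the same bytes.
        std::memcpy(region.get_address(), "hello", 6);

        // bip::shared_memory_object::remove("MySharedMemory"); // cleanup
    }

Under the hood this still calls the OS API (shm_open on POSIX, CreateFileMapping on Windows); the library only gives you a portable wrapper, which is the point of the answers above.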