Synchronise shared memory in a multi-threaded environment - C++

I have implemented the Observer pattern in a C++ project.
My Subject is an XML file reader which reads tags and publishes their values.
I have some "processing objects" which are my observers. They check the tag that has currently been read; if they have subscribed to the tag they process it, otherwise they ignore it.
I have banks of memory into which the tags and their values are dumped.
My problem now is: how do I synchronise the memory operations?
When my XML reader wants to publish some tag/value, it should get an unused block of memory and "lock" it so that it is unavailable for editing. Once all the "processing objects" are done with the memory, they should be able to "unlock" it for further use.
How can I achieve this? Please help.

Have you checked out Boost shared memory? It has various synchronization mechanisms and examples.
The synchronization mechanisms outlined in the Boost.Interprocess library are specifically useful if you want to put the mutexes in the shared memory block itself.
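As a rough illustration of that idea, here is a minimal sketch (the segment and struct names are made up, not from the original post) of a tag/value block that carries its own interprocess mutex inside the shared memory block. If all the observers live in the same process, a plain std::mutex per block would serve the same purpose.

```cpp
// Minimal sketch: a tag/value block that embeds its own mutex in shared memory.
// "TagValueSegment", "TagBlock0" and the struct layout are illustrative.
#include <boost/interprocess/managed_shared_memory.hpp>
#include <boost/interprocess/sync/interprocess_mutex.hpp>
#include <boost/interprocess/sync/scoped_lock.hpp>
#include <cstring>

namespace bip = boost::interprocess;

struct TagBlock {
    bip::interprocess_mutex mutex;   // lives inside the shared block itself
    bool in_use = false;
    char tag[64] = {0};
    char value[256] = {0};
};

int main() {
    // Create (or open) a shared memory segment and construct one block in it.
    bip::managed_shared_memory segment(bip::open_or_create, "TagValueSegment", 65536);
    TagBlock* block = segment.find_or_construct<TagBlock>("TagBlock0")();

    {
        // The publisher locks the block, fills it, and marks it in use.
        bip::scoped_lock<bip::interprocess_mutex> lock(block->mutex);
        std::strncpy(block->tag, "temperature", sizeof(block->tag) - 1);
        std::strncpy(block->value, "42.5", sizeof(block->value) - 1);
        block->in_use = true;
    }
    // Observers take the same lock before reading and clear in_use when done.
}
```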

I assume your main task is not to learn or develop the synchronization mechanism itself.
You should reuse existing components where you can, and there are many. RabbitMQ is a very good option: http://www.rabbitmq.com/getstarted.html
It supports multiple models (including distributed/networked ones). There may be an initial learning period, but once it is integrated you can keep using it for your feature extensions and focus on the meat of the problem instead.

Related

asio, shared data, Active Object vs mutexes

I want to understand: what is the true Asio way to use shared data?
Reading the Asio and Beast examples, the only example of using shared data is http_crawl.cpp (perhaps I missed something).
In that example the shared object is only used to collect statistics for sessions; that is, the sessions do not read that object's data.
As a result I have three questions:
Is it implied that interaction with shared data in Asio style is an Active Object? I.e. should mutexes be avoided?
Is it also necessary to use "requests" to the Active Object for reading the shared data, again with no mutexes?
Has anyone tried to evaluate the overhead of "requests" to an Active Object compared to using mutexes?
Is it implied that interaction with shared data in Asio style is an Active Object? I.e. should mutexes be avoided?
Starting at the end: yes, mutexes should be avoided. This is because all service handlers (initiations and completions) will be executed on the service thread(s), which means that blocking in a handler blocks all other handlers.
Whether that leads to an Active Object seems to be a design choice to me. Yes, a typical approach would be something like an Active Object (see e.g. boost::asio and Active Object), where operations queue for the data.
However, other approaches are viable and frequently seen, e.g. the data moving along with its task(s) through a task flow.
Is it also necessary to use "requests" to the Active Object for reading the shared data, again with no mutexes?
Yes, synchronization needs to happen for shared state regardless of the design pattern chosen (although some design patterns reduce sharing altogether).
The Asio approach is using strands, which abstract away the scheduling from the control flow. This gives the service the option to optimize for various cases (e.g. continuation on the same strand, the case where there's only one service thread anyway etc.).
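For illustration, here is a minimal sketch of that approach (the counter and the handler bodies are illustrative): the shared state is only ever touched from handlers posted to one strand, so no mutex is needed even when the io_context runs on several threads.

```cpp
// Minimal sketch: serialising access to shared state via a strand instead of a mutex.
#include <boost/asio.hpp>
#include <iostream>
#include <thread>
#include <vector>

int main() {
    boost::asio::io_context ioc;
    auto strand = boost::asio::make_strand(ioc);

    std::size_t counter = 0;  // shared state, only ever modified from the strand

    for (int i = 0; i < 1000; ++i) {
        // Handlers posted to the same strand never run concurrently.
        boost::asio::post(strand, [&counter] { ++counter; });
    }

    // Run the io_context on several threads; the strand still serialises the handlers.
    std::vector<std::thread> pool;
    for (int i = 0; i < 4; ++i)
        pool.emplace_back([&ioc] { ioc.run(); });
    for (auto& t : pool)
        t.join();

    std::cout << "counter = " << counter << '\n';  // prints 1000
}
```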
Has anyone tried to evaluate the overhead of "requests" to an Active Object compared to using mutexes?
Lots of people, lots of times. People are often wary of trying Asio because "it uses locking internally". If you know what you're doing, throughput can be excellent, which goes for most patterns and industrial-strength frameworks.
Specific benchmarks depend heavily on specific implementation choices. I'm pretty sure you can find examples on github, blogs and perhaps even on this site.
(perhaps I missed something)
You're missing the fact that IO objects are not thread-safe, which means that they themselves are shared data for any composed asynchronous operation (chain).

Confusion regarding multiprocessing.pool memory usage in Python

I've been reading up on Python's "multiprocessing", specifically the "Pool" stuff. I'm familiar with threading but not the approach used here. If I were to pass a very large collection (say a dictionary of some sort) to the process pool ("pool.map(myMethod, humungousDictionary)"), are copies of the dictionary made in memory and then handed off to each process, or does there exist only the one dictionary? I'm concerned about memory usage. Thank you in advance.
The short answer is: no, there is not just one dictionary. Processes work in their own independent memory spaces, effectively duplicating your data.
If your dictionary is read only, and modifications will not be made, here are some options you could consider:
Save your data into a database. Each worker will read the data and work independently.
Have a single parent process that spawns multiple workers using os.fork. The children inherit the parent's memory (copy-on-write), so read-only data is not duplicated up front.
Use shared memory. Unix systems offer shared memory for interprocess communication. If there is a chance of racing, you will need semaphores as well.
You may also consider referring here for deeper insight on a possible solution.

JSON for Modern C++ thread safe?

I am using a library named "JSON for Modern C++" (https://github.com/nlohmann/json), which is pretty slick: it lets a C++ program use JSON configuration files that are shared with a JavaScript server-side application. The library essentially creates another data type that is accessed and manipulated in a way very close to how a JavaScript JSON object is used.
My question is: do I need to be concerned about thread safety for JSON variable accesses and manipulations, or can I trust that the library is thread-safe? I've looked in the documentation, and I don't see it say that it is thread-safe, but I also don't see anywhere that says it isn't.
Is anyone else using this library in a multithreaded environment? Did you need to protect it yourself, or did the library protect itself? Maybe I'm really lucky and the repository author nlohmann will answer directly!
Any help is greatly appreciated!
The nlohmann library is NOT thread-safe. Take a look at the header file; it's a single one. There are no mutexes, locks, atomics, or anything else related to threads.
https://github.com/nlohmann/json/blob/develop/src/json.hpp
You are responsible for protecting this data against concurrent access from multiple threads.
Per the author in About thread safety #2366:
No, the container is like a map or a vector: you have to ensure thread safety yourself.
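One common way to "ensure thread safety yourself" is to hide the shared json object behind a small wrapper that takes a std::mutex on every access. The sketch below assumes that approach; the class and member names are illustrative, not part of the library.

```cpp
// Minimal sketch: guarding a shared nlohmann::json with an external mutex,
// since the library itself provides no internal locking.
#include <mutex>
#include <string>
#include <nlohmann/json.hpp>

class SharedConfig {
public:
    void set(const std::string& key, const std::string& value) {
        std::lock_guard<std::mutex> lock(mutex_);
        config_[key] = value;
    }

    std::string get(const std::string& key) const {
        std::lock_guard<std::mutex> lock(mutex_);
        return config_.value(key, std::string{});  // empty string if the key is missing
    }

private:
    mutable std::mutex mutex_;
    nlohmann::json config_ = nlohmann::json::object();
};
```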

C++ Shared-memory necessary for queue of std::strings to pass through JNI?

I'm trying to understand what the mechanism is for getting a string from a C++ daemon I've written to Java for use by a UI. I'll post a picture of what I envision, then continue the question afterward:
There are two issues that I envision here:
1) The semaphore needs to be available to the library. On Windows, that could have been done with a named semaphore and access to its handle. On Linux, I've been pointed toward using a semaphore in shared memory and making processes aware of it through a key to the shared memory. It's vague to me, but will that concept work to synchronize Java and the daemon?
2) Do I have to place the queue in shared memory in order to make the ??? link in the above chart work? Can and should the queue reside in the .so?
So those are my concerns. I'd love and welcome any and all help, challenges, and pleas for sanity and will do my best to provide all additionally necessary information. Thanks in advance.
You're running the two applications as separate processes; in vanilla Linux this means you cannot communicate between them via memory directly. The Java VM is a process and the C++ daemon is a process. They live in separate address spaces, which are, incidentally, mapped independently by the memory management unit (MMU), so there is no direct way to reach into each other's memory.
Google "inter-process communication" if you'd like. I prefer to use socketpair for bi-directional parent-child communication.

Share a queue between parent and child processes in C++

I know there are many ways to handle communication between two processes, but I'm still a bit confused about how to deal with it. Is it possible to share a queue (from the standard library) between two processes in an efficient way?
Thanks
I believe your confusion comes from not understanding the relationship between the memory address spaces of the parent and child process. The two address spaces are effectively unrelated. Yes, immediately after the fork() the two processes contain almost identical copies of memory, but you should think of them as copies: any change one process makes to memory in its address space has no impact on the other process's memory.
Any "plain old data structures" (such as provided by the C++ standard library) are purely abstractions of memory, so there is no way to use them to communicate between the two processes. To send data from one process to the other, you must use one of several system calls that provide interprocess communication.
But note that shared memory is an exception to this. You can use system calls to set up a section of shared memory and then create data structures in it. You'll still need to protect these data structures with a mutex, but the mutex has to be shared-memory aware: with POSIX threads, you'd initialize a mutex attribute with pthread_mutexattr_init and mark it PTHREAD_PROCESS_SHARED via pthread_mutexattr_setpshared.
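A minimal sketch of that approach, assuming anonymous shared memory mapped before fork() (the struct name and loop count are illustrative):

```cpp
// Minimal sketch: a process-shared mutex placed in memory that both
// parent and child see after fork(). Error handling trimmed.
#include <pthread.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cstdio>

struct Shared {
    pthread_mutex_t mutex;
    int counter;
};

int main() {
    // MAP_SHARED | MAP_ANONYMOUS pages remain truly shared across fork().
    auto* shared = static_cast<Shared*>(mmap(nullptr, sizeof(Shared),
                                             PROT_READ | PROT_WRITE,
                                             MAP_SHARED | MAP_ANONYMOUS, -1, 0));

    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    pthread_mutex_init(&shared->mutex, &attr);
    shared->counter = 0;

    pid_t pid = fork();
    for (int i = 0; i < 100000; ++i) {
        pthread_mutex_lock(&shared->mutex);
        ++shared->counter;               // safe: both processes take the same mutex
        pthread_mutex_unlock(&shared->mutex);
    }

    if (pid != 0) {                      // parent waits, then reports
        waitpid(pid, nullptr, 0);
        printf("counter = %d\n", shared->counter);  // 200000
        pthread_mutex_destroy(&shared->mutex);
        munmap(shared, sizeof(Shared));
    }
}
```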
Simple answer: sharing a std::queue between two processes can be done, but it is not trivial.
You can use shared memory to hold the queue together with some synchronization mechanism (usually a mutex). Note that not only must the std::queue object itself be constructed in the shared memory region, but also the contents of the queue, so you will have to provide your own allocator that allocates from the shared region.
If you can, try to look at higher level libraries that might provide already packed solutions to your process communication needs. Consider Boost.Interprocess or search in your favorite search engine for interprocess communication.
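As a hedged sketch of what the Boost.Interprocess route could look like (the segment, container, and object names are illustrative; a real version would also want a condition variable and cleanup of the segment):

```cpp
// Minimal sketch: a queue of ints living entirely in shared memory.
// A queue of std::string would additionally need boost::interprocess::basic_string
// built with the same shared-memory allocator.
#include <boost/interprocess/managed_shared_memory.hpp>
#include <boost/interprocess/containers/deque.hpp>
#include <boost/interprocess/allocators/allocator.hpp>
#include <boost/interprocess/sync/interprocess_mutex.hpp>
#include <boost/interprocess/sync/scoped_lock.hpp>

namespace bip = boost::interprocess;

using ShmAllocator = bip::allocator<int, bip::managed_shared_memory::segment_manager>;
using ShmDeque     = bip::deque<int, ShmAllocator>;

int main() {
    bip::managed_shared_memory segment(bip::open_or_create, "QueueSegment", 65536);

    // Both the container and its elements are allocated inside the segment.
    ShmAllocator alloc(segment.get_segment_manager());
    ShmDeque* queue = segment.find_or_construct<ShmDeque>("TheQueue")(alloc);
    auto* mutex = segment.find_or_construct<bip::interprocess_mutex>("TheMutex")();

    {
        bip::scoped_lock<bip::interprocess_mutex> lock(*mutex);
        queue->push_back(42);  // producer side; the consumer pops under the same lock
    }
}
```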
I don't think there are any simple ways to share structures/objects like that between two processes. If you want to implement a queue/list/array/etc. between two processes, you will need to implement some kind of communication between the processes to manage the queues and to retrieve and store entries.
For example, you could implement the queue management in one process and implement some kind of IPC (shared memory, sockets, pipes, etc.) to hand off entries from one process to the other.
There may be other methods outside of the standard C++ libraries that will do this for you. For example, there are likely Boost libraries that already implement this.