Transferring data between threads in C++ and Fortran


I need to move large amounts of data (~10^6 floats) between multiple C++ threads and a Fortran thread. At the moment we use Windows shared memory to move very small pieces of data, mainly for communication, and save a file in a proprietary format to move the bulk data. I've been asked to look at moving the bulk of the data via shared memory, but the shared memory techniques in Windows (seemingly a character buffer) look like a mess. Another possibility is Boost's interprocess communication, but I'm not sure how to use that from Fortran, or whether it's a good idea. Another idea was to use a database like SQLite.
I'm just wondering if anyone has any experience or would like to comment, as this is a little over my head at the moment.
Thanks very much
Jim

Use pipes. If you can inherit handles between processes, you can use anonymous pipes; otherwise, you have to use named pipes. Also, threads share the address space, so you're probably thinking of processes when you say threads.

Related

best way to share data between c codes

I have three C programs running on a Raspberry Pi. They all start at boot and do some stuff (e.g. reading some data and driving an LCD).
I have implemented the programs separately, but now I need to share a 30-byte buffer between them.
What is your advice for doing this?
program1.c<-----------> program2.c<-----------> program3.c
buff[30] <-----------> buff[30] <-----------> buff[30]

You can use shared memory IPC, which simply allows you to access the same physical memory from multiple cooperating processes. I say cooperating because they need to be careful about synchronization, or reads may observe partially written data etc. Here's one tutorial: http://www.raspberry-projects.com/pi/programming-in-c/memory/shared-memory
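As a minimal sketch of that idea, assuming Linux and the System V shared memory calls (`shmget`, `shmat`, `shmdt`, `shmctl`): each program attaches the same 30-byte segment by agreeing on a key. The helper names, the key value and the 0600 permissions are hypothetical choices for illustration, and no synchronization is shown, so the partial-write caveat above still applies.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <sys/ipc.h>
#include <sys/shm.h>

constexpr std::size_t kBufSize = 30;  // size of the shared buffer from the question

// Attach the segment identified by `key`, creating it if it does not exist yet.
// Returns nullptr on failure. A newly created segment is zero-filled.
char* attach_buffer(key_t key) {
    int id = shmget(key, kBufSize, IPC_CREAT | 0600);
    if (id == -1) return nullptr;
    void* p = shmat(id, nullptr, 0);
    if (p == reinterpret_cast<void*>(-1)) return nullptr;
    return static_cast<char*>(p);
}

// Detach one mapping from the calling process.
bool detach_buffer(char* p) {
    return shmdt(p) == 0;
}

// Mark the segment for removal; the kernel destroys it once the last
// process has detached.
bool remove_buffer(key_t key) {
    int id = shmget(key, 0, 0);
    return id != -1 && shmctl(id, IPC_RMID, nullptr) == 0;
}
```

Each of the three programs would call `attach_buffer` with the same key and then read or write through the returned pointer; a semaphore or similar is still needed so that readers do not see half-written data.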

You basically want to implement IPC between those processes. Personally, I would go with a FIFO (a named pipe) because it implements the queue structure for you. So you can focus more on interpreting the data and less on sync problems.
this might help you.
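A rough sketch of the FIFO approach, assuming Linux/POSIX and a fixed 30-byte record. The function names and the FIFO path are made up for this example, and `fifo_round_trip` only forks a child so that both ends can be shown in one listing; in the real setup the writer and reader would be separate programs.

```cpp
#include <algorithm>
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstring>
#include <string>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

constexpr std::size_t kRecordSize = 30;  // fixed record size, as in the question

// Writer side: open the FIFO and push one fixed-size record.
bool send_record(const char* path, const char* rec) {
    int fd = open(path, O_WRONLY);            // blocks until a reader opens the FIFO
    if (fd == -1) return false;
    ssize_t w = write(fd, rec, kRecordSize);  // writes of <= PIPE_BUF bytes are atomic
    close(fd);
    return w == static_cast<ssize_t>(kRecordSize);
}

// Reader side: open the FIFO and pull one fixed-size record.
bool receive_record(const char* path, char* rec) {
    int fd = open(path, O_RDONLY);            // blocks until a writer opens the FIFO
    if (fd == -1) return false;
    std::size_t got = 0;
    while (got < kRecordSize) {
        ssize_t r = read(fd, rec + got, kRecordSize - got);
        if (r <= 0) break;                    // error or writer closed early
        got += static_cast<std::size_t>(r);
    }
    close(fd);
    return got == kRecordSize;
}

// Demo only: a forked child plays the writer, the caller plays the reader.
std::string fifo_round_trip(const char* path, const std::string& text) {
    if (mkfifo(path, 0600) == -1 && errno != EEXIST) return "";
    pid_t pid = fork();
    if (pid == -1) { unlink(path); return ""; }
    if (pid == 0) {
        char rec[kRecordSize] = {};           // zero-padded record
        std::memcpy(rec, text.data(), std::min(text.size(), kRecordSize - 1));
        _exit(send_record(path, rec) ? 0 : 1);
    }
    char rec[kRecordSize] = {};
    bool ok = receive_record(path, rec);
    waitpid(pid, nullptr, 0);
    unlink(path);
    rec[kRecordSize - 1] = '\0';
    return ok ? std::string(rec) : "";
}
```

The blocking `open` and `read` calls are what give you the queue-like behaviour for free: the reader simply sleeps until a record arrives.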

Fastest technique to pass messages between processes on Linux?

What is the fastest technology to send messages between C++ application processes, on Linux? I am vaguely aware that the following techniques are on the table:
TCP
UDP
Sockets
Pipes
Named pipes
Memory-mapped files
are there any more ways and what is the fastest?

Whilst all the above answers are very good, I think we'd have to discuss what is "fastest" [and does it have to be "fastest" or just "fast enough"?]
For LARGE messages, there is no doubt that shared memory is a very good technique, and very useful in many ways.
However, if the messages are small, there are drawbacks of having to come up with your own message-passing protocol and method of informing the other process that there is a message.
Pipes and named pipes are much easier to use in this case - they behave pretty much like a file, you just write data at the sending side, and read the data at the receiving side. If the sender writes something, the receiver side automatically wakes up. If the pipe is full, the sending side gets blocked. If there is no more data from the sender, the receiving side is automatically blocked. Which means that this can be implemented in fairly few lines of code with a pretty good guarantee that it will work at all times, every time.
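To make the "fairly few lines of code" claim concrete, here is a sketch assuming Linux/POSIX `pipe` and `fork`. The helper names (`write_all`, `read_all`, `transfer_floats`) are invented for the example. The child produces an array of floats and the parent reads it back; the loops are there because a single `read` or `write` on a pipe may move fewer bytes than requested.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Write exactly n bytes, retrying on short writes. Returns false on error.
bool write_all(int fd, const void* buf, std::size_t n) {
    const char* p = static_cast<const char*>(buf);
    while (n > 0) {
        ssize_t w = write(fd, p, n);   // blocks while the pipe is full
        if (w <= 0) return false;
        p += w;
        n -= static_cast<std::size_t>(w);
    }
    return true;
}

// Read exactly n bytes, retrying on short reads. Returns false on error or early EOF.
bool read_all(int fd, void* buf, std::size_t n) {
    char* p = static_cast<char*>(buf);
    while (n > 0) {
        ssize_t r = read(fd, p, n);    // blocks while the pipe is empty
        if (r <= 0) return false;
        p += r;
        n -= static_cast<std::size_t>(r);
    }
    return true;
}

// Fork a child that sends `count` floats (element i holds the value i) through
// a pipe; the parent collects them. Returns an empty vector on failure.
std::vector<float> transfer_floats(std::size_t count) {
    int fds[2];
    if (pipe(fds) == -1) return {};
    pid_t pid = fork();
    if (pid == -1) { close(fds[0]); close(fds[1]); return {}; }
    if (pid == 0) {                    // child: producer
        close(fds[0]);
        std::vector<float> data(count);
        for (std::size_t i = 0; i < count; ++i) data[i] = static_cast<float>(i);
        bool ok = write_all(fds[1], data.data(), count * sizeof(float));
        close(fds[1]);
        _exit(ok ? 0 : 1);
    }
    close(fds[1]);                     // parent: consumer
    std::vector<float> out(count);
    bool ok = read_all(fds[0], out.data(), count * sizeof(float));
    close(fds[0]);
    waitpid(pid, nullptr, 0);
    if (!ok) out.clear();
    return out;
}
```

Calling `transfer_floats(1000000)` pushes about 4 MB through the pipe, so the sender is blocked and woken many times along the way without any explicit signalling code.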
Shared memory on the other hand relies on some other mechanism to inform the other thread that "you have a packet of data to process". Yes, it's very fast if you have LARGE packets of data to copy - but I would be surprised if there is a huge difference to a pipe, really. Main benefit would be that the other side doesn't have to copy the data out of the shared memory - but it also relies on there being enough memory to hold all "in flight" messages, or the sender having the ability to hold back things.
I'm not saying "don't use shared memory", I'm just saying that there is no such thing as "one solution that solves all problems 'best'".
To clarify: I would start by implementing a simple method using a pipe or named pipe [depending on which suits the purposes], and measure the performance of that. If a significant time is spent actually copying the data, then I would consider using other methods.
Of course, another consideration should be "are we ever going to use two separate machines [or two virtual machines on the same system] to solve this problem?" In that case, a network solution is a better choice, even if it's not THE fastest. I've run a local TCP stack on my machines at work for benchmark purposes and got some 20-30 Gbit/s (2-3 GB/s) with sustained traffic. A raw memcpy within the same process gets around 50-100 Gbit/s (5-10 GB/s) (unless the block size is REALLY tiny and fits in the L1 cache). I haven't measured a standard pipe, but I expect it's somewhere roughly in the middle of those two numbers. [These numbers are about right for a number of different medium-sized, fairly modern PCs. Obviously, on an ARM, MIPS or other embedded-style controller, expect a lower number for all of these methods.]

I would suggest looking at this also: How to use shared memory with Linux in C.
Basically, I'd drop network protocols such as TCP and UDP when doing IPC on a single machine. These have packeting overhead and are bound to even more resources (e.g. ports, loopback interface).

NetOS Systems Research Group from Cambridge University, UK has done some (open-source) IPC benchmarks.
Source code is located at https://github.com/avsm/ipc-bench .
Project page: http://www.cl.cam.ac.uk/research/srg/netos/projects/ipc-bench/ .
Results: http://www.cl.cam.ac.uk/research/srg/netos/projects/ipc-bench/results.html
This research has been published using the results above: http://anil.recoil.org/papers/drafts/2012-usenix-ipc-draft1.pdf

Check CMA and kdbus:
https://lwn.net/Articles/466304/
I think the fastest stuff these days is based on AIO.
http://www.kegel.com/c10k.html

As you tagged this question with C++, I'd recommend Boost.Interprocess:
Shared memory is the fastest interprocess communication mechanism. The
operating system maps a memory segment in the address space of several
processes, so that several processes can read and write in that memory
segment without calling operating system functions. However, we need
some kind of synchronization between processes that read and write
shared memory.
Source
One caveat I've found is the portability limitations for synchronization primitives. Neither OS X nor Windows has a native implementation of interprocess condition variables, for example, so Boost.Interprocess emulates them with spin locks.
If you use a *nix that supports POSIX process-shared primitives, there will be no problems.
Shared memory with synchronization is a good approach when considerable data is involved.

Well, you could simply have a shared memory segment between your processes, using the linux shared memory aka SHM.
It's quite easy to use, look at the link for some examples.

POSIX message queues are pretty fast, but they have some limitations.

Interaction of two c/c++ programs

I'm in complete lack of understanding in this. Maybe this is too broad for stack, but here it goes:
Suppose I have two programs (written in C/C++) running simultaneously, say A and B, with different PIDs.
What are the options to make them interact with each other? For instance, how do I pass information from one to the other, such as having one wait for a signal from the other and respond accordingly?
I know MPI, but MPI normally works for programs that are compiled using the same source (so, it works more for parallel computing than just interaction from completely different programs built to interact with each other).
Thanks

You should look into "IPC" (inter-process communication). There are several types:
pipes
signals
shared memory
message queues
semaphores
files (per suggestion of @JonathanLeffler :-)
RPC (suggested by @sftrabbit), which is usually more geared towards client/server
CORBA
D-Bus

You use one of the many interprocess communication mechanisms, like pipes (one application writes bytes into a pipe, the other reads from it; imagine stdin/stdout) or shared memory (a region of memory is mapped into both programs' virtual address spaces and they can communicate through it).

The same source doesn't matter - once your programs are compiled the system doesn't know or care where they came from.
There are different ways to communicate between them depending on how much data, how fast, one-way or bidirectional, predictable rate, etc.
The simplest is possibly just to use the network. Note that if both programs are on the same machine, the traffic goes over the loopback interface and is copied through kernel memory without touching any network hardware.

Sharing data locally (like with sockets) between multiple programs in C++

My goal is to send/share data between multiple programs. These are the options I thought of:
I could use a file, but prefer to use my RAM because it's generally faster.
I could use a socket, but that would require a lot of address information which is unnecessary for local stuff. And ports too.
I could ask others about an efficient way to do this.
I chose the last one.
So, what would be an efficient way to send data from one program to another? It might use a buffer, for example, and write bytes to it and wait for the receiver to mark the first byte as 'read' (basically anything other than the byte written), then write again, but where would I put the buffer and how would I make it accessible to both programs? Or perhaps something else might work too?
I use Linux.

What about FIFOs and pipes? If you are in a Linux environment, this is a way to allow two programs to share data.

The fastest IPC for processes running on the same host is shared memory.
In short, several processes can access the same memory segment.
See this tutorial.

You may want to take a look at Boost.Interprocess
Boost.Interprocess simplifies the use of common interprocess communication and synchronization mechanisms and offers a wide range of them:
Shared memory.
Memory-mapped files.
Semaphores, mutexes, condition variables and upgradable mutex types to place them in shared memory and memory-mapped files.
Named versions of those synchronization objects, similar to UNIX/Windows sem_open/CreateSemaphore API.
File locking.
Relative pointers.
Message queues.

To answer your questions:
Using a file is probably not the best way, and files are usually not used for passing inter-process information. Remember the OS has to open, read, write and close them. They are, however, used for locking (http://en.wikipedia.org/wiki/File_locking).
You get the highest performance using pipe streams (http://linux.die.net/man/3/popen), but on Linux it's hard to get right. You have to redirect stdin, stdout and stderr, and this has to be done for each process. So it will work well for two applications, but go beyond that and it gets very hairy.
My favorite solution is to use socketpairs (http://pubs.opengroup.org/onlinepubs/009604499/functions/socketpair.html). These are very robust and easy to set up. But if you use multiple applications, you have to prepare some sort of pool through which to access the applications.
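For illustration, a small sketch of the socketpair idea, assuming Linux/POSIX. The function names are hypothetical, and the forked child stands in for a second application: it receives a request over its end of the pair, uppercases it, and sends the result back over the same bidirectional channel.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Write the whole string, retrying on short writes.
bool send_all(int fd, const std::string& s) {
    std::size_t off = 0;
    while (off < s.size()) {
        ssize_t w = write(fd, s.data() + off, s.size() - off);
        if (w <= 0) return false;
        off += static_cast<std::size_t>(w);
    }
    return true;
}

// Read until the peer shuts down its sending side (EOF).
std::string recv_until_eof(int fd) {
    std::string out;
    char buf[4096];
    ssize_t r;
    while ((r = read(fd, buf, sizeof buf)) > 0)
        out.append(buf, static_cast<std::size_t>(r));
    return out;
}

// Send `request` to a forked child over a socketpair and return its reply.
std::string uppercase_via_child(const std::string& request) {
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1) return "";
    pid_t pid = fork();
    if (pid == -1) { close(sv[0]); close(sv[1]); return ""; }
    if (pid == 0) {                    // child: uses sv[1] for both directions
        close(sv[0]);
        std::string msg = recv_until_eof(sv[1]);
        for (char& c : msg)
            c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
        bool ok = send_all(sv[1], msg);
        close(sv[1]);
        _exit(ok ? 0 : 1);
    }
    close(sv[1]);                      // parent: uses sv[0] for both directions
    send_all(sv[0], request);
    shutdown(sv[0], SHUT_WR);          // mark end of request, keep the read side open
    std::string reply = recv_until_eof(sv[0]);
    close(sv[0]);
    waitpid(pid, nullptr, 0);
    return reply;
}
```

Unlike a pipe, one socketpair carries traffic in both directions, so no stdin/stdout redirection is needed; the cost is that each peer needs its own pair.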

On Linux, files are very often in the cache, so you won't read the disk that often, and you could use a "RAM" filesystem like tmpfs (actually tmpfs uses virtual memory, so RAM + swap, and in practice the files are kept in RAM most of the time).
The main issue remains synchronization.
Using sockets (which may be, if all processes are on the same machine, AF_UNIX sockets, which are faster than TCP/IP ones) has the advantage of making your code easily portable to environments where you prefer to run several processes on several machines.
And you could also use an existing framework for parallel execution, e.g. MPI, CORBA, etc.
You should have a rough idea of the bandwidth and latency expected from your application.
(It is not the same if you need to share dozens of megabytes every millisecond, or hundreds of kilobytes every tenth of a second.)
I would suggest learning more about serialization techniques, formats and libraries like XDR, ASN.1, JSON, YAML, s11n, jsoncpp, etc.
And sending data is not the same as sharing it. When you send (and receive) data, you think in terms of message passing. When you share data, you think in terms of shared memory. The programming style is very different.

Shared memory is the best for sharing data between processes. But it needs lots of synchronization, and if more than two processes are sharing the data, then synchronization is like a Cyclops (single eye, single shared memory).
If you make use of sockets (multicast sockets), the implementation will be a little more difficult, but scalability and maintainability become very easy. You need not care how many apps are waiting for the data; you just multicast, and they listen for the data and process it. There is no need to wait for a semaphore (the shared memory synchronization technique) to read the data.
So the time spent reading the data can be reduced.
Shared memory - Wait for the semaphore, read the data and process the data.
Sockets - Receive the data, process the data.
Performance, scalability and maintainability will be added advantages with the sockets.
Regards,
SSuman185

How to implement a shared buffer?

I've got one program which creates 3 worker programs. The preferable method of communication in my situation would be through a memory buffer which all four programs may access.
Is there a way to pass a pointer, reference or any kind of handle to the child processes?
Update
The three child programs are transforming vertex data while the main program primarily deals with UI, system messages, errors, etc.
I'm hoping there is some way to leverage OpenCL such that the four programs can share a context. If this is not possible, it would be nice to have access to the array of vertices across all programs.
I suppose our target platform is Windows right now but we'd like to keep it as cross-platform as possible. If there is no way to implement this utilizing OpenCL we'll probably fall back to wrapping this piece of code for a handful of different platforms.

Your question is platform dependent, therefore:
for Windows: Named Shared Memory
for Linux: mmap or POSIX shared memory access
general case: boost::interprocess
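A sketch of the Linux `mmap` option only (not the Windows or Boost ones), assuming the workers are forked from the main program so they inherit the mapping. The function names and the "scale every value" transform are placeholders for the real vertex processing.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Placeholder transform: scale the elements in [begin, end).
void scale_slice(float* data, std::size_t begin, std::size_t end, float factor) {
    for (std::size_t i = begin; i < end; ++i) data[i] *= factor;
}

// Copy `in` into an anonymous shared mapping, let up to `workers` child
// processes each transform a disjoint slice in place, and return the result.
std::vector<float> scale_in_workers(const std::vector<float>& in, float factor,
                                    std::size_t workers) {
    if (in.empty() || workers == 0) return {};
    const std::size_t bytes = in.size() * sizeof(float);
    void* mem = mmap(nullptr, bytes, PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return {};
    float* shared = static_cast<float*>(mem);
    std::memcpy(shared, in.data(), bytes);

    std::vector<pid_t> pids;
    const std::size_t chunk = (in.size() + workers - 1) / workers;
    for (std::size_t w = 0; w < workers; ++w) {
        const std::size_t begin = w * chunk;
        if (begin >= in.size()) break;
        const std::size_t end = std::min(in.size(), begin + chunk);
        pid_t pid = fork();
        if (pid == 0) {                       // worker: writes go to the shared pages
            scale_slice(shared, begin, end, factor);
            _exit(0);
        }
        if (pid > 0) pids.push_back(pid);
        else scale_slice(shared, begin, end, factor);  // fork failed: do it here
    }
    for (pid_t pid : pids) waitpid(pid, nullptr, 0);   // wait before reading results

    std::vector<float> out(shared, shared + in.size());
    munmap(mem, bytes);
    return out;
}
```

Because every worker touches its own slice and the parent only reads after `waitpid`, this particular layout gets away without locks; workers that overlap or run continuously would need inter-process synchronization.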

If you explain a bit what kind of data is shared and other constraints/goal of the system it would be easier to answer your question.
I wonder why you think a shared buffer would be good? Is that because you want to pass a pointer in the buffer to the data to be worked on? Then you need shared memory if you want to work across processes.
What about a client-server approach where you send data to clients on request?
More information about your problem would help in giving a better answer.

You should use Named Shared Memory and inter-process synchronization.

This is somewhat wider than the original question on shared memory buffers, but depending on your design, volume of data and performance requirements, you could look into in-memory databases such as Redis or distributed caches, especially if you find yourself in a 'publish-subscribe' situation.
