Unix socket vs shared memory message: which is faster? (C++)

I am looking at a Linux server program which, for each client, creates some shared memory and uses message queues (a C++ class called from the code) in that shared memory to send messages to and fro. On the face of it this sounds like the same usage pattern as domain sockets, i.e. a server program that sends and receives payloads from its clients.
My question is - what extra work do unix domain sockets do? What could conceivably cause shared memory with a message queue to be faster than a socket and vice versa?
My guess is there is some overhead to calling send and recv, but I'm not exactly sure what. I might try to benchmark this; I'm just looking for some insight before I do.
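As a starting point, here is a minimal sketch of the kind of benchmark the question mentions: a ping-pong over a socketpair(2) of Unix domain sockets, timing the average round trip. The 64-byte message size and iteration count are arbitrary, and error handling and partial reads are ignored to keep the sketch short. Running the same loop against the shared-memory message queue would give a directly comparable per-message figure.

    // Rough sketch of a ping-pong benchmark over a Unix domain socket pair.
    // Message size and iteration count are arbitrary; error handling and
    // partial reads are ignored to keep the sketch short.
    #include <sys/socket.h>
    #include <sys/wait.h>
    #include <unistd.h>
    #include <chrono>
    #include <cstdio>
    #include <cstring>

    int main() {
        int fds[2];
        if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0) return 1;

        const int iterations = 100000;
        char msg[64];
        std::memset(msg, 'x', sizeof(msg));

        if (fork() == 0) {                              // child: echo everything back
            char buf[sizeof(msg)];
            for (int i = 0; i < iterations; ++i) {
                read(fds[1], buf, sizeof(buf));
                write(fds[1], buf, sizeof(buf));
            }
            _exit(0);
        }

        auto start = std::chrono::steady_clock::now();
        char buf[sizeof(msg)];
        for (int i = 0; i < iterations; ++i) {          // parent: send, wait for echo
            write(fds[0], msg, sizeof(msg));
            read(fds[0], buf, sizeof(buf));
        }
        double us = std::chrono::duration<double, std::micro>(
            std::chrono::steady_clock::now() - start).count();
        wait(nullptr);
        std::printf("average round trip: %.2f microseconds\n", us / iterations);
        return 0;
    }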

Here is one discussion:
UNIX Domain sockets vs Shared Memory (Mapped File)
I can add that sockets are very primitive, just a stream of bytes for stream sockets. This may actually be an advantage - it tends to make messages between different subsystems small and simple, promoting lean interfaces and loose coupling. But sometimes, shared memory is really useful. I used shared memory in a C++ Linux back-end to a data-intensive Google Maps application - the database consisted of huge (over 1 GB) PNG rasters held in shared memory.
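For reference, a minimal sketch of how a large read-only blob can be published through POSIX shared memory so that reader processes map it in place instead of copying it. The name "/map_tiles" and the sizes are invented for illustration, and error handling is abbreviated.

    // Minimal sketch of publishing a large blob in POSIX shared memory so that
    // reader processes can map it in place instead of copying it. The name
    // "/map_tiles" and the sizes are invented; error handling is abbreviated
    // (and older glibc needs -lrt when linking).
    #include <fcntl.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>
    #include <cstring>

    int main() {
        const char* name = "/map_tiles";
        const size_t size = 1024u * 1024 * 1024;        // roughly 1 GB of raster data

        // Writer: create the region, size it, map it, fill it.
        int fd = shm_open(name, O_CREAT | O_RDWR, 0600);
        ftruncate(fd, size);
        void* base = mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        std::memset(base, 0, 4096);                     // e.g. write a header page

        // A reader process would do:
        //   int rfd  = shm_open("/map_tiles", O_RDONLY, 0);
        //   void* p  = mmap(nullptr, size, PROT_READ, MAP_SHARED, rfd, 0);
        // and then use the rasters in place, with no copying.

        munmap(base, size);
        close(fd);
        shm_unlink(name);                               // remove when no longer needed
        return 0;
    }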

Related

Can libraries replace local socketing in C/C++?

I'm trying to develop a specific DB server in C++, and I have two questions:
Is it possible to have a dynamic library take care of the communication between client programs instead of using sockets? This way, serializing is avoided and all querying can be made using native C/C++ library calls, while the server listens to the library for incoming requests
Does any known database work like that, and if yes what are the pros and cons of such an approach?
As far as I can see, having native calls to the DB server through the library removes the overhead of serializing and socket system calls (even though it adds calls to a dynamic library). Also, I'm not sure how memory can be shared with libraries, but if it can, it could be very beneficial for a client to "almost" share memory with the server.
(I am focusing on Linux and POSIX, but the principles would be the same on other OSes like Windows, Android, MacOSX)
The communication between a database client and the database server is very likely to happen on socket(7)s or some similar byte stream, like pipe(7)s or fifo(7)s. Using shared memory (shm_overview(7)...) for that communication is unusual, and you still need some synchronization mechanism (e.g. semaphores sem_overview(7)...).
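If you did go the shared-memory route, the synchronization could look roughly like this sketch, which pairs a POSIX shared memory segment with a named semaphore used as a cross-process mutex. The names "/query_buf" and "/query_lock" are invented for the example and error handling is omitted.

    // Sketch of pairing a shared-memory segment with a named POSIX semaphore,
    // since shared memory alone provides no synchronization. The names
    // "/query_buf" and "/query_lock" are invented for this example.
    #include <fcntl.h>
    #include <semaphore.h>
    #include <sys/mman.h>
    #include <unistd.h>
    #include <cstring>

    int main() {
        int fd = shm_open("/query_buf", O_CREAT | O_RDWR, 0600);
        ftruncate(fd, 4096);
        char* buf = static_cast<char*>(
            mmap(nullptr, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0));

        // A single semaphore with initial value 1 acts as a cross-process mutex.
        sem_t* lock = sem_open("/query_lock", O_CREAT, 0600, 1);

        sem_wait(lock);                            // enter the critical section
        std::strcpy(buf, "SELECT * FROM t;");      // place a request for the server
        sem_post(lock);                            // leave the critical section

        sem_close(lock);
        munmap(buf, 4096);
        close(fd);
        return 0;
    }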
There are some libraries (above sockets) to facilitate such communications, e.g.
0mq.
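As a hedged illustration, a request/reply client over a local ipc:// endpoint using the plain ZeroMQ C API might look like the sketch below; the endpoint path is made up and return values should really be checked.

    // Rough sketch of a request/reply client over a local ipc:// endpoint using
    // the plain ZeroMQ C API. The endpoint path "/tmp/mydb.ipc" is made up and
    // return values should be checked in real code.
    #include <zmq.h>
    #include <cstdio>
    #include <cstring>

    int main() {
        void* ctx  = zmq_ctx_new();
        void* sock = zmq_socket(ctx, ZMQ_REQ);
        zmq_connect(sock, "ipc:///tmp/mydb.ipc");

        const char* query = "get key1";
        zmq_send(sock, query, std::strlen(query), 0);        // send the request

        char reply[256];
        int n = zmq_recv(sock, reply, sizeof(reply) - 1, 0);  // blocks for the answer
        if (n >= 0) {
            if (n > (int)sizeof(reply) - 1) n = sizeof(reply) - 1;  // may be truncated
            reply[n] = '\0';
            std::printf("reply: %s\n", reply);
        }

        zmq_close(sock);
        zmq_ctx_destroy(ctx);
        return 0;
    }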
Some database libraries exist that work without communicating with a database server, in particular sqlite, which manages the database storage directly (in your client process). You might have some issues if several processes access the same database concurrently (so ACID properties might not be guaranteed, at least if using sqlite without care).
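For example, a minimal in-process SQLite session, with no server and no sockets, could look like the following sketch; the file name and table are invented and error checking is abbreviated.

    // Minimal sketch of using SQLite entirely in-process, with no server and no
    // sockets. The file name and table are invented; error checking is abbreviated.
    #include <sqlite3.h>
    #include <cstdio>

    int main() {
        sqlite3* db = nullptr;
        if (sqlite3_open("local.db", &db) != SQLITE_OK) return 1;

        sqlite3_exec(db, "CREATE TABLE IF NOT EXISTS kv(k TEXT PRIMARY KEY, v TEXT);",
                     nullptr, nullptr, nullptr);
        sqlite3_exec(db, "INSERT OR REPLACE INTO kv VALUES('answer','42');",
                     nullptr, nullptr, nullptr);

        // Querying is a plain library call into your own process, not a round trip.
        sqlite3_stmt* stmt = nullptr;
        sqlite3_prepare_v2(db, "SELECT v FROM kv WHERE k='answer';", -1, &stmt, nullptr);
        if (sqlite3_step(stmt) == SQLITE_ROW)
            std::printf("value: %s\n",
                        reinterpret_cast<const char*>(sqlite3_column_text(stmt, 0)));
        sqlite3_finalize(stmt);

        sqlite3_close(db);
        return 0;
    }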
Notice that local inter-process communication is quite efficient on Linux. It is not unusual to get a bandwidth of several hundred megabytes per second on a local pipe (use rather large buffers, e.g. 64 Kbytes or a megabyte, for read(2) & write(2)...).
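A rough sketch of such a measurement, pushing data through a local pipe in 64 KB chunks between a parent and a forked child, is shown below; the total volume is arbitrary and partial-write handling is omitted.

    // Sketch of measuring pipe throughput with 64 KB writes, as suggested above.
    // The total volume is arbitrary; partial-write handling is omitted for brevity.
    #include <sys/wait.h>
    #include <unistd.h>
    #include <chrono>
    #include <cstdio>
    #include <vector>

    int main() {
        int fds[2];
        if (pipe(fds) != 0) return 1;

        const size_t chunk = 64 * 1024;
        const size_t total = 1024ul * 1024 * 1024;     // push ~1 GB through the pipe

        if (fork() == 0) {                             // child: drain the pipe
            close(fds[1]);
            std::vector<char> in(chunk);
            while (read(fds[0], in.data(), chunk) > 0) {}
            _exit(0);
        }

        close(fds[0]);
        std::vector<char> out(chunk, 'x');
        auto start = std::chrono::steady_clock::now();
        for (size_t sent = 0; sent < total; sent += chunk)
            write(fds[1], out.data(), chunk);          // writer side
        close(fds[1]);                                 // EOF lets the child finish
        wait(nullptr);                                 // child has now read everything
        double secs = std::chrono::duration<double>(
            std::chrono::steady_clock::now() - start).count();
        std::printf("~%.0f MB/s\n", (total / (1024.0 * 1024.0)) / secs);
        return 0;
    }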
In practice, in a database, indexing and disk access are more likely to be the bottleneck than client <-> server communication, at least on the same local host. If the server is a remote host, network communication is probably the bottleneck (at least on common gigabit/sec ethernet).
Read also this, in particular the table in Answers section.
Perhaps gdbm, redis, mongodb, postgresql might be relevant for your issues.
Yes, if your DB clients are on the same machine as your DB server, they could communicate directly using techniques like shared memory IPC. However, this is typically not useful, because:
A database with all its clients on a single machine is rare.
A database with even one client on the same machine, other than an administrative interface, is not typical.
Systems like Linux already have optimizations built in for localhost socket communication, so it doesn't go via the network at all--only through the kernel.
A database whose performance is limited by socket IPC due to syscalls could easily overcome this by simply using a third-party kernel bypass solution for network communication, which does not require any special code at all--just plug in a kernel-bypass TCP stack--you can do this with many existing databases.

What's the most efficient inter-process communication method for a high-bandwidth data stream on a Mac?

I have a C++ program (running under MacOS/X) that generates a high-bandwidth stream of data (about 27 megabytes per second). A second C++ program receives that data and processes it in (soft) real time. Low latency and high reliability are both goals for this system. Due to circumstances beyond my control, the two processes need to be kept separate -- that is, I can't convert them into two threads within the same process.
Currently I'm using UDP packets (sent by process A to a UDP port on 127.0.0.1 that process B is listening on) to implement this data transfer, and that more-or-less-kind-of-works (modulo the occasional dropped packet), but I'm wondering if there isn't a more efficient/appropriate mechanism for this use case. Would a Unix pipe() be significantly more efficient or reliable? Or should I write the data to a mmap()'d shared memory region, and use a pipe/socket/semaphore/etc to synchronize the two processes' writes and reads? Or is UDP-over-the-loopback-device already efficient enough that there is little benefit to be gained by switching over to another method?
You certainly can't beat shared memory if you can manage the synchronization. Single copy in memory, no other movement. Your only "slow" point will be any fighting over who can do what, and where.

C++: Pthreads or Linux processes?

Let us suppose I have different data structures in c++ on Linux
Data1, Data2, Data3, Data4 and many more
Afterwards, I use a network trace file (a Wireshark capture) and send each packet to all of the data structures above. If any one of them sets a flag for the packet, I want all the other data structures to stop processing that packet and move on to the next packet in the trace file.
In my scenario, which one will be better to use :
Pthreads or Linux processes (fork...)
Processes have individual address spaces, each with its own heap, stack and code. Loading a process requires the OS to create and manage memory resources. Transferring data from one process to another requires OS support, via inter-process communication mechanisms such as shared memory or pipes on Linux. Also, every access to data protected by a shared semaphore requires a system call, which will slow you down considerably. Processes are protected from each other by the OS: if one process works correctly, it is hard for another to break it. Processes create a sandbox where your code is protected from others.
Threads are more lightweight. Creating and deleting them takes less time and effort. They don't have separate address spaces (page tables), so it is easy to share data between them, and no OS support is needed for that. But threads are more vulnerable to the mistakes of other threads, and for shared data you still need concurrency tools such as semaphores or mutexes.
A small example: most browsers use threads to manage tabs, but when one fails, the whole application usually crashes. Chrome runs each tab and extension as a separate process; if one crashes, the others keep running without major problems.
Go with threads if you are not sure. They will satisfy the needs stated in your question without problems.
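A minimal sketch of the scenario with threads (std::thread, which sits on top of pthreads on Linux) might look like the following; Packet, matches() and the worker count are placeholders invented for illustration, and each worker checks a shared std::atomic<bool> so that it can skip a packet another data structure has already flagged.

    // Sketch: each worker examines the same packet, and an atomic flag lets any
    // worker tell the others to skip the rest of it. Packet and matches() are
    // invented placeholders for the real data structures and their checks.
    #include <atomic>
    #include <thread>
    #include <vector>

    struct Packet { /* fields parsed from the trace file */ };

    // Stand-in for "does data structure i flag this packet?" -- invented here.
    bool matches(int /*data_set*/, const Packet& /*p*/) { return false; }

    void process_packet(const Packet& p, int num_sets) {
        std::atomic<bool> flagged{false};
        std::vector<std::thread> workers;
        for (int i = 0; i < num_sets; ++i) {
            workers.emplace_back([&, i] {
                if (flagged.load()) return;        // another structure already flagged it
                if (matches(i, p))
                    flagged.store(true);           // tell the other structures to stop
            });
        }
        for (auto& t : workers) t.join();          // then move on to the next packet
    }

    int main() {
        Packet p{};
        process_packet(p, 4);                      // e.g. Data1..Data4
        return 0;
    }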

Boost: is there an interprocess::message_queue-like mechanism for thread-only communication?

The boost::interprocess::message_queue mechanism seems primarily designed for just that: interprocess communication.
The problem is that it serializes the objects in the message:
"A message queue just copies raw bytes between processes and does not send objects."
This makes it completely unsuitable for fast and repeated interthread communication with large composite objects being passed.
I want to create a message with a ref/shared_ptr/pointer to a known and previously-created object and safely pass it from one thread to the next.
You CAN use asio::io_service and post with bind completions, but that's rather clunky AND requires that the thread in question be using asio, which seems a bit odd.
I've already written my own, sadly based on asio::io_service, but would prefer to switch over to a boost-supported general mechanism.
You need a mechanism designed for interprocess communication because separate processes have separate address spaces and you cannot simply pass pointers, except in very special cases. For thread communication you can use standard containers like std::stack, std::queue and std::priority_queue; you just need to provide proper synchronization through mutexes. Or you can use lock-free containers, which are also provided by Boost. What else would you need for interthread communication?
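For instance, a minimal sketch of such an interthread queue, a std::queue guarded by a mutex and condition variable that passes shared_ptr so the large object itself is never copied, could look like this (Message is an invented placeholder type):

    // Minimal sketch: a std::queue guarded by a mutex and condition variable,
    // passing shared_ptr so only the pointer crosses the queue, not the object.
    #include <condition_variable>
    #include <memory>
    #include <mutex>
    #include <queue>

    struct Message { /* large composite object */ };

    class ThreadQueue {
    public:
        void push(std::shared_ptr<Message> m) {
            {
                std::lock_guard<std::mutex> lock(mtx_);
                q_.push(std::move(m));                 // only the pointer is copied
            }
            cv_.notify_one();
        }

        std::shared_ptr<Message> pop() {               // blocks until an item arrives
            std::unique_lock<std::mutex> lock(mtx_);
            cv_.wait(lock, [this] { return !q_.empty(); });
            auto m = std::move(q_.front());
            q_.pop();
            return m;
        }

    private:
        std::mutex mtx_;
        std::condition_variable cv_;
        std::queue<std::shared_ptr<Message>> q_;
    };

Producer threads would call push(std::make_shared<Message>(...)) and consumer threads would call pop(); only the pointer crosses the queue, never the object's data.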
Whilst I'm no expert in Boost per se, there is a fundamental difficulty in communicating between processes or threads via a pipe, message queue, etc., especially if it is assumed that a program's data lives in classes containing dynamically allocated memory (which is pretty much the case for things written with Boost; a string is not a simple object like it is in C...).
Copying of Data in Classes
Message queues and pipes are indeed just a way of passing a collection of bytes from one thread/process to another thread/process. Generally when you use them you're looking for the destination thread to end up with a copy of the original data, not just a copy of the references to the data (which would be pointing back at the original data).
With a simple C struct containing no pointers at all it's easy; a copy of the struct contains all the data, no problem. But a C++ class with complex data types like strings is now a structure containing references / pointers to allocated memory. Copy that structure and you haven't actually copied the data in the allocated memory.
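A tiny illustration of that point: a std::string object stays the same size no matter how much text it holds, because the characters live in separately allocated memory that a raw byte copy of the structure would not carry along. The record types below are invented for the example.

    // Illustration: the data of a std::string lives outside the object itself,
    // so copying the object's bytes does not copy the text it refers to.
    #include <cstdio>
    #include <string>

    struct PlainRecord { int id; char name[32]; };    // all data sits inside the struct
    struct FancyRecord { int id; std::string name; }; // name's characters live elsewhere

    int main() {
        FancyRecord r{1, std::string(10000, 'x')};
        std::printf("sizeof(PlainRecord) = %zu\n", sizeof(PlainRecord));
        std::printf("sizeof(FancyRecord) = %zu (even with a 10000-char name)\n",
                    sizeof(FancyRecord));
        std::printf("r.name.size() = %zu\n", r.name.size());
        return 0;
    }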
That's where serialisation comes in. For interprocess communication, where the two processes can't ordinarily share the same memory, serialisation is a way of parcelling up the structure to be sent, plus all the data it refers to, into a stream of bytes that can be unpacked at the other end. For threads it's no different if you don't want the two threads accessing the same memory at the same time. Serialisation is a convenient way of saving yourself from having to navigate through a class to see exactly what needs to be copied.
Efficiency
I don't know what Boost uses for serialisation, but clearly serialising to XML would be painfully inefficient. A binary serialisation like ASN.1 BER would be much faster.
Also, copying data through pipes or message queues is no longer as inefficient as it used to be. Traditionally programmers have avoided it because of the perceived waste of time spent copying the data repeatedly just to share it with another thread. On a single-core machine that involves a lot of slow and wasteful memory accesses.
However, if one considers what "memory access" is in these days of QPI, Hypertransport, and so forth, it's not so very different to just copying the data in the first place. In both cases it involves data being sent over a serial bus from one core's memory controller to another core's cache.
Today's CPUs are really NUMA machines with memory access protocols layered on top of serial networks to fake an SMP environment. Programming in the style of copying messages through pipes, message queues, etc. is definitely edging towards saying that one is content with the idea of NUMA, and that really you don't need SMP at all.
Also, if you do all your inter-thread communications as message queues, they're not so very different to pipes, and pipes aren't so different to network sockets (at least that's the case on Not-Windows). So if you write your code carefully you can end up with a program that can be redeployed across a distributed network of computers or across a number of threads within a single process. That's a nice way of getting scalability because you're not changing the shape or feel of your program in any significant way when you scale up.
Fringe Benefits
Depending on the serialisation technology used there can be some fringe benefits. With ASN.1 you specify a message schema in which you set out the valid ranges of the message's contents. You can say, for example, that a message contains an integer, and it can have values between 0 and 10. The encoders and decoders generated by decent ASN.1 tools will automatically check that the data you're sending or receiving meets that constraint, and returns errors if not.
I would be surprised if other serialisers like Google Protocol Buffers didn't do a similar constraints check for you.
The benefit is that if you have a bug in your program and you try and send an out of spec message, the serialiser will automatically spot that for you. That can save a ton of time in debugging. Also it is something you definitely don't get if you share a memory buffer and protect it with a semaphore instead of using a message queue.
CSP
Communicating Sequential Processes and the Actor model are based on sending copies of data through message queues, pipes, etc. just like you're doing. CSP in particular is worth paying attention to because it's a good way of avoiding a lot of the pitfalls of multi-threaded software that can lurk undetected in source code.
There are some CSP implementations you can just use. There's JCSP, a class library for Java, and C++CSP, built on top of Boost to do CSP for C++. They're both from the University of Kent.
C++CSP looks quite interesting. It has a template class called csp::mobile, which is kind of like a Boost smart pointer. If you send one of these from one thread to another via a channel (CSP's word for a message queue) you're sending the reference, not the data. However, the template records which thread 'owns' the data. So a thread receiving a mobile now owns the data (which hasn't actually moved), and the thread that sent it can no longer access it. So you get the benefits of CSP without the overhead of copying the data.
It also looks like C++CSP can do channels over TCP; that's a very attractive feature, because scaling up becomes really simple. JCSP works over network connections too.

Fast Cross Platform Inter Process Communication in C++

I'm looking for a way to get two programs to efficiently transmit a large amount of data to each other, which needs to work on Linux and Windows, in C++. The context here is a P2P network program that acts as a node on the network and runs continuously, and other applications (which could be games hence the need for a fast solution) will use this to communicate with other nodes in the network. If there's a better solution for this I would be interested.
boost::asio is a cross platform library handling asynchronous io over sockets. You can combine this with using for instance Google Protocol Buffers for your actual messages.
Boost also provides you with boost::interprocess for interprocess communication on the same machine, but asio lets you do your communication asynchronously and you can easily have the same handlers for both local and remote connections.
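For a flavour of what the asio side looks like, here is a sketch of a synchronous TCP client; the host, port and payload are invented, the framing (a plain string here) is what something like Protocol Buffers would normally handle, and io_context is the newer name for what older Boost releases call io_service.

    // Sketch of a synchronous boost::asio TCP client. Host, port and payload are
    // invented; real code would frame messages (e.g. with Protocol Buffers) and
    // handle errors.
    #include <boost/asio.hpp>
    #include <cstddef>
    #include <iostream>
    #include <string>

    int main() {
        using boost::asio::ip::tcp;
        boost::asio::io_context io;

        tcp::resolver resolver(io);
        auto endpoints = resolver.resolve("127.0.0.1", "5555");

        tcp::socket socket(io);
        boost::asio::connect(socket, endpoints);        // same code for local or remote peers

        std::string request = "hello node";
        boost::asio::write(socket, boost::asio::buffer(request));

        char reply[256];
        std::size_t n = socket.read_some(boost::asio::buffer(reply));
        std::cout << "got " << n << " bytes back\n";
        return 0;
    }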
I have been using ICE by ZeroC (www.zeroc.com), and it has been fantastic. Super easy to use, and it's not only cross platform, but has support for many languages as well (python, java, etc) and even an embedded version of the library.
Well, if we can assume the two processes are running on the same machine, then the fastest way for them to transfer large quantities of data back and forth is by keeping the data inside a shared memory region; with that setup, the data is never copied at all, since both processes can access it directly. (If you wanted to go even further, you could combine the two programs into one program, with each former 'process' now running as a thread inside the same process space instead. In that case they would be automatically sharing 100% of their memory with each other)
Of course, just having a shared memory area isn't sufficient in most cases: you would also need some sort of synchronization mechanism so that the processes can read and update the shared data safely, without tripping over each other.

The way I would do that would be to create two double-ended queues in the shared memory region (one for each process to send with). Either use a lockless FIFO-queue class, or give each double-ended queue a semaphore/mutex that you can use to serialize pushing data items into the queue and popping data items out of the queue. (Note that the data items you'd be putting into the queues would only be pointers to the actual data buffers, not the data itself... otherwise you'd be back to copying large amounts of data around, which you want to avoid. It's a good idea to use shared_ptrs instead of plain C pointers, so that "old" data will be automatically freed when the receiving process is done using it).

Once you have that, the only other thing you'd need is a way for process A to notify process B when it has just put an item into the queue for B to receive (and vice versa)... I typically do that by writing a byte into a pipe that the other process is select()-ing on, to cause the other process to wake up and check its queue, but there are other ways to do it as well.
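The notification part of that scheme might look roughly like this sketch; the queue handling itself is omitted, and the function names are invented for illustration.

    // Sketch of the wake-up mechanism described above: the sender writes one byte
    // into a pipe after enqueueing an item, and the receiver select()s on the pipe,
    // then drains it and checks its shared-memory queue. Queue handling is omitted.
    #include <sys/select.h>
    #include <unistd.h>

    // Sender side: called right after pushing an item into the shared-memory queue.
    void notify_peer(int pipe_write_fd) {
        char wake = 1;
        write(pipe_write_fd, &wake, 1);           // one byte is enough to wake the peer
    }

    // Receiver side: block until the peer signals, then drain the notification bytes.
    bool wait_for_items(int pipe_read_fd) {
        fd_set readable;
        FD_ZERO(&readable);
        FD_SET(pipe_read_fd, &readable);
        if (select(pipe_read_fd + 1, &readable, nullptr, nullptr, nullptr) <= 0)
            return false;                          // error or interrupted

        char drain[64];
        read(pipe_read_fd, drain, sizeof(drain));  // clear any accumulated wake-ups
        return true;                               // now pop everything from the queue
    }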
This is a hard problem.
The bottleneck is the internet, and the fact that your clients might be behind NAT.
If you are not talking internet, or if you explicitly don't have clients behind carrier grade evil NATs, you need to say.
Because it boils down to: use TCP. Suck it up.
I would strongly suggest Protocol Buffers on top of TCP or UDP sockets.
So, while the other answers cover part of the problem (socket libraries), they're not telling you about the NAT issue. Rather than have your users tinker with their routers, it's better to use some techniques that should get you through a vaguely sane router with no extra configuration. You need to use all of these to get the best compatibility.
First, ICE library here is a NAT traversal technique that works with STUN and/or TURN servers out in the network. You may have to provide some infrastructure for this to work, although there are some public STUN servers.
Second, use both UPnP and NAT-PMP. One library here, for example.
Third, use IPv6. Teredo, which is one way of running IPv6 over IPv4, often works when none of the above do, and who knows, your users may have working IPv6 by some other means. Very little code to implement this, and increasingly important. I find about half of Bittorrent data arrives over IPv6, for example.