Over the last couple of months I've been working on some implementations of socket servers in C++ and Java. I wrote a small server in Java that would handle and process input from a Flash application hosted on a website, and I managed to successfully write a server in C++ that handles input from a 2D game client with multiple players. I used TCP in one project and UDP in the other. Now, I have some questions whose answers I couldn't really find online, and I hope that some of the experts can help me. :)
Let's say I would like to build a server in C++ that handles input from thousands of standalone and/or web applications. How should I design my server then? So far, I usually create a new, dedicated thread for each user that connects, but I doubt this is the way to go.
Also, how does one determine the layout of packets sent over the network? Is data usually sent over the network in binary or text form? How do you handle serialized objects when you send data to different platforms (e.g. from a C++ server to a Flash application)?
And last, is there an easy-to-use, commonly used library that supports portability (e.g. development on a Windows machine and deployment on a Linux box) other than Boost.Asio?
Thank you.
Sounds like you have a couple of questions here. I'll do my best to answer what I can see.
1. How should I handle threading in my network server?
I would take a good look at what kind of work you're doing on the worker threads that are being spawned by your server. Spawning a new thread for each request isn't a good idea...but it might not hurt anything if the number of parallel requests is small and the tasks performed on each thread are fast-running.
If you really want to do things the right way, you could have a configurable/dynamic thread pool that would recycle the worker threads as they became free. That way you could set a max thread pool size. Your server would then work up to the pool size...and then make further requests wait until a worker thread was available.
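A minimal C++11 sketch of such a pool, assuming a fixed maximum size (the class name `ThreadPool` and its `submit()` method are illustrative, not from any library): a fixed number of workers pull tasks from a shared queue, so once every worker is busy, further requests simply wait in the queue until one frees up.

```cpp
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Fixed-size pool: at most `max_threads` workers run tasks at once; extra
// tasks wait in the queue until a worker becomes free.
class ThreadPool {
public:
    explicit ThreadPool(std::size_t max_threads) {
        for (std::size_t i = 0; i < max_threads; ++i)
            workers_.emplace_back([this] { run(); });
    }

    ~ThreadPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        cv_.notify_all();
        for (auto& w : workers_) w.join();  // workers drain queued tasks before exiting
    }

    void submit(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push(std::move(task));
        }
        cv_.notify_one();  // wake one idle worker, if any
    }

private:
    void run() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (stopping_ && tasks_.empty()) return;
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // run outside the lock so other workers can dequeue
        }
    }

    std::vector<std::thread> workers_;
    std::queue<std::function<void()>> tasks_;
    std::mutex mutex_;
    std::condition_variable cv_;
    bool stopping_ = false;
};
```

In a server, the accept/read loop would call `submit()` with a task that processes one request. Making the pool grow and shrink dynamically would need extra bookkeeping that is not shown here.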
2. How do I format the data in my packets?
Unless you're developing an entirely new protocol...this isn't something you really need to worry about. Unless you're dealing with streaming media (or another application where packet loss/corruption is acceptable), you probably won't be using UDP for this application. TCP/IP is probably going to be your best bet...and that will dictate the packet design for you.
3. Which format do I use for serialization?
The way you serialize your data over the wire depends on what kind of applications are going to be consuming your service. Binary serialization is usually faster and results in a smaller amount of data that needs to be transferred over the network. The downside to using binary serialization is that the binary serialization in one language may not work in another. Therefore the clients connecting to your server are, most likely, going to have to be written in the same language you are using.
XML serialization is another option. It takes longer and produces a larger amount of data to transmit over the network. The upside to using something like XML serialization is that you won't be limited in the types of clients that can connect to your server and consume your service.
You have to choose what fits your needs the best.
...play around with the different options and figure out what works best for you. Hopefully you'll find something that can perform faster and more reliably than anything I've mentioned here.
As far as server design is concerned, I would say that you are right: although ONE-THREAD-PER-SOCKET is a simple and easy approach, it is not the way to go, since it won't scale as well as other server design patterns.
I personally like the COMMUNICATION-THREADS/WORKER-THREADS approach, where a pool of a dynamic number of worker threads handle all the work generated by producer threads.
In this model, you will have a number of threads in a pool waiting for tasks that are going to be generated from another set of threads handling network I/O.
I found UNIX Network Programming by Richard Stevens an amazing source for these kinds of network programming approaches. And, despite its name, it will be very useful in Windows environments as well.
Regarding the layout of the packets (you should have posted a different question for this, since it is a totally different question, in my opinion), there are tradeoffs when selecting a TEXT vs BINARY approach.
TEXT (e.g. XML) is probably easier to parse and document, and simpler in general, while a BINARY protocol should give you better performance in terms of processing speed and network packet size, but you will have to deal with more complicated issues such as the ENDIANNESS of words and the like.
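To make the endianness point concrete, here is a small C++17 sketch of one common binary framing scheme, assuming a 4-byte big-endian length prefix before each payload (the function names are hypothetical). Building the bytes with shifts, rather than copying an integer's memory, keeps the wire format independent of the host's byte order, and the decoder copes with TCP delivering partial or merged frames.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Frame = 4-byte big-endian (network order) payload length, then the payload.
std::vector<std::uint8_t> encode_frame(const std::string& payload) {
    const auto n = static_cast<std::uint32_t>(payload.size());
    std::vector<std::uint8_t> out = {
        static_cast<std::uint8_t>(n >> 24), static_cast<std::uint8_t>(n >> 16),
        static_cast<std::uint8_t>(n >> 8), static_cast<std::uint8_t>(n)};
    out.insert(out.end(), payload.begin(), payload.end());
    return out;
}

// TCP is a byte stream: one recv() may return half a frame or several frames.
// Append received bytes to `buf`, then call this repeatedly. It returns one
// complete payload (and removes it from `buf`), or std::nullopt if more bytes
// are needed.
std::optional<std::string> try_decode_frame(std::vector<std::uint8_t>& buf) {
    if (buf.size() < 4) return std::nullopt;
    const std::uint32_t n = (std::uint32_t{buf[0]} << 24) | (std::uint32_t{buf[1]} << 16) |
                            (std::uint32_t{buf[2]} << 8) | std::uint32_t{buf[3]};
    if (buf.size() < 4 + static_cast<std::size_t>(n)) return std::nullopt;
    std::string payload(buf.begin() + 4, buf.begin() + 4 + n);
    buf.erase(buf.begin(), buf.begin() + 4 + n);
    return payload;
}
```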
Hope it helps.
Though previous answers provide good direction, just for completeness, I'd like to point out that threads are not an absolute requirement for great socket server performance. Some examples are here. There are many approaches to scalability too - thread pools, pre-forked processes, server pools, etc.
1) And last, is there any easy to use library which is commonly used that supports portability (eg development on a windows machine & deployment on a linux box) other than boost asio.
The ACE library is another alternative. It's very mature (it has been around since the early 90s) and widely deployed. A brief discussion about how it compares to Boost ASIO is available on the Riverace website here. Keep in mind that ACE has had to support a large number of legacy platforms for a long time, so it doesn't utilize modern C++ features as much as Boost ASIO does, for example.
2) Let's say I would like to build a server in C++ that would handle the input from thousands of standalone and/or web applications, how should I design my server then? So far, I usually create a new & unique thread for each user that connects, but I doubt this is the way to go.
There are a number of commonly used approaches, including but not limited to: thread-per-connection (the approach you describe) and thread pool (the approach Justin described). Each has its pros and cons. Many have looked at the trade-offs. A good starting point might be the links on the Thread Pool Pattern Wikipedia page.
Dan Kegel's "The C10K Problem" web page has lots of useful notes about improving scalability as well.
3) Also, how does one determine the layout of packets sent over the network; is data usually sent over the network in a binary or text state? How do you handle serialized objects when you send data to different media (eg C++ server to flash application)?
I agree with others that sending binary data is generally going to be most efficient. The boost serialization library can be used to marshal data into a binary form (as well as text). Mature binary formats include XDR and CDR. CDR is the format used by CORBA, for instance. The company ZeroC defines the ICE encoding, which is supposed to be much more efficient than CDR.
There are lots of binary formats to choose from. My suggestion would be to avoid reinventing the wheel by at least reading about some of these binary formats so that you don't end up running into the same pitfalls these existing binary formats were designed to address.
That said, lots of middleware exists that already provides a canned solution for most of your needs. For example, OpenSplice and OpenDDS are both implementations of the OMG Data Distribution Service standard. DDS focuses on efficient distribution of data such as through a publish-subscribe model, rather than remote invocation of functions. I'm more familiar with the OMG defined technologies but I'm sure there are other middleware implementations that will fit your needs.
You're still going to need a socket to handle every client, but the idea would be to create a pool of X sockets (say 50) and then, when you get close (say 90%) to consuming all of them, create another pool of X sockets. At some point, after clients have connected, sent data and disconnected, some of your sockets will be available again and you can reuse them (search for "socket pools" for more on this).
The layout of data is always difficult. If all your clients and servers will be using the same hardware and operating system, you can send data in binary format, but there are many traps there (byte alignment is at the top of the list). Sending formatted text is always easier, but certainly more expensive in terms of bandwidth and processing power, because you have to convert from machine format to text before sending and, of course, back again at the receiver.
Re: serialization, I'm sorry, I can't help you, nor with libraries (I work too close to embedded systems to have used many of these).
About server sockets and serialization (marshaling). The most important problem is the growing number of sockets in the readable and writable state in select. I don't mean the size limitation of FD_SET; that is simply solvable. I mean the growing time it takes to signal readiness, and the problem of data accumulating in unread sockets while the data available on the socket currently being evaluated is processed. So the solution may even lie outside software boundaries and require a multiprocessor model where the roles of the processors are limited: one reads and writes, N do the processing. In this case, all available socket data should be read as soon as select returns and sent to the other processing units.
The same applies to incoming data.
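A rough single-threaded sketch of the reading side described above, assuming Linux/POSIX and using poll() instead of select() (poll() has no FD_SETSIZE limit; the function names are hypothetical). After each wait, the I/O thread drains every ready descriptor completely before anything is processed, then returns the data so it can be queued to the processing threads.

```cpp
#include <cerrno>
#include <cstddef>
#include <fcntl.h>
#include <poll.h>
#include <string>
#include <unistd.h>
#include <utility>
#include <vector>

// Descriptors must be non-blocking so the drain loop stops at "no more data".
void set_nonblocking(int fd) {
    fcntl(fd, F_SETFL, fcntl(fd, F_GETFL, 0) | O_NONBLOCK);
}

// One pass of the I/O thread: wait until some descriptors are readable, then
// read *everything* available from every ready one, so no socket's data sits
// unread while another socket's data is being processed. Returns (fd, bytes)
// pairs to hand to processing threads (e.g. through a queue).
std::vector<std::pair<int, std::string>> drain_ready(const std::vector<int>& fds,
                                                     int timeout_ms) {
    std::vector<pollfd> pfds;
    for (int fd : fds) pfds.push_back(pollfd{fd, POLLIN, 0});
    std::vector<std::pair<int, std::string>> out;
    if (poll(pfds.data(), pfds.size(), timeout_ms) <= 0) return out;

    for (const pollfd& p : pfds) {
        if (!(p.revents & (POLLIN | POLLHUP))) continue;
        std::string data;
        char buf[4096];
        for (;;) {
            ssize_t n = read(p.fd, buf, sizeof buf);
            if (n > 0) { data.append(buf, static_cast<std::size_t>(n)); continue; }
            if (n < 0 && errno == EINTR) continue;
            break;  // 0 = peer closed, -1 with EAGAIN = fully drained
        }
        if (!data.empty()) out.emplace_back(p.fd, std::move(data));
    }
    return out;
}
```

Pipes stand in for sockets in a quick test, since poll() and read() treat them the same way.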
About marshaling: of course a binary format is preferable for performance. By the way, XML in Unicode has the same problem. But, comrades, it is not simply a matter of copying a long or integer value into a socket stream. In that case even htons and htonl can help (values are sent and received in network byte order, and the conversion is handled for you). But it is safer to send the data after a representation header that states the format: the placement of the most/least significant bits, the byte order, and the IEEE data type. This works; I have never had a case where it didn't.
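A minimal sketch of the representation-header idea, assuming a one-byte header that records only the byte order (a real header, as described above, would also describe bit order and the floating-point format; the constants and function names are hypothetical). The sender writes values in its native order, and the receiver swaps bytes only when the header says the orders differ, similar in spirit to the byte-order flag in CORBA's CDR.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical header values: which byte order the sender used.
enum : std::uint8_t { kLittle = 0, kBig = 1 };

// Detect this machine's byte order at run time.
std::uint8_t host_order() {
    const std::uint16_t probe = 1;
    std::uint8_t first;
    std::memcpy(&first, &probe, 1);
    return first == 1 ? kLittle : kBig;
}

std::uint32_t byte_swap32(std::uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) | ((v << 8) & 0x00FF0000u) | (v << 24);
}

// Layout: [order byte][uint32][uint32]... The `order` parameter defaults to
// the host's own order; passing the other value emulates a foreign sender.
std::vector<std::uint8_t> encode_with_header(const std::vector<std::uint32_t>& values,
                                             std::uint8_t order = host_order()) {
    std::vector<std::uint8_t> out{order};
    for (std::uint32_t v : values) {
        if (order != host_order()) v = byte_swap32(v);
        std::uint8_t bytes[4];
        std::memcpy(bytes, &v, 4);
        out.insert(out.end(), bytes, bytes + 4);
    }
    return out;
}

// Receiver: convert only if the sender's order differs from ours.
std::vector<std::uint32_t> decode_with_header(const std::vector<std::uint8_t>& in) {
    std::vector<std::uint32_t> values;
    const bool swap = !in.empty() && in[0] != host_order();
    for (std::size_t i = 1; i + 4 <= in.size(); i += 4) {
        std::uint32_t v;
        std::memcpy(&v, in.data() + i, 4);
        values.push_back(swap ? byte_swap32(v) : v);
    }
    return values;
}
```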
Kind regards and great success for everyone.
Simon Cantor
I am learning about servers and data distribution. Much of what I have read from various sources (here is just one) talks about how market data is distributed over UDP to take advantage of multicasting. Indeed, in this video at this point about building a trading exchange, the presenter mentions how TCP is not the optimal choice to distribute data because it means having to "loop over" every client then send the data to each in turn, meaning that the "first in the list" of clients has a possibly unfair advantage.
I was very surprised then when I learned that I could connect to the Binance feed of market data using a websocket connection, which is TCP, using a command such as
websocat_linux64 wss://stream.binance.com:9443/ws/btcusdt#trade --protocol ws
Many other sources mention Websockets, so they certainly seem to be a common method of delivering market data, indeed this states
"Cryptocurrency trading applications often have real-time market data
streamed to trader front-ends via websockets"
I am confused. If Binance distributes over TCP, is "fairness" really a problem as the YouTube video seems to suggest?
So, overall, my main question is that if I want to distribute data (of any kind generally, but we can keep the market data theme if it helps) to multiple clients (possibly thousands) over the internet, should I use UDP or TCP, and is there any specific technique that could be employed to ensure "fairness" if that is relevant?
I've added the C++ tag as I would use C++, lots of high performance servers are written in C++, and I feel there's a good chance that someone will have done something similar and/or accessed the Binance feeds using C++.
The argument on fairness due to looping, in code, is ridiculous.
The whole field of trading where decisions need to be made quickly, where you need to use new information before someone else does is called: low-latency trading.
This tells you what's important: reducing the latency to a minimum. This is why UDP is used over TCP. TCP has flow control, re-sends data and buffers traffic to deliver it in order. This would make it terrible for low-latency trading.
WebSockets, in addition to being built on top of TCP are heavier and slower simply due to the extra amount of data (and needed processing to read/write it).
So even though the looping would be a tiny marginal latency cost, there's plenty of other reasons to pick UDP over TCP and even more over WebSockets.
So why does Binance do it? Their market is not institutional traders with hardware co-located at the exchange; it's traders who are willing to accept some latency. If you don't trade to the millisecond, then some extra latency is acceptable. WebSockets also make it much easier to integrate different pieces of software together. It also makes fairness, in latency, not so important. If Alice is 0.253 seconds away and Bob is 0.416 seconds away, does it make any difference who I tell first (by a few microseconds)? Probably not.
I have a C# DLL that needs to send a fairly complex object over the network to an unmanaged C++ process. I'm aware that there are a number of ways to do this, but I was wondering if anyone can recommend the best fit. Important points to note:
It is critical that the C# process is informed if data is not received by the C++ process. Preferably, I'd like as reliable a method of transferring the data as possible.
Unfortunately I'm also under pressure to deliver the packets as quickly as possible so ultimately some trade off will be required between reliability and performance.
The C# object that represents the data is likely to change. This is a third-party object which we have no control over. Ideally I'd need some dynamic mechanism in the C++ process to handle these changes with as little impact as possible. The worst-case scenario would be if we have to recompile the code every time the object data changes.
I'd rather avoid using third-party libraries if possible.
Any help would be much appreciated.
As far as I understand your question, by "network" you might mean a LAN or WLAN. So I suggest using UDP/IP on both sides. For "It is critical that the C# process is informed if data is not received by the C++ process", you need to handle it manually (e.g. via an acknowledgement packet for each received packet or batch of packets).
There are a lot of samples available on the internet for UDP/IP under 'socket programming',
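A minimal stop-and-wait sketch of the acknowledgement idea, assuming POSIX sockets, a hypothetical wire format of a 4-byte big-endian sequence number followed by the payload, and datagrams that fit in a 1500-byte buffer. Both ends are shown in C++ for brevity; in the question the sender would be the C# process, implementing the same format with its own UDP API. The receiver echoes the sequence number back as the ack, and the sender retransmits a limited number of times and reports failure if no ack arrives, which is the point at which the C# side learns the data was not received. All function names are illustrative.

```cpp
#include <arpa/inet.h>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <netinet/in.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

// Bind a UDP socket to 127.0.0.1 on an OS-chosen port; fills in `addr`.
int udp_bind_loopback(sockaddr_in& addr) {
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0) return -1;
    std::memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;  // ephemeral port
    if (bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof addr) != 0) { close(fd); return -1; }
    socklen_t len = sizeof addr;
    getsockname(fd, reinterpret_cast<sockaddr*>(&addr), &len);  // learn the chosen port
    return fd;
}

// Sender: one transmission of [seq][payload]; assumes len <= 1496.
void send_packet(int fd, const sockaddr_in& to, std::uint32_t seq,
                 const char* data, std::size_t len) {
    char pkt[1500];
    const std::uint32_t net_seq = htonl(seq);
    std::memcpy(pkt, &net_seq, 4);
    std::memcpy(pkt + 4, data, len);
    sendto(fd, pkt, len + 4, 0, reinterpret_cast<const sockaddr*>(&to), sizeof to);
}

// Sender: true if an ack carrying `seq` arrives within timeout_ms.
bool wait_for_ack(int fd, std::uint32_t seq, int timeout_ms) {
    pollfd p{fd, POLLIN, 0};
    while (poll(&p, 1, timeout_ms) > 0) {
        std::uint32_t ack = 0;
        if (recv(fd, &ack, sizeof ack, 0) == 4 && ntohl(ack) == seq) return true;
    }
    return false;
}

// Sender: retransmit until acknowledged; false means "report not received".
bool send_reliably(int fd, const sockaddr_in& to, std::uint32_t seq,
                   const char* data, std::size_t len, int retries, int timeout_ms) {
    for (int attempt = 0; attempt <= retries; ++attempt) {
        send_packet(fd, to, seq, data, len);
        if (wait_for_ack(fd, seq, timeout_ms)) return true;
    }
    return false;
}

// Receiver: read one packet, send its sequence number back, copy the payload.
long receive_and_ack(int fd, char* payload, std::size_t cap) {
    char pkt[1500];
    sockaddr_in from{};
    socklen_t from_len = sizeof from;
    ssize_t n = recvfrom(fd, pkt, sizeof pkt, 0, reinterpret_cast<sockaddr*>(&from), &from_len);
    if (n < 4) return -1;
    sendto(fd, pkt, 4, 0, reinterpret_cast<sockaddr*>(&from), from_len);  // the ack
    std::size_t body = static_cast<std::size_t>(n) - 4;
    if (body > cap) body = cap;
    std::memcpy(payload, pkt + 4, body);
    return static_cast<long>(body);
}
```

Stop-and-wait keeps one packet in flight, which is simple but slow; sending a window of packets and acknowledging batches (as suggested above) trades some complexity for throughput.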
like one here: http://www.abc.se/~m6695/udp.html
What is the fastest technology to send messages between C++ application processes, on Linux? I am vaguely aware that the following techniques are on the table:
TCP
UDP
Sockets
Pipes
Named pipes
Memory-mapped files
Are there any more ways, and which is the fastest?
Whilst all the above answers are very good, I think we'd have to discuss what is "fastest" [and does it have to be "fastest" or just "fast enough for "?]
For LARGE messages, there is no doubt that shared memory is a very good technique, and very useful in many ways.
However, if the messages are small, there are drawbacks of having to come up with your own message-passing protocol and method of informing the other process that there is a message.
Pipes and named pipes are much easier to use in this case - they behave pretty much like a file, you just write data at the sending side, and read the data at the receiving side. If the sender writes something, the receiver side automatically wakes up. If the pipe is full, the sending side gets blocked. If there is no more data from the sender, the receiving side is automatically blocked. Which means that this can be implemented in fairly few lines of code with a pretty good guarantee that it will work at all times, every time.
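A short POSIX sketch of the blocking behaviour described above, using an anonymous pipe between a parent and a forked child (a named pipe created with mkfifo() is read and written with the same calls; the function name is hypothetical). The child blocks whenever the pipe is full, and the parent's read() returns 0 only once the child has closed the write end.

```cpp
#include <cstddef>
#include <string>
#include <sys/wait.h>
#include <unistd.h>

// A forked child writes `message` into a pipe; the parent reads until EOF.
std::string child_to_parent(const std::string& message) {
    int fds[2];
    if (pipe(fds) != 0) return {};
    pid_t pid = fork();
    if (pid < 0) { close(fds[0]); close(fds[1]); return {}; }
    if (pid == 0) {                  // child: the writer
        close(fds[0]);
        std::size_t off = 0;
        while (off < message.size()) {  // write() may be partial; blocks while the pipe is full
            ssize_t n = write(fds[1], message.data() + off, message.size() - off);
            if (n <= 0) _exit(1);
            off += static_cast<std::size_t>(n);
        }
        close(fds[1]);               // signals EOF to the reader
        _exit(0);
    }
    close(fds[1]);                   // parent: drop its copy of the write end
    std::string received;
    char buf[256];
    ssize_t n;
    while ((n = read(fds[0], buf, sizeof buf)) > 0)  // blocks until data or EOF
        received.append(buf, static_cast<std::size_t>(n));
    close(fds[0]);
    waitpid(pid, nullptr, 0);
    return received;
}
```

A message larger than the pipe's buffer, such as the 200,000 bytes in a quick test, exercises the "sender blocks when the pipe is full" case without any extra code.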
Shared memory on the other hand relies on some other mechanism to inform the other thread that "you have a packet of data to process". Yes, it's very fast if you have LARGE packets of data to copy - but I would be surprised if there is a huge difference to a pipe, really. Main benefit would be that the other side doesn't have to copy the data out of the shared memory - but it also relies on there being enough memory to hold all "in flight" messages, or the sender having the ability to hold back things.
I'm not saying "don't use shared memory", I'm just saying that there is no such thing as "one solution that solves all problems 'best'".
To clarify: I would start by implementing a simple method using a pipe or named pipe [depending on which suits the purposes], and measure the performance of that. If a significant time is spent actually copying the data, then I would consider using other methods.
Of course, another consideration should be "are we ever going to use two separate machines [or two virtual machines on the same system] to solve this problem?" In that case, a network solution is a better choice, even if it's not THE fastest. I've run a local TCP stack on my machines at work for benchmark purposes and got some 20-30Gbit/s (2-3GB/s) with sustained traffic. A raw memcpy within the same process gets around 50-100GBit/s (5-10GB/s) (unless the block size is REALLY tiny and fits in the L1 cache). I haven't measured a standard pipe, but I expect it's somewhere roughly in the middle of those two numbers. [These numbers are about right for a number of different medium-sized, fairly modern PCs; obviously, on an ARM, MIPS or other embedded-style controller, expect lower numbers for all of these methods]
I would suggest looking at this also: How to use shared memory with Linux in C.
Basically, I'd drop network protocols such as TCP and UDP when doing IPC on a single machine. These have packeting overhead and are bound to even more resources (e.g. ports, loopback interface).
NetOS Systems Research Group from Cambridge University, UK has done some (open-source) IPC benchmarks.
Source code is located at https://github.com/avsm/ipc-bench .
Project page: http://www.cl.cam.ac.uk/research/srg/netos/projects/ipc-bench/ .
Results: http://www.cl.cam.ac.uk/research/srg/netos/projects/ipc-bench/results.html
This research has been published using the results above: http://anil.recoil.org/papers/drafts/2012-usenix-ipc-draft1.pdf
Check CMA and kdbus:
https://lwn.net/Articles/466304/
I think the fastest stuff these days are based on AIO.
http://www.kegel.com/c10k.html
As you tagged this question with C++, I'd recommend Boost.Interprocess:
Shared memory is the fastest interprocess communication mechanism. The
operating system maps a memory segment in the address space of several
processes, so that several processes can read and write in that memory
segment without calling operating system functions. However, we need
some kind of synchronization between processes that read and write
shared memory.
Source
One caveat I've found is the portability limitations of synchronization primitives. Neither OS X nor Windows has a native implementation of interprocess condition variables, for example, so Boost.Interprocess emulates them with spin locks.
Now if you use a *nix which supports POSIX process shared primitives, there will be no problems.
Shared memory with synchronization is a good approach when considerable data is involved.
Well, you could simply have a shared memory segment between your processes, using the linux shared memory aka SHM.
It's quite easy to use, look at the link for some examples.
POSIX message queues are pretty fast, but they have some limitations.
I'm looking for some data to help me decide which would be the better/faster for communication between two independent processes on Linux:
TCP
Named Pipes
Which is worse: the system overhead for the pipes or the tcp stack overhead?
Updated exact requirements:
only local IPC needed
will mostly be a lot of short messages
no cross-platform needed, only Linux
In the past I've used local domain sockets for that sort of thing. My library determined whether the other process was local to the system or remote and used TCP/IP for remote communication and local domain sockets for local communication. The nice thing about this technique is that local/remote connections are transparent to the rest of the application.
Local domain sockets use the same mechanism as pipes for communication and don't have the TCP/IP stack overhead.
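A minimal Linux sketch of a Unix domain stream socket bound to a filesystem path (the helper names are illustrative). Once connected, the descriptors are used with the same send()/recv() calls as a TCP socket, which is what makes the local/remote switch transparent to the rest of the application; the connection is also bidirectional, unlike a single pipe.

```cpp
#include <cstring>
#include <string>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

sockaddr_un make_addr(const std::string& path) {
    sockaddr_un addr{};
    addr.sun_family = AF_UNIX;
    std::strncpy(addr.sun_path, path.c_str(), sizeof addr.sun_path - 1);
    return addr;
}

// Server side: returns a listening fd, or -1 on failure.
int listen_unix(const std::string& path) {
    unlink(path.c_str());  // remove a stale socket file from a previous run
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0) return -1;
    sockaddr_un addr = make_addr(path);
    if (bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof addr) != 0 || listen(fd, 16) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

// Client side: returns a connected fd, or -1 on failure. A TCP variant would
// differ only in the address family and address struct used here.
int connect_unix(const std::string& path) {
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0) return -1;
    sockaddr_un addr = make_addr(path);
    if (connect(fd, reinterpret_cast<sockaddr*>(&addr), sizeof addr) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```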
I don't really think you should worry about the overhead (which will be ridiculously low). Did you confirm with profiling tools that TCP overhead is likely to be the bottleneck of your application?
Anyway, as Carl Smotricz said, I would go with sockets, because it will be really trivial to separate the applications in the future.
I discussed this in an answer to a previous post. I had to compare socket, pipe, and shared memory communications. Pipes were definitely faster than sockets (maybe by a factor of 2 if I recall correctly ... I can check those numbers when I return to work). But those measurements were just for the pure communication. If the communication is a very small part of the overall work, then the difference will be negligible between the two types of communication.
Edit
Here are some numbers from the test I did a few years ago. Your mileage may vary (particularly if I made stupid programming errors). In this specific test, a "client" and "server" on the same machine echoed 100 bytes of data back and forth. It made 10,000 requests. In the document I wrote up, I did not indicate the specs of the machine, so it is only the relative speeds that may be of any value. But for the curious, the times given here are the average cost per request:
TCP/IP: .067 ms
Pipe with I/O Completion Ports: .042 ms
Pipe with Overlapped I/O: .033 ms
Shared Memory with Named Semaphore: .011 ms
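The figures above came from the author's own test, and the I/O Completion Port and overlapped I/O variants are Windows APIs. For anyone who wants to repeat this kind of comparison on Linux, here is a rough ping-pong harness in the same spirit (100-byte echoes, many requests), assuming a Unix domain socketpair and a forked echo process. It is not the original test program, and its numbers will depend entirely on your machine; the function names are hypothetical.

```cpp
#include <chrono>
#include <cstddef>
#include <sys/socket.h>
#include <sys/wait.h>
#include <unistd.h>
#include <vector>

// Read exactly n bytes; a stream socket may return fewer bytes per call.
static bool read_full(int fd, char* buf, std::size_t n) {
    while (n > 0) {
        ssize_t r = read(fd, buf, n);
        if (r <= 0) return false;
        buf += r;
        n -= static_cast<std::size_t>(r);
    }
    return true;
}

// A forked child echoes each msg_size-byte message back. Returns the average
// round trip in microseconds, or -1 on error.
double avg_round_trip_us(int requests, std::size_t msg_size) {
    if (requests <= 0 || msg_size == 0) return -1;
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) return -1;
    std::vector<char> buf(msg_size, 'a');
    pid_t pid = fork();
    if (pid < 0) { close(sv[0]); close(sv[1]); return -1; }
    if (pid == 0) {  // child: echo until the parent closes its end
        close(sv[0]);
        while (read_full(sv[1], buf.data(), msg_size))
            if (write(sv[1], buf.data(), msg_size) != static_cast<ssize_t>(msg_size)) break;
        _exit(0);
    }
    close(sv[1]);
    bool ok = true;
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < requests && ok; ++i)
        ok = write(sv[0], buf.data(), msg_size) == static_cast<ssize_t>(msg_size) &&
             read_full(sv[0], buf.data(), msg_size);
    auto elapsed = std::chrono::steady_clock::now() - start;
    close(sv[0]);  // the child's read_full() now sees EOF and the child exits
    waitpid(pid, nullptr, 0);
    if (!ok) return -1;
    return std::chrono::duration<double, std::micro>(elapsed).count() / requests;
}
```

`avg_round_trip_us(10000, 100)` mirrors the shape of the test above; swapping the socketpair for a pair of pipes or a loopback TCP connection gives the other rows of the comparison.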
There will be more overhead using TCP - that will involve breaking the data up into packets, calculating checksums and handling acknowledgement, none of which is necessary when communicating between two processes on the same machine. Using a pipe will just copy the data into and out of a buffer.
I don't know if this suits you, but a very common way of doing IPC (interprocess communication) under Linux is shared memory. It's actually ultra fast (I didn't profile this, but it is just shared data in RAM with some processing around it).
The main problem with this approach is synchronization: you must build a little system around a semaphore to make sure one process is not writing at the same time the other one is trying to read.
A very simple starter tutorial is here.
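A minimal Linux sketch of shared memory guarded by semaphores, assuming related processes (an anonymous MAP_SHARED mapping that a forked child inherits) rather than the named segments from shm_open() or shmget() that unrelated processes would use. Two process-shared semaphores implement the handshake: the writer waits until the buffer is free, and the reader waits until a message is available. The struct and function names are hypothetical.

```cpp
#include <cstring>
#include <semaphore.h>
#include <string>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

// Layout of the shared region: two process-shared semaphores plus a buffer.
struct Channel {
    sem_t filled;    // posted by the writer when `text` holds a new message
    sem_t consumed;  // posted by the reader when `text` may be overwritten
    char text[256];
};

// Map a Channel that stays shared (not copied) across fork().
Channel* create_channel() {
    void* mem = mmap(nullptr, sizeof(Channel), PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return nullptr;
    auto* ch = static_cast<Channel*>(mem);
    sem_init(&ch->filled, /*pshared=*/1, 0);
    sem_init(&ch->consumed, /*pshared=*/1, 1);  // buffer starts out free
    return ch;
}

void send_message(Channel* ch, const std::string& msg) {
    sem_wait(&ch->consumed);  // wait until the reader is done with the buffer
    std::strncpy(ch->text, msg.c_str(), sizeof ch->text - 1);
    ch->text[sizeof ch->text - 1] = '\0';
    sem_post(&ch->filled);    // "you have a packet of data to process"
}

std::string receive_message(Channel* ch) {
    sem_wait(&ch->filled);
    std::string msg(ch->text);
    sem_post(&ch->consumed);
    return msg;
}

// Demo: a forked child sends `count` messages; the parent concatenates them.
std::string shm_demo(int count) {
    Channel* ch = create_channel();
    if (!ch) return {};
    pid_t pid = fork();
    if (pid < 0) { munmap(ch, sizeof(Channel)); return {}; }
    if (pid == 0) {
        for (int i = 0; i < count; ++i) send_message(ch, "msg" + std::to_string(i) + ";");
        _exit(0);
    }
    std::string all;
    for (int i = 0; i < count; ++i) all += receive_message(ch);
    waitpid(pid, nullptr, 0);
    sem_destroy(&ch->filled);
    sem_destroy(&ch->consumed);
    munmap(ch, sizeof(Channel));
    return all;
}
```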
This is not as portable as using sockets, but the concept would be the same, so if you're migrating this to Windows, you will just have to change the shared memory create/attach layer.
Two things to consider:
Connection setup cost
Continuous Communication cost
On TCP:
(1) more costly - three-way handshake overhead required for a (potentially) unreliable channel.
(2) more costly - IP level overhead (checksum etc.), TCP overhead (sequence number, acknowledgement, checksum etc.) pretty much all of which aren't necessary on the same machine because the channel is supposed to be reliable and not introduce network related impairments (e.g. packet reordering).
But I would still go with TCP provided it makes sense (i.e. depends on the situation) because of its ubiquity (read: easy cross-platform support) and the overhead shouldn't be a problem in most cases (read: profile, don't do premature optimization).
Update: if cross-platform support isn't required and the accent is on performance, then go with named pipes or Unix domain sockets, as I am pretty sure the platform developers will have optimized out the functionality that is only required for handling network-level impairments.
A Unix domain socket is a very good compromise: it doesn't have the overhead of TCP, but it is more extensible than the pipe solution. One point you did not consider is that sockets are bidirectional, while named pipes are unidirectional.
I think the pipes will be a little lighter, but I'm just guessing.
But since pipes are a local thing, there's probably a lot less complicated code involved.
Other people might tell you to try and measure both to find out. It's hard to go wrong with this answer, but you may not be willing to invest the time. That would leave you hoping my guess is correct ;)
I'm working on a loosely coupled cluster for some data processing. The network code and processing code is in place, but we are evaluating different methodologies in our approach. Right now, as we should be, we are I/O bound on performance issues, and we're trying to decrease that bottleneck. Obviously, faster switches like Infiniband would be awesome, but we can't afford the luxury of just throwing out what we have and getting new equipment.
My question posed is this. All traditional and serious HPC applications on clusters are typically implemented with message passing rather than sending over sockets directly. What are the performance benefits of this? Should we see a speedup if we switched from sockets?
MPI MIGHT use sockets. But there are also MPI implementations for SANs (system area networks) that use direct distributed shared memory, if of course you have the hardware for that. So MPI allows you to use such resources in the future. In that case you can gain massive performance improvements (in my experience with clusters back in my university days, you can reach gains of a few orders of magnitude). So if you are writing code that may be ported to higher-end clusters, using MPI is a very good idea.
Even discarding performance issues, using MPI can save you a lot of time, that you can use to improve performance of other parts of your system or simply save your sanity.
I would recommend using MPI instead of rolling your own, unless you are very good at that sort of thing. Having written some distributed-computing-esque applications using my own protocols, I always find myself reproducing (and poorly reproducing) features found in MPI.
Performance-wise I would not expect MPI to give you any tangible network speedups - it uses sockets just like you. MPI will, however, provide you with much of the functionality you would need for managing many nodes, e.g. synchronisation between nodes.
Performance is not the only consideration in this case, even on high performance clusters. MPI offers a standard API, and is "portable." It is relatively trivial to switch an application between the different versions of MPI.
Most MPI implementations use sockets for TCP based communication. Odds are good that any given MPI implementation will be better optimized and provide faster message passing, than a home grown application using sockets directly.
In addition, should you ever get a chance to run your code on a cluster that has InfiniBand, the MPI layer will abstract away those code changes. This is not a trivial advantage - coding an application to directly use OFED (or another IB Verbs implementation) is very difficult.
Most MPI implementations include small test apps that can be used to verify the correctness of the networking setup independently of your application. This is a major advantage when it comes time to debug your application. The MPI standard includes the "PMPI" interfaces for profiling MPI calls. This interface also allows you to easily add checksums or other data verification to all the message passing routines.
Message Passing is a paradigm not a technology. In the most general installation, MPI will use sockets to communicate. You could see a speed up by switching to MPI, but only in so far as you haven't optimized your socket communication.
How is your application I/O bound? Is it bound on transferring the data blocks to the work nodes, or is it bound because of communication during computation?
If the answer is "because of communication" then the problem is you are writing a tightly-coupled application and trying to run it on a cluster designed for loosely coupled tasks. The only way to gain performance will be to get better hardware (faster switches, infiniband, etc. )... maybe you could borrow time on someone else's HPC?
If the answer is "data block" transfers then consider assigning workers multiple data blocks (so they stay busy longer) & compress the data blocks before transfer. This is a strategy that can help in a loosely coupled application.
MPI has the benefit that you can do collective communications. Doing broadcasts/reductions in O(log p) /* p is your number of processors*/ instead of O(p) is a big advantage.
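To see where the O(log p) comes from, here is a small C++ simulation of a binomial-tree broadcast (the function name is hypothetical, and real MPI libraries choose their own algorithms internally; application code would simply call MPI_Bcast). Every process that already has the data forwards it to one that does not, so the number of informed processes doubles each round instead of growing by one per send from the root.

```cpp
#include <cstddef>
#include <vector>

// Simulates a binomial-tree broadcast from rank 0 among p processes and
// returns the number of communication rounds (ceil(log2 p)), or -1 if some
// rank was never reached. A root sending to everyone itself needs p - 1 sends.
int binomial_broadcast_rounds(int p) {
    if (p <= 0) return 0;
    std::vector<bool> has(static_cast<std::size_t>(p), false);
    has[0] = true;
    int informed = 1, rounds = 0;
    while (informed < p) {
        // In this round, ranks [0, informed) send to ranks [informed, 2*informed).
        for (int src = 0; src < informed && src + informed < p; ++src)
            has[static_cast<std::size_t>(src + informed)] = true;
        informed = informed * 2 < p ? informed * 2 : p;
        ++rounds;
    }
    for (bool b : has)
        if (!b) return -1;  // sanity check: everyone received the data
    return rounds;
}
```

For 1000 processes this takes 10 rounds instead of 999 sequential sends; reductions follow the same tree in the opposite direction.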
I'll have to agree with OldMan and freespace. Unless you know of a specific improvement to some useful metric (performance, maintainability, etc.) over MPI, why reinvent the wheel? MPI represents a large amount of shared knowledge regarding the problem you are trying to solve.
There are a huge number of issues you need to address that go beyond just sending data. Connection setup and maintenance will all become your responsibility. If MPI is the exact abstraction you need (it sounds like it is), use it.
At the very least, using MPI now and later refactoring it out in favor of your own system is a good approach, at the cost of installing and depending on MPI.
I especially like OldMan's point that MPI gives you much more than simple socket communication. You get a slew of parallel and distributed computing implementations behind a transparent abstraction.
I have not used MPI, but I have used sockets quite a bit. There are a few things to consider on high performance sockets. Are you doing many small packets, or large packets? If you are doing many small packets consider turning off the Nagle algorithm for faster response:
setsockopt(m_socket, IPPROTO_TCP, TCP_NODELAY, ...);
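Filled in for POSIX systems, the call above looks like the sketch below (the helper name is hypothetical); it reads the option back with getsockopt() to confirm it took effect. On Windows, setsockopt() takes the option value as a `const char*`, so the pointer needs a cast there.

```cpp
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

// Disable Nagle's algorithm so small writes go out immediately instead of
// being held back while an earlier segment is still unacknowledged.
// Returns true if the option is confirmed to be set.
bool disable_nagle(int fd) {
    int flag = 1;  // non-zero = TCP_NODELAY on
    if (setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &flag, sizeof flag) != 0) return false;
    int value = 0;
    socklen_t len = sizeof value;
    return getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &value, &len) == 0 && value != 0;
}
```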
Also, using signals can actually be much slower when trying to get a high volume of data through. Long ago I made a test program where the reader would wait for a signal and then read a packet; it got about 100 packets/sec. Then I just did blocking reads and got 10000 reads/sec.
The point is look at all these options, and actually test them out. Different conditions will make different techniques faster/slower. It's important to not just get opinions, but to put them to the test. Steve Maguire talks about this in "Writing Solid Code". He uses many examples that are counter-intuitive, and tests them to find out what makes better/faster code.
MPI uses sockets underneath, so really the only difference should be the API that your code interfaces with. You could fine-tune the protocol if you are using sockets directly, but that's about it. What exactly are you doing with the data?
MPI uses sockets, and if you know what you are doing you can probably get more bandwidth out of sockets, because you need not send as much metadata.
But you have to know what you are doing, and it's likely to be more error-prone. Essentially, you'd be replacing MPI with your own messaging protocol.
For high-volume, low-overhead business messaging you might want to check out
AMQP, which has several implementations. The open-source variant OpenAMQ supposedly runs the trading at JP Morgan, so it should be reliable, shouldn't it?