I am looking into writing a self-contained HTTP server using the Qt libraries, although many people hold the view that QtCore is too bloated and that the overhead would be too large. Would a QtCore HTTP server manage a load of about 50 concurrent connections using a thread pool?
The QtCore library is dynamically linked on Arch Linux, compiled for release with optimization level -O2.
There is no reason one could not write a server with Qt; however, there is really no way to tell beforehand whether the performance will be what you want (it depends on what your server does). Note that the optimal number of concurrent threads typically depends on the number of hardware cores, as well as on the level of parallelism in your program. My suggestion would be to implement whatever you can in the least amount of time, and then tune the performance as needed afterwards. Even if the server cannot handle that many concurrent connections, you can use process-level parallelism (running multiple instances of your multithreaded server) until you have improved the performance.
Your question is very broad, and the answer depends on how you want to design your HTTP server. You could design it as a single-threaded reactor, a multi-threaded proactor, or a half-sync/half-async server.
Qt mostly uses thin wrapper classes over native or POSIX APIs, and it certainly brings some overhead of its own. 50 connections does not sound like many, but again, the answer depends on what those connections will do. Serve simple pages, or perform heavy calculations?
I think the difficulty of the project lies in implementing a full HTTP server that is secure, reliable, and scalable. You will have to do a lot of coding just to provide the life cycle of a simple Java servlet model; many interfaces/abstractions are required.
You can find open-source HTTP servers that are already well tested. I would not even bother writing my own for production software.
50 connections isn't much.
But I hope you will add the QtNetwork module :-)
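For what it's worth, here is a minimal sketch of what such a server could look like, assuming the QtNetwork module is added as suggested: a QTcpServer hands each incoming socket descriptor to a worker on the global QThreadPool, which answers with a canned HTTP response. The class names HttpTask and PooledServer are invented for illustration.

    // Minimal sketch: QTcpServer dispatching each connection to QThreadPool.
    // Assumes the QtNetwork module is linked (QT += network).
    #include <QCoreApplication>
    #include <QRunnable>
    #include <QTcpServer>
    #include <QTcpSocket>
    #include <QThreadPool>

    class HttpTask : public QRunnable {
    public:
        explicit HttpTask(qintptr fd) : fd_(fd) {}
        void run() override {
            // The socket must be created in the worker thread that uses it.
            QTcpSocket socket;
            socket.setSocketDescriptor(fd_);
            if (socket.waitForReadyRead(5000)) {
                socket.readAll();  // the request itself is ignored in this sketch
                socket.write("HTTP/1.1 200 OK\r\nContent-Length: 2\r\n"
                             "Connection: close\r\n\r\nOK");
                socket.waitForBytesWritten(5000);
            }
            socket.disconnectFromHost();
        }
    private:
        qintptr fd_;
    };

    class PooledServer : public QTcpServer {
    protected:
        void incomingConnection(qintptr fd) override {
            QThreadPool::globalInstance()->start(new HttpTask(fd));  // pool deletes it
        }
    };

    int main(int argc, char *argv[]) {
        QCoreApplication app(argc, argv);
        QThreadPool::globalInstance()->setMaxThreadCount(8);  // ~number of cores
        PooledServer server;
        if (!server.listen(QHostAddress::Any, 8080))
            return 1;
        return app.exec();
    }

Blocking waitFor* calls are used so the workers need no event loop of their own; at around 50 concurrent connections, this pattern should be well within reach.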
I have to develop a server that has to make a lot of connections to receive and send small files. The question is whether the performance gain from C++ is worth the time spent developing the code, or whether it is better to use Python and profile the code from time to time to speed it up. Maybe it is a little abstract as a question without giving a number of connections, but I don't really know; at least 10,000 connections/minute to update client status.
With that many connections, your server will be I/O bound. The frequently cited speed differences between languages like C and C++ and languages like Python and (say) Ruby lie in the interpreter and boxing overhead which slow down computation, not in the realm of I/O.
Not only can you make good and reasonable use of concurrency in Python (both via processes and threads; the GIL is released during I/O and thus does not matter much for I/O-bound programs), there is also a wealth of asynchronous servers. In addition, web servers in general have much better Python integration (e.g. mod_wsgi for Apache) than C and C++. This frees you from writing your own server loop, socket management, etc., which you likely won't do as well as the major servers anyway. This is assuming we're talking about a web service, and not something more arcane which Apache etc. cannot do out of the box.
I'd expect that the server time would be dominated by I/O: network, disk, etc. You'd want to prove that the CPU consumption of the Python program is problematic, and that you've grasped all the low-hanging CPU fruit, before considering a change.
I am looking for a framework to be used in a C++ distributed number crunching application.
The setup looks as follows:
There is a master node which divides the problem domain into small, independent tasks. The tasks are distributed to worker nodes of different capability (e.g. CPU type/GPU-enabled).
Worker nodes are dynamically added to the compute grid as they become available. It may also happen that a worker node dies without saying goodbye.
I am searching for a fast C/C++ framework to accomplish this setup.
To summarize, my main requirements are:
Worker/Task-scheduling paradigm
Dynamically add/remove nodes
Target network: 1G - 10G ethernet (corporate network, good performance over internet not required)
Optional: Encrypted and authenticated communication
You can certainly do what you want with MPI. MPI-2 added dynamic process management features, and I think most of the currently widely-used implementations offer these.
One of the advantages of using C++ + MPI is that the combination is quite widely used in scientific and technical computing, though my impression is that within this niche dynamic process management is not used very much. Since MPI is used on the very largest supercomputers tackling the bleeding-edge problems of computational science, one might hazard a guess that it would be fast enough for your purposes.
One of the disadvantages of using C++ + MPI is that MPI was not designed to tolerate failure of processes during execution. There is debate on SO about whether or not the dynamic process management features allow you to program your own fault tolerance. But no debate that it might be difficult.
You would get the first 3 of your requirements 'out-of-the-box'. As for encrypted and authenticated communication, you'd have to do most of that yourself, MPI just passes messages around. I'd guess that for most MPI users, running parallel applications on clusters or supercomputers with private interconnects (often themselves isolated from corporate or enterprise networks), encryption and authentication are matters of little concern.
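As a concrete illustration, here is a minimal sketch of the MPI-2 dynamic process management mentioned above, using MPI_Comm_spawn; the "worker" executable name and the task/result exchange are invented for illustration.

    // Minimal sketch: a master spawns workers at runtime via MPI_Comm_spawn
    // and talks to them over the resulting intercommunicator.
    #include <mpi.h>
    #include <cstdio>

    int main(int argc, char *argv[]) {
        MPI_Init(&argc, &argv);

        MPI_Comm workers;
        int errcodes[4];
        // Launch 4 worker processes from a hypothetical "worker" executable.
        MPI_Comm_spawn("worker", MPI_ARGV_NULL, 4, MPI_INFO_NULL,
                       0, MPI_COMM_SELF, &workers, errcodes);

        // Hand each worker a task id over the intercommunicator.
        for (int rank = 0; rank < 4; ++rank) {
            int task = rank;
            MPI_Send(&task, 1, MPI_INT, rank, 0, workers);
        }

        // Collect one result per worker.
        for (int rank = 0; rank < 4; ++rank) {
            double result;
            MPI_Recv(&result, 1, MPI_DOUBLE, rank, 0, workers, MPI_STATUS_IGNORE);
            std::printf("worker %d -> %f\n", rank, result);
        }

        MPI_Finalize();
        return 0;
    }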
Over the last couple of months I've been working on some implementations of socket servers in C++ and Java. I wrote a small server in Java that would handle and process input from a Flash application hosted on a website, and I managed to successfully write a server in C++ that handles input from a 2D game client with multiple players. I used TCP in one project and UDP in the other. Now, I have some questions that I couldn't really find answers to on the net, and I hope some of the experts here can help me. :)
Let's say I would like to build a server in C++ that would handle the input from thousands of standalone and/or web applications, how should I design my server then? So far, I usually create a new & unique thread for each user that connects, but I doubt this is the way to go.
Also, how does one determine the layout of packets sent over the network? Is data usually sent over the network in binary or text form? How do you handle serialized objects when you send data to different media (e.g. a C++ server to a Flash application)?
And last, is there any easy-to-use, commonly used library that supports portability (e.g. development on a Windows machine and deployment on a Linux box) other than Boost.Asio?
Thank you.
Sounds like you have a couple of questions here. I'll do my best to answer what I can see.
1. How should I handle threading in my network server?
I would take a good look at what kind of work you're doing on the worker threads that are being spawned by your server. Spawning a new thread for each request isn't a good idea...but it might not hurt anything if the number of parallel requests is small and the tasks performed on each thread are fast-running.
If you really want to do things the right way, you could have a configurable/dynamic thread pool that would recycle the worker threads as they became free. That way you could set a max thread pool size. Your server would then work up to the pool size...and then make further requests wait until a worker thread was available.
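For illustration, here is a minimal sketch of such a pool using only the C++ standard library; the names are invented, and a production version would add a cap on the queue size and error handling.

    // Minimal sketch: fixed-size thread pool with a shared work queue.
    // Worker threads are recycled; queued requests wait until one is free.
    #include <condition_variable>
    #include <functional>
    #include <mutex>
    #include <queue>
    #include <thread>
    #include <vector>

    class ThreadPool {
    public:
        explicit ThreadPool(std::size_t threads) {
            for (std::size_t i = 0; i < threads; ++i)
                workers_.emplace_back([this] { Run(); });
        }
        ~ThreadPool() {
            {
                std::lock_guard<std::mutex> lock(mu_);
                stop_ = true;
            }
            cv_.notify_all();
            for (auto &t : workers_) t.join();
        }
        // Requests queue up here until a worker thread becomes free.
        void Submit(std::function<void()> task) {
            {
                std::lock_guard<std::mutex> lock(mu_);
                tasks_.push(std::move(task));
            }
            cv_.notify_one();
        }
    private:
        void Run() {
            for (;;) {
                std::function<void()> task;
                {
                    std::unique_lock<std::mutex> lock(mu_);
                    cv_.wait(lock, [this] { return stop_ || !tasks_.empty(); });
                    if (stop_ && tasks_.empty()) return;
                    task = std::move(tasks_.front());
                    tasks_.pop();
                }
                task();  // e.g. read a request from a socket and respond
            }
        }
        std::vector<std::thread> workers_;
        std::queue<std::function<void()>> tasks_;
        std::mutex mu_;
        std::condition_variable cv_;
        bool stop_ = false;
    };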
2. How do I format the data in my packets?
Unless you're developing an entirely new protocol...this isn't something you really need to worry about. Unless you're dealing with streaming media (or another application where some packet loss/corruption is acceptable), you probably won't be using UDP for this application. TCP/IP is probably going to be your best bet, and it will take care of the transport-level packet layout for you; you only need to define your application-level message format.
3. Which format do I use for serialization?
The way you serialize your data over the wire depends on what kinds of applications are going to be consuming your service. Binary serialization is usually faster and results in a smaller amount of data that needs to be transferred over the network. The downside to using binary serialization is that the binary serialization in one language may not work in another, so the clients connecting to your server are most likely going to have to be written in the same language you are using.
XML serialization is another option. It takes longer and produces a larger amount of data to transmit over the network. The upside to something like XML serialization is that you won't be limited in the types of clients that can connect to your server and consume your service.
You have to choose what fits your needs the best.
...play around with the different options and figure out what works best for you. Hopefully you'll find something that can perform faster and more reliably than anything I've mentioned here.
As far as server design is concerned, I would say that you are right: although ONE-THREAD-PER-SOCKET is a simple and easy approach, it is not the way to go, since it won't scale as well as other server design patterns.
I personally like the COMMUNICATION-THREADS/WORKER-THREADS approach, where a pool with a dynamic number of worker threads handles all the work generated by producer threads.
In this model, you will have a number of threads in a pool waiting for tasks that are going to be generated from another set of threads handling network I/O.
I found UNIX Network Programming by Richard Stevens an amazing source for these kinds of network programming approaches. And, despite its name, it will be very useful in Windows environments as well.
Regarding the layout of the packets (you should have posted a separate question for this, since it is a totally different question, in my opinion), there are tradeoffs when selecting a TEXT vs. BINARY approach.
TEXT (i.e. XML) is probably easier to parse and document, and simpler in general, while a BINARY protocol should give you better performance in terms of processing speed and size of network packets, but you will have to deal with more complicated issues such as the ENDIANNESS of the words and things like that.
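To make the endianness point concrete, here is a minimal sketch of length-prefixed binary framing in network byte order; the message layout (u32 length, u16 type, payload) is invented for illustration.

    // Minimal sketch: length-prefixed binary frame in network byte order,
    // so sender and receiver agree regardless of native endianness.
    #include <arpa/inet.h>  // htonl/htons (POSIX)
    #include <cstdint>
    #include <cstring>
    #include <vector>

    std::vector<uint8_t> encode(uint16_t type,
                                const std::vector<uint8_t> &payload) {
        uint32_t len = htonl(static_cast<uint32_t>(payload.size()));
        uint16_t t = htons(type);
        std::vector<uint8_t> frame(sizeof len + sizeof t + payload.size());
        std::memcpy(frame.data(), &len, sizeof len);
        std::memcpy(frame.data() + sizeof len, &t, sizeof t);
        if (!payload.empty())
            std::memcpy(frame.data() + sizeof len + sizeof t,
                        payload.data(), payload.size());
        return frame;
    }

    // The receiver reads the 6 header bytes, applies ntohl/ntohs, then reads
    // exactly that many payload bytes off the TCP stream.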
Hope it helps.
Though previous answers provide good direction, just for completeness, I'd like to point out that threads are not an absolute requirement for great socket server performance. Some examples are here. There are many approaches to scalability too - thread pools, pre-forked processes, server pools, etc.
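As one illustration of a threadless design, here is a minimal sketch of a single-threaded event loop using POSIX poll(); the echo behaviour is a placeholder and error handling is trimmed for brevity.

    // Minimal sketch: one thread multiplexes every client socket via poll(),
    // with no worker threads at all.
    #include <netinet/in.h>
    #include <poll.h>
    #include <sys/socket.h>
    #include <unistd.h>
    #include <vector>

    int main() {
        int listener = socket(AF_INET, SOCK_STREAM, 0);
        sockaddr_in addr{};
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        addr.sin_port = htons(8080);
        bind(listener, reinterpret_cast<sockaddr *>(&addr), sizeof addr);
        listen(listener, SOMAXCONN);

        std::vector<pollfd> fds{{listener, POLLIN, 0}};
        for (;;) {
            poll(fds.data(), fds.size(), -1);  // block until a socket is ready
            for (std::size_t i = 0; i < fds.size(); ++i) {
                if (!(fds[i].revents & POLLIN)) continue;
                if (fds[i].fd == listener) {  // new client
                    fds.push_back({accept(listener, nullptr, nullptr), POLLIN, 0});
                } else {
                    char buf[4096];
                    ssize_t n = read(fds[i].fd, buf, sizeof buf);
                    if (n <= 0) {  // client closed: drop it from the set
                        close(fds[i].fd);
                        fds.erase(fds.begin() + i--);
                    } else {
                        write(fds[i].fd, buf, n);  // echo back as a placeholder
                    }
                }
            }
        }
    }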
1) And last, is there any easy-to-use, commonly used library that supports portability (e.g. development on a Windows machine and deployment on a Linux box) other than Boost.Asio?
The ACE library is another alternative. It's very mature (been around since the early 90s) and widely deployed. A brief discussion about how it compares to Boost ASIO is available on the Riverace website here. Keep in mind that ACE has had to support a large number of legacy platforms for long time so it doesn't utilize modern C++ features as much as Boost ASIO, for example.
2) Let's say I would like to build a server in C++ that would handle the input from thousands of standalone and/or web applications, how should I design my server then? So far, I usually create a new & unique thread for each user that connects, but I doubt this is the way to go.
There are a number of commonly used approaches, including but not limited to: thread-per-connection (the approach you describe) and thread pool (the approach Justin described). Each has its pros and cons, and many have looked at the trade-offs. A good starting point might be the links on the Thread Pool Pattern Wikipedia page.
Dan Kegel's "The C10K Problem" web page has lots of useful notes about improving scalability as well.
3) Also, how does one determine the layout of packets sent over the network? Is data usually sent over the network in binary or text form? How do you handle serialized objects when you send data to different media (e.g. a C++ server to a Flash application)?
I agree with others that sending binary data is generally going to be most efficient. The Boost serialization library can be used to marshal data into a binary form (as well as text). Mature binary formats include XDR and CDR; CDR is the format used by CORBA, for instance. The company ZeroC defines the ICE encoding, which is supposed to be much more efficient than CDR.
There are lots of binary formats to choose from. My suggestion would be to avoid reinventing the wheel by at least reading about some of these binary formats so that you don't end up running into the same pitfalls these existing binary formats were designed to address.
That said, lots of middleware exists that already provides a canned solution for most of your needs. For example, OpenSplice and OpenDDS are both implementations of the OMG Data Distribution Service standard. DDS focuses on efficient distribution of data such as through a publish-subscribe model, rather than remote invocation of functions. I'm more familiar with the OMG defined technologies but I'm sure there are other middleware implementations that will fit your needs.
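As a concrete example of the Boost serialization route mentioned above, here is a minimal sketch; the Message struct is invented for illustration, and swapping binary_oarchive for text_oarchive yields a textual encoding from the same serialize() function.

    // Minimal sketch: Boost.Serialization round trip through a binary archive.
    #include <boost/archive/binary_iarchive.hpp>
    #include <boost/archive/binary_oarchive.hpp>
    #include <sstream>
    #include <string>

    struct Message {
        int id = 0;
        std::string body;

        template <class Archive>
        void serialize(Archive &ar, unsigned /*version*/) {
            ar & id;    // the same function serves both saving and loading
            ar & body;
        }
    };

    int main() {
        std::ostringstream out;
        {
            boost::archive::binary_oarchive oa(out);
            Message m;
            m.id = 42;
            m.body = "hello";
            oa << m;
        }  // archive flushes on destruction

        std::istringstream in(out.str());
        boost::archive::binary_iarchive ia(in);
        Message copy;
        ia >> copy;  // copy.id == 42, copy.body == "hello"
    }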
You're still going to need a socket to handle every client, but the idea would be to create a pool of X sockets (say 50) and then, when you get close (say 90%) to consuming all those sockets, create another pool of X sockets. At some point, after clients have connected, sent data, and disconnected, some of your sockets will become available for reuse (search for "socket pools" for more on this).
The layout of data is always difficult. If all your clients and servers will be using the same hardware and operating system, you can send data in binary format, but there are many trips and traps there (byte alignment is at the top of the list). Sending formatted text is always easier, but certainly more expensive in terms of bandwidth and processing power, because you have to convert from machine format to text before sending and, of course, back again at the receiver.
Re: serialization, I'm sorry, I can't help you, nor with libraries (I'm too embedded to have used much of these).
About server sockets and serialization (marshaling): the most important problem is the growing number of sockets in a readable or writable state in select(). I am not talking about the FD_SET limit; that is easily solved. I mean the growth in signaling time, and the problem of data accumulating in unread sockets while you are processing the data available on the socket currently being handled. The solution may even lie outside software boundaries and require a multiple-processor model, with the roles of the processors fixed: one reads and writes, N do the processing. In that case, all available socket data should be read as soon as select() returns and handed off to the other processing units. The same applies to incoming data.
About marshaling: of course a binary format is preferable, for performance (XML in terms of Unicode has the same problem). But, comrades, it is not as simple as copying a long or integer value into a socket stream. Even here, htons and htonl can help (they send/receive in network format, and the OS is responsible for the data conversion). It is safer still to send data preceded by a representation header that spells out the placement of the most/least significant bits, the byte order, and the IEEE data type. This works; I have never seen a case where it didn't.
Kind regards, and great success to everyone.
Simon Cantor
My web server has a lot of dependencies for sending back data when it gets a request. I am testing one of these dependency applications within the web server. The application is decoupled from the main web server, and queries reach it only through the APIs it exposes.
My question is: if I wish to test these APIs in a multithreaded environment (C++ functions on a machine with two quad-core processors), what is the best way to go about doing it?
Do I call each API in a separate thread or process? If so, how do I implement such code? From what I can figure out, I would be duplicating the functioning of the web server, but I can find no better way to measure the performance improvement contributed by that component alone.
It depends on whether your app deals with data that is shared when it runs in parallel processes, because that will most likely determine where the speed bottleneck awaits.
E.g., if the app accesses a database or disk files, you'll probably have to simulate multiple threads/processes querying the app in order to see how they get along with each other, i.e. whether they have to wait for each other while accessing the shared resource.
But if the app only does some internal calculation, all on its own, then it may scale well, as long as all its data fits into memory (i.e. no virtual-memory/disk access is necessary). Then you can test the performance of just one instance and focus on optimizing its speed.
It also might help to state the OS you're planning to use. Mac OS X offers tools for performance testing and optimization that Windows and Linux may not, and vice versa.
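A minimal sketch of such a simulation, using only the C++ standard library; callApi() is a hypothetical stand-in for whichever exposed API function you are measuring.

    // Minimal sketch: hammer one API function from N threads and report
    // the aggregate throughput. callApi() is a hypothetical stand-in.
    #include <chrono>
    #include <cstdio>
    #include <thread>
    #include <vector>

    void callApi() { /* invoke the decoupled component's API here */ }

    int main() {
        const unsigned threads = std::thread::hardware_concurrency();  // e.g. 8
        const int callsPerThread = 10000;

        auto start = std::chrono::steady_clock::now();
        std::vector<std::thread> pool;
        for (unsigned i = 0; i < threads; ++i)
            pool.emplace_back([] {
                for (int c = 0; c < callsPerThread; ++c) callApi();
            });
        for (auto &t : pool) t.join();
        double elapsed = std::chrono::duration<double>(
            std::chrono::steady_clock::now() - start).count();

        std::printf("%u threads x %d calls in %.3f s (%.0f calls/s)\n",
                    threads, callsPerThread, elapsed,
                    threads * callsPerThread / elapsed);
    }

Comparing the calls/s figure at 1, 2, 4, and 8 threads shows quickly whether the component scales or serializes on a shared resource.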
I'm working on a loosely coupled cluster for some data processing. The network code and processing code are in place, but we are evaluating different methodologies in our approach. Right now, as expected, we are I/O bound on performance, and we're trying to decrease that bottleneck. Obviously, faster interconnects like InfiniBand would be awesome, but we can't afford the luxury of just throwing out what we have and getting new equipment.
My question is this: all traditional and serious HPC applications on clusters are typically implemented with message passing rather than by sending over sockets directly. What are the performance benefits of this? Should we see a speedup if we switched from sockets?
MPI MIGHT use sockets. But there are also MPI implementations, to be used with a SAN (system area network), that use direct distributed shared memory. That is, of course, if you have the hardware for it. So MPI allows you to use such resources in the future. In that case you can gain massive performance improvements (in my experience with clusters back in my university days, you can reach gains of a few orders of magnitude). So if you are writing code that can be ported to higher-end clusters, using MPI is a very good idea.
Even discarding performance issues, using MPI can save you a lot of time, which you can use to improve the performance of other parts of your system, or simply to preserve your sanity.
I would recommend using MPI instead of rolling your own, unless you are very good at that sort of thing. Having written some distributed-computing-esque applications using my own protocols, I always find myself reproducing (and poorly reproducing) features found within MPI.
Performance-wise, I would not expect MPI to give you any tangible network speedups: it uses sockets just like you do. MPI will, however, provide you with much of the functionality you would need for managing many nodes, i.e. synchronisation between nodes.
Performance is not the only consideration in this case, even on high performance clusters. MPI offers a standard API, and is "portable." It is relatively trivial to switch an application between the different versions of MPI.
Most MPI implementations use sockets for TCP-based communication. Odds are good that any given MPI implementation will be better optimized, and provide faster message passing, than a home-grown application using sockets directly.
In addition, should you ever get a chance to run your code on a cluster that has InfiniBand, the MPI layer will abstract any of those code changes. This is not a trivial advantage - coding an application to directly use OFED (or another IB Verbs) implementation is very difficult.
Most MPI implementations include small test apps that can be used to verify the correctness of the networking setup independently of your application. This is a major advantage when it comes time to debug your application. The MPI standard includes the PMPI profiling interface for intercepting MPI calls. This interface also allows you to easily add checksums or other data verification to all the message-passing routines.
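For instance, a minimal sketch of the PMPI mechanism, assuming an MPI installation: you provide your own MPI_Send that does the bookkeeping and forwards to the real implementation through PMPI_Send.

    // Minimal sketch: intercept MPI_Send via the PMPI profiling interface.
    // Link this file into the application ahead of the MPI library.
    #include <mpi.h>
    #include <cstdio>

    static long g_send_count = 0;

    extern "C" int MPI_Send(const void *buf, int count, MPI_Datatype type,
                            int dest, int tag, MPI_Comm comm) {
        ++g_send_count;  // checksum or log buf here if desired
        return PMPI_Send(buf, count, type, dest, tag, comm);
    }

    extern "C" int MPI_Finalize(void) {
        std::printf("MPI_Send called %ld times\n", g_send_count);
        return PMPI_Finalize();
    }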
Message passing is a paradigm, not a technology. In the most general installation, MPI will use sockets to communicate. You could see a speedup by switching to MPI, but only insofar as you haven't optimized your socket communication.
How is your application I/O bound? Is it bound on transferring the data blocks to the work nodes, or is it bound because of communication during computation?
If the answer is "because of communication", then the problem is you are writing a tightly coupled application and trying to run it on a cluster designed for loosely coupled tasks. The only way to gain performance will be to get better hardware (faster switches, InfiniBand, etc.)... maybe you could borrow time on someone else's HPC?
If the answer is "data block transfers", then consider assigning workers multiple data blocks (so they stay busy longer) and compressing the data blocks before transfer. This is a strategy that can help in a loosely coupled application.
MPI has the benefit that you can do collective communications. Doing broadcasts/reductions in O(log p) (where p is your number of processors) instead of O(p) is a big advantage.
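A minimal sketch of such a collective, assuming a working MPI environment: every rank contributes a partial result and MPI_Reduce combines them at the root in logarithmic communication depth.

    // Minimal sketch: an O(log p) reduction across all ranks with MPI_Reduce.
    #include <mpi.h>
    #include <cstdio>

    int main(int argc, char *argv[]) {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        double partial = rank + 1.0;  // stand-in for a locally computed result
        double total = 0.0;
        MPI_Reduce(&partial, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            std::printf("sum over %d ranks = %f\n", size, total);
        MPI_Finalize();
    }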
I'll have to agree with OldMan and freespace. Unless you know of a specific improvement in some useful metric (performance, maintainability, etc.) over MPI, why reinvent the wheel? MPI represents a large amount of shared knowledge regarding the problem you are trying to solve.
There is a huge number of issues you need to address beyond just sending data; connection setup and maintenance will all become your responsibility. If MPI is the exact abstraction you need (and it sounds like it is), use it.
At the very least, using MPI and later refactoring it out for your own system is a good approach, with the cost being only the installation of, and dependency on, MPI.
I especially like OldMan's point that MPI gives you much more beyond simple socket communication. You get a slew of parallel and distributed computing implementations behind a transparent abstraction.
I have not used MPI, but I have used sockets quite a bit. There are a few things to consider for high-performance sockets. Are you doing many small packets, or large packets? If you are doing many small packets, consider turning off the Nagle algorithm for faster response:
    int flag = 1;  // 1 enables TCP_NODELAY, i.e. disables Nagle buffering
    setsockopt(m_socket, IPPROTO_TCP, TCP_NODELAY,
               reinterpret_cast<const char *>(&flag), sizeof(flag));
Also, using signals can actually be much slower when trying to push a high volume of data through. Long ago I made a test program where the reader would wait for a signal and then read a packet; it would get about 100 packets/sec. Then I just did blocking reads, and got 10,000 reads/sec.
The point is: look at all these options and actually test them out. Different conditions will make different techniques faster or slower. It's important not just to collect opinions, but to put them to the test. Steve Maguire talks about this in "Writing Solid Code"; he uses many examples that are counter-intuitive, and tests them to find out what makes better/faster code.
MPI uses sockets underneath, so really the only difference should be the API your code interfaces with. You could fine-tune the protocol if you were using sockets directly, but that's about it. What exactly are you doing with the data?
MPI uses sockets, and if you know what you are doing you can probably get more bandwidth out of raw sockets, because you need not send as much metadata.
But you have to know what you are doing, and it's likely to be more error-prone. Essentially, you'd be replacing MPI with your own messaging protocol.
For high-volume, low-overhead business messaging you might want to check out AMQP, which has several product implementations. The open-source variant OpenAMQ supposedly runs the trading at JP Morgan, so it should be reliable, shouldn't it?