I'm trying to ping a URL on a server in the middle of my high-performance C++ application, where every millisecond is critical. I don't care about the return data from the query... I just need to send a HTTP request to a specific URL (to cause it to load), and I'm trying to find the most effective, non-blocking method to accomplish this.
My application uses Boost::ASIO, and most methods of doing this seem to involve building and tearing down sockets each time (which might unfortunately be necessary). Still, I'm hoping there's a basic C/C++ socket approach that won't cause overhead, memory leaks, or blocking: just quickly open a socket, shoot the HTTP request off, and move along.
And this will need to happen thousands of times per second, so socket overhead matters (I don't want to flood the OS).
Anyone have any advice on the most efficient way to accomplish this?
Thanks so much!
With thousands of notifications sent per second, I can't imagine opening a socket connection for each one. That would probably be too inefficient due to the overhead. So, as Casey suggested, try using a dedicated connection.
Since it sounds like you are doing quite a bit of processing on your main thread, you might consider creating a worker thread for the socket work. You will probably need to use thread synchronization objects like a mutex or critical section to single thread the code - at least when updating a container (probably a queue) from your main thread and reading it from the worker thread.
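A rough sketch of that shape, in case it helps: the hot path only pushes a pre-formatted request string onto a locked queue, and a single worker thread owns one persistent Boost.Asio connection and drains the queue. The host, port, and request text below are placeholders, and real code would need reconnect and error handling.

#include <boost/asio.hpp>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>
#include <thread>

using boost::asio::ip::tcp;

class Notifier {
public:
    Notifier() : worker_([this] { run(); }) {}
    ~Notifier() {
        { std::lock_guard<std::mutex> lk(m_); done_ = true; }
        cv_.notify_one();
        worker_.join();
    }
    // Called from the hot path: just enqueue, never touch the network here.
    void post(std::string request) {
        { std::lock_guard<std::mutex> lk(m_); q_.push(std::move(request)); }
        cv_.notify_one();
    }
private:
    void run() {
        boost::asio::io_context io;
        tcp::socket sock(io);
        // Placeholder endpoint; real code should handle resolve/connect errors
        // and reconnect if the server drops the connection.
        boost::asio::connect(sock, tcp::resolver(io).resolve("example.com", "80"));
        for (;;) {
            std::unique_lock<std::mutex> lk(m_);
            cv_.wait(lk, [this] { return done_ || !q_.empty(); });
            if (done_ && q_.empty()) break;
            std::string req = std::move(q_.front());
            q_.pop();
            lk.unlock();                                  // do the I/O outside the lock
            boost::asio::write(sock, boost::asio::buffer(req));
        }
    }
    std::queue<std::string> q_;
    std::mutex m_;
    std::condition_variable cv_;
    bool done_ = false;
    std::thread worker_;
};

Usage from the main thread would then be a single call like notifier.post("GET /ping HTTP/1.1\r\nHost: example.com\r\nConnection: keep-alive\r\n\r\n"), so the per-notification cost is one lock and one queue push.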
I've got a ROUTER/DEALER setup where both ends need to be able to receive and send data asynchronously, as soon as it's available. The model is pretty much 0MQ's async C++ server: http://zguide.zeromq.org/cpp:asyncsrv
Both the client and the server workers poll, and when there's data available they invoke a callback. While this happens, from another thread (!) I'm putting data into a std::deque. In each poll-forever thread, I check the deque (under a lock), and if there are items there, I send them out to the specified DEALER id (the id is placed in the queue).
But I can't help thinking that this is not idiomatic 0MQ. The mutex is possibly a design problem. Plus, memory consumption can probably get quite high if enough time passes between polls (and data accumulates in the deque).
The only alternative I can think of is having another DEALER thread connect to an inproc each time I want to send out data, and just have it send it and exit. However, this implies a connect per item of data sent + construction and destruction of a socket, and it's probably not ideal.
Is there an idiomatic 0MQ way to do this, and if so, what is it?
I don't fully understand your design, but I do understand your concern about using locks.
In most cases you can redesign your code to remove the use of locks by using ZeroMQ PAIR sockets and inproc.
Do you really need a std::deque? If not, you could just use a ZeroMQ queue, as it's just a queue that you can read/write from different threads using sockets.
If you really need the deque then encapsulate it into its own thread (a class would be nice) and make its API (push etc) accessible via inproc sockets.
So, like I said before, I may be on the wrong track, but in 99% of the cases I have come across you can remove the locks completely with some ZMQ_PAIR/inproc sockets if you need signalling.
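For illustration, here is a minimal sketch of that PAIR/inproc signalling pattern, assuming a recent cppzmq (zmq.hpp); with the plain C API the equivalent calls are zmq_socket/zmq_bind/zmq_connect/zmq_send/zmq_recv. The inproc name and payloads are made up for the example.

#include <zmq.hpp>
#include <string>
#include <thread>

int main() {
    zmq::context_t ctx(1);

    // Bind before the worker starts so the inproc endpoint already exists.
    zmq::socket_t tx(ctx, zmq::socket_type::pair);
    tx.bind("inproc://signal");

    std::thread worker([&ctx] {
        zmq::socket_t rx(ctx, zmq::socket_type::pair);
        rx.connect("inproc://signal");
        zmq::message_t msg;
        while (rx.recv(msg)) {                    // blocks until the other thread sends
            std::string payload(static_cast<const char*>(msg.data()), msg.size());
            if (payload == "quit") break;
            // ... hand payload to the DEALER / router logic here ...
        }
    });

    // No mutex anywhere: the socket pair *is* the queue.
    const std::string hello = "hello", quit = "quit";
    tx.send(zmq::buffer(hello), zmq::send_flags::none);
    tx.send(zmq::buffer(quit), zmq::send_flags::none);
    worker.join();
}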
A 0MQ queue has a limited buffer size, and that size can be controlled. So memory use will grow only to some point, after which data will start being dropped. For that reason you may consider using the conflate option, which keeps only the most recent data in the queue.
In the case of a single server communicating within a single machine across many threads, I suggest using the publish/subscribe model: with the conflate option you receive the newest data as soon as you read, you don't have to worry about memory, and it removes the blocking-queue problem.
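As a concrete sketch of that option (cppzmq API assumed, endpoint is a placeholder): setting ZMQ_CONFLATE on the SUB socket before connecting makes its queue hold only the newest message. Note that conflate does not work with multi-part messages.

#include <zmq.hpp>
#include <string>

int main() {
    zmq::context_t ctx(1);
    zmq::socket_t sub(ctx, zmq::socket_type::sub);
    sub.set(zmq::sockopt::conflate, 1);        // keep only the newest message
    sub.set(zmq::sockopt::subscribe, "");      // subscribe to everything
    sub.connect("tcp://127.0.0.1:5556");       // placeholder endpoint

    zmq::message_t msg;
    while (sub.recv(msg)) {
        std::string latest(static_cast<const char*>(msg.data()), msg.size());
        // 'latest' is always the most recently published value; older ones were dropped
    }
}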
As for your implementation, you are quite right: it is not the best design, but it is quite unavoidable. I suggest checking the question "Access std::deque from 3 threads"; while it addresses your problem, it may not be the best approach.
I am attempting to rewrite my current project to include more features and stability, and need some help designing it. Here is the gist of it (for Linux):
TCP_SERVER receives connection (auth packet)
TCP_SERVER starts a new (thread/fork) to handle the new client
TCP_SERVER will be receiving many packets from the client, which will be added to a circular buffer
A separate thread will be created for that client to process those packets and build a list of objects
Another thread should be created to send parts of the list of objects to another client
The reason to separate all the processing into threads is that the server will be getting many packets and the processing won't be able to keep up (it needs to be quick, as it's time-sensitive; I'm not sure if TCP will drop packets if the internal buffer gets too large?). A separate sending thread keeps the processing as fast as possible.
So for each new connection, 3 threads should be created: 1 to receive packets, 1 to process them, and 1 to send the processed data to another client (which is technically the same person/IP, just on a different device).
I need help designing this: how to structure it, what to use (forks/threads), and what libraries to use.
Trying to do this yourself is going to cause you a world of pain. Focus on your actual application, and leverage an existing socket handling framework. For example, you said:
for each new connection, 3 threads should be created
That statement says the following:
1. You haven't done this before, at scale, and haven't realized the impact all these threads will have.
2. You've never benchmarked thread creation or synchronous operations.
3. The number of things that can go wrong with this approach is pretty overwhelming.
Give some serious thought to using an existing library that does most of this for you. Getting the scaffolding right around this can literally take years, and you're better off focusing on your code rather than all the random plumbing.
The Boost C++ libraries have a nice asynchronous C++ socket-handling infrastructure. Combine this with some of the existing C++ thread pools and you could likely have a highly performant solution up fairly quickly.
I would also question your use of C++ for this. Java and C# both do highly scalable socket servers pretty well, and some of the higher-level language tooling (Spring, Guava, etc.) can be very, very valuable. If you ever want to secure this, via TLS or another mechanism, you'll also probably find it much easier in Java or C# than in C++.
Some of the major things you'll care about:
1. True async I/O will be a huge performance and scalability win. Try really hard to do this. The Boost.Asio library looks pretty nice.
2. Focus on your features and stability, rather than building a new socket handling platform.
3. Threads are expensive; avoid creating them. Thread pools are your friend (see the sketch after this list).
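To make that concrete, here is a rough sketch of the "async I/O plus a small thread pool" shape with Boost.Asio: one io_context, a handful of threads calling run(), and an asynchronous accept/read loop. The port and buffer size are placeholders, and in a real multi-threaded setup each session's handlers would normally also go through a strand so two pool threads never run handlers for the same connection at once.

#include <boost/asio.hpp>
#include <array>
#include <memory>
#include <thread>
#include <vector>

using boost::asio::ip::tcp;

class Session : public std::enable_shared_from_this<Session> {
public:
    explicit Session(tcp::socket s) : socket_(std::move(s)) {}
    void start() { read(); }
private:
    void read() {
        auto self = shared_from_this();
        socket_.async_read_some(boost::asio::buffer(buf_),
            [self](boost::system::error_code ec, std::size_t n) {
                if (ec) return;                   // connection closed or errored
                // ... hand buf_[0..n) to a processing queue here ...
                self->read();                     // keep reading
            });
    }
    tcp::socket socket_;
    std::array<char, 4096> buf_;
};

void accept_loop(tcp::acceptor& acc) {
    acc.async_accept([&acc](boost::system::error_code ec, tcp::socket s) {
        if (!ec) std::make_shared<Session>(std::move(s))->start();
        accept_loop(acc);                         // accept the next client
    });
}

int main() {
    boost::asio::io_context io;
    tcp::acceptor acc(io, tcp::endpoint(tcp::v4(), 9000));
    accept_loop(acc);

    // The "thread pool": a few threads all servicing the same io_context.
    std::vector<std::thread> pool;
    for (unsigned i = 0; i < std::thread::hardware_concurrency(); ++i)
        pool.emplace_back([&io] { io.run(); });
    for (auto& t : pool) t.join();
}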
You plan to create one-or-more threads for every connection your server handles. Threads are not free, they come with a memory and CPU overhead, and when you have many active threads you also begin to have resource contention.
What usage pattern do you anticipate? Do you expect that when you have 8 connections, all 8 network threads will be consuming 100% of a cpu core pushing/pulling packets? Or do you expect them to have a relatively low turn-around?
As you add more threads, you will begin to have to spend more time competing for resources in things like mutexes etc.
A better pattern is to have one or more threads for network I/O - most OSes have mechanisms for saying "tell me when one or more of these network connections has I/O", which is an efficiency saving over having lots of individual threads all doing the same thing for just one connection.
Then for actual processing, spin up a pool of worker threads to do actual work, allowing you to minimize the competition for resources. You can monitor work load to determine if you need to spin up more to meet delivery requirements.
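A bare-bones worker pool along those lines might look like this: the network thread(s) push work items into a queue, and pool threads pop and run them. The class name and the use of std::function are just illustrative choices for the sketch.

#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

class WorkerPool {
public:
    explicit WorkerPool(unsigned n) {
        for (unsigned i = 0; i < n; ++i)
            threads_.emplace_back([this] { loop(); });
    }
    ~WorkerPool() {
        { std::lock_guard<std::mutex> lk(m_); stop_ = true; }
        cv_.notify_all();
        for (auto& t : threads_) t.join();
    }
    void submit(std::function<void()> job) {          // called by the I/O thread(s)
        { std::lock_guard<std::mutex> lk(m_); jobs_.push(std::move(job)); }
        cv_.notify_one();
    }
private:
    void loop() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lk(m_);
                cv_.wait(lk, [this] { return stop_ || !jobs_.empty(); });
                if (stop_ && jobs_.empty()) return;
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();                                    // run the work outside the lock
        }
    }
    std::queue<std::function<void()>> jobs_;
    std::mutex m_;
    std::condition_variable cv_;
    bool stop_ = false;
    std::vector<std::thread> threads_;
};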
You might also want to look into something to implement the network IO infrastructure for you; I've had really good performance results with libevent but then I've only had to deal with very high performance/reliability networking systems.
I'm quite new to network programming and I'm writing a program that should accept many TCP connections and receive data from them. To make things go parallel, the agent should read data from each socket in a new thread. I decided to use boost::asio instead of raw *nix sockets to make things simpler. Though this seems to be a wrong decision...
I wonder whether calling boost::asio::read or boost::asio::read_some blocks only its calling thread or blocks the whole process? Yes, I should write my own small test and see the results myself, but I have no access to my Linux box right now. I'm just thinking about the code that I should write tomorrow at university.
So if it blocks the process, what's correct way of implementing a server/client architecture that accepts many clients at same time?
Notes:
I'm having difficulty with design decisions. Any suggestion is appreciated.
The read and read_some calls are both blocking, and will only block the current thread on Linux and Win32 (and probably most others, I just don't have direct experience).
You might want to look into using async_read instead, though, if you are expecting a large number of incoming connections, as you might actually do better performance-wise using a smaller number of threads than connections. Boost does provide examples of using a thread pool to handle client connections.
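For reference, the asynchronous variant has roughly this shape (a sketch only; the buffer, its lifetime, and the handler body are placeholders). Note that boost::asio::async_read completes only once the whole buffer has been filled; use async_read_some if you want "whatever bytes are available".

#include <boost/asio.hpp>
#include <array>

void start_read(boost::asio::ip::tcp::socket& sock,
                std::array<char, 1024>& buf) {
    boost::asio::async_read(sock, boost::asio::buffer(buf),
        [&sock, &buf](const boost::system::error_code& ec, std::size_t bytes) {
            if (ec) return;                    // peer closed the connection or error
            // ... process the bytes that arrived ...
            start_read(sock, buf);             // queue the next read
        });
}

The reads complete on whichever threads are calling io_context::run(), so a handful of threads can service many clients.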
I am building a chat server (which allows private messages between users) in C++ ... just as a challenge for me, and I've hit a dead point... where I don't know what may be better.
By the way: I am fairly new to C++; that's why I want a challenge... so if there are other optimal ways, multithreading, etc., let me know please.
Option A
I have a C++ application running that has an array of sockets, reads all the input (looping through all the sockets) on every loop iteration (a 1-second loop, I guess) and stores it to the DB (a log is required), and after that, loops again over all the sockets sending what's needed to every socket.
Pros: One single process, contained. Easy to develop.
Cons: I see it as hardly scalable, and a single point of failure ... I mean, what about performance with 20k sockets?
Option B
I have a c++ application listening to connections.
When a connection is received, it forks a subprocess that handles that socket... reading and saving to a DB all the input of the user. And checking all the required output from DB on every loop to write to the socket.
Pros: If the daemon is small enough, having a process per socket is likely more scalable. And at the same time if a process fails, all the others are kept online.
Cons: Harder to develop. Maybe it consumes too many resources to maintain a process for each connection.
What option do you think is the best? Any other idea or suggestion is welcome :)
As mentioned in the comments, there is an additional alternative which is to use select() or poll() (or, if you don't mind making your application platform-specific, something like epoll()). Personally I would suggest poll() because I find it more convenient, but I think only select() is available on at least some versions of Windows - I don't know whether running on Windows is important to you.
The basic approach here is that you first add all your sockets (including a listen socket, if you're listening for connections) to a structure and then call select() or poll() as appropriate. This call will block your application until at least one of the sockets has some data to read; then you get woken up, go through the socket(s) that are ready for reading, process the data, and jump back into blocking again. You generally do this in a loop, something like:
while (running) {
    int rc = poll(...);
    // Handle active file descriptors here.
}
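A slightly fleshed-out version of that loop might look like the following (POSIX poll(), with error handling and the accept/recv details trimmed; fds[0] is assumed to be the listening socket and the rest are connected clients):

#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>
#include <vector>

void serve(int listen_fd) {
    std::vector<pollfd> fds{{listen_fd, POLLIN, 0}};
    bool running = true;
    while (running) {
        int rc = poll(fds.data(), fds.size(), -1);   // block until something is ready
        if (rc < 0) break;                           // real code: check errno / EINTR
        if (fds[0].revents & POLLIN) {               // new incoming connection
            int client = accept(listen_fd, nullptr, nullptr);
            if (client >= 0) fds.push_back({client, POLLIN, 0});
        }
        for (std::size_t i = 1; i < fds.size(); ++i) {
            if (!(fds[i].revents & POLLIN)) continue;
            char buf[4096];
            ssize_t n = recv(fds[i].fd, buf, sizeof buf, 0);
            if (n <= 0) { close(fds[i].fd); fds[i].fd = -1; }  // poll() ignores fd == -1
            // else: handle buf[0..n) here
        }
    }
}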
This is a great way to write an application which is primarily IO-bound - i.e. it spends much more time handling network (or disk) traffic than it does actually processing the data with the CPU.
As also mentioned in the comments, another approach is to fork a thread per connection. This is quite effective, and you can use simple blocking IO in each thread to read and write to that connection. Personally I would advise against this approach for several reasons, most of which are largely personal preference.
Firstly, it's fiddly to handle connections where you need to write large amounts of data at a time. A socket can't guarantee to write all pending data at once (i.e. the amount that it sent may not be the full amount you requested). In this case you have to buffer up the pending data locally and wait until there's room in the socket to send it. This means at any given time, you might be waiting for two conditions - either the socket is ready to send, or the socket is ready to read. You could, of course, avoid reading from the socket until all the pending data is sent, but this introduces latency into handling the data. Or, you could use select() or poll() on just that connection - but if so, why bother using threads at all, just handle all the connections that way. You could also use two threads per connection, one for reading and one for writing, which is probably the best approach if you're not confident whether you can always send all messages in a single call, although this doubles the number of threads you need which could make your code more complicated and slightly increase resource usage.
Secondly, if you plan to handle many connections, or a high connection turnover, threads are somewhat more of a load on the system than using select() or friends. This isn't a particularly big deal in most cases, but it's a factor for larger applications. This probably isn't a practical issue unless you were writing something like a webserver that was handling hundreds of requests a second, but I thought it was relevant to mention for reference. If you're writing something of this scale you'd likely end up using a hybrid approach anyway, where you multiplexed some combination of processes, threads and non-blocking IO on top of each other.
Thirdly, some programmers find threads complicated to deal with. You need to be very careful to make all your shared data structures thread-safe, either with exclusive locking (mutexes) or using someone else's library code which does this for you. There are a lot of examples and libraries out there to help you with this, but I'm just pointing out that care is needed - whether multithreaded coding suits you is a matter of taste. It's relatively easy to forget to lock something and have your code work fine in testing because the threads don't happen to contend that data structure, and then find hard-to-diagnose issues when this happens under higher load in the real world. With care and discipline, it's not too hard to write robust multithreaded code and I have no objection to it (though opinions vary), but you should be aware of the care required. To some extent this applies to writing any software, of course, it's just a matter of degree.
Those issues aside, threads are quite a reasonable approach for many applications and some people seem to find them easier to deal with than non-blocking IO with select().
As to your approaches, A will work but is wasteful of CPU because you have to wake up every second regardless of whether there's actual useful work to do. Also, you introduce up to a second's delay in handling messages, which could be irritating for a chat server. In general I would suggest that something like select() is a much better approach than this.
Option B could work although when you want to send messages between connections you're going to have to use something like pipes to communicate between processes and that's a bit of a pain. You'll end up having to wait on both your incoming pipe (for data to send) as well as the socket (for data to receive) and thus you end up effectively with the same problem, having to wait on two filehandles with something like select() or threads. Really, as others have said, threads are the right way to process each connection separately. Separate processes are also a little more expensive of resources than threads (although on platforms such as Linux the copy-on-write approach to fork() means it's not actually too bad).
For small applications with only, say, tens of connections there's not an awful lot technically to choose between threads and processes, it largely depends on which style appeals to you more. I would personally use non-blocking IO (some people call this asynchronous IO, but that's not how I would use the term) and I've written quite a lot of code that does that as well as lots of multithreaded code, but it's still only my personal opinion really.
Finally, if you want to write portable non-blocking IO loops I strongly suggest investigating libev (or possibly libevent, but personally I find the former easier to use and more performant). These libraries use different primitives such as select() and poll() on different platforms so your code can remain the same, and they also tend to offer slightly more convenient interfaces.
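To give a flavour of what that looks like, here is a minimal libev sketch: one read watcher on an already-connected socket descriptor, with the callback fired whenever the descriptor becomes readable. The fd value 0 is just a placeholder.

#include <ev.h>

static void on_readable(EV_P_ ev_io* w, int revents) {
    // read() from w->fd and process the data here
    (void)revents;
}

int main() {
    struct ev_loop* loop = EV_DEFAULT;
    ev_io watcher;
    ev_io_init(&watcher, on_readable, /*fd=*/0, EV_READ);
    ev_io_start(loop, &watcher);
    ev_run(loop, 0);    // dispatches callbacks until ev_break() is called
}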
If you have any more questions on any of that, feel free to ask.
Our team is implementing a VNC viewer (=VNC client) on Windows. The protocol (called RFB) is stateful, meaning that the viewer has to read 1 byte, see what it is, then read either 3 or 10 bytes more, parse them, and so on.
We've decided to use asynchronous sockets and a single (UI) thread. Consequently, there are 2 ways to go:
1) state machine -- if we get a block on socket reading, just remember the current state and quit. Later on, a socket notification will arrive and the interrupted logic will resume from the proper stage;
2) inner message loop -- once we determine that reading from the socket would block, we enter an inner message loop and spin there until all the necessary data is finally received.
The UI is thus not frozen in case of a block.
As experience showed, the second approach is bad, as any message can come while we're in the inner message loop. I cannot tell the full story here, but it simply is not reliable enough. Crashes and kludges.
The first option seems to be quite acceptable, but it is not easy to program in such a style. One has to remember the state of an algorithm and values of all the local variables required for further processing.
It is quite possible to use multiple threads, but we just thought that the problems in that case would be even harder: synchronization of frame-buffer access, multi-threading issues, etc. Moreover, even in that variant it seems necessary to use asynchronous sockets as well.
So, which way is the best in your opinion?
The problem is quite a general one. This is the problem of organizing asynchronous communication through stateful protocols.
Edit 1: We use C++ and MFC as UI framework.
I've done a few parallel computing projects and it seems that MPI (Message Passing Interface) might be helpful to your VNC project. You're probably not so interested in the parallel computing power provided by MPI, but you may want to use the simplified socket-like interface for asynchronous communication over a network.
http://www.open-mpi.org/
You can find other implementations of MPI and plenty of usage examples via Google.
Don't bother with CSocket, you'll move to CAsyncSocket in the end because of the extra control you get (interrupting, shutting down etc.). I'd also recommend using a separate thread to manage the communication, it adds complexity but keeping the UI responsive should be a top priority.
I think you will find that your design will be simplified greatly by using a separate thread to handle a blocking socket.
The main reason for this is that you don't need to spin and wait. The UI remains responsive, while the network thread blocks when it has nothing to do and comes back when it has work to do. You are effectively offloading a large portion of your overhead to the OS.
Remember, RFB does not require a whole lot of state info to work. Client-to-server messages are short; there is nothing requiring you to receive a frame buffer before you send your next pointer input.
My point is that messages in RFB can be intermixed; the server will work on your schedule.
Now, Windows provides easy-to-use synchronization APIs that, while not always the most efficient, are more than enough for your purposes and will ease getting a proof of concept up and going.
Take a look at Windows Synchronization and specifically Critical Sections
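As a small sketch of what that might look like here (names and the shared frame buffer are illustrative, not your actual code): both the network thread and the UI thread go through the same CRITICAL_SECTION when touching shared state.

#include <windows.h>
#include <cstring>
#include <vector>

CRITICAL_SECTION g_fbLock;
std::vector<unsigned char> g_frameBuffer;   // assumed already sized to the remote screen

void InitLocks()    { InitializeCriticalSection(&g_fbLock); }
void DestroyLocks() { DeleteCriticalSection(&g_fbLock); }

// Network thread: apply a decoded rectangle to the frame buffer.
void ApplyUpdate(const unsigned char* data, size_t len, size_t offset) {
    EnterCriticalSection(&g_fbLock);
    std::memcpy(&g_frameBuffer[offset], data, len);
    LeaveCriticalSection(&g_fbLock);
}

// UI thread: copy out what it needs before painting.
void SnapshotForPaint(std::vector<unsigned char>& out) {
    EnterCriticalSection(&g_fbLock);
    out = g_frameBuffer;
    LeaveCriticalSection(&g_fbLock);
}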
Just my 2 cents. I've implemented both a VNC server and client on Windows; these were my impressions.