C++ Concurrent GET requests - c++

I am writing a C++ application and would like to request several data files simultaneously through HTTP GET requests. Where should I look to get started? (It needs to be cross-platform.)
Run Application
Create a list of URLs { "http://host/file1.txt", "http://host/file2.txt", "http://host/file3.txt"}
Request all the URLs simultaneously and load the contents to variables (don't want disk writes). Each file has about 10kB of data.
What libraries would you recommend I use? libcurl? curlpp? Boost.Asio? Would I need to roll my own multithreading to request all the files simultaneously? Is there an easier way?
Edit: I will need to make about 1000 GET requests simultaneously. Most likely I will do this in batches (100 at a time, creating more connections as earlier ones complete).

I would recommend libcurl. I'm not super-familiar with it, but it does have a multi-interface for performing multiple simultaneous HTTP operations.
Depending on what solution you go with, it's possible to do asynchronous I/O without using multithreading. The key is to use the select(2) system call. select() takes a set of file descriptors and tells you if any of them have data available. If they do, you can then proceed to use read(2) or recv(2) on them without worrying about blocking.
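A minimal sketch of the select() approach described above, assuming a POSIX system. The helper names wait_readable and read_available are made up for illustration, and the descriptors could be sockets or pipes. On Windows, select() is only available for sockets through Winsock, so a cross-platform program would normally let a library such as libcurl or Boost.Asio hide this step.

```cpp
#include <sys/select.h>
#include <unistd.h>

#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: wait up to timeout_ms for any descriptor in fds to
// become readable and return the ones that can be read without blocking.
// Assumes every descriptor is smaller than FD_SETSIZE.
std::vector<int> wait_readable(const std::vector<int>& fds, int timeout_ms) {
    fd_set read_set;
    FD_ZERO(&read_set);
    int max_fd = -1;
    for (int fd : fds) {
        FD_SET(fd, &read_set);
        if (fd > max_fd) max_fd = fd;
    }

    timeval tv;
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    std::vector<int> ready;
    int n = select(max_fd + 1, &read_set, nullptr, nullptr, &tv);
    if (n <= 0) return ready;  // timeout (0) or error (-1): nothing to read
    for (int fd : fds) {
        if (FD_ISSET(fd, &read_set)) ready.push_back(fd);
    }
    return ready;
}

// Hypothetical helper: read what is currently available (at most 4096 bytes).
// Call it only on a descriptor that wait_readable reported as ready.
std::string read_available(int fd) {
    char buf[4096];
    ssize_t n = read(fd, buf, sizeof buf);
    return n > 0 ? std::string(buf, static_cast<std::size_t>(n)) : std::string();
}
```

A single thread can loop over wait_readable and read_available for all open connections, which is the same basic idea that an event-driven library builds on.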

Web browsers often maintain a pool of worker threads to do downloads, and assign downloads to them as they become free. IIRC the HTTP RFC has something to say about how many simultaneous connections you should make to the same server: too many is rude.
If several of the requests are to the same server, and it supports keep-alive (which almost everyone does), then that may be better behaviour than spamming it with multiple simultaneous requests. The general idea is that you use one TCP/IP connection for multiple requests in series, thus saving the handshaking overhead. The practical result, in my experience of implementing Java HTTPConnection classes, is that you introduce a subtle bug to do with not always clearing the state correctly when you re-use the connection for a new request, and spend considerable time staring at logging/sniffer data ;-)
libcurl certainly supports keepalive (enabled by default, I think).
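The worker-pool idea above can be sketched with the standard library alone. This is only an illustration under stated assumptions: fetch_all and FetchFn are hypothetical names, and the fetch callback stands in for whatever actually performs the GET (for example a wrapper around a libcurl easy handle). The callback must be safe to call from several threads and must not throw.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Hypothetical callback type: takes a URL, returns the response body.
using FetchFn = std::function<std::string(const std::string&)>;

// Download every URL using at most max_workers threads at a time.
// Results are kept in memory, in the same order as the input URLs.
std::vector<std::string> fetch_all(const std::vector<std::string>& urls,
                                   std::size_t max_workers,
                                   const FetchFn& fetch) {
    std::vector<std::string> bodies(urls.size());
    std::atomic<std::size_t> next{0};  // index of the next URL to hand out

    auto worker = [&] {
        for (;;) {
            std::size_t i = next.fetch_add(1);
            if (i >= urls.size()) return;  // no work left
            bodies[i] = fetch(urls[i]);    // each index is written by one thread only
        }
    };

    std::size_t n = std::min(max_workers, urls.size());
    std::vector<std::thread> pool;
    for (std::size_t w = 0; w < n; ++w) pool.emplace_back(worker);
    for (auto& t : pool) t.join();
    return bodies;
}
```

With max_workers set to 100 and 1000 URLs, a worker picks up the next URL as soon as its previous download finishes, so at most 100 requests are in flight at once.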

Related

Does the gRPC async server have better performance than the sync server with a restricted number of threads?

I need to implement a server to which multiple clients can send requests simultaneously. The code processing an individual request might block (with the thread going to sleep) in the middle.
At the moment I am using the C++ gRPC synchronous server. Each time a client sends a request, a new thread is spawned on the server's side. This is a problem, since the server can create too many threads simultaneously.
I am considering two solutions to avoid the problem:
1) Use the sync server with the ResourceQuota (e.g. restrict the max number of threads to 10).
2) Use the async server.
Implementing the second solution is considerably more difficult than implementing the first. What advantage (if any) would the second solution give compared to the first one? Which solution would give better results in terms of:
The amount of time an individual client needs to wait to get a response to RPC
The resources (memory, threads) used on the server.

Design a multi-client server application where clients send messages infrequently

I have to design a server that can send the same objects to many clients. A client may send a request to the server if it wants to update something in the database.
Things which are confusing:
1) My server should start the program (where I perform some operations and produce 'results', which will be sent to the clients).
2) My server should listen for incoming connections from clients; if there are any, it should accept them and start sending the 'results'.
3) The server should accept as many clients as possible (but not more than 100).
4) My 'results' should be secured. I don't want someone to take my 'results' and work out what my program logic looks like.
I thought point 1 would be one thread, and point 2 another thread that creates multiple threads within its scope to serve point 3. Point 4 should be handled by my application logic while serialising the 'results', rather than by the server.
Is it a bad idea? If so, where can I improve?
Thanks
Putting every connection on its own thread is very bad, and is apparently a common mistake that beginners make. Every thread costs about 1 MB of memory, which bloats your program for no good reason. I asked the very same question before and got a very good answer. I used Boost.Asio; the server/client project has been finished for months and is now running beautifully.
If you use C++ and SSL (to secure your connection), no one will see your logic, since your programs are compiled. But you have to write your own communication protocol/serialization in that case.

how can we tell if the remote server is multi-threaded?

My customer did not give me details regarding the nature of its application. It might be multithreaded, or it might not. Its server serves SOAP messages (HTTP requests).
Is there any special trick in order to understand if the peer is single or multi threaded?
I don't want to ask the customer and I don't have access to his server/machine. I want to find it myself.
It's irrelevant. Why do you feel it matters to you?
A more useful question would be:
Can the server accept multiple simultaneous sessions?
The answer is likely to be 'yes, of course' but it's certainly possible to implement a server that's incapable of supporting multiple sessions.
Just because a server supports multiple sessions, it doesn't mean that it's multi-threaded. And, just because it's multi-threaded doesn't mean it will have good performance. When servers need to support many hundreds or thousands of sessions, multi-threading may be a very poor choice for performance.
Are you asking this question because you want to 'overlap' SOAP messages on the same connection - in other words, have three threads send requests, and then all three wait for a response? That won't work, because (like HTTP) request and response messages are paired together on each connection. You would need to open three connections in order to have three overlapped messages.
Unfortunately, no, at least not without accessing the computer directly. Multiple connections can even be managed by a single thread, however the good news is that this is highly unlikely. Most servers use thread pooling and assign a thread to a connection upon a handshake. Is there a particular reason why you need to know? If you're presumably going to work on this server, you'll know first-hand how it works.
It doesn't matter if the server is multithreaded or not. There are good and efficient ways to implement I/O multiplexing without threads [like select(2) and suchlike], if that's what worries you.

Increasing SSL handshaking performance

I've got a short-lived client process that talks to a server over SSL. The process is invoked frequently and only runs for a short time (typically for less than 1 second). This process is intended to be used as part of a shell script used to perform larger tasks and may be invoked pretty frequently.
The SSL handshaking it performs each time it starts up is showing up as a significant performance bottleneck in my tests and I'd like to reduce this if possible.
One thing that comes to mind is taking the session id and storing it somewhere (kind of like a cookie), and then re-using this on the next invocation, however this is making me feel uneasy as I think there would be some security concerns around doing this.
So, I've got a couple of questions,
Is this a bad idea?
Is this even possible using OpenSSL?
Are there any better ways to speed up the SSL handshaking process?
After the handshake, you can get the SSL session information from your connection with SSL_get_session(). You can then use i2d_SSL_SESSION() to serialise it into a form that can be written to disk.
When you next want to connect to the same server, you can load the session information from disk, then unserialise it with d2i_SSL_SESSION() and use SSL_set_session() to set it (prior to SSL_connect()).
The on-disk SSL session should be readable only by the user that the tool runs as, and stale sessions should be overwritten and removed frequently.
You should be able to use a session cache securely (which OpenSSL supports), see the documentation on SSL_CTX_set_session_cache_mode, SSL_set_session and SSL_session_reused for more information on how this is achieved.
Could you perhaps use a persistent connection, so the setup is a one-time cost?
You could abstract away the connection logic so your client code still thinks it's doing a connect/process/disconnect cycle.
Interestingly enough I encountered an issue with OpenSSL handshakes just today. The implementation of RAND_poll, on Windows, uses the Windows heap APIs as a source of random entropy.
Unfortunately, due to a "bug fix" in Windows 7 (and Server 2008), the heap enumeration APIs (which are debugging APIs after all) can now take over a second per call once the heap is full of allocations. This means that both SSL connects and accepts can take anywhere from 1 second to more than a few minutes.
The Ticket contains some good suggestions on how to patch openssl to achieve far FAR faster handshakes.

Message queuing solutions?

(Edited to try to explain better)
We have an agent, written in C++ for Win32. It needs to periodically post information to a server. It must support disconnected operation. That is: the client doesn't always have a connection to the server.
Note: This is for communication between an agent running on desktop PCs, to communicate with a server running somewhere in the enterprise.
This means that the messages to be sent to the server must be queued (so that they can be sent once the connection is available).
We currently use an in-house system that queues messages as individual files on disk, and uses HTTP POST to send them to the server when it's available.
It's starting to show its age, and I'd like to investigate alternatives before I consider updating it.
It must be available by default on Windows XP SP2, Windows Vista and Windows 7, or must be simple to include in our installer.
This product will be installed (by administrators) on a couple of hundred thousand PCs. They'll probably use something like Microsoft SMS or ConfigMgr. In this scenario, "frivolous" prerequisites are frowned upon. This means that, unless the client-side code (or a redistributable) can be included in our installer, the administrator won't be happy. This makes MSMQ a particularly hard sell, because it's not installed by default with XP.
It must be relatively simple to use from C++ on Win32.
Our client is an unmanaged C++ Win32 application. No .NET or Java on the client.
The transport should be HTTP or HTTPS. That is: it must go through firewalls easily; no RPC or DCOM.
It should be relatively reliable, with retries, etc. Protection against replays is a must-have.
It must be scalable -- there's a lot of traffic. Per-message impact on the server should be minimal.
The server end is C#, currently using ASP.NET to implement a simple HTTP POST mechanism.
(The slightly odd one). It must support client-side in-memory queues, so that we can avoid spinning up the hard disk. It must allow flushing to disk periodically.
It must be suitable for use in a proprietary product (i.e. no GPL, etc.).
How is your current solution showing its age?
I would push the logic on to the back end, and make the clients extremely simple.
Messages are simply stored in the file system. Have the client write to c:/queue/{uuid}.tmp. When the file is written, rename it to c:/queue/{uuid}.msg. This makes writing messages to the queue on the client "atomic".
A C++ thread wakes up, scans c:\queue for "*.msg" files, and if it finds one it checks for the server and HTTP POSTs the message to it. When it receives the 200 status back from the server (i.e. the server has got the message), it can delete the file. It only scans for *.msg files. The *.tmp files are still being written to, and you'd have a race condition if you tried to send a file that was still being written. That's what the rename from .tmp is for. I'd also suggest scanning by creation date so early messages go first.
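The write-then-rename queue described above might look roughly like this in C++17. It is a sketch under stated assumptions: enqueue, pending and flush are hypothetical names, and send stands in for the real HTTP POST, returning true only when the server answered 200. std::filesystem does not expose a creation time, so the sketch orders files by last write time instead.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <filesystem>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Write the message under a .tmp name, then rename it to .msg so the
// sender never sees a half-written file.
void enqueue(const fs::path& dir, const std::string& id, const std::string& body) {
    fs::create_directories(dir);
    const fs::path tmp = dir / (id + ".tmp");
    {
        std::ofstream out(tmp, std::ios::binary);
        out << body;
    }  // the stream is closed here, before the rename
    fs::rename(tmp, dir / (id + ".msg"));
}

// List completed messages (*.msg only), oldest first by last write time.
std::vector<fs::path> pending(const fs::path& dir) {
    std::vector<fs::path> files;
    for (const auto& entry : fs::directory_iterator(dir)) {
        if (entry.path().extension() == ".msg") files.push_back(entry.path());
    }
    std::sort(files.begin(), files.end(), [](const fs::path& a, const fs::path& b) {
        return fs::last_write_time(a) < fs::last_write_time(b);
    });
    return files;
}

// Try to send every pending message. A file is deleted only after send()
// reports success; on failure the remaining files wait for the next scan.
template <typename Send>
std::size_t flush(const fs::path& dir, Send send) {
    std::size_t sent = 0;
    for (const auto& file : pending(dir)) {
        std::ifstream in(file, std::ios::binary);
        std::string body((std::istreambuf_iterator<char>(in)),
                         std::istreambuf_iterator<char>());
        in.close();
        if (!send(body)) break;  // server unavailable: keep the file
        fs::remove(file);
        ++sent;
    }
    return sent;
}
```

A background thread would call flush periodically. Because the file is removed only after a successful send, a crash between the two steps causes a resend rather than a lost message.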
Your server receives the message, and here it can do any necessary dupe checking. Push this burden onto the server to centralize it. You could simply record the uuid of every message to eliminate duplicates. If that list gets too long (I don't know your traffic volume), perhaps you can cull items older than 30 days (I also don't know how long your clients can remain offline).
This system is simple, but pretty robust. If the file sending thread gets an error, it will simply try to send the file next time. The only time you should be getting a duplicate message is in the window between when the client gets the 200 ack from the server and when it deletes the file. If the client shuts down or crashes at that point, you will have a file that has been sent but not removed from the queue.
If your clients are stable, this is a pretty low risk. With dupe checking based on the message ID, you can mitigate it at the cost of some bookkeeping. Maintaining a list of uuids isn't spectacularly daunting, but again it depends on your message volume and other performance requirements.
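The uuid bookkeeping could be as small as the sketch below. It is written in C++ to match the client code, although the server in the question is C#, and the logic carries over directly. SeenMessages is a hypothetical name, the store is in memory only, and the current time is passed in explicitly to keep the example easy to test.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical in-memory record of message ids that were already processed.
class SeenMessages {
public:
    using Clock = std::chrono::system_clock;

    explicit SeenMessages(std::chrono::hours max_age) : max_age_(max_age) {}

    // Returns true the first time an id is seen, false for a duplicate.
    bool accept(const std::string& id, Clock::time_point now) {
        return seen_.emplace(id, now).second;
    }

    // Forget ids recorded more than max_age ago; returns how many were dropped.
    std::size_t cull(Clock::time_point now) {
        std::size_t dropped = 0;
        for (auto it = seen_.begin(); it != seen_.end();) {
            if (now - it->second > max_age_) {
                it = seen_.erase(it);
                ++dropped;
            } else {
                ++it;
            }
        }
        return dropped;
    }

private:
    std::chrono::hours max_age_;
    std::unordered_map<std::string, Clock::time_point> seen_;
};
```

The server would call accept for each incoming message, skip processing when it returns false, and still answer 200 so the client deletes its file. A real server would probably keep the ids in its database rather than in process memory.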
The fact that you are allowed to work "offline" suggests you have some "slack" in your absolute messaging performance.
To be honest, the requirements listed don't make a lot of sense and show you have a long way to go in your MQ learning. Given that, if you don't want to use MSMQ (probably the easiest overall on Windows -- but with [IMO severe] limitations), then you should look into:
qpid - Decent use of AMQP standard
zeromq - (the best, IMO, technically but also requires the most familiarity with MQ technologies)
I'd recommend rabbitmq too, but that's an Erlang server and last I looked it didn't have usable C or C++ libraries. Still, if you are shopping for an MQ, take a look at it...
[EDIT]
I've gone back and reread your reqs as well as some of your comments and think, for you, that perhaps client MQ -> server is not your best option. I would maybe consider letting your client -> server operations be HTTP POST or SOAP and allow the HTTP endpoint in turn queue messages on your MQ backend. IOW, abstract away the MQ client into an architecture you have more control over. Then your C++ client would simply be HTTP (easy), and your HTTP service (likely C# / .Net from reading your comments) can interact with any MQ backend of your choice. If all your HTTP endpoint does is spawn MQ messages, it'll be pretty darned lightweight and can scale through all the traditional load balancing techniques.
Last time I wanted to do any messaging I used C# and MSMQ. There are MSMQ libraries available that make using MSMQ very easy. It's free to install on your servers, and it has never lost a message to this day. It handles reboots etc. all by itself. It's a thing of beauty, and hundreds of thousands of messages are processed daily.
I'm not sure why you ruled out MSMQ and I didn't get point 2.
Quite often for queues we just dump record data into a database table and another process lifts rows out of the table periodically.
How about using the Asynchronous Agents Library (part of the Concurrency Runtime in Visual C++ 2010)? It is still in beta, though.
http://msdn.microsoft.com/en-us/library/dd492627(VS.100).aspx