I am looking to use libcurl for asynchronous HTTP requests, via the multi interface it provides. My application will have many requests arriving periodically, for which I want to use a single easy handle added to the multi interface. I am not planning to create a new easy handle for each HTTP request, because that opens a new connection with a new session. I need all requests to go over a single connection/session, so I want to use a single easy handle for all requests.
With this model, I am running into problems making multiple HTTP requests. The first request goes through curl_multi_perform with no issue and the response is processed. The second request, however, does not go through: when curl_multi_perform is called the second time, the running_handles parameter comes back as 0 instead of 1.
This is the flow of APIs I am using, at a high level.
curl_easy_init()
curl_multi_init()
curl_multi_add_handle()
curl_multi_perform() // running_handles returned is 1.
//look for response (curl_multi_timeout, curl_multi_fdset, select, curl_multi_info_read, ...)
curl_multi_perform() // This does not work and running_handles returned is 0
...
curl_multi_cleanup()
curl_easy_cleanup()
Can't the libcurl multi interface be used with a single easy handle added to it for multiple requests arriving over a period of time?
Please help. Thanks in advance.
When an easy handle has completed its transfer and you want to re-use that same handle for a subsequent transfer, you first need to remove it from the multi handle (curl_multi_remove_handle), possibly set new options, and then re-add it with curl_multi_add_handle to make it start another transfer.
But note that when using the multi interface, the connection pool and reuse mechanism is owned by the multi handle rather than the easy handle, so connections can and will be re-used across easy handles as long as you keep the multi handle alive.
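Roughly like this (a minimal sketch with placeholder URLs and a condensed wait loop, not a drop-in implementation):

// Minimal sketch, assuming placeholder URLs (not the asker's actual code):
// re-using one easy handle for sequential transfers on the same multi handle.
#include <curl/curl.h>

int main()
{
    curl_global_init(CURL_GLOBAL_DEFAULT);

    CURL  *easy  = curl_easy_init();
    CURLM *multi = curl_multi_init();

    const char *urls[] = { "http://example.com/a", "http://example.com/b" };

    for (int i = 0; i < 2; ++i) {
        curl_easy_setopt(easy, CURLOPT_URL, urls[i]);
        curl_multi_add_handle(multi, easy);

        int running = 1;
        while (running) {
            curl_multi_perform(multi, &running);
            if (running)
                curl_multi_wait(multi, nullptr, 0, 1000, nullptr); // or curl_multi_fdset + select
        }

        // Transfer done: remove the handle so it can be re-added
        // (with new options if needed) for the next request.
        curl_multi_remove_handle(multi, easy);
    }

    curl_multi_cleanup(multi);
    curl_easy_cleanup(easy);
    curl_global_cleanup();
    return 0;
}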
Related
I am working on a module which uses 10 queues to handle threads, each of which sends curl requests using the curl_easy interface (along with a lock) so that a single connection is maintained until the response is received. I want to enhance request handling by using the curl_multi interface, where curl requests are sent by the threads and handled in parallel.
I have created separate code to implement it. For instance, I created 3 threads, handled one by one; the first thread keeps sending requests to curl_multi while it is running and transfers remain, allocating resources via the curl_easy interface for each transfer.
I have gone through a lot of examples but cannot figure out how to implement it in C++. Also, because I have only recently learnt multithreading and curl concepts in C++, I need assistance with the approach.
I expect a single thread should be able to send curl requests until the user stops sending them.
Update - I have added two threads and each sends two requests simultaneously. curl_multi is handled through an array of curl_easy handles.
I want to keep it free of arrays, because the array limits the number of requests.
Can it be made asynchronous, so that it accepts all transfers and exits only when the client/user does? There are plenty of curl_multi examples, but I am still not clear on how to implement this.
Reading the curl_multi documentation, it doesn't seem like you have to create different threads for this, as it works via your multiple easy handles added to the multi handle object. You then call curl_multi_perform to start all transfers in a non-blocking way.
I expect a single thread should be able to send curl requests until the user stops sending them.
I don't understand what you mean by this. Do you mean that you just want to keep those connections alive until everything is transferred? If so, curl_multi already gives you information on the progress of your transfers, which can help you determine what to do.
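Something along these lines (a minimal single-threaded sketch with placeholder URLs, not your actual module): one thread drives the multi handle, each transfer gets its own easy handle, and no fixed-size array is involved.

// Single-threaded sketch, assuming placeholder URLs: many easy handles
// driven by one multi handle, no extra threads, no fixed-size array.
#include <curl/curl.h>
#include <string>
#include <vector>

int main()
{
    curl_global_init(CURL_GLOBAL_DEFAULT);
    CURLM *multi = curl_multi_init();

    std::vector<std::string> urls = {
        "http://example.com/1", "http://example.com/2", "http://example.com/3"
    };

    // One easy handle per transfer; the multi handle owns the connection pool.
    for (const auto &u : urls) {
        CURL *easy = curl_easy_init();
        curl_easy_setopt(easy, CURLOPT_URL, u.c_str());
        curl_multi_add_handle(multi, easy);
    }

    int running = 0;
    do {
        curl_multi_perform(multi, &running);            // non-blocking drive
        if (running)
            curl_multi_wait(multi, nullptr, 0, 1000, nullptr);

        // Reap finished transfers as they complete.
        int msgs_left = 0;
        while (CURLMsg *msg = curl_multi_info_read(multi, &msgs_left)) {
            if (msg->msg == CURLMSG_DONE) {
                curl_multi_remove_handle(multi, msg->easy_handle);
                curl_easy_cleanup(msg->easy_handle);
            }
        }
    } while (running);

    curl_multi_cleanup(multi);
    curl_global_cleanup();
    return 0;
}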
Hope it helps
Context (C++): I need to develop a network server that can handle more than 1000 clients per second, with more than 100 requests per second.
Each request starts a state machine between the client and server, wherein the client and server exchange further data before the server sends a final response.
Problem: Some of the processing is done by a third-party library that requests callbacks from us and calls these callbacks when it requires some data from the client. So we don't control this thread and must wait for the data from the client before we can process further.
Question: With such a high volume of messages, we decided we would use libevent or one of its derivatives, e.g. https://github.com/facebook/wangle or https://github.com/Qihoo360/evpp.
The problem is that libevent is based on the reactor pattern, and we do not have a way to leave processing in a thread as soon as it enters the state machine.
So my question is whether the proactor pattern would be better here, and is there any library that can give us this behavior?
[Edit1]
OK, so after much deliberation, we decided that we should go ahead and put a "proxy" in front of our application. This proxy can then distribute the load to multiple running instances of our application that use this 3rd party library. Then we can use the reactor pattern.
Any other suggestions are welcome.
From the examples and documentation, it seems the libcurl multi interface provides asynchronous support in batch mode, i.e. easy handles are added to the multi handle and then the requests are finally fired simultaneously with curl_multi_socket_action. Is it possible to trigger a request when an easy handle is added, with control returning to the application once the request has been written to the socket?
EDIT:
It would help to fire requests in the model below, instead of firing them in a batch (assuming that request creation on the client side and processing on the server take the same duration):
Client -----|-----|-----|-----|
Server < >|-----|-----|-----|----|
The multi interface returns "control" to the application as soon as it would otherwise block. It will therefore also return control after it has sent off the request.
But I guess you're asking how you can figure out exactly when the request has been sent? I think that's only really possible by using CURLOPT_DEBUGFUNCTION and seeing when the request is sent. Not really a convenient way...
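If you want to go that route, a hedged sketch could look like this; the callback itself is a real libcurl hook, but treating CURLINFO_HEADER_OUT / CURLINFO_DATA_OUT as "the request has been written" is the assumption made here:

// Sketch of the CURLOPT_DEBUGFUNCTION approach; the URL is a placeholder.
#include <curl/curl.h>
#include <cstdio>

static int debug_cb(CURL *handle, curl_infotype type,
                    char *data, size_t size, void *userptr)
{
    (void)handle; (void)data; (void)size; (void)userptr;
    // Outgoing headers/body passed through here roughly when they are
    // handed to the socket.
    if (type == CURLINFO_HEADER_OUT || type == CURLINFO_DATA_OUT)
        std::fprintf(stderr, "request bytes written to the socket\n");
    return 0;
}

int main()
{
    curl_global_init(CURL_GLOBAL_DEFAULT);
    CURL *easy = curl_easy_init();
    curl_easy_setopt(easy, CURLOPT_URL, "http://example.com/");
    curl_easy_setopt(easy, CURLOPT_DEBUGFUNCTION, debug_cb);
    curl_easy_setopt(easy, CURLOPT_VERBOSE, 1L);  // the debug callback only fires in verbose mode
    curl_easy_perform(easy);                      // or drive it via the multi interface
    curl_easy_cleanup(easy);
    curl_global_cleanup();
    return 0;
}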
You can check this example in the documentation:
https://curl.haxx.se/libcurl/c/hiperfifo.html
It combines libevent and libcurl.
When running, the program creates the named pipe "hiper.fifo". Whenever there is input into the fifo, the program reads the input as a list of URLs and creates new easy handles to fetch each URL via the curl_multi "hiper" API.
The fifo buffer is handled almost instantly, so you can even add more URLs while the previous requests are still being downloaded.
libcurl then downloads all easy handles asynchronously, driven by curl_multi_socket_action, so control returns to the application.
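The core of that pattern, stripped down (simplified names, not the example's exact code):

// When a new URL arrives, create an easy handle and hand it to the
// already-running multi handle; the socket_action event loop picks it up
// and reports completion later via curl_multi_info_read.
#include <curl/curl.h>
#include <string>

struct GlobalInfo {
    CURLM *multi;   // shared multi handle driven by the libevent loop
};

void add_transfer(GlobalInfo &g, const std::string &url)
{
    CURL *easy = curl_easy_init();
    curl_easy_setopt(easy, CURLOPT_URL, url.c_str());
    curl_easy_setopt(easy, CURLOPT_FOLLOWLOCATION, 1L);
    curl_multi_add_handle(g.multi, easy);   // kicks off the transfer asynchronously
}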
I am debugging an ASMX web service that receives "bursts" of requests, i.e. it is likely that the web service will receive 100 asynchronous requests within about 1 or 2 seconds. Each request seems to take about a second to process (this is expected and I'm OK with this performance). What is important, however, is that each request is dealt with sequentially and no parallel processing takes place. I do not want any concurrent request processing because of the external components called by the web service. Is there any way I can force the web service to handle each request sequentially?
I have seen the maxconnection attribute in machine.config, but this seems to apply only to outbound connections, whereas I wish to throttle incoming connections.
Please note that refactoring into WCF is not an option at this point in time.
We are using IIS6 on Win2003.
What I've done in the past is simply put a lock statement around any access to the external resource I was using. In my case, it was a piece of unmanaged code that claimed to be thread-safe, but which in fact would trash the C runtime library heap if accessed from more than one thread at a time.
Perhaps you should be queuing the requests up internally and processing them one by one?
It may cause the clients to poll for results (if they even need them), but you'd get the sequential pipeline you wanted...
In IIS7 you can set a limit on the number of connections allowed to a web site. Can you use IIS7?
I am writing a C++ application and would like to request several data files through HTTP GET requests simultaneously. Where should I look to get started (it needs to be cross-platform)?
Run Application
Create a list of URLs { "http://host/file1.txt", "http://host/file2.txt", "http://host/file3.txt"}
Request all the URLs simultaneously and load the contents into variables (I don't want disk writes). Each file has about 10 kB of data.
What libraries would you recommend I use? libcurl? curlpp? Boost.Asio? Would I need to roll my own multithreading to request all the files simultaneously? Is there an easier way?
Edit: I will need to make about 1000 GET requests simultaneously. Most likely I will do this in batches (100 at a time, creating more connections as earlier ones complete).
I would recommend libcurl. I'm not super-familiar with it, but it does have a multi interface for performing multiple simultaneous HTTP operations.
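As a rough sketch of what that could look like for your case (using the URLs from your list; not a polished implementation), each transfer writes into its own std::string through a write callback, so nothing hits the disk:

// Multi interface sketch: download several URLs concurrently into memory.
#include <curl/curl.h>
#include <string>
#include <vector>

static size_t write_cb(char *ptr, size_t size, size_t nmemb, void *userdata)
{
    auto *out = static_cast<std::string *>(userdata);
    out->append(ptr, size * nmemb);
    return size * nmemb;
}

int main()
{
    curl_global_init(CURL_GLOBAL_DEFAULT);
    CURLM *multi = curl_multi_init();

    std::vector<std::string> urls = {
        "http://host/file1.txt", "http://host/file2.txt", "http://host/file3.txt"
    };
    std::vector<std::string> bodies(urls.size());

    for (size_t i = 0; i < urls.size(); ++i) {
        CURL *easy = curl_easy_init();
        curl_easy_setopt(easy, CURLOPT_URL, urls[i].c_str());
        curl_easy_setopt(easy, CURLOPT_WRITEFUNCTION, write_cb);
        curl_easy_setopt(easy, CURLOPT_WRITEDATA, &bodies[i]);
        curl_multi_add_handle(multi, easy);
    }

    int running = 0;
    do {
        curl_multi_perform(multi, &running);
        if (running)
            curl_multi_wait(multi, nullptr, 0, 1000, nullptr);
    } while (running);

    // bodies[i] now holds the contents of urls[i]; clean up the easy handles.
    int msgs_left = 0;
    while (CURLMsg *msg = curl_multi_info_read(multi, &msgs_left)) {
        if (msg->msg == CURLMSG_DONE) {
            curl_multi_remove_handle(multi, msg->easy_handle);
            curl_easy_cleanup(msg->easy_handle);
        }
    }
    curl_multi_cleanup(multi);
    curl_global_cleanup();
    return 0;
}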
Depending on what solution you go with, it's possible to do asynchronous I/O without using multithreading. The key is to use the select(2) system call. select() takes a set of file descriptors and tells you if any of them have data available. If they do, you can then proceed to use read(2) or recv(2) on them without worrying about blocking.
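For example, something like this (POSIX sockets, assuming the descriptors are already connected):

// Block until any socket is readable, then recv() only from the ready ones
// so nothing blocks.
#include <sys/select.h>
#include <sys/socket.h>
#include <algorithm>
#include <vector>

void read_whatever_is_ready(const std::vector<int> &socks)
{
    fd_set readfds;
    FD_ZERO(&readfds);
    int maxfd = -1;
    for (int fd : socks) {
        FD_SET(fd, &readfds);
        maxfd = std::max(maxfd, fd);
    }

    // Wait until at least one socket has data (no timeout here).
    if (select(maxfd + 1, &readfds, nullptr, nullptr, nullptr) > 0) {
        char buf[4096];
        for (int fd : socks)
            if (FD_ISSET(fd, &readfds))
                recv(fd, buf, sizeof(buf), 0);   // will not block: data is ready
    }
}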
Web browsers often maintain a pool of worker threads to do downloads, and assign downloads to them as they become free. IIRC the HTTP RFC has something to say about how many simultaneous connections you should make to the same server at the same time: too many is rude.
If several of the requests are to the same server, and it supports keep-alive (which almost everyone does), then that may be better behaviour than spamming it with multiple simultaneous requests. The general idea is that you use one TCP/IP connection for multiple requests in series, thus saving the handshaking overhead. The practical result, in my experience of implementing Java HTTPConnection classes, is that you introduce a subtle bug to do with not always clearing the state correctly when you re-use the connection for a new request, and spend considerable time staring at logging/sniffer data ;-)
libcurl certainly supports keepalive (enabled by default, I think).