curl_easy_perform using multiple threads in C++

So far, I've been successful in pulling information from a service provider. However, I need to run this in parallel, across multiple threads, for millions of requests.
Here is the code:
size_t WriteCallback(void *contents, size_t size, size_t nmemb, void *userp)
{
((std::string*)userp)->append((char*)contents, size * nmemb);
return size * nmemb;
}
int main()
{
CURL *curl = curl_easy_init();
std::string readBuffer;
if(curl) {
CURLcode res;
curl_easy_setopt(curl, CURLOPT_URL, "service-url");
curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, WriteCallback);
curl_easy_setopt(curl, CURLOPT_WRITEDATA, &readBuffer); // handed to WriteCallback as userp
res = curl_easy_perform(curl);
curl_easy_cleanup(curl);
}
}
Here are my two options
a) One is thread pool (Visual studio C++ 2010 - Thus no access to C++ 11)
b) Using curl_multi_perform
When I use a thread pool, does each curl invocation run on a worker thread? How do I make sure that the WriteCallback is specific to the thread, so that no two threads overwrite each other's contents?
If I use curl_multi_perform, what do I need to do, to make sure that WriteCallback gives me the output for that particular handle?

You can use the multi interface to send X requests simultaneously and handle each one as its response arrives.
Have a look at that C Example.
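Whichever option you pick, the usual way to keep responses apart is to give every transfer its own buffer through CURLOPT_WRITEDATA. libcurl hands that pointer back to the callback as userp, so the callback always knows which request a chunk belongs to. Below is a minimal sketch of the idea, written without C++11 and without calling libcurl. Request and deliver_chunk are hypothetical names that stand in for an easy handle and for libcurl delivering data.

```cpp
#include <cstddef>
#include <string>

// One record per request (hypothetical type, not part of libcurl).
// With libcurl, each easy handle would be configured with
//   curl_easy_setopt(handle, CURLOPT_WRITEDATA, &request.body);
// so the callback receives that request's own buffer in userp.
struct Request {
    std::string url;
    std::string body;
};

// Same signature that CURLOPT_WRITEFUNCTION expects.
size_t WriteCallback(void *contents, size_t size, size_t nmemb, void *userp)
{
    std::string *out = static_cast<std::string *>(userp);
    out->append(static_cast<const char *>(contents), size * nmemb);
    return size * nmemb;
}

// Stand-in for libcurl delivering one chunk of a response (illustration only).
void deliver_chunk(Request &req, const std::string &chunk)
{
    WriteCallback(const_cast<char *>(chunk.data()), 1, chunk.size(), &req.body);
}
```

Even when chunks for different requests arrive interleaved, each one lands in the buffer that was registered for its handle. With a thread pool, each worker would own its easy handle and its Request, so nothing is shared between threads.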

Related

libcurl downloads no data to buffer

I am using the following code to download data from a URL to memory (a stream). About 2% of the time, the stream ends up with size zero. I can download the proper data from the same failing URL if I try again. I am not sure if this is a network issue, a CPU usage issue, or just the code not covering some corner cases. Please advise. Thanks!
static size_t write_data(char *ptr, size_t size, size_t nmemb, void *userdata)
{
std::vector<uchar> *stream = (std::vector<uchar>*)userdata;
size_t count = size * nmemb;
stream->insert(stream->end(), ptr, ptr + count);
return count;
}
static void CurlUrl(const char* img_url, std::vector<uchar>* stream) {
CURL *curl = curl_easy_init(); // curl_global_init is called elsewhere.
curl_easy_setopt(curl, CURLOPT_NOSIGNAL, 1L);
curl_easy_setopt(curl, CURLOPT_URL, img_url);
curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, write_data);
curl_easy_setopt(curl, CURLOPT_WRITEDATA, stream);
curl_easy_setopt(curl, CURLOPT_TIMEOUT, 10L);
CURLcode res = curl_easy_perform(curl);
curl_easy_cleanup(curl);
}
If no data was delivered into the buffer via the callback, the transfer either failed or there were exactly zero bytes to transfer.
1. Check the return code from curl_easy_perform(), as it might tell you exactly what happened.
2. Use CURLOPT_VERBOSE to see what's going on if (1) is not enough.
3. Use CURLOPT_ERRORBUFFER to get a better error description on failure if (2) is not enough.
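To act on those checks, the caller has to keep the result of curl_easy_perform() and the error buffer around and turn them into something loggable. Here is a rough sketch of that last step only, without calling libcurl. describe_transfer and its parameters are hypothetical, and the caller is assumed to pass in what libcurl reported.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: turns the outcome of one transfer into a log line.
//   ok       - whether curl_easy_perform() returned CURLE_OK
//   errbuf   - the buffer registered with CURLOPT_ERRORBUFFER (may be empty)
//   fallback - generic text, e.g. what curl_easy_strerror() returned (non-null)
//   received - number of bytes the write callback stored
std::string describe_transfer(bool ok, const char *errbuf,
                              const char *fallback, std::size_t received)
{
    if (!ok) {
        // Prefer the detailed message when the error buffer was filled in.
        const char *detail = (errbuf && errbuf[0] != '\0') ? errbuf : fallback;
        return std::string("transfer failed: ") + detail;
    }
    if (received == 0)
        return "transfer succeeded but the body was empty";
    return "transfer succeeded";
}
```

Logging this for every request should show whether the empty 2% are failed transfers (for example, the 10-second timeout) or genuinely empty responses.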

Issue with CURLOPT_WRITEDATA

I am using libcurl to fetch JSON data from a web server with a GET request.
This is my sample code:
char *DownloadedResponse;
static int writer(char *data, size_t size, size_t nmemb, char *buffer_in)
{
if (buffer_in != NULL)
{
buffer_in = new char[size*nmemb];
strcpy(buffer_in,data);
DownloadedResponse = buffer_in;
return size * nmemb;
}
return 0;
}
char * DownloadJSON(string URL)
{
CURL *curl;
CURLcode res;
struct curl_slist *headers=NULL;
headers = curl_slist_append(headers, "Accept: application/json"); // keep the returned list head
headers = curl_slist_append(headers, "Content-Type: application/json");
headers = curl_slist_append(headers, "charsets: utf-8");
curl = curl_easy_init();
if (curl)
{
curl_easy_setopt(curl, CURLOPT_HTTPHEADER, headers);
curl_easy_setopt(curl, CURLOPT_URL, URL.c_str());
curl_easy_setopt(curl, CURLOPT_HTTPGET,1);
curl_easy_setopt(curl, CURLOPT_HTTPHEADER, headers);
curl_easy_setopt(curl,CURLOPT_WRITEFUNCTION,writer);
res = curl_easy_perform(curl);
if (CURLE_OK == res)
{
char *ct;
res = curl_easy_getinfo(curl, CURLINFO_CONTENT_TYPE, &ct);
if((CURLE_OK == res) && ct)
{
cout<<"\nresponse received: "<<DownloadedResponse;
}
else
{
curl_slist_free_all(headers);
curl_easy_cleanup(curl);
curl = NULL;
return NULL;
}
}
}
curl_slist_free_all(headers);
curl_easy_cleanup(curl);
curl = NULL;
}
Here I am able to get the JSON data in DownloadedResponse inside the callback "writer" set with CURLOPT_WRITEFUNCTION.
But if I print using a custom pointer set with CURLOPT_WRITEDATA,
char *dataPointer = NULL;
CURLcode curl_easy_setopt(curl, CURLOPT_WRITEDATA, dataPointer);
cout<<dataPointer;
The output of dataPointer is empty.
What is the issue here, since I am able to print the JSON data in the CURLOPT_WRITEFUNCTION callback but not through the CURLOPT_WRITEDATA pointer?
You write a function that takes data read from the network, and writes it to where you want it.
static int writer(char *data, size_t size, size_t nmemb, char *buffer_in){
if (buffer_in != NULL) {
// very bad code which is never executed
}
return 0;
}
In order for that function to write the data, it has to know where to write it, so you tell it to write to NULL
char *dataPointer = NULL;
CURLcode curl_easy_setopt(curl, CURLOPT_WRITEDATA, dataPointer);
What value do you tell it to use as buffer_in? You pass it dataPointer, which is NULL, so you just told it buffer_in = NULL. I think instead you meant to say "the address of dataPointer", which would be &dataPointer.
Technically, I have answered your question now. You passed it NULL for the buffer, so the write function exited immediately. But there's more. Now you get to execute that really bad code in writer().
if (buffer_in != NULL)
{
// if buffer_in already has allocated memory then leak it immediately
// create a new buffer of memory to leak later
buffer_in = new char[size*nmemb];
// store the data in buffer_in
// assume it is null terminated (it is not)
// rather than using the length we already know
strcpy(buffer_in,data);
// remember buffer_in? We don't use it so assign that data pointer to a global variable.
DownloadedResponse = buffer_in;
// return size of this particular chunk of data
return size * nmemb;
}
This function MUST use the length of the data, and not assume data is null terminated (see https://curl.haxx.se/libcurl/c/CURLOPT_WRITEFUNCTION.html).
This function MUST be able to handle the data in multiple small pieces by adding them to what it has already read. You can't call new and then discard the new memory. And you can't do that anyway because you just leaked that memory -- every new must be matched with exactly one delete. In fact, you would be very well advised not to use new or delete at all, now that we have the standard library.
This function should use the buffer_in argument you give it rather than a global variable, but you can use a global variable if you want, it's just error prone. It's not literally an error like the other stuff.
The whole point of buffer_in is to give you a persistent data structure where you can accumulate the answers. It probably should be in local scope around the curl_easy_perform call, so you can then just return the content from that data structure if you got CURLE_OK. I strongly recommend you write the data to a std::vector, so you don't have to keep track of memory allocation. You are having trouble with manual memory management, but you don't need to do it at all. Modern style says everybody has trouble with it, so just let the standard library handle it.
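As a minimal sketch of that recommendation, the callback below appends each chunk to a std::vector<char> using the length it is given. It assumes the vector's address was registered with CURLOPT_WRITEDATA. The names append_to_vector and as_string are made up for this example.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Write callback with the signature libcurl expects. userdata is assumed to
// be the std::vector<char>* that was registered through CURLOPT_WRITEDATA.
size_t append_to_vector(char *data, size_t size, size_t nmemb, void *userdata)
{
    std::vector<char> *buffer = static_cast<std::vector<char> *>(userdata);
    const size_t count = size * nmemb;
    // Use the known length; data is not null-terminated.
    buffer->insert(buffer->end(), data, data + count);
    return count; // tell libcurl that every byte was consumed
}

// Converts the accumulated bytes to a string once the transfer is complete.
std::string as_string(const std::vector<char> &buffer)
{
    return std::string(buffer.begin(), buffer.end());
}
```

Because the vector grows on every call, a response that arrives in several pieces is reassembled in order, and nothing has to be freed by hand.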
You claim to follow the example in the docs, which links to https://curl.haxx.se/libcurl/c/getinmemory.html. If you look again, you will see what they are doing, and how your code doesn't match. In particular, they pass &chunk (the address of chunk) and then write data into chunk, so they keep what was there before.
struct MemoryStruct {
char *memory;
size_t size;
};
static size_t
WriteMemoryCallback(void *contents, size_t size, size_t nmemb, void *userp)
{
// here is where they get access to the buffer
struct MemoryStruct *mem = (struct MemoryStruct *)userp;
In the call to curl, you will find the struct locally defined, then the remote call:
struct MemoryStruct chunk;
curl_easy_setopt(curl_handle, CURLOPT_WRITEFUNCTION, WriteMemoryCallback);
curl_easy_setopt(curl_handle, CURLOPT_WRITEDATA, (void *)&chunk);
res = curl_easy_perform(curl_handle);
if (stuff)
printf("%lu bytes retrieved\n", (unsigned long)chunk.size);

Posting with libcurl in C++, write callback params are garbage

Developing on Win64 with Visual Studio 2013 Community, deploying to both Win64 and Linux with cross platform wxWidgets. I am trying to emulate the following curl.exe command line with C++ using libcurl:
curl.exe -X POST -g "single-url-string"
This is for an IoT feature of an app, where an end-user supplies the single-url-string to control their device. The reason this logic is not just executing curl.exe as an external process is because this logic runs in its own thread, and wxWidgets does not support launching external executables when outside the main thread.
Normally when performing a POST with curl.exe, the post data is supplied as an option. This tells curl.exe the operation is a POST to the supplied url, and here is the data for that POST. As you can see, what I'm trying to do is a GET style url (with the parameters embedded in the url) but then changing the operation to a POST. It's done this way because research shows asking end-users to supply two separate url and data strings is simply too complex for them. So we came up with this easier single string end-users must supply, which is usually just copying a string from their device manual without having to interpret the string, much less break it into separate meaningful strings.
So, the issue at hand is: I have my simple C++ libcurl POST routine in two versions, but in both versions the parameters received by the write callback are bad. The two versions are a POST with a single url string, and a POST with the post data provided as a separate option to the url string.
The problems are 1) using the single string version does not execute a POST, and its write callback params are bad; and 2) using the two string version does execute a POST, but the write callback params are bad, in a different way.
The data pointer parameter in the write callback points to memory address 1 in both versions, the size parameter appears good in both versions, but the nmemb parameter is either a huge random value (single string version) or zero (two string POST version).
Here's my code, and yes I call curl_global_init() at app start.
size_t CX_IOT_THREAD::curl_write_callback(char *ptr, size_t size, size_t nmemb, void *userdata)
{
// storage for transferred data:
const int dataStoreSize = CURL_MAX_WRITE_SIZE + 1;
char dataStore[dataStoreSize];
memset(dataStore, 0, dataStoreSize); // zeroed out
size_t dataSize = size * nmemb; // bytes sent
if (dataSize)
{
memcpy(dataStore, ptr, dataSize); // copy into buffer sized so we'll have a terminating NULL char
wxString msg = wxString::Format(wxT("%s"), dataStore); // send as event, eventually to the log
mp_queue->Report(CX_IOTTHR_CMD_ACCESS_JOB, msg);
// must return byte count processed for libcurl to be happy:
return dataSize; /**/
}
return size; // should be dataSize, but because nmemb is bad, I’m using size; it works.
}
cx_int CX_IOT_THREAD::Post(std::string& url)
{
if (url.length() == 0)
return -1;
char errBuf[CURL_ERROR_SIZE];
errBuf[0] = '\0';
static const char *postthis = "name=Bloke&age=67";
CURLcode ret;
CURL *hnd = curl_easy_init();
curl_easy_setopt(hnd, CURLOPT_URL, url.c_str());
curl_easy_setopt(hnd, CURLOPT_POSTFIELDS, postthis);
curl_easy_setopt(hnd, CURLOPT_POSTFIELDSIZE, (long)strlen(postthis));
curl_easy_setopt(hnd, CURLOPT_ERRORBUFFER, errBuf);
curl_easy_setopt(hnd, CURLOPT_WRITEFUNCTION, &CX_IOT_THREAD::curl_write_callback);
curl_easy_setopt(hnd, CURLOPT_WRITEDATA, NULL);
curl_easy_setopt(hnd, CURLOPT_NOPROGRESS, 1L);
curl_easy_setopt(hnd, CURLOPT_USERAGENT, "curl/7.49.1");
curl_easy_setopt(hnd, CURLOPT_MAXREDIRS, 50L);
// curl_easy_setopt(hnd, CURLOPT_CUSTOMREQUEST, "POST");
ret = curl_easy_perform(hnd);
curl_easy_cleanup(hnd);
if (ret != CURLE_OK)
{
wxString msg = wxString::Format(wxT("Attempted POST failed, libcurl return code '%d'."), (cx_int)ret);
mp_queue->Report(CX_IOTTHR_CMD_ACCESS_JOB, msg, (cx_int)ret);
cx_int len = strlen(errBuf);
if (len > 0)
msg = wxString::Format("%s%s", errBuf, ((errBuf[len - 1] != '\n') ? "\n" : ""));
else msg = wxString::Format("%s\n", curl_easy_strerror(ret));
mp_queue->Report(CX_IOTTHR_CMD_ACCESS_JOB, msg, (cx_int)ret);
}
return (cx_int)ret;
}
Any ideas why the write callback parameters are bad? Any idea why the single string version does not even do a post? (The single string version is the above with the 2 POSTFIELDS options commented out and the CUSTOMREQUEST one enabled.)
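As Igor Tandetnik points out, the callback must be static. libcurl calls it through a plain function pointer, whereas a non-static member function expects a hidden this argument, so the arguments it receives do not line up with the declared parameters.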
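A common pattern, sketched here under a few assumptions, is to make the callback a static member and pass the object through CURLOPT_WRITEDATA so the static function can forward to a regular member. Receiver is a hypothetical class standing in for CX_IOT_THREAD. Note that the question's code passes NULL as CURLOPT_WRITEDATA, which would have to become this for the pattern to work.

```cpp
#include <cstddef>
#include <string>

class Receiver { // hypothetical stand-in for CX_IOT_THREAD
public:
    // static: no hidden `this`, so it matches the plain function-pointer
    // signature that CURLOPT_WRITEFUNCTION expects. With libcurl:
    //   curl_easy_setopt(hnd, CURLOPT_WRITEFUNCTION, &Receiver::write_callback);
    //   curl_easy_setopt(hnd, CURLOPT_WRITEDATA, this);
    static size_t write_callback(char *ptr, size_t size, size_t nmemb, void *userdata)
    {
        Receiver *self = static_cast<Receiver *>(userdata);
        return self->on_data(ptr, size * nmemb);
    }

    const std::string &received() const { return received_; }

private:
    // Ordinary member function: free to use the object's state.
    size_t on_data(const char *ptr, size_t count)
    {
        received_.append(ptr, count);
        return count;
    }

    std::string received_;
};
```

The static function is only a trampoline. All real work stays in on_data, which can reach members such as a message queue.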

Still reachable leak summary in Valgrind for libcurl c++ code

The following functions, which use libcurl, save a file and return the HTTP status code. However, when I run this under valgrind, it reports 0 bytes for "definitely lost", "indirectly lost", and "possibly lost", but 47448 bytes for "still reachable". I'm trying to resolve the "still reachable" bytes.
Are there any potential memory leaks in the code below?
size_t write_data(void *ptr, size_t size, size_t nmemb, FILE *stream){
size_t written = fwrite(ptr, size, nmemb, stream);
return written;
}
void connectAndSaveFile(char* url, char* output_file_name){
CURL *curl;
curl = curl_easy_init();
if (curl) {
FILE *fp = fopen(output_file_name,"wb");
curl_easy_setopt(curl, CURLOPT_URL, url);
curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, write_data);
curl_easy_setopt(curl, CURLOPT_WRITEDATA, fp);
curl_easy_perform(curl);
curl_easy_cleanup(curl);
fclose(fp);
}
}
string get_http_status_code(string URL) {
CURL *session;
session = curl_easy_init();
curl_easy_setopt(session, CURLOPT_URL, URL.c_str());
curl_easy_setopt(session, CURLOPT_NOBODY, 1L);
CURLcode curl_code = curl_easy_perform (session);
long http_code = 0;
curl_easy_getinfo (session, CURLINFO_RESPONSE_CODE, &http_code);
curl_easy_cleanup(session);
std::ostringstream buff;
buff << http_code;
return buff.str();
}
"still reachable" is most frequently not actually a leak
You might see slightly less reachable memory if you call curl_global_init and curl_global_cleanup.
Most of the code above uses libcurl, so I think we would have to look at the documentation, read about the API, and follow the recommended steps.
However, in the function below, the caller passes the FILE pointer that fwrite writes to. The caller must release it once it is no longer needed (connectAndSaveFile does so with fclose).
size_t write_data(void *ptr, size_t size, size_t nmemb, FILE *stream)
In idiomatic C++, however, we would use std::fstream and std::string so that we need not worry about memory management. For more information, see the following link:
https://stackoverflow.com/a/22048298/2724703
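As a rough sketch of the stream-based approach, the write callback can target a std::ostream instead of a FILE*. The name write_to_stream is made up for this example, and it assumes the pointer registered with CURLOPT_WRITEDATA is a std::ostream* (for a std::ofstream, convert its address to std::ostream* before passing it).

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>

// Write callback with the signature libcurl expects. userdata is assumed to
// be a std::ostream* registered through CURLOPT_WRITEDATA.
size_t write_to_stream(char *ptr, size_t size, size_t nmemb, void *userdata)
{
    std::ostream *out = static_cast<std::ostream *>(userdata);
    const size_t count = size * nmemb;
    out->write(ptr, static_cast<std::streamsize>(count));
    // Returning a value other than count makes libcurl abort the transfer.
    return out->good() ? count : 0;
}
```

A std::ofstream closes its file in its destructor, so there is no fclose to forget. This does not change what valgrind reports as "still reachable" inside libcurl itself.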

Reading the content of a PHP from C++

I am trying to read the output of a PHP/HTML page on a remote web server using C++, but haven't found a way to do it. I want to pass GET parameters to it, as in http://example.com/login.php?user=abc&password=def.
How would I do it?
Your best bet is to use an external library. libcurl is popular and fairly easy to use.
Here's a simple example; you need to add error checking, though:
string data;
CURL *curl = curl_easy_init();
curl_easy_setopt(curl, CURLOPT_URL, url_.c_str());
curl_easy_setopt(curl, CURLOPT_WRITEDATA, &data);
curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, curlWrite);
curl_easy_perform(curl);
curl_easy_cleanup(curl); // release the handle when done
Your callback would look something like this:
size_t curlWrite(void *ptr, size_t size, size_t nmemb, void *usrPtr)
{
size_t bytes = size * nmemb;
string *data = static_cast<string *>(usrPtr);
data->append(static_cast<const char *>(ptr), bytes);
return bytes;
}
You can add your GET parameters on the end of the URL.
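Values that contain spaces, '&', or similar characters need to be percent-encoded before they go into the URL. libcurl provides curl_easy_escape for this. The sketch below uses a small hand-rolled encoder instead so that it stands alone, and percent_encode and add_query_param are hypothetical helpers.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: percent-encodes everything except the unreserved
// characters of RFC 3986 (letters, digits, '-', '.', '_', '~').
std::string percent_encode(const std::string &value)
{
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (std::size_t i = 0; i < value.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(value[i]);
        const bool unreserved =
            (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
            (c >= '0' && c <= '9') ||
            c == '-' || c == '.' || c == '_' || c == '~';
        if (unreserved) {
            out += static_cast<char>(c);
        } else {
            out += '%';
            out += hex[c >> 4];
            out += hex[c & 0x0F];
        }
    }
    return out;
}

// Appends one key=value pair, choosing '?' or '&' as the separator.
std::string add_query_param(const std::string &url, const std::string &key,
                            const std::string &value)
{
    const char sep = (url.find('?') == std::string::npos) ? '?' : '&';
    return url + sep + percent_encode(key) + "=" + percent_encode(value);
}
```

The resulting string is what would be passed to CURLOPT_URL.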