I'm working on a program that downloads lyrics from sites like AZLyrics, using libcurl. Here is my code:
lyricsDownloader.cpp
#include "lyricsDownloader.h"
#include <curl/curl.h>
#include <cstring>
#include <iostream>
#define DEBUG 1
/////////////////////////////////////////////////////////////////////////////
size_t lyricsDownloader::write_data_to_var(char *ptr, size_t size, size_t nmemb, void *userdata) // this function is a static member function
{
    ostringstream *stream = (ostringstream *) userdata;
    size_t count = size * nmemb;
    stream->write(ptr, count);
    return count;
}
string AZLyricsDownloader::toProviderCode() const
{ /*this creates an url*/ }
CURLcode AZLyricsDownloader::download()
{
    CURL *handle;
    CURLcode err;
    ostringstream buff;

    handle = curl_easy_init();
    if (!handle) return static_cast<CURLcode>(-1);

    // set verbose if debug on
    curl_easy_setopt(handle, CURLOPT_VERBOSE, DEBUG);
    curl_easy_setopt(handle, CURLOPT_URL, toProviderCode().c_str()); // set the download url to the generated one
    curl_easy_setopt(handle, CURLOPT_WRITEDATA, &buff);
    curl_easy_setopt(handle, CURLOPT_WRITEFUNCTION, &AZLyricsDownloader::write_data_to_var);

    err = curl_easy_perform(handle); // The segfault should be somewhere here - after calling the function but before it ends

    cerr << "cleanup\n";
    curl_easy_cleanup(handle);

    // copy the contents to text variable
    lyrics = buff.str();
    return err;
}
main.cpp
#include <QString>
#include <QTextEdit>
#include <iostream>
#include "lyricsDownloader.h"
int main(int argc, char *argv[])
{
    AZLyricsDownloader dl(argv[1], argv[2]);
    dl.perform();
    QTextEdit qtexted(QString::fromStdString(dl.lyrics));
    cout << qPrintable(qtexted.toPlainText());
    return 0;
}
When running
./maelyrica Anthrax Madhouse
I'm getting this logged from curl
* About to connect() to azlyrics.com port 80 (#0)
* Trying 174.142.163.250... * connected
* Connected to azlyrics.com (174.142.163.250) port 80 (#0)
> GET /lyrics/anthrax/madhouse.html HTTP/1.1
Host: azlyrics.com
Accept: */*
< HTTP/1.1 301 Moved Permanently
< Server: nginx/1.0.12
< Date: Thu, 05 Jul 2012 16:59:21 GMT
< Content-Type: text/html
< Content-Length: 185
< Connection: keep-alive
< Location: http://www.azlyrics.com/lyrics/anthrax/madhouse.html
<
Segmentation fault
Strangely, the file is there. The same error occurs when the page doesn't exist (which redirects to the azlyrics.com main page).
What am I doing wrong?
Thanks in advance
EDIT: I made the data-writing function static, but this changed nothing.
Even wget seems to have problems
$ wget http://www.azlyrics.com/lyrics/anthrax/madhouse.html
--2012-07-06 10:36:05-- http://www.azlyrics.com/lyrics/anthrax/madhouse.html
Resolving www.azlyrics.com... 174.142.163.250
Connecting to www.azlyrics.com|174.142.163.250|:80... connected.
HTTP request sent, awaiting response... No data received.
Retrying.
Why does opening the page in a browser work and wget/curl not?
EDIT2: After adding this:
curl_easy_setopt(handle, CURLOPT_FOLLOWLOCATION, 1);
and making the function static, everything works.
Your code
curl_easy_setopt(handle, CURLOPT_WRITEFUNCTION, &AZLyricsDownloader::write_data_to_var);
and the following quote from the libcurl documentation:

There's basically only one thing to keep in mind when using C++ instead of C when interfacing libcurl:

The callbacks CANNOT be non-static class member functions

Example C++ code:

class AClass {
    static size_t write_data(void *ptr, size_t size, size_t nmemb, void *ourpointer)
    {
        /* do what you want with the data */
    }
};

could be the source of your problem, as your function is not a static member. Even if it is not the cause, you are still breaking this rule.
This may not solve your problem but given the amount of code you have posted in your example, that was the first thing that immediately came to mind and it is worth changing this as recommended by libcurl. If it does not solve your problem I would suggest identifying the error you are getting in more detail so that you can pose a more specific question next time (with a lot less code displayed).
Related
Goal:
To slightly slow down the sending of requests when multiplexing with libcurl, possibly by introducing small time delays in between the sending of each of the HTTP/2 request to a server. The multiplexing program needs to listen out for any changes from one webpage for around 3 seconds at a set time once a day. However, the multiplexing program finishes execution in under a second even when setting the variable num_transfers to the thousands (variable seen in the code further below).
It would be useful if there were a way to introduce, for example, a 3 millisecond delay between the transmission of each group of multiplexed requests. This would mean the program could still send requests asynchronously (so it won't be blocked / won't have to wait for a response from the server before sending the next request), but at a slightly slower rate.
A definition of multiplexing taken from this resource:
Multiplexing is a method in HTTP/2 by which multiple HTTP requests can be sent and responses can be received asynchronously via a single TCP connection. Multiplexing is the heart of HTTP/2 protocol.
Ideal outcome:
An ideal program for this situation would be one that could send a few non-blocking/ multiplex requests every approx. 3 milliseconds. The program would run for around 3-4 seconds in total.
Current problem:
Currently the program is too fast when multiplexing: a few thousand requests can be sent and received within around 350 milliseconds, which can lead to the sending IP address being blocked for a few minutes.
Please note it is not an option in this scenario to use a synchronous / blocking approach - a requirement of this program is that it must not be forced to wait for a response to be returned before sending another request. The issue lies in the fact that the program is too fast at sending a high number of requests.
Attempts at solving:
In the DO...WHILE loop seen in the code below, an attempt was made to introduce some artificial time delays at various locations within the loop using usleep(microseconds) from unistd.h, but this introduced a time delay either before or after all of the requests were sent, rather than an interleaved time delay between sending requests.
Current code:
#include <iostream>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <chrono>
#include <string>

/* somewhat unix-specific */
#include <sys/time.h>
#include <unistd.h>

/* curl stuff */
#include <curl/curl.h>
#include <curl/mprintf.h>

#ifndef CURLPIPE_MULTIPLEX
#define CURLPIPE_MULTIPLEX 0
#endif

struct CURLMsg *msg;

struct transfer {
    CURL *easy;
    unsigned int num;
    //FILE *out;
    std::string contents;
};

struct MemoryStruct {
    char *memory;
    size_t size;
};
struct MemoryStruct chunk;

#define NUM_HANDLES 1000

static size_t WriteMemoryCallback(void *contents, size_t size, size_t nmemb, void *userp) {
    transfer *t = (transfer *)userp;
    size_t realsize = size * nmemb;
    t->contents.append((const char *)contents, realsize);
    return realsize;
}

static void setup(struct transfer *t, int num)
{
    CURL *hnd;
    hnd = t->easy = curl_easy_init();
    curl_easy_setopt(hnd, CURLOPT_WRITEFUNCTION, WriteMemoryCallback);
    curl_easy_setopt(hnd, CURLOPT_WRITEDATA, (void *)t);
    /* set the same URL */
    curl_easy_setopt(hnd, CURLOPT_URL, "https://someurl.xyz");
    /* HTTP/2 please */
    curl_easy_setopt(hnd, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_2_0);
    /* we use a self-signed test server, skip verification during debugging */
    curl_easy_setopt(hnd, CURLOPT_SSL_VERIFYPEER, 0L);
    curl_easy_setopt(hnd, CURLOPT_SSL_VERIFYHOST, 0L);
#if (CURLPIPE_MULTIPLEX > 0)
    /* wait for pipe connection to confirm */
    curl_easy_setopt(hnd, CURLOPT_PIPEWAIT, 1L);
#endif
}

int main() {
    struct transfer trans[NUM_HANDLES];
    CURLM *multi_handle;
    int i;
    int still_running = 0; /* keep number of running handles */
    int num_transfers = 3;

    chunk.memory = (char *)malloc(1);
    chunk.size = 0;

    /* init a multi stack */
    multi_handle = curl_multi_init();

    for(i = 0; i < num_transfers; i++) {
        setup(&trans[i], i);
        /* add the individual transfer */
        curl_multi_add_handle(multi_handle, trans[i].easy);
    }

    curl_multi_setopt(multi_handle, CURLMOPT_PIPELINING, CURLPIPE_MULTIPLEX);
    // curl_multi_setopt(multi_handle, CURLMOPT_MAX_TOTAL_CONNECTIONS, 1L);

    // Main loop
    do {
        CURLMcode mc = curl_multi_perform(multi_handle, &still_running);
        if(still_running) {
            /* wait for activity, timeout or "nothing" */
            mc = curl_multi_poll(multi_handle, NULL, 0, 1000, NULL);
        }
        if(mc) {
            break;
        }

        // Get response
        do {
            int queued;
            msg = curl_multi_info_read(multi_handle, &queued);
            if ((msg) && (msg->msg == CURLMSG_DONE) && (msg->data.result == CURLE_OK)) {
                // Get size of payload
                curl_off_t dl_size;
                curl_easy_getinfo(msg->easy_handle, CURLINFO_SIZE_DOWNLOAD_T, &dl_size);

                for (int i = 0; i < num_transfers; i++) {
                    std::cout << trans[i].contents;
                }
                std::cout << std::flush;
            }
        } while (msg);
    } while (still_running);

    for(i = 0; i < num_transfers; i++) {
        curl_multi_remove_handle(multi_handle, trans[i].easy);
        curl_easy_cleanup(trans[i].easy);
    }

    free(chunk.memory);
    curl_multi_cleanup(multi_handle);
    return 0;
}
Summary question:
Is there a way to send a group of multiplexed requests to a single URL approximately every 3 milliseconds? Another idea to attempt to solve this was to wrap the entire functionality contained in main() within a FOR loop, and putting a time delay at the end of each iteration of the FOR loop.
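One way to get the interleaved delay, sketched here independently of libcurl: pace the submissions themselves (e.g. the calls to curl_multi_add_handle) using an absolute-deadline sleep, instead of sleeping inside the perform loop. The helper below is illustrative, not libcurl API; submit is a stand-in for whatever starts one transfer, and the 3 ms figure is the spacing from the question:

```cpp
#include <chrono>
#include <thread>
#include <functional>

// Calls submit() once per interval, count times, sleeping between calls
// so submissions are spread out instead of issued in one burst.
void paced_submit(int count,
                  std::chrono::milliseconds interval,
                  const std::function<void(int)> &submit)
{
    using clock = std::chrono::steady_clock;
    auto next = clock::now();
    for (int i = 0; i < count; ++i) {
        submit(i);  // e.g. curl_multi_add_handle(multi_handle, trans[i].easy)
        next += interval;
        std::this_thread::sleep_until(next);  // absolute deadline avoids drift
    }
}
```

In the question's code this could replace the single up-front FOR loop that adds all the handles; the perform/poll loop would then either run in a second thread, or be interleaved with the pacer by checking the deadline inside the DO...WHILE loop instead of sleeping.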
Goal: To send requests to the same URL without having to wait for the request-sending function to finish executing.
Currently, when I send a request to a URL, I have to wait around 10 ms for the server's response before sending another request using the same function. The aim is to detect changes on a webpage slightly faster than the program currently does, i.e. for the WHILE loop to behave in a non-blocking manner.
Question: Using libcurl C++, if I have a WHILE loop that calls a function to send a request to a URL, how can I avoid waiting for the function to finish executing before sending another request to the SAME URL?
Note: I have been researching libcurl's multi-interface but I am struggling to determine if this interface is more suited to parallel requests to multiple URLs rather than sending requests to the same URL without having to wait for the function to finish executing each time. I have tried the following and looked at these resources:
an attempt at multi-threading a C program using libcurl requests
How to do curl_multi_perform() asynchronously in C++?
http://www.godpatterns.com/2011/09/asynchronous-non-blocking-curl-multi.html
https://curl.se/libcurl/c/multi-single.html
https://curl.se/libcurl/c/multi-poll.html
Here is my attempt at sending a request to one URL, but I have to wait for the request() function to finish and return a response code before sending the same request again.
#include <vector>
#include <iostream>
#include <curl/curl.h>

size_t write_callback(char *ptr, size_t size, size_t nmemb, void *userdata) {
    std::vector<char> *response = reinterpret_cast<std::vector<char> *>(userdata);
    response->insert(response->end(), ptr, ptr + size * nmemb);
    return size * nmemb;
}

long request(CURL *curl, const std::string &url) {
    std::vector<char> response;
    long response_code;
    curl_easy_setopt(curl, CURLOPT_URL, url.c_str());
    curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, write_callback);
    curl_easy_setopt(curl, CURLOPT_WRITEDATA, &response);
    auto res = curl_easy_perform(curl);
    // the response code is only meaningful after curl_easy_perform()
    curl_easy_getinfo(curl, CURLINFO_RESPONSE_CODE, &response_code);
    // ...
    // Print variable "response"
    // ...
    return response_code;
}

int main() {
    curl_global_init(CURL_GLOBAL_ALL);
    CURL *curl = curl_easy_init();
    while (true) {
        // blocking: request() must complete before executing again
        long response_code = request(curl, "https://example.com");
        // ...
        // Some condition breaks loop
    }
    curl_easy_cleanup(curl);
    curl_global_cleanup();
    return 0;
}
I'm at a point where I have tried to understand the multi-interface documentation as best as possible, but still struggle to fully understand it / determine if it's actually suited to my particular problem. Apologies if this question appears to have not provided enough of my own research, but there are gaps in my libcurl knowledge I'm struggling to fill.
I'd appreciate it if anyone could suggest / explain ways in which I can modify my single libcurl example above to behave in a non-blocking manner.
EDIT:
From libcurl's C example called "multi-poll": when I run the program below, the URL's content is printed, but because it prints only once despite the WHILE (1) loop, I'm confused as to whether it is sending repeated non-blocking requests to the URL (which is the aim), or just one request and then waiting on some other change/event?
#include <stdio.h>
#include <string.h>

/* somewhat unix-specific */
#include <sys/time.h>
#include <unistd.h>

/* curl stuff */
#include <curl/curl.h>

int main(void)
{
    CURL *http_handle;
    CURLM *multi_handle;
    int still_running = 1; /* keep number of running handles */

    curl_global_init(CURL_GLOBAL_DEFAULT);

    http_handle = curl_easy_init();
    curl_easy_setopt(http_handle, CURLOPT_URL, "https://example.com");

    multi_handle = curl_multi_init();
    curl_multi_add_handle(multi_handle, http_handle);

    while (1) {
        CURLMcode mc; /* curl_multi_poll() return code */
        int numfds;

        /* we start some action by calling perform right away */
        mc = curl_multi_perform(multi_handle, &still_running);

        if(still_running) {
            /* wait for activity, timeout or "nothing" */
            mc = curl_multi_poll(multi_handle, NULL, 0, 1000, &numfds);
        }

        // if(mc != CURLM_OK) {
        //     fprintf(stderr, "curl_multi_wait() failed, code %d.\n", mc);
        //     break;
        // }
    }

    curl_multi_remove_handle(multi_handle, http_handle);
    curl_easy_cleanup(http_handle);
    curl_multi_cleanup(multi_handle);
    curl_global_cleanup();
    return 0;
}
You need to move curl_multi_add_handle and curl_multi_remove_handle inside the while loop. Below is an extract from the curl documentation, https://curl.se/libcurl/c/libcurl-multi.html:

When a single transfer is completed, the easy handle is still left added to the multi stack. You need to first remove the easy handle with curl_multi_remove_handle and then close it with curl_easy_cleanup, or possibly set new options to it and add it again with curl_multi_add_handle to start another transfer.
I am trying to implement a quite specialized email client using libcurl (7.66) for sending the email.
Sadly, it always crashes on calling curl_easy_perform(curl), no matter what I try.
Currently, in the process_mail callback I have commented everything out; it just returns 0. I am pretty sure this should be enough to at least not crash the app...
The real goal is to have the mail as a string in memory and just copy it over to the stream.
I really appreciate any kind of help... this issue has put me quite behind schedule.
Here is the header-file:
#ifndef MAILMESSAGE_H
#define MAILMESSAGE_H

#include <QObject>
#include <QDebug>
#include <QFile>
#include <QIODevice>
#include <QByteArray>
#include <curl/curl.h>
#include <stdio.h>

class MailMessage : public QObject
{
    Q_OBJECT
public:
    explicit MailMessage(QObject *parent = nullptr);

    void setSMTPPort(QString smtp_port);
    void setSMTPAddress(QString smtp_address);
    void setSMTPUser(QString smtp_user);
    void setSMTPPassword(QString smtp_password);
    void addTo(QString address_to);
    void setFrom(QString address_from);
    void setSubject(QString subject);
    void setAlternativeText(QString text);
    void setHTML(QString html);
    void addAttachment(QString attachment_path, QString filename);
    void generateMessage();

    // return POSIX-style: 0: ok; 1: error
    int sendMail();

    QString message;
    int message_size;
    const char *message_char;
    QString smtp_port;
    QString smtp_address;
    QString smtp_user;
    QString smtp_password;
    //QStringList addresses_to;
    QString address_to;
    QString address_from;
    QString subject;
    QString payload_text;
    QString payload_html;
    QStringList attachments_base64;
    double upload_speed;
    double upload_time;

protected:
    static size_t process_mail(void *ptr, size_t size, size_t nmemb, void *userp);

signals:

public slots:
};

#endif // MAILMESSAGE_H
And the relevant parts of the implementation:
void MailMessage::generateMessage()
{
    this->message = //"To: "+ this->address_to +"\r\n"
        "From: " + this->address_from + "\r\n"
        "Subject: " + this->subject + "\r\n"
        "Content-Type: multipart/mixed; boundary=MixedBoundary"
        "\r\n"
        "--MixedBoundary"
        "Content-Type: multipart/alternative; boundary=AlternativeBoundary"
        "\r\n"
        "--AlternativeBoundary"
        "Content-Type: text/plain; charset=\"utf-8\""
        "\r\n" + this->payload_text + "\r\n"
        "--AlternativeBoundary"
        "Content-Type: text/html; charset=\"utf-8\""
        "\r\n" + this->payload_html + "\r\n"
        "--AlternativeBoundary--"
        "\r\n"
        "--MixedBoundary";
    /*
        "Content-Type: application/pdf; name=\"HelloWorld.pdf\""
        "Content-Transfer-Encoding: base64"
        "Content-Disposition: attachment; filename=\"HelloWorld.pdf\"";
    */
    this->message.append("\r\n"
        "--MixedBoundary--");

    QByteArray array = this->message.toLocal8Bit();
    this->message_char = array.data();
    this->message_size = sizeof(this->message_char);
}

size_t MailMessage::process_mail(void *ptr, size_t size, size_t nmemb, void *userprocess)
{
    size_t retcode = 0;
    /*
    MailMessage *self = reinterpret_cast<MailMessage*>(userprocess);
    if (sizeof(self->message_char) == 0)
        return retcode;
    retcode = (size * nmemb >= sizeof(self->message_char)) ? sizeof(self->message_char) : size * nmemb;
    self->message_size -= retcode;
    memcpy(ptr, self->message_char, retcode);
    self->message_char += retcode;
    */
    return retcode;
}
int MailMessage::sendMail()
{
    //qDebug() << this->message;
    CURLcode res = CURLE_OK;
    CURL *curl = curl_easy_init();

    curl_easy_setopt(curl, CURLOPT_VERBOSE, 1L);
    curl_easy_setopt(curl, CURLOPT_UPLOAD, 1L);
    // CURLUSESSL_ALL, CURLUSESSL_NONE, CURLUSESSL_TRY, CURLUSESSL_CONTROL
    curl_easy_setopt(curl, CURLOPT_USE_SSL, CURLUSESSL_ALL);
    curl_easy_setopt(curl, CURLOPT_URL, this->smtp_address.toStdString().c_str());
    curl_easy_setopt(curl, CURLOPT_USERNAME, this->smtp_user.toStdString().c_str());
    curl_easy_setopt(curl, CURLOPT_PASSWORD, this->smtp_password.toStdString().c_str());
    curl_easy_setopt(curl, CURLOPT_MAIL_FROM, this->address_from.toStdString().c_str());
    curl_easy_setopt(curl, CURLOPT_MAIL_RCPT, this->address_to.toStdString().c_str());
    curl_easy_setopt(curl, CURLOPT_READFUNCTION, &MailMessage::process_mail);
    curl_easy_setopt(curl, CURLOPT_READDATA, this);

    /* Send the message */
    res = curl_easy_perform(curl);

    /* Check for errors */
    if(res != CURLE_OK)
    {
        fprintf(stderr, "curl_easy_perform() failed: %s\n",
                curl_easy_strerror(res));
    }
    else
    {
        curl_easy_getinfo(curl, CURLINFO_SPEED_UPLOAD, &upload_speed);
        curl_easy_getinfo(curl, CURLINFO_TOTAL_TIME, &upload_time);
        qDebug() << "Speed: " + QString::number(upload_speed) + " Time: " + QString::number(upload_time);
    }

    //curl_slist_free_all(this->address_to.toStdString().c_str());
    curl_easy_cleanup(curl);
    return 0;
}
And finally some excerpt of the console output:
* start date: May 8 08:01:21 2019 GMT
* expire date: May 13 23:59:59 2021 GMT
* subjectAltName: host "mail.gmx.net" matched cert's "mail.gmx.net"
* issuer: C=DE; O=T-Systems International GmbH; OU=T-Systems Trust Center; ST=Nordrhein Westfalen; postalCode=57250; L=Netphen; street=Untere Industriestr. 20; CN=TeleSec ServerPass Extended Validation Class 3 CA
* SSL certificate verify ok.
> EHLO cf-19
* SMTP 0x5572580cd700 state change from UPGRADETLS to EHLO
< 250-gmx.com Hello cf-19 [2.202.204.204]
* Curl_pp_readresp_ 55 bytes of trailing server response left
< 250-8BITMIME* Curl_pp_readresp_ 41 bytes of trailing server response left
< 250-AUTH LOGIN PLAIN
* Curl_pp_readresp_ 19 bytes of trailing server response left
< 250 SIZE 69920427
> AUTH PLAIN
* SASL 0x5572580cd780 state change from STOP to PLAIN
* SMTP 0x5572580cd700 state change from EHLO to AUTH
< 334
> AHNhbXVlbF9iZWNrZXJAZ214LmRlADI4OTZhY2Q2
* SASL 0x5572580cd780 state change from PLAIN to FINAL
< 235 Authentication succeeded
* SASL 0x5572580cd780 state change from FINAL to STOP
* SMTP 0x5572580cd700 state change from AUTH to STOP
* STATE: PROTOCONNECT => DO handle 0x5572580d0678; line 1701 (connection #0)
* DO phase starts
> MAIL FROM:<xxxxxxx#xxx.xx>
* SMTP 0x5572580cd700 state change from STOP to MAIL
* STATE: DO => DOING handle 0x5572580d0678; line 1743 (connection #0)
< 250 Requested mail action okay, completed
03:37:55: The program has unexpectedly finished.
03:37:55: The process was ended forcefully.
QByteArray array = this->message.toLocal8Bit();
This creates a local QByteArray object, a local variable in the generateMessage() method. When generateMessage() returns, this object and everything in it will, of course, be destroyed just like any other function local variable. That's how local function variables work in C++.
this->message_char = array.data();
this->message_size = sizeof(this->message_char);
This saves a pointer to array's internal buffer in message_char. This also saves sizeof(this->message_char) in message_size. Since message_char is a const char *, this will always save sizeof(const char *), which will typically be 4 or 8 (depending on your operating system), in message_size, no matter how big the array is. That's already an obvious fail at this point, but not the ultimate fail that crashes the shown code.
The ultimate fail is that, as explained earlier, as soon as this function returns -- which is right about now, since this is the last statement in the function -- array and all of its contents get immediately destroyed. The aforementioned pointer to array's internal buffer becomes a dangling pointer to the internal contents of an object that no longer exists, and any further attempt to dereference this pointer results in undefined behavior, and a likely crash.
This is the obvious reason for your crash: improper usage of a dangling pointer to a destroyed object. An additional contributory factor is the improper usage of sizeof (in the event your message's actual size happens to be less than the size of a pointer on your operating system).
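A minimal sketch of a fix along these lines: keep the actual bytes alive in a data member for the whole lifetime of the transfer, and track the real size and a read cursor instead of taking sizeof a pointer. std::string stands in here for the QByteArray member to keep the sketch self-contained; the same idea applies with a QByteArray data member kept alive in the class.

```cpp
#include <cstring>
#include <string>

class MailMessage {
public:
    void generateMessage(const std::string &body)
    {
        message_bytes = body;  // member: stays alive across calls, unlike a local
        read_offset = 0;
    }

    // libcurl READFUNCTION-compatible: copies up to size*nmemb bytes of the
    // remaining message into ptr and advances the cursor; returns 0 at end,
    // which tells libcurl the upload is complete.
    static size_t process_mail(void *ptr, size_t size, size_t nmemb, void *userp)
    {
        MailMessage *self = static_cast<MailMessage *>(userp);
        size_t room = size * nmemb;
        size_t left = self->message_bytes.size() - self->read_offset;
        size_t n = left < room ? left : room;
        if (n == 0)
            return 0;
        std::memcpy(ptr, self->message_bytes.data() + self->read_offset, n);
        self->read_offset += n;
        return n;
    }

private:
    std::string message_bytes;  // real bytes with a real size() -- not sizeof a pointer
    size_t read_offset = 0;
};
```

The callback is registered exactly as in the question, with CURLOPT_READFUNCTION set to &MailMessage::process_mail and CURLOPT_READDATA set to this.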
I'm very new to HTTP commands and the libcurl library. I know how to get the HTTP response code but not the HTTP response string. Following is the code snippet that I wrote to get the response code. Any help on how to get the response string will be highly appreciated!!!
curl_easy_setopt(curl, CURLOPT_URL, url.c_str());
CURLcode ret = curl_easy_perform(curl);
if (ret != CURLE_OK) {
    LOG(INFO) << "Failed to perform the request. "
              << "Return code: " << ret;
    return false;
}

std::unique_ptr<int64_t> httpCode(new int64_t);
// Get the last response code.
ret = curl_easy_getinfo(curl, CURLINFO_RESPONSE_CODE, httpCode.get());
if (ret != CURLE_OK) {
    LOG(INFO) << "curl_easy_getinfo failed to retrieve http code. "
              << "Return code: " << ret;
    return false;
}
I tried doing this as well to get the HTTP response string in readBuffer.
static size_t WriteCallback(char *contents, size_t size, size_t nmemb, void *userp)
{
    ((std::string *)userp)->append((char *)contents, size * nmemb);
    return size * nmemb;
}

std::string readBuffer;
curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, WriteCallback);
curl_easy_setopt(curl, CURLOPT_WRITEDATA, &readBuffer);
CURLcode ret = curl_easy_perform(curl);
cout << readBuffer << "\n";
But the readBuffer is empty. I don't understand where I am going wrong. Any pointers on how to solve this will be really nice!
There doesn't look to be much wrong with your code. I ran the following code, based on yours, to read the front page of the BBC news website:
#include <iostream>
#include <string>
#include <curl/curl.h>

size_t WriteCallback(char *contents, size_t size, size_t nmemb, void *userp)
{
    ((std::string *)userp)->append((char *)contents, size * nmemb);
    return size * nmemb;
}

int main(int argc, char *argv[])
{
    curl_global_init(CURL_GLOBAL_ALL);
    CURL *easyhandle = curl_easy_init();
    std::string readBuffer;
    curl_easy_setopt(easyhandle, CURLOPT_URL, "http://www.bbc.co.uk/news");
    curl_easy_setopt(easyhandle, CURLOPT_VERBOSE, 1L);
    curl_easy_setopt(easyhandle, CURLOPT_PROXY, "http://my.proxy.net"); // replace with your actual proxy
    curl_easy_setopt(easyhandle, CURLOPT_PROXYPORT, 8080L);
    curl_easy_setopt(easyhandle, CURLOPT_WRITEFUNCTION, WriteCallback);
    curl_easy_setopt(easyhandle, CURLOPT_WRITEDATA, &readBuffer);
    curl_easy_perform(easyhandle);
    std::cout << readBuffer << std::endl;
    return 0;
}
... and I got the full HTML response. NB I had to use a proxy, and I enabled verbose mode to see what was happening. Also NB; the HTTP server might be fussy about the URL you give it: if I replace http://www.bbc.co.uk/news with http://www.bbc.co.uk/news/ (i.e. with a trailing slash) then I get no data back (a response length of zero), as noted by the verbose curl output:
Host: www.bbc.co.uk
Accept: */*
Proxy-Connection: Keep-Alive
< HTTP/1.1 301 Moved Permanently
< X-Cache-Action: PASS (no-cache-control)
< Vary: Accept-Encoding
< X-Cache-Age: 0
< Content-Type: text/html;charset=utf-8
< Date: Mon, 10 Jul 2017 16:42:20 GMT
< Location: http://www.bbc.co.uk/news
< X-XSS-Protection: 1; mode=block
< X-Content-Type-Options: nosniff
< X-Frame-Options: SAMEORIGIN
< Content-Length: 0
< X-Cache: MISS from barracuda1.[my proxy].net
< Connection: keep-alive
<
* Connection #0 to host [my proxy] left intact
Here, Content-Length is 0 and the WriteCallback() function is never called. Hope this helps.
For the numerical response code, getinfo with CURLINFO_RESPONSE_CODE is the way to go:

long response_code;
curl_easy_getinfo(handle, CURLINFO_RESPONSE_CODE, &response_code);

However, there is no equivalent getinfo capture for the server's response text. If you need the server's text, inspect the raw HTTP headers. There are two ways to do this:

Enable writing headers into the payload with CURLOPT_HEADER, then extract the headers from the combined payload by splitting it from the body at the blank line (\r\n\r\n)

Set a header callback function with CURLOPT_HEADERFUNCTION and parse directly from that
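A sketch of the second option: libcurl invokes the header callback once per header line, and the first line of each response is the status line, so the server's text can be taken from there. The function below is illustrative; the commented setopt calls show where it would plug into the question's setup.

```cpp
#include <string>

// Called by libcurl once per header line; the status line
// (e.g. "HTTP/1.1 301 Moved Permanently\r\n") arrives first.
size_t header_callback(char *buffer, size_t size, size_t nitems, void *userdata)
{
    size_t len = size * nitems;
    std::string line(buffer, len);
    if (line.compare(0, 5, "HTTP/") == 0) {
        // strip trailing CRLF, then keep everything after the first
        // space, e.g. "301 Moved Permanently"
        while (!line.empty() && (line.back() == '\r' || line.back() == '\n'))
            line.pop_back();
        size_t sp = line.find(' ');
        if (sp != std::string::npos)
            *static_cast<std::string *>(userdata) = line.substr(sp + 1);
    }
    return len;  // tell libcurl the whole line was consumed
}

// Wiring it up, inside the existing easy-handle setup:
//   std::string status_text;
//   curl_easy_setopt(curl, CURLOPT_HEADERFUNCTION, header_callback);
//   curl_easy_setopt(curl, CURLOPT_HEADERDATA, &status_text);
```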
My current curl setup calls a webpage, saves it into a string, and repeats the process after sleeping for a second. This is the code that writes into the string:
#include <curl/curl.h>
#include <string>
#include <iostream>
#include <thread>
#include <chrono>
size_t curl_writefunc(void *ptr, size_t size, size_t nmemb, std::string *data)
{
    data->append((const char *)ptr, size * nmemb);
    return size * nmemb;
}

void curl_handler(std::string &data)
{
    long http_code = 0; // CURLINFO_RESPONSE_CODE expects a long, not an int
    CURL *curl;

    // Initialize cURL
    curl = curl_easy_init();
    // Set the function to call when there is new data
    curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, curl_writefunc);
    // Set the parameter to append the new data to
    curl_easy_setopt(curl, CURLOPT_WRITEDATA, &data);
    // Set the URL to download; just for this question.
    curl_easy_setopt(curl, CURLOPT_URL, "http://www.example.com/");
    // Download
    curl_easy_perform(curl);
    // Get the HTTP response code
    curl_easy_getinfo(curl, CURLINFO_RESPONSE_CODE, &http_code);
    // Clean up
    curl_easy_cleanup(curl);
    curl_global_cleanup();
}

int main()
{
    bool something = true;
    std::string data;
    while (something)
    {
        curl_handler(data);
        std::cout << data << '\n';
        data.clear();
        std::this_thread::sleep_for(std::chrono::seconds(1));
    }
}
However it runs into a problem about 20 minutes into runtime and this is the message it confronts me with:
140377776379824:error:02001018:system library:fopen:Too many open files:bss_file.c:173:fopen('/etc/ssl/openssl.cnf','rb')
140377776379824:error:2006D002:BIO routines:BIO_new_file:system lib:bss_file.c:178:
140377776379824:error:0E078002:configuration file routines:DEF_LOAD:system lib:conf_def.c:199:
It seems to stem from an OpenSSL file that is not closed once it has fulfilled its task in a single iteration. Iterated repeatedly, the open files add up and are bound to run into this error at some point.
I am still much of a beginner programmer and therefore don't want to start messing with OpenSSL, so I came here to ask whether there is a solution for this kind of problem. Could it be solved by declaring the curl object outside of the repeatedly called function?
What has to be done is simply to declare the handle and set its options once, before the loop; only the actual download and the handling of its response are then repeated inside the loop. Re-using a handle as often as needed is encouraged: libcurl keeps resources (such as open connections and files) attached to the handle for re-use, whereas creating and tearing down state on every iteration is what lets the open files pile up.