I am trying to read the content of a PHP / HTML file on a remote web server using C++, but haven't found a way to do it. I want to pass GET parameters to it, for example http://example.com/login.php?user=abc&password=def.
How would I do it?
Your best bet is to use an external library. libcurl is popular and fairly easy to use.
Here's a simple example; you still need to add error checking:
string data;
CURL *curl = curl_easy_init();
curl_easy_setopt(curl, CURLOPT_URL, url_.c_str());
curl_easy_setopt(curl, CURLOPT_WRITEDATA, &data);
curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, curlWrite);
curl_easy_perform(curl);
curl_easy_cleanup(curl); // release the handle once the transfer is done
Your callback would look something like this:
size_t curlWrite(void *ptr, size_t size, size_t nmemb, void *usrPtr)
{
size_t bytes = size * nmemb;
string *data = static_cast<string *>(usrPtr);
data->append(static_cast<const char *>(ptr), bytes);
return bytes;
}
You can add your GET parameters on the end of the URL.
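If the parameter values can contain spaces, '&' or other reserved characters, they need to be percent-encoded before you append them. Here is a minimal sketch, assuming two hypothetical helpers named urlEncode and buildGetUrl (neither is part of libcurl; libcurl's own curl_easy_escape can do the encoding step if you prefer not to hand-roll it):

```cpp
#include <cctype>
#include <cstdio>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: percent-encode everything except RFC 3986 unreserved characters.
std::string urlEncode(const std::string &value) {
    std::string out;
    for (unsigned char c : value) {
        if (std::isalnum(c) || c == '-' || c == '_' || c == '.' || c == '~') {
            out += static_cast<char>(c);
        } else {
            char buf[4];
            std::snprintf(buf, sizeof buf, "%%%02X", static_cast<unsigned>(c));
            out += buf;
        }
    }
    return out;
}

// Hypothetical helper: append key=value pairs to a base URL as a query string.
std::string buildGetUrl(const std::string &base,
                        const std::vector<std::pair<std::string, std::string>> &params) {
    std::string url = base;
    char sep = '?';                 // first parameter follows '?', the rest follow '&'
    for (const auto &p : params) {
        url += sep;
        url += urlEncode(p.first);
        url += '=';
        url += urlEncode(p.second);
        sep = '&';
    }
    return url;
}
```

The resulting string is what you would pass to CURLOPT_URL via c_str().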
I am using the following code to download data from a URL into memory (a stream). About 2% of the time, the size of the stream is zero. I can download the proper data from the same failing URL if I try again. I am not sure whether this is a network issue, a CPU usage issue, or the code not covering some corner cases. Please advise. Thanks!
static size_t write_data(char *ptr, size_t size, size_t nmemb, void *userdata)
{
std::vector<uchar> *stream = (std::vector<uchar>*)userdata;
size_t count = size * nmemb;
stream->insert(stream->end(), ptr, ptr + count);
return count;
}
static void CurlUrl(const char* img_url, std::vector<uchar>* stream) {
CURL *curl = curl_easy_init(); // curl_global_init is called elsewhere.
curl_easy_setopt(curl, CURLOPT_NOSIGNAL, 1);
curl_easy_setopt(curl, CURLOPT_URL, img_url);
curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, write_data);
curl_easy_setopt(curl, CURLOPT_WRITEDATA, stream);
curl_easy_setopt(curl, CURLOPT_TIMEOUT, 10);
CURLcode res = curl_easy_perform(curl);
curl_easy_cleanup(curl);
}
If it didn't deliver any download data into the buffer via the callback, it means that the transfer either failed or that there was exactly zero bytes to transfer.
1. Check the return code from curl_easy_perform(), as it might tell you exactly what happened.
2. Use CURLOPT_VERBOSE to see what's going on if (1) is not enough.
3. Use CURLOPT_ERRORBUFFER to get a better error description on failure if (2) is not enough.
I am using libcurl to fetch JSON data from a web server with a GET request.
This is my sample code:
char *DownloadedResponse;
static int writer(char *data, size_t size, size_t nmemb, char *buffer_in)
{
if (buffer_in != NULL)
{
buffer_in = new char[size*nmemb];
strcpy(buffer_in,data);
DownloadedResponse = buffer_in;
return size * nmemb;
}
return 0;
}
char * DownloadJSON(string URL)
{
CURL *curl;
CURLcode res;
struct curl_slist *headers=NULL;
curl_slist_append(headers, "Accept: application/json");
curl_slist_append( headers, "Content-Type: application/json");
curl_slist_append( headers, "charsets: utf-8");
curl = curl_easy_init();
if (curl)
{
curl_easy_setopt(curl, CURLOPT_HTTPHEADER, headers);
curl_easy_setopt(curl, CURLOPT_URL, URL.c_str());
curl_easy_setopt(curl, CURLOPT_HTTPGET,1);
curl_easy_setopt(curl, CURLOPT_HTTPHEADER, headers);
curl_easy_setopt(curl,CURLOPT_WRITEFUNCTION,writer);
res = curl_easy_perform(curl);
if (CURLE_OK == res)
{
char *ct;
res = curl_easy_getinfo(curl, CURLINFO_CONTENT_TYPE, &ct);
if((CURLE_OK == res) && ct)
{
cout<<"\nresponse received: "<<DownloadedResponse;
}
else
{
curl_slist_free_all(headers);
curl_easy_cleanup(curl);
curl = NULL;
return NULL;
}
}
}
curl_slist_free_all(headers);
curl_easy_cleanup(curl);
curl = NULL;
}
Here I am able to get the JSON data in DownloadedResponse inside the callback "writer" set with CURLOPT_WRITEFUNCTION.
But if I print using a custom pointer set with CURLOPT_WRITEDATA,
char *dataPointer = NULL;
CURLcode curl_easy_setopt(curl, CURLOPT_WRITEDATA, dataPointer);
cout<<dataPointer;
The output of dataPointer is empty.
What is the issue here? I am able to print the JSON data in the CURLOPT_WRITEFUNCTION callback, but not through the CURLOPT_WRITEDATA pointer.
You write a function that takes data read from the network, and writes it to where you want it.
static int writer(char *data, size_t size, size_t nmemb, char *buffer_in){
if (buffer_in != NULL) {
// very bad code which is never executed
}
return 0;
}
In order for that function to write the data, it has to know where to write it, so you tell it to write to NULL:
char *dataPointer = NULL;
CURLcode curl_easy_setopt(curl, CURLOPT_WRITEDATA, dataPointer);
What value do you tell it to use as buffer_in? You pass it dataPointer, which is NULL, so you just told it buffer_in = NULL. I think instead you meant to say "the address of dataPointer", which would be &dataPointer.
Technically, I have answered your question now. You passed it NULL for the buffer, so the write function exited immediately. But there's more. Now you get to execute that really bad code in writer().
if (buffer_in != NULL)
{
// if buffer_in already has allocated memory then leak it immediately
// create a new buffer of memory to leak later
buffer_in = new char[size*nmemb];
// store the data in buffer_in
// assume it is null terminated (it is not)
// rather than using the length we already know
strcpy(buffer_in,data);
// remember buffer_in? We don't use it so assign that data pointer to a global variable.
DownloadedResponse = buffer_in;
// return size of this particular chunk of data
return size * nmemb;
}
This function MUST use the length of the data, and not assume data is null terminated (see https://curl.haxx.se/libcurl/c/CURLOPT_WRITEFUNCTION.html).
This function MUST be able to handle the data in multiple small pieces by adding them to what it has already read. You can't call new and then discard the new memory. And you can't do that anyway because you just leaked that memory -- every new must be matched with exactly one delete. In fact, you would be very well advised not to use new or delete at all, now that we have the standard library.
This function should use the buffer_in argument you give it rather than a global variable, but you can use a global variable if you want, it's just error prone. It's not literally an error like the other stuff.
The whole point of buffer_in is to give you a persistent data structure where you can accumulate the response. It should probably be in local scope around the curl_easy_perform call, so you can then just return the content from that data structure if you got CURLE_OK. I strongly recommend you write the data to a std::vector, so you don't have to keep track of memory allocation. Manual memory management is giving you trouble here, but you don't need to do it at all. Modern style accepts that everybody has trouble with it, so just let the standard library handle it.
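As a rough sketch of that recommendation, the callback below appends each chunk to a std::vector<char> supplied through the user pointer, using the length it is given and never assuming null termination. The names appendChunk and toString are made up for illustration; the signature matches what CURLOPT_WRITEFUNCTION expects, and libcurl itself is deliberately left out so the callback can be exercised on its own.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Write callback: called once per chunk, possibly many times per transfer.
// `userdata` is whatever was passed to CURLOPT_WRITEDATA; here, a std::vector<char>*.
size_t appendChunk(char *data, size_t size, size_t nmemb, void *userdata) {
    auto *buffer = static_cast<std::vector<char> *>(userdata);
    const size_t bytes = size * nmemb;                   // chunk length; data is NOT null terminated
    buffer->insert(buffer->end(), data, data + bytes);   // accumulate, keeping earlier chunks
    return bytes;                                        // tell the caller every byte was consumed
}

// Convenience: view the accumulated bytes as a string once the transfer is complete.
std::string toString(const std::vector<char> &buffer) {
    return std::string(buffer.begin(), buffer.end());
}
```

With libcurl you would declare a local std::vector<char> response, pass &response through CURLOPT_WRITEDATA and appendChunk through CURLOPT_WRITEFUNCTION, and read response only after curl_easy_perform returns CURLE_OK.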
You claim to follow the example in the docs, which link to https://curl.haxx.se/libcurl/c/getinmemory.html. If you look again, you will see what they are doing, and how your code doesn't match. In particular, they pass &chunk (the address of chunk) and then write data into chunk so they keep what was there before.
struct MemoryStruct {
char *memory;
size_t size;
};
static size_t
WriteMemoryCallback(void *contents, size_t size, size_t nmemb, void *userp)
{
// here is where they get access to the buffer
struct MemoryStruct *mem = (struct MemoryStruct *)userp;
In the call to curl, you will find the struct locally defined, then the remote call:
struct MemoryStruct chunk;
curl_easy_setopt(curl_handle, CURLOPT_WRITEFUNCTION, WriteMemoryCallback);
curl_easy_setopt(curl_handle, CURLOPT_WRITEDATA, (void *)&chunk);
res = curl_easy_perform(curl_handle);
if (stuff)
printf("%lu bytes retrieved\n", (unsigned long)chunk.size);
Basically this is my code :
int main()
{
CURL *curl;
FILE *fp;
CURLcode res;
std::string readBuffer;
curl = curl_easy_init();
char outfilename[FILENAME_MAX] = "C:\\Users\\admin\\desktop\\test.txt";
if(curl) {
fp = fopen(outfilename,"wb");
curl_easy_setopt(curl, CURLOPT_URL, "http://www.example.com");
curl_easy_setopt(curl, CURLOPT_POSTFIELDS, "user=123&pass=123");
curl_easy_setopt(curl, CURLOPT_FOLLOWLOCATION, 1);
curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, write_data);
curl_easy_setopt(curl, CURLOPT_WRITEDATA, fp);
res = curl_easy_perform(curl);
Sleep(1000);
curl_easy_cleanup(curl);
fclose(fp);
}
return EXIT_SUCCESS;
}
The output is successfully saved in the text file.
My question is how to extract specific content between specific tags.
For example, I want only the content between <bla> ... </bla>.
What's the easiest way? Thank you.
In your example, you are dumping the response from the website to a file. libcurl writes the data returned by the page exactly as received; it makes no attempt to restructure it.
You can obtain the data in memory instead by defining a write_data function, which must have the following signature:
size_t write_data(char *ptr, size_t size, size_t nmemb, void *userdata);
Once you have the data in memory, you can parse it and restructure it as required.
See Example Here for using write_data function.
For XML Parsing you may use This sample code
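For the simple case in the question, where a single, non-nested pair of tags is wanted, a plain string search over the in-memory response may be enough. This is only a sketch: extractBetween is a hypothetical helper, it returns the first match only, and it is no substitute for a real XML or HTML parser when the markup is nested or irregular.

```cpp
#include <string>

// Hypothetical helper: copy the text between the first openTag and the next closeTag into `out`.
// Returns false (leaving `out` untouched) if either tag is missing.
bool extractBetween(const std::string &text, const std::string &openTag,
                    const std::string &closeTag, std::string &out) {
    const std::string::size_type start = text.find(openTag);
    if (start == std::string::npos) return false;

    const std::string::size_type contentStart = start + openTag.size();
    const std::string::size_type end = text.find(closeTag, contentStart);
    if (end == std::string::npos) return false;

    out = text.substr(contentStart, end - contentStart);
    return true;
}
```

You would call it on the std::string filled by your write callback, for example extractBetween(response, "<bla>", "</bla>", content).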
I would like to download some page content from Wiktionary. I use curl in a loop. The first iteration is OK, but the others give me the same result as the first. What is missing or wrong? Thank you. This is the loop:
std::string buffer;
size_t curl_write( void *ptr, size_t size, size_t nmemb, void *stream)
{
buffer.append((char*)ptr, size*nmemb);
return size*nmemb;
}
int main(int argc, char **argv)
{
CURL *curl = curl_easy_init();
string data;
data="http://fr.wiktionary.org/w/api.php?format=json&action=query&titles=";
//Page titles are read from local file. The code is not shown to make short.
while ( not_end_of_file){
//list_of_page_title is pages requested for the current iteration.
data=data+list_of_page_title+"prop=revisions&rvprop=content";
curl_easy_setopt(curl, CURLOPT_URL, data.c_str());
curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, curl_write);
curl_easy_perform(curl);
curl_easy_reset(curl);
}
curl_easy_cleanup(curl);
return 0;
}
I am new to curl, so I may have missed many things. Thank you for the help.
data=data+list_of_page_title will append the new title onto your previous URL instead of replacing the previous. By the end you'll have a gigantic URL full of garbage. The server is probably paying attention to the first title and ignoring the rest.
And this would be obvious if you just output your URL as the first step of debugging... "Am I requesting what I think I'm requesting?"
One problem is that you are not resetting your buffer variable.
while ( not_end_of_file){
buffer = ""; // reset buffer to empty string
//list_of_page_title is pages requested for the current iteration.
data="http://fr.wiktionary.org/w/api.php?format=json&action=query&titles=" +
list_of_page_title +
"&prop=revisions&rvprop=content"; // the leading '&' separates this parameter from titles
curl_easy_setopt(curl, CURLOPT_URL, data.c_str());
curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, curl_write);
curl_easy_perform(curl);
curl_easy_reset(curl);
}
And as Peter points out your handling of the data variable has a very similar problem.
Take a look at the following code
static size_t reader(void *ptr, size_t size, size_t nmemb, FILE *stream) {
size_t retcode = fread(ptr, size, nmemb, stream);
cout << "*** We read " << retcode << " bytes from file" << endl;
return retcode;
}
void upload() { //upload() is called from outside
FILE *pFile;
pFile = fopen("map.txt" , "r");
struct stat file_info;
stat("map.txt", &file_info);
size_t size = (size_t)file_info.st_size;
uploadFile(pFile, size);
}
bool uploadFile(void* data, size_t datasize) {
CURL *curl;
CURLcode res;
curl = curl_easy_init();
if (curl) {
char *post_params = ...;
curl_easy_setopt(curl, CURLOPT_URL, url);
curl_easy_setopt(curl, CURLOPT_UPLOAD, 1L);
curl_easy_setopt(curl, CURLOPT_POSTFIELDS, post_params);
curl_easy_setopt(curl, CURLOPT_POSTFIELDSIZE, (long) strlen(post_params));
curl_easy_setopt(curl, CURLOPT_READFUNCTION, reader);
curl_easy_setopt(curl, CURLOPT_READDATA, data);
curl_easy_setopt(curl, CURLOPT_INFILESIZE_LARGE, (curl_off_t) datasize);
res = curl_easy_perform(curl);
curl_easy_cleanup(curl);
}
return true;
}
When the code is executed, the following is output:
*** We read 490 bytes from file
*** We read 0 bytes from file
After that the app does nothing (it does not even exit).
Can someone point out at what's wrong here?
Will be grateful for any help!!!
There is some serious confusion in this code. Let me try to explain:
CURLOPT_UPLOAD - this will ask libcurl to PUT the file when the protocol of choice is HTTP
CURLOPT_POSTFIELDS - tells libcurl to POST the data that is provided in the additional argument (which has the size set with CURLOPT_POSTFIELDSIZE)
CURLOPT_READFUNCTION - provides libcurl with an alternative to CURLOPT_POSTFIELDS for getting the data, which allows a POST that reads the data from a file. When using CURLOPT_UPLOAD this is the only way to provide data.
So in the end the questions left for you are:
Do you want PUT or POST?
Do you want to provide the data as a string or do you want it provided with a callback?
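If you go the callback route, the read callback has to copy at most size * nitems bytes into the buffer libcurl hands it and return 0 once the data is exhausted. Below is a minimal sketch that reads from memory rather than a file; UploadSource and readFromMemory are hypothetical names, and libcurl is left out so the callback can be tried in isolation.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical state object handed to the callback through CURLOPT_READDATA.
struct UploadSource {
    std::string payload;   // bytes to send
    size_t offset = 0;     // how many bytes have been handed out so far
};

// Read callback with the signature CURLOPT_READFUNCTION expects.
size_t readFromMemory(char *buffer, size_t size, size_t nitems, void *userdata) {
    auto *src = static_cast<UploadSource *>(userdata);
    const size_t room = size * nitems;                        // capacity of the destination buffer
    const size_t left = src->payload.size() - src->offset;    // bytes not yet sent
    const size_t n = std::min(room, left);
    std::memcpy(buffer, src->payload.data() + src->offset, n);
    src->offset += n;
    return n;                                                 // 0 signals end of data
}
```

For a POST you would pair this with CURLOPT_POST and CURLOPT_POSTFIELDSIZE and leave CURLOPT_POSTFIELDS unset; for a PUT you would pair it with CURLOPT_UPLOAD and CURLOPT_INFILESIZE_LARGE. Mixing the two sets of options, as the question's code does, is what the list above warns against.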