Read .html file into string in C++

I am looking for some tips on how to read a page from a remote server over HTTP and save it into a string. Currently, I am aiming to read just the first character.
#edit:
What I have already done:
I thought that curl might be the right tool to achieve this.
compilation:
1>------ Rebuild All started: Project: cURL, Configuration: Debug Win32 ------
1> stdafx.cpp
1> cURL.cpp
1> cURL.vcxproj -> C:\Users\Lukasz\Documents\Visual Studio 2010\Projects\cURL\Debug\cURL.exe
========== Rebuild All: 1 succeeded, 0 failed, 0 skipped ==========
but after debug:
The procedure entry point sasl_errdetail could not be located in the dynamic link library libsasl.dll
my main file.cpp:
// cURL.cpp : Defines the entry point for the console application.
//
#include "stdafx.h"
#include <curl.h>
#include <cstdio>
#include <string>

std::string buffer;

// libcurl calls this for each chunk of the response body
size_t curl_write(void *ptr, size_t size, size_t nmemb, void *stream)
{
    buffer.append((char*)ptr, size * nmemb);
    return size * nmemb;  // tell libcurl the whole chunk was consumed
}

int main(int argc, char **argv)
{
    CURL *curl = curl_easy_init();
    if (curl) {
        curl_easy_setopt(curl, CURLOPT_URL, "http://google.com");
        curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, curl_write);
        curl_easy_perform(curl);
        curl_easy_cleanup(curl);
    }
    // note the argument order: fwrite(ptr, element size, count, stream)
    fwrite(buffer.c_str(), sizeof(char), buffer.length(), stdout);
    // the first character, if any, is simply buffer[0]
    return 0;
}

I had the same problem!!
I downloaded libsasl.dll from dlldll.com and got the same error. I assumed they had an old version of the dll, so I downloaded a newer version and it worked like a charm.
I got the newer version from http://theetrain.ca/tech/files/libsasl.dll

Open the file.
Use fgetc or istream::read to read characters.
Convert the character to the proper value.
Close the file.
HTML files can be read like text files; a minimal sketch follows below.
Parsing an HTML file is another issue.
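For instance, here is a minimal sketch of those steps that reads a local HTML file into a string with a standard stream and looks at the first character (the filename page.html is just a placeholder):

#include <fstream>
#include <iostream>
#include <iterator>
#include <string>

int main()
{
    // open the file (hypothetical name; any local .html file works)
    std::ifstream in("page.html", std::ios::binary);
    if (!in)
        return 1;
    // read the whole file into a string via stream iterators
    std::string html((std::istreambuf_iterator<char>(in)),
                     std::istreambuf_iterator<char>());
    if (!html.empty())
        std::cout << "first character: " << html[0] << '\n';
    return 0;  // the ifstream closes the file on destruction
}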

Simply receiving HTML in a server/client setup requires that you connect to a socket and handle the messages that come over the socket.
Here is a well-done example that introduces sockets on Windows:
Programming Windows TCP Sockets in C++ for the Beginner
If you want to avoid learning sockets altogether while still keeping good control of what is going on, you can just use a URL downloader of sorts:
Download a URL in C
But using URLOpenStream will probably give you the fastest results, if you need speed above anything else; a sketch of that route follows below.
Once you have a socket that receives the text of the HTML, you can parse the HTML and convert what you find into the appropriate variables.
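As a rough illustration of that URL-moniker route, here is a minimal sketch using URLOpenBlockingStream from urlmon (Windows-only; error handling is kept to a minimum, and the URL is just a placeholder):

#include <windows.h>
#include <urlmon.h>
#include <string>
#include <cstdio>
#pragma comment(lib, "urlmon.lib")

int main()
{
    IStream *stream = NULL;
    // blocks until the stream for the URL is available
    HRESULT hr = URLOpenBlockingStreamA(NULL, "http://example.com", &stream, 0, NULL);
    if (FAILED(hr))
        return 1;

    std::string html;
    char buf[4096];
    ULONG read = 0;
    // pull the body out of the COM stream chunk by chunk
    while (SUCCEEDED(stream->Read(buf, sizeof(buf), &read)) && read > 0)
        html.append(buf, read);
    stream->Release();

    printf("received %u bytes\n", (unsigned)html.size());
    return 0;
}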

Related

Using libcurl for a POST request

I'll preface this by saying I'm still a new C/C++ programmer, so please excuse me for what may be a redundant question.
I'm writing a program in C/C++ to interact with this website: http://www.youtube-mp3.org/.
From what I understand, to get my program to download a link for me I'll have to send a POST request to the server containing the URL I want to convert, then find a way of getting it to follow the URL that is generated allowing me to download the file. I also understand that libcurl is a good way of doing this sort of thing in C/C++.
I've tried using the POST examples on the libcurl website (http://curl.haxx.se/libcurl/c/simplepost.html and one other) but neither seems to work. In addition, I'm not sure how to then get my program to follow the link that appears saying 'Download'. I've tried sending a POST request, then telling my program to get the HTML source of the page and store this in a file, but that file doesn't seem to contain any download link. When this is done through a browser, the page source definitely includes a working download link.
Would really appreciate some help, as I'm not sure whether I've got completely the wrong idea!
EDIT: My question wasn't very clear at all. Here is the relevant code I'm using for the POST request:
static const char *postthis = "http://www.youtube.com/watch?v=KMU0tzLwhbE";
CURL *curl;
CURLcode res;

curl = curl_easy_init();
if(curl) {
    curl_easy_setopt(curl, CURLOPT_URL, "http://www.youtube-mp3.org/");
    curl_easy_setopt(curl, CURLOPT_POSTFIELDS, postthis);
    curl_easy_setopt(curl, CURLOPT_POSTFIELDSIZE, (long)strlen(postthis));
    /* Perform the request, res will get the return code */
    res = curl_easy_perform(curl);
    /* Check for errors */
    if(res != CURLE_OK)
        fprintf(stderr, "curl_easy_perform() failed: %s\n",
                curl_easy_strerror(res));
}
And for writing the HTML source to a file:
static size_t write_data(void *ptr, size_t size, size_t nmemb, void *stream)
{
    size_t written = fwrite(ptr, size, nmemb, (FILE *)stream);
    return written;
}

{
    static const char *filename = "head.txt";
    FILE *htmlfile;
    curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, write_data);
    /* open the file */
    htmlfile = fopen(filename, "w");
    if (htmlfile == NULL) {
        curl_easy_cleanup(curl);
        return -1;
    }
    curl_easy_setopt(curl, CURLOPT_WRITEDATA, htmlfile);
    curl_easy_perform(curl);
    /* close the file */
    fclose(htmlfile);
    /* always clean up */
    curl_easy_cleanup(curl);
}
Your code does not work because you are assuming the wrong logic to begin with.
http://www.youtube-mp3.org does NOT use POST; in fact, its download form doesn't even submit to a server-side URL at all. When you click the "Convert Video" button, client-side JavaScript is invoked to process the input URL, download the relevant information from YouTube, and modify the calling page's HTML to display the actual download link and video preview image. This is why you don't see the download link when you simply retrieve the HTML: you are not invoking the JavaScript that does the actual work of preparing the download link. And you will not be able to do that from an application (without a LOT of extra work); it has to be done inside a web browser that has a real JavaScript engine and a real DOM for the script to manipulate.

cURL: downloading images from a camera (HTTP request)

I have installed cURL, and I was able to download an image from a website; it works fine.
Here is the code:
#define CURL_STATICLIB
#include <stdio.h>
#include <stdlib.h>
/* the plain <curl/...> form is preferred over absolute /usr/include paths */
#include <curl/curl.h>
#include <curl/stdcheaders.h>
#include <curl/easy.h>

size_t write_data(void *ptr, size_t size, size_t nmemb, FILE *stream) {
    size_t written = fwrite(ptr, size, nmemb, stream);
    return written;
}

int main(void) {
    CURL *curl;
    FILE *fp;
    CURLcode res;
    char *url = "http://www.example.com/test_img.png";
    char outfilename[FILENAME_MAX] = "/home/c++_proj/output/web_req_img.png";

    curl = curl_easy_init();
    if (curl) {
        fp = fopen(outfilename, "wb");
        if (fp) {
            curl_easy_setopt(curl, CURLOPT_URL, url);
            curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, write_data);
            curl_easy_setopt(curl, CURLOPT_WRITEDATA, fp);
            res = curl_easy_perform(curl);
            fclose(fp);
        }
        /* always cleanup */
        curl_easy_cleanup(curl);
    }
    return 0;
}
I also have a D-Link DCS-930L camera. I can easily give my camera a static IP address, and I was able to view live video by logging into the camera (e.g. http://192.168.1.5).
I don't need any special software or anything to start watching the video.
Now, I would like to use cURL to download images from the camera, but I am not sure how to do it.
Could someone please tell me, or provide some piece of code for it?
All I want to do is capture (sample) a few of the images that are being streamed.
How do I know when to make a request, and where would the boundary between the images be?
I would truly appreciate some advice and a piece of code that could get me going.
According to the manual for this camera [1], you need to use a Java or ActiveX plugin to receive and watch the video:
Please make sure that you have the latest version of Java application
installed on your computer to ensure proper operation when viewing the
video in Java mode. The Java application can be downloaded at no cost
from Sun’s web site (http://www.java.com).
When you connect to the home page of your camera, you will be prompted
to download ActiveX. If you want to use ActiveX to view your video
images instead of Java, then you must download ActiveX.
This suggests that grabbing the image is going to be more difficult than simply making an HTTP request.
[1] http://www.dlink.com/us/en/support/product/-/media/Consumer_Products/DCS/DCS%20930L/Manual/DCS%20930L_Manual_EN_US.pdf
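That said, many consumer IP cameras also expose a plain HTTP snapshot endpoint alongside the plugin-based viewer; whether and where the DCS-930L does is not confirmed here, so the path and credentials in this sketch are purely hypothetical placeholders. If such an endpoint exists, the image-download code above works essentially unchanged:

#include <stdio.h>
#include <curl/curl.h>

size_t write_data(void *ptr, size_t size, size_t nmemb, FILE *stream) {
    return fwrite(ptr, size, nmemb, stream);
}

int main(void) {
    CURL *curl = curl_easy_init();
    if (curl) {
        FILE *fp = fopen("snapshot.jpg", "wb");
        if (fp) {
            /* hypothetical snapshot path - check your camera's documentation */
            curl_easy_setopt(curl, CURLOPT_URL, "http://192.168.1.5/image.jpg");
            /* cameras usually require HTTP basic auth (placeholder credentials) */
            curl_easy_setopt(curl, CURLOPT_USERPWD, "admin:password");
            curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, write_data);
            curl_easy_setopt(curl, CURLOPT_WRITEDATA, fp);
            curl_easy_perform(curl);
            fclose(fp);
        }
        curl_easy_cleanup(curl);
    }
    return 0;
}

With a snapshot endpoint, each request returns one complete JPEG, so the question of where the boundary between images lies goes away; you sample simply by repeating the request.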

Sending and receiving strings over http via curl

I have a situation where my program on a server (a Windows machine) outputs some strings. I need to send those strings from the server to the client via HTTP using curl. Once sent, I am to receive the data on the client side as a string, decode it, and perform subsequent actions.
I already achieved this functionality using C sockets with the Berkeley API, as I was familiar with that, but for some reason I am not allowed to use a program of my own.
I poked around and it seems cURL could be my solution. However, I am very new to curl and can't seem to figure out how to achieve this functionality. On the client side I found this, which may be useful:
#include <stdio.h>
#include <curl/curl.h>

int main(void)
{
    CURL *curl;
    CURLcode res;

    curl = curl_easy_init();
    if(curl) {
        curl_easy_setopt(curl, CURLOPT_URL, "http://example.com");
        /* Perform the request, res will get the return code */
        res = curl_easy_perform(curl);
        /* Check for errors */
        if(res != CURLE_OK)
            fprintf(stderr, "curl_easy_perform() failed: %s\n",
                    curl_easy_strerror(res));
        /* always cleanup */
        curl_easy_cleanup(curl);
    }
    return 0;
}
I understand that you have to use the write-back (callback) functions to receive data?
Also, on the client side I need to develop a program using curl that, whenever the server sends over a string, receives it and decodes it. Any pointers to tutorials related to these specific problems will be highly appreciated, as will any help from someone who has already tried this.
Thanks.
Take a look at this example code from their site. It details how to get your response data written to a region of memory rather than a file:
http://curl.haxx.se/libcurl/c/getinmemory.html
also take a look at the generic tutorial on the curl website:
http://curl.haxx.se/libcurl/c/libcurl-tutorial.html
one final thing to consider: if using C++ you need to make sure your callbacks are not non-static member functions; they must be free functions or static member functions (see here: libcurl - unable to download a file)
This should get you started at least.
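Along the lines of that getinmemory example, here is a minimal C++ sketch of capturing the response into a std::string by passing the string's address through CURLOPT_WRITEDATA, with a free-function callback as libcurl requires (the URL is a placeholder):

#include <curl/curl.h>
#include <iostream>
#include <string>

// free function (not a non-static member), as libcurl requires
static size_t write_to_string(void *ptr, size_t size, size_t nmemb, void *userdata)
{
    std::string *out = static_cast<std::string *>(userdata);
    out->append(static_cast<char *>(ptr), size * nmemb);
    return size * nmemb;
}

int main()
{
    std::string body;
    CURL *curl = curl_easy_init();
    if (curl) {
        curl_easy_setopt(curl, CURLOPT_URL, "http://example.com");
        curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, write_to_string);
        // libcurl hands this pointer back to the callback as userdata
        curl_easy_setopt(curl, CURLOPT_WRITEDATA, &body);
        CURLcode res = curl_easy_perform(curl);
        if (res == CURLE_OK)
            std::cout << body << std::endl;  // the received string, ready to decode
        curl_easy_cleanup(curl);
    }
    return 0;
}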

Downloading a file from URL to disk in C++

I have a simple question. Is it possible to write simple code to download a file from the internet (from a URL to disk) in C++ (for Mac OS X) without using libraries like curl?
I have seen some examples, but all of them use the curl library.
I use this code in my Xcode project, but I have some compilation (linking) errors:
#define CURL_STATICLIB
#include <stdio.h>
#include <curl/curl.h>
/* note: <curl/types.h> was removed from modern libcurl; <curl/curl.h> suffices */
#include <curl/easy.h>
#include <string>

size_t write_data(void *ptr, size_t size, size_t nmemb, FILE *stream) {
    size_t written = fwrite(ptr, size, nmemb, stream);
    return written;
}

int main(void) {
    CURL *curl;
    FILE *fp;
    CURLcode res;
    const char *url = "http://localhost/aaa.txt";
    char outfilename[FILENAME_MAX] = "bbb.txt";

    curl = curl_easy_init();
    if (curl) {
        fp = fopen(outfilename, "wb");
        if (fp) {
            curl_easy_setopt(curl, CURLOPT_URL, url);
            curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, write_data);
            curl_easy_setopt(curl, CURLOPT_WRITEDATA, fp);
            res = curl_easy_perform(curl);
            fclose(fp);
        }
        curl_easy_cleanup(curl);
    }
    return 0;
}
How can I link the curl library to my Xcode project?
You can launch a console command; it is very simple :D
system("curl -o ...")
or
system("wget ...")
"Downloading a file from URL" means basically doing an GET request to some remote HTTP server. So you need to have your application know how to do that HTTP request.
But HTTP is now a quite complex protocol. Its specification alone is long and complex (more than a hundred pages). libcurl is a good library implementing it.
Why do you want to avoid using a good free library implementing a complex protocol? Of course, you could implement the complex HTTP protocol by yourself (probably that needs years of work), or make a minimal program which don't implement all the details of HTTP protocol but might work (but won't work with weird HTTP servers).
You have to learn bits of "socket programming" and implement a very basic HTTP protocol; the minimalist thing is to send string like "GET /this/path/to/file.png HTTP/1.0\r\n" to the site; then, likely it will answer with an HTTP header you have to parse to know at least the length of the binary data following (if the request succeeded, otherwise you have to handle HTTP errors, or a unexpected contet-type like a html page).
This guide should give you the basic to start with; about HTTP, it depends on your need, sometimes sending a "raw" GET could suffice, sometimes not.
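Here is that sketch over POSIX sockets (HTTP/1.0, minimal error handling, example.com as a placeholder host; a real program must also parse the status line and headers before trusting the body):

#include <netdb.h>
#include <sys/socket.h>
#include <string.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    struct addrinfo hints, *res;
    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;

    /* resolve the host (placeholder) and connect to port 80 */
    if (getaddrinfo("example.com", "80", &hints, &res) != 0)
        return 1;
    int fd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
    if (fd < 0 || connect(fd, res->ai_addr, res->ai_addrlen) != 0)
        return 1;
    freeaddrinfo(res);

    /* the raw request; HTTP/1.0, so the server closes the connection when done */
    const char *req = "GET / HTTP/1.0\r\nHost: example.com\r\n\r\n";
    send(fd, req, strlen(req), 0);

    /* headers and body arrive together; a blank line \r\n\r\n separates them */
    char buf[4096];
    ssize_t n;
    while ((n = recv(fd, buf, sizeof(buf), 0)) > 0)
        fwrite(buf, 1, (size_t)n, stdout);

    close(fd);
    return 0;
}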
EDIT
Changed the example to pretend that the request comes from an HTTP/1.0 client, since HTTP/1.1 requires the Host header to be sent, as a commenter rightly pointed out.
EDIT2
The OP changed the question, which became something about how to link with a library in Xcode. There's already a similar question on SO.

Read HTML source to string

I hope you don't frown on me too much, but this should be fairly easy for someone to answer. I want to read a file on a website into a string, so I can extract information from it.
I just want a simple way to get the HTML source read into a string. After looking around for hours I see all these libraries and curl and stuff. All I need is the raw HTML data. I don't even need a definite answer, just something that will help me refine my search.
Just to be clear: I want the raw code in a string I can manipulate; I don't need any parsing, etc.
You need an HTTP client library; one of many is libcurl. You would then issue a GET request to a URL and read the response back however your chosen library provides it.
Here is an example to get you started. It is C, so I am sure you can work it out.
#include <stdio.h>
#include <curl/curl.h>

int main(void)
{
    CURL *curl;
    CURLcode res;

    curl = curl_easy_init();
    if(curl) {
        curl_easy_setopt(curl, CURLOPT_URL, "http://example.com");
        res = curl_easy_perform(curl);
        /* always cleanup */
        curl_easy_cleanup(curl);
    }
    return 0;
}
But you tagged this C++, so if you want a C++ wrapper for libcurl, use curlpp:
#include <curlpp/cURLpp.hpp>
#include <curlpp/Easy.hpp>
#include <curlpp/Options.hpp>
#include <iostream>

using namespace curlpp::options;

int main(int, char **)
{
    try
    {
        // That's all that is needed to do cleanup of used resources
        curlpp::Cleanup myCleanup;

        // Our request to be sent.
        curlpp::Easy myRequest;

        // Set the URL.
        myRequest.setOpt<Url>("http://example.com");

        // Send request and get a result.
        // By default the result goes to standard output.
        myRequest.perform();
    }
    catch(curlpp::RuntimeError & e)
    {
        std::cout << e.what() << std::endl;
    }
    catch(curlpp::LogicError & e)
    {
        std::cout << e.what() << std::endl;
    }
    return 0;
}
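Since the goal here is a string rather than standard output, here is a hedged sketch of the same request capturing the HTML into a std::string, assuming curlpp's WriteStream option (which redirects the result into any std::ostream):

#include <curlpp/cURLpp.hpp>
#include <curlpp/Easy.hpp>
#include <curlpp/Options.hpp>
#include <iostream>
#include <sstream>
#include <string>

int main()
{
    try
    {
        curlpp::Cleanup myCleanup;
        curlpp::Easy myRequest;
        myRequest.setOpt<curlpp::options::Url>("http://example.com");

        // send the result into a string stream instead of stdout
        std::ostringstream os;
        myRequest.setOpt(new curlpp::options::WriteStream(&os));
        myRequest.perform();

        std::string html = os.str();  // the raw HTML, ready to manipulate
        std::cout << "got " << html.size() << " bytes\n";
    }
    catch(std::exception & e)  // curlpp exceptions derive from std::exception
    {
        std::cout << e.what() << std::endl;
    }
    return 0;
}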
HTTP is built on top of TCP. If you know socket programming, you can write a simple networking application that opens a socket to the desired server and issues an HTTP GET command. Whatever the server responds with, you'll have to remove the HTTP headers that precede the actual document you want.
If that sounds complicated, then just stick with libcurl.
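For reference, the headers end at the first blank line, so stripping them from a raw response held in a std::string can be as simple as this sketch:

#include <string>

// strip the HTTP header block from a raw response;
// headers are separated from the body by a blank line ("\r\n\r\n")
std::string strip_headers(const std::string &response)
{
    std::string::size_type pos = response.find("\r\n\r\n");
    if (pos == std::string::npos)
        return response;          // no header/body separator found
    return response.substr(pos + 4);
}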
If it is a one-off hack, then just grab the source from "view source" and save it as .txt; then you can open it with a normal file IO stream.
All those pesky libraries are a hint that doing it right is a common and non-trivial exercise... :)
If all you want to do is grab the entire HTML code without any kind of parsing or external libraries, my suggestion would be to copy the code into a string with an IO stream.
It is the simplest way I have in mind, but be aware that it isn't the most efficient way to do it.