I want to download files for which I have the full URL from my C++ program. Which library is best for this?
I was imagining a command like:
system("download [URL] [DESTINATION]");
or
download(URL,DESTINATION);
I am using Windows, sorry that I forgot to mention.
You haven't mentioned the operating system; anyway, you can do it with the system function in C:
#include <stdio.h>
int main()
{
system("wget url");
return 0;
}
Replace url with the URL of the file you need.
Have a look at curl.
From the curl site:
curl is a command line tool for transferring data with URL syntax, supporting DICT, FILE, FTP, FTPS, Gopher, HTTP, HTTPS, IMAP, IMAPS, LDAP, LDAPS, POP3, POP3S, RTMP, RTSP, SCP, SFTP, SMTP, SMTPS, Telnet and TFTP. curl supports SSL certificates, HTTP POST, HTTP PUT, FTP uploading, HTTP form based upload, proxies, cookies, user+password authentication (Basic, Digest, NTLM, Negotiate, Kerberos...), file transfer resume, proxy tunneling and a busload of other useful tricks.
It's the best client-side library available.
http://curl.haxx.se/
libcurl is one of the most widely used multi-protocol network transfer libraries. It can be used for downloading files or pages over HTTP and a lot of other protocols.
The command line client curl is built on top of the libcurl library.
For a call to system("command"); you could use command-line tools that can perform a download; wget and curl are such tools. The latter even provides a C API, so you can write your own download(...) function.
Related
I want to download *.txt files from a server using curl, but unfortunately I can't understand how to do it because I'm a beginner with curl. I thought of using the recursive iterator from boost::filesystem. Maybe you have a different way to solve my problem? Thank you.
boost::filesystem only works with file paths on the local machine, and UNC paths on a local network. You cannot use it to iterate remote files over the Internet.
If the files are on an HTTP server, libCURL cannot iterate the files directly, as that capability is not part of the HTTP protocol. Your only hope is if the HTTP server provides an HTML file (or other format) containing the file listing when you request the URL of the directory the files reside in. However, for obvious security reasons, many HTTP servers disable this feature! Most servers will instead redirect to a specific file, like index.html, index.php, default.aspx, etc. But if the HTTP server does allow retrieving a file listing, you would have to retrieve and parse that listing data manually in order to determine the URLs of the individual files, and then you can download them as needed.
If the files are on an FTP server, then that is more workable, as directory listings are part of the FTP protocol, and libCURL can retrieve an FTP directory listing (make sure the requested URL ends with a forward slash so libCURL knows a directory is being requested and not a specific file). However, you are responsible for parsing the listing data, which can be in any format the FTP server decides to send (and there are many, many listing formats used by FTP servers in the wild!). Modern FTP servers support the MLSD and MLST commands (as opposed to the original LIST command) to make the listing data easier to parse, so you can try instructing libCURL to use those commands (via libCURL's CURLOPT_CUSTOMREQUEST option) if the server supports them. Or, if you are interested in only the file names and not the file details (like timestamps, sizes, etc.), you can use the FTP NLST command instead (via libCURL's CURLOPT_DIRLISTONLY option).
I'm using Dev-C++ and I'm looking for a way to open the default browser (for example, IE), or better, load a browser instance in the background, and send a request to get the source code of the page I requested.
Can I do something like this in C++?
Thank you!
P.S. I need this for Windows
You seem to have imagined the wrong solution for your problem. If you want to get the HTML source for a web page, you don't need to somehow do it through the browser. You need to do whatever the browser does to get it.
When you enter an address into a browser, the browser sends an HTTP GET request to the server that hosts the resource you're requesting (often a web page), and the server sends back an HTTP response containing the resource content (often HTML).
You want to do the same in your application: send an HTTP request to the server and read the response. A popular library for doing this is libcurl.
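Concretely, the exchange looks roughly like this (host, path, and response length are placeholders; the blank line ends the headers):

```http
GET /index.html HTTP/1.1
Host: www.example.com

HTTP/1.1 200 OK
Content-Type: text/html
Content-Length: 1234

<html> ... the page source your program wants ... </html>
```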
If you don't have to send a POST request (i.e. it's just a simple web request, possibly with parameters passed on the URL as a GET), then you could just use URLDownloadToFile().
If you don't want to use callbacks etc. and just want to download the file, you can call it rather simply:
URLDownloadToFile(0, "http://myserver/myfile", "C:\\mytempfile", 0, 0);
There are also a few other functions provided that will automatically push the downloaded data to a stream rather than a file.
It can't be done in pure standard C++. You should use a native Windows library or another framework (like Qt) and use its capabilities for fetching and parsing websites. In Qt, you'd use QtWebKit.
Edit: if you only want the source code of a page, you can do this without a browser or browser engine by using Winsock.
I'm making an HTTP server in C++. I notice that with Apache, if you request a directory without adding a forward slash at the end, Firefox still somehow knows that it's a directory you are requesting (which seems impossible for Firefox to determine on its own, so I'm assuming Apache is doing a redirect).
Is that assumption right? Does Apache check whether you are requesting a directory and then do an HTTP redirect to the request with the forward slash? If that is how Apache works, how do I implement that in C++? Thanks to anyone who replies.
Determine whether the resource represents a directory; if so, reply with:
HTTP/1.X 301 Moved Permanently
Location: URI-including-trailing-slash
Using 301 allows user agents to cache the redirect.
If you wanted to do this, you would:
call stat on the pathname
determine that it is a directory
send the necessary HTTP response for a redirect
I'm not at all sure that you need to do this. Install the Firefox 'web developer' add-on to see exactly what goes back and forth.
Seriously, this should not be a problem. Suggestions for how to proceed:
Get the source code for Apache and look at what it does
Build a debug build of Apache and step through the code in a debugger in such a case; examine which pieces of code get run.
Install Wireshark (network analysis tool), Live HTTP Headers (Firefox extension) etc, and look at what's happening on the network
Read the relevant RFCs for HTTP - which presumably you should be keeping under your pillow anyway if you're writing a server.
Once you've done those things, it should be obvious how to do it. If you can't do those things, you should not be trying to develop a web server in C++.
The assumption is correct. Make sure your response includes a Location header pointing to the URL with the trailing slash, and a valid 301/302 status line. This is not really a C++ question; it is an HTTP protocol question. Since you are trying to write an HTTP server, as one of the other posts suggests, read the RFC.
You should install Fiddler and observe the HTTP headers sent by other web servers.
Your question is impossible to answer precisely without more details, but you want to send an HTTP 3xx status code with a Location header.
I'm writing a small content server as a web service. There are 2 units - one authenticates the application requesting content and when authentication succeeds, the request is forwarded to the other unit that serves the content.
[1] If I want to do this using CGI scripts, is there any equivalent of jsp:forward in CGI?
[2] Suppose forwarding is not possible; the client application shouldn't be able to request the second unit directly. What is the proper way to do this?
Another attempt, since you are not after HTTP redirect...
The short answer is: Yes, it is possible.
However, it is highly dependent on the tools you are using. Which web server and CGI scripting language are you using?
CGI scripts can do practically anything they want to do, for example they could execute code from other CGI scripts. Thus, they can provide the behavior you are looking for.
CGI (Common Gateway Interface) just describes how a web server starts a CGI script and gives the script input data via environment variables. CGI also describes how the script returns data to web server. That's all.
So if your authorization script wants to delegate some operation to other some script, it is up to that authorization script to implement it somehow. The CGI protocol does not help here.
The concept you might be looking for is called HTTP redirect, where the server sends a response to browser's request, telling the browser to fetch a new page from another URL.
CGI scripts can do HTTP redirects just fine, much like jsp:forward. You just need to output the right HTTP headers.
You need to return a 302 response code, and provide the Location URL the browser should go to next. Have your CGI script output headers like these:
HTTP/1.1 302 Redirect
Location: http://www.example.org/
These headers tell the browser to fetch the page from http://www.example.org/ .
I'm trying to send a file and other POST variables to a PHP script on my server. There are no good resources on Google and the code samples I've found don't work. Preferably without using cURL.
If you're going to roll your own you'd need the relevant RFC for HTTP file uploading (googling on "rfc http file upload" will yield the same result). This RFC also shows how to handle a mix of files and other FORM-data (or POST variables). The problem is of course that you'll probably want to read the MIME RFC as well...
Just a couple of resources make it pretty easy to roll your own:
Here is an example of a GET request via ASIO (the C++ networking library in Boost)
Here is the HTTP protocol made really easy
The GET request is how you view any page on your site. With that code you can download any page and get it as raw text. As you can see, it sends a GET header to the server. As explained on that HTTP protocol page, a POST request looks like this:
POST /path/script.cgi HTTP/1.0
From: frog@jmarshall.com
User-Agent: HTTPTool/1.0
Content-Type: application/x-www-form-urlencoded
Content-Length: 32

home=Cosby&favorite+flavor=flies
To send a file:
Put your URL after POST.
Change the Content-Type to the type of file you are uploading.
Set Content-Length to the number of bytes in that file.
Append the file's contents after the blank line that ends the headers (replacing "home=Cosby&favorite+flavor=flies").
Another (more quick-and-dirty) solution is to use a utility via a system() or similar call.
For example the wget utility has a --post-file option.
I'd say roll your own. It's not too complicated.
Capture an HTTP post sent from a browser in Wireshark and reverse engineer as necessary using the spec as your guide. (See Andreas Magnusson's answer below for perhaps more relevant specs.)
I'd recommend this approach for learning the protocol rather than going purely by the spec. It's pretty difficult to learn things from the spec alone. I'd rather explore the behavior of known HTTP clients and work out how things operate, using the spec as my guide.
Format and send the data accordingly over a socket once you're comfortable with HTTP.
Also, if you are not familiar with socket programming, check out Beej's guide to socket programming.
This worked great for me on Debian (HTTP GET, HTTP POST):
http://cpp-netlib.github.com
I use v0.9.3, which requires Boost 1.49.