What is the most basic way to go to a webpage and download its contents? The webpage I wish to get only has text, most of which is in tables.
Is there a standard library that does it (like urllib in Python)?
There's no official C++ network library, no. There are many different APIs available, though. Which is best for you would depend on what platform(s) you were targeting and what framework(s) you might already be using.
That said, cpp-netlib is a platform-neutral API that follows C++ idioms nicely. I've used it and it works.
A large number of tasks that are not covered by the C++ standard library can be done using Boost, the collection of peer-reviewed portable libraries, which is very widely used in C++ projects today. For networking, we use Boost.Asio.
Their tutorials include HTTP clients: http://www.boost.org/doc/libs/1_46_1/doc/html/boost_asio/example/http/client/sync_client.cpp and http://www.boost.org/doc/libs/1_46_1/doc/html/boost_asio/example/http/client/async_client.cpp
However, although it is highly portable and may end up becoming part of the C++ standard library in future, it is a bit too low-level for your task. libcurl is today's de facto default library for HTTP downloads.
I think it is better if I explain the situation so this doesn't seem too arcane a question. I want to release some starter code for a project I want some of my students to work on. The project involves scraping through some internet webpages and as such, I want to provide them with a URLStream class that will download the html of an input url and return it as a string to them.
The issue is that I can't seem to find a particularly nice way to deal with networking that is cross-platform (the students have Mac/Windows/Linux machines). I know of libraries like Boost.Asio and libcurl, but the issue with using these is that I can't require all my students to download them. So my question is twofold:
Is there any nice way to provide them with this cross-platform networking code?
If a library is the only way to do this, is there any way to attach the library to the starter project so that students don't have to download it? I know this might be a stupid question but I can't seem to find out if this is possible.
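The URLStream class described above would first need to split its input URL into host, port, and path before opening a connection. A minimal sketch of that step, assuming plain `http://` URLs only (no https, no user info, no port validation); `ParsedUrl` and `parse_http_url` are hypothetical names, not part of any library:

```cpp
#include <optional>
#include <string>

// Hypothetical helper for a URLStream-style class: splits
// "http://host[:port][/path]" into its parts. Only plain http:// is handled;
// https would additionally need a TLS library.
struct ParsedUrl {
    std::string host;
    std::string port = "80";  // default HTTP port
    std::string path = "/";   // default path when the URL has none
};

std::optional<ParsedUrl> parse_http_url(const std::string& url) {
    const std::string scheme = "http://";
    if (url.compare(0, scheme.size(), scheme) != 0) {
        return std::nullopt;  // unsupported or missing scheme
    }
    std::string rest = url.substr(scheme.size());

    ParsedUrl result;
    // Everything from the first '/' onward is the path.
    std::size_t slash = rest.find('/');
    if (slash != std::string::npos) {
        result.path = rest.substr(slash);
        rest = rest.substr(0, slash);
    }
    // An optional ":port" follows the host name.
    std::size_t colon = rest.find(':');
    if (colon != std::string::npos) {
        result.port = rest.substr(colon + 1);
        rest = rest.substr(0, colon);
    }
    if (rest.empty()) {
        return std::nullopt;  // no host name
    }
    result.host = rest;
    return result;
}
```

For example, `parse_http_url("http://example.com:8080/tables/index.html")` yields host `example.com`, port `8080`, and path `/tables/index.html`. The port is kept as a string because that is the form `getaddrinfo` accepts.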
Boost.Asio is really not suitable for your needs, as it involves the huge Boost distribution and building at least some of Boost's non-header-only libraries. You can still consider standalone Asio, which can be used without Boost and is header-only, so it is much less hassle for you and your students. As it's probably the most popular and modern C++ networking library, this exercise could give the students some useful experience. The Asio examples also include a simple HTTP client.
As a side note, are you bound to C++ for this assignment? It would be much simpler in Python or similar languages that provide networking out of the box.
The Berkeley sockets API is the most common low-level socket API. It is supported on all POSIX platforms which means both Linux and macOS will have it.
Windows has it too (as Winsock), but with a slight twist, since sockets there aren't file descriptors as they are on POSIX systems.
Using sockets directly leads to more boilerplate code, but it is definitely possible to use them to make a simple HTTP client that supports only simple GET requests.
There are many tutorials and references on using sockets. Beej's Guide to Network Programming seems to be a popular tutorial, which should have notes about the tweaks needed for Windows.
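A rough sketch of such a GET-only client over the Berkeley sockets API, assuming a POSIX system (Linux or macOS) and plain HTTP; `make_get_request` and `http_get` are hypothetical helper names. It sends an HTTP/1.0 request, so the server closes the connection after responding and does not use chunked transfer encoding, which lets the client simply read until end-of-file:

```cpp
#include <netdb.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#include <stdexcept>
#include <string>

// Minimal HTTP/1.0 GET request; the Host header lets virtual hosts work.
std::string make_get_request(const std::string& host, const std::string& path) {
    return "GET " + path + " HTTP/1.0\r\n"
           "Host: " + host + "\r\n\r\n";
}

// POSIX only. Returns the raw response: status line, headers, blank line, body.
std::string http_get(const std::string& host, const std::string& port,
                     const std::string& path) {
    addrinfo hints{};
    hints.ai_family = AF_UNSPEC;      // IPv4 or IPv6
    hints.ai_socktype = SOCK_STREAM;  // TCP
    addrinfo* results = nullptr;
    if (getaddrinfo(host.c_str(), port.c_str(), &hints, &results) != 0) {
        throw std::runtime_error("cannot resolve " + host);
    }
    int fd = -1;
    // Try each resolved address until one connects.
    for (addrinfo* ai = results; ai != nullptr; ai = ai->ai_next) {
        fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd == -1) continue;
        if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0) break;
        close(fd);
        fd = -1;
    }
    freeaddrinfo(results);
    if (fd == -1) throw std::runtime_error("cannot connect to " + host);

    const std::string request = make_get_request(host, path);
    std::size_t sent = 0;
    while (sent < request.size()) {  // send() may write only part of the buffer
        ssize_t n = send(fd, request.data() + sent, request.size() - sent, 0);
        if (n <= 0) {
            close(fd);
            throw std::runtime_error("send failed");
        }
        sent += static_cast<std::size_t>(n);
    }

    std::string response;
    char buffer[4096];
    ssize_t n;
    while ((n = recv(fd, buffer, sizeof buffer, 0)) > 0) {  // 0 means EOF
        response.append(buffer, static_cast<std::size_t>(n));
    }
    close(fd);
    if (n < 0) throw std::runtime_error("recv failed");
    return response;
}
```

Calling `http_get("example.com", "80", "/")` returns the raw response with headers included, so the caller still has to strip the status line and headers. On Windows, the same functions come from Winsock, which additionally requires `WSAStartup()` before use and `closesocket()` instead of `close()`.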
cross-platform C++ library for network programming
asio is a cross-platform C++ library for network programming that provides
developers with a consistent asynchronous I/O model using a modern C++
approach. It has recently been accepted into Boost.
I copied that from the info window in Synaptic. If you're using Linux, install the library (and its documentation) thus:
sudo apt-get install libasio-dev libasio-doc
I depend heavily on Python's standard library, both for useful data structures and manipulators (e.g., collections and itertools) and for utilities (e.g., optparse, json, and logging), to skip the boilerplate and just Get Things Done. Looking through documentation on the C++ standard library, it seems entirely about data structures, with little in the way of the "batteries included" in Python's standard library.
The Boost library is the only open-source C++ library collection I know of that resembles the Python standard library; however, while it does have utility libraries such as Regular Expression support, most of it is also dedicated to data structures. I'm just really surprised that even something as simple as parsing and writing a CSV file, made delightfully simple with the Python csv module, looks to require rolling your own in C++ (even if you leverage some parsing library from Boost).
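For comparison with Python's `csv` module, a minimal hand-rolled sketch of reading and writing one CSV record in standard C++. It assumes comma delimiters and RFC 4180-style double-quote escaping, and does not handle newlines inside quoted fields; `parse_csv_line` and `csv_escape` are hypothetical names:

```cpp
#include <string>
#include <vector>

// Reading: split one line into fields. Quoted fields may contain commas,
// and "" inside quotes stands for a literal quote character.
std::vector<std::string> parse_csv_line(const std::string& line) {
    std::vector<std::string> fields;
    std::string field;
    bool in_quotes = false;
    for (std::size_t i = 0; i < line.size(); ++i) {
        char c = line[i];
        if (in_quotes) {
            if (c == '"') {
                if (i + 1 < line.size() && line[i + 1] == '"') {
                    field += '"';  // escaped quote
                    ++i;
                } else {
                    in_quotes = false;  // closing quote
                }
            } else {
                field += c;
            }
        } else if (c == '"') {
            in_quotes = true;
        } else if (c == ',') {
            fields.push_back(field);
            field.clear();
        } else {
            field += c;
        }
    }
    fields.push_back(field);  // last field (possibly empty)
    return fields;
}

// Writing: quote a field only when it contains a comma, quote, or newline.
std::string csv_escape(const std::string& field) {
    if (field.find_first_of(",\"\n") == std::string::npos) return field;
    std::string out = "\"";
    for (char c : field) {
        if (c == '"') out += '"';  // double embedded quotes
        out += c;
    }
    return out + "\"";
}
```

It works, but it is exactly the kind of boilerplate the question is complaining about; a dedicated CSV library would also cover multi-line fields, other delimiters, and encodings.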
Are there other open-source libraries out there for C++ that provide "batteries"? If not, what do you do as a C++ programmer: hunt for individual utility libraries (and if so, how), or just roll your own (which seems annoying and wasteful)?
The Poco library is more like other languages' standard libraries.
Actually the Poco web site's logo says "C++ now comes with batteries included!", which seems to be precisely what you're asking for.
I didn't like it when I tried because I found it too C-like and with too many dependencies between parts (difficult to single out just the functionality you want).
But there are many people & firms using it, so it seems I'm in minority and you will perhaps find it very useful.
In addition, as others have mentioned, check out Boost for data structures, parsers, and such stuff (it even has an interface to Python!).
While C++ does offer many of the comforts of OO, it keeps a very simple standard library. C++ has the STL, and beyond it there is Boost. These are very good and offer more than just data structures.
If your needs are these sorts of higher-order functions for prototyping, or for building applications without intense (a relative term) speed requirements, then C/C++ is probably not the right choice for you. I believe you will find that for most projects, high-level languages will be fast enough. If you are working on an application that requires C/C++ speed (and the accompanying standard deviations), then you should probably invest your time in carefully picking each individual library you will be using.
Beyond the official libraries, see the Boost sandbox and vault, where libraries under development are kept:
http://beta.boost.org/community/sandbox.html
http://www.boostpro.com/vault/
You can also google for "boost + <topic>", e.g.:
boost log -> http://boost-log.sourceforge.net/libs/log/doc/html/index.html
boost threadpool -> http://threadpool.sourceforge.net/
http://www.boost.org/doc/libs/1_45_0/?view=categorized
Boost isn't just about data structures - it has lots of the batteries you want - parsing, threads, collections, logging, etc.
With C and C++ you typically won't find a "do it all" library, instead you'll use individual libraries that do different things. You can use one library that does JSON parsing, one that does crypto, one that does logging, etc.
Boost and Qt are the only ones that would be more of a "do it all" type library.
I am a student, and new to C++. I am searching for a standard C++ API that is as comprehensive as the Java API. So far I have been using cplusplus.com, and cppreference.com.
Any help would be greatly appreciated.
C++ and Java have very different standard libraries because they make very different assumptions about what they are going to be used for.
Java assumes that applications or applets will be running on a host with a full-featured OS, with a defined way of doing most normal things.
That assumption carries a lot with it: for instance, in Java the output will be an application or applet. C++ makes no such assumption, because C++ can be used for building OS kernels and kernel drivers, for programming full-stack real-time applications on microcontrollers, or for processing blocks in supercomputers.
C++ can be used for implementing the very operating system on which it will run.
For these reasons, the standard library assumes almost nothing about what it will have available, and so it doesn't depend on those features.
The only exception is files and streaming, because almost any operating-system-like stack has something that looks like a file stream, if it has anything like files at all.
If you want a richer set of OS-specific APIs, you need to look at something non-standard. A great choice is the Qt framework, which provides many tools similar to what is found in the Java libraries, is cross-platform, and works well with native C++ idioms.
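Files and streams are the OS-facing facility that the standard library does cover. A small sketch using `std::ofstream` and `std::ifstream`, which offer the same interface on any hosted implementation; the helper names `write_text_file` and `read_text_file` are hypothetical:

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Writes `text` to the file at `path`, replacing any existing contents.
bool write_text_file(const std::string& path, const std::string& text) {
    std::ofstream out(path);
    out << text;
    out.close();                    // flush now so errors are reflected below
    return static_cast<bool>(out);  // false if opening, writing, or flushing failed
}

// Reads the whole file back into a string (empty if it cannot be opened).
std::string read_text_file(const std::string& path) {
    std::ifstream in(path);
    std::ostringstream contents;
    contents << in.rdbuf();  // copy the entire stream buffer in one step
    return contents.str();
}
```

Anything beyond this (directories, sockets, processes, GUIs) was outside the standard library when these answers were written, which is why the answers point to Qt, Boost, or Poco.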
C++ has a standard library.
You can try reading "The C++ Standard Library: A Tutorial and Reference". While I don't own it myself, it's on our book list (which I recommend you check out), so it shouldn't be bad.
Note C++ isn't Java, so the libraries don't necessarily have the same functionality. Another resource you'll want to look at is Boost, which serves as a source for well-written C++ libraries for things the standard library lacks.
GNU C++ Standard library documentation is the one I refer to most often.
Java is a virtual machine language, and as such it attempts to have a comprehensive API that provides a platform-independent way of drawing, writing to files, and so on. Inside the JRE, these generic calls are turned into platform-specific operations. In C++, you are the one who does that work. Many C++ libraries are platform-specific (see MFC, ATL, or code written for X Windows); it's your job to decide how you want to implement a feature and whether it is platform-specific or can be done in a platform-independent manner.
If you are writing on Windows or Unix, I can assure you that the OS API is very complete and will let you do whatever you're trying to accomplish. Also take a look at cross-platform libraries like Qt.
Java's standard library is aimed at providing ready-to-use functionality, while the C++ standard library is aimed at providing building blocks that aren't defined by the core language. The Boost library has mainly the same orientation as the standard library (with a few exceptions such as image processing). I think the closest you can get to something like Java's standard library is the Poco library.
However, when I tried the Poco library, I found that it was a bit too C-oriented for my taste.
That is, it's not "modern". You get that impression straight away without even looking at the APIs, because the online docs use 1990s-style frames. :-) However, it may fill your needs.
If you mean the C++ standard library, I'd look at www.cplusplus.com. It covers the current standard. After familiarizing yourself with that, you could try looking at Boost.
There are a number of changes in the upcoming C++0x standard (later published as C++11). Wikipedia has information on many of these, as does Stack Overflow.
The number one book for C++, IMO, is Effective C++ by Scott Meyers.
Is there a good, simple library which allows C++ to load a webpage? I just want to grab the source as text. I'm not using any IDE or significant library, just straight command line.
Tangentially, is there something fundamental I'm missing about programming in C++? I would think any language in common use today would have droves of web-based functionality, the web being so central to computer usage, but I can find next to no discussion of how to accomplish it. I realise C++ significantly predates the modern internet, so its lacking any core ability in that regard is reasonable, but the fact that relevant libraries seem so sparse is baffling.
Thanks for your help.
Sure, for example libcurl is powerful and popular.
Internet-related libraries for C++ are extremely abundant -- they're just not part of the C++ standard, partly because the current version of that standard is so old, though I'm sure that's not the only reason. But turn to the world of open sources and you'll find more than you can shake a stick at.
libcurl is a popular C library for fetching HTTP and other URLs. There's also cURLpp, a C++ binding.
On Windows you have the WinINet and WinHTTP APIs.
I think HTTP is a bit too complex to be part of the C++ Standard Library. The specification would have to take a lot of details into account such as proxy servers and MIME types.
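To give a sense of the details involved, a minimal sketch of splitting a raw HTTP/1.x response into status code, headers, and body. It assumes the complete response is already in memory and ignores chunked transfer encoding, redirects, compression, and character sets, all of which a standardized API would have to handle; `HttpResponse` and `parse_http_response` are hypothetical names:

```cpp
#include <string>

struct HttpResponse {
    int status = 0;
    std::string headers;  // raw header lines, without the status line
    std::string body;
};

// Returns false if `raw` does not look like "HTTP/x.y NNN ...\r\n...\r\n\r\n".
bool parse_http_response(const std::string& raw, HttpResponse& out) {
    // Headers end at the first empty line (CRLF CRLF).
    std::size_t header_end = raw.find("\r\n\r\n");
    if (header_end == std::string::npos) return false;
    std::size_t first_line_end = raw.find("\r\n");

    // Status line looks like "HTTP/1.1 200 OK": three digits follow the first space.
    std::size_t space = raw.find(' ');
    if (space == std::string::npos || space + 4 > first_line_end) return false;
    const std::string code = raw.substr(space + 1, 3);
    for (char c : code) {
        if (c < '0' || c > '9') return false;
    }
    out.status = std::stoi(code);

    // Header lines sit between the status line and the blank line (may be none).
    std::size_t headers_start = first_line_end + 2;
    out.headers = headers_start <= header_end
                      ? raw.substr(headers_start, header_end - headers_start)
                      : std::string();
    out.body = raw.substr(header_end + 4);
    return true;
}
```

Even this toy version has to care about line endings and an empty header block; a real client must also honour `Content-Length`, `Transfer-Encoding`, and `Content-Type` before the body is usable.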
I'm coming to C++ from a .NET background. I know how to use the Standard C++ Library and all the syntax, but I've never ventured further. Now I'm looking to learn a bit more, such as which libraries are commonly used. I want to start getting into threading but have no idea where to start. Is there a library (similar to .NET's System.Threading) out there that will make it a bit easier? I'm specifically looking to do Linux-based network programming.
For C++, Boost is your everything. Threading and networking are among the things it offers. But there's much more:
Smart pointers
Useful containers not found in the STL, such as fixed-size arrays and hashtables
Closures
Date/time classes
A foreach construct
Min/max functions
Command line option parsing
Regular expressions
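Regular expressions are one of the Boost pieces that made it into the standard: TR1's regex was based on Boost.Regex, and C++11 adopted it as `std::regex`. A sketch of using it to pull cell text out of a simple HTML table, assuming a C++11 (or later) compiler; `extract_table_cells` is a hypothetical helper, and a regex is no substitute for a real HTML parser:

```cpp
#include <regex>
#include <string>
#include <vector>

// Returns the text inside every <td>...</td> cell. Fine for simple,
// well-formed tables; nested tables or comments will confuse it.
std::vector<std::string> extract_table_cells(const std::string& html) {
    // [^>]* skips attributes; ([\s\S]*?) lazily captures the cell content,
    // including newlines. icase also matches <TD>...</TD>.
    static const std::regex cell(R"(<td[^>]*>([\s\S]*?)</td>)",
                                 std::regex::icase);
    std::vector<std::string> cells;
    for (std::sregex_iterator it(html.begin(), html.end(), cell), end;
         it != end; ++it) {
        cells.push_back((*it)[1].str());  // capture group 1 = cell text
    }
    return cells;
}
```

With Boost.Regex the code would look almost identical, using `boost::regex` and `boost::sregex_iterator` instead.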
As the others have said, Boost is great. It implements the C++ Technical Report 1 in addition to tons of other stuff, including some mind-blowing template metaprogramming tricks.
For other cross-platform features not provided by Boost, I've had very good luck with a library called Poco. I've worked on commercial projects that incorporated its simple HTTP server, for instance, and it treated us quite well.
Lots of Boost suggestions, but Qt is another good option. It has great support for threading and networking, along with pretty much everything else.
http://qt.nokia.com/products
If you are looking into network programming and are not interested in GUIs, I suggest the Boost libraries, in particular Asio.
At the time of writing (before C++11) there was no standard multithreading library, but the Boost library includes a platform-independent multithreading abstraction that works very well.
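Since C++11, the standard library has `std::thread`, whose design closely follows Boost.Thread. A minimal sketch, assuming a C++11 (or later) compiler (older GCC/Linux toolchains also need `-pthread`); `parallel_sum` is a hypothetical helper that sums a vector on several threads:

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Splits `values` into `num_threads` contiguous chunks, sums each chunk on
// its own std::thread, then combines the partial sums. Each thread writes
// only its own slot in `partial`, so no mutex is needed.
long long parallel_sum(const std::vector<int>& values, std::size_t num_threads) {
    if (num_threads == 0) num_threads = 1;
    std::vector<long long> partial(num_threads, 0);
    std::vector<std::thread> workers;
    const std::size_t chunk = (values.size() + num_threads - 1) / num_threads;

    for (std::size_t t = 0; t < num_threads; ++t) {
        const std::size_t begin = std::min(values.size(), t * chunk);
        const std::size_t end = std::min(values.size(), begin + chunk);
        workers.emplace_back([&values, &partial, t, begin, end] {
            partial[t] = std::accumulate(values.begin() + begin,
                                         values.begin() + end, 0LL);
        });
    }
    for (std::thread& w : workers) w.join();  // wait for every chunk
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

For network code, each connection could similarly be handled on its own thread, though Asio's asynchronous model often scales better than one thread per connection.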