C++, Get text from a website - c++

I was told I have to use winsock, but I dont know where to start. For example, I am trying to access, lets say http://www.newegg.com/, I am trying to get the text title of just the three front page products. Any help is greatly appreciated. :D

I'd also recommend libcurl for this sort of thing.
You can use the cURL command line tool to generate sample code as well, which is helpful for experimentation.

W3.org themselves provide sample C / C++ librarys for Http requests.
Find them here
Specifically, look for HTTPReq.c

Use boost library and poco. They both provide solutions for network programming. Boost also provide spirit library which you can use for parsing data from websites. Poco libraru also provides NetSSL, crypto solutions.
P.S. boost::spirit is not a library for parsing data from websites, it provides solution for parsing strings ...

you need to open a socket.
then you need to do an http get
somewhat like :-
http://www.esqsoft.com/examples/troubleshooting-http-using-telnet.htm

You could use the QNetworkAccessmanager class from Qt framework.

I'm assuming you need to use c++ for a reason, such as integration with existing software, otherwise, as per some of the other suggestions, choosing a language with a more convenient framework (eg: scripting language) would be better suited for the task.
If you would like to avoid getting your hands dirty with WINSOCK, or have the need to run on a platform other than windows, you could look at the using the boost asio library.
The following page contains links to simple sync and async http clients:
http://www.boost.org/doc/libs/1_37_0/doc/html/boost_asio/examples.html
You can find documentation on the library itself at:
http://www.boost.org/doc/libs/1_37_0/doc/html/boost_asio.html

Use c++ if you must, but it might be a lot less painful to use python.
Look at the Python httplib module for how to set the host you want to pull from etc. Python's available for free for most platforms and is enough like C++ that you can probably learn python a heck of a lot faster than you can learn to write a program controlled browser in c++. Well, maybe that's not true for everyone on this site, but I'll bet it's true for "most" of us. I used to get stock quotes updated in near real time from CNN Money years ago and IIRC it was around 100 lines of python code.
Hotei

Related

GET/POST using boos-asio for a REST API

I am new to network programming and got started with boost, REST etc. I wanted to know if I could use REST API's with boost-asio such as using Google Maps' Distance Matrix in my program. But I couldn't find a proper documentation for boost.
I don't expect you to give me complete working code rather I need idea or some sort of guidance as to what to do, where to find things etc.Also this program will be in C++ purely (I don't know if it can even be done in C++, given this answer https://stackoverflow.com/a/28736632/4846740). Thanks
Note: This post was not very helpful Integrating Google maps with C++ Program
You'd want to
use a REST library (e.g. cpprestsdk or some other frameworks like autobahn-cpp?), or
= at least write the REST requests on top of a HTTP library, such as Boost Beast
The library examples show you everything you need to send requests and receive responses. If you want, you can use additional libraries like https://github.com/0xdead4ead/BeastHttp to make it even more high-level/instant.
I would 100% recommend that you do not do this in C++. While I'm not a huge fan of Python, it's undoubtedly the hammer that is made for this nail. Check out BeautifulSoup, Mechanize, and Scrapy (+XPath), for really convenient ways of obtaining/parsing HTML, filling out web forms, and gaining responses. Typically, unless you're doing realtime target tracking, you do not need the latency gained from running everything in C/C++. You can get away with quarter-second, or even half-second updates.
I'm not sure what you're trying to do, but I would say save yourself the headache, and just work with Python.

Simple HTTP Server lib

What is a good choice for a simple http Server lib? It doesn't need high performance. I rather look for something simple for some REST/JSON communication ("API").
It must be able though to work in a multithreaded environment and must be able to handle large POST request.
Any suggestions? I already tried cpp-netlib but this seems to be much too complicated for such an easy task...
Edit: I am looking for something really light-weight and simple. E.g. like Sinatra in the Ruby world. Poco is for me another example of a too heavy-weight library.
The first one that comes to mind is Poco Library ( http://pocoproject.org/ )
Cross platform, stable, well documented. While the library itself offers more than you probably need you can build and omit the portions you aren't planning on using to reduce bloat.
They have a fully featured Net library that includes several salient classes and utilities.
Here is a pdf of slides from that library, of particular interest is the HTTPServer class:
http://pocoproject.org/slides/200-Network.pdf
Not sure about large POST data, but I've previously used mongoose: https://github.com/cesanta/mongoose/.
If the LGPL license is unwanted there is a MIT fork from when the project was MIT that also add a C++ API https://github.com/bel2125/civetweb
I would encourage you to start with http server samples in boost.asio. They are so simple and easy to understand, that you should be able to easily extend them as needed.
However, if you want to jump onto something more polished than just sample code, I know of 3 http servers in C++ which you may like to try:
"x0 - HTTP Web Server Framework" to me personally this one seems most promising, because it's lightweight and simple
"highpower / xiva" is a simple http server framework for delivering notifications to browsers
"Pion, a project of Atomic Labs" is a part of elaborate framework for handling large amounts of data
Pretty late answer; but hope this helps.
For your interest of a server that can handle REST, here is the easiest HTTP Server library to use (in my opinion): https://github.com/yhirose/cpp-httplib.
For JSON parsing, you may search for another library to use it in conjunction.
Personally, I'd go for Arachnida but that might be because I wrote it.

What works for web dev in C++

I want to create a web application that runs with very little RAM and I think C++ can help me achieve that.
Now, many people say C++ is unsuited for web development because:
there is no easy string manipulation
is an unsafe language (overflows, etc)
long change / build / test cycles
etc.
But I'm sure the C++ community have found ways to alleviate all those (maybe not the compile time) however since I'm not a regular so it is hard for me to put a value on what I find in Google.
So I'm asking for some guidance. I would appreciate if you share what works, what tools/libs are current and alive. What strategies can help with web dev in C++? FastCGI or embedded server (Asio / POCO / Pion / etc.)? How do you address security concerns?
Thanks a lot for any help
Have you looked at http://www.tntnet.org/. They have created a... well let me cut and paste from their website:
Tntnet is a modular, multithreaded,
high performance webapplicationserver
for C++. To create webapplications
Tntnet has a template-language called
ecpp similar to php, jsp or mason,
where you can embed c++-code inside a
html-page to generate active content.
The ecpp-files are precompiled to
c++-classes called components and
compiled and linked into a shared
library. This process is done at
compiletime.
I've used it and it has quite a small overhead plus it has screamingly fast dynamic page generation. Makes PHP, Ruby etc snails in comparison because with tntnet you are running compiled C/C++ code.
There's the Wt Project. It uses a paradigm similar to Qt's signals/slots.
There is nothing wrong with trying to build a web app in C++. It's actually a lot of fun. What you need is a:
Templating system
A CGI lib
A database API wrapper, most likely, to avoid dealing with something like the low-level MySQL API
A logger
ATL Server is a library of C++ classes that allow developers to build internet based applications.
ATL Server. It's open source too! And of course there is always ISAPI. Ah, the bad old days. :)
In your other question you mention that your embedded system is openwrt. As this router firmware already comes with a embedded web server (for it's administration UI), why don't you use that for you app as well?
Our web app backend is in C++ via CGI and we use Clearsilver templates along with the HDF that comes with it.
Give us some more hints about what you're trying to do.
You can write a good old-fashioned cgi program in C++ easily enough, and run it with FastCGI. We used to do that all the time.
You could write a C++ program embedding a lightweight HTTP server as well.
Both of them are much bigger PITAs than using something like perl or ruby.
So for why C++?
Update
Okay, got it. The main thing about FastCGI is that it avoids a fork-exec to run your CGI program, but it is a little bit different API. That's good, but you still have the problem of handling the HTTP stuff.
There are, however, several very lightweight HTTP servers, like Cherokee and Lighttpd. In similar situations (building web interfaces for appliances) I've seen people use one of these and run their C/C++ programs under them as a CGI. Lighttpd in particular seems to concentrate on making CGI-like stuff fast and efficient.
Another update. I just had cgicc pointed out to me: http://www.gnu.org/software/cgicc/
That might solve some problems.
You can try Cutelyst a C++11 built with Qt, with one of the best positions on TechEmpower Benchmarks.
Even though it requires Qt 5.6+ a full CMS (CMlyst) uses around 6MB of RAM while serving around 3000 requests per seconds on a single core.
And for your string manipulation issue QString is just an amazing class for that.

What's the best way to parse RSS/Atom feeds for an iPhone application?

So I understand that there are a few options available as far as parsing straight XML goes: NSXMLParser, TouchXML from TouchCode, etc. That's all fine, and seems to work fine for me.
The real problem here is that there are dozens of small variations in RSS feeds (and Atom feeds too), so supporting all possible permutations of feeds available out on the Internet gets very difficult to manage. I searched around for a library that would handle all of these low-level details for me, but came out without anything.
Since one could link to an external C/C++ library in Objective-C, I was wondering if there is a library out there that would be best suited for this task? Someone must have already created something like this, it's just difficult to find the "right" option from the thousands of results in Google.
Anyway, what's the best way to parse RSS/Atom feeds in an iPhone application?
I've just released an open source RSS/Atom Parser for iPhone and hopefully it might be of some use.
I'd love to hear your thoughts on it too!
"Best" is relative. The best performance you'll need to go the SAX route and implement the handlers. I don't know of anything out there open source available (start a google code project and release it for the rest of us to use!)
Whatever you do, it's probably a really bad idea to try and load the whole XML file into memory and act on it like a DOM. Chances are you'll get feeds that are much larger than you can handle on the device leading to frequent memory warnings and crashes.
I'm currently trying out the MWFeedParser #Michael Waterfall is developing.
Quite easy to set up and use (I'm a beginner iPhone developer).
His sample code for using MWFeedParser to populate a UITableViewController implementation is helpful as well.
take a look at apple's XML Performance sample -- which points to using libXML directly -- for performance and quicker updates to the display. Which may be important if you are working with very large feeds.
Check out my library for parsing Atom feeds, (BSAtomParser) at GitHub. It doesn't care about validating the feed, it does its best at returning whatever is valid. The parser covers most of RFC 4287, even extensions.
Here's my solution: a really simple yet powerful RSS parsing library: https://github.com/H2CO3/RSSKit
Have you looked at TouchCode yet? I don't think it has an RSS processor, but it might give you a start.
http://code.google.com/p/touchcode/
I came accross igasus project on sourceforge today. I haven't used it or really checked it, but perhaps it might help.
From their site:
igagus is a web service for the iPhone that allows aggregation of RSS to be delivered in an iPhone friendly format.
Actually, I was trying to suggest you ask on the TouchCode discussion board, because I remember someone was trying to expand it to support RSS. That might be a decent starting point. But I was being rushed by my wife.
But I see now that TouchCode doesn't have a discussion board. I'd still ask the author, though, he might know what came of that effort.
This might be a reasonable starting point for you. Atom support isn't there yet, but you could help out?

Which C++ Library for CGI Programming?

I'm looking at doing some work (for fun) in a compiled language to run some simple tests and benchmarks against php.
Basically I'd like to see what other people use for C++ CGI programming. (Including backend database, like mysql++ or something else)
I'm not sure exactly what you're looking for, but there is a C++ web framework called wt (pronounced "witty"). It's been kept pretty much up to date and if you want robust C++ server-side code, this is probably what you're looking for.
You can check it out and read more at the wt homepage.
P.S. You may have some trouble installing wt if you don't have experience with *nix or C++ libraries. There are walkthroughs but since frameworks like these are the road less traveled, expect to hit a few bumps.
Another option is the Cgicc library which appears to be mature (currently at version 3.x):
http://www.gnu.org/software/cgicc/
If I were thinking of working at that level, I'd probably just write a straight-up Apache or IIS module instead of a CGI.
That said, if you do want to go with CGI, I'd suggest using the venerable cgic from Thomas Boutell. It's a "plain" C library, but it's been in constant use since the mid '90s so it's thoroughly tested and solid as a rock.
Check out Boost's C++ CGI class, which is not a part of boost yet.
In short, I don't think there is such a thing for generic server CGI programming (happy to be proven wrong of course).
Instead you'll probably have to target the server APIs, such as Apache's. This looks like a reasonable introduction to request processing, which will be a big part of what you're doing.
As an alternative, Lighttpd may be even more developer-friendly, and (particularly if you're looking at performance) faster.
I note there's a cpp-netlib under development but it seems to be HTTP client only.
I've found very pleasant to use CppCMS to develop a Fast CGI app to deploy in a nginx server - although it never went in production =( . The CppCMS project also includes some libraries for SQL connectivity.