I'm currently searching for a portable way of getting the local IP addresses. Because I'm using Boost anyway, I thought it would be a good idea to use Boost.Asio for this task.
There are several examples on the net which should do the trick. Examples:
Official Boost.Asio Documentation
Some Asian Page
I tried both examples with only slight modifications. The code from the Boost documentation was changed to resolve "localhost" or my hostname instead of "www.boost.org". To get the hostname I used boost::asio::ip::host_name() or typed it directly as a string.
Additionally, I wrote my own code, which merged the above examples with the (little) knowledge I gathered from the Boost documentation and other examples.
All the sources worked, but they only returned the following IP:
127.0.1.1 (That's not a typo, it's .1.1 at the end)
I compiled and ran the code on Ubuntu 9.10 with GCC 4.4.1.
A colleague tried the same code on his machine and got
127.0.0.2 (Not a typo either...)
He compiled and ran it on SUSE 11.0 with GCC 4.4.1 (I'm not 100% sure).
I don't know whether it is possible to change the localhost address (127.0.0.1), but I know that neither my colleague nor I did so. ifconfig says the loopback interface uses 127.0.0.1. ifconfig also finds the public IP I am looking for (141.200.182.30 in my case; the subnet mask is 255.255.0.0).
So is this a Linux issue, and the code is not as portable as I thought? Do I have to change something else, or is Boost.Asio not a solution for my problem at all?
I know there are many questions about similar topics on Stack Overflow and other sites, but I cannot find information that is useful in my case. If you have useful links, it would be nice if you could point me to them.
PS:
Here is the modified code I used from Boost.Doc:
#include <iostream>
#include <boost/asio.hpp>
using boost::asio::ip::tcp;
boost::asio::io_service io_service;
tcp::resolver resolver(io_service);
tcp::resolver::query query(boost::asio::ip::host_name(), "");
tcp::resolver::iterator iter = resolver.resolve(query);
tcp::resolver::iterator end; // End marker.
while (iter != end)
{
tcp::endpoint ep = *iter++;
std::cout << ep << std::endl;
}
Here's a trick I learned from Python network programming (found via Google) to figure out my machine's IP address. It only works if you have an internet connection and google.com can be resolved, and it gives me my home machine's 192.168.x.x private address.
using boost::asio::ip::udp;
try {
boost::asio::io_service netService;
udp::resolver resolver(netService);
udp::resolver::query query(udp::v4(), "google.com", "");
udp::resolver::iterator endpoints = resolver.resolve(query);
udp::endpoint ep = *endpoints;
udp::socket socket(netService);
socket.connect(ep);
boost::asio::ip::address addr = socket.local_endpoint().address();
std::cout << "My IP according to google is: " << addr.to_string() << std::endl;
} catch (std::exception& e){
std::cerr << "Could not deal with socket. Exception: " << e.what() << std::endl;
}
You can find "your" address with the code you posted. BUT... it gets complicated. There may be multiple NICs, LAN and WAN addresses, wired and wireless interfaces, loopback... On my desktop I had one NIC but two IPs, handed out by two different DHCP servers on my LAN.
I found it was better to let the user provide the IP to bind to as a command-line parameter. And yes, that's a portable solution! :-)
If you edit your /etc/hosts file (this is *nix only; it might work on Windows too, I'm not sure), you can correct this issue.
Inside the hosts file you'll find something like this (this is Ubuntu, note the 1.1):
127.0.0.1 localhost
127.0.1.1 yourPcName.yourNetwork.tld
if you change this file to
127.0.0.1 localhost
127.0.1.1 yourPcName.yourNetwork.tld
your.real.ip.here yourPcName
then the hostname should resolve properly.
One way to test resolution is the "hostname -i" command, which should print the wrong IP address before you change the hosts file and the correct one afterwards.
Of course, this is a terrible solution for dynamic IPs... eh.
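As a side note on why 127.0.1.1 and 127.0.0.2 show up at all: the whole 127.0.0.0/8 block is loopback, not just 127.0.0.1, so both results are loopback addresses that the hosts file maps the hostname to. Here is a minimal sketch of a check you could run on whatever the resolver returns. It assumes a POSIX system and uses plain inet_pton instead of Boost; the function name is_ipv4_loopback is made up for illustration.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <cstdint>
#include <string>

// Hypothetical helper: true when `text` is a dotted-quad IPv4 address inside
// 127.0.0.0/8. Returns false for other addresses and for unparsable input.
bool is_ipv4_loopback(const std::string& text) {
    in_addr addr{};
    if (inet_pton(AF_INET, text.c_str(), &addr) != 1) return false;
    const std::uint32_t host_order = ntohl(addr.s_addr);
    return (host_order >> 24) == 127;  // first octet decides
}
```

With this, both addresses from the question count as loopback, while 141.200.182.30 does not, so a caller could skip loopback results and keep looking.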
For me, resolving-based methods have always proven unreliable in various corner cases.
Operating systems provide APIs such as
getifaddrs on Linux (https://man7.org/linux/man-pages/man3/getifaddrs.3.html)
The same on macOS and BSD (https://developer.apple.com/library/archive/documentation/System/Conceptual/ManPages_iPhoneOS/man3/getifaddrs.3.html)
GetAdaptersAddresses on Win32 (https://learn.microsoft.com/en-us/windows/win32/api/iphlpapi/nf-iphlpapi-getadaptersaddresses).
Note that Boost.Asio never calls these functions, so with Boost.Asio alone you are stuck with the hostname-resolution method.
If you are looking for a cross-platform solution that does call the above OS functions, Qt provides it:
for (const QNetworkInterface& iface : QNetworkInterface::allInterfaces())
for (const QNetworkAddressEntry& entry : iface.addressEntries())
qDebug() << entry.ip();
Cross platform, but only because of the #ifdef _WIN32 … #else:
#include <boost/asio.hpp>
#include <cstring> // memcpy
#include <vector>
boost::asio::ip::address_v6 sinaddr_to_asio(sockaddr_in6 *addr) {
boost::asio::ip::address_v6::bytes_type buf;
memcpy(buf.data(), addr->sin6_addr.s6_addr, sizeof(addr->sin6_addr));
return boost::asio::ip::make_address_v6(buf, addr->sin6_scope_id);
}
#if defined(_WIN32)
#undef UNICODE
#include <winsock2.h>
// Headers that need to be included after winsock2.h:
#include <iphlpapi.h>
#include <ws2ipdef.h>
typedef IP_ADAPTER_UNICAST_ADDRESS_LH Addr;
typedef IP_ADAPTER_ADDRESSES *AddrList;
std::vector<boost::asio::ip::address> get_local_interfaces() {
// It's a windows machine, we assume it has 512KB free memory
DWORD outBufLen = 1 << 19;
AddrList ifaddrs = (AddrList) new char[outBufLen];
std::vector<boost::asio::ip::address> res;
ULONG err = GetAdaptersAddresses(AF_UNSPEC,
GAA_FLAG_INCLUDE_PREFIX | GAA_FLAG_SKIP_ANYCAST | GAA_FLAG_SKIP_DNS_SERVER, NULL, ifaddrs,
&outBufLen);
if (err == NO_ERROR) {
for (AddrList addr = ifaddrs; addr != 0; addr = addr->Next) {
if (addr->OperStatus != IfOperStatusUp) continue;
// if (addr->NoMulticast) continue;
// Collect every IPv4 unicast address of this adapter
if (addr->Ipv4Enabled) {
for (Addr *uaddr = addr->FirstUnicastAddress; uaddr != 0; uaddr = uaddr->Next) {
if (uaddr->Address.lpSockaddr->sa_family != AF_INET) continue;
res.push_back(boost::asio::ip::make_address_v4(ntohl(reinterpret_cast<sockaddr_in *>(uaddr->Address.lpSockaddr)->sin_addr.s_addr)));
}
}
if (addr->Ipv6Enabled) {
for (Addr *uaddr = addr->FirstUnicastAddress; uaddr != 0; uaddr = uaddr->Next) {
if (uaddr->Address.lpSockaddr->sa_family != AF_INET6) continue;
res.push_back(sinaddr_to_asio(reinterpret_cast<sockaddr_in6 *>(uaddr->Address.lpSockaddr)));
}
}
}
} else {
}
delete[]((char *)ifaddrs);
return res;
}
#elif defined(__APPLE__) || defined(__linux__)
#include <arpa/inet.h>
#include <ifaddrs.h>
#include <net/if.h>
#include <sys/types.h>
std::vector<boost::asio::ip::address> get_local_interfaces() {
std::vector<boost::asio::ip::address> res;
ifaddrs *ifs;
if (getifaddrs(&ifs)) {
return res;
}
for (auto addr = ifs; addr != nullptr; addr = addr->ifa_next) {
// No address? Skip.
if (addr->ifa_addr == nullptr) continue;
// Interface isn't active? Skip.
if (!(addr->ifa_flags & IFF_UP)) continue;
if(addr->ifa_addr->sa_family == AF_INET) {
res.push_back(boost::asio::ip::make_address_v4(ntohl(
reinterpret_cast<sockaddr_in *>(addr->ifa_addr)->sin_addr.s_addr)));
} else if(addr->ifa_addr->sa_family == AF_INET6) {
res.push_back(sinaddr_to_asio(reinterpret_cast<sockaddr_in6 *>(addr->ifa_addr)));
} else continue;
}
freeifaddrs(ifs);
return res;
}
#else
#error "..."
#endif
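If what you actually want is "my real address, not loopback", the POSIX branch can be narrowed further. The following is only a sketch under a few assumptions: a POSIX system with getifaddrs, IPv4 only, addresses returned as strings rather than Boost types, and a made-up function name non_loopback_ipv4. It skips interfaces flagged IFF_LOOPBACK in addition to the IFF_UP check used above.

```cpp
#include <arpa/inet.h>
#include <ifaddrs.h>
#include <net/if.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <string>
#include <vector>

// Hypothetical helper: IPv4 addresses of all interfaces that are up and are
// not loopback interfaces, as dotted-quad strings. Empty on error.
std::vector<std::string> non_loopback_ipv4() {
    std::vector<std::string> out;
    ifaddrs* list = nullptr;
    if (getifaddrs(&list) != 0) return out;
    for (ifaddrs* it = list; it != nullptr; it = it->ifa_next) {
        if (it->ifa_addr == nullptr) continue;              // no address
        if (it->ifa_addr->sa_family != AF_INET) continue;   // IPv4 only
        if (!(it->ifa_flags & IFF_UP)) continue;            // interface down
        if (it->ifa_flags & IFF_LOOPBACK) continue;         // skip lo
        char text[INET_ADDRSTRLEN] = {};
        const auto* sin = reinterpret_cast<const sockaddr_in*>(it->ifa_addr);
        if (inet_ntop(AF_INET, &sin->sin_addr, text, sizeof text) != nullptr)
            out.emplace_back(text);
    }
    freeifaddrs(list);
    return out;
}
```

The result can still contain several addresses (multiple NICs, aliases), so the caller has to decide which one to use.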
Assuming you have one network card / one local ip address:
#include <boost/asio.hpp>
namespace ip = boost::asio::ip;
std::string getAddress()
{
boost::asio::io_service ioService;
ip::tcp::resolver resolver(ioService);
return resolver.resolve(ip::host_name(), "")->endpoint().address().to_string();
}
Whenever I attempt to run my app on macOS Catalina 10.15.2 (19C57) or Linux 5.4.3-arch1-1 x86_64 GNU/Linux, I get:
_rSockd: bind: Address already in use
libc++abi.dylib: terminating with uncaught exception of type std::runtime_error: failed to bind to socket
[1] 31181 abort Saol/saol
I have checked for open ports with sudo netstat -tulp udp and sudo lsof -i udp, but neither reports any of my applications using these ports. Nothing is currently listening on the port (45100).
I have browsed several similar posts that mention adding SO_REUSEADDR, but setting it made no difference.
Posts also mentioned checking whether something else is using the port, but (unless the above commands were the wrong ones to use) I could not find anything holding my ports. I have tried several different ports (currently 45100); the _port variable is set to an int before the _Init function is called. getaddrinfo's second argument is a const char *service, and the man page states:
service sets the port in each returned address structure. If this
argument is a service name (see services(5)), it is translated to the
corresponding port number. This argument can also be specified as a
decimal number, which is simply converted to binary. If service is
NULL, then the port number of the returned socket addresses will be
left uninitialized. If AI_NUMERICSERV is specified in hints.ai_flags
and service is not NULL, then service must point to a string contain‐
ing a numeric port number. This flag is used to inhibit the invoca‐
tion of a name resolution service in cases where it is known not to
be required.
All I am doing to the _port is this: int -> string -> char *
Below is the relevant section of my code:
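#include <cstdio> // std::perror
#include <cstring> // memset
#include <iostream>
#include <stdexcept> // std::runtime_error
#include <string> // std::to_string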
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <unistd.h>
#define PORT "45100"
auto main(void) -> int {
int rv;
int _rSockd;
const char * port = std::to_string(8080).c_str();
struct addrinfo hints;
struct addrinfo *servinfo;
struct addrinfo *p;
memset(&hints, 0, sizeof hints);
hints.ai_family = AF_UNSPEC;
hints.ai_socktype = SOCK_DGRAM;
hints.ai_flags = AI_PASSIVE;
if((rv = getaddrinfo(NULL, port, &hints, &servinfo)) != 0) {
std::cerr << "getaddrinfo: " << std::endl << gai_strerror(rv) << std::endl;
throw std::runtime_error("failed to getaddrinfo");
}
for(p = servinfo; p != NULL; p = p->ai_next) {
if((_rSockd = socket(p->ai_family, p->ai_socktype, p->ai_protocol)) == -1) {
std::perror("_rSockd: socket");
continue; // nothing to close: socket() failed
}
if(bind(_rSockd, p->ai_addr, p->ai_addrlen) == -1) {
std::perror("_rSockd: bind"); // report before close() can overwrite errno
close(_rSockd);
continue;
}
break;
}
if(p == NULL) {
throw std::runtime_error("failed to bind to socket");
}
freeaddrinfo(servinfo);
return 0;
}
Any help would be appreciated; I feel I have been staring at it too long and am missing something basic.
NOTE: When I added the MCVE (the updated code above), the issue no longer occurred when running with port 8080 or 45100. So I will now explain the build process to see whether it is causing the issue.
UPDATE
When I made the MCVE, it worked. It never had the issues from my project base.
My project is a library being called from another executable project. libSaol has the netcode, saol is the executable that links libSaol.
To be honest, this confuses me more...
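One detail in the code above that may be worth ruling out: std::to_string(8080).c_str() returns a pointer into a temporary std::string that is destroyed at the end of that statement, so port is dangling by the time getaddrinfo reads it. That is undefined behaviour and can produce a different (garbage) port from build to build. Whether this is what happens in the library build is an assumption, not something the question confirms. A minimal sketch of the int -> string -> char * step with the string kept alive follows; it assumes POSIX, and the name first_passive_udp_port is made up.

```cpp
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <cstring>
#include <string>

// Hypothetical helper: resolves a passive (wildcard) UDP address for `port`
// and returns the port stored in the first result, or -1 on failure.
int first_passive_udp_port(int port) {
    // Named std::string: its buffer stays valid for the getaddrinfo call.
    const std::string service = std::to_string(port);

    addrinfo hints;
    std::memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_DGRAM;
    hints.ai_flags = AI_PASSIVE | AI_NUMERICSERV;  // service is numeric

    addrinfo* res = nullptr;
    if (getaddrinfo(nullptr, service.c_str(), &hints, &res) != 0) return -1;

    int found = -1;
    if (res->ai_family == AF_INET)
        found = ntohs(reinterpret_cast<sockaddr_in*>(res->ai_addr)->sin_port);
    else if (res->ai_family == AF_INET6)
        found = ntohs(reinterpret_cast<sockaddr_in6*>(res->ai_addr)->sin6_port);
    freeaddrinfo(res);
    return found;
}
```

Printing the port that getaddrinfo actually returned, as this helper does, is a quick way to see whether the library is binding to the port you think it is.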
I've been struggling with an issue for hours:
I want to connect a Boost.Asio UDP socket to an endpoint. There is no problem doing this with IPv4, but if I try the same with IPv6, I get the error code "invalid argument".
#include <iostream>
#include <boost/asio.hpp>
using boost::asio::ip::udp;
struct UdpConnectionParams
{
udp::endpoint m_localEndpoint;
udp::endpoint m_remoteEndpoint;
};
boost::system::error_code setupUdpConnection(udp::socket& p_socket, const UdpConnectionParams& p_params)
{
// close socket
boost::system::error_code h_ignoreError;
p_socket.close(h_ignoreError);
// variables for kind of UDP connection
udp h_protocol(udp::v4());
bool h_shallBind{false};
bool h_shallConnect{false};
// determine kind of connection
if(p_params.m_localEndpoint != udp::endpoint())
{
h_protocol = p_params.m_localEndpoint.protocol();
h_shallBind = true;
}
if(p_params.m_remoteEndpoint != udp::endpoint())
{
h_protocol = p_params.m_remoteEndpoint.protocol();
h_shallConnect = true;
}
if(!h_shallBind && !h_shallConnect)
{
// no endpoint specified, return error
return boost::system::error_code(ENetworkErrorCode::NO_ENDPOINT_SPECIFIED, NetworkErrorCategory::getCategory());
}
try
{
p_socket.open(h_protocol);
//bind socket to certain endpoint
if(h_shallBind)
{
p_socket.bind(p_params.m_localEndpoint);
}
//connect socket to client. Thus it is possible to use p_socket.send()
if(h_shallConnect)
{
p_socket.connect(p_params.m_remoteEndpoint);
}
}
catch (boost::system::system_error& h_error)
{
p_socket.close(h_ignoreError);
return h_error.code();
}
// no error
return boost::system::error_code();
}
int main()
{
boost::asio::io_service service;
udp::socket socket(service);
boost::system::error_code error;
UdpConnectionParams params;
params.m_localEndpoint = udp::endpoint(udp::v6(), 55555);
params.m_remoteEndpoint = udp::endpoint(boost::asio::ip::address_v6::from_string("ff01::101"), 55555);
error = setupUdpConnection(socket, params);
std::cout << error << error.message() << std::endl; // "invalid argument"
return 0;
}
The only way I get no error is with the localhost address (::1). It makes no difference whether I bind the socket to a local endpoint.
What am I doing wrong?
What am I doing wrong?
The problem is that you don't specify an interface index/scope in the IPv6 address you are using. IPv6 multicast addresses require a scope to be specified so that the network stack knows which of your computer's local network interfaces to associate the address with.
i.e. instead of:
boost::asio::ip::address_v6::from_string("ff01::101"), 55555);
you need something like:
boost::asio::ip::address_v6::from_string("ff01::101%eth0"), 55555);
(The suffix after the % symbol will depend on the name of the network interface you want to use, of course)
(As a side note, the "ff01::" prefix is for node-local IPv6 multicast groups, which means that your UDP packets will only go to other programs running on the same computer. If that's what you intended, then great; on the other hand, if you wanted your UDP packets to reach other computers on the same LAN, you'll want to use a "ff02::" or "ff12::" prefix instead (ff02:: would be for a well-known multicast address, ff12:: would be for a transient multicast address). See the "Multicast address scope" table on the Wikipedia page for details)
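To make the prefix discussion concrete: in an IPv6 multicast address the first byte is 0xff, the high nibble of the second byte holds the flags (the lowest flag bit is T, "transient"), and the low nibble holds the scope. The sketch below reads those fields from a textual address. It assumes POSIX inet_pton, does not accept a %interface suffix, and the function names are invented for illustration.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <string>

// Hypothetical helper: the 4-bit scope field of an IPv6 multicast address
// (1 = node/interface-local, 2 = link-local, ...), or -1 when `text` is not
// a valid IPv6 multicast address.
int multicast_scope(const std::string& text) {
    in6_addr addr{};
    if (inet_pton(AF_INET6, text.c_str(), &addr) != 1) return -1;
    if (addr.s6_addr[0] != 0xff) return -1;   // not multicast
    return addr.s6_addr[1] & 0x0f;            // low nibble = scope
}

// Hypothetical helper: true when the T (transient) flag is set.
bool is_transient_multicast(const std::string& text) {
    in6_addr addr{};
    if (inet_pton(AF_INET6, text.c_str(), &addr) != 1) return false;
    return addr.s6_addr[0] == 0xff && (addr.s6_addr[1] & 0x10) != 0;
}
```

For the address in the question, multicast_scope("ff01::101") yields 1 (node-local), while an ff02:: or ff12:: address yields 2 (link-local), with only the ff12:: one flagged as transient.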
In this user's example, a route is obtained with the Linux command-line utility ip. Example output:
$ ip route get 4.2.2.1
4.2.2.1 via 192.168.0.1 dev eth0 src 192.168.0.121
cache
$
Let's refer to the addresses in the following manner:
4.2.2.1 as address A (destination)
192.168.0.1 as address B (gateway)
192.168.0.121 as address C (source)
In my case I'm interested in C - and I'm trying to figure out how I might be able to obtain the same piece of information in my c++ program. Specifically
Given address A, find address C
I do not want to use system() or anything else that runs a shell command
Using boost is allowed, and preferred
Any recommendations? Thanks
There you go:
#include <iostream>
#include "boost/asio/io_service.hpp"
#include "boost/asio/ip/address.hpp"
#include "boost/asio/ip/udp.hpp"
boost::asio::ip::address source_address(
const boost::asio::ip::address& ip_address) {
using boost::asio::ip::udp;
boost::asio::io_service service;
udp::socket socket(service);
udp::endpoint endpoint(ip_address, 0);
socket.connect(endpoint);
return socket.local_endpoint().address();
}
// Usage example:
int main() {
auto destination_address = boost::asio::ip::address::from_string("8.8.8.8");
std::cout << "Source ip address: "
<< source_address(destination_address).to_string()
<< '\n';
}
mash's answer is almost right but fails on iOS. The line udp::endpoint endpoint(ip_address, 0); needs to have a non-zero port or you'll get the error "Can't assign requested address" since 0 is not a valid port number. I don't think it matters what the port is (as long as it's a valid non-zero port number) so I would recommend using 3478 which is the standard UDP STUN port.
Corrected code:
#include <iostream>
#include "boost/asio/io_service.hpp"
#include "boost/asio/ip/address.hpp"
#include "boost/asio/ip/udp.hpp"
boost::asio::ip::address source_address(
const boost::asio::ip::address& ip_address) {
using boost::asio::ip::udp;
boost::asio::io_service service;
udp::socket socket(service);
udp::endpoint endpoint(ip_address, 3478);
socket.connect(endpoint);
return socket.local_endpoint().address();
}
// Usage example:
int main() {
auto destination_address = boost::asio::ip::address::from_string("8.8.8.8");
std::cout << "Source ip address: "
<< source_address(destination_address).to_string()
<< '\n';
}
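For anyone who cannot use Boost, the same connect-then-inspect idea can be sketched with plain POSIX sockets. This is only an illustration under some assumptions: IPv4 only, a POSIX system, the STUN port 3478 from the answer above as an arbitrary non-zero port, and a made-up function name source_address_for. No packet is sent; connect() on a UDP socket only records the peer and lets the kernel pick a route and source address.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <string>

// Hypothetical helper: the local IPv4 address the kernel would use to reach
// `destination`, or an empty string if parsing or routing fails.
std::string source_address_for(const std::string& destination) {
    sockaddr_in peer{};
    peer.sin_family = AF_INET;
    peer.sin_port = htons(3478);  // any valid non-zero port
    if (inet_pton(AF_INET, destination.c_str(), &peer.sin_addr) != 1)
        return "";

    const int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd == -1) return "";

    std::string result;
    sockaddr_in local{};
    socklen_t len = sizeof local;
    char text[INET_ADDRSTRLEN] = {};
    if (connect(fd, reinterpret_cast<sockaddr*>(&peer), sizeof peer) == 0 &&
        getsockname(fd, reinterpret_cast<sockaddr*>(&local), &len) == 0 &&
        inet_ntop(AF_INET, &local.sin_addr, text, sizeof text) != nullptr) {
        result = text;
    }
    close(fd);
    return result;
}
```

Calling source_address_for("4.2.2.1") should then give address C from the question on a machine with a matching route, and an empty string when there is no route to the destination.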
I have a Debian Linux server which has several IP addresses, all assigned to the same physical network card. The /etc/network/interfaces config file looks like this (the xx represent numbers):
auto lo
iface lo inet loopback
auto eth0
iface eth0 inet static
address 176.xx.xx.144
netmask 255.255.255.0
network 176.xx.xx.0
broadcast 176.xx.xx.255
gateway 176.xx.xx.254
auto eth0:0
allow-hotplug eth0:0
iface eth0:0 inet static
address 46.xx.xx.57
netmask 255.255.255.255
broadcast 46.xx.xx.57
auto eth0:1
allow-hotplug eth0:1
iface eth0:1 inet static
address 94.xx.xx.166
netmask 255.255.255.255
broadcast 94.xx.xx.166
//IPv6 Stuff...
I am working on a client application that uses Boost Asio to handle all network connections. In this application I want to be able to connect to an external server using a specific network interface/IP address. I found this similar question; however, simply binding a boost::asio::ip::tcp::socket to a specific endpoint and then connecting to an external server doesn't work. Here is a minimal working example of what I tried:
#include <iostream>
#include <boost/asio.hpp>
int main( int argC, char *argV[] ) {
boost::asio::io_service ioService;
boost::asio::ip::tcp::socket socket(ioService);
boost::asio::ip::tcp::endpoint localEndpoint(
boost::asio::ip::address::from_string("94.xx.xx.166"), 0);
boost::asio::ip::tcp::resolver resolver(ioService);
boost::asio::ip::tcp::resolver::iterator remoteEndpoint =
resolver.resolve(boost::asio::ip::tcp::resolver::query("haatschii.de", "80"));
socket.open(boost::asio::ip::tcp::v4());
std::cout << "Before binding socket has local endpoint: "
<< socket.local_endpoint().address().to_string()
<< ":" << socket.local_endpoint().port() << std::endl;
socket.bind(localEndpoint);
std::cout << "Before connecting socket has local endpoint: "
<< socket.local_endpoint().address().to_string()
<< ":" << socket.local_endpoint().port() << std::endl;
boost::asio::connect(socket, remoteEndpoint);
std::cout << "After connecting socket has local endpoint: "
<< socket.local_endpoint().address().to_string()
<< ":" << socket.local_endpoint().port() << std::endl;
//Test request to a page that echos our IP address.
boost::asio::write(socket,
boost::asio::buffer("GET /ip.php HTTP/1.1\r\nHost: haatschii.de\r\nAccept: */*\r\n\r\n", 57));
//Parse server response (not important for this code example)
return 0;
}
When I run this on my server I get:
Before binding socket has local endpoint: 0.0.0.0:0
Before connecting socket has local endpoint: 94.xx.xx.166:38399
After connecting socket has local endpoint: 176.xx.xx.144:45959
External server says we are using IP: 176.xx.xx.144
Right now I am a bit lost, because I don't know what else to try. I don't necessarily need a portable solution for this, anything that works with this Debian setup will do.
Update
I'll offer the bounty for a solution that works for my setup. If necessary I can change the /etc/network/interfaces config file. However in order to reuse my code, any solution has to work with Boost Asio sockets (at least as a wrapper).
To bind to a specific interface you have to open the socket first. You do that - so far so good. But after that you call boost::asio::connect(socket, remoteEndpoint);, which closes the socket for you (as a service, so to say) and with it discards the binding you just set up.
Boost tells you that it does so - but you have to look closely. In the reference, under the parameters of the connect overload you are using, it says:
Parameters
s
The socket to be connected. If the socket is already open, it will be closed.
or in its implementation in boost/asio/impl/connect.hpp:
// Copyright (c) 2003-2011 Christopher M. Kohlhoff (chris at kohlhoff dot com)
//
// Distributed under the Boost Software License, Version 1.0. (See accompanying
// file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
[...]
template <typename Protocol, typename SocketService,
typename Iterator, typename ConnectCondition>
Iterator connect(basic_socket<Protocol, SocketService>& s,
Iterator begin, Iterator end, ConnectCondition connect_condition,
boost::system::error_code& ec)
{
ec = boost::system::error_code();
for (Iterator iter = begin; iter != end; ++iter)
{
iter = connect_condition(ec, iter);
if (iter != end)
{
s.close(ec);
s.connect(*iter, ec);
if (!ec)
return iter;
}
}
if (!ec)
ec = boost::asio::error::not_found;
return end;
}
(note the s.close(ec);)
The solution
should be simple. Replace boost::asio::connect... by
socket.connect(*remoteEndpoint);
(or a loop over the respective remote endpoints, similar to the boost sourcecode, if necessary.)
Generally, you could use the following workflow:
void connect_handler(const boost::system::error_code& error)
{
if (!error) { // Connect succeeded.
}
}
...
boost::asio::ip::tcp::socket socket(io_service);
boost::asio::ip::tcp::endpoint remote_endpoint(
boost::asio::ip::address::from_string("1.2.3.4"), 12345); // server address
socket.open(boost::asio::ip::tcp::v4());
socket.bind(boost::asio::ip::tcp::endpoint(
boost::asio::ip::address::from_string("1.2.3.55"), // your local address
7777)
);
socket.async_connect(remote_endpoint, connect_handler);
More info could be found here.
I want to write a program in C/C++ that will dynamically read a web page and extract information from it. As an example imagine if you wanted to write an application to follow and log an ebay auction. Is there an easy way to grab the web page? A library which provides this functionality? And is there an easy way to parse the page to get the specific data?
Have a look at the cURL library:
#include <stdio.h>
#include <curl/curl.h>
int main(void)
{
CURL *curl;
CURLcode res;
curl = curl_easy_init();
if(curl) {
curl_easy_setopt(curl, CURLOPT_URL, "curl.haxx.se");
res = curl_easy_perform(curl);
/* always cleanup */
curl_easy_cleanup(curl);
}
return 0;
}
BTW, if C++ is not strictly required, I encourage you to try C# or Java. It is much easier there, and both have a built-in way to do this.
Windows code:
#include <winsock2.h>
#include <windows.h>
#include <cstdlib> // system
#include <cstring> // strlen
#include <iostream>
#pragma comment(lib,"ws2_32.lib")
using namespace std;
int main (){
WSADATA wsaData;
if (WSAStartup(MAKEWORD(2,2), &wsaData) != 0) {
cout << "WSAStartup failed.\n";
system("pause");
return 1;
}
SOCKET Socket=socket(AF_INET,SOCK_STREAM,IPPROTO_TCP);
struct hostent *host;
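host = gethostbyname("www.google.com");
if (host == NULL) { // name resolution failed; dereferencing host would crash
cout << "Could not resolve host\n";
WSACleanup();
return 1;
}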
SOCKADDR_IN SockAddr;
SockAddr.sin_port=htons(80);
SockAddr.sin_family=AF_INET;
SockAddr.sin_addr.s_addr = *((unsigned long*)host->h_addr);
cout << "Connecting...\n";
if(connect(Socket,(SOCKADDR*)(&SockAddr),sizeof(SockAddr)) != 0){
cout << "Could not connect";
system("pause");
return 1;
}
cout << "Connected.\n";
send(Socket,"GET / HTTP/1.1\r\nHost: www.google.com\r\nConnection: close\r\n\r\n", strlen("GET / HTTP/1.1\r\nHost: www.google.com\r\nConnection: close\r\n\r\n"),0);
char buffer[10000];
int nDataLength;
while ((nDataLength = recv(Socket,buffer,10000,0)) > 0){
int i = 0;
// Stay inside the bytes actually received; the buffer is not NUL-terminated.
while (i < nDataLength && (buffer[i] >= 32 || buffer[i] == '\n' || buffer[i] == '\r')) {
cout << buffer[i];
i += 1;
}
}
closesocket(Socket);
WSACleanup();
system("pause");
return 0;
}
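The send call above spells out the request literal twice, once for the data and once for strlen. A small helper that builds the request once keeps the two from drifting apart. This is just a sketch with a made-up function name (build_get_request); it is plain standard C++ and not tied to Winsock.

```cpp
#include <string>

// Hypothetical helper: a minimal HTTP/1.1 GET request for `path` on `host`,
// asking the server to close the connection after the response.
std::string build_get_request(const std::string& host,
                              const std::string& path) {
    return "GET " + path + " HTTP/1.1\r\n"
           "Host: " + host + "\r\n"
           "Connection: close\r\n"
           "\r\n";
}
```

It would be used as std::string request = build_get_request("www.google.com", "/"); followed by send(Socket, request.c_str(), static_cast<int>(request.size()), 0);.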
There is a free TCP/IP library available for Windows that supports HTTP and HTTPS - using it is very straightforward.
Ultimate TCP/IP
CUT_HTTPClient http;
http.GET("http://folder/file.htm", "c:/tmp/process_me.htm");
You can also GET files and store them in a memory buffer (via CUT_DataSource derived classes). All the usual HTTP support is there - PUT, HEAD, etc. Support for proxy servers is a breeze, as are secure sockets.
You can do it with socket programming, but it's tricky to implement the parts of the protocol needed to reliably fetch a page. Better to use a library, like neon. This is likely to be installed in most Linux distributions. Under FreeBSD use the fetch library.
For parsing the data, because many pages don't use valid XML, you need to implement heuristics, not a real yacc-based parser. You can implement these using regular expressions or a state transition machine. As what you're trying to do involves a lot of trial-and-error you're better off using a scripting language, like Perl. Due to the high network latency you will not see any difference in performance.
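If you stay in C++ anyway, the regular-expression heuristic can be done with std::regex. The sketch below pulls the text of the first title element out of a page. It is deliberately naive (no entity decoding, no nested markup), the function name extract_title is made up, and a real scraper would need one such pattern per field it wants.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: text between the first <title ...> and </title>,
// matched case-insensitively, or an empty string when there is no title.
std::string extract_title(const std::string& html) {
    static const std::regex pattern(
        "<title[^>]*>([^<]*)</title>",
        std::regex::ECMAScript | std::regex::icase);
    std::smatch match;
    if (std::regex_search(html, match, pattern)) return match[1].str();
    return "";
}
```

Since it works on a std::string, it can be fed the body fetched by any of the download methods in the other answers.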
You don't mention a platform, so here is an answer for Win32.
One simple way to download anything from the Internet is the URLDownloadToFile function with the IBindStatusCallback parameter set to NULL. To make the function more useful, the callback interface needs to be implemented.
Try using a library, like Qt, which can read data from across a network and get data out of an xml document. This is an example of how to read an xml feed. You could use the ebay feed for example.
It can be done with the cross-platform Qt library:
QByteArray WebpageDownloader::downloadFromUrl(const std::string& url)
{
QNetworkAccessManager manager;
QNetworkReply *response = manager.get(QNetworkRequest(QUrl(url.c_str())));
QEventLoop event;
QObject::connect(response, &QNetworkReply::finished, &event, &QEventLoop::quit);
event.exec();
return response->readAll();
}
That data can be e.g. saved to file, or transformed to std::string:
const string webpageText = downloadFromUrl(url).toStdString();
Remember that you need to add
QT += network
to the Qt project configuration to compile the code.