How do I address a very specific web url? - c++

I am writing code in C++ (using the Poco net libraries) to try to create a program for fun which will email me every few hours with updates to the TwitchPlaysPokemon stream (stupid, I know). Here is my code:
#include <iostream>
#include "Poco/Net/SocketAddress.h"
#include "Poco/Net/StreamSocket.h"
#include "Poco/Net/SocketStream.h"
#include "Poco/StreamCopier.h"
using namespace std;
using namespace Poco::Net;
using namespace Poco;
int main(int argc, char *argv[])
{
string url = "www.reddit.com";
string fullPage;
SocketAddress sa(url, 80);
StreamSocket socket(sa);
SocketStream str(socket);
str << "GET / HTTP/1.1\r\n"
"Host: " << url << "\r\n"
"\r\n";
str.flush();
StreamCopier::copyStream(str, cout);
}
This exact code works perfectly fine. It grabs the raw html of www.reddit.com and prints it to the console. However, I'm trying to get information from one of two places for my program:
Either:
Here (url = "http://www.reddit.com/live/sw7bubeycai6hey4ciytwamw3a")
or
Here (url = "https://sites.google.com/site/twitchplayspokemonstatus/")
Either of these will be fine for my purposes. The problem is that when I plug these values in as the url in my program, the program has no idea what I'm talking about. Specifically, I get the following:
so clearly it cannot find the host. This is where I am stuck, as I know very little about internet protocols, hosts, etc. I tried to find a specific IP address for this page (using ping from the command prompt), but that failed too (it says "Ping request could not find the host www.reddit.com/live/sw7bubeycai6hey4ciytwamw3a"). The Poco library accepts hostnames (www.reddit.com), IPv4 addresses, and IPv6 addresses as the host argument to SocketAddress (this is where I pass the variable url; the other argument is the port, which I've been told should basically always be 80?).
Question: I need help figuring out how I should be identifying the host to the Poco library. In other words, how do I properly refer to the host for either of those two sites listed above in such a way that my code can recognize it and grab the HTML from the page.

It sounds as though you may not understand HTTP correctly. Here's a brief refresher.
To get the contents of the URL http://www.example.com/path/page.html, the corresponding HTTP request would be sent to www.example.com on port 80, and would have the contents:
GET /path/page.html HTTP/1.1\r\n
Host: www.example.com\r\n
\r\n
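The critical step your code is missing is splitting the URL into its hostname and path components. A single url variable won't work (unless you manually split it on the first slash after the scheme): SocketAddress and the Host header need only the hostname (www.reddit.com), while the path (/live/sw7bubeycai6hey4ciytwamw3a) belongs in the GET line. Also note that an https:// URL is served over TLS on port 443, so a plain StreamSocket on port 80 cannot fetch it directly.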

Related

Curl replacing \u in response to \\u in c++

I am sending a request using libcurl on Windows, and the response I get contains some Unicode escape sequences that start with \u. Libcurl does not seem to recognize these sequences, and as a result it escapes the \, turning \u into \\u.
Is there any way to fix this? I have tried using str.replace, but it cannot replace escape sequences.
The code I used to implement this was:
#include <iostream>
#include <string>
#include <cpr/cpr.h>
int main()
{
auto r = cpr::Get(cpr::Url{"http://prayer.osamaanees.repl.co/api"});
std::string data = r.text;
std::cout << data << std::endl;
return 0;
}
This code uses the cpr library which is a wrapper for curl.
It prints out the following:
{
"times":{"Fajr":"04:58 AM","Sunrise":"06:16 AM","Dhuhr":"12:30 PM","Asr":"04:58 PM","Maghrib":"06:43 PM","Isha":"08:00 PM"},
"date":"Tuesday, 20 Mu\u1e25arram 1442AH"
}
Notice the word Mu\u1e25arram, it should have been Muḥarram but since curl escaped the \ before u it prints out as \u1e25
Your analysis is wrong. Libcurl is not escaping anything. Load the URL in a web browser of your choosing and look at the raw data that is actually being sent. For example, this is what I see in Firefox:
The server really is sending Mu\u1e25arram, not Muḥarram like you are expecting. And this is perfectly fine, because the server is sending back JSON data, and JSON is allowed to escape Unicode characters like this. Read the JSON spec, particularly Section 9 on how Unicode code points may be encoded using hexadecimal escape sequences (which is optional in JSON, but still allowed). \u1e25 is simply the JSON hex-escaped form of ḥ.
You are merely printing out the JSON content as-is, exactly as the server sent it. You are not actually parsing it at all. If you were to use an actual JSON parser, Mu\u1e25arram would be decoded to Muḥarram for you. For example, here is how Firefox parses the JSON:
It is not libcurl's job to decode JSON data. Its job is merely to give you the data that the server sends. It is your job to interpret the data afterwards as needed.
I would like to thank Remy for pointing out how wrong I was in thinking curl or the JSON parser was the problem when in reality I needed to convert my console to UTF-8 mode.
Once I fixed the console code page, I got the output I wanted.
For future reference, I am adding the code that fixed my problem:
We need to include Windows.h
#include <Windows.h>
Then at the start of our code:
UINT oldcp = GetConsoleOutputCP();
SetConsoleOutputCP(CP_UTF8);
After this we need to reset the console back to the original codepage with:
SetConsoleOutputCP(oldcp);

c++ Http request SSL error

OK, so I'm working on a C++ project that reads from a site (it needs to be Google) to get answers to display to the user. I am using the second solution from this link (calling a website) as the base for my project. Requesting the XML of Google's search results will not work because the answers are not in the XML, so I made a Google custom search that outputs JSON through the Google Custom Search API. My problem is that when my program requests the URL of the JSON (link to my Google custom search example here), it gets an SSL error. How do I solve this SSL problem? It seems I need to GET over https instead of http, or I need code to verify the SSL certificate. I'm not sure how to do this, as I'm new to protocols and networking in C++.
I cannot use third-party libraries.
And I apologize ahead of time for my problem and for being a noob on this subject.
The given code uses plain sockets and never sets up SSL, so it will not work with HTTPS or any other SSL-based protocol. To connect to an HTTPS web page, you have to use OpenSSL or another library that provides this layer on top of TCP sockets (unless you're going to deal with the encryption on your own, which I doubt). Here is an example using the Boost.Asio library:
/*
Compile with
g++ -std=c++11 goog.cpp -o goog -lboost_system -lssl -lcrypto -lpthread
*/
#include <iostream>
#include <string>
#include <vector>
#include <cstdlib>
#include <boost/asio.hpp>
#include <boost/asio/ssl.hpp>
using std::cout;
using std::endl;
using std::vector;
using std::string;
using boost::asio::ip::tcp;
using boost::asio::ip::address;
using boost::asio::io_service;
using boost::asio::connect;
using boost::asio::buffer;
using boost::system::error_code;
using boost::system::system_error;
using boost::asio::ssl::context;
int main()
{
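const char* PORT = "443";
// Resolve the same host that the request is addressed to.
const string HOST = "www.googleapis.com";
// The request line carries only the path and query; HTTP/1.1 also requires a Host header.
const string REQUEST = "GET /customsearch/v1?q=when%20is%20george%20washingtons%20birthdate&cx=014855184903748195002:umdboiolvoi&key=AIzaSyDxFosFrZlMpgdFeTsPWZfp925MbaBX49s HTTP/1.1\r\nHost: " + HOST + "\r\n\r\n";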
try
{
io_service ios;
tcp::resolver resolver(ios);
tcp::resolver::query query(HOST, PORT);
tcp::resolver::iterator iterator = resolver.resolve(query);
context ctx(context::sslv23);
boost::asio::ssl::stream<tcp::socket> sock(ios, ctx);
sock.set_verify_mode(boost::asio::ssl::verify_none);
connect(sock.lowest_layer(), iterator);
sock.handshake(boost::asio::ssl::stream_base::client);
const int BUFLEN = 2048;
vector<char> buf(BUFLEN);
// write_some() may send only part of the buffer; write() loops until all of it is sent.
boost::asio::write(sock, boost::asio::buffer(REQUEST));
while (true)
{
size_t len = sock.read_some(boost::asio::buffer(buf, BUFLEN));
cout << "main(): buf.data()=";
cout.write(buf.data(), len);
}
}
catch (system_error& exc)
{
cout << "main(): exc.what()=" << exc.what() << endl;
}
return EXIT_SUCCESS;
}
It connects to the Google APIs host over an SSL socket (no certificate verification is performed), sends the GET request, fetches the response, and prints it to stdout. However, the read loop has no exit condition of its own and only ends when read_some throws, so it's up to you to parse the response and determine when to stop reading.

How-to: Send text from webpage and send to external application

I'm writing a program for my algorithms class that is supposed to traverse a webpage, find a random address, and then, using a browser extension (Firefox/Chrome), do a Google Maps search for that address. It just occurred to me that I could use the extension to capture the text, write it to a text file, and have my program read that file, but I have no clue how that would be implemented.
My code so far (Don't worry, after a Window UI, it will get longer. This is just a test console app):
#include <iostream>
#include <string>
#include <windows.h>
using namespace std;
int main ()
{
string address;
cout << "Please input address: ";
//cin >> address;
getline(cin, address);
//word_list = getRecursiveURLs(url, DEPTH)
//return cleaner(word_list)
//string address = "Houston, Tx ";
std::string str = "http://mapof.it/" + address;
//cout << mapSearch;
const char * c = str.c_str();
// The A variant takes narrow strings, so this also compiles when UNICODE is defined.
ShellExecuteA(NULL, "open", c, NULL, NULL, SW_SHOWNORMAL);
}
Right now, my code takes in an address and adds it to the end of a "Mapof.it" url that basically initiates a GMaps search.
It looks like the user interacts with your C++ program directly, so the program does not need to communicate with a browser process.
You can send the HTTP request from the C++ program, fetch the response text, and then parse it.
First, check whether the website provides an API URL that returns JSON or XML, because those formats are easier to parse. For example, Google Maps does provide an API.
If not, try a regular expression to pull what you need out of the HTML, or use a DOM library to parse it properly.
If the text you need is not in the raw HTML because JavaScript creates it dynamically, look for a "headless browser" library to help you.
If you need a full-featured browser, use Qt, which provides the QtWebKit widget.

How to find the computer's domain programmatically

Most systems these days are attached to a domain of some kind. Is there a method or system call I can use to get the system's current domain (something similar to gethostname)? I am mainly looking for a portable solution (Windows/Linux), but if you can show me how to get the info on Linux, that would be greatly helpful. I am trying to get this in a C++ program on Linux, but have not yet been able to.
Just to clarify, I know we can get the hostname easily. It is the "somedomain" part of "localhost#somedomain" that I am looking for.
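There's a getdomainname() function that returns the domain name configured on your computer (not the workgroup/Windows domain). Note that on Linux this is the NIS/YP domain name, which is not necessarily the same as the DNS domain and is often unset, e.g.: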
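#include <cstdio>
#include <iostream>
#include <unistd.h>
int main() {
    char buffer[1024];
    // getdomainname() returns 0 on success and -1 on error (errno is set).
    if (getdomainname(buffer, sizeof(buffer)) != 0) {
        std::perror("getdomainname");
        return 1;
    }
    std::cout << buffer << std::endl;
}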

Why does getenv("QUERY_STRING") return null in a FastCGI C++ program?

So I have a working FastCGI application written in C++ running under lighttpd; however, I'm unable to retrieve the query string using getenv("QUERY_STRING"). Everything works fine if I take out the query string request (or add a check for null), but with that in place it fails:
#include <stdlib.h>
#ifdef _WIN32
#include <process.h>
#else
#include <unistd.h>
extern char ** environ;
#endif
#include "fcgio.h"
#include "fcgi_config.h" // HAVE_IOSTREAM_WITHASSIGN_STREAMBUF
#include "redisclient.h"
....
while (FCGX_Accept_r(&request) == 0)
{
fcgi_streambuf cin_fcgi_streambuf(request.in);
fcgi_streambuf cout_fcgi_streambuf(request.out);
fcgi_streambuf cerr_fcgi_streambuf(request.err);
...
cout << "Content-type: text/html\r\n"
"\r\n"
"<TITLE>^_^</TITLE>\n"
"<H1>echo-cfpp</H1>\n"
"<H4>PID: " << pid << "</H4>\n"
"<H4>Request Number: " << ++count << "</H4>\n";
// If I make this conditional on getenv("QUERY_STRING") not returning null,
// then the program behaves reliably.
cout <<getenv("QUERY_STRING");
}
I've verified that I'm passing a querystring in the request, so why then is getenv("QUERY_STRING") returning null? And what should I be doing to retrieve it?
I don't have extensive experience with the reference FastCGI library for C/C++, but I've implemented both CGI and FastCGI libraries for Windows in the past, so the following might help.
Basically, as per the FastCGI specification, the CGI environment variables are passed through the FCGI_PARAMS stream, which are normally decoded by the FastCGI library. Now, FastCGI doesn't specify much about what is required and what isn't, and it's assumed that the rules are basically the same as for CGI. The CGI specification section 4.1.7 says the following about the QUERY_STRING environment variable:
The server MUST set this variable; if the Script-URI does not include a query component, the QUERY_STRING MUST be defined as an empty string ("").
Now, this basically means that your FastCGI library is decoding a QUERY_STRING parameter in the FCGI_PARAMS stream (or else the gateway server is not following the specification).
Since the reference library tries to support both CGI and FastCGI in the same program and also supports multi-threading, I strongly doubt that you would find the result in a process environment variable (otherwise there would be a race condition).
In practice, this means getenv("QUERY_STRING") returns NULL, and passing a null const char* to a std::ostream through operator<< is undefined behavior. That is the likely cause of the failure, since NULL is not a special value that stands for an empty string.
TL; DR:
You cannot access the QUERY_STRING value through the process environment because you are using FastCGI, not CGI. You need to read the library's documentation for the standard way to access the request's query string.
Edit: I've got some more info on this situation.
The documentation for FCGX_Accept_r() in fcgiapp.h says:
Creates a parameters data structure to be accessed
via getenv(3) (if assigned to environ) or by FCGX_GetParam
and assigns it to *envp.
Using the following after FCGX_Accept_r would fix the issue:
environ = request.envp;
However, this is not safe for multi-threaded applications (environ is not in thread-local storage), so I recommend that you use the other documented method instead:
const char * query_string = FCGX_GetParam("QUERY_STRING", request.envp);