Parsing JSON data from a URL with C++

I have a MySQL database linked to a webpage which, through some PHP code, pushes pertinent data as JSON to a blank page. I am trying to grab this data using C++ and sockets, verifying a key passed in the URL.
I have a way of doing this in C# with a simple WebClient().DownloadString, but I am having trouble translating it to C++.
I have tried some of the other libraries mentioned around Stack Overflow but haven't had the luck I was hoping for. Any help would be greatly appreciated.
char* SocketRequest_URL(char* URL, char* Key, int sign, char* Path = "")
{
char *begin = bufferReturn;
char *end = begin + sizeof(bufferReturn);
std::fill(begin, end, 0);
Host = gethostbyname(URL);
SocketAddress.sin_addr.s_addr = *((unsigned long*)Host->h_addr);
SocketAddress.sin_family = AF_INET;
SocketAddress.sin_port = SERVER_PORT;
Socket = socket(AF_INET, SOCK_STREAM, 0);
if (connect(Socket, (struct sockaddr *)&SocketAddress, sizeof(SocketAddress)) != CELL_OK) {
return "CONNECTION ERROR";
}
strcpy(RequestBuffer, "GET /");
strcat(RequestBuffer, Path);
char sign_Buf[15];
sprintf(sign_Buf, "%d", sign);
strcat(RequestBuffer, sign_Buf);
strcat(RequestBuffer, "/");
strcat(RequestBuffer, Key);
strcat(RequestBuffer, " HTTP/1.0\r\nHOST: ");
strcat(RequestBuffer, URL);
strcat(RequestBuffer, "\r\nUser-Agent: Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36");
strcat(RequestBuffer, "\r\n\r\n");
send(Socket, RequestBuffer, strlen(RequestBuffer), 0);
while (recv(Socket, bufferReturn, 1024, 0) > 0)
{
return bufferReturn;
}
}
Essentially, the page I am trying to pull data from has information pushed to it. I want to grab that data and save it. I can't even seem to get connected to the link.

I would not recommend using the raw socket API for HTTP/HTTPS interactions. Here are some C/C++ libraries/stacks that can be used for that instead:
WinInet
WinHttp
Qt
POCO
Boost.Asio
libcurl
Also, you should check the return value of every socket API call (gethostbyname, socket, connect, send, recv); those values will tell you what is happening in your code. And your code returns from the function after the first successful call to recv, which is incorrect: a response can arrive in several pieces, so you must keep calling recv until it returns 0 (the server has closed the connection) or -1 (an error).
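A minimal sketch of such a receive loop, written as a hypothetical ReadUntilClosed helper. To keep it platform-neutral it takes the read call as a callable that follows recv()'s return convention (bytes read, 0 on orderly close, negative on error). Because the question's request uses HTTP/1.0, the server closes the connection after sending the response, which is what lets a read-until-0 loop terminate.

```cpp
#include <functional>
#include <string>

// Hypothetical helper: keeps reading until the peer closes the connection.
// `read_chunk` stands in for recv(sock, buf, len, 0) and follows its
// convention: >0 bytes read, 0 = orderly shutdown, <0 = error.
// Returns true if everything was read, false on a read error.
bool ReadUntilClosed(const std::function<int(char*, int)>& read_chunk,
                     std::string& out)
{
    char buf[1024];
    for (;;) {
        const int n = read_chunk(buf, static_cast<int>(sizeof(buf)));
        if (n > 0) {
            // buf is not NUL-terminated; append exactly the n bytes received.
            out.append(buf, static_cast<std::string::size_type>(n));
        } else if (n == 0) {
            return true;   // peer closed its side: nothing more will arrive
        } else {
            return false;  // error: inspect WSAGetLastError() (Winsock) or errno
        }
    }
}
```

With Winsock it would be called as ReadUntilClosed([&](char* b, int len) { return recv(Socket, b, len, 0); }, response); and the complete response, headers included, ends up in response.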

Related

How to send POST Request and receive GET response

I'm trying to send a POST Request and receive a GET response in return.
Here is my code :
#define IP "87.98.245.77"
#define Port 80
void getGETMessage(SOCKET s)
{
char *data = new char[2000];
recv(s, data, strlen(data), NULL);
int x = 3;
}
void SendPOSTMessage(SOCKET s)
{
string data = "POST /index.php?p=convert HTTP/1.1"
"Host: convert2mp3.netConnection: keep-alive"
"Content-Length: 80";
"Cache-Control: max-age=0"
"Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8"
"Origin: http://convert2mp3.net"
"User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.114 Safari/537.36"
"Content-Type: application/x-www-form-urlencoded"
"Referer: http://convert2mp3.net/index.php?p=home"
"Accept-Encoding: gzip,deflate,sdch"
"Accept-Language: he-IL,he;q=0.8,en-US;q=0.6,en;q=0.4,de;q=0.2,fr;q=0.2";
if (send(s, data.c_str(), data.length(), NULL) < data.length())
{
cout << "Error : " << WSAGetLastError() << endl;
return ;
}
getGETMessage(s);
}
int main()
{
char *data = new char[200];
WSADATA wsaData;
int iResult = WSAStartup(MAKEWORD(2, 2), &wsaData);
SOCKET ConnectSocket;
ConnectSocket = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
sockaddr_in clientService;
clientService.sin_family = AF_INET;
clientService.sin_addr.s_addr = inet_addr(IP); // IP
clientService.sin_port = htons(Port); // Port
iResult = connect(ConnectSocket, (SOCKADDR *)& clientService, sizeof(clientService));
if (iResult == SOCKET_ERROR)
{
cout << "Error" << endl;
return 0;
}
SendPOSTMessage(ConnectSocket);
cin.get();
cin.get();
}
For some reason, I'm not getting anything back...
I tried getting the IP of this website:
http://convert2mp3.net/index.php?p=home and, using cmd, it told me it's http://87.98.245.77/, but that doesn't load in my browser... Maybe this is my error?
Thanks!
Your data is invalid. Each HTTP header needs to be terminated by \r\n, and you need another one after the last line of headers. And you're sending a Content-length of 80 but no content.
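To illustrate the framing described above, here is a sketch of a hypothetical BuildPostRequest helper: every header line ends with \r\n, an empty line closes the header block, and Content-Length is derived from the body that is actually appended. The Connection: close header is an assumption (it asks the server to close the socket after responding, so a read-until-closed loop terminates); the body itself is up to you, since the question never shows the 80 bytes it meant to send.

```cpp
#include <string>

// Hypothetical helper: builds a complete raw HTTP/1.1 POST request.
std::string BuildPostRequest(const std::string& host,
                             const std::string& path,
                             const std::string& body)
{
    std::string req;
    req += "POST " + path + " HTTP/1.1\r\n";
    req += "Host: " + host + "\r\n";
    req += "Content-Type: application/x-www-form-urlencoded\r\n";
    // Content-Length must equal the number of body bytes that follow.
    req += "Content-Length: " + std::to_string(body.size()) + "\r\n";
    req += "Connection: close\r\n";
    req += "\r\n";  // empty line: end of the header block
    req += body;
    return req;
}
```

The whole request can then go out in one call, for example send(s, req.c_str(), (int)req.size(), 0).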
Further notes:
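If send() returns a positive integer, calling WSAGetLastError() is meaningless. There has been no error, so what you will get is undefined. You should check for -1 (SOCKET_ERROR). Note also that your code compares send()'s int result with data.length(), an unsigned size_t, so -1 is converted to a huge value and that check can never detect an error. In practice send() won't return a short count unless you're in non-blocking mode.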
NULL is not a correct fourth parameter for send() or recv(), but 0 is.
There is no such thing as a 'GET response'. There are just HTTP responses.
At least one problem is in this part of the code:
void getGETMessage(SOCKET s)
{
char *data = new char[2000];
recv(s, data, strlen(data), NULL);
int x = 3;
}
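You didn't initialize data, so it may contain arbitrary bytes: strlen(data) may return 0 if the first byte happens to be '\0', or some other number depending on where the first '\0' appears. Either way it says nothing about the size of the buffer.
Try changing strlen to sizeof applied to an array (sizeof on a char* pointer would only give the size of the pointer), something like this: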
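void getGETMessage(SOCKET s)
{
char data[2000] = {0};
// leave room for the terminating '\0'; recv returns the byte count, 0 on close, SOCKET_ERROR on failure
int n = recv(s, data, sizeof(data)-1, 0);
if (n > 0)
cout << data << endl;
}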
Based on your code, you're only sending the header information. You specify a Content-Length of 80, but you're not sending any data. Therefore, the server is still waiting on the data and is not sending a response back to you yet.
My suggestion would be to utilize some HTTP library rather than attempt to perform all the low-level functionality yourself. Some suggestions are in this Stack Overflow question.

How do I get a Win7 app to communicate with a website?

I have sought in vain for a book entitled "Website Communication for Dummies". Can anyone suggest some good reading material / tutorial for me to consult?
Here is where I am at: I have a 32-bit Windows app written in C++ with Visual Studio 2010 C++ Express. The app lets the user select a URL in text format (e.g., www.maps.google.com) and then creates a socket, connects it, etc. The problem is that I can call send without error, but I have no idea what content to put in its second argument, which is a const char*.
I've tried simple commands like "dump" and "refresh" for various websites, but the recv() function merely returns 0 (bytes received) after a long delay.
Thanks for attending to this.
To understand what sort of data goes back and forth between a web server and a client, look at the HTTP RFC (or start with a tutorial).
Once you understand the protocol and have played with raw sockets, look for C or C++ implementations; libcurl is one. Windows also has built-in HTTP client support in the Windows SDK (WinINet and WinHTTP).
You'll probably want to send something like
GET / HTTP/1.1
followed by headers and an empty line, each line terminated by \r\n, to get a proper HTTP response. However, most sites will reject requests that don't include certain headers; HTTP/1.1 requires Host, for example. I would advise looking up HTTP client libraries in C++ to do the grunt work for you; writing your own HTTP request-building code is very much reinventing the wheel.
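A sketch of the minimal request this describes, as a hypothetical BuildGetRequest helper: the request line, the mandatory Host header, an assumed Connection: close, and the empty line that ends the header block.

```cpp
#include <string>

// Hypothetical helper: the smallest useful HTTP/1.1 GET request.
// "Connection: close" (an assumption here) asks the server to close the
// socket once the response has been sent.
std::string BuildGetRequest(const std::string& host, const std::string& path)
{
    return "GET " + path + " HTTP/1.1\r\n"
           "Host: " + host + "\r\n"
           "Connection: close\r\n"
           "\r\n";  // empty line: end of the header block
}
```

BuildGetRequest("www.google.com", "/") produces exactly the request string used in the Winsock answer below.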
First use send() to send a request such as GET / HTTP/1.1 to the web server, then call recv(Socket, buffer, ...) in a loop to read the response (headers followed by the HTML code) into a buffer.
const char *request = "GET / HTTP/1.1\r\nHost: www.google.com\r\nConnection: close\r\n\r\n";
send(Socket, request, strlen(request), 0);
Winsock code:
#include <winsock2.h>
#include <windows.h>
#include <iostream>
#pragma comment(lib,"ws2_32.lib")
using namespace std;
int main (){
WSADATA wsaData;
if (WSAStartup(MAKEWORD(2,2), &wsaData) != 0) {
cout << "WSAStartup failed.\n";
system("pause");
return 1;
}
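SOCKET Socket=socket(AF_INET,SOCK_STREAM,IPPROTO_TCP);
if (Socket == INVALID_SOCKET) {
cout << "socket() failed: " << WSAGetLastError() << "\n";
WSACleanup();
return 1;
}
struct hostent *host;
host = gethostbyname("www.google.com");
if (host == NULL) { // name resolution failed
cout << "Could not resolve host\n";
closesocket(Socket);
WSACleanup();
return 1;
}
SOCKADDR_IN SockAddr;
SockAddr.sin_port=htons(80);
SockAddr.sin_family=AF_INET;
SockAddr.sin_addr.s_addr = *((unsigned long*)host->h_addr);
cout << "Connecting...\n";
if(connect(Socket,(SOCKADDR*)(&SockAddr),sizeof(SockAddr)) != 0){
cout << "Could not connect";
system("pause");
return 1;
}
cout << "Connected.\n";
// every line ends with \r\n and an empty line ends the request;
// "Connection: close" makes the server close the socket after responding
const char *request = "GET / HTTP/1.1\r\nHost: www.google.com\r\nConnection: close\r\n\r\n";
send(Socket, request, (int)strlen(request), 0);
char buffer[10000];
int nDataLength;
// recv returns the number of bytes read, 0 once the server has closed the connection, or SOCKET_ERROR
while ((nDataLength = recv(Socket,buffer,sizeof(buffer),0)) > 0){
// buffer is not NUL-terminated: print exactly the nDataLength bytes received
cout.write(buffer, nDataLength);
}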
closesocket(Socket);
WSACleanup();
system("pause");
return 0;
}
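The loop above prints the status line, headers and HTML together. If you only want the body (for example the JSON page from the first question), a hypothetical SplitHttpResponse helper can split the accumulated response at the first empty line, which is where HTTP/1.x headers end. This sketch assumes the whole response is already in one std::string and does not decode chunked bodies.

```cpp
#include <string>

// Hypothetical helper: splits a raw HTTP/1.x response into status code,
// header block and body. Returns false if no complete header block is found.
bool SplitHttpResponse(const std::string& raw, int& status,
                       std::string& headers, std::string& body)
{
    const std::string::size_type end = raw.find("\r\n\r\n");
    if (end == std::string::npos || raw.compare(0, 5, "HTTP/") != 0)
        return false;
    headers = raw.substr(0, end);
    body = raw.substr(end + 4);  // everything after the empty line
    // Status line looks like "HTTP/1.1 200 OK": three digits after the first space.
    const std::string::size_type sp = headers.find(' ');
    if (sp == std::string::npos || sp + 4 > headers.size())
        return false;
    int code = 0;
    for (std::string::size_type i = sp + 1; i < sp + 4; ++i) {
        if (headers[i] < '0' || headers[i] > '9')
            return false;
        code = code * 10 + (headers[i] - '0');
    }
    status = code;
    return true;
}
```

A server must not send a chunked body in reply to an HTTP/1.0 request, so sending GET / HTTP/1.0 instead is one simple way to stay within what this sketch handles.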

Sending POST data using winsock (and receiving them with PHP server side script)

So I thought I would play with HTTP a bit and try to send simple plain (not encoded) text from my program to a server. However, something is not right and I don't know what.
Here is the server-side PHP script. I have tested it by sending POST data from an HTML form and it worked just great, so I guess there isn't anything wrong on the server side.
<?php
$file = 'postData.txt';
$somecontent = $_POST['dat'];
$fp = fopen($file, 'w') or die('Could not open file!');
fwrite($fp, "$somecontent") or die('Could not write to file');
fclose($fp);
?>
Here is the program (it includes some unused parts, like reading file content into a buffer, because I keep playing with it and changing stuff every 5 seconds; don't mind them):
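// winsock2.h must come before windows.h; otherwise windows.h pulls in the older winsock.h and the two conflict
#include <winsock2.h>
#include <windows.h>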
#include <stdio.h>
#include <stdint.h>
#include <iostream>
int main()
{
WSADATA wsa;
if (WSAStartup(MAKEWORD(2, 2), &wsa) != 0)
return 0;
SOCKET fd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
// SOCKET is unsigned, so compare with INVALID_SOCKET rather than < 0;
// a bare "throw;" with no active exception would just call std::terminate
if (fd == INVALID_SOCKET) return 1;
SOCKADDR_IN service;
service.sin_family = AF_INET;
service.sin_port = htons(80);
LPHOSTENT host = gethostbyname("123.17.25.123");
if (!host) return 1;
service.sin_addr = *((LPIN_ADDR)*host->h_addr_list);
if (connect(fd, (SOCKADDR *)&service, sizeof(service)) == SOCKET_ERROR) return 1;
FILE *f = fopen("file.txt", "rb");
if (!f) return 1;
uint32_t len = 0;
fseek(f, 0x00, SEEK_END);
len = ftell(f);
fseek(f, 0x00, SEEK_SET);
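char header[1024];
char *buffer = new char[len];
size_t nread = fread(buffer, sizeof(char), len, f);
fclose(f);
// Content-Length counts every body byte: the 4 bytes of "dat=" plus the file content
sprintf(header,
"POST /recv.php HTTP/1.1\r\n"
"Host: 123.17.25.123\r\n"
"User-Agent: Mozilla Firefox/4.0\r\n"
"Content-Length: %u\r\n"
"Content-Type: application/x-www-form-urlencoded\r\n"
"Accept-Charset: utf-8\r\n\r\n",
(unsigned)(nread + 4));
std::cout << header << std::endl;
send(fd, header, (int)strlen(header), 0);
send(fd, "dat=", 4, 0);
// buffer is not NUL-terminated, so send the byte count from fread instead of strlen(buffer);
// nothing else follows the body, since extra bytes would not be covered by Content-Length
send(fd, buffer, (int)nread, 0);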
delete [] buffer;
fprintf(stderr, "Done\n");
closesocket(fd);
WSACleanup();
return 0;
}
So, what's wrong with it? Does anyone have any idea? :P
edit1: I monitored the traffic with Wireshark and ran the program a few times, but no packets were captured. Strange; it does not even send anything anywhere.
edit2: Thanks to TokenMacGuy I got it working. The code above is lame, but it reads the whole file and sends it as POST data to your server; hopefully it will be useful for noobs like me to learn from. Thank you once again!
You aren't receiving the data because you aren't actually sending any data. Although buffer appears in the sprintf, there's no format specifier to consume it (only the length is formatted).
Try removing the buffer from the sprintf call altogether, and then call send twice, once for the headers (as you already do) and again to send the actual data.
Or maybe you don't intend to send any data from the file you read; you just want to send dat=somedata. The problem is that you indicate the content type as text/plain, in which case the server won't interpret it at all. The content type should probably be application/x-www-form-urlencoded. Since the dat parameter is part of the body, the Content-Length header must include it. If the Content-Length doesn't match the number of bytes actually sent as content, the server misreads the request: it either waits for bytes that never arrive or treats the extra bytes as the start of another request, often answering with a response code in the 400-499 range.
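In a POST request the empty line that ends the headers (the second \r\n) goes before the data, not after it, and no extra \r\n should follow the data unless Content-Length counts those two bytes:
"Accept-Charset: utf-8\r\n\r\n"
"dat=somedata",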

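Because the PHP script reads the value through $_POST['dat'], the body has to be valid application/x-www-form-urlencoded data; raw file content containing &, = or + would be split or altered when PHP decodes it. A sketch of a hypothetical FormUrlEncode helper that percent-encodes one value (spaces become '+'; letters, digits and -._~ pass through unchanged):

```cpp
#include <string>

// Hypothetical helper: application/x-www-form-urlencoded encoding of one value.
std::string FormUrlEncode(const std::string& value)
{
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : value) {
        if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
            (c >= '0' && c <= '9') || c == '-' || c == '.' || c == '_' || c == '~') {
            out += static_cast<char>(c);   // unreserved: copy as-is
        } else if (c == ' ') {
            out += '+';                    // form encoding turns spaces into '+'
        } else {
            out += '%';                    // everything else becomes %XX
            out += hex[c >> 4];
            out += hex[c & 0x0F];
        }
    }
    return out;
}
```

The body would then be "dat=" + FormUrlEncode(fileContents), and Content-Length is the size of that encoded body, not of the original file.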
Why would a blocking socket repeatedly return 0-length data?

I'm having a significant problem using a standard BSD-style socket in a C++ program. In the code below, I connect to a local web server, send a request, and simply create a loop waiting for data to return. I actually do receive the data, but then I get an endless stream of 0-length data as if it was a non-blocking socket. The web server presumably didn't kill the connection, because if so I would have received a length of -1.
Please ignore simple typos I make below, as I'm writing the code from memory, not a direct copy/paste. The code produces the same result on OSX and Windows.
int sock = socket(AF_INET, SOCK_STREAM, 0);
//assume serv_addr has been created correctly
connect(sock, (sockaddr*)&serv_addr, sizeof(serv_addr));
std::string header = "GET / HTTP/1.1\r\n"
"Host: 127.0.0.1:80\r\n"
"Keep-Alive: 300\r\n"
"Connection: keep-alive\r\n\r\n";
send(sock, header.c_str(), header.length()+1, 0);
for (;;) {
char buffer[1024];
int len = recv(sock, buffer, 1024, 0);
cout << len << endl;
//this outputs two numbers around 200 and 500,
//which are the header and html, and then it
//outputs an endless stream of 0's
}
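From the man page of recv:
For TCP sockets, the return value 0 means the peer has closed its half
side of the connection.
So the zeros are not empty reads on a non-blocking socket: the server has closed its side, and every further recv() returns 0 immediately. The loop has to stop when recv() returns 0 (and handle -1, which signals an error).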
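The question's request also asks for Connection: keep-alive. When a server really does keep the connection open, recv() blocks after the response instead of returning 0, so the client has to work out from the headers where the response ends. A sketch of a hypothetical ResponseComplete check for responses that carry a Content-Length header (chunked transfer encoding is not handled):

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: true once `raw` holds the full header block plus
// Content-Length bytes of body. Returns false if no Content-Length is present.
bool ResponseComplete(const std::string& raw)
{
    const std::string::size_type end = raw.find("\r\n\r\n");
    if (end == std::string::npos)
        return false;  // headers not fully received yet
    // Header names are case-insensitive, so search a lowercased copy.
    std::string headers = raw.substr(0, end);
    for (char& c : headers)
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    const std::string name = "\r\ncontent-length:";
    const std::string::size_type pos = headers.find(name);
    if (pos == std::string::npos)
        return false;
    std::string::size_type i = pos + name.size();
    while (i < headers.size() && headers[i] == ' ')
        ++i;  // skip optional whitespace before the value
    std::string::size_type length = 0;
    while (i < headers.size() && std::isdigit(static_cast<unsigned char>(headers[i])))
        length = length * 10 + static_cast<std::string::size_type>(headers[i++] - '0');
    return raw.size() - (end + 4) >= length;  // body bytes received so far
}
```

In a loop, you would append each recv() chunk to a std::string and stop reading as soon as ResponseComplete returns true.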

Programmatically reading a web page

I want to write a program in C/C++ that will dynamically read a web page and extract information from it. As an example, imagine you wanted to write an application to follow and log an eBay auction. Is there an easy way to grab the web page? A library that provides this functionality? And is there an easy way to parse the page to get the specific data?
Have a look at the cURL library:
#include <stdio.h>
#include <curl/curl.h>
int main(void)
{
CURL *curl;
CURLcode res;
curl = curl_easy_init();
if(curl) {
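curl_easy_setopt(curl, CURLOPT_URL, "https://example.com/");
/* with no write callback set, libcurl writes the page body to stdout */
res = curl_easy_perform(curl);
if(res != CURLE_OK)
fprintf(stderr, "curl_easy_perform() failed: %s\n", curl_easy_strerror(res));
/* always cleanup */
curl_easy_cleanup(curl);
}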
return 0;
}
BTW, if C++ is not strictly required, I encourage you to try C# or Java. It is much easier, and there is a built-in way.
Windows code:
#include <winsock2.h>
#include <windows.h>
#include <iostream>
#pragma comment(lib,"ws2_32.lib")
using namespace std;
int main (){
WSADATA wsaData;
if (WSAStartup(MAKEWORD(2,2), &wsaData) != 0) {
cout << "WSAStartup failed.\n";
system("pause");
return 1;
}
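SOCKET Socket=socket(AF_INET,SOCK_STREAM,IPPROTO_TCP);
if (Socket == INVALID_SOCKET) {
cout << "socket() failed: " << WSAGetLastError() << "\n";
WSACleanup();
return 1;
}
struct hostent *host;
host = gethostbyname("www.google.com");
if (host == NULL) { // name resolution failed
cout << "Could not resolve host\n";
closesocket(Socket);
WSACleanup();
return 1;
}
SOCKADDR_IN SockAddr;
SockAddr.sin_port=htons(80);
SockAddr.sin_family=AF_INET;
SockAddr.sin_addr.s_addr = *((unsigned long*)host->h_addr);
cout << "Connecting...\n";
if(connect(Socket,(SOCKADDR*)(&SockAddr),sizeof(SockAddr)) != 0){
cout << "Could not connect";
system("pause");
return 1;
}
cout << "Connected.\n";
// every line ends with \r\n and an empty line ends the request;
// "Connection: close" makes the server close the socket after responding
const char *request = "GET / HTTP/1.1\r\nHost: www.google.com\r\nConnection: close\r\n\r\n";
send(Socket, request, (int)strlen(request), 0);
char buffer[10000];
int nDataLength;
// recv returns the number of bytes read, 0 once the server has closed the connection, or SOCKET_ERROR
while ((nDataLength = recv(Socket,buffer,sizeof(buffer),0)) > 0){
// buffer is not NUL-terminated: print exactly the nDataLength bytes received
cout.write(buffer, nDataLength);
}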
closesocket(Socket);
WSACleanup();
system("pause");
return 0;
}
There is a free TCP/IP library available for Windows that supports HTTP and HTTPS - using it is very straightforward.
Ultimate TCP/IP
CUT_HTTPClient http;
http.GET("http://folder/file.htm", "c:/tmp/process_me.htm");
You can also GET files and store them in a memory buffer (via CUT_DataSource derived classes). All the usual HTTP support is there - PUT, HEAD, etc. Support for proxy servers is a breeze, as are secure sockets.
You can do it with socket programming, but it's tricky to implement the parts of the protocol needed to reliably fetch a page. Better to use a library, like neon. This is likely to be installed in most Linux distributions. Under FreeBSD use the fetch library.
For parsing the data, because many pages don't use valid XML, you need to implement heuristics, not a real yacc-based parser. You can implement these using regular expressions or a state transition machine. Since what you're trying to do involves a lot of trial and error, you're better off using a scripting language like Perl. Due to the high network latency, you will not see any difference in performance.
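As a small illustration of the regular-expression heuristic, here is a sketch in C++ with std::regex rather than Perl; the ExtractTitle name and the choice of the <title> element are only examples.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the text of the first <title> element, matching
// the tag case-insensitively, or an empty string if there is none. A heuristic,
// not an HTML parser: comments, scripts or odd markup can fool it.
std::string ExtractTitle(const std::string& html)
{
    static const std::regex title_re("<title[^>]*>([\\s\\S]*?)</title>",
                                     std::regex::ECMAScript | std::regex::icase);
    std::smatch m;
    if (std::regex_search(html, m, title_re))
        return m[1].str();  // first capture group: the text between the tags
    return std::string();
}
```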
You didn't mention a platform, so here is an answer for Win32.
One simple way to download anything from the Internet is URLDownloadToFile with the IBindStatusCallback parameter set to NULL. To make the function more useful (for example, to report download progress), the callback interface needs to be implemented.
Try using a library like Qt, which can read data from across a network and get data out of an XML document. This is an example of how to read an XML feed. You could use the eBay feed, for example.
It can be done in Multiplatform QT library:
QByteArray WebpageDownloader::downloadFromUrl(const std::string& url)
{
QNetworkAccessManager manager;
QNetworkReply *response = manager.get(QNetworkRequest(QUrl(url.c_str())));
QEventLoop event;
QObject::connect(response, &QNetworkReply::finished, &event, &QEventLoop::quit);
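event.exec();
const QByteArray data = response->readAll();
response->deleteLater(); // the caller is responsible for the finished reply; schedule its deletion
return data;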
}
That data can then be saved to a file, for example, or converted to a std::string:
const string webpageText = downloadFromUrl(url).toStdString();
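Remember that you need to add
QT += network
to the qmake project (.pro) file to compile the code. A QCoreApplication (or QApplication) object must also exist before the function is called, since QEventLoop and QNetworkAccessManager depend on it.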