socket, request http webpage - c++

I am fetching some websites with sockets by making an HTTP request and reading the response like this:
char buffer[1000];
while ((bytesReceived = tcpSocket.Receive(buffer, 1000-1)) > 0)
{
buffer[bytesReceived] = '\0';
myFile << buffer;
memset(buffer, 0, 1000);
}
This is the receive function:
int fsx::TcpSocket::Receive(char* _buffer, size_t _length)
{
int iResult = recv(this->socketHandler, _buffer, _length, 0);
if (iResult >= 0)
{
return iResult;
}
else
{
return SOCKET_ERROR;
}
}
And this is part of the response I'm getting:
HTTP/1.1 200 OK
Date: Tue, 22 Sep 2015 10:46:10 GMT
Content-Type: text/html; charset=utf-8
Transfer-Encoding: chunked
Connection: close
Set-Cookie: __cfduid=d01e9db42c5332c444d5105c2cd9fd9e01442918769; expires=Wed, 21-Sep-16 10:46:09 GMT; path=/; domain=.stackoverflow.com; HttpOnly
Cache-Control: public, no-cache="Set-Cookie", max-age=60
Cf-Railgun: 2b57bd3274 5.38 0.314316 0030 3350
Expires: Tue, 22 Sep 2015 10:47:09 GMT
Last-Modified: Tue, 22 Sep 2015 10:46:09 GMT
Vary: *
X-Frame-Options: SAMEORIGIN
X-Request-Guid: 9921fd42-6fd5-4a34-a839-c87d26b2f39a
Set-Cookie: prov=e6796729-38a7-4754-af17-96349ae78010; domain=.stackoverflow.com; expires=Fri, 01-Jan-2055 00:00:00 GMT; path=/; HttpOnly
Server: cloudflare-nginx
CF-RAY: 229d6ca79fef05b5-ARN
3b19 //<------------- WHAT THE HECK IS THIS?
<!DOCTYPE html>
<html itemscope itemtype="http://schema.org/QAPage">
<head>
As you can see, I'm getting the characters '3b19' right after the response headers. What are they? It's a different set of characters every single time, and I can't find them in the source of http://www.stackoverflow.com/questions/12691882/how-to-send-and-receive-data-socket-tcp-c-c, which is the page I'm fetching.

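It is the length of the chunk of content that follows, written in hexadecimal. The server is using "chunked" transfer encoding, as the Transfer-Encoding: chunked header in your response shows.
Chunked encoding is described in RFC 2616, section 3.6.1 (Chunked Transfer Coding).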
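To illustrate: in the response above, 3b19 is hexadecimal for 15129, so the next 15129 bytes are page data, followed by CRLF and the next chunk-size line. Below is a minimal sketch of a decoder, assuming the whole body (everything after the blank line that ends the headers) has already been read into a std::string. decode_chunked is a name made up for this example; it is not part of the asker's fsx::TcpSocket class, and trailers after the last chunk are simply ignored.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Decodes an HTTP/1.1 "chunked" message body into the plain payload.
// Assumption: `body` holds the complete body, starting right after the
// blank line that ends the headers. Throws on truncated or malformed input.
std::string decode_chunked(const std::string& body) {
    std::string out;
    std::size_t pos = 0;
    for (;;) {
        // Each chunk starts with "<hex size>[;extensions]\r\n".
        std::size_t eol = body.find("\r\n", pos);
        if (eol == std::string::npos)
            throw std::runtime_error("missing chunk-size line");
        // std::stoul stops at the first non-hex character, so any
        // ";extension" after the size is ignored. It throws if the line
        // does not start with a hex number.
        unsigned long size = std::stoul(body.substr(pos, eol - pos), nullptr, 16);
        pos = eol + 2;
        if (size == 0)
            break;  // a zero-size chunk marks the end of the body
        if (body.size() - pos < size)
            throw std::runtime_error("truncated chunk data");
        out.append(body, pos, size);
        pos += size;
        // The chunk data is followed by CRLF.
        if (body.compare(pos, 2, "\r\n") != 0)
            throw std::runtime_error("missing CRLF after chunk data");
        pos += 2;
    }
    return out;
}
```

For example, decode_chunked("4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n") returns "Wikipedia".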


How get google.com web page using C socket

I wrote code that should query the google.com web page and display its contents, but it doesn't work as intended.
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h> /* inet_addr */
#include <unistd.h> /* close */
#include <string.h>
#include <stdlib.h>
#include <stdio.h>
int main()
{
int sockfd;
struct sockaddr_in destAddr;
if((sockfd = socket(PF_INET, SOCK_STREAM, 0)) == -1){
fprintf(stderr, "Error opening client socket\n");
return 1; /* no descriptor to close: socket() failed */
}
destAddr.sin_family = PF_INET;
destAddr.sin_port = htons(80);
destAddr.sin_addr.s_addr = inet_addr("64.233.164.94");
memset(&(destAddr.sin_zero), 0, 8);
if(connect(sockfd, (struct sockaddr *)&destAddr, sizeof(struct sockaddr)) == -1){
fprintf(stderr, "Error with client connecting to server\n");
close(sockfd);
return 1;
}
char *httprequest1 = "GET / HTTP/1.1\r\n"
"Host: google.com\r\n"
"\r\n";
char *httprequest2 = "GET / HTTP/1.1\r\n"
"Host: http://www.google.com/\r\n"
"\r\n";
char *httprequest3 = "GET / HTTP/1.1\r\n"
"Host: http://www.google.com/\r\n"
"Upgrade-Insecure-Requests: 1\r\n"
"Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9\r\n"
"User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36\r\n"
"\r\n";
char *httprequest = httprequest2;
printf("start send\n");
int send_result = send(sockfd, httprequest, strlen(httprequest), 0);
printf("send_result: %d\n", send_result);
#define bufsize 1000
char buf[bufsize + 1] = {0};
printf("start recv\n");
int bytes_readed = recv(sockfd, buf, bufsize, 0);
printf("end recv: readed %d bytes\n", bytes_readed);
buf[bufsize] = '\0';
printf("-- buf:\n");
puts(buf);
printf("--\n");
return 0;
}
If I send httprequest1, I get this output:
gcc -w -o get-google get-google.c
./get-google
start send
send_result: 36
start recv
end recv: readed 528 bytes
-- buf:
HTTP/1.1 301 Moved Permanently
Location: http://www.google.com/
Content-Type: text/html; charset=UTF-8
Date: Fri, 09 Sep 2022 11:52:16 GMT
Expires: Sun, 09 Oct 2022 11:52:16 GMT
Cache-Control: public, max-age=2592000
Server: gws
Content-Length: 219
X-XSS-Protection: 0
X-Frame-Options: SAMEORIGIN
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
The document has moved
here.
</BODY></HTML>
--
With httprequest2, where I changed the Host: header, I got the following output:
gcc -w -o get-google get-google.c
./get-google
start send
send_result: 48
start recv
end recv: readed 198 bytes
-- buf:
HTTP/1.1 400 Bad Request
Content-Length: 54
Content-Type: text/html; charset=UTF-8
Date: Fri, 09 Sep 2022 11:53:19 GMT
Connection: close
<html><title>Error 400 (Bad Request)!!1</title></html>
--
Then I tried copying the headers from my browser (httprequest3), and I got the same result as for httprequest2.
How can I get the full page?
It should be Host: www.google.com and not Host: http://www.google.com/. The Host header takes a host name, not a URL.
However, it might not give you the home page. Google wants you to use HTTPS, so it'll probably redirect you to https://www.google.com/, and you won't be able to implement HTTPS fully yourself (you'll have to use a library like OpenSSL).
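As a rough sketch of the Host fix, the request text could be assembled like this. Both build_get_request and host_from_url are hypothetical helpers written for this answer, and the Connection: close header is my own addition (so the server closes the socket when it is done), not something the question used.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: reduces a URL-like string such as
// "http://www.google.com/" to the bare host name "www.google.com".
std::string host_from_url(const std::string& url) {
    std::size_t start = url.find("://");
    start = (start == std::string::npos) ? 0 : start + 3;
    std::size_t end = url.find('/', start);
    return url.substr(start, end == std::string::npos ? std::string::npos
                                                      : end - start);
}

// Builds a minimal HTTP/1.1 GET request. `host` must be a bare host name
// (no scheme, no path); `path` is the resource, e.g. "/".
std::string build_get_request(const std::string& host, const std::string& path) {
    std::string req;
    req += "GET " + path + " HTTP/1.1\r\n";
    req += "Host: " + host + "\r\n";
    req += "Connection: close\r\n";  // assumption: one request per connection
    req += "\r\n";                   // blank line ends the header block
    return req;
}
```

The resulting string can be passed to send() in place of httprequest2, for example build_get_request(host_from_url("http://www.google.com/"), "/").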

Libcurl HTTP GET request using a query string

I'm struggling with a GET request that passes a query string. Every time I run it I get a 410 Gone response. I checked whether the link had been deleted, but it's still accessible.
My code:
CURLUcode rc;
CURLU* url = curl_url();
CURL* handle = curl_easy_init();
CURLcode res;
struct curl_slist* list = NULL;
if (handle) {
rc = curl_url_set(url, CURLUPART_HOST, "www.example.com", 0);
rc = curl_url_set(url, CURLUPART_QUERY,"b=sashio", CURLU_APPENDQUERY);
rc = curl_url_set(url, CURLUPART_QUERY, "id=me_ZwNjBQRlZGL0AwR0ZQNjAQR1AQZ3At==", CURLU_APPENDQUERY);
rc = curl_url_set(url, CURLUPART_SCHEME, "http", 0);
rc = curl_url_set(url, CURLUPART_PATH, "/en/m.php?", 0);
curl_easy_setopt(handle, CURLOPT_CURLU, url);
list = curl_slist_append(list, "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9");
curl_easy_setopt(handle, CURLOPT_HTTPHEADER, list);
curl_easy_setopt(handle, CURLOPT_HTTPGET, 1L);
curl_easy_setopt(handle, CURLOPT_VERBOSE, 1L);
curl_easy_setopt(handle, CURLOPT_ENCODING, "");
}
res = curl_easy_perform(handle);
Verbose output :
* Mark bundle as not supporting multiuse
* HTTP 1.0, assume close after body
< HTTP/1.0 410 Gone
< Date: Wed, 12 Aug 2020 19:41:23 GMT
< Server: Apache
< X-UA-Compatible: IE=EmulateIE9
< Last-Modified: Wed, 12 Aug 2020 19:41:23 GMT
< Cache-Control: private, must-revalidate, max-age=0
< Etag: "1804121564-00010031-BZGZ0AGt5BQt2AGD1ZQZ1AmV"
< Set-Cookie: PHPSESSID=tp3mbkk7148cm989areejhhd90; path=/
< Expires: Thu, 19 Nov 1981 08:52:00 GMT
< Pragma: no-cache
< Content-Length: 192
< Connection: close
< Content-Type: text/html
<
* Closing connection 0
Thanks to everyone, especially @AndreasWenzel. As he said, the browser was sending a cookie to the server. I added the cookie to the request headers, and now I receive a correct response.
Have a nice day!

Discord Webhooks With C++

So I've taken up the challenge of doing webhooks in C++, and I'd like some help with the POST requests. I want to send embeds through POST requests in C++.
Here is my code, along with the errors I get; the webhook is still active if you want to test it yourself. I'm deliberately sticking to Windows libraries.
#include <winsock2.h>
#include <ws2tcpip.h>
#include <iostream> // std::cout in main
#include <string>
#pragma warning(disable:4996)
#pragma comment(lib, "ws2_32.lib")
using namespace std;
int Plug(string address, string port, SOCKET* csock) {
WSADATA WSAData;
WSAStartup(MAKEWORD(2, 0), &WSAData);
PADDRINFOA result;
ADDRINFOA hints;
ZeroMemory(&hints, sizeof(ADDRINFO));
hints.ai_family = AF_INET;
hints.ai_socktype = SOCK_STREAM;
hints.ai_protocol = IPPROTO_TCP;
int res = getaddrinfo(address.c_str(), port.c_str(), &hints, &result);
*csock = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
if (SOCKET_ERROR == connect(*csock, (SOCKADDR*)result->ai_addr, sizeof(SOCKADDR))) return WSAGetLastError();
return 0;
}
void Unplug(SOCKET* csock) {
closesocket(*csock);
WSACleanup();
}
string PostRequest(string host, string query, string data) {
string req = "POST " + query + " HTTP/1.1" + "\r\n";
req += "Host: " + host + "\r\n";
req += "Content-Type: application/x-www-form-urlencoded\r\n";
req += "Content-Length: " + to_string(data.length()) + "\r\n";
req += "Connection: Close";
req += "\r\n\r\n" + data + "\r\n\r\n";
SOCKET s;
Plug(host, "80", &s);
Send(&s, req);
while (GetAvailable(&s) == 0) Sleep(10);
string result = Receive(&s);
//if (result.find("\r\n\r\n") == string::npos && DEBUG) return result;
//if (result.find("\r\n\r\n") == string::npos) return "PR_INVALID_RESPONSE";
//result = result.substr(result.find("\r\n\r\n") + 4, string::npos);
Unplug(&s);
return result;
}
int main() {
//https://discordapp.com/api/webhooks/705211476553629747/zwzqZMnTTgLtHBm3kc_DvvD71IW9FfE4ur-PQlkgeZhd56cT7UjSJCWI-V8wPiEUWV2w
std::cout<<PostRequest("162.159.129.233","/api/webhooks/705249405141516360/s8ioXr6IZEeuPnMi1O37CmY3o5pUZcu6ho7aIJdieSAqgGyTXOZjkOZdMNe1uJre6dto","{'embeds': [{'description': '**ERROR**: `TESTSTRING`\n', 'color': 4508791}]}");
}
However, when I make the POST request it fails to send my content and returns 400 instead. I've tried a few more things, but I'm wondering if it's my query?
HTTP/1.1 301 Moved Permanently
Cache-Control: max-age=3600
Cf-Ray: 58bd304dec1ff2c0-WAW
Cf-Request-Id: 026a1c84ad0000f2c054113200000001
Date: Thu, 30 Apr 2020 00:36:28 GMT
Expires: Thu, 30 Apr 2020 01:36:28 GMT
Location: MYWEBHOOKURL
Server: cloudflare
Set-Cookie: __cfruid=5adc23f36cce3f0a398b8f9e91429b4349ef9314-1588206988; path=/; domain=.discordapp.com; HttpOnly
Vary: Accept-Encoding
Content-Length: 0
Connection: close
And if I use Discord's actual IP instead of resolving the host name, I end up getting this instead:
HTTP/1.1 403 Forbidden
Cache-Control: max-age=15
Cf-Ray: 58bd37782e6dffbc-WAW
Cf-Request-Id: 026a20ff1a0000ffbcc89df200000001
Content-Type: text/plain; charset=UTF-8
Date: Thu, 30 Apr 2020 00:41:21 GMT
Expires: Thu, 30 Apr 2020 00:41:36 GMT
Server: cloudflare
Set-Cookie: __cfduid=d4cb17401215362929759e2da8110f0e31588207281; expires=Sat, 30-May-20 00:41:21 GMT; path=/; domain=.162.159.129.233; HttpOnly; SameSite=Lax
Vary: Accept-Encoding
Content-Length: 16
Connection: close
error code: 1003
If you have any ideas, don't hesitate to post them below; I've been lost on this for a long time.
You could just use D++ to post to webhooks:
https://dpp.dev/classdpp_1_1webhook.html
In fact you could do everything with this lib instead of rolling your own solution.
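If you do want to keep building the request by hand, a few things in the posted code stand out: the body uses single quotes, which is not valid JSON; it is labelled application/x-www-form-urlencoded; and an extra "\r\n\r\n" is appended after the body, beyond what Content-Length announces. The following is only a sketch of assembling the request text, assuming the endpoint takes a JSON body. build_json_post is a hypothetical helper, not part of any library. It does not handle the redirect from the 301 response, and if that Location is an https:// URL you need TLS on top of the socket.

```cpp
#include <string>

// Hypothetical helper: builds the text of an HTTP/1.1 POST request that
// carries `json` as its body. `host` is the domain name for the Host header.
std::string build_json_post(const std::string& host,
                            const std::string& path,
                            const std::string& json) {
    std::string req;
    req += "POST " + path + " HTTP/1.1\r\n";
    req += "Host: " + host + "\r\n";
    req += "Content-Type: application/json\r\n";  // assumption: JSON endpoint
    req += "Content-Length: " + std::to_string(json.size()) + "\r\n";
    req += "Connection: close\r\n";
    req += "\r\n";  // blank line ends the header block
    req += json;    // exactly Content-Length bytes, nothing appended after
    return req;
}
```

A body for it would use double quotes throughout, for example "{\"embeds\": [{\"description\": \"test\", \"color\": 4508791}]}" as a C++ string literal.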

Extract cookies from an http header

I'm writing a C++ function that's going to extract cookies from an http header. The header is inside a string and it looks like this:
HTTP/1.1 200 OK
cache-control: no-cache, no-store, max-age=0, must-revalidate
content-language: en
content-length: 3202
content-type: text/html; charset=utf-8
date: Fri, 25 Apr 2014 13:31:44 GMT
etag: "46ec0cd3920851f7b63dbaa70280cd32"
expires: Mon, 01 Jan 1990 00:00:00 GMT
pragma: no-cache
server: tfe
set-cookie: d=32; path=/; expires=Sat, 25-Apr-2015 13:31:44 GMT
set-cookie: req_country=United+Kingdom; path=/; expires=Sun, 25-May-2014 13:31:44 GMT
I need this function to find the cookies:
set-cookie: d=32; path=/; expires=Sat, 25-Apr-2015 13:31:44 GMT
set-cookie: req_country=United+Kingdom; path=/; expires=Sun, 25-May-2014 13:31:44 GMT
and put them into another string that's going to look like this:
d=32; req_country=United+Kingdom;
There can also be more than 2 cookies in each header.
I've tried:
size_t p1 = header_data.find("set-cookie:");
size_t p2 = header_data.find(";");
std::string head = header_data.substr(p1,p2-p1);
and after execution it gave me the following error:
terminate called after throwing an instance of 'std::out_of_range'
what(): basic_string::substr
Aborted (core dumped)
Try this code. It's not optimized, but I think it works the way you want:
#include <iostream>
#include <vector>
#include <sstream>
#include <string>
using namespace std;
std::vector<std::string> &split(const std::string &s, char delim, std::vector<std::string> &elems) {
std::stringstream ss(s);
std::string item;
while (std::getline(ss, item, delim)) {
elems.push_back(item);
}
return elems;
}
std::vector<std::string> split(const std::string &s, char delim) {
std::vector<std::string> elems;
split(s, delim, elems);
return elems;
}
int main() {
std::string header =
"HTTP/1.1 200 OK\n"
"cache-control: no-cache, no-store, max-age=0, must-revalidate\n"
"content-language: en\n"
"content-length: 3202\n"
"content-type: text/html; charset=utf-8\n"
"date: Fri, 25 Apr 2014 13:31:44 GMT\n"
"etag: \"46ec0cd3920851f7b63dbaa70280cd32\"\n"
"expires: Mon, 01 Jan 1990 00:00:00 GMT\n"
"pragma: no-cache\n"
"server: tfe\n"
"set-cookie: d=32; path=/; expires=Sat, 25-Apr-2015 13:31:44 GMT\n"
"set-cookie: req_country=United+Kingdom; path=/; expires=Sun, 25-May-2014 13:31:44 GMT\n";
vector<string> headerLines = split(header, '\n');
for (std::size_t i = 0; i != headerLines.size(); ++i) {
if (headerLines[i].find("set-cookie:") != std::string::npos) {
std::string variablesPart = split(split(headerLines[i], ';')[0], ':')[1];
std::cout << "\nExtracted: {" << variablesPart << "}";
}
}
}
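A note on the error in the question: substr throws std::out_of_range only when its start position is past the end of the string, which is what happens when find("set-cookie:") returns npos (for instance, if the real header spells it Set-Cookie:). Also, find(";") returns the first semicolon in the whole header, not the one after the cookie. Here is a sketch that produces the exact "d=32; req_country=United+Kingdom;" format asked for. extract_cookies is just a name I picked, and it assumes one header per line with the name at the start of the line.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>

// Collects the "name=value" part of every Set-Cookie line into one string
// of the form "a=1; b=2;". The header name is matched case-insensitively.
std::string extract_cookies(const std::string& header) {
    const std::string key = "set-cookie:";
    std::istringstream lines(header);
    std::string line, result;
    while (std::getline(lines, line)) {
        if (!line.empty() && line.back() == '\r')
            line.pop_back();  // tolerate CRLF line endings
        if (line.size() < key.size())
            continue;
        std::string prefix = line.substr(0, key.size());
        std::transform(prefix.begin(), prefix.end(), prefix.begin(),
                       [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
        if (prefix != key)
            continue;
        // Skip the whitespace after the colon.
        std::size_t begin = line.find_first_not_of(" \t", key.size());
        if (begin == std::string::npos)
            continue;
        // Attributes such as path and expires follow the first ';'.
        std::size_t end = line.find(';', begin);
        std::string pair = line.substr(
            begin, end == std::string::npos ? std::string::npos : end - begin);
        if (!result.empty())
            result += ' ';
        result += pair + ';';
    }
    return result;
}
```

Called on the header from the question, this returns "d=32; req_country=United+Kingdom;".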

Why is my program reading extra bytes in HTTP/1.1

I'm experimenting with sockets and trying to build a very simple webbot.
This is my code:
#include <sys/types.h>
#include <sys/socket.h>
#include <arpa/inet.h>
#include <netdb.h>
#include <unistd.h> // close
#include <errno.h>
#include <cstring>
#include <iostream>
#include <string>
#define HTTP_PORT "80"
#define HOST "www.taringa.net"
#define PORT HTTP_PORT
#define IN_BUFFSIZE 1024
#define OUT_BUFFSIZE 1024
#define REQUEST "GET /Taringa/posts HTTP/1.0\r\nHost: www.taringa.net\r\nUser-Agent: foo\r\n\r\n"
using namespace std;
int main(int argc, char **argv) {
struct addrinfo hints, *res;
struct sockaddr_in servAddress;
int sockfd;
char addrstr[100];
char buff_msg_out[OUT_BUFFSIZE], buff_msg_in[IN_BUFFSIZE];
memset(&hints, 0, sizeof(hints));
hints.ai_family = AF_UNSPEC;
hints.ai_socktype = SOCK_STREAM;
if ( getaddrinfo(HOST, HTTP_PORT, &hints, &res) != 0) {
cerr << "Error in getaddrinfo" << endl;
return -1;
}
// Create the socket
if ( ( sockfd = socket(res->ai_family, res->ai_socktype, res->ai_protocol) ) < 0 )
{
cerr << "Error in socket()" << endl;
return -1;
}
// Open the connection
if ( connect(sockfd, res->ai_addr, res->ai_addrlen) == -1 )
{
cerr << "Error in connect()" << endl;
cerr << "Error: " << strerror(errno) << endl;
return -1;
}
cout << "Connected successfully" << endl;
// Send the data
strncpy(buff_msg_out, REQUEST, strlen(REQUEST));
if ( send(sockfd, buff_msg_out, strlen(buff_msg_out), 0) <= 0 )
{
cerr << "Error in write()" << endl;
return -1;
}
cout << "Message sent:" << endl << buff_msg_out << endl << endl;
int bytes_recv = 0;
while ( ( bytes_recv = recv(sockfd, buff_msg_in, IN_BUFFSIZE-1, 0)) > 0 )
{
buff_msg_in[bytes_recv] = '\0';
cout << buff_msg_in << endl;
}
freeaddrinfo(res);
close(sockfd);
return 0;
}
This is the output when the request has HTTP/1.0
HTTP/1.1 200 OK
Server: n0
Date: Fri, 01 Feb 2013 19:57:26 GMT
Content-Type: text/html; charset=utf8
Connection: close
Set-Cookie: trngssn=06359673; path=/
Set-Cookie: trngssn=deleted; expires=Thu, 01-Jan-1970 00:00:01 GMT; path=/; domain=.taringa.net
Set-Cookie: taringa_user_id=deleted; expires=Thu, 01-Jan-1970 00:00:01 GMT; path=/; domain=.taringa.net
Set-Cookie: lastNick=deleted; expires=Thu, 01-Jan-1970 00:00:01 GMT; path=/; domain=.taringa.net
Set-Cookie: fbs=deleted; expires=Thu, 01-Jan-1970 00:00:01 GMT; path=/; domain=.taringa.net
Set-Cookie: tws=deleted; expires=Thu, 01-Jan-1970 00:00:01 GMT; path=/; domain=.taringa.net
Set-Cookie: iB-friendfind=deleted; expires=Thu, 01-Jan-1970 00:00:01 GMT; path=/; domain=.taringa.net
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="es" xml:lang="es" >
<head profile="http://purl.org/NET/erdf/profile" prefix="og: http://ogp.me/ns
# fb: http://ogp.me/ns/fb# article: http://ogp.me/ns/article#">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta http-equiv=”X-Frame-Options” content=”Deny” />
<link rel="alternate" type="application/atom+xml" title="Últimos Posts de Taringa" href="/rss/Taringa/posts/" />
<link rel="alternate" type="application/atom+xml" title="Últimos Temas de Taringa" href="/rss/Taringa/tem
as/" />
<title>Posts de Taringa! - Taringa!</title>
...
</body>
</html>
But when I specify HTTP/1.1, the reply is
HTTP/1.1 200 OK
Server: n0
Date: Fri, 01 Feb 2013 20:03:54 GMT
Content-Type: text/html; charset=utf8
Transfer-Encoding: chunked
Connection: keep-alive
Set-Cookie: trngssn=81047255; path=/
Set-Cookie: trngssn=deleted; expires=Thu, 01-Jan-1970 00:00:01 GMT; path=/; domain=.taringa.net
Set-Cookie: taringa_user_id=deleted; expires=Thu, 01-Jan-1970 00:00:01 GMT; path=/; domain=.taringa.net
Set-Cookie: lastNick=deleted; expires=Thu, 01-Jan-1970 00:00:01 GMT; path=/; domain=.taringa.net
Set-Cookie: fbs=deleted; expires=Thu, 01-Jan-1970 00:00:01 GMT; path=/; domain=.taringa.net
Set-Cookie: tws=deleted; expires=Thu, 01-Jan-1970 00:00:01 GMT; path=/; domain=.taringa.net
Set-Cookie: iB-friendfind=deleted; expires=Thu, 01-Jan-1970 00:00:01 GMT; path=/; domain=.taringa.net
d0f
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="es" xml:lang="es" >
<head profile="http://purl.org/NET/erdf
/profile" prefix="og: http://ogp.me/ns# fb: http://ogp.me/ns/fb# article: http://ogp.me/ns/article#">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta http-equiv=”X-Frame-Options” content=”Deny” />
<link rel="alternate" type="application/atom+xml" title="Últimos Posts de Taringa" href="/rss/Taringa/posts/" />
<link rel="alternate" type="application/atom+xml" title="Últimos Te
mas de Taringa" href="/rss/Taringa/temas/" />
<title>Posts de Taringa! - Taringa!</title>
...
and here is the problem
</body>
</html>
0
and it waits a few seconds before closing the communication.
I just tried with stackoverflow.com/about and it works fine, except for the following text that the server sends me after the web page:
</html>HTTP/1.0 400 Bad request
Cache-Control: no-cache
Connection: close
Content-Type: text/html
<html><body><h1>400 Bad request</h1>
Your browser sent an invalid request.
</body></html>
Am I missing something?
The server is using chunked encoding. The d0f is the length of the "chunk" in octets expressed as hex. The 0 is the length of the next chunk (i.e. there is none).
Server: n0
Date: Fri, 01 Feb 2013 20:03:54 GMT
Content-Type: text/html; charset=utf8
Transfer-Encoding: chunked
Connection: keep-alive
Don't say you support HTTP 1.1 if you don't.
All HTTP/1.1 applications MUST be able to receive and decode the "chunked" transfer-coding, and MUST ignore chunk-extension extensions they do not understand. -- HTTP/1.1 (RFC 2616, section 3.6.1)
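As for the wait of a few seconds: with Connection: keep-alive the server leaves the socket open after the last chunk, so a loop that reads until recv returns 0 blocks until the server gives up. One way around that is to stop reading once the terminating zero-size chunk has arrived. The check below is a sketch under the assumption that you accumulate everything received after the header block into a std::string. chunked_body_complete is a hypothetical helper, and it does no validation of malformed input.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>

// Returns true once `body` (all bytes received after the header block)
// holds a complete chunked message: every chunk, the zero-size last chunk,
// and the final blank line. Returns false if more data is still needed.
bool chunked_body_complete(const std::string& body) {
    std::size_t pos = 0;
    for (;;) {
        std::size_t eol = body.find("\r\n", pos);
        if (eol == std::string::npos)
            return false;  // chunk-size line not fully received yet
        // strtoul stops at the first non-hex character, so a
        // ";extension" after the size is ignored.
        unsigned long size = std::strtoul(body.c_str() + pos, nullptr, 16);
        pos = eol + 2;
        if (size == 0) {
            // After the last chunk: either the blank line directly, or
            // trailer lines followed by a blank line.
            return body.compare(pos, 2, "\r\n") == 0 ||
                   body.find("\r\n\r\n", pos) != std::string::npos;
        }
        // Need `size` bytes of data plus the CRLF that follows them.
        std::size_t remaining = body.size() - pos;
        if (remaining < 2 || remaining - 2 < size)
            return false;
        pos += size + 2;
    }
}
```

In the receive loop you would append each recv result to the accumulated body and break as soon as this returns true, rather than waiting for recv to return 0.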