How can we parse HTTP response header fields using Qt/C++?

I am writing a piece of software that uses Qt/KDE libs. The objective is to parse the HTTP response header fields into different fields of a struct. At the moment, the HTTP response header is stored in a QString.
It looks something like this:
"HTTP/1.1 302 Found
date: Tue, 05 Jun 2012 07:40:16 GMT
server: Apache/2.2.22 (Linux/SUSE)
x-prefix: 49.244.80.0/21
x-as: 23752
x-mirrorbrain-mirror: mirror.averse.net
x-mirrorbrain-realm: region
link: <http://download.services.openoffice.org/files/du.list.meta4>; rel=describedby; type="application/metalink4+xml"
link: <http://download.services.openoffice.org/files/du.list.torrent>; rel=describedby; type="application/x-bittorrent"
link: <http://mirror.averse.net/openoffice/du.list>; rel=duplicate; pri=1; geo=sg
link: <http://ftp.isu.edu.tw/pub/OpenOffice/du.list>; rel=duplicate; pri=2; geo=tw
link: <http://ftp.twaren.net/OpenOffice/du.list>; rel=duplicate; pri=3; geo=tw
link: <http://mirror.yongbok.net/openoffice/du.list>; rel=duplicate; pri=4; geo=kr
link: <http://ftp.kaist.ac.kr/openoffice/du.list>; rel=duplicate; pri=5; geo=kr
digest: MD5=b+zfBEizuD8eXZUTWJ47xg==
digest: SHA=A5zw6PkywlhiPlFfjca+gqIGLHA=
digest: SHA-256=HOrd0MMBzS8Ctljpe4PauwStijsnBKaa3gXO4L30eiA=
location: http://mirror.averse.net/openoffice/du.list
content-length: 329
connection: close
content-type: text/html; charset=iso-8859-1"
In addition to the custom fields, there might be a few more fields in the header response.
The only approach I came up with was to search manually for fields such as "link" and "digest" and build a QMap with the field names as keys. However, I suspect there must be a better way to do this. I would be grateful for any help.

The HTTP header should initially be in a QByteArray (because it is in ASCII, not UTF-16), but the method would be the same with a QString:
split the header line by line,
split each line at the first colon character (values such as the location URL contain colons too),
trim any whitespace (regular spaces and '\r' characters) around the two resulting strings before storing them.
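QByteArray httpHeaders = ...;
// QMultiMap keeps repeated fields such as "link" and "digest"
// (QMap::insertMulti() is deprecated in Qt 5.15 and removed in Qt 6)
QMultiMap<QByteArray, QByteArray> headers;
// Discard the first line (the status line, e.g. "HTTP/1.1 302 Found")
httpHeaders = httpHeaders.mid(httpHeaders.indexOf('\n') + 1).trimmed();
for (const QByteArray &line : httpHeaders.split('\n')) {
    // Split at the first colon only; values such as URLs contain colons too
    int colon = line.indexOf(':');
    if (colon < 0)
        continue; // skip blank or malformed lines
    // Field names are case-insensitive, so store them in lower case
    QByteArray headerName = line.left(colon).trimmed().toLower();
    // trimmed() also removes the trailing '\r'
    QByteArray headerValue = line.mid(colon + 1).trimmed();
    headers.insert(headerName, headerValue);
}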
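For readers who want to try the same split-at-the-first-colon approach without Qt, here is a minimal sketch in standard C++17. The names `trim` and `parseHeaders` are hypothetical, std::multimap stands in for QMultiMap, and, as an addition to the answer, it stops at the blank line that ends the header block so that body text is never parsed as header fields:

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <map>
#include <string>
#include <string_view>
#include <utility>

// Trim spaces, tabs, CR and LF from both ends (roughly what
// QByteArray::trimmed() does in the Qt version).
std::string trim(std::string_view s) {
    const char *ws = " \t\r\n";
    const auto first = s.find_first_not_of(ws);
    if (first == std::string_view::npos) return {};
    const auto last = s.find_last_not_of(ws);
    return std::string(s.substr(first, last - first + 1));
}

// Hypothetical helper: parse "Name: value" lines into a multimap.
// Names are lower-cased because HTTP field names are case-insensitive;
// a multimap keeps repeated fields such as "link" and "digest".
std::multimap<std::string, std::string> parseHeaders(std::string_view raw) {
    std::multimap<std::string, std::string> headers;
    // Discard the status line ("HTTP/1.1 302 Found").
    const auto firstEol = raw.find('\n');
    if (firstEol == std::string_view::npos) return headers;
    raw.remove_prefix(firstEol + 1);

    while (!raw.empty()) {
        const auto eol = raw.find('\n');
        const std::string_view line = raw.substr(0, eol);
        raw.remove_prefix(eol == std::string_view::npos ? raw.size() : eol + 1);

        // A blank line ends the header block; anything after it is the body.
        if (line.find_first_not_of(" \t\r") == std::string_view::npos) break;

        const auto colon = line.find(':');  // first colon only: values may contain ':'
        if (colon == std::string_view::npos) continue;  // skip malformed lines

        std::string name = trim(line.substr(0, colon));
        std::transform(name.begin(), name.end(), name.begin(),
                       [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
        headers.emplace(std::move(name), trim(line.substr(colon + 1)));
    }
    return headers;
}
```

With the sample from the question, `headers.equal_range("link")` would yield all seven link values in the order they appeared, since std::multimap inserts equal keys after the existing ones.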

Related

Why might mutt email be accepted/rejected by windows recipient as a function of alphabetic string content in the body of html file being sent?

Calling mutt-1.5.24 on linux.
I'm seeing some very odd behavior when emailing an HTML file from Linux to Windows/Outlook using mutt. Example of the mutt call:
mutt -e 'set content_type=text/html' -s 'yuk, yuk, yuk' 'moe.howard#stooge.com' < a.html
The email does not show up on the Windows side, and mutt returned no error or warning on the Linux side. Now, here's the odd part: if I globally replace the string "pcie" in the body of the HTML with "pcix", the email appears on the Windows/Outlook side just fine. Or, if I globally replace "ity" with "...", it also works fine (even if I leave "pcie" alone). But changing "ity" to "xxx" fails. The behavior is oddly sensitive to specific character strings.
In my home dir on the Linux side I see a file ~/sent being created. The header (whether or not the email made it to the Windows/Outlook side) looks like this:
From m.howard#theserver.stooge.com Thu Jan 28 18:49:29 2021
Date: Thu, 28 Jan 2021 18:49:29 -0500
From: Moe Howard <mhoward#theserver.stooge.com>
To: moe.howard#stooge.com
Subject: yuk, yuk, yuk
Message-ID: <20210128234929.GA48266#atletx7-reg062.amd.com>
MIME-Version: 1.0
Content-Type: text/html; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.5.24 (2015-08-30)
Status: RO
Content-Length: 20537
Lines: 122
<html>
...etc. (the rest of the HTML, which Firefox renders just fine if I remove the header above)
Grasping at straws: looking at "charset=us-ascii" in the "sent" file, I wondered whether it should be something else, so I tried other options by adding "-e 'set assumed_charset=utf-8:us-ascii'" to the command. No luck.
Any ideas what might be happening and what a solution might be?
Figured it out. All my email actually arrived in Outlook; it just got sent to the Junk folder, labeled as spam. So if the body of the HTML contains "pcie", it's spam, but "pcix" is not. Got to go undo that now.

Remove HTTP headers from Prometheus in Zabbix

I have a server with the Nginx VTS module installed, which outputs metrics in Prometheus format.
When I run an active web.page.get check via Zabbix, I get the HTTP header followed by the data, in the format below:
HTTP/1.1 200 OK
Server: nginx
Date: Thu, 24 Sep 2020 09:16:20 GMT
Content-Type: text/plain
Content-Length: 33769
Connection: close
Vary: Accept-Encoding
# HELP nginx_vts_info Nginx info
# TYPE nginx_vts_info gauge
nginx_vts_info{hostname="example",version="1.18.0"} 1
# HELP nginx_vts_start_time_seconds Nginx start time
# TYPE nginx_vts_start_time_seconds gauge
nginx_vts_start_time_seconds 1600367492.145
# snip output...
I wrote a regular expression that removes the header, but it only outputs the first line:
# \n\s?\n(.*)
# HELP nginx_vts_info Nginx info
How do I rewrite the expression so that the header is removed and the rest of the data is available?
Please try the regex below:
\n\s?\n([\s\S]*)
In regular expressions, . does not match newlines unless a specific flag (often called dotall or single-line mode) is set, which is why only the first line was returned in your example. Rewriting the capture as [\s\S]* makes it include newlines as well.
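The point about . versus [\s\S] can be checked in C++ with std::regex, whose default ECMAScript grammar also stops . at a newline. This is a minimal sketch; the function name `bodyAfterHeaders` is hypothetical and the pattern is the one from the answer:

```cpp
#include <cassert>
#include <regex>
#include <string>

// Return everything after the first blank line (the end of the HTTP header).
// In ECMAScript regex (std::regex's default) '.' does not match '\n',
// but the class [\s\S] matches any character, including newlines.
// \s? lets the blank line be "\r\n" as well as a bare "\n".
std::string bodyAfterHeaders(const std::string &response) {
    static const std::regex re(R"(\n\s?\n([\s\S]*))");
    std::smatch m;
    if (std::regex_search(response, m, re)) return m[1].str();
    return {};  // no blank line found
}
```

For a large payload such as a 33 KB metrics page, be aware that some std::regex implementations (libstdc++ in particular) recurse per repetition and can run out of stack on a very long `[\s\S]*` match; matching only `\n\s?\n` and taking `m.suffix()` avoids that.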

Filter data received from HTTP request

In my C++ program I create a socket and download a file from my web server (website). I receive the HTTP response with recv and store it in a char buffer; for this question, let's say the buffer is called httpbufff. If I print httpbufff with cout, it looks like this:
HTTP/1.1 200 OK
Date: Tue, 12 Jul 2016 08:52:15 GMT
Server: Apache/2.4.16
X-Powered-By: PHP/5.4.45
Vary: Accept-Encoding,User-Agent
Connection: close
Content-Type: text/html
Website content test
My question is: is there a way to extract "Website content test" from my char buffer httpbufff?
I was thinking about using strtok, but that doesn't seem like a good solution to me.
Thanks!
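Since recv() does not NUL-terminate what it receives, one way to do this without strtok (whose second argument is a set of delimiter characters, not a multi-character separator) is to search the received bytes for the blank line "\r\n\r\n" that HTTP places between the header and the body; it may simply not be visible in the pasted output. A minimal sketch, assuming you kept the byte count returned by recv(); `extractBody` is a hypothetical name:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Return the bytes that follow the first "\r\n\r\n" in a buffer filled by
// recv(). 'len' is the number of bytes received (recv() does not
// NUL-terminate). Returns an empty string if no separator is found,
// e.g. because the header has not been fully received yet.
std::string extractBody(const char *buf, std::size_t len) {
    static const char sep[] = "\r\n\r\n";
    const char *end = buf + len;
    const char *hit = std::search(buf, end, sep, sep + 4);  // 4 = strlen(sep)
    if (hit == end) return {};
    return std::string(hit + 4, end);
}
```

Call it as `extractBody(httpbufff, received)`, where `received` (hypothetical) is the total number of bytes read. Because recv() can deliver the response in pieces, keep appending to the buffer until the server closes the connection (the sample says `Connection: close`) before extracting.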

JMeter Regular Expression Extractor

My JMeter response returns a 'Location' field in the response header. I want to extract this Location value and use it in my other requests.
Sample Start: 2015-07-24 14:46:38 CEST
Load time: 163
Latency: 163
Size in bytes: 372
Headers size in bytes: 350
Body size in bytes: 22
Sample Count: 1
Error Count: 1
Response code: 201
Response message: Processed
Response headers:
HTTP/1.1 201 Processed
X-Backside-Transport: OK OK,FAIL FAIL
Connection: Keep-Alive
Transfer-Encoding: chunked
Location: /retail/iows/ie/en/storage/servicedocs/paxplanner/2015-07-24/eCommerce.pdf
X-Client-IP: 127.0.0.1,10.62.26.150
Content-Type: application/octet-stream
Date: Fri, 24 Jul 2015 12:46:38 GMT
X-Archived-Client-IP: 127.0.0.1
Steps I followed:
I used a Regular Expression Extractor.
I selected the Response Headers option, with the whole Location header.
Please help me sort it out.
If you want to retrieve the Location field's value from the response, you might want to try the following pattern: Location:\s*([^\r\n]+). The first matching group will contain the value of the Location field. (Inside a character class, ? is a literal character, so [^\r?\n] would also stop at a ? in the URL.)
The expression above is based on the following rules:
HTTP header fields are colon (":") separated <key, value> pairs.
HTTP header fields are terminated by the end-of-line sequence CR LF (\r\n).
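Outside JMeter, the pattern can be sanity-checked with a few lines of C++ std::regex. This is only a sketch for testing the expression: JMeter uses its own regex engine, so small syntax differences are possible. `locationValue` is a hypothetical name, and the `(?:^|\n)` prefix is an addition that keeps a `Content-Location` field from matching:

```cpp
#include <cassert>
#include <regex>
#include <string>

// Extract the value of the Location header from raw response headers.
// (?:^|\n) requires "Location:" at the start of a line, [ \t]* skips the
// space after the colon, and [^\r\n]+ captures up to the end of the line.
// icase: HTTP field names are case-insensitive.
std::string locationValue(const std::string &headers) {
    static const std::regex re(R"((?:^|\n)Location:[ \t]*([^\r\n]+))",
                               std::regex::icase);
    std::smatch m;
    return std::regex_search(headers, m, re) ? m[1].str() : std::string{};
}
```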
Please try this:
Location:([\s\S]*)X-Client
If it doesn't work, you could try escaping the - in X-Client with a backslash, although a hyphen outside a character class normally needs no escaping.

Separating HTTP Response Body from Header in C++

I'm currently writing my own C++ HTTP class for a project, and I'm trying to find a way to separate the response body from the header, because the body is the only part I need to return.
Here's a sample of the raw HTTP headers, in case you're not familiar with them:
HTTP/1.1 200 OK
Server: nginx/0.7.65
Date: Wed, 29 Dec 2010 06:13:07 GMT
Content-Type: text
Connection: keep-alive
Vary: Cookie
Content-Length: 82
Below that is the HTML/response body. What would be the best way to do this? By the way, I'm only using the Winsock library for the requests (I don't think this matters).
Thanks in advance.
HTTP headers are terminated by the sequence \r\n\r\n (a blank line). Just search for that and return everything after it. (The body may of course be absent, e.g. in a response to a HEAD request.)
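A minimal sketch of that search in standard C++17 (the names `SplitResponse`, `splitResponse` and `contentLength` are hypothetical). Because the sample response uses `Connection: keep-alive`, the server will not close the socket after the body, so the sketch also reads `Content-Length` to tell when the whole body has arrived:

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>

struct SplitResponse {
    std::string header;  // status line + header fields, without the blank line
    std::string body;    // everything after "\r\n\r\n"
};

// Split a raw response at the first "\r\n\r\n". std::nullopt means the
// header is not complete yet (keep calling recv() and appending).
std::optional<SplitResponse> splitResponse(const std::string &raw) {
    const auto pos = raw.find("\r\n\r\n");
    if (pos == std::string::npos) return std::nullopt;
    return SplitResponse{raw.substr(0, pos), raw.substr(pos + 4)};
}

// Read Content-Length (field name matched case-insensitively).
// Returns -1 if the field is absent; std::stol throws on a malformed value.
long contentLength(const std::string &header) {
    std::string lower(header);
    for (char &c : lower) c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    const auto pos = lower.find("\r\ncontent-length:");
    if (pos == std::string::npos) return -1;
    return std::stol(header.substr(pos + 17));  // 17 = length of "\r\ncontent-length:"
}
```

A typical loop would keep appending recv() data to the raw string until splitResponse() succeeds and body.size() reaches contentLength(header). A missing Content-Length usually means a chunked or connection-close-delimited body.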
Do you need to roll your own? There are C/C++ libraries out there for doing HTTP, e.g. libcurl. If you need to support the full gamut of HTTP, then it's not always a simple delineation. You might also have to cater, for example, for chunked encoding.
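To illustrate what chunked encoding involves: with `Transfer-Encoding: chunked`, the body after the blank line is a series of chunks, each a hexadecimal size line followed by that many bytes and a CRLF, ending with a zero-size chunk. A minimal decoder sketch, assuming the whole body is already in memory and ignoring any trailer fields; `decodeChunked` is a hypothetical name, and a library such as libcurl handles this for you:

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>

// Decode a "Transfer-Encoding: chunked" body. Each chunk is
// "<hex size>[;extensions]\r\n<data>\r\n"; a chunk of size 0 ends the body
// (trailer fields after it are ignored). Returns std::nullopt if the input
// is truncated or malformed.
std::optional<std::string> decodeChunked(const std::string &in) {
    std::string out;
    std::size_t pos = 0;
    while (true) {
        const auto eol = in.find("\r\n", pos);
        if (eol == std::string::npos) return std::nullopt;
        std::size_t size = 0;
        try {
            // stoul stops at ';', so chunk extensions are skipped.
            size = std::stoul(in.substr(pos, eol - pos), nullptr, 16);
        } catch (...) {
            return std::nullopt;  // empty or non-hex size line
        }
        pos = eol + 2;
        if (size == 0) return out;  // last chunk reached
        const std::size_t avail = in.size() - pos;
        if (size > avail || avail - size < 2) return std::nullopt;  // truncated
        out.append(in, pos, size);
        if (in.compare(pos + size, 2, "\r\n") != 0) return std::nullopt;
        pos += size + 2;  // skip the chunk data and its CRLF
    }
}
```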
DO
    IF Socket.IsServerReady(Sock) THEN
        Text = text + Socket.Read(Sock, 65000) 'print text '' 32000 bytes... whatever they give us
        Bytes = bytes + Socket.Transferred
        StatusBar.Panel(0).Caption = "Bytes Read: " + STR$(Bytes)
    END IF
    'RichEdit.addstrings text
    zzz=Bytes
LOOP UNTIL Socket.Transferred = 0
RichEdit.Clear
RichEdit.Text = text
Socket.Close(Sock)
dim mem as qmemorystream
dim S$ as string
S$ = text
for n=0 to 400
    buff$=mid$(S$,n,5)
    if buff$="alive" then ' found end of headers
        richedit1.addstrings (buff$)
        richedit1.addstrings (mid$(S$,n,9))
        richedit1.addstrings str$(n+9)
        zzz=n+8 'offset + 8 bit space after headers and before Bof
    end if
next n
Mem.WriteStr(S$, LEN(S$)) 'write entire file to memory
Mem.Position = zzz ' use offset as Start position
S$ = Mem.ReadStr(LEN(S$)) ' read rest of file into string till Eof
Mem.Close ' dont forget to close
'PRINT S$ '' print it
Filex.Open("c:/CAP.AVI", fmCreate) 'create file on system
filex.WriteBinStr(S$,len(S$)-zzz) ' write to it
filex.close 'dont forget to close