How to parse/check an HTTP message in PcapPlusPlus? - c++

In PcapPlusPlus, I want to detect whether a payload is an HTTP request or not. For this, I am trying to parse the string and expect the library to let me check whether the parsing succeeded.
Unfortunately, I was unable to achieve this:
I can create a RawPacket with the message
I can create a Packet with the message, but it does not contain any HttpRequestLayer; as a consequence, the parsing is useless for detecting the validity of the message.
I cannot create an HttpRequestLayer directly from the message.
Some examples:
std::string msg = "GET /index.html HTTP/1.1\nHost: example.com\n\n";
// Try to get a RawPacket: works, but does not help much
struct timeval tp; // requires <sys/time.h>
gettimeofday(&tp, nullptr);
RawPacket rp(reinterpret_cast<const uint8_t*>(msg.data()), static_cast<int>(msg.size()), tp, false);
// Trying to parse it: works, but detects a generic network layer only, no HTTP
Packet p(&rp, false, HTTP);
// Trying to create an HttpRequestLayer directly: crash
HttpRequestLayer http(reinterpret_cast<uint8_t*>(&msg[0]), msg.size(), nullptr, nullptr);
My question is:
How to detect if a message is a valid HTTP message with PCap++?
Note: I am looking for an efficient solution (very sub-optimal approaches, like generating TCP layers, are not an option).

PcapPlusPlus can parse packets, not messages. A RawPacket object expects a stream of bytes that represents a network packet, typically with a data link layer (e.g. Ethernet), network layer (e.g. IP), transport layer (e.g. TCP) and application layer (HTTP in this case). PcapPlusPlus will parse this byte stream into a list of layers/protocols you can look into.
HTTP is an application protocol, hence any HTTP packet will contain the other layers mentioned above. So providing just the HTTP message is not enough and PcapPlusPlus won't be able to parse it as a packet.
You can learn more about PcapPlusPlus from the tutorials: https://pcapplusplus.github.io/docs/tutorials
Specifically you can look into the packet parsing tutorial:
https://pcapplusplus.github.io/docs/tutorials/packet-parsing

pcpp::Packet has a method for getting the layer you need - getLayerOfType. You can detect an HTTP message using it.
Example:
timeval tm;
gettimeofday(&tm, NULL);
pcpp::RawPacket rawPacket((uint8_t*)rawPacketFromNet.data(), rawPacketFromNet.size(), tm, false, pcpp::LinkLayerType::LINKTYPE_RAW);
pcpp::Packet parsedPacket(&rawPacket);
pcpp::HttpRequestLayer* httpLayer = parsedPacket.getLayerOfType<pcpp::HttpRequestLayer>();
if (httpLayer)
{
    // you have this layer in your packet
    uint8_t* dataPtr = httpLayer->getData();
    size_t size = httpLayer->getDataLen();
}
I think your example could have worked if you started with a pcpp::Packet and then added an HTTP layer to it. For constructing the HTTP layer in your case, try the constructor HttpRequestLayer(HttpMethod method, std::string uri, HttpVersion version);
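For example, a minimal sketch along those lines (assuming the usual PcapPlusPlus headers are available; the Host field value and the packet size are just illustrations):
#include "Packet.h"
#include "HttpLayer.h"

// Build the request from method/URI/version instead of raw bytes
pcpp::Packet packet(100);
pcpp::HttpRequestLayer httpRequest(pcpp::HttpRequestLayer::HttpGET, "/index.html", pcpp::OneDotOne);
httpRequest.addField(PCPP_HTTP_HOST_FIELD, "example.com");
packet.addLayer(&httpRequest);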

Related

How to send custom packets in OMNeT++?

Let's say I created my own packet called myPacket. Is there a way I can send it using socket.sendTo()?
I know socket.sendTo() takes an INET packet, so is there a way to convert myPacket into an INET packet?
The module that is going to receive the packet is Radio. I checked Radio's functions and they take an INET packet, so what can I do about it?
Signal *Radio::createSignal(Packet *packet) const
{
    encapsulate(packet);
    if (sendRawBytes) {
        auto rawPacket = new Packet(packet->getName(), packet->peekAllAsBytes());
        rawPacket->copyTags(*packet);
        delete packet;
        packet = rawPacket;
    }
    Signal *signal = check_and_cast<Signal *>(medium->transmitPacket(this, packet));
    ASSERT(signal->getDuration() != 0);
    return signal;
}
Sending messages using sockets requires a socket on the other side. If you have a socket on the other side, go ahead and send your message using a socket.
Basically, messages are sent using the basic cSimpleModule member function send(). This method is used to send messages to other modules through gates. One can also use scheduleAt() to deliver a message to the module itself at a specific point in time.
If you use a higher-level application, such as an HTTP or TCP application, you are most probably going to use sockets. Sockets also use send() and scheduleAt() internally to pass messages through gates.
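For instance, a minimal sketch of send() and scheduleAt() in a plain cSimpleModule (the module name MyModule and the "out" gate are illustrative assumptions):
#include <omnetpp.h>
using namespace omnetpp;

class MyModule : public cSimpleModule
{
  protected:
    virtual void initialize() override
    {
        // schedule a self-message one simulated second from now
        scheduleAt(simTime() + 1.0, new cMessage("sendTimer"));
    }
    virtual void handleMessage(cMessage *msg) override
    {
        if (msg->isSelfMessage()) {
            send(new cMessage("hello"), "out"); // send through the "out" gate
            scheduleAt(simTime() + 1.0, msg);   // re-arm the timer
        }
        else {
            delete msg; // a message arriving from another module
        }
    }
};
Define_Module(MyModule);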
You need to do 4 steps:
Define your own .msg class and extend some of the inet predefined classes. See inet/applications/base/ApplicationPacket.msg as an example.
Define your communication protocol, i.e. an INET socket object, to pass the messages. Look at this guide. Don't forget to pass the destination address and port; normally they are defined as NED parameters and injected through the omnetpp.ini file.
Then you need to write a method which builds your packet and sends it to the destination address. Take a look at the method UdpBasicApp::sendPacket() in inet/applications/udpapp/UdpBasicApp.cc as an example (a sketch of such a method follows the receiver snippet below).
On the receiver side I usually have a set of processing methods dispatched from handleMessage() (or a similar method) to catch all possible messages my receiver can receive and process. All such methods take cMessage* msg as an argument and start like this:
Packet* packet = dynamic_cast<Packet*>(msg); // check_and_cast<> would throw instead of returning nullptr
if (!packet) {
    return;
}
const auto& payload = packet->peekAtFront<YourOwnPacketClass>();
// work with your message body...
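For the sending side (step 3 above), a hypothetical method modeled on UdpBasicApp::sendPacket() could look like the sketch below; MyPacket (generated from your .msg file and assumed to extend FieldsChunk), destAddr, destPort and the socket member are assumptions for illustration, not INET API:
void MyApp::sendPacket()
{
    auto packet = new inet::Packet("MyPacketData");
    const auto& payload = inet::makeShared<MyPacket>(); // class generated from your .msg file
    payload->setChunkLength(inet::B(100));              // size of the chunk in bytes
    packet->insertAtBack(payload);
    // destAddr and destPort are normally NED parameters injected via omnetpp.ini
    socket.sendTo(packet, destAddr, destPort);
}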

How can I convert serialized data in boost::beast to a string so that I could process it in a FIFO manner?

I have a client application where I need to receive HTTP "long running requests" from a server. I send a command and, after getting the header of the response, I just have to receive JSON data separated by \r\n until the connection is terminated.
I managed to adapt the Boost.Beast client example to send the message, receive and parse the header, and receive responses from the server. However, I failed to find a way to serialize the data so that I could process the JSON messages.
The closest demonstration of the problem can be found in this relay example. In that example (p is a parser, sr is a serializer, input is a socket input stream and output is a socket output stream), after reading the HTTP header, there is a loop that reads continuously from the server:
do
{
    if(! p.is_done())
    {
        // Set up the body for writing into our small buffer
        p.get().body().data = buf;
        p.get().body().size = sizeof(buf);
        // Read as much as we can
        read(input, buffer, p, ec);
        // This error is returned when buffer_body uses up the buffer
        if(ec == error::need_buffer)
            ec = {};
        if(ec)
            return;
        // Set up the body for reading.
        // This is how much was parsed:
        p.get().body().size = sizeof(buf) - p.get().body().size;
        p.get().body().data = buf;
        p.get().body().more = ! p.is_done();
    }
    else
    {
        p.get().body().data = nullptr;
        p.get().body().size = 0;
    }
    // Write everything in the buffer (which might be empty)
    write(output, sr, ec);
    // This error is returned when buffer_body uses up the buffer
    if(ec == error::need_buffer)
        ec = {};
    if(ec)
        return;
}
while(! p.is_done() && ! sr.is_done());
A few things I don't understand here:
We're done reading the header. Why do we need Boost.Beast and not Boost.Asio to read a raw TCP message? When I tried to do that (with both async_read/async_read_some) I got infinite reads of zero size.
The documentation of parser says (at the end of the page) that a new instance is needed for every message, but I don't see that in the example.
Since raw TCP reading is not working, is there a way to convert the parser/serializer data to some kind of string? Or even write it to a text file in a FIFO manner, so that I could process it with some JSON library? I don't want to use another socket like the example does.
The function boost::beast::buffers() failed to compile for the parser and the serializer; the parser has no consume function, and the serializer's consume seems to be for particular HTTP parts of the message and fires an assert if I call it for body().
Besides that, I also failed to get consistent chunks of data from the parser and the buffer with old-school std::copy. I don't seem to understand how to combine the data to get the stream of data. Consuming the buffer with .consume() at any point while receiving data leads to a need_buffer error.
I would really appreciate someone explaining the logic of how all this should work together.
Where is buf? You could read directly into the std::string instead. Call string.resize(N), and set the pointer and size in the buffer_body::value_type to string.data() and string.size().
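A minimal sketch of that idea, assuming p is an http::response_parser<http::buffer_body>, input is the connected socket, and buffer is the flat_buffer already used for reading the header:
std::string chunk;
while (!p.is_done())
{
    chunk.resize(1024); // room for the next piece of the body
    p.get().body().data = &chunk[0];
    p.get().body().size = chunk.size();
    boost::beast::error_code ec;
    boost::beast::http::read(input, buffer, p, ec);
    if (ec == boost::beast::http::error::need_buffer)
        ec = {}; // expected whenever the buffer is filled up
    if (ec)
        break;
    // body().size now holds the unused space, so shrink to what was actually parsed
    chunk.resize(chunk.size() - p.get().body().size);
    // process `chunk` here, e.g. append it to a FIFO and split on "\r\n"
}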

How can a web server know when an HTTP request is fully received?

I'm currently writing a very simple web server to learn more about low level socket programming. More specifically, I'm using C++ as my main language and I am trying to encapsulate the low level C system calls inside C++ classes with a more high level API.
I have written a Socket class that manages a socket file descriptor and handles opening and closing using RAII. This class also exposes the standard socket operations for a connection oriented socket (TCP) such as bind, listen, accept, connect etc.
After reading the man pages for the send and recv system calls I realized that I needed to call these functions inside some form of loop in order to guarantee that all bytes are successfully sent/received.
My API for sending and receiving looks similar to this
void SendBytes(const std::vector<std::uint8_t>& bytes) const;
void SendStr(const std::string& str) const;
std::vector<std::uint8_t> ReceiveBytes() const;
std::string ReceiveStr() const;
For the send functionality I decided to use a blocking send call inside a loop such as this (it is an internal helper function that works for both std::string and std::vector).
template<typename T>
void Send(const int fd, const T& bytes)
{
    using ValueType = typename T::value_type;
    using SizeType = typename T::size_type;
    const ValueType *const data{bytes.data()};
    SizeType bytesToSend{bytes.size()};
    SizeType bytesSent{0};
    while (bytesToSend > 0)
    {
        const ValueType *const buf{data + bytesSent};
        const ssize_t retVal{send(fd, buf, bytesToSend, 0)};
        if (retVal < 0)
        {
            throw ch::NetworkError{"Failed to send."};
        }
        const SizeType sent{static_cast<SizeType>(retVal)};
        bytesSent += sent;
        bytesToSend -= sent;
    }
}
This seems to work fine and guarantees that all bytes are sent once the member function returns without throwing an exception.
However, I started running into problems when I began implementing the receive functionality. For my first attempt I used a blocking recv call inside a loop and exited the loop if recv returned 0 indicating that the underlying TCP connection was closed.
template<typename T>
T Receive(const int fd)
{
    using SizeType = typename T::size_type;
    using ValueType = typename T::value_type;
    T result;
    const SizeType bufSize{1024};
    ValueType buf[bufSize];
    while (true)
    {
        const ssize_t retVal{recv(fd, buf, bufSize, 0)};
        if (retVal < 0)
        {
            throw ch::NetworkError{"Failed to receive."};
        }
        if (retVal == 0)
        {
            break; /* Connection is closed. */
        }
        const SizeType offset{static_cast<SizeType>(retVal)};
        result.insert(std::end(result), buf, buf + offset);
    }
    return result;
}
This works fine as long as the connection is closed by the sender after all bytes have been sent. However, this is not the case when using e.g. Chrome to request a webpage. The connection is kept open and my receive member function blocks on the recv system call after receiving all bytes in the request. I managed to get around this problem by setting a timeout on the recv call using setsockopt. Basically, I return all bytes received so far once the timeout expires. This feels like a very inelegant solution and I do not think that this is how web servers handle this issue in reality.
So, on to my question.
How does a web server know when an HTTP request has been fully received?
A GET request in HTTP 1.1 does not seem to include a Content-Length header. See e.g. this link.
HTTP/1.1 is a text-based protocol, with binary POST data added in a somewhat hacky way. When writing a "receive loop" for HTTP, you cannot completely separate the data receiving part from the HTTP parsing part. This is because in HTTP, certain characters have special meaning. In particular, the CRLF (0x0D 0x0A) token is used to separate headers, but also to end the request using two CRLF tokens one after the other.
So to stop receiving, you need to keep receiving data until one of the following happens:
Timeout – follow up by sending a timeout response
Two CRLFs in a row in the request – follow up by parsing the request, then respond as needed (parsed correctly? does the request make sense? send data?)
Too much data – certain HTTP exploits aim to exhaust server resources like memory or processes (see e.g. Slowloris)
And perhaps other edge cases. Also note that this only applies to requests without a body. For POST requests, you first wait for two CRLF tokens, then read Content-Length bytes in addition. And this is even more complicated when the client is using multipart encoding.
A request header is terminated by an empty line (two CRLFs with nothing between them).
So, when the server has received a request header, and then receives an empty line, and if the request was a GET (which has no payload), it knows the request is complete and can move on to dealing with forming a response. In other cases, it can move on to reading Content-Length worth of payload and act accordingly.
This is a reliable, well-defined property of the syntax.
No Content-Length is required or useful for a GET: the content is always zero-length. A hypothetical Header-Length is more like what you're asking about, but you'd have to parse the header first in order to find it, so it does not exist and we use this property of the syntax instead. As a result of this, though, you may consider adding an artificial timeout and maximum buffer size, on top of your normal parsing, to protect yourself from the occasional maliciously slow or long request.
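A minimal sketch of such a receive loop, reusing the recv/exception style from the question (the 64 KiB header limit is an arbitrary illustration):
std::string request;
char buf[1024];
// Keep reading until the empty line terminating the header ("\r\n\r\n") has arrived
while (request.find("\r\n\r\n") == std::string::npos)
{
    const ssize_t retVal{recv(fd, buf, sizeof(buf), 0)};
    if (retVal < 0)
    {
        throw ch::NetworkError{"Failed to receive."};
    }
    if (retVal == 0)
    {
        break; /* Connection closed before the header was complete. */
    }
    request.append(buf, static_cast<std::size_t>(retVal));
    if (request.size() > 64 * 1024)
    {
        throw ch::NetworkError{"Request header too large."};
    }
}
// For a GET the request is now complete; for a POST, parse Content-Length from
// the header and keep reading until that many additional body bytes have arrived.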
The solution is within your link
A GET request in HTTP 1.1 does not seem to include a Content-Length header. See e.g. this link.
There it says:
It must use CRLF line endings, and it must end in \r\n\r\n
The answer is formally defined in the HTTP protocol specifications [1]:
in W3C's spec for HTTP 0.9.
in RFC 1945 for HTTP 1.0, specifically in Section 4: HTTP Message, Section 5: Request, and Section 7: Entity.
in RFC 2616 for HTTP 1.1, specifically in Section 4: HTTP Message, in particular 4.3: Message Body and 4.4: Message Length.
in RFC 7230 (and 7231...7235) for HTTP 1.1, specifically in Section 3: Message Format, in particular 3.3: Message Body.
So, to summarize, the server first reads the message's initial start-line to determine the request type. If the HTTP version is 0.9, the request is done, as the only supported request is GET without any headers. Otherwise, the server then reads the message's message-headers until a terminating CRLF is reached. Then, only if the request type has a defined message body does the server read the body according to the transfer format outlined by the request headers (requests and responses are not restricted to using a Content-Length header in HTTP 1.1).
In the case of a GET request, there is no message body defined, so the message ends after the start-line in HTTP 0.9, and after the terminating CRLF of the message-headers in HTTP 1.0 and 1.1.
[1] I'm not going to get into HTTP 2.0, which is a whole different ballgame.

Serialize and deserialize the message using google protobuf in socket programming in C++

The message format to send to the server side is as below:
package test;
message Test {
    required int32 id = 1;
    required string name = 2;
}
Server.cpp does the encoding:
string buffer;
test::Test original;
original.set_id(0);
original.set_name("original");
original.AppendToString(&buffer);
send(acceptfd,buffer.c_str(), buffer.size(),0);
With this send function, it will send the data to the client, I hope, and I am not getting any errors for this particular code.
But my concern is the following:
How do I decode the above message using Google Protocol Buffers on the client side, so that I can see/print the message?
You should send more than just the protobuf message to be able to decode it on the client side.
A simple solution would be to send the value of buffer.size() over the socket as a 4-byte integer using network byte order, and then send the buffer itself.
The client should first read the buffer's size from the socket and convert it from network to host byte order. Let's denote the resulting value s. The client must then preallocate a buffer of size s and read s bytes from the socket into it. After that, just use MessageLite::ParseFromString to reconstruct your protobuf.
See here for more info on protobuf message methods.
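A minimal sketch of that framing, assuming sockfd is an already connected socket and test::Test is the class generated from the .proto above:
#include <arpa/inet.h>  // htonl / ntohl
#include <sys/socket.h>
#include <string>

// Sender: 4-byte length prefix in network byte order, then the serialized message
void sendMessage(int sockfd, const test::Test& msg)
{
    std::string buffer;
    msg.SerializeToString(&buffer);
    const uint32_t len = htonl(static_cast<uint32_t>(buffer.size()));
    send(sockfd, &len, sizeof(len), 0);
    send(sockfd, buffer.data(), buffer.size(), 0);
}

// Receiver: read the length, then exactly that many bytes, then parse
bool receiveMessage(int sockfd, test::Test& msg)
{
    uint32_t netLen = 0;
    if (recv(sockfd, &netLen, sizeof(netLen), MSG_WAITALL) != static_cast<ssize_t>(sizeof(netLen)))
        return false;
    std::string buffer(ntohl(netLen), '\0');
    if (recv(sockfd, &buffer[0], buffer.size(), MSG_WAITALL) != static_cast<ssize_t>(buffer.size()))
        return false;
    return msg.ParseFromString(buffer); // msg.id() and msg.name() can now be printed
}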
Also, this document discourages the usage of required:
You should be very careful about marking fields as required. If at
some point you wish to stop writing or sending a required field, it
will be problematic to change the field to an optional field – old
readers will consider messages without this field to be incomplete and
may reject or drop them unintentionally. You should consider writing
application-specific custom validation routines for your buffers
instead. Some engineers at Google have come to the conclusion that
using required does more harm than good; they prefer to use only
optional and repeated. However, this view is not universal.

gSOAP HTTP event callback

I downloaded gSOAP and generated source code from a WSDL, and I could connect to the server and send the request.
But I can't figure out how to catch HTTP events like bytes sent and bytes received.
I read this document http://www.cs.fsu.edu/~engelen/soapdoc2.html.
But I can't find what I need. I found the function fsend. As I understand it, this function is executed when we're sending a request to the server. Do I have to do something like this?
service.fsend = Custom;
Where Custom is my callback?
I've found that another callback is ffiltersend.
As I understand it, this function is run while the request is being sent.
I use it.
But I don't understand the last parameter of this method, which is a pointer to size_t.
When I read the value from this pointer and divide it by 2, I get the real count of my bytes. Why?
To obtain statistics on the number of bytes sent and received, and to log the inbound and outbound messages to the file system, use the "logging plugin" that comes with the gSOAP software.
First, register the plugin with:
#include "plugin/logging.h" // this file is in the gSOAP distro path
...
soap_register_plugin(soap, logging);
Then use these functions to set the logging destinations for inbound and outbound messages:
soap_set_logging_inbound(struct soap*, FILE*);
soap_set_logging_outbound(struct soap*, FILE*);
where the FILE* passed as the second argument points to an open file that you can open and close before and after logging. Use NULL as the second argument to disable logging.
To obtain stats, i.e. message size byte counts, use:
soap_get_logging_stats(struct soap*, size_t *sent, size_t *recv);
where the second and third arguments will be updated by this call.
That's all there is to it.
If you want to use your own message handling callbacks then perhaps a good place to start is to learn from the plugin/logging.c file on how that is done. This file is short.
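Putting the above together, a minimal usage sketch (assuming soap is an initialized struct soap* and plugin/logging.c is compiled into the project; the log file names are arbitrary):
#include <cstdio>
#include "plugin/logging.h"

soap_register_plugin(soap, logging);
soap_set_logging_inbound(soap, fopen("inbound.log", "w"));
soap_set_logging_outbound(soap, fopen("outbound.log", "w"));

// ... perform the SOAP call(s) here ...

size_t bytesSent = 0, bytesRecv = 0;
soap_get_logging_stats(soap, &bytesSent, &bytesRecv);
printf("sent: %zu bytes, received: %zu bytes\n", bytesSent, bytesRecv);

// pass NULL to disable logging again
soap_set_logging_inbound(soap, NULL);
soap_set_logging_outbound(soap, NULL);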