c++ accessed url log - c++

I'm currently developing a standalone C++ program that lists every URL accessed in a browser along with its response time.
At this point I can already sniff all incoming and outgoing packets. I'm using WinPcap for this.
The captured packets are filtered down to 'tcp port 80 (http) or 443 (https)'.
Now I want to read some HTTP headers. The problem is that the data is usually fragmented across several packets.
I want to know how to reassemble it and how to get at the HTTP details.
Note: I want to implement what Wireshark does. For every packet/frame it shows a
'REASSEMBLED TCP SEGMENT'
Any ideas or tutorials on how I can achieve this?
Thanks a lot!

You'll have to do the same thing TCP does to reassemble packets, which means parsing the TCP header of each packet and copying the payloads into another buffer in sequence-number order. The hardest part of the program logic is probably dealing with missing data; you'll then have to check whether the missing segment was retransmitted.
There are a number of RFCs which cover this: 675, 793, 1122 and others. If looking through those seems overwhelming, maybe back off and look at the Roadmap RFC, RFC 4614.
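To make the "sequencing them into another buffer" step concrete, here is a minimal sketch for one direction of one connection. The class name StreamReassembler and its interface are made up for illustration. It assumes you have already extracted the sequence number and payload from each captured segment, that first_seq is the sequence number of the first data byte (the SYN's sequence number plus one), and that the stream stays under 4 GiB. It does not handle lost data that never gets retransmitted.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical helper: reassembles one direction of one TCP connection.
class StreamReassembler {
public:
    // first_seq: sequence number of the first data byte (assumed known).
    explicit StreamReassembler(std::uint32_t first_seq) : first_seq_(first_seq) {}

    // Feed one segment. Returns the bytes that have become contiguous with
    // everything delivered so far (possibly empty).
    std::string add(std::uint32_t seq, const std::string& payload) {
        if (payload.empty()) return {};
        // Offset within the stream; unsigned subtraction copes with wraparound.
        std::uint32_t off = seq - first_seq_;
        std::string& slot = pending_[off];
        if (payload.size() > slot.size()) slot = payload;  // keep the longest copy

        std::string out;
        auto it = pending_.begin();
        while (it != pending_.end() && it->first <= delivered_) {
            // Bytes at the front of this segment that were already delivered.
            std::uint32_t skip = delivered_ - it->first;
            if (skip < it->second.size()) {
                out.append(it->second, skip, std::string::npos);
                delivered_ += static_cast<std::uint32_t>(it->second.size() - skip);
            }
            it = pending_.erase(it);  // fully consumed or pure duplicate
        }
        return out;
    }

private:
    std::uint32_t first_seq_;
    std::uint32_t delivered_ = 0;                   // stream bytes handed out so far
    std::map<std::uint32_t, std::string> pending_;  // out-of-order segments by offset
};
```

Feeding the segment at 1005 before the one at 1000 returns nothing for the first call and both payloads joined for the second. A repeated segment returns an empty string.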

Related

How to find retransmitted TCP packets

I'm trying to write a C++ program to scan a pcap file and filter out certain packets. I tried using WinPcap to scan and filter, but I can't find a way to separate out retransmissions. Does anyone know how this filtering can be done?
You will have to implement several TCP variables to do this.
Read section 3.2 of RFC 793.
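As a rough illustration of what "implementing several TCP variables" could look like, the sketch below keeps one variable per flow direction: the highest sequence number seen so far (end of data). A data segment that starts below that mark is reported as a retransmission. FlowKey and RetransmissionDetector are hypothetical names. This is a heuristic under simple assumptions: a segment that arrives out of order to fill a gap is also flagged, and SYN/FIN retransmissions are ignored.

```cpp
#include <cstdint>
#include <map>
#include <tuple>

// Identifies one direction of a TCP connection (hypothetical layout).
struct FlowKey {
    std::uint32_t src_ip, dst_ip;
    std::uint16_t src_port, dst_port;
    bool operator<(const FlowKey& o) const {
        return std::tie(src_ip, dst_ip, src_port, dst_port) <
               std::tie(o.src_ip, o.dst_ip, o.src_port, o.dst_port);
    }
};

class RetransmissionDetector {
public:
    // Returns true if this data segment starts inside data already seen on
    // the same flow direction.
    bool is_retransmission(const FlowKey& flow, std::uint32_t seq,
                           std::uint32_t payload_len) {
        if (payload_len == 0) return false;  // pure ACKs carry no data
        std::uint32_t end = seq + payload_len;
        auto it = highest_end_.find(flow);
        if (it == highest_end_.end()) {      // first data segment of this flow
            highest_end_[flow] = end;
            return false;
        }
        // Signed differences keep the comparison correct across 32-bit wraparound.
        bool starts_in_old_data = static_cast<std::int32_t>(seq - it->second) < 0;
        if (static_cast<std::int32_t>(end - it->second) > 0) it->second = end;
        return starts_in_old_data;
    }

private:
    std::map<FlowKey, std::uint32_t> highest_end_;  // highest seq + len per flow
};
```

Call is_retransmission() once per captured TCP segment, in capture order, and drop or count the segments for which it returns true.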

How to send/receive XML data with sockets in Qt using string?

I have a Qt TCP server and client program which can interact with each other. The server can send some function-generated data to the socket using QTextStream, and the client reads the data from the socket using a simple readAll() and displays it in a QTextEdit.
Now the data from the server side is huge (around 7000+ samples) and I need it to appear on the client side instantaneously. I have learned that using XML will help in my case. So I made a Qt XML server that writes all of the XML data into a .xml file. I read the .xml file on the client side and can display its contents. I used the DOM method for parsing. But the data is displayed only once all 7000+ samples have been generated on the server side.
I need clarifications on these questions:
How do I write each XML element on the server side into a string and send it through the socket? I learnt that tagName() can help me, but I have not been able to figure out how.
Is there any way other than the string method to get a single element generated on the server side to appear on the client side?
PS: I am a newbie, forgive my ignorance. Thank you.
Most DOM XML parsers require a complete, well-formed XML document before they'll do anything with it. That's precisely what you see: your data is processed only after all of the samples have been received.
You need to use an incremental parser that doesn't care about the XML document not being complete yet.
On the other hand: if you're not requiring XML for interoperability with 3rd party systems, you're probably wasting a lot of resources by using it. I don't know where you've "learned" that XML will "help in your case". To me it's not learning, it's just following the crowd without understanding what's going on. Is your requirement to use XML or to move the data around? Moving data around has been a well understood problem for decades. Computers "speak" binary. No need to work around it, you know. If all you need is to move around some numbers, use QDataStream and be done with it. It'll be two orders of magnitude faster than the fastest XML parsers, you'll transmit an order of magnitude less data, and everyone will live happily ever after*.
*living happily ever after not guaranteed, individual results may vary.

How to correctly parse incoming HTTP requests

I've created a C++ application using Winsock, which has a small HTTP server implemented (it handles just the few features I need). This is used to communicate with the outside world using HTTP requests. It works, but sometimes the requests are not handled correctly because the parsing fails. I'm quite sure that the requests are correctly formed, since they are sent by major web browsers like Firefox/Chrome or by Perl/C# (which have HTTP modules/DLLs).
After some debugging I found out that the problem is in fact in receiving the message. When the message arrives in more than one part (it is not read in a single recv() call), the parsing sometimes fails. I have tried numerous ways to resolve this, but nothing seems to be reliable enough.
What I do now is read data until I find the "\r\n\r\n" sequence, which indicates the end of the header. If WSAGetLastError() reports anything other than 10035 (WSAEWOULDBLOCK) before such a sequence is found, meaning the connection was closed or failed, I discard the message. Once I know I have the whole header, I parse it and look for information about the body length. However, I'm not sure whether this information is mandatory (I think not) and what I should do if it is missing. Does that mean there will be no body? Another problem is that I do not know whether I should look for a "\r\n\r\n" after the body (if its length is greater than zero).
Does anybody know how to reliably parse an HTTP message?
Note: I know there are implementations of HTTP servers out there. I want my own for various reasons. And yes, reinventing the wheel is bad, I know that too.
If you're set on writing your own parser, I'd take the Zed Shaw approach: use the Ragel state machine compiler and build your parser based on that. Ragel can handle input arriving in chunks, if you're careful.
Honestly, though, I'd just use something like this.
Your go-to resource should be RFC 2616, which describes HTTP 1.1, which you can use to construct a parser. Good luck!
You could try looking at their code to see how they handle a HTTP message.
Or you could look at the spec; there are message length fields you should use. Only buggy browsers send additional CRLFs at the end, apparently.
Anyway, an HTTP request has "\r\n\r\n" at the end of the request headers and before the request data, if any, even if the request is just "GET / HTTP/1.0\r\n\r\n".
If the method is "POST", you should read as many bytes after "\r\n\r\n" as specified in the Content-Length field.
So pseudocode is:
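read_until(buf, "\r\n\r\n");    // the headers end at the first empty line
if (buf.starts_with("POST"))
{
    // match one header line; the number is the body size in bytes
    contentLength = regex("^Content-Length: (\d+)$").find(buf)[1];
    read_all(buf, contentLength);    // read exactly contentLength more bytes
}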
There will be a "\r\n\r\n" after the content only if the content itself includes it. The content may be binary data and has no terminating sequence, so the only way to get its size is to use the Content-Length field.
HTTP GET/HEAD requests have no body, and a POST request can have no body too. You have to check whether it's a GET/HEAD; if it is, then no content (body/message) was sent. If it was a POST, do as the spec says about parsing a message of known/unknown length, as @gbjbaanb said.
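Putting the answers above together, one way to cope with a request that arrives over several recv() calls is to keep appending to a buffer and only parse once enough bytes are there. The sketch below is just that idea in plain C++. HttpRequestBuffer is a made-up name, a missing Content-Length is treated as "no body", and chunked transfer encoding is not handled at all.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: accumulates recv() chunks until one request is complete.
class HttpRequestBuffer {
public:
    // Append the bytes from one recv() call.
    // Returns true once the headers and the announced body have fully arrived.
    bool feed(const char* data, std::size_t len) {
        buf_.append(data, len);
        if (header_end_ == std::string::npos) {
            std::size_t pos = buf_.find("\r\n\r\n");
            if (pos == std::string::npos) return false;  // headers still incomplete
            header_end_ = pos + 4;
            body_len_ = parse_content_length(buf_.substr(0, pos + 2));
        }
        return buf_.size() >= header_end_ + body_len_;
    }

    // Only meaningful after feed() has returned true.
    std::string headers() const { return buf_.substr(0, header_end_); }
    std::string body() const { return buf_.substr(header_end_, body_len_); }

private:
    // Header names are case-insensitive, so search a lower-cased copy.
    static std::size_t parse_content_length(std::string head) {
        std::transform(head.begin(), head.end(), head.begin(),
                       [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
        const std::string name = "\r\ncontent-length:";
        std::size_t pos = head.find(name);
        if (pos == std::string::npos) return 0;  // assumption: no header, no body
        pos += name.size();
        while (pos < head.size() && (head[pos] == ' ' || head[pos] == '\t')) ++pos;
        std::size_t n = 0;
        while (pos < head.size() && std::isdigit(static_cast<unsigned char>(head[pos])))
            n = n * 10 + static_cast<std::size_t>(head[pos++] - '0');
        return n;
    }

    std::string buf_;
    std::size_t header_end_ = std::string::npos;  // index just past "\r\n\r\n"
    std::size_t body_len_ = 0;
};
```

In the receive loop you would call feed() with whatever recv() returned, keep looping while it returns false, and hand headers() and body() to the real parser once it returns true.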

tool to find out distance in terms of no. of hops in unix

I am writing an application for video streaming. In the application, the server is required to know the distance of the client from itself in terms of hop count. My question is: is there any tool/method other than traceroute available in a Unix environment to find it?
I also need to find out the geographical location of the client. Is there any tool/method for this as well?
Any help in this regard will be highly appreciated.
Thanks in advance.
Mawia
This is slightly complex for several reasons. First of all, the number of hops might not be the same in each direction. Disregarding that, you might be able to look at the TTL field of the received IP packets. This is a layer lower down than UDP, so I'm not entirely sure how to access it through the normal socket interfaces. But if you can get at that value, you can compare it to what the client/server normally sends (usually 64 or 255) and deduce the number of hops it has traversed.
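The TTL comparison itself is only a subtraction once you have the value. A small sketch follows, with the assumption that the sender started from one of the common defaults 64, 128 or 255 (128 is added here as a further common default; the function name estimate_hops is made up). How you obtain the received TTL is left open; on Linux the ip(7) man page describes an IP_RECVTTL option for datagram sockets, but check your platform.

```cpp
#include <cstdint>

// Estimates how many routers a packet crossed, given the TTL it arrived with.
// Assumption: the sender's initial TTL was the smallest of the common defaults
// (64, 128, 255) that is not below the observed value.
int estimate_hops(std::uint8_t received_ttl) {
    const int common_initial_ttls[] = {64, 128, 255};
    for (int initial : common_initial_ttls) {
        if (received_ttl <= initial) return initial - received_ttl;
    }
    return 0;  // not reached: an 8-bit TTL never exceeds 255
}
```

For example, a packet that arrives with TTL 52 is assumed to have started at 64, which gives 12 hops. A sender with a non-default initial TTL will throw the estimate off.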
For geographical information there are a number of GeoIP databases available online, though usually commercial ones. The precision of the data might also vary.
You might try tracepath as an alternative to traceroute.
If you haven't read this yet, you must do so now:
"The case of the 500 mile email."

How do I extract the network protocol from the source code of the server?

I'm trying to write a chat client for a popular network. The original client is proprietary, and is about 15 GB larger than I would like. (To be fair, others call it a game.)
There is absolutely no documentation available for the protocol on the internet, and most search results only come back with the client's scripting interface. I can understand that, since used in the wrong way, it could lead to ruining other people's experience.
I've downloaded the source code of a couple of alternative servers, including the one I want to connect to, but those
contain no documentation other than install instructions
are poorly commented (I did a superficial browsing)
are HUGE (the src folder of the target server contains 12 MB worth of .cpp and .h files), and grep didn't find anything related
I've also tried searching their forums and contacting the maintainers of the server, but so far, no luck.
Packet sniffing isn't likely to help, as the protocol relies heavily on encryption.
At this point, all my hope is my ability to chew through an ungodly amount of code. How do I start?
Edit: A related question.
If the original code does its encryption with some well-known library like OpenSSL or Crypto++, it might be useful to write your own wrappers for the main entry points of that library, which then delegate the call to the actual library. If you make such a substitution and build the project successfully, you will be able to trace everything that goes out in plain text.
If the project is not using third-party encryption libraries, it is hopefully still possible to substitute the encryption routines with wrappers which trace their input and then delegate the encryption to the actual code.
Your bet is that encryption is usually implemented in a separate, relatively small number of source files, so it should be easier for you to track input/output in these files.
Good luck!
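To illustrate the wrapper idea from the answer above, here is a minimal, library-independent sketch. Everything in it is hypothetical: toy_encrypt merely stands in for whatever routine the server really calls (XOR is not real encryption), and make_tracing_wrapper is an invented helper. The point is only the shape: record the plaintext, then delegate unchanged.

```cpp
#include <cstdint>
#include <functional>
#include <vector>

using Bytes = std::vector<std::uint8_t>;
using EncryptFn = std::function<Bytes(const Bytes&)>;

// Wraps an encryption routine so each plaintext is recorded before the call
// is delegated to the real implementation.
EncryptFn make_tracing_wrapper(EncryptFn real_encrypt, std::vector<Bytes>& trace) {
    return [real_encrypt, &trace](const Bytes& plaintext) {
        trace.push_back(plaintext);      // log the not-yet-encrypted message
        return real_encrypt(plaintext);  // behaviour is otherwise unchanged
    };
}

// Stand-in for the server's real routine (hypothetical).
Bytes toy_encrypt(const Bytes& in) {
    Bytes out(in);
    for (auto& b : out) b ^= 0x5A;
    return out;
}
```

In the server source you would point the call sites at the wrapper instead of the original function, rebuild, and then inspect the collected trace (or write it to a log file) while a real client talks to your build.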
I'd say
find the command that is used to send data through the socket (the call depends on the network library)
find references of this command and unroll from there. If you can modify-recompile the server code, it might help.
On the way, you will be able to log decrypted (or, more likely, not yet encrypted) network activity.
IMO, the best answer is to read the source code of the alternative server. Try using a good C++ IDE to help you. It will make a lot of difference.
It is likely that the protocol related material you need to understand will be limited to a subset of the files. These will contain references to network sockets and things. Start from there and work outwards as far as you need to.
A viable approach is to tackle this as a crypto challenge. That makes it easy, because you control so much.
For instance, you can use a current client to send a known message to the server, and then check server memory for that string. Once you've found out in which object the string ends up, it also becomes possible to trace its ancestry through the code. Set a breakpoint on any non-const method of the object, and collect the stack traces. This gives you a live view of how messages arrive at the server, and a list of core functions essential to message processing. You can next find related functions (callers/callees of the functions on your list).