Efficient pcap parser in c/c++ - c++

I am new to pcap parsing and would like to ask for some help with this task (I am using Debian 9):
A.pcap is a pcap file that contains the network packets to parse, along with other packets to discard.
B.so is a library file that contains the binary code to parse the packets the task cares about. There are no header files, so I need to inspect the binary.
Both files can be downloaded at this link:
https://www.dropbox.com/s/ustehbd8lmejddv/task.zip?dl=0
First, I tried to inspect both files using:
nm -gC B.so
tcpick -C -yP -r A.pcap
tcpdump -qns 0 -A -r A.pcap
Now I should parse only the inbound OrderField packets and retrieve the following fields:
1. OrderStatus
2. OrderLocalID
3. LimitPrice
4. Direction
5. InstrumentID
I believe I am having trouble with tcpick and tcpdump because I can't access any of that information; I can only see a long list of MAC/IP addresses plus some "random" characters.
Do you have any suggestions?
Thank you in advance.

I'm not interviewing with the company and have nothing to do with them; I just had fun reverse engineering this. Here are some ideas and hints I found from the problem and the internet.
You can use libpcap to retrieve every TCP packet from the file and reconstruct the related inbound transactions to fulfill the requirements of the problem.
The OrderField packet might be related to this: https://github.com/fakechris/femas_api/blob/master/traderapidemo/TraderApi4LNX64/USTPFtdcUserApiStruct.h
The compression method is ZeroCompress. Therefore you can directly call CompressUtil::Zerodecompress from the shared library to unpack the TCP packet payload.
Use the data structure typedefs found in that GitHub header to extract the required fields.
Disclaimer: I might be wrong because I am not able to verify the result.
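To make the libpcap step a bit more concrete: once libpcap (for example via pcap_open_offline and pcap_next_ex) hands you a captured frame, you still have to skip the link, IP and TCP headers to reach the payload. Here is a minimal sketch of that step only. It assumes Ethernet II framing without VLAN tags and unfragmented IPv4; the names TcpPayload and extract_tcp_payload are mine, not from the task or from B.so.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper type: what we pull out of one captured frame.
struct TcpPayload {
    bool ok = false;             // false if the frame is not Ethernet II / IPv4 / TCP
    std::uint16_t src_port = 0;
    std::uint16_t dst_port = 0;
    std::uint32_t seq = 0;       // TCP sequence number of the first payload byte
    std::vector<std::uint8_t> data;
};

// Network byte order (big endian) readers.
static std::uint16_t be16(const std::uint8_t* p) {
    return static_cast<std::uint16_t>((p[0] << 8) | p[1]);
}
static std::uint32_t be32(const std::uint8_t* p) {
    return (std::uint32_t(p[0]) << 24) | (std::uint32_t(p[1]) << 16) |
           (std::uint32_t(p[2]) << 8) | std::uint32_t(p[3]);
}

// Assumption: frame points at an Ethernet II frame of len captured bytes.
TcpPayload extract_tcp_payload(const std::uint8_t* frame, std::size_t len) {
    TcpPayload out;
    const std::size_t eth_len = 14;
    if (len < eth_len + 20) return out;               // too short for an IPv4 header
    if (be16(frame + 12) != 0x0800) return out;       // EtherType must be IPv4

    const std::uint8_t* ip = frame + eth_len;
    if ((ip[0] >> 4) != 4) return out;                // IP version
    const std::size_t ihl = (ip[0] & 0x0F) * 4u;      // IPv4 header length in bytes
    if (ihl < 20 || ip[9] != 6) return out;           // protocol 6 = TCP
    const std::size_t ip_total = be16(ip + 2);        // excludes Ethernet padding
    if (ip_total < ihl + 20 || eth_len + ip_total > len) return out;

    const std::uint8_t* tcp = ip + ihl;
    const std::size_t doff = (tcp[12] >> 4) * 4u;     // TCP header length in bytes
    if (doff < 20 || ihl + doff > ip_total) return out;

    out.src_port = be16(tcp);
    out.dst_port = be16(tcp + 2);
    out.seq = be32(tcp + 4);
    out.data.assign(tcp + doff, ip + ip_total);       // may be empty for pure ACKs
    out.ok = true;
    return out;
}
```

You would filter on dst_port (or the destination address) to keep only inbound packets, then hand data to whatever decompression and struct decoding turns out to be correct.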

Related

How to find retransmitted TCP packets

I'm trying to write a C++ program to scan a pcap file and filter out certain packets. I tried using winpcap to scan and filter but I can't find a way to separate retransmissions. Does anyone know how this filtering can be done?
You will have to implement several TCP variables to do this.
Read section 3.2 of RFC 793.
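As a rough illustration of the kind of state you need to keep, here is a sketch that tracks, per flow direction, the sequence number expected next and flags data segments that end at or before it. This is a heuristic under my own assumptions, not what any particular tool does: it cannot tell a retransmission from a late out-of-order segment, and it ignores SYN/FIN sequence consumption. FlowKey and RetransmissionDetector are hypothetical names.

```cpp
#include <cstdint>
#include <map>
#include <tuple>

// One direction of a TCP connection (hypothetical key type).
struct FlowKey {
    std::uint32_t src_ip;
    std::uint32_t dst_ip;
    std::uint16_t src_port;
    std::uint16_t dst_port;
    bool operator<(const FlowKey& o) const {
        return std::tie(src_ip, dst_ip, src_port, dst_port) <
               std::tie(o.src_ip, o.dst_ip, o.src_port, o.dst_port);
    }
};

class RetransmissionDetector {
public:
    // Returns true if this data segment only covers bytes already seen
    // on this flow direction.
    bool is_retransmission(const FlowKey& k, std::uint32_t seq,
                           std::uint32_t payload_len) {
        if (payload_len == 0) return false;          // pure ACKs are not classified
        const std::uint32_t end = seq + payload_len; // wraps modulo 2^32, as TCP does
        auto it = next_seq_.find(k);
        if (it == next_seq_.end()) {                 // first data segment of the flow
            next_seq_[k] = end;
            return false;
        }
        // Wrap-safe comparison: a signed difference <= 0 means the segment
        // ends at or before the next expected sequence number.
        if (static_cast<std::int32_t>(end - it->second) <= 0) return true;
        it->second = end;                            // new data, advance
        return false;
    }

private:
    std::map<FlowKey, std::uint32_t> next_seq_;
};
```

Feed it every TCP segment in capture order and drop (or keep) the ones it flags.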

How does one filter pcap (e.g. tcpdump) files by content regex match?

I have some large pcap (packet capture) files collected with tcpdump. I would like to filter out packets that contain a specific string. I want my output to still be in pcap format. I've found several ways of only displaying packets matching a regex from a pcap file, but what I need is to filter such files rather than filter out the display (e.g. stdout) of the packets. The output needs to be pcap files with the matched packets removed.
I suspect dpkt (a python module) might help, but I'd prefer to do this using an existing (C/C++) tool, if possible. I'll accept code as an answer (maybe a good dpkt example with benchmarks will convince me to just go that way as well ;-)).
Thanks in advance!
Answer:
Per Nim's answer, it's possible to do this via Wireshark/tshark. For others' reference, here's an example commandline, where I check for the string match within udp packets (this example can be built on to do tcp or specific protocol field searching):
tshark -r infile -R 'not udp matches "my_search_string"' -w outfile
Thanks again!
This website has a very nice example of how you can read the pcap file in C, a quick google search will reveal how you can re-write the file.
Alternatively, AFAIK Wireshark may allow you to do this already - i.e. open the file, apply a filter and save the file (and a quick run through Wireshark - reveals that it does indeed offer this).
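If you would rather do it in C++ without any tool, the classic pcap format is simple enough to rewrite by hand: a 24-byte global header followed by records, each with a 16-byte header whose third 32-bit field is the captured length. Below is a minimal sketch that works on an in-memory copy of the file and drops every record containing a given string. Caveats: it uses a plain substring match instead of a regex (std::regex could be swapped in), it only handles files written in the host's byte order, and it does not handle pcapng. drop_matching_packets is a name I made up.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Reads a 32-bit field in host byte order (assumption: file is not byte-swapped).
static std::uint32_t read_u32(const Bytes& b, std::size_t off) {
    std::uint32_t v;
    std::memcpy(&v, b.data() + off, sizeof v);
    return v;
}

// Returns a new pcap image without the records whose captured bytes contain needle.
Bytes drop_matching_packets(const Bytes& in, const std::string& needle) {
    const std::size_t kGlobalHdr = 24, kRecHdr = 16;
    if (in.size() < kGlobalHdr) throw std::runtime_error("truncated pcap header");
    const std::uint32_t magic = read_u32(in, 0);
    if (magic != 0xa1b2c3d4u && magic != 0xa1b23c4du)   // usec / nsec timestamps
        throw std::runtime_error("unsupported or byte-swapped pcap file");

    Bytes out(in.begin(), in.begin() + kGlobalHdr);      // copy global header as-is
    std::size_t pos = kGlobalHdr;
    while (pos + kRecHdr <= in.size()) {
        const std::uint32_t incl_len = read_u32(in, pos + 8);
        if (incl_len > in.size() - pos - kRecHdr) break; // truncated last record
        const auto first = in.begin() + pos + kRecHdr;
        const auto last = first + incl_len;
        const bool match =
            !needle.empty() &&
            std::search(first, last, needle.begin(), needle.end(),
                        [](std::uint8_t a, char b) {
                            return a == static_cast<std::uint8_t>(b);
                        }) != last;
        if (!match) out.insert(out.end(), in.begin() + pos, last); // header + data
        pos += kRecHdr + incl_len;
    }
    return out;
}
```

Read the input with std::ifstream in binary mode into a Bytes, call the function, and write the result back out in binary mode.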

How do you test a Wireshark dissector?

When you write a dissector for Wireshark, how do you test it? Looking for the visual output in the UI is not sufficient for a non-trivial protocol.
Is there a good way for unit testing of the dissector?
EDIT:
The structure of protocol frames is dynamic. The dissector must somehow interpret the content.
For example, if the fifth field is one, a byte array follows as the sixth field. If it's two, you have a double array, and if it's three, you have to add a zero-terminated string.
This almost never happens in a day-to-day capture. That's why you need synthetic capture data, even with "impossible" content.
To test a Wireshark dissector I found this useful:
1. Define a set of packets that the dissector should analyse, including malformed packets
2. Implement the packets as a hex dump
3. Define the expected output
4. For each packet dump:
   - Generate pcap files with text2pcap
   - Run the dissector with tshark
   - Extract the payload from the PDML output of tshark
   - Compare the XML output with the expected XML output
This can be improved by filtering the XML output, since the PDML also includes the packet bytes, which can be annoying if the payload is large and/or complex.
The suggested arguments to the wireshark executables are
text2pcap -T 1024,9876 foo.txt foo.pcap
tshark -T pdml -r "foo.pcap"
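If your synthetic packets live in code rather than in hand-written text files, producing the input for text2pcap is easy to automate. text2pcap reads lines that start with a hex offset followed by space-separated hex bytes (the same layout od -Ax -tx1 prints). Here is a small sketch in C++ that writes one packet in that layout; to_text2pcap_dump is a hypothetical helper name. With -T as above, text2pcap adds dummy Ethernet, IP and TCP headers, so the bytes you pass are just your protocol's payload.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Formats one packet as a text2pcap-style hex dump, 16 bytes per line.
// Each line is "<6-digit hex offset> <byte> <byte> ...".
std::string to_text2pcap_dump(const std::vector<std::uint8_t>& pkt) {
    std::string out;
    char buf[16];
    for (std::size_t i = 0; i < pkt.size(); ++i) {
        if (i % 16 == 0) {
            if (i != 0) out += '\n';
            std::snprintf(buf, sizeof buf, "%06zx", i);   // offset column
            out += buf;
        }
        std::snprintf(buf, sizeof buf, " %02x", static_cast<unsigned>(pkt[i]));
        out += buf;
    }
    if (!pkt.empty()) out += '\n';
    return out;
}
```

Write the returned string to foo.txt; a new packet starts whenever the offset goes back to 000000.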
To extract the dissector output it's useful to use an XPATH expression with the .NET CLR class XmlNode. This can be done e.g. this way:
XmlNode output = tsharkOutput.SelectSingleNode("packet/proto[@name='foo']");
XmlNodeList refList = referenceDocument.SelectNodes("proto[@name='foo']");
You can use something like Scapy or PacketSender to generate test packets.
I guess I'm old fashioned. A dissector's primary purpose is transforming data to a human readable form, so I tested mine by having humans read it.
I suppose you could do more automated testing by exporting to txt or pdml from file->export, or implementing some sort of test wrapper around your plugin DLL.
You could parse the output of tshark.
Just for updating the post.
Tshark uses the same plugins as Wireshark, and loads them in the same
manner. tshark is also used in this way in the Wireshark CI build
tests, see the test directory of the Wireshark sources for some test
script examples.
https://code.wireshark.org/review/gitweb?p=wireshark.git;a=tree;f=test
- grahamb.
source: https://ask.wireshark.org/questions/36721/tshark-for-plugin-testing

c++ accessed url log

I'm currently developing a standalone C++ program that lists all the URLs accessed in a browser and their corresponding response times.
At this point I can already sniff all incoming and outgoing packets. I am using WinPcap for this.
The captured packets are filtered down to 'tcp port 80(http) or 443(https)'.
Now I want to read some HTTP headers. The problem I have is that the IP packets are usually fragmented.
I want to know how to reassemble them and how to get some details about the HTTP traffic.
Note: I want to implement what Wireshark does. For every packet/frame, it shows a
'REASSEMBLED TCP SEGMENT'
Any ideas or tutorials on how I can easily achieve this?
Thanks a lot!
You'll have to do the same thing TCP does to reassemble packets, which means parsing the header of the packets and sequencing them into another buffer. The worst program logic is probably dealing with missing information; you'll then have to see if it was flagged and retransmitted.
There are a number of RFCs which cover this: 675, 793, 1122 and others. If looking through those seems overwhelming, maybe back off and look at the Roadmap RFC, RFC 4614.
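To give an idea of what "sequencing them into another buffer" can look like, here is a minimal sketch for one direction of one connection. It holds segments that arrive early, trims overlap from retransmissions, and appends in-order bytes to a contiguous stream. Assumptions on my part: you pass the sequence number of the first data byte (the SYN's sequence number plus one), the stream stays under 4 GiB, and nothing handles a gap that never gets filled. StreamReassembler is a made-up name.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

class StreamReassembler {
public:
    // first_data_seq: sequence number of the first payload byte of the stream.
    explicit StreamReassembler(std::uint32_t first_data_seq)
        : base_(first_data_seq) {}

    void add_segment(std::uint32_t seq, const std::string& data) {
        if (data.empty()) return;
        // Offset of this segment relative to the stream start; the unsigned
        // subtraction handles sequence number wrap-around.
        pending_.emplace(seq - base_, data);

        // Drain every held segment that starts at or before the current end.
        auto it = pending_.begin();
        while (it != pending_.end() && it->first <= stream_.size()) {
            const std::size_t skip = stream_.size() - it->first; // bytes already seen
            if (skip < it->second.size())
                stream_.append(it->second, skip, std::string::npos);
            it = pending_.erase(it);
        }
    }

    const std::string& stream() const { return stream_; } // contiguous bytes so far
    std::size_t pending_count() const { return pending_.size(); }

private:
    std::uint32_t base_;
    std::string stream_;
    std::multimap<std::uint32_t, std::string> pending_;   // offset -> held segment
};
```

Once the stream contains "\r\n\r\n" you have a complete HTTP header block to parse. For port 443 you will only see encrypted bytes, so this helps for plain HTTP only.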

How do I extract the network protocol from the source code of the server?

I'm trying to write a chat client for a popular network. The original client is proprietary, and is about 15 GB larger than I would like. (To be fair, others call it a game.)
There is absolutely no documentation available for the protocol on the internet, and most search results only come back with the client's scripting interface. I can understand that, since used in the wrong way, it could lead to ruining other people's experience.
I've downloaded the source code of a couple of alternative servers, including the one I want to connect to, but those
contain no documentation other than install instructions
are poorly commented (I did a superficial browsing)
are HUGE (the src folder of the target server contains 12 MB worth of .cpp and .h files), and grep didn't find anything related
I've also tried searching their forums and contacting the maintainers of the server, but so far, no luck.
Packet sniffing isn't likely to help, as the protocol relies heavily on encryption.
At this point, all my hope is my ability to chew through an ungodly amount of code. How do I start?
Edit: A related question.
If the original code is encrypted with some well-known library like OpenSSL or Crypto++, it might be useful to write your own wrapper for the main entry points of that library, which then delegates the call to the actual library. If you make such a substitution and build the project successfully, you will be able to trace everything that goes out in plain text.
If your project is not using third-party encryption libraries, hopefully it is still possible to substitute the encryption routines with wrappers that trace their input and then delegate encryption to the actual code.
Your bet is that encryption is usually implemented in a separate, relatively small number of source files, so it should be easier for you to track input/output in these files.
Good luck!
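The wrapper idea in its simplest form, as a sketch: wrap whatever callable does the encryption so that it records the plaintext before delegating. Everything here is hypothetical (EncryptFn, make_tracing_wrapper, the string-in/string-out signature); the real entry points in the server will have their own signatures, and you would adapt the wrapper to match them.

```cpp
#include <functional>
#include <string>
#include <vector>

// Assumed shape of an encryption routine: plaintext in, ciphertext out.
using EncryptFn = std::function<std::string(const std::string&)>;

// Returns a callable that logs each plaintext into trace_log and then
// forwards to the real routine unchanged. trace_log must outlive the wrapper.
EncryptFn make_tracing_wrapper(EncryptFn real,
                               std::vector<std::string>& trace_log) {
    return [real, &trace_log](const std::string& plaintext) {
        trace_log.push_back(plaintext);   // capture the message before encryption
        return real(plaintext);           // behaviour of the server is unchanged
    };
}
```

In the server you would replace the call site (or the function pointer it goes through) with the wrapped version and dump trace_log to a file while a real client talks to it.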
I'd say
find the command that is used to send data through the socket (the call depends on the network library)
find references of this command and unroll from there. If you can modify-recompile the server code, it might help.
On the way, you will be able to log decrypted (or, more likely, not yet encrypted) network activity.
IMO, the best answer is to read the source code of the alternative server. Try using a good C++ IDE to help you. It will make a lot of difference.
It is likely that the protocol related material you need to understand will be limited to a subset of the files. These will contain references to network sockets and things. Start from there and work outwards as far as you need to.
A viable approach is to tackle this as a crypto challenge. That makes it easy, because you control so much.
For instance, you can use a current client to send a known message to the server, and then check server memory for that string. Once you've found out in which object the string ends up, it also becomes possible to trace its ancestry through the code. Set a breakpoint on any non-const method of the object, and collect the stack traces. This gives you a live view of how messages arrive at the server, and a list of core functions essential to message processing. You can next find related functions (callers/callees of the functions on your list).