Code to analyze pcap file - c++

I am trying to analyse a file containing packets captured using tcpdump. I first want to categorize the packets into flows using 5-tuple. Then I need to get the size and inter-arrival time of each packet in each flow. I tried Conversation list in wireshark but it gives only the number of packets in the flow not information about each packet in the flow. A suggestion for any code (c++ or shell script) that can do the job? Thank you

UmNyobe,
If you haven't heard of Scapy yet, I believe it would be a near-perfect fit for what you are trying to do. For example, I wrote this little snippet with Scapy to parse a pcap file and print something like what you are describing.
#!/usr/bin/python -tt
'''Parse PCAP files into easy to read NETFLOW like output

Usage:
python cap2netflow.py <[ pcap filename or -l ]>
-l is live capture switch
ICMP packets print as source ip, type --> dest ip, code'''
import sys
from datetime import datetime
from scapy.all import *  # Scapy 2.x; Scapy 1.x used "from scapy import *"

def parse_netflow(pkt):
    # grabs 'netflow-esque' fields from packets in a PCAP file
    if IP not in pkt:
        return  # no IPv4 header, nothing to report
    ip = pkt.getlayer(IP)
    # float() because newer Scapy versions store pkt.time as a decimal type
    snifftime = datetime.fromtimestamp(float(pkt.time)).strftime('%H:%M:%S')
    proto = {6: 'TCP', 17: 'UDP', 1: 'ICMP'}.get(ip.proto)
    if proto in ('TCP', 'UDP'):
        l4 = pkt.getlayer(TCP if proto == 'TCP' else UDP)
        if l4 is None:
            return  # e.g. an IP fragment without a transport header
        print(' '.join([snifftime, proto.rjust(4, ' '), str(ip.src).rjust(15, ' '), str(l4.sport).rjust(5, ' '), '-->', str(ip.dst).rjust(15, ' '), str(l4.dport).rjust(5, ' ')]))
    elif proto == 'ICMP' and ICMP in pkt:
        icmp = pkt.getlayer(ICMP)
        print(' '.join([snifftime, 'ICMP'.rjust(4, ' '), str(ip.src).rjust(15, ' '), ('t: ' + str(icmp.type)).rjust(5, ' '), '-->', str(ip.dst).rjust(15, ' '), ('c: ' + str(icmp.code)).rjust(5, ' ')]))

if '-l' in sys.argv:
    sniff(prn=parse_netflow)
else:
    pkts = rdpcap(sys.argv[1])
    print(' '.join(['Date: ', datetime.fromtimestamp(float(pkts[0].time)).strftime('%Y-%m-%d')]))
    for pkt in pkts:
        parse_netflow(pkt)
Install Python and Scapy, then use this to get you started. Let me know if you need any help figuring it all out; if you know C++, chances are this will already make a lot of sense to you.
Get Scapy here
http://www.secdev.org/projects/scapy/
There are tons of links on this page to helpful tutorials. Keep in mind that Scapy does a lot more, but home in on the areas that talk about pcap parsing.
I hope this helps!
dc

I worked on a library to analyze tcpdump captures, but it was for a business, so I cannot just give it to you. If you don't find what you are looking for, my answer may still help. A tcpdump capture is just nested network data, like Matryoshka dolls, where the outermost pcap layer is added by tcpdump.
If you only want to work on the captures, the format of a dump is specified in Libpcap File Format. To get the size and time of arrival of each packet you need to process the dump using this specification.
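To make the point about the file format concrete, here is a minimal sketch in Python rather than C++. It assumes the classic libpcap layout with microsecond timestamps: a 24-byte global header followed by records that each start with a 16-byte header (seconds, microseconds, captured length, original length). The function name read_pcap_records is made up for illustration, and pcapng files are not handled.

```python
import struct

def read_pcap_records(data):
    """Yield (timestamp_seconds, captured_len, original_len) per packet record.

    Assumes a classic pcap file with microsecond timestamps (not pcapng).
    """
    magic = data[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"   # file written on a little-endian machine
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"   # file written on a big-endian machine
    else:
        raise ValueError("not a classic microsecond-resolution pcap file")
    offset = 24        # skip the global header
    while offset + 16 <= len(data):
        ts_sec, ts_usec, incl_len, orig_len = struct.unpack_from(
            endian + "IIII", data, offset)
        # the captured packet bytes follow the 16-byte record header
        offset += 16 + incl_len
        yield ts_sec + ts_usec / 1e6, incl_len, orig_len
```

Reading the whole file with open(path, "rb").read() and passing the bytes in gives the arrival time and size of every packet; the difference between consecutive timestamps is the inter-arrival time. Grouping into flows still requires parsing the headers of the layers inside each record.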
If you have to go deeper in the analysis, these are the layers you will meet, in order:
the link layer
the internet layer
the transport layer
the application layer
Each layer has a header definition, so you need to find out which protocol stack your pcap data contains and parse each header to get the information you need.

What are the members of the 5-tuple? If the flows are TCP or UDP, the source and destination IP addresses and port numbers, plus, perhaps, a number to distinguish multiple flows over time between the two endpoints would work; for SCTP, it would be similar, although if a flow is a stream, you might need more.
If the members of the 5-tuple are all "named fields" in Wireshark, you could use TShark with the -T fields option and use the -e option to specify which fields to print. Select a field with the time stamp (frame.time_epoch would give you the time as seconds and fractions of a second since the UN*X epoch), a field with the appropriate size (frame.len gives you the raw number of bytes in the link-layer packet PLUS any meta-data such as a radiotap header for 802.11 radio information), and the other fields, and then feed the output of TShark to a script or program that does the processing you want to do. That lets TShark do the processing of the protocol layers, so that your program only needs to process the resulting data.
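As a rough sketch of the "script that does the processing" part, the Python function below groups TShark's tab-separated output into flows and records each packet's size and inter-arrival time. It assumes the nine fields were requested in exactly this order: frame.time_epoch, frame.len, ip.src, ip.dst, ip.proto, tcp.srcport, tcp.dstport, udp.srcport, udp.dstport. The function name flows_from_lines is made up, and treating each direction as its own flow is a design choice, not something the question specifies.

```python
from collections import defaultdict

def flows_from_lines(lines):
    """Group TShark field lines by 5-tuple.

    Returns {(src, sport, dst, dport, proto): [(size, inter_arrival), ...]}.
    The first packet of a flow has inter_arrival None.
    """
    flows = defaultdict(list)
    last_seen = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            continue  # unexpected line, skip it
        ts, length, src, dst, proto, tsp, tdp, usp, udp_dp = fields
        # TCP packets fill the tcp.* columns, UDP packets the udp.* columns
        sport, dport = (tsp, tdp) if tsp else (usp, udp_dp)
        if not (src and dst and proto and sport and dport):
            continue  # not TCP or UDP over IPv4
        key = (src, int(sport), dst, int(dport), int(proto))
        now = float(ts)
        gap = now - last_seen[key] if key in last_seen else None
        last_seen[key] = now
        flows[key].append((int(length), gap))
    return flows
```

The lines could come from a pipe, for example by running tshark -r capture.pcap -T fields -e frame.time_epoch -e frame.len -e ip.src -e ip.dst -e ip.proto -e tcp.srcport -e tcp.dstport -e udp.srcport -e udp.dstport (capture.pcap being a placeholder file name) and passing sys.stdin to the function.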

Related

Extract GET request contents from Scapy packet

We are parsing pcap files that are created via the tcpdump command. Inside these pcap files we are attempting to extract the GET request information in the Raw field and print it in a readable format.
pkts = rdpcap(filename)
for pkt in pkts:
    if Raw in pkt:
        raw_test = pkt[Raw].load
        if "GET" in raw_test:
            pass  # do stuff
The resulting text of raw_test comes out looking like this:
▒פ▒▒▒▒▒▒2▒nk▒N▒▒bEr▒▒(|▒▒▒▒Ǫ=▒▒Ih▒H+%▒2.▒L[▒▒▒sl▒E▒▒▒k6▒]=މf▒d▒O▒hB{6s▒▒▒7O2!PCG&▒A.4I▒耓▒X▒▒▒W]▒▒M5#▒▒▒vK▒#Ċ▒ ▒▒▒m]Zb_▒8▒▒▒nb~
]▒h▒6▒.̠▒49ؾG?▒▒▒4▒Ӹ▒▒G▒▒́G▒:Y▒▒▒▒.▒8▒▒d▒i4▒JAC)▒▒AO▒k▒z-▒▒S30▒X?▒▒W5B▒yW▒m▒▒▒/ƈ:G▒▒▒E▒▒<▒▒▒m▒]▒▒▒▒t▒:▒▒▒Ŕ▒W▒▒D▒E▒▒▒▒▒࿄▒▒zZ▒▒x▒]▒▒{{▒▒u▒){▒▒o▒▒G▒F▒▒▒▒▒v
▒▒▒b.
We have also tried formatting it via pkt.sprintf("{Raw:%Raw.load%}\n"), but that has yielded the same output.
P.S. Please do not link us to other related stack posts/questions as we have come across many of them already, and none of them seem to fix our problem.
Thank you in advance, any help is greatly appreciated!
Please try this; I assume that HTTP is targeted at port 80:
if TCP in pkt and pkt[TCP].dport == 80 and Raw in pkt \
        and pkt[Raw].load.startswith(b"GET"):
    print(pkt[Raw].load)
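Once a payload has passed that check, the readable part can be pulled out without Scapy at all. The helper below is only a sketch (http_get_request_line is a made-up name): it returns the request line of a plain-text GET and returns None for anything else, which is what payloads like the unreadable output shown in the question would produce if they are encrypted or compressed rather than plain HTTP.

```python
def http_get_request_line(payload):
    """Return the first line of a plain-text HTTP GET request, or None.

    payload is the raw TCP payload as bytes (for example pkt[Raw].load).
    """
    if not payload.startswith(b"GET "):
        return None  # not a GET request, or not plain-text HTTP
    line, _, _ = payload.partition(b"\r\n")
    return line.decode("ascii", errors="replace")
```

For example, http_get_request_line(b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n") gives the string GET /index.html HTTP/1.1.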

Simple libtorrent Python client

I tried creating a simple libtorrent Python client (for a magnet URI) and failed: the program never continues past "downloading metadata".
If you could help me write a simple client, it would be amazing.
P.S. When I choose a save path, is the save path the folder which I want my data to be saved in? or the path for the data itself.
(I used a code someone posted here)
import libtorrent as lt
import time

ses = lt.session()
ses.listen_on(6881, 6891)
params = {
    'save_path': '/home/downloads/',
    'storage_mode': lt.storage_mode_t(2),
    'paused': False,
    'auto_managed': True,
    'duplicate_is_error': True}
link = "magnet:?xt=urn:btih:4MR6HU7SIHXAXQQFXFJTNLTYSREDR5EI&tr=http://tracker.vodo.net:6970/announce"
handle = lt.add_magnet_uri(ses, link, params)
ses.start_dht()

print('downloading metadata...')
while (not handle.has_metadata()):
    time.sleep(1)
print('got metadata, starting torrent download...')
while (handle.status().state != lt.torrent_status.seeding):
    s = handle.status()
    state_str = ['queued', 'checking', 'downloading metadata', \
                 'downloading', 'finished', 'seeding', 'allocating']
    # the last placeholder was a truncated '%.3'; it needs a conversion type
    print('%.2f%% complete (down: %.1f kB/s up: %.1f kB/s peers: %d) %s %.3f' % \
        (s.progress * 100, s.download_rate / 1000, s.upload_rate / 1000, \
        s.num_peers, state_str[s.state], s.total_download / 1000000))
    time.sleep(5)
What happens is that the first while loop becomes infinite because the state does not change.
You have to add an s = handle.status() call inside it; once the metadata is available, the status changes and the loop stops. Alternatively, move the first while inside the second one so that the same thing happens.
Yes, the save path you specify is the one that the torrents will be downloaded to.
As for the metadata downloading part, I would add the following extensions first:
ses.add_extension(lt.create_metadata_plugin)
ses.add_extension(lt.create_ut_metadata_plugin)
Second, I would add a DHT bootstrap node:
ses.add_dht_router("router.bittorrent.com", 6881)
Finally, I would begin debugging the application by seeing if my network interface is binding or if any other errors come up (my experience with BitTorrent download problems, in general, is that they are network related). To get an idea of what's happening I would use libtorrent-rasterbar's alert system:
ses.set_alert_mask(lt.alert.category_t.all_categories)
And make a thread (with the following code) to collect the alerts and display them:
while True:
    ses.wait_for_alert(500)
    alert = ses.pop_alert()  # same session object as above
    if not alert:
        continue
    print("[%s] %s" % (type(alert), str(alert)))
Even with all this working correctly, make sure that the torrent you are trying to download actually has peers. Even if there are a few peers, none of them may be configured correctly or support metadata exchange (exchanging metadata is not a standard BitTorrent feature). Try to load a torrent file (which doesn't require downloading metadata) and see if you can download it successfully, to rule out some network issues.

Python2.7 --Reconstruct packets to print html

Using wireshark, I could see the html page I was requesting (segment reconstruction). I was not able to use pyshark to do this task, so I turned around to scapy. Using scapy and sniffing wlan0, I am able to print request headers with this code:
from scapy.all import *

def http_header(packet):
    http_packet = str(packet)
    # str.find() returns -1 (truthy) when 'GET' is absent, so test membership instead
    if 'GET' in http_packet:
        return GET_print(packet)

def GET_print(packet1):
    ret = packet1.sprintf("{Raw:%Raw.load%}\n")
    return ret

sniff(iface='wlan0', prn=http_header, filter="tcp port 80")
Now, I wish to be able to reconstruct the full request to find images and print the html page requested.
What you are searching for is:
IP packet defragmentation
TCP stream reassembly
scapy
Scapy provides best-effort IP defragmentation via defragment([list_of_packets,]) but does not provide generic TCP stream reassembly. Anyway, here's a very basic TCPStreamReassembler that may work for your use case, but it operates on the invalid assumption that a consecutive stream will be split into segments of the maximum segment size (MSS). It concatenates segments whose size equals the MSS until it finds a segment smaller than the MSS, and then spits out a reassembled TCP packet with the full payload.
Note that TCP stream reassembly is not trivial, as you have to take care of retransmissions, ordering, ACKs, ...
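The reassembler code itself is not reproduced in this thread, so the following is only a separate, minimal sketch of the MSS heuristic described above, written against plain byte strings instead of Scapy packets. The name reassemble_by_mss is made up, and the sketch shares the same invalid assumption: it ignores sequence numbers, retransmissions and reordering.

```python
def reassemble_by_mss(payloads, mss):
    """Join consecutive TCP payloads using the MSS heuristic.

    A payload of exactly mss bytes is assumed to continue the current
    message; a shorter payload is assumed to end it.
    """
    messages = []
    current = []
    for payload in payloads:
        current.append(payload)
        if len(payload) < mss:
            messages.append(b"".join(current))
            current = []
    if current:
        # stream ended on a full-size segment; keep what was collected
        messages.append(b"".join(current))
    return messages
```

With an MSS of 4, the payloads b"aaaa", b"bbbb", b"cc", b"dd" come out as two messages: b"aaaabbbbcc" and b"dd". A message whose length happens to be an exact multiple of the MSS gets merged with the next one, which is one reason the heuristic is unreliable.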
tshark
According to this answer, tshark has a command-line option equivalent to Wireshark's "Follow TCP Stream" that takes a pcap and creates multiple output files for all the TCP sessions/"conversations".
Since it looks like pyshark is only an interface to the tshark binary, it should be pretty straightforward to implement that functionality if it is not already implemented.
With Scapy 2.4.3+, you can use
sniff([...], session=TCPSession)
to reconstruct the HTTP packets

decoding internet packets payload in python

I have used Scapy to sniff internet packets from my computer. Knowing that they are not encrypted, how can I decode the data being sent so that it comes out as clear text, something like Wireshark does? I would like a code example for it.
I do not want to use Wireshark; I want to code this myself for learning.
I used the following simple script to capture the packets :
from scapy.all import *

def callback(pkt):
    print(pkt.summary())
    pkt.show()  # show() prints the dissection itself and returns None

sniff(store=0, prn=callback)
It depends on the application that sends the traffic. If it sends the data unencrypted and in plain text (ASCII), you can access and display it using the load attribute of the packet. For example:
def callback(pkt):
    if Raw in pkt:  # packets without a payload have no load to print
        print(pkt.load)
If the data is not plain text you need to know how the application is encoding the data and decode it. If you're looking for more similar output to that of wireshark you can try with hexdump(pkt).
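As an illustration of displaying a payload as text, here is a small sketch that does not depend on Scapy. The helper name to_printable is made up; it mimics the right-hand column of a hex dump by keeping printable ASCII and replacing every other byte with a placeholder.

```python
def to_printable(data, placeholder="."):
    """Render bytes as text, replacing non-printable bytes with a placeholder."""
    return "".join(
        chr(b) if 32 <= b < 127 else placeholder
        for b in bytearray(data)  # bytearray yields ints on Python 2 and 3
    )
```

Inside the callback it could be used as print(to_printable(pkt.load)), so that binary bytes in the payload do not garble the terminal.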

IPv6 destination options header

I'm working on a software-defined networking research project, and what I need is to make a simple UDP server that puts a data tag into the destination options field (IPv6) of the UDP packet. I was expecting to do this with either the sendmsg() and recvmsg() commands or with setsockopt() and getsockopt(). However, Python 2.7 doesn't have sendmsg() or recvmsg(), and while I can get setsockopt() to correctly load a tag into the packet (I see it in Wireshark), the getsockopt() command just returns a zero, even if the header is there.
#Python 2.7 client
#This code does put the dest opts header onto the packet correctly
#dst_header is a packed binary string (construction details irrelevant--
# it appears correctly formatted and parsed in Wireshark)
addr = ("::", 5000, 0, 0)
s = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM)
s.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_DSTOPTS, dst_header)
s.sendto('This is my message ', addr)
#Python 2.7 server
addr = ("::", 5000, 0, 0)
s = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM)
s.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_RECVDSTOPTS, 1)
s.bind(addr)
data, remote_address = s.recvfrom(MAX)
header_data = s.getsockopt(socket.IPPROTO_IPV6, socket.IPPROTO_DSTOPTS, 1024)
I also tried this in Python 3.4, which does have sendmsg() and recvmsg(), but I just get an error message of "OSError: [Errno 22]: Invalid argument", even though I'm passing it (apparently) correct types:
s.sendmsg(["This is my message"], (socket.IPPROTO_IPV6, socket.IPV6_DSTOPTS, dst_header), 0, addr) #dst_header is same string as for 2.7 version
It looks like 99% of the usage of sendmsg() and recvmsg() is for passing UNIX file descriptors, which isn't what I want to do. Anybody got any ideas? I thought this would be just a four or five line nothing-special program, but I'm stumped.
OK, I'm going to partially answer my own question here, on the off chance that a search engine will bring somebody here with the same issues as I had.
I got the Python 3.4 code working. The problem was not the header, it was the message body. Specifically, both the message body and the header options value fields must be bytes (or bytearray) objects, stored in an iterable container (here, a list). By passing it ["This is my message"] I was sending in a string, not a bytes object; Python let it go, but the OS couldn't cope with that.
You might say I was "byted" by the changes in the handling of strings in Python 3.X...
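For anyone landing here, this is a hedged sketch of what the corrected Python 3 call shape looks like, based on the explanation above. The helper names as_bytes and send_with_dstopts are made up for illustration, dst_header is assumed to be an already packed destination options header as in the question, and on some systems sending this option may require elevated privileges.

```python
import socket

def as_bytes(message, encoding="utf-8"):
    """Return message as bytes, encoding it first if it is a str."""
    if isinstance(message, (bytes, bytearray)):
        return bytes(message)
    return message.encode(encoding)

def send_with_dstopts(sock, message, dst_header, addr):
    """Send message over an AF_INET6 UDP socket with dst_header as ancillary data."""
    if not isinstance(dst_header, (bytes, bytearray)):
        raise TypeError("dst_header must be bytes, not %s" % type(dst_header).__name__)
    buffers = [as_bytes(message)]  # iterable of bytes-like objects
    # ancillary data is an iterable of (level, type, data) tuples
    ancdata = [(socket.IPPROTO_IPV6, socket.IPV6_DSTOPTS, bytes(dst_header))]
    return sock.sendmsg(buffers, ancdata, 0, addr)
```

The two differences from the failing call are that the message is bytes rather than str, and that the ancillary data is a list containing one tuple rather than a bare tuple.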