Extract GET request contents from Scapy packet - python-2.7

We are parsing pcap files that are created via the tcpdump command. Inside these pcap files we are attempting to extract the GET request information in the Raw field and print it in a readable format.
from scapy.all import rdpcap, Raw
pkts = rdpcap(filename)
for pkt in pkts:
    if Raw in pkt:
        raw_test = pkt[Raw].load
        if "GET" in raw_test:
            pass  # do stuff
The resulting text of raw_test comes out looking like this:
▒פ▒▒▒▒▒▒2▒nk▒N▒▒bEr▒▒(|▒▒▒▒Ǫ=▒▒Ih▒H+%▒2.▒L[▒▒▒sl▒E▒▒▒k6▒]=މf▒d▒O▒hB{6s▒▒▒7O2!PCG&▒A.4I▒耓▒X▒▒▒W]▒▒M5#▒▒▒vK▒#Ċ▒ ▒▒▒m]Zb_▒8▒▒▒nb~
]▒h▒6▒.̠▒49ؾG?▒▒▒4▒Ӹ▒▒G▒▒́G▒:Y▒▒▒▒.▒8▒▒d▒i4▒JAC)▒▒AO▒k▒z-▒▒S30▒X?▒▒W5B▒yW▒m▒▒▒/ƈ:G▒▒▒E▒▒<▒▒▒m▒]▒▒▒▒t▒:▒▒▒Ŕ▒W▒▒D▒E▒▒▒▒▒࿄▒▒zZ▒▒x▒]▒▒{{▒▒u▒){▒▒o▒▒G▒F▒▒▒▒▒v
▒▒▒b.
We have also tried formatting it via pkt.sprintf("{Raw:%Raw.load%}\n") but that has yielded the same output.
P.S. Please do not link us to other related stack posts/questions as we have come across many of them already, and none of them seem to fix our problem.
Thank you in advance, any help is greatly appreciated!

Please try this. I assume the HTTP traffic is going to port 80:
if TCP in pkt and Raw in pkt and pkt[TCP].dport == 80 \
        and pkt[Raw].load.startswith("GET"):
    print pkt[Raw].load
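The garbled output in the question suggests that some payloads are not plain-text HTTP at all (binary or encrypted data, for instance), in which case a substring test for "GET" can match random bytes by chance. Here is a minimal sketch of a stricter check, assuming the payload is handed over as bytes (for example pkt[Raw].load). The helper name parse_get_request is hypothetical and not part of Scapy; it uses only the standard library.

```python
def parse_get_request(payload):
    """Return (path, headers) if payload starts an HTTP GET request, else None.

    `payload` is the raw TCP payload as bytes, e.g. pkt[Raw].load in Scapy.
    """
    # A real request starts with the method; a substring test such as
    # `"GET" in payload` can also match random bytes in binary data.
    if not payload.startswith(b"GET "):
        return None
    # Headers end at the first blank line; ignore any body that follows.
    head = payload.split(b"\r\n\r\n", 1)[0]
    lines = head.decode("latin-1").split("\r\n")
    parts = lines[0].split(" ")
    # Request line must look like: GET <path> HTTP/<version>
    if len(parts) != 3 or not parts[2].startswith("HTTP/"):
        return None
    headers = {}
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    return parts[1], headers


if __name__ == "__main__":
    sample = b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n"
    print(parse_get_request(sample))
    print(parse_get_request(b"\x16\x03\x01GET\x00\xff"))
```

Payloads that fail this check are probably not requests you can print in readable form, whichever layer you read them from.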

Related

c/c++ pcap filter expression for ARP reply packets

I am trying to create a pcap filter that matches ARP replies only. In Wireshark I use
arp.opcode==2
and it works perfectly. But when I pass it to the pcap_compile function, it fails with a syntax error. I also tried these variants:
arp.opcode = 2
arp.opcode 2
arp opcode 2
arp.reply
arp reply
and nothing seems to work. I tried to google it, but with no success. Is it even possible to filter those specific packets?
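pcap_compile expects pcap capture-filter syntax, not Wireshark's display-filter syntax. I suspect this should work, based on the packet structure from Wikipedia (the 2-byte opcode sits at offset 6 of the ARP header):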
arp [6:2] = 2
That's also suggested by this answer: https://stackoverflow.com/a/40199540/212870
(It's easier to look up once you figure out the answer, unfortunately.)
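To illustrate what that expression tests, here is a small sketch that applies the same check to the raw bytes of an ARP header: read two bytes at offset 6 in network byte order and compare them with 2. The function and constant names are made up for this example, and build_arp_header only fabricates test data with zeroed addresses.

```python
import struct

ARP_OPCODE_OFFSET = 6   # bytes from the start of the ARP header
ARP_REPLY = 2


def is_arp_reply(arp_header):
    """Mimic the filter `arp[6:2] = 2` on the raw bytes of an ARP header."""
    if len(arp_header) < ARP_OPCODE_OFFSET + 2:
        return False  # truncated header: the field is not available
    # "!H" = one big-endian (network order) unsigned 16-bit integer
    (opcode,) = struct.unpack_from("!H", arp_header, ARP_OPCODE_OFFSET)
    return opcode == ARP_REPLY


def build_arp_header(opcode):
    """Hypothetical helper: an Ethernet/IPv4 ARP header with zeroed addresses."""
    # htype=1 (Ethernet), ptype=0x0800 (IPv4), hlen=6, plen=4, then the opcode
    return struct.pack("!HHBBH", 1, 0x0800, 6, 4, opcode) + b"\x00" * 20


if __name__ == "__main__":
    print(is_arp_reply(build_arp_header(1)))  # opcode 1 is a request
    print(is_arp_reply(build_arp_header(2)))  # opcode 2 is a reply
```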

Python2.7 --Reconstruct packets to print html

Using Wireshark, I could see the HTML page I was requesting (segment reconstruction). I was not able to use pyshark for this task, so I turned to scapy. Using scapy and sniffing wlan0, I am able to print request headers with this code:
from scapy.all import *
def http_header(packet):
    http_packet = str(packet)
    # str.find() returns -1 (truthy) when 'GET' is absent, so test membership instead
    if 'GET' in http_packet:
        return GET_print(packet)
def GET_print(packet1):
    ret = packet1.sprintf("{Raw:%Raw.load%}\n")
    return ret
sniff(iface='wlan0', prn=http_header, filter="tcp port 80")
Now, I wish to be able to reconstruct the full request to find images and print the html page requested.
What you are looking for is:
IP packet defragmentation
TCP stream reassembly
see here
scapy
Scapy provides best-effort IP defragmentation via defragment([list_of_packets,]) but does not provide generic TCP stream reassembly. Anyway, here's a very basic TCPStreamReassembler that may work for your use case, but it operates on the invalid assumption that a consecutive stream will be split into segments of the maximum segment size (MSS). It concatenates segments whose size equals the MSS until it finds a segment smaller than the MSS, and then emits a reassembled TCP packet with the full payload.
Note: TCP stream reassembly is not trivial, as you have to take care of retransmissions, ordering, ACKs, ...
tshark
According to this answer, tshark has a command-line option equivalent to Wireshark's "follow tcp stream" that takes a pcap and creates multiple output files for all the TCP sessions/"conversations".
Since it looks like pyshark is only an interface to the tshark binary, it should be pretty straightforward to implement that functionality if it is not already implemented.
With Scapy 2.4.3+, you can use
sniff([...], session=TCPSession)
to reconstruct the HTTP packets.
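For anyone who wants to see the ordering and retransmission handling spelled out, here is a rough sketch of reassembling one direction of a stream from (sequence number, payload) pairs. It is not the TCPStreamReassembler mentioned above and not Scapy's TCPSession; reassemble_stream is a hypothetical helper that assumes you already extracted the pairs (for example from pkt[TCP].seq and the TCP payload bytes) and know the initial sequence number. It ignores 32-bit sequence wraparound and stops at the first gap.

```python
def reassemble_stream(segments, initial_seq):
    """Rebuild one direction of a TCP stream.

    `segments` is an iterable of (seq, payload_bytes) pairs and `initial_seq`
    is the sequence number of the first payload byte. Returns the contiguous
    bytes that could be reassembled; data after a gap is left out.
    """
    stream = bytearray()
    next_seq = initial_seq
    # Sorting by sequence number handles out-of-order arrival.
    for seq, payload in sorted(segments, key=lambda s: s[0]):
        if not payload:
            continue  # pure ACKs carry no data
        end = seq + len(payload)
        if end <= next_seq:
            continue  # retransmission of bytes we already have
        if seq > next_seq:
            break  # gap: a segment is missing from the capture
        # Partial overlap: keep only the bytes we have not seen yet.
        stream.extend(payload[next_seq - seq:])
        next_seq = end
    return bytes(stream)


if __name__ == "__main__":
    segs = [(105, b"world"), (100, b"hello"), (100, b"hello")]
    print(reassemble_stream(segs, 100))
```

A real implementation would also track both directions per connection and handle SYN/FIN, which is what the tshark and TCPSession routes give you for free.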

decoding internet packets payload in python

I have used scapy to sniff internet packets from my computer. Knowing that they are not encrypted, how can I decode the data being sent so that it comes out as clear text, something like Wireshark does? I would like a code example for it.
I do not want to use Wireshark; I want to code this myself for learning.
I used the following simple script to capture the packets:
from scapy.all import *
def callback(pkt):
    print pkt.summary()
    pkt.show()  # show() prints the dissection itself and returns None
sniff(store=0, prn=callback)
It depends on the application that sends the traffic. If it sends the data unencrypted and in plain text (ASCII), you can access and display it using the load attribute of the packet. For example:
def callback(pkt):
    if Raw in pkt:  # only packets that carry a payload have a load attribute
        print pkt[Raw].load
If the data is not plain text, you need to know how the application encodes the data and decode it accordingly. If you're looking for output closer to that of Wireshark, you can try hexdump(pkt).
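Since the goal is to code this yourself for learning, here is a rough sketch of a Wireshark-like view written by hand: offset, hex bytes, and printable ASCII with dots for everything else. hex_ascii_dump is a hypothetical helper, not a Scapy function, and it takes the payload as bytes.

```python
def hex_ascii_dump(data, width=16):
    """Format bytes as offset, hex, and printable-ASCII columns."""
    lines = []
    data = bytearray(data)  # yields integers when iterated on Python 2 and 3
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join("%02x" % b for b in chunk)
        # Replace anything outside printable ASCII with a dot.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        # Pad the hex column so the text column lines up on the last row.
        lines.append("%04x  %-*s  %s" % (offset, width * 3 - 1, hex_part, text_part))
    return "\n".join(lines)


if __name__ == "__main__":
    print(hex_ascii_dump(b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n"))
```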

Code to analyze pcap file

I am trying to analyse a file containing packets captured using tcpdump. I first want to categorize the packets into flows using the 5-tuple. Then I need to get the size and inter-arrival time of each packet in each flow. I tried the Conversations list in Wireshark, but it gives only the number of packets in the flow, not information about each packet in the flow. Can anyone suggest code (C++ or shell script) that can do the job? Thank you
UmNyobe,
If you haven't heard of Scapy yet, I believe it would be a near-perfect fit for what you are trying to do. For example, I wrote this little snippet to parse a pcap file and give me something like what you are talking about using Scapy.
#!/usr/bin/python -tt
from scapy.all import *
import sys
from datetime import datetime
'''Parse PCAP files into easy to read NETFLOW like output\n
Usage:\n
python cap2netflow.py <[ pcap filename or -l ]>\n
-l is live capture switch\n
ICMP packets print as source ip, type --> dest ip, code'''
def parse_netflow(pkt):
    # grabs 'netflow-esque' fields from packets in a PCAP file
    if IP not in pkt:
        return  # skip non-IP packets (e.g. ARP) so no stale protocol is reused
    # map the IP protocol number to the matching scapy layer name
    proto = {6: 'TCP', 17: 'UDP', 1: 'ICMP'}.get(pkt[IP].proto)
    snifftime = datetime.fromtimestamp(float(pkt.time)).strftime('%Y-%m-%d %H:%M:%S').split(' ')[1]
    if proto == 'TCP' or proto == 'UDP':
        print(' '.join([snifftime, proto.rjust(4, ' '), str(pkt.getlayer(IP).src).rjust(15, ' '), str(pkt.getlayer(proto).sport).rjust(5, ' '), '-->', str(pkt.getlayer(IP).dst).rjust(15, ' '), str(pkt.getlayer(proto).dport).rjust(5, ' ')]))
    elif proto == 'ICMP':
        print(' '.join([snifftime, 'ICMP'.rjust(4, ' '), str(pkt.getlayer(IP).src).rjust(15, ' '), ('t: ' + str(pkt.getlayer(ICMP).type)).rjust(5, ' '), '-->', str(pkt.getlayer(IP).dst).rjust(15, ' '), ('c: ' + str(pkt.getlayer(ICMP).code)).rjust(5, ' ')]))
if '-l' in sys.argv:
    sniff(prn=parse_netflow)
else:
    pkts = rdpcap(sys.argv[1])
    print(' '.join(['Date: ', datetime.fromtimestamp(float(pkts[0].time)).strftime('%Y-%m-%d %H:%M:%S').split(' ')[0]]))
    for pkt in pkts:
        parse_netflow(pkt)
Install Python and Scapy, then use this to get you started. Let me know if you need any assistance figuring it all out. If you know C++, chances are this will already make a lot of sense to you.
Get Scapy here
http://www.secdev.org/projects/scapy/
There are tons of links on this page to helpful tutorials. Keep in mind that Scapy does a lot more, but home in on the areas that talk about pcap parsing.
I hope this helps!
dc
I worked on a library to analyze tcpdump captures, but it was for a business, so I cannot just give it to you. If you don't find what you are looking for, then my answer can help. A tcpdump capture is just nested network data, like Matryoshka dolls, where the pcap layer is added by tcpdump.
If you only want to work on the captures, the format of a dump is specified in Libpcap File Format. To get the size and time of arrival of each packet you need to process the dump using this specification.
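As a rough illustration of processing the dump by hand, the sketch below walks the classic libpcap layout: a 24-byte global header followed by records, each with a 16-byte header holding the timestamp seconds, microseconds, captured length, and original length. It assumes a microsecond-resolution file (not pcapng, not the nanosecond variant), and read_pcap_records is a name invented for this example.

```python
import struct


def read_pcap_records(data):
    """Yield (timestamp, captured_len, original_len) for each packet record.

    `data` holds the bytes of a classic libpcap file (not pcapng).
    """
    magic = data[:4]
    # The magic number 0xa1b2c3d4 tells us the byte order used by the writer.
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"
    else:
        raise ValueError("not a microsecond-resolution libpcap file")
    offset = 24  # size of the global header
    while offset + 16 <= len(data):
        ts_sec, ts_usec, incl_len, orig_len = struct.unpack_from(
            endian + "IIII", data, offset)
        yield ts_sec + ts_usec / 1e6, incl_len, orig_len
        offset += 16 + incl_len  # skip the record header and the packet bytes


if __name__ == "__main__":
    import sys
    if len(sys.argv) > 1:
        with open(sys.argv[1], "rb") as f:
            for ts, caplen, origlen in read_pcap_records(f.read()):
                print("%.6f %d %d" % (ts, caplen, origlen))
```

The difference between consecutive timestamps gives the inter-arrival time, and the original length is the on-the-wire packet size even when the capture was truncated.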
If you have to go deeper in the analysis, these are the layers, in order:
the link layer
the internet layer
the transport layer
the application layer
Each layer has a header definition. So you need to find which protocol stack your pcap data contains and to parse the header to get information.
What are the members of the 5-tuple? If the flows are TCP or UDP, the source and destination IP addresses and port numbers, plus, perhaps, a number to distinguish multiple flows over time between the two endpoints would work; for SCTP, it would be similar, although if a flow is a stream, you might need more.
If the members of the 5-tuple are all "named fields" in Wireshark, you could use TShark with the -T fields option and use the -e option to specify which fields to print. Select a field with the time stamp (frame.time_epoch gives you the time as seconds and fractions of a second since the UN*X epoch), a field with the appropriate size (frame.len gives you the raw number of bytes in the link-layer packet PLUS any meta-data such as a radiotap header for 802.11 radio information), and the other fields, then feed the output of TShark to a script or program that does the processing you want. That lets TShark do the processing of the protocol layers, so that your program only needs to process the resulting data.
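Here is a sketch of the "script that does the processing" part. It assumes TShark was run with -T fields and these -e fields in this exact order: frame.time_epoch, frame.len, ip.src, ip.dst, ip.proto, tcp.srcport, tcp.dstport, udp.srcport, udp.dstport, giving one tab-separated line per packet. That field selection and the group_flows name are my own choices for the example; adjust them to your 5-tuple.

```python
from collections import defaultdict


def group_flows(lines):
    """Group tab-separated TShark field lines into flows keyed by 5-tuple.

    Returns {(src, sport, dst, dport, proto): [(size, inter_arrival), ...]}.
    Unused port columns are expected to be empty strings.
    """
    flows = defaultdict(list)
    last_seen = {}
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or not cols[2]:
            continue  # malformed line or non-IP packet
        ts, size = float(cols[0]), int(cols[1])
        sport = cols[5] or cols[7]  # TCP port if present, else UDP port
        dport = cols[6] or cols[8]
        if not sport:
            continue  # neither TCP nor UDP
        key = (cols[2], sport, cols[3], dport, cols[4])
        # The first packet of a flow has no predecessor, so its gap is None.
        gap = ts - last_seen[key] if key in last_seen else None
        last_seen[key] = ts
        flows[key].append((size, gap))
    return dict(flows)


if __name__ == "__main__":
    import sys
    for key, packets in group_flows(sys.stdin).items():
        print(key)
        for size, gap in packets:
            print("  size=%d gap=%s" % (size, gap))
```

Each direction is treated as its own flow here; sort the two endpoints inside the key if you want both directions merged.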

WinPCap Data Getting Truncated

While working on parsing ARP packets, I found this nice problem.
When receiving an ARP packet, I was parsing the target's IP address.
I have c0 a8 in my hex dump, but after that it ends. I am missing data! I see the data in Wireshark, but I am not getting the data through WinPcap.
I have yet to run into this issue before. Any ideas SO? So far no memory access errors though. Probably just luck. :x
EDIT:
My main loop for processing packets is from the example pktdump_ex.
Here is the while line:
while((res = pcap_next_ex( fp, &header, &pkt_data)) >= 0)
After that is executed, the snaplen is 2b.
As noted in the comment, this smells like a faulty snaplen configuration. If you look at the WinPcap API docs for pcap_open(), they state:
snaplen: length of the packet that has to be retained. For each packet received by the filter, only the first 'snaplen' bytes are stored in the buffer and passed to the user application. For instance, snaplen equal to 100 means that only the first 100 bytes of each packet are stored.
That is the explanation for the second parameter of pcap_open(). Unless you provide some more detailed code snippets to work with, this is the closest to an answer we will get.