decompressing IMAP deflated message

decompressing IMAP deflated message - compression

I have an issue trying to decompress an imap message compressed using deflate method. The things I've tryed so far were isolating one of the directions of an IMAP conversation (using wireshark's follow tcp function) and saving the message data in an raw format that I hope it contains only the deflated message part. I then found some programs like tinf (1st and 3rd example) and miniz (tgunzip example) and tryed to inflate back that file, but with no succes.
I am missing something? Thank you in advance.
tinf - http://www.ibsensoftware.com/download.html
Miniz - https://code.google.com/archive/p/miniz/source/default/source

Try piping that raw data to:
perl -MCompress::Zlib -pe 'BEGIN{$i = inflateInit(-WindowBits => -15)}
$_=$i->inflate($_)'
The important part is the -WindowBits => -15 that changes the expected format into a raw one without adler checksum.
(that's derived from the dovecot source, works for me on Thunderbird to gmail network capture).
From RFC4978 that specifies IMAP compression (emphasis mine):
When using the zlib library (see RFC1951), the functions
deflateInit2(), deflate(), inflateInit2(), and inflate() suffice to
implement this extension. The windowBits value must be in the range
-8 to -15, or else deflateInit2() uses the wrong format.
deflateParams() can be used to improve compression rate and resource
use. The Z_FULL_FLUSH argument to deflate() can be used to clear the
dictionary (the receiving peer does not need to do anything).

Related

Some questions about CkZip

I'm evaluating Chilkat and working with CkZip component to see if it feets our requirements. I have some questions derived of my tests:
When I put an event callback object, in: void FileZipped(const char
*path, _int64 fileSize, _int64 compressedSize, bool *abort);
I always get the same value for fileSize and compressedSize
(compression level was put to 9 and algo to deflate) Is it
intentionally / normal? Maybe it's a bug...
It seems that ProgressInfo event is received for the whole zip, so
when compressing a single large file and it tooks a bit of time, we
have no feedback about compression progress (ToBeZipped and
FileZipped received, with a difference of minutes).
I see the method AppendCompressed. So compressing a file with
CkCompression I can obtain compressed data and apply to
AppendCompressed directly. But documentation says CkCompression
handles "ppmd", "deflate", "zlib", "bzip2", or "lzw", and
AppendCompressed says that data should be unencrypted deflate data.
When we are building zipx files with lzma algo, AppendCompressed
data will took deflate compressed data and recompress with lzma? or
AppendCompressed data only takes deflate data so we cannot make a
lzma zipx file using AppendCompressed?.
Thanks in advance!
PD: Sorry, had to post here because chilkat forum says "This forum is closed. Post instead to stackoverflow.com with tag "chilkat""

Due lack of support/answer from Chilkat we have chosen to use another library that for now conforms to what we want/expect and does not have the mentioned failures.
Thanks!

Python2.7 --Reconstruct packets to print html

Using wireshark, I could see the html page I was requesting (segment reconstruction). I was not able to use pyshark to do this task, so I turned around to scapy. Using scapy and sniffing wlan0, I am able to print request headers with this code:
from scapy.all import *
def http_header(packet):
http_packet=str(packet)
if http_packet.find('GET'):
return GET_print(packet)
def GET_print(packet1):
ret = packet1.sprintf("{Raw:%Raw.load%}\n")
return ret
sniff(iface='wlan0', prn=http_header, filter="tcp port 80")
Now, I wish to be able to reconstruct the full request to find images and print the html page requested.

What you are searching for is
IP Packet defragmentation
TCP Stream reassembly
see here
scapy
provides best effort ip.defragmentation via defragment([list_of_packets,]) but does not provide generic tcp stream reassembly. Anyway, here's a very basic TCPStreamReassembler that may work for your usecase but operates on the invalid assumption that a consecutive stream will be split into segments of the max segment size (mss). It will concat segments == mss until a segment < mss is found. it will then spit out a reassembled TCP packet with the full payload.
Note TCP Stream Reassembly is not trivial as you have to take care of Retransmissions, Ordering, ACKs, ...
tshark
according to this answer tshark has a command-line option equivalent to wiresharks "follow tcp stream" that takes a pcap and creates multiple output files for all the tcp sessions/"conversations"
since it looks like pyshark is only an interface to the tshark binary it should be pretty straight forward to implement that functionality if it is not already implemented.

With Scapy 2.4.3+, you can use
sniff([...], session=TCPSession)
to reconstruct the HTTP packets

Can a false sync word be found in the payload of an MPEG-1/MPEG-2 frame?

I know I can find other answers about this on SO, but I want clarifications from somebody who really knows MPEG-1/MPEG-2 (or MP3, obviously).
The start of an MPEG-1/2 frame is 12 set bits starting at a byte boundary, so bytes ff f*, where * is any nibble. Those 12 bits are called a sync word. This is a useful characteristic to find the start of a frame in any MPEG-1/2 stream.
My first question is: formally, can a false sync word be found or not in the payload of an MPEG-1/2 frame, outside its header?
If so, here's my second question: why does the sync word mechanism even exist then? If we cannot make sure that we found a new frame when reading fff, what is the purpose of this sync word?
Please do not even consider ID3 in your answer; I already know about sync words that can be found in ID3v2 payloads, but that's well documented.

I worked on MPEG-2 streams, more precisely Transport Streams (TS): I guess we can find similarities.
A TS is composed of Transport Packets, which have a header, starting with a sync byte 0x47.
We also can found 0x47 within the payload of the TP, but we know that it is not a sync byte because it is not aligned (TP have a fixed size of 188 bytes).
The sync word gives an entry point to someone that looks at the stream, and allows a program to synchronize his process with the stream, hence the name.
It also allows a fast browsing and parsing of the stream: in a TS you can jump from a packet to another (inspect header, check sync byte, skip 188 bytes and so on)
Finally it is a safety measure that helps you to spot errors (in the stream during transmission for example or in the process if a bug caused a bad alignment)
These argument are about TS but I think the same goes with your case : finding a sync word within a payload should not be an issue because you should always able to distinguish payload and header, most of the time with a length information (either because the size is fixed like in TP or because you have a TLV format).

can a false sync word be found or not in the payload of an MPEG-1/2
frame, outside its header?
According to this, "frame sync can be easily (and very frequently) found in any binary file." See the section titled "MPEG Audio Frame Header"
I confirmed this with an .mp3 song that I chose at random (stripped of ID3 tags). It had 5193 sync words, of which only 4898 were found to be valid (using code too long to be included here).
>>> f = open('notag.mp3', 'rb')
>>> r=f.read()
>>> r.count(b'\xff\xfb')
5193
why does the sync word mechanism even exist then? If we cannot make
sure that we found a new frame when reading fff, what is the purpose
of this sync word?
We can be (relatively) sure if we are checking the rest of the frame header, and not just the sync word. There are bits following the sync which can be used to:
identify a false positive or
give you useful info
With .mp3, you have to use those useful bits to calculate the size of the frame. By skipping ahead <frame-size> bytes before looking for the next sync word, you avoid any false syncs that may be present in the payload. See the section titled "How to calculate frame length" in that same link.

parsing an XMPP stream with libxml2

I'm a beginner when it comes to libxml2, so here is my question:
I'm working at a small XMPP client. I have a stream that I receive from the network, the received buffer is fed into my Parser class, chunk by chunk, as the data is received. I may receive incomplete fragments of XML data:
<stream><presence from='user1#dom
and at the next read from socket I should get the rest:
ain.com to='hatter#wonderland.lit/'/>
The parser should report an error in this case.
I'm only interested in elements having depth 0 and depth 1, like stream and presence in my example above. I need to parse this kind of stream and for each of this elements, depth 0 or 1, create a xmlNodePtr (I have classes representing stream, presence elements that take as input a xmlNodePtr). So this means I must be able to create an xmlNodePtr from only an start element like , because the associated end element( in this case) is received only when the communication is finished.
I would like to use a pull parser.
What are the best functions to use in this case ? xmlReaderForIO, XmlReaderForMemory etc ?
Thank you !

You probably want a push parser using xmlCreatePushParserCtxt and xmlParseChunk. Even better would be to choose one of the existing open source C libraries for XMPP. For example, here is the code from libstrophe that does what you want already.

Progress indication with HTTP file download using WinHTTP

I want to implement an progress bar in my C++ windows application when downloading a file using WinHTTP. Any idea how to do this? It looks as though the WinHttpSetStatusCallback is what I want to use, but I don't see what notification to look for... or how to get the "percent downloaded"...
Help!
Thanks!

Per the docs:
WINHTTP_CALLBACK_STATUS_DATA_AVAILABLE
Data is available to be retrieved with
WinHttpReadData. The
lpvStatusInformation parameter points
to a DWORD that contains the number of
bytes of data available. The
dwStatusInformationLength parameter
itself is 4 (the size of a DWORD).
and
WINHTTP_CALLBACK_STATUS_READ_COMPLETE
Data was successfully read from the
server. The lpvStatusInformation
parameter contains a pointer to the
buffer specified in the call to
WinHttpReadData. The
dwStatusInformationLength parameter
contains the number of bytes read.
There may be other relevant notifications, but these two seem to be the key ones. Getting "percent" is not necessarily trivial because you may not know how much data you're getting (not all downloads have content-length set...); you can get the headers with:
WINHTTP_CALLBACK_STATUS_HEADERS_AVAILABLE
The response header has been received
and is available with
WinHttpQueryHeaders. The
lpvStatusInformation parameter is
NULL.
and if Content-Length IS available then the percentage can be computed by keeping track of the total number of bytes at each "data available" notification, otherwise your guess is as good as mine;-).

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

decompressing IMAP deflated message - compression

Related

Some questions about CkZip

Python2.7 --Reconstruct packets to print html

Can a false sync word be found in the payload of an MPEG-1/MPEG-2 frame?

parsing an XMPP stream with libxml2

Progress indication with HTTP file download using WinHTTP

Categories

Resources