How to truncate a JPEG 2000 filestream? - c++

I am trying to extract quality layers from a JPEG 2000 filestream, which is contained in a .j2k file for testing. I am trying to do this in order to learn how to transmit the filestream, and eventually to perform Region of Interest (ROI) selection on it. I want to do these things without decoding, and right now the only utility I have is the OpenJPEG library.
I've used the image_to_j2k utility (on Linux) to convert a test image into a filestream stored in a .j2k file. I then read the .j2k file into a buffer, in binary mode:
long fsize = get_file_size("img.j2k"); //This does what it's supposed to
char* buffer = new char[fsize];
ifstream in ("img.j2k", ios::in | ios::binary);
in.read(buffer, fsize); //The entire file goes into the buffer
ofstream out1("out1.j2k");
ofstream out2("out2.j2k");
ofstream out3("out3.j2k");
//This is where I try to truncate the filestream
out1.write(buffer, fsize); //Write the entire file to out1.j2k - this works
out2.write(buffer, 11032); //Write 11032 bytes of the filestream to out2.j2k - this does not do what I thought it would
out3.write(buffer, 14714); //Write 14714 bytes of the filestream to out3.j2k - this does not do what I thought it would
in.close();
out1.flush();out1.close();
out2.flush();out2.close();
out3.flush();out3.close();
The numbers of bytes written to the out2 and out3 files are not chosen at random: they come from an index file that OpenJPEG produces while compressing. My thinking was that if I read the file from the beginning up to a point where the index file reports an "end_pos" marker, corresponding to the end of a quality layer, I would simulate an unfinished wireless transmission of the file. That is the end goal: to transmit the file wirelessly out in the forest and show the image in progressively better quality on a handheld device or laptop somewhere else in the forest. The result of running j2k_to_image on the out2.j2k and out3.j2k files is:
[ERROR] JPWL: bad tile byte size (1307053 bytes against 10911 bytes left)
[ERROR] 00000081; expected a marker instead of 1
ERROR -> j2k_to_image: failed to decode image!
Am I going about this the entirely wrong way? Not using JPEG 2000 is out of the question. Thankful for any answers, I've really gone through documentation on this thing but can't find this detail.
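The truncation step itself can be isolated into a small helper, which makes it easier to try different "end_pos" values from the index file. This is only a minimal sketch of the prefix copy described above, with both streams opened in binary mode; the name copy_prefix is hypothetical, and the sketch does not by itself address the decoder errors shown.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Copies at most `count` bytes from the start of `src` into `dst`.
// Returns the number of bytes actually written (0 on failure).
std::size_t copy_prefix(const std::string& src, const std::string& dst,
                        std::size_t count) {
    std::ifstream in(src, std::ios::in | std::ios::binary);
    if (!in) return 0;

    std::vector<char> buffer(count);
    in.read(buffer.data(), static_cast<std::streamsize>(count));
    const std::size_t got = static_cast<std::size_t>(in.gcount());

    // Open the output in binary mode too, so no byte is translated.
    std::ofstream out(dst, std::ios::out | std::ios::binary | std::ios::trunc);
    if (!out) return 0;
    out.write(buffer.data(), static_cast<std::streamsize>(got));
    return out ? got : 0;
}
```

A call such as copy_prefix("img.j2k", "out2.j2k", 11032) would then reproduce the out2 case from the question.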

Related

Knowing current compressed file size using gzwrite (zlib)

I'm using zlib for c++.
Quote from
http://refspecs.linuxbase.org/LSB_3.0.0/LSB-PDA/LSB-PDA/zlib-gzwrite-1.html regarding gzwrite function:
The gzwrite() function shall write data to the compressed file referenced by file, which shall have been opened in a write mode (see gzopen() and gzdopen()). On entry, buf shall point to a buffer containing len bytes of uncompressed data. The gzwrite() function shall compress this data and write it to file. The gzwrite() function shall return the number of uncompressed bytes actually written.
I interpret this to mean that the return value will NOT tell me how much larger the file became when writing, only how much data was compressed into the file.
The only way to know how large the file is would then be to close it, and read the size from the file system. I have a requirement to only continue to write to the file until it reaches a certain size. Can this be achieved without closing the file?
A workaround would be to write until the uncompressed size reaches my limit and then close the file, read the size from file system and update my best guess of file size based on that, and then re-open the file and continue writing. This would make me close and open the file a few times towards the end (as I'm approaching the size limit).
Another workaround, which would give more of an estimate (which is not what I want really), would be to write until the uncompressed size reaches the limit, close the file, read the file size from the file system and calculate the compression ratio so far. Then I can use this compression ratio to calculate a new limit for the uncompressed size at which compression should get me down to the limit for the compressed file size. If I repeat this, the estimate would improve, but again, it is not what I'm looking for.
Are there better options?
The preferred option would be for zlib to tell me the compressed file size while the file is still open. I don't see why this information would not be available inside zlib at this point, since compression happens when I call gzwrite and not when I close the file.
zlib provides the function gzoffset(), which does exactly what you're asking.
If for some reason you are stuck with a version of zlib that is more than about eight years old, from before gzoffset() was added, then this is easy to do with gzdopen(). You open the output file with fopen() or open(), obtain the file descriptor (using fileno() and dup() if you used fopen()), and pass that descriptor to gzdopen(). Then you can use ftell() or lseek() at any time to see how much has been written. Be careful not to double-close the descriptor. See the comments for gzdopen().
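The gzdopen() approach relies on a descriptor and its dup() copy sharing a single file offset, so lseek() on the original reports what was written through the copy. Here is a minimal zlib-free sketch of just that mechanism, with plain write() standing in for gzwrite(); the helper name offset_after_write is hypothetical.

```cpp
#include <cassert>
#include <fcntl.h>
#include <unistd.h>

// Writes `len` bytes through a duplicate of `fd` and returns the file offset
// observed through the original descriptor afterwards (-1 on error).
// Plain write() stands in here for the compressed output of gzwrite().
off_t offset_after_write(int fd, const char* data, size_t len) {
    int writer = dup(fd);               // the copy that would go to gzdopen()
    if (writer < 0) return -1;
    ssize_t n = write(writer, data, len);
    close(writer);                      // closes only the duplicate
    if (n < 0) return -1;
    return lseek(fd, 0, SEEK_CUR);      // same offset: dup() shares it
}
```

With zlib in the picture, the reported offset only covers bytes that zlib has already handed to the descriptor; data still buffered inside zlib is not counted.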
You can work around this issue by using a pipe. The idea is to write the compressed data into a pipe. After that, you read the data from the other end of the pipe, count it and write it to the actual file.
To set this up you need to first open the file to write to via a simple open. Then create a pipe via pipe2 and initialize zlib by passing one of the pipe descriptors to gzdopen:
int out = open("/path/to/file", O_WRONLY | O_CREAT | O_TRUNC, 0644); //O_CREAT requires a mode argument
int p[2];
pipe2(p, O_NONBLOCK); //p[0] is the read end, p[1] is the write end
gzFile zFile = gzdopen(p[1], "w"); //zlib writes into the write end
You can now write the data first to the pipe and then splice it from the pipe to the out file:
gzwrite(zFile, buf, 1024); //or any other length
ssize_t bytesWritten = 0; //splice returns ssize_t, -1 on error
do {
bytesWritten = splice(p[0], NULL, out, NULL, 1024, SPLICE_F_NONBLOCK | SPLICE_F_MORE);
} while(bytesWritten == 1024);
As you can see, you now have bytesWritten to tell you how much data was actually written. Simply sum it up in another variable and stop splicing as soon as you have written as much data as you need to. Alternatively, splice in one go: write everything to zFile and then splice once, passing the amount of data you are allowed to store as the fifth parameter. If you want to avoid compressing unnecessary data, do it in chunks as shown above. Note that zlib buffers its output, so compressed bytes may only show up in the pipe after gzflush() or gzclose().
A note on splice: splice is Linux-specific, and is basically just a very efficient copy. You can always replace it with a simple "read and write" combo, i.e. read data from p[0] into a buffer and then write the data from that buffer into out. splice is just faster and less code.
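The "read and write" replacement for splice could look roughly like this. It is a sketch under the assumption that plain POSIX read() and write() are acceptable; the name copy_counted and the 1024-byte buffer are illustrative choices, not part of the original answer.

```cpp
#include <cassert>
#include <cstddef>
#include <unistd.h>

// Copies up to `limit` bytes from `in_fd` to `out_fd` with read()/write()
// and returns how many bytes reached `out_fd`. Stops early on end of input,
// on an error, or when a non-blocking descriptor has nothing to offer.
std::size_t copy_counted(int in_fd, int out_fd, std::size_t limit) {
    char buf[1024];
    std::size_t total = 0;
    while (total < limit) {
        std::size_t want = limit - total;
        if (want > sizeof buf) want = sizeof buf;
        ssize_t got = read(in_fd, buf, want);
        if (got <= 0) break;            // EOF, EAGAIN, or a real error
        ssize_t off = 0;
        while (off < got) {             // write() may be partial
            ssize_t put = write(out_fd, buf + off,
                                static_cast<std::size_t>(got - off));
            if (put <= 0) return total;
            off += put;
            total += static_cast<std::size_t>(put);
        }
    }
    return total;
}
```

Calling copy_counted(p[0], out, remaining) after each gzwrite() and subtracting the result from remaining keeps a running count of the compressed bytes that actually reached the file.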

Reading subchunk2 data of a wav file in C++

I am trying to read the data part of a .wav file into a buffer. I have already read the header part, following the question "C++ Reading the Data part of a WAV file".
Therefore, my file pointer wavFile now points to the beginning of the data section. Then I use the following code to read audio data into a buffer.
long bytes = wavHeader.bitsPerSample/8;
long buffsize= wavHeader.Subchunk2Size/bytes;
int16_t *audiobuf = new int16_t[buffsize]; // int16_t comes from <cstdint>
fread(audiobuf,bytes,buffsize,wavFile);
// do some processing
delete[] audiobuf; // array form, to match new[]
In my test audio file, bitsPerSample is 16 and Subchunk2Size is 79844. Therefore, buffsize is 39922.
After running this code, I noticed that only the first 256 positions of audiobuf get filled. But in theory there should be 39922 entries of audio data. How can I sort out this issue?
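One way to narrow this down is to compare the value fread() returns with the number of samples requested, which shows whether the read itself came up short. This is a minimal sketch, assuming 16-bit samples and a file opened with "rb"; the helper name read_samples is hypothetical, and the sketch does not claim to identify the cause.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

// Reads up to `byteCount` bytes of 16-bit samples from the current position
// of `file`. The vector is shrunk to the number of samples fread() returned.
std::vector<std::int16_t> read_samples(std::FILE* file, std::size_t byteCount) {
    std::vector<std::int16_t> samples(byteCount / sizeof(std::int16_t));
    const std::size_t got =
        std::fread(samples.data(), sizeof(std::int16_t), samples.size(), file);
    samples.resize(got);   // got < requested means EOF or a read error
    return samples;
}
```

If read_samples(wavFile, wavHeader.Subchunk2Size).size() comes back as 39922, the buffer was filled completely and the problem lies in how the contents are being inspected.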

how to report progress of data read on a QuaGzipFile (QuaZIP library)

I am using QuaZIP 0.5.1 with Qt 5.1.1 for C++ on Ubuntu 12.04 x86_64.
My program reads a large gzipped binary file, usually 1GB of uncompressed data or more, and performs some computations on it. It is not computationally intensive, and most of the time is spent on I/O. So if I can find a way to report how much of the file has been read, I can show it on a progress bar and even provide an ETA.
I open the file with:
QuaGzipFile gzip(fileName);
if (!gzip.open(QIODevice::ReadOnly))
{
// report error
return;
}
But there is no functionality in QuaGzipFile to find the file size nor the current position.
I do not need to find size and position of uncompressed stream, the size and position of compressed stream are fine, because a rough estimation of progress is enough.
Currently, I can find size of compressed file, using QFile(fileName).size(). Also, I can easily find current position in uncompressed stream, by keeping sum of return values of gzip.read(). But these two numbers do not match.
I can alter the QuaZIP library, and access internal zlib-related stuff, if it helps.
There is no reliable way to determine total size of uncompressed stream. See this answer for details and possible workarounds.
However, there is a way to get position in compressed stream:
QFile file(fileName);
file.open(QFile::ReadOnly);
QuaGzipFile gzip;
gzip.open(file.handle(), QuaGzipFile::ReadOnly);
while(true) {
QByteArray buf = gzip.read(1000);
//process buf
if (buf.isEmpty()) { break; }
QFile temp_file_object;
temp_file_object.open(file.handle(), QFile::ReadOnly);
double progress = 100.0 * temp_file_object.pos() / file.size();
qDebug() << qRound(progress) << "%";
}
The idea is to open the file manually and use the file descriptor to get the position. QFile cannot track external position changes, so file.pos() will always be 0. So we create temp_file_object from the file descriptor, forcing QFile to query the file position. I could use a lower-level API (such as lseek()) to get the file position, but I think my way is more cross-platform.
Note that this method is not very accurate and can report progress values higher than the real ones. That's because zlib can internally read and decode more data than you have already read.
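Once a compressed position is available, the progress value and the ETA mentioned in the question are simple arithmetic. This is a minimal sketch, assuming a constant read rate; progress_percent and eta_seconds are hypothetical helper names, and the clamp is there to absorb the overshoot that read-ahead can cause.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Percentage of the compressed file consumed so far, clamped to [0, 100].
int progress_percent(std::int64_t compressedPos, std::int64_t compressedSize) {
    if (compressedSize <= 0) return 0;
    const double pct = 100.0 * static_cast<double>(compressedPos) /
                       static_cast<double>(compressedSize);
    return static_cast<int>(std::min(100.0, std::max(0.0, pct)));
}

// Estimated seconds remaining, assuming the read rate stays constant.
// Returns -1 when no estimate is possible yet.
double eta_seconds(std::int64_t compressedPos, std::int64_t compressedSize,
                   double elapsedSeconds) {
    if (compressedPos <= 0 || elapsedSeconds <= 0.0) return -1.0;
    const std::int64_t left =
        std::max<std::int64_t>(0, compressedSize - compressedPos);
    return elapsedSeconds * static_cast<double>(left) /
           static_cast<double>(compressedPos);
}
```

Because the position is measured in the compressed stream, both values are only rough estimates, which is what the question asks for.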
In zlib 1.2.4 and greater you can use the gzoffset() function to get the current position in the compressed file. The current version of zlib is 1.2.8.
Using an ugly hack to zlib, I was able to find position in compressed stream.
First, I copied definition of gz_stream from gzio.c (from zlib-1.2.3.4 source), to the end of quagzipfile.cpp. Then I reimplemented the virtual function qint64 QIODevice::pos() const:
qint64 QuaGzipFile::pos() const
{
gz_stream *s = (gz_stream *)d->gzd;
return ftello64(s->file);
}
Since quagzipfile.cpp and quagzipfile.h seem to be independent from other QuaZIP library files, maybe it is better to copy the functionality I need from these files and avoid this hack?
The current version of program is something like this:
QFile infile(fileName);
if (!infile.open(QIODevice::ReadOnly))
return;
qint64 fileSize = infile.size();
infile.close();
QuaGzipFile gzip(fileName);
if (!gzip.open(QIODevice::ReadOnly))
return;
qint64 nread;
char buffer[bufferSize];
while ((nread = gzip.read(buffer, bufferSize)) > 0)
{
// use buffer
int percent = 100.0 * gzip.pos() / fileSize;
// report percent
}
gzip.close();

is there a way to fopen a file that allows me to edit just a few bytes?

I am writing a class that compresses binary data using a zlib stream. I have a buffer that I fill with the output stream, and once it becomes full I dump the buffer out to a file using fopen(filename, "ab");... This means that my program only opens the file when it has a buffer full of data to dump; it writes the data and immediately closes the file.
The issue is in my format I use an 8 byte header at the beginning of each file which contains the original length and compressed length but I do not know these values until the end of the whole compression process.
What I wanted to do was write 8 bytes of zeros, then append with all my compressed data, then come back at the end during cleanup to fill in those 8 bytes with the size data, but I can't seem to find a way to open the file without bringing it all back into memory. I just want to edit the first 8 bytes of the file. Do I need to use mmap?
Since you're using the file in append mode, you do need to close and re-open it:
open with fopen(filename, "r+b");
write the 8 bytes;
close the file using fclose().
The r+ means
Open for reading and writing. The stream is positioned at the
beginning of the file.
and the b is needed to open in binary mode.
You can use this method to change the data at any position in the file, not just at the beginning: simply use fseek() to seek to the required position before writing.
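The re-open step can be sketched as follows. It assumes the 8-byte header consists of two 32-bit sizes stored in host byte order, which the question does not specify; the helper name patch_header is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

// Overwrites the 8-byte placeholder at the start of `path` with two 32-bit
// sizes, leaving the rest of the file untouched. Returns true on success.
// The field order and host byte order are assumptions of this sketch.
bool patch_header(const char* path, std::uint32_t originalLen,
                  std::uint32_t compressedLen) {
    std::FILE* f = std::fopen(path, "r+b");   // update mode, no truncation
    if (!f) return false;
    const std::uint32_t header[2] = {originalLen, compressedLen};
    bool ok = std::fseek(f, 0, SEEK_SET) == 0 &&
              std::fwrite(header, sizeof header, 1, f) == 1;
    ok = (std::fclose(f) == 0) && ok;
    return ok;
}
```

Only the first 8 bytes are rewritten; nothing else in the file is read into memory or modified.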
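Use rewind() to take the file pointer back to the start of the file after you write out the last few bytes of data. You can then output your 8 bytes of length info. Note that this only works if the file is not open in append mode, because in append mode every write goes to the end of the file regardless of the current position.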
If you have flexibility in changing your format, I might suggest this. Define your compressed stream such that it is a sequence of an unknown number of blocks, and each block is preceded by a fixed length integer specifying the number of bytes in the block. The stream is finished when the next block has a size of zero.
The drawback to this format is that there is no way for the reader of the stream to know how much data is coming until it has all been read. But the advantage is that it avoids the problem you are trying to solve.
More importantly, it allows you to send a compressed stream of data somewhere as you read the input and you don't have to save it all before sending it. For example, you could write a compression Unix filter that you could put in a pipe stream:
prog1 | yourprog -compress | rsh host yourprog -expand | prog2
Good luck.
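The block framing suggested above could be sketched like this. The 4-byte big-endian length field and the function names write_block, write_end and read_block are assumptions made for illustration; the answer only requires a fixed-length integer and a zero-size terminator.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Writes one block: a 4-byte big-endian length followed by the payload.
void write_block(std::ostream& out, const std::string& payload) {
    const std::uint32_t n = static_cast<std::uint32_t>(payload.size());
    const char len[4] = {static_cast<char>(n >> 24), static_cast<char>(n >> 16),
                         static_cast<char>(n >> 8), static_cast<char>(n)};
    out.write(len, 4);
    out.write(payload.data(), static_cast<std::streamsize>(n));
}

// Marks the end of the stream with a zero-length block.
void write_end(std::ostream& out) { write_block(out, std::string()); }

// Reads the next block into `payload`. Returns false at the zero-length
// terminator or if the stream ends early.
bool read_block(std::istream& in, std::string& payload) {
    unsigned char len[4];
    if (!in.read(reinterpret_cast<char*>(len), 4)) return false;
    const std::uint32_t n =
        (std::uint32_t(len[0]) << 24) | (std::uint32_t(len[1]) << 16) |
        (std::uint32_t(len[2]) << 8) | std::uint32_t(len[3]);
    if (n == 0) return false;
    payload.resize(n);
    return static_cast<bool>(in.read(&payload[0], static_cast<std::streamsize>(n)));
}
```

The same functions work on a std::stringstream for testing or on binary file streams, and the writer never needs to seek back to fill in a total length.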

How to get boost::iostream to operate in a mode comparable to std::ios::binary?

I have the following question about boost::iostreams. If someone is familiar with writing filters, I would appreciate your advice or help.
I am writing a pair of multichar filters that work with boost::iostreams::filtering_stream as a data compressor and decompressor.
I started by writing the compressor, picked an algorithm from the LZ family, and am now working on the decompressor.
In short, my compressor splits data into packets, which are encoded separately and then flushed to my file.
When I have to restore data from my file (in programming terms, when I receive a read(byte_count) request), I have to read a full packed block, buffer it, unpack it and only then return the requested number of bytes. I've implemented this logic, but right now I'm struggling with the following problem:
When my data is packed, any byte value can appear in the output file. And I have trouble reading a file that contains the byte 0x1A (decimal 26) using boost::iostreams::read(...., size).
If I were using std::ifstream, for example, I would set the std::ios::binary mode and then this byte could be read without problems.
Is there any way to achieve the same when implementing a boost::iostreams filter that uses the boost::iostreams::read routine to read a char sequence?
Some code here:
// Compression
// -----------
filtering_ostream out;
out.push(my_compressor());
out.push(file_sink("file.out"));
// Compress the 'file.in' to 'file.out'
std::ifstream stream("file.in");
out << stream.rdbuf();
// Decompression
// -------------
filtering_istream in;
in.push(my_decompressor());
in.push(file_source("file.out"));
std::string res;
while (in) {
std::string t;
// My decompressor wants to retrieve the full block from input (say, 4096 bytes)
// but instead retrieves 150 bytes because meets '1A' char in the char sequence
// That obviously happens because file should be read as a binary one, but
// how do I state that?
std::getline(in, t); // <--------- The error happens here
res += t;
}
Short answer for reading the file as binary: specify ios_base::binary when opening the file stream.
MSDN Link
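For the standard-library side, here is a minimal sketch of what that looks like: the stream is opened with std::ios::binary and read with unformatted read() calls rather than std::getline, which is line-oriented. The helper name read_all_binary and the 4096-byte default block size are illustrative assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>

// Reads the whole file in binary mode, `blockSize` bytes at a time.
// read() is unformatted, so bytes such as 0x1A or '\n' are returned as data.
std::string read_all_binary(const std::string& path,
                            std::size_t blockSize = 4096) {
    if (blockSize == 0) blockSize = 4096;   // avoid a zero-length read loop
    std::ifstream in(path, std::ios::in | std::ios::binary);
    std::string result;
    std::string block(blockSize, '\0');
    while (in) {
        in.read(&block[0], static_cast<std::streamsize>(block.size()));
        result.append(block, 0, static_cast<std::size_t>(in.gcount()));
    }
    return result;
}
```

As far as I know, boost::iostreams file_source and file_sink also accept an open mode as a second constructor argument, but that is an assumption to check against the Boost documentation rather than something shown here.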