How to load a MATLAB matrix with Armadillo? (C++)

I know a MATLAB matrix can be loaded into a C++ program in several ways, but none of them seems efficient or convenient.
I have seen others modify the header of the '.mat' file so that it can be loaded directly into a C++ program with Armadillo.
Does anyone have an idea how to modify the header?
Simply saving the MATLAB '.mat' file in ASCII format is not the answer: the loading time and storage space are much larger than with the binary format.
To store a 2 GB binary .mat file, I need at least 20 GB in ASCII format.
Loading a 100 MB binary .mat file takes less than a second; loading the same data as ASCII text takes much longer.
So I don't think saving the MATLAB .mat file in ASCII format and loading it into Armadillo is a good solution.

According to the Armadillo documentation:
file_type can be one of the following:
...
raw_ascii:
Numerical data stored in raw ASCII format, without a header. The numbers are separated by whitespace. The number of columns must be the same in each row. Cubes are loaded as one slice. Data which was saved in Matlab/Octave using the -ascii option can be read in Armadillo, except for complex numbers. Complex numbers are stored in standard C++ notation, which is a tuple surrounded by brackets: eg. (1.23,4.56) indicates 1.23 + 4.56i.
You should therefore be able to load a Matlab matrix written in text format, contained in a file called "MatlabMatrix.mat", by using the following code:
arma::mat fromMatlab;
fromMatlab.load("MatlabMatrix.mat", arma::raw_ascii);
Also, a related question can be found here.

You can export your data from MATLAB in a low-level binary format and then load it in Armadillo with the arma::raw_binary option.
e.g. in MATLAB:
m = 10;
A = randn(m, m);                 % example 10x10 matrix
name = 'test.bin';
[F, err] = fopen(name, 'w');
if F < 0, error(err); end
fwrite(F, A, 'double');          % writes the doubles in column-major order
fclose(F);
Load it with Armadillo:
arma::mat A;
std::string name = "test.bin";
A.load(name,arma::raw_binary);
A.print("A");
The only thing is that you lose the dimensions of the original matrix, as Armadillo loads the data in vectorized form (as a column vector), so you have to reshape it by hand after loading.
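For example, a minimal sketch of that reshape step, assuming the 10x10 matrix written by the MATLAB snippet above (the file name and dimensions are only illustrative):

#include <armadillo>

int main()
{
    arma::mat A;
    A.load("test.bin", arma::raw_binary);   // loaded as a 100x1 column vector

    const arma::uword m = 10;               // known size of the original matrix
    A.reshape(m, m);                        // restore the 10x10 shape; both MATLAB's
                                            // fwrite and Armadillo are column-major
    A.print("A");
    return 0;
}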
To include the matrix dimensions you can mimic the Armadillo header when saving in MATLAB and then use the arma::arma_binary option when loading. If you are interested in that option, I can also tell you how to do it.
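One way to see exactly what has to be mimicked is to let Armadillo write a small reference file and inspect its header. This is only a sketch; the exact format tag may differ between Armadillo versions and element types:

#include <armadillo>
#include <fstream>
#include <iostream>
#include <string>

int main()
{
    arma::mat A(3, 4, arma::fill::randu);
    A.save("ref.bin", arma::arma_binary);

    // The arma_binary file starts with two text lines followed by the raw
    // column-major binary payload; print them so the same header can be
    // reproduced when writing the file from MATLAB.
    std::ifstream f("ref.bin", std::ios::binary);
    std::string tag, dims;
    std::getline(f, tag);    // format tag (e.g. "ARMA_MAT_BIN_FN008" for doubles)
    std::getline(f, dims);   // "n_rows n_cols"
    std::cout << tag << "\n" << dims << "\n";
    return 0;
}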

Related

Extracting MDCT coefficients for steganalysis?

MP3Stego (http://www.petitcolas.net/steganography/mp3stego/) hides data within MP3 files during the inner_loop and uses part2_3_length to modify bits.
I'm wondering whether it would be worth extracting the MDCT coefficients and examining them as a histogram to compare against an MP3 file with no hidden data in it. However, in the MP3 encoding process the MDCT happens before the inner_loop.
Would there be any use in extracting the coefficients if this is the case? If so, would the best way of doing this just be to print the data to a file during the encoding process?

best format to save arrays (cubes) in armadillo to load later in R

I'm running a program using Armadillo and save a cube object (the equivalent of a 3-dimensional array in R) of doubles using the command mycube.save("mycube", arma_ascii). However, I have not been able to load it properly in R.
What do you think would be the best format to use in order to load it in R?
A while back I stored matrices from a C++ program with:
m.save("myMatrix.data", raw_ascii);
and read it in an R script with:
m <- as.matrix(read.table("myMatrix.data"))
This worked quite well. However, I'm not sure about saving cubes - you may need to split the cube into slices and re-assemble them in R.
We currently seem to have "half" of the needed support: only a wrap() method to return Cube objects to R.
So if someone were to contribute a working as<>() converter, we could (trivially) rely on R's (nice, binary, compressed, ...) serialization via e.g. the saveRDS() and readRDS() functions.
It seems that the format raw_ascii can do the trick. It is not exactly a 3-dimensional array but a matrix concatenating (row-wise) all the individual slice matrices, which can then be manipulated.
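For reference, a minimal sketch of the save side under that assumption (the file name and dimensions are only illustrative):

#include <armadillo>

int main()
{
    arma::cube C(2, 3, 4, arma::fill::randu);   // 4 slices of a 2x3 matrix

    // raw_ascii writes the slices one after another as plain text, so the
    // file can be read in R with read.table() and split back into slices
    // by blocks of rows.
    C.save("mycube.txt", arma::raw_ascii);
    return 0;
}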

How to read image data from .cr2 in C++?

How to read image data from .cr2 (raw image format by Canon) in C++?
The only operation I need to perform is to read the pixel data of a .cr2 file directly if possible; otherwise I would like to convert it to any lossless image format and read its pixel data.
Any suggestions?
I would go with ImageMagick too. You don't have to convert all your files up front; you can do them one at a time as you need them.
In your program, rather than opening the CR2 file, just open a pipe (popen() call) that is executing an ImageMagick command like
convert file.cr2 ppm:-
then you can read the extremely simple PPM format which is described here - basically just a line of ASCII text that tells you the file type, then another line of ASCII text that tells you the image dimensions, followed by a max value and then the data in binary.
Later on you can actually use the ImageMagick library and API if you need to.
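Here is a minimal sketch of that pipe approach, assuming a POSIX system, a file called file.cr2 and an 8-bit P6 (binary) PPM coming out of convert; it does not handle comment lines in the PPM header:

#include <cstdio>
#include <string>
#include <vector>

int main()
{
    // Let ImageMagick do the CR2 decoding and stream a PPM to stdout.
    FILE* pipe = popen("convert file.cr2 ppm:-", "r");
    if (!pipe) return 1;

    // PPM header: magic ("P6"), width, height and maximum sample value,
    // all as ASCII separated by whitespace.
    char magic[3] = {0};
    int width = 0, height = 0, maxval = 0;
    if (std::fscanf(pipe, "%2s %d %d %d", magic, &width, &height, &maxval) != 4
        || std::string(magic) != "P6" || maxval > 255) {
        pclose(pipe);
        return 1;
    }
    std::fgetc(pipe);   // consume the single whitespace byte after the header

    // Binary pixel data follows: 3 bytes (R, G, B) per pixel for maxval <= 255.
    std::vector<unsigned char> pixels(static_cast<std::size_t>(width) * height * 3);
    std::size_t got = std::fread(pixels.data(), 1, pixels.size(), pipe);
    pclose(pipe);

    std::printf("read %dx%d image, %zu bytes of pixel data\n", width, height, got);
    return 0;
}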

Using OpenCV's KNearest Neighbour (OpenCV, C++)

I want to use OpenCV's KNN algorithm to classify 4 features into one of two classes. In a text file, I have my training data in the following format:
feature_1,feature_2,feature_3,feature_4,class
where feature_1, feature_3, feature_4 and class are integers and feature_2 is of type float. The first line of the text file contains the headings for each feature.
However, the OpenCV documentation (http://docs.opencv.org/modules/ml/doc/k_nearest_neighbors.html) states that the train function requires the training data in the Mat data structure.
I'm confused as to how I can convert my text file of training data, to a Mat. If anyone can help me out with this I would really appreciate it.
Basically, OpenCV implements CvMLData, which can read CSV files (and your file is a comma-separated file); see the documentation: http://docs.opencv.org/modules/ml/doc/mldata.html
Once you create a CvMLData object, you can use the read_csv method:
read_csv(const char* filename)
to load it, and then use get_values() to get a pointer to the input data as a Mat and get_responses() to get a pointer to the labels as a Mat.
To set which column is considered the "response" (label), use the set_response_idx method.
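A minimal sketch of how those calls fit together with the old C-style ML API (the file name and column index are only illustrative, and the heading line may need to be stripped from the CSV first, since read_csv expects numerical rows):

#include <opencv2/core/core.hpp>
#include <opencv2/ml/ml.hpp>

int main()
{
    CvMLData mlData;
    if (mlData.read_csv("training.csv") != 0)    // load the comma-separated file
        return 1;

    mlData.set_response_idx(4);                  // column 4 holds the class label

    // Wrap the internal CvMat pointers as cv::Mat headers.
    // Note: depending on the OpenCV version, get_values() may still include
    // the response column; check features.cols and drop it if necessary.
    cv::Mat features(mlData.get_values());
    cv::Mat labels(mlData.get_responses());

    CvKNearest knn;
    knn.train(features, labels);                 // train the k-NN classifier
    return 0;
}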

Parallelization of PNG file creation with C++, libpng and OpenMP

I am currently trying to implement a PNG encoder in C++ based on libpng that uses OpenMP to speed up the compression process.
The tool is already able to generate PNG files from various image formats.
I uploaded the complete source code to pastebin.com so you can see what I have done so far: http://pastebin.com/8wiFzcgV
So far, so good! Now, my problem is to find a way to parallelize the generation of the IDAT chunks containing the compressed image data. Usually, the libpng function png_write_row gets called in a for loop with a pointer to the struct that contains all the information about the PNG file and a row pointer with the pixel data of a single image row.
(Line 114-117 in the Pastebin file)
//Loop through image
for (i = 0, rp = info_ptr->row_pointers; i < png_ptr->height; i++, rp++) {
    png_write_row(png_ptr, *rp);
}
Libpng then compresses one row after another and fills an internal buffer with the compressed data. As soon as the buffer is full, the compressed data gets flushed in an IDAT chunk to the image file.
My approach was to split the image into multiple parts and let one thread compress rows 1 to 10, another thread rows 11 to 20, and so on. But as libpng uses an internal buffer, it is not as easy as I first thought :) I somehow have to make libpng write the compressed data to a separate buffer for each thread. Afterwards I need a way to concatenate the buffers in the right order so I can write them all together to the output image file.
So, does someone have an idea how I can do this with OpenMP and some tweaking to libpng? Thank you very much!
This is too long for a comment but is not really an answer either--
I'm not sure you can do this without modifying libpng (or writing your own encoder). In any case, it will help if you understand how PNG compression is implemented:
At the high level, the image is a set of rows of pixels (generally 32-bit values representing RGBA tuples).
Each row can independently have a filter applied to it -- the filter's sole purpose is to make the row more "compressible". For example, the "sub" filter makes each pixel's value the difference between it and the one to its left. This delta encoding might seem silly at first glance, but if the colours between adjacent pixels are similar (which tends to be the case) then the resulting values are very small regardless of the actual colours they represent. It's easier to compress such data because it's much more repetitive.
Going down a level, the image data can be seen as a stream of bytes (rows are no longer distinguished from each other). These bytes are compressed, yielding another stream of bytes. The compressed data is arbitrarily broken up into segments (anywhere you want!) written to one IDAT chunk each (along with a little bookkeeping overhead per chunk, including a CRC checksum).
The lowest level brings us to the interesting part, which is the compression step itself. The PNG format uses the zlib compressed data format. zlib itself is just a wrapper (with more bookkeeping, including an Adler-32 checksum) around the real compressed data format, deflate (zip files use this too). deflate supports two compression techniques: Huffman coding (which reduces the number of bits required to represent some byte-string to the optimal number given the frequency that each different byte occurs in the string), and LZ77 encoding (which lets duplicate strings that have already occurred be referenced instead of written to the output twice).
The tricky part about parallelizing deflate compression is that in general, compressing one part of the input stream requires that the previous part also be available in case it needs to be referenced. But, just like PNGs can have multiple IDAT chunks, deflate is broken up into multiple "blocks". Data in one block can reference previously encoded data in another block, but it doesn't have to (of course, it may affect the compression ratio if it doesn't).
So, a general strategy for parallelizing deflate would be to break the input into multiple large sections (so that the compression ratio stays high), compress each section into a series of blocks, then glue the blocks together (this is actually tricky since blocks don't always end on a byte boundary -- but you can put an empty non-compressed block (type 00), which will align to a byte boundary, in-between sections). This isn't trivial, however, and requires control over the very lowest level of compression (creating deflate blocks manually), creating the proper zlib wrapper spanning all the blocks, and stuffing all this into IDAT chunks.
If you want to go with your own implementation, I'd suggest reading my own zlib/deflate implementation (and how I use it) which I expressly created for compressing PNGs (it's written in Haxe for Flash but should be comparatively easy to port to C++). Since Flash is single-threaded, I don't do any parallelization, but I do split the encoding up into virtually independent sections ("virtually" because there's the fractional-byte state preserved between sections) over multiple frames, which amounts to largely the same thing.
Good luck!
I finally managed to parallelize the compression process.
As mentioned by Cameron in the comment to his answer, I had to strip the zlib header from the z_streams to combine them. Stripping the footer was not required, as zlib offers a flush mode called Z_SYNC_FLUSH which can be used for all chunks (except the last one, which has to be written with Z_FINISH) so that each one ends on a byte boundary; you can then simply concatenate the stream outputs. Finally, the Adler-32 checksum has to be calculated over the data of all threads and appended to the end of the combined z_streams.
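A rough sketch of that combining step is below. It is not the code from the proof of concept, just an illustration under a few assumptions: the filtered scanline bytes of each section are already available in a vector of buffers, and raw deflate (negative windowBits) is used so there is no per-thread zlib header to strip; the 2-byte zlib header and the 4-byte Adler-32 trailer then have to be added around the concatenated stream before it is split into IDAT chunks:

#include <zlib.h>
#include <cstring>
#include <vector>

// Compress one band of filtered scanline bytes with raw deflate.
// All bands but the last end with Z_SYNC_FLUSH, so they stop on a byte
// boundary and can simply be concatenated; the last band uses Z_FINISH.
static std::vector<unsigned char> compress_band(const unsigned char* data,
                                                uLong len, bool last)
{
    z_stream zs;
    std::memset(&zs, 0, sizeof(zs));
    deflateInit2(&zs, Z_DEFAULT_COMPRESSION, Z_DEFLATED,
                 -15, 8, Z_DEFAULT_STRATEGY);      // -15 -> raw deflate

    std::vector<unsigned char> out(deflateBound(&zs, len) + 16);
    zs.next_in   = const_cast<unsigned char*>(data);
    zs.avail_in  = static_cast<uInt>(len);
    zs.next_out  = out.data();
    zs.avail_out = static_cast<uInt>(out.size());

    deflate(&zs, last ? Z_FINISH : Z_SYNC_FLUSH);
    out.resize(zs.total_out);
    deflateEnd(&zs);
    return out;
}

// Compress the bands in parallel, concatenate them in order, and fold the
// per-band Adler-32 checksums into the one needed for the zlib trailer.
std::vector<unsigned char>
parallel_deflate(const std::vector<std::vector<unsigned char> >& bands,
                 uLong& adler_out)
{
    std::vector<std::vector<unsigned char> > parts(bands.size());
    std::vector<uLong> adlers(bands.size());

    #pragma omp parallel for
    for (long i = 0; i < static_cast<long>(bands.size()); ++i) {
        bool last = (static_cast<std::size_t>(i) + 1 == bands.size());
        parts[i]  = compress_band(bands[i].data(), bands[i].size(), last);
        adlers[i] = adler32(adler32(0L, Z_NULL, 0),
                            bands[i].data(), static_cast<uInt>(bands[i].size()));
    }

    std::vector<unsigned char> out;
    uLong adler = adler32(0L, Z_NULL, 0);
    for (std::size_t i = 0; i < bands.size(); ++i) {
        out.insert(out.end(), parts[i].begin(), parts[i].end());
        adler = adler32_combine(adler, adlers[i],
                                static_cast<z_off_t>(bands[i].size()));
    }
    adler_out = adler;   // write as the big-endian trailer after the stream
    return out;
}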
If you are interested in the result you can find the complete proof of concept at https://github.com/anvio/png-parallel