In my code I'm changing my array (int*), and then I want to compare it against the Matlab results.
Since my array is big (1200 x 1000 elements), it takes forever to load into Matlab.
Right now I'm copying the printed output file into the Matlab command line...
for (int i = 0; i < _roiY1; i++)
{
    for (int j = 0; j < newWidth; j++)
    {
        channel_gr[i*newWidth + j] = clipLevel;
    }
}

ofstream myfile;
myfile.open("C:\\Users\\gdarmon\\Desktop\\OpenCVcliptop.txt");
for (int i = 0; i < newHeight; i++)
{
    for (int j = 0; j < newWidth; j++)
    {
        myfile << channel_gr[i * newWidth + j] << ", ";
    }
    myfile << ";" << endl;
}
Is there a faster way to get readable matrix data from C++ into Matlab?
The simplest answer is that it's much quicker to transfer the data in binary form, rather than - as suggested in the question - rendering to text and having Matlab parse it back to binary. You can achieve this by using fwrite() at the C/C++ end, and fread() at the Matlab end.
int* my_data = ...;
int my_data_count = ...;
FILE* fid = fopen("my_data_file", "wb");
fwrite((void*)my_data, sizeof(int), my_data_count, fid);
fclose(fid);
In Matlab:
fid = fopen('my_data_file', 'r');
my_data = fread(fid, inf, '*int32');
fclose(fid);
It's maybe worth noting that you can call C/C++ functions from within Matlab, so depending on what you are doing that may be an easier architecture (look up "mex files").
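For illustration, a minimal mex gateway might look like the sketch below; it just copies an int32 input array into an output array, and a real version would do the C++ processing in between (see the MEX documentation for the actual build steps):

#include "mex.h"

// minimal MEX gateway: copies an int32 input array into a new output array
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
    if (nrhs < 1 || !mxIsInt32(prhs[0]))
        mexErrMsgTxt("Expected one int32 input array.");
    mwSize n = mxGetNumberOfElements(prhs[0]);
    plhs[0] = mxCreateNumericMatrix(1, n, mxINT32_CLASS, mxREAL);
    int* src = (int*)mxGetData(prhs[0]);
    int* dst = (int*)mxGetData(plhs[0]);
    for (mwSize k = 0; k < n; k++)
        dst[k] = src[k];    // the C++ processing would go here
}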
Don't write the output as text.
Write your matrix into your output file the way Matlab likes to read it: one big binary array.
ofstream myfile;
myfile.open("C:\\Users\\gdarmon\\Desktop\\OpenCVcliptop.txt", ofstream::app | ofstream::binary);
myfile.write((char*) channel_gr, newHeight * newWidth * sizeof(channel_gr[0]));
You may want to play some games on output to get the array ordered column-row rather than row-column, because that is how Matlab likes to see data. I remember orders-of-magnitude improvements in performance when writing mex file plug-ins for file readers, but it's been a while since I've done it.
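For instance, here is a sketch (the function name write_for_matlab is just illustrative) that writes the question's row-major channel_gr buffer in column-major order, so that fread(fid, [newHeight newWidth], '*int32') in Matlab rebuilds the matrix without a transpose step:

#include <fstream>

// write a row-major int buffer to disk in column-major order for Matlab
void write_for_matlab(const char* path, const int* channel_gr,
                      int newHeight, int newWidth)
{
    std::ofstream out(path, std::ofstream::binary);
    for (int j = 0; j < newWidth; j++)        // column index varies slowest
        for (int i = 0; i < newHeight; i++)   // row index varies fastest
            out.write((const char*)&channel_gr[i * newWidth + j], sizeof(int));
}

The per-element write keeps the sketch short; transposing into a temporary buffer and issuing one big write() call would preserve more of the binary approach's speed advantage.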
My program opens a file which contains 100,000 numbers and parses them out into a 10,000 x 10 array correlating to 10,000 sets of 10 physical parameters. The program then iterates through each row of the array, performing overlap calculations between that row and every other row in the array.
The process is quite simple, and being new to C++, I programmed it the most straightforward way that I could think of. However, I know that I'm not doing this in the most optimal way possible, which is something that I would love to fix, as the program is going to face off against my cohort's identical program, coded in Fortran, in a "race".
I have a feeling that I am going to need to implement multithreading to accomplish my goal of speeding up the program, but not only am I new to C++, I am also new to multithreading, so I'm not sure how I should go about creating new threads in a beneficial way, or whether it would even give me much "gain on investment", so to speak.
The program has the potential to be run on a machine with over 50 cores, but because the program is so simple, I'm not convinced that more threads is necessarily better. I think that if I implement two threads to compute the complex parameters of the two gaussians, one thread to compute the overlap between the gaussians, and one thread that is dedicated to writing to the file, I could speed up the program significantly, but I could also be wrong.
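For context, the decomposition I have in mind looks roughly like the sketch below (assuming my compute_params and compute_overlap functions are thread-safe, and using C++11 std::thread; overlap_block and write_overlaps are names I made up for the sketch). My actual single-threaded code follows after it.

#include <fstream>
#include <functional>
#include <sstream>
#include <thread>
#include <vector>

// my existing functions, assumed thread-safe (no shared mutable state)
void compute_params(double* g, double r[6][2]);
void compute_overlap(double r1[6][2], double r2[6][2], double ol[2]);

// each worker computes a contiguous block of rows into its own buffer
void overlap_block(double** g, int rowBegin, int rowEnd, int N,
                   std::ostringstream& out)
{
    out.precision(15);
    for (int i = rowBegin; i < rowEnd; i++) {
        for (int j = 0; j < N; j++) {
            double r1[6][2], r2[6][2], ol[2];
            compute_params(g[i], r1);
            compute_params(g[j], r2);
            compute_overlap(r1, r2, ol);
            out << ol[0] << "," << ol[1] << (j < N - 1 ? " " : "\n");
        }
    }
}

void write_overlaps(double** gaussian_array, int N, int T)
{
    std::vector<std::thread> workers;
    std::vector<std::ostringstream> buffers(T);
    for (int t = 0; t < T; t++) {
        int b = t * N / T, e = (t + 1) * N / T;
        workers.push_back(std::thread(overlap_block, gaussian_array,
                                      b, e, N, std::ref(buffers[t])));
    }
    std::ofstream overlaps("OverlapMatrix", std::ios::trunc);
    for (int t = 0; t < T; t++) {
        workers[t].join();
        overlaps << buffers[t].str();   // blocks land in row order
    }
}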
CODE:
cout << "Working...\n";
double **gaussian_array;
gaussian_array = (double **)malloc(N*sizeof(double *));
for(int i = 0; i < N; i++){
gaussian_array[i] = (double *)malloc(10*sizeof(double));
}
fstream gaussians;
gaussians.open("GaussParams", ios::in);
if (!gaussians){
cout << "File not found.";
}
else {
//generate the array of gaussians -> [10000][10]
int i = 0;
while(i < N) {
char ch;
string strNums;
string Num;
string strtab[10];
int j = 0;
getline(gaussians, strNums);
stringstream gaussian(strNums);
while(gaussian >> ch) {
if(ch != ',') {
Num += ch;
strtab[j] = Num;
}
else {
Num = "";
j += 1;
}
}
for(int c = 0; c < 10; c++) {
stringstream dbl(strtab[c]);
dbl >> gaussian_array[i][c];
}
i += 1;
}
}
gaussians.close();
//Below is the process to generate the overlap file between all gaussians:
string buffer;
ofstream overlaps;
overlaps.open("OverlapMatrix", ios::trunc);
overlaps.precision(15);
for(int i = 0; i < N; i++) {
for(int j = 0 ; j < N; j++){
double r1[6][2];
double r2[6][2];
double ol[2];
//compute complex parameters from the two gaussians
compute_params(gaussian_array[i], r1);
compute_params(gaussian_array[j], r2);
//compute overlap between the gaussians using the complex parameters
compute_overlap(r1, r2, ol);
//write to file
overlaps << ol[0] << "," << ol[1];
if(j < N - 1)
overlaps << " ";
else
overlaps << "\n";
}
}
overlaps.close();
return 0;
Any suggestions are greatly appreciated. Thanks!
This program uses sockets to transfer highly redundant 2D byte arrays (image-like). While the transfer rate is comparatively high (10 Mbps), the arrays are also highly redundant (e.g. each row may contain several consecutively similar values).
I have tried zlib and lz4 and the results were promising, but I am still looking for a better compression method; keep in mind that it should stay relatively fast, as lz4 is. Any suggestions?
You should look at the PNG algorithms for filtering image data before compressing. They range from simple to more sophisticated methods for predicting values in a 2D array based on previous values. To the extent that the predictions are good, the filtering can make for dramatic improvements in the subsequent compression step.
You should simply try these filters on your data, and then feed the result to lz4.
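For instance, here is a sketch of the PNG-style "Sub" filter for a row-major byte buffer (filterSub and unfilterSub are illustrative names): each byte is replaced by its difference from the byte to its left, so runs of similar values become runs of near-zero deltas that lz4 compresses far better.

#include <cstddef>
#include <cstdint>

// PNG-style "Sub" filter: delta each byte against its left neighbour, per row
void filterSub(uint8_t* data, size_t rows, size_t cols)
{
    for (size_t r = 0; r < rows; r++) {
        uint8_t* row = data + r * cols;
        for (size_t c = cols; c-- > 1; )   // right-to-left, so each delta
            row[c] -= row[c - 1];          // uses the original neighbour
    }
}

// the inverse, applied after decompression on the receiving side
void unfilterSub(uint8_t* data, size_t rows, size_t cols)
{
    for (size_t r = 0; r < rows; r++) {
        uint8_t* row = data + r * cols;
        for (size_t c = 1; c < cols; c++)
            row[c] += row[c - 1];
    }
}

The "Up" filter is the same idea against the previous row instead of the previous byte, which may suit your data better if the redundancy runs vertically.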
You could create your own: if the data in the rows is similar, you can create a resource/index map, reducing the size substantially. Something like this:
Original file:
row 1: 1212, 34,45,1212,45,34,56,45,56
row 2: 34,45,1212,78,54,87,....
You could create a list of unique values, then use an index as a replacement:
34,45,54,56,78,87,1212
row 1: 6,0,1,6,1,0,.....
This can potentially save you 30% or more in data transfer, but it depends on how redundant the data is.
UPDATE
Here is a simple implementation (cleaned up a little: the unique values live in a sorted vector so they can be indexed):
std::set<int> uniqueValues
DataTable my2dData; //assuming 2d vector implementation
std::string indexMap;
std::string fileCompressed = "";
int Find(int value){
for(int i = 0; i < uniqueValues.size; ++i){
if(uniqueValues[i] == value) return i;
}
return -1;
}
//create list of unique values
for(int i = 0; i < my2dData.size; ++i){
for(int j = 0; j < my2dData[i].size; ++j){
uniqueValues.insert(my2dData[i][j]);
}
}
//create indexes
for(int i = 0; i < my2dData.size; ++i){
std::string tmpRow = "";
for(int j = 0; j < my2dData[i].size; ++j){
if(tmpRow == ""){
tmpRow = Find(my2dData[i][j]);
}
else{
tmpRow += "," + Find(my2dData[i][j]);
}
}
tmpRow += "\n\r";
indexMap += tmpRow;
}
//create file to transfer
for(int k = 0; k < uniqueValues.size; ++k){
if(fileCompressed == ""){
fileCompressed = "i: " + uniqueValues[k];
}
else{
fileCompressed += "," + uniqueValues[k];
}
}
fileCompressed += "\n\r\d:" + indexMap;
Now on the receiving end you just do the opposite: if the line starts with "i" you read the list of unique values, and if it starts with "d" you read the index data and map it back.
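A rough sketch of that receiving side (parseInts and Decompress are illustrative names), assuming the "i:"/"d:" layout produced above:

#include <cstdlib>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// split a comma-separated list of ints
std::vector<int> parseInts(const std::string& s){
    std::vector<int> out;
    std::stringstream ss(s);
    std::string tok;
    while(std::getline(ss, tok, ',')) out.push_back(atoi(tok.c_str()));
    return out;
}

void Decompress(std::istream& input, std::vector<std::vector<int> >& rows){
    std::vector<int> values;   // the unique-value list from the "i:" line
    std::string line;
    bool inData = false;
    while(std::getline(input, line)){
        if(line.compare(0, 2, "i:") == 0){
            values = parseInts(line.substr(2));
            continue;
        }
        if(line.compare(0, 2, "d:") == 0){
            inData = true;
            line = line.substr(2);   // the first index row follows "d:"
        }
        if(inData && !line.empty()){
            std::vector<int> idx = parseInts(line);
            std::vector<int> row;
            for(size_t k = 0; k < idx.size(); ++k)
                row.push_back(values[idx[k]]);   // map index back to value
            rows.push_back(row);
        }
    }
}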
I'm a complete beginner at this. I'll try to explain myself as much as I can.
int i, j;
string filename;
cout << "Please enter the file name: " << endl;
cin >> filename;
fstream stream;
stream.open(filename.c_str(),
ios::in|ios::out|ios::binary);
int file_size = get_int(stream, 2);
int start = get_int(stream, 10);
int width = get_int(stream, 18);
int height = get_int(stream, 22);
This part should open the file and get its values.
for ( i = 0; i < height; i++ )
{
for ( j = 0; j < width; j++)
{
for (int k = 0; k < split*split; k++){
int pos = stream.tellg();
int blue = stream.get();
int green = stream.get();
int red = stream.get();
And this reaches inside each pixel and gets the RGB values.
What I want is to first store the RGB values in a 2D array, then do some manipulations on that array, and then create a new file with the manipulated image.
I've no clue how, so some instructions along with some code would be really helpful.
The BMP file format is documented in many places, for example on Wikipedia.
The easiest way would be to implement a structure that describes the bmp header, read the entire structure in one go, and then read the individual pixels.
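A sketch of what that structure could look like (the field names are mine; the layout follows the documented BITMAPFILEHEADER and BITMAPINFOHEADER):

#include <cstdio>
#include <cstdint>

#pragma pack(push, 1)        // the on-disk headers have no padding
struct BmpFileHeader {
    uint16_t signature;      // 0x4D42 on a little-endian machine, i.e. "BM"
    uint32_t fileSize;
    uint16_t reserved1, reserved2;
    uint32_t dataOffset;     // byte offset of the pixel array
};
struct BmpInfoHeader {
    uint32_t headerSize;     // 40 for the common BITMAPINFOHEADER
    int32_t  width;
    int32_t  height;
    uint16_t planes;
    uint16_t bitsPerPixel;   // 24 for an uncompressed RGB image
    uint32_t compression;
    uint32_t imageSize;
    int32_t  xPixelsPerMeter;
    int32_t  yPixelsPerMeter;
    uint32_t colorsUsed;
    uint32_t importantColors;
};
#pragma pack(pop)

bool readHeaders(FILE* f, BmpFileHeader& fh, BmpInfoHeader& ih)
{
    return fread(&fh, sizeof fh, 1, f) == 1 &&
           fread(&ih, sizeof ih, 1, f) == 1 &&
           fh.signature == 0x4D42;    // reject files without the BM magic
}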
Your reading function is broken because it does not read the file signature, the "BM" field at the start of the header.
On some operating systems there are already structures and functions for reading BMPs. On Windows, there's BITMAPFILEHEADER. Using those structures means you aren't writing "pure C++", though.
If you still want to read BMP files yourself, read the MSDN articles about the BMP format or google for "read bmp file" tutorials.
This library is very easy to use: http://easybmp.sourceforge.net/. You can easily check RGB values after loading the file.
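From memory, reading and tweaking pixels with EasyBMP looks roughly like this; verify the exact calls against the library's documentation:

#include "EasyBMP.h"

int main()
{
    BMP image;
    image.ReadFromFile("input.bmp");
    for (int i = 0; i < image.TellWidth(); i++) {
        for (int j = 0; j < image.TellHeight(); j++) {
            RGBApixel* p = image(i, j);   // direct access to one pixel
            p->Red = 255 - p->Red;        // e.g. invert the red channel
        }
    }
    image.WriteToFile("output.bmp");
    return 0;
}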
I am trying to write a triple vector to a file and then be able to read it back into the data structure afterward. When I try to read the file back after it's been saved, the first fifty values come out correct but the rest are garbage. I'd really appreciate it if someone could help me out here. Thanks a lot!
File declaration:
fstream memory_file("C:\\Users\\Amichai\\Pictures\\output.txt", ios::in | ios::out);
Save function:
void save_training_data(fstream &memory_file, vector<vector<vector<long double> > > &training_data)
{
    int sizeI = training_data.size();
    memory_file.write((const char *)&sizeI, sizeof(int));
    for (int i=0; i < sizeI; i++)
    {
        int sizeJ = training_data[i].size();
        memory_file.write((const char *)&sizeJ, sizeof(int));
        for (int j=0; j < sizeJ; j++)
        {
            int sizeK = training_data[i][j].size();
            memory_file.write((const char *)&sizeK, sizeof(int));
            for (int k = 0; k < sizeK; k++)
            {
                int temp;
                temp = (int)training_data[i][j][k];
                memory_file.write((const char *)&temp, sizeof(int));
            }
        }
    }
}
Read function:
void upload_memory(fstream &memory_file, vector<vector<vector<long double> > > &training_data)
{
    memory_file.seekg(ios::beg);
    int temp=0;
    int sizeK, sizeJ, sizeI;
    memory_file.read((char*)&sizeI, sizeof(int));
    training_data.resize(sizeI);
    for (int i=0; i < sizeI; i++)
    {
        memory_file.read((char*)&sizeJ, sizeof(int));
        training_data[i].resize(sizeJ);
        for (int j=0; j < sizeJ; j++)
        {
            memory_file.read((char*)&sizeK, sizeof(int));
            training_data[i][j].resize(sizeK);
            for (int k = 0; k < sizeK; k++)
            {
                memory_file.read((char*)&temp, sizeof(int));
                training_data[i][j][k]=temp;
            }
        }
    }
}
Since you're writing binary data (and apparently working under Windows) you really need to specify ios::binary when you open the fstream.
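For example, the stream declaration from the question would become:

fstream memory_file("C:\\Users\\Amichai\\Pictures\\output.txt", ios::in | ios::out | ios::binary);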
The problem is that you're writing the numerical values in binary form to a file interpreted as text by the stream processor. Either use a binary file (using ios::binary) or convert the numbers to strings before writing to file.
Check out the Boost.Serialization library at www.boost.org. It knows how to read and write STL collections to/from files. I don't know if it can handle nested containers, though.
You may also want to use Boost.MultiArray for your 3-dimensional data. If you're going to do matrix math on your data, then you might want to use Boost.uBLAS.
As the other answers suggest, use "ios::in | ios::out | ios::binary" instead of "ios::in | ios::out", which is correct. However, I remember reading that the C++ stream specification, while having the binary option, was not designed for binary files at all. If using "ios::binary" doesn't help, you would need to use the C functions fopen(), fread(), fwrite(), and fclose() from stdio.h instead, or, as another user suggests, the Boost.Serialization library.
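If you do fall back to C stdio, a rough sketch of the save function (keeping the question's size-prefixed layout, including its truncation of long double to int; save_training_data_c is an illustrative name) would be:

#include <cstdio>
#include <vector>

void save_training_data_c(const char* path,
    const std::vector<std::vector<std::vector<long double> > >& d)
{
    FILE* f = fopen(path, "wb");   // "b" makes the binary mode explicit
    if (!f) return;
    int sizeI = (int)d.size();
    fwrite(&sizeI, sizeof(int), 1, f);
    for (int i = 0; i < sizeI; i++) {
        int sizeJ = (int)d[i].size();
        fwrite(&sizeJ, sizeof(int), 1, f);
        for (int j = 0; j < sizeJ; j++) {
            int sizeK = (int)d[i][j].size();
            fwrite(&sizeK, sizeof(int), 1, f);
            for (int k = 0; k < sizeK; k++) {
                int temp = (int)d[i][j][k];
                fwrite(&temp, sizeof(int), 1, f);
            }
        }
    }
    fclose(f);
}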
Many of my programs output huge volumes of data for me to review in Excel. The best way to view all these files is to use a tab-delimited text format. Currently I use this chunk of code to get it done:
ofstream output (fileName.c_str());
for (int j = 0; j < dim; j++)
{
    for (int i = 0; i < dim; i++)
        output << arrayPointer[j * dim + i] << " ";
    output << endl;
}
This seems to be a very slow operation; is there a more efficient way of writing text files like this to the hard drive?
Update:
Taking the two suggestions into account, the new code is this:
ofstream output (fileName.c_str());
for (int j = 0; j < dim; j++)
{
    for (int i = 0; i < dim; i++)
        output << arrayPointer[j * dim + i] << "\t";
    output << "\n";
}
output.close();
That version writes to the HD at 500 KB/s, but this one writes at 50 MB/s:
{
    output.open(fileName.c_str(), std::ios::binary | std::ios::out);
    output.write(reinterpret_cast<char*>(arrayPointer), std::streamsize(dim * dim * sizeof(double)));
    output.close();
}
Use C IO; it's a lot faster than C++ IO. I've heard of people in programming contests timing out purely because they used C++ IO instead of C IO.
#include <cstdio>
FILE* fout = fopen(fileName.c_str(), "w");
for (int j = 0; j < dim; j++)
{
for (int i = 0; i < dim; i++)
fprintf(fout, "%d\t", arrayPointer[j * dim + i]);
fprintf(fout, "\n");
}
fclose(fout);
Just change %d to be the correct type.
Don't use endl. It flushes the stream buffers, which is potentially very inefficient. Instead:
output << '\n';
I decided to test JPvdMerwe's claim that C stdio is faster than C++ IO streams. (Spoiler: yes, but not necessarily by much.) To do this, I used the following test programs:
Common wrapper code, omitted from programs below:
#include <iostream>
#include <cstdio>
int main (void) {
// program code goes here
}
Program 1: normal synchronized C++ IO streams
for (int j = 0; j < ROWS; j++) {
for (int i = 0; i < COLS; i++) {
std::cout << (i-j) << "\t";
}
std::cout << "\n";
}
Program 2: unsynchronized C++ IO streams
Same as program 1, except with std::cout.sync_with_stdio(false); prepended.
Program 3: C stdio printf()
for (int j = 0; j < ROWS; j++) {
for (int i = 0; i < COLS; i++) {
printf("%d\t", i-j);
}
printf("\n");
}
All programs were compiled with GCC 4.8.4 on Ubuntu Linux, using the following command:
g++ -Wall -ansi -pedantic -DROWS=10000 -DCOLS=1000 prog.cpp -o prog
and timed using the command:
time ./prog > /dev/null
Here are the results of the test on my laptop (measured in wall clock time):
Program 1 (synchronized C++ IO): 3.350s (= 100%)
Program 2 (unsynchronized C++ IO): 3.072s (= 92%)
Program 3 (C stdio): 2.592s (= 77%)
I also ran the same test with g++ -O2 to test the effect of optimization, and got the following results:
Program 1 (synchronized C++ IO) with -O2: 3.118s (= 100%)
Program 2 (unsynchronized C++ IO) with -O2: 2.943s (= 94%)
Program 3 (C stdio) with -O2: 2.734s (= 88%)
(The last line is not a fluke; program 3 consistently runs slower for me with -O2 than without it!)
Thus, my conclusion is that, based on this test, C stdio is indeed about 10% to 25% faster for this task than (synchronized) C++ IO. Using unsynchronized C++ IO saves about 5% to 10% over synchronized IO, but is still slower than stdio.
Ps. I tried a few other variations, too:
Using std::endl instead of "\n" is, as expected, slightly slower, but the difference is less than 5% for the parameter values given above. However, printing more but shorter output lines (e.g. -DROWS=1000000 -DCOLS=10) makes std::endl more than 30% slower than "\n".
Piping the output to a normal file instead of /dev/null slows down all the programs by about 0.2s, but makes no qualitative difference to the results.
Increasing the line count by a factor of 10 also yields no surprises; the programs all take about 10 times longer to run, as expected.
Prepending std::cout.sync_with_stdio(false); to program 3 has no noticeable effect.
Using (double)(i-j) (and "%g\t" for printf()) slows down all three programs a lot! Notably, program 3 is still fastest, taking only 9.3s where programs 1 and 2 each took a bit over 14s, a speedup of nearly 40%! (And yes, I checked, the outputs are identical.) Using -O2 makes no significant difference either.
Does it have to be written in C? If not, there are many tools already written in C, e.g. (g)awk (usable on Unix/Windows), that do the job of file parsing really well, also on big files.
awk '{$1=$1}1' OFS="\t" file
It may be faster to do it this way:
ofstream output (fileName.c_str());
for (int j = 0; j < dim; j++)
{
    for (int i = 0; i < dim; i++)
        output << arrayPointer[j * dim + i] << '\t';
    output << '\n';
}
Or, if you want each line flushed as it is written:

ofstream output (fileName.c_str());
for (int j = 0; j < dim; j++)
{
    for (int i = 0; i < dim; i++)
        output << arrayPointer[j * dim + i] << '\t';
    output << endl;
}
Use '\t' instead of " "