Transferring MATLAB variables to C/C++

I have a very large data structure in some MATLAB code, in the form of a cell array of arrays. We want to develop C code to work on this data, so I need some way to store the MATLAB variable (which we generate in MATLAB) and open it in a C/C++ program. What is the easiest way to bridge the two programs so I can transfer the data?

If you are only moving the data from MATLAB to C occasionally, the easiest thing would be to write it to a binary file, then read from the file in C. This of course leaves the C code completely independent of MATLAB.
This does not have to be that messy if your data structure is just a cell array of regular arrays, e.g.
a{1} = zeros(1,5);
a{2} = zeros(1,4);
You could just write a header for each cell, followed by the data to the file. In the above case, that would be:
[length{1} data{1} length{2} data{2}]
or, with the actual values:
5 0 0 0 0 0 4 0 0 0 0
If the arrays are 2D, you can extend this by writing: row, column, then the data in row-major order for each cell.
This might not be entirely convenient, but it should be simple enough. You could also save the data as a .mat file and read that, but I would not recommend it; it is much easier to write a plain binary format from MATLAB.
If you need to move the data more frequently than is convenient for a file, there are other options, but all I can think of are tied to MATLAB in some way.
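As a sketch of both sides of that format in C++, assuming a 32-bit length header per cell (the file name and function names here are hypothetical, not from the original answer):

```cpp
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Write each cell as a 32-bit length header followed by its doubles,
// i.e. [len1 data1 len2 data2 ...] as described above.
void write_cells(const std::string& path,
                 const std::vector<std::vector<double>>& cells) {
    std::ofstream out(path, std::ios::binary);
    for (const auto& cell : cells) {
        std::uint32_t len = static_cast<std::uint32_t>(cell.size());
        out.write(reinterpret_cast<const char*>(&len), sizeof(len));
        out.write(reinterpret_cast<const char*>(cell.data()),
                  len * sizeof(double));
    }
}

// Read cells back until end-of-file.
std::vector<std::vector<double>> read_cells(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    std::vector<std::vector<double>> cells;
    std::uint32_t len = 0;
    while (in.read(reinterpret_cast<char*>(&len), sizeof(len))) {
        std::vector<double> cell(len);
        in.read(reinterpret_cast<char*>(cell.data()), len * sizeof(double));
        cells.push_back(std::move(cell));
    }
    return cells;
}
```

On the MATLAB side, fwrite with 'uint32' and 'double' can produce the same layout, provided both sides agree on endianness.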

You should use MEX files:
http://www.mathworks.fr/support/tech-notes/1600/1605.html

If the two processes need to connect during their lifecycle, you have plenty of options:
Compile a MATLAB shared library (DLL).
Use the MATLAB Engine.
Compile a MEX file (as @Oli mentioned earlier).
If the communication is offline (C++ starts reading after MATLAB closes), then you should use the filesystem. Try formatting the data as XML; it is a well-recognized standard.

Related

Use of memory in C++

I'm not sure how to pose this question, but here goes:
When programming my Atmel MCUs in C++ I tend to mix the 'program' variables and the 'user' variables in the same data memory, which over time becomes a hassle, because I want to make a few presets that can be loaded or saved, and I do not want the 'program' variables saved, since the program will regenerate the correct values from the 'user' values. Is it common practice to split these across memory regions, e.g. timerCounter in PGM memory and thresholdByUser in DATA memory?
In my program I've made several different functions which each have their own set of user variables.
E.g.: settings has 5 user variables, generator has 6 user variables, etc.
Would you make one big array and #define index names, e.g. #define generatorSpeed 1 and #define settingsBacklight 2, so you could access them as Array[generatorSpeed] and Array[settingsBacklight], or would you still split them up and collect them in a struct or so?
Working in Atmel Studio 4.0 with an ATmega644 on an STK500.
Thanks for all the help you can give!
Assuming you mean the AT(X)mega line when you refer to Atmel MCUs: IIRC it depends on which compiler suite you are using. With gcc, something like an initialized static int is stored in PGM (flash) and copied to RAM when the program starts. Hence, if you do not want variables to occupy PGM memory, you must make them stack or heap variables; initialized constants and statics will reside in both. If you want constants to live in PGM only, you can specify that, but it requires special read operations.
For question 2, I'd use const int& settingX = array[Xoffset]; instead of a #define. But that assumes there is some need to iterate through the array; otherwise I'd just define separate variables.
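A minimal sketch of the struct approach (all names are hypothetical, and the EEPROM access an AVR would use is replaced by a plain array): grouping the user variables in one struct makes a preset a single block copy, while program state stays in its own struct and is never persisted.

```cpp
#include <cstdint>

// User-adjustable settings: this is the only block that presets save/load.
struct UserSettings {
    std::uint8_t  backlight;
    std::uint16_t generatorSpeed;
    std::int16_t  thresholdByUser;
};

// Program state derived from the settings; regenerated at startup,
// deliberately kept out of the preset block.
struct ProgramState {
    std::uint16_t timerCounter;
};

// Stand-in for EEPROM preset slots on the real MCU.
UserSettings presets[4];

void save_preset(int slot, const UserSettings& s) { presets[slot] = s; }
UserSettings load_preset(int slot) { return presets[slot]; }
```

On the real hardware the copy would go through eeprom_read_block/eeprom_write_block (avr-libc), but the structure of the code stays the same.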

Writing MPI result to a file

I have some code which solves an all-pairs shortest path problem, and each processor has a piece of the result. I am trying to write this result, which is a matrix, to an output file, so that each process writes its part of the solution at the correct position. I am trying to use fseek for this but am a little stuck because the integers have different sizes as text: 2 and -199 take different amounts of space. How can I do it so that the processors do not overwrite each other? There might also be race conditions in the writing.
Should I do this another way, or is there a way to accomplish this? I was thinking of sending all results to one process (rank 0) and having it assemble the array and write the file.
Don't use ASCII output; use binary, which is well defined in size.
So if you're using fstream and doubles:
#include <fstream>
#include <vector>
using namespace std;

fstream filewriter("file.bin", ios::out | ios::binary);
vector<double> mylist;
mylist.push_back(2.5);
mylist.push_back(7.6);
mylist.push_back(2.1);
mylist.push_back(3.2);
mylist.push_back(4.2);
filewriter.write((char*)&mylist[0], mylist.size()*sizeof(double));
This will write exactly 40 bytes, which is the size of double (8) times the size of your list (5 elements). And using fseek will be very easy.
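Because every element has a fixed size, each rank can compute its byte offset directly. Here is a single-process sketch of that offset arithmetic (names are hypothetical; in real MPI code you would typically use MPI-IO, e.g. MPI_File_write_at, rather than plain fstream):

```cpp
#include <fstream>
#include <string>
#include <vector>

// Pre-create the file with zeros (e.g. done once by rank 0).
void create_file(const std::string& path, long nbytes) {
    std::ofstream out(path, std::ios::binary);
    std::vector<char> zeros(nbytes, 0);
    out.write(zeros.data(), nbytes);
}

// Each "rank" owns rows_per_rank rows of an ncols-wide double matrix.
// Fixed-size binary records make the offset a simple product, so writers
// never overlap regardless of the order in which they run.
void write_block(const std::string& path, int rank,
                 int rows_per_rank, int ncols,
                 const std::vector<double>& block) {
    std::fstream f(path, std::ios::in | std::ios::out | std::ios::binary);
    std::streamoff offset =
        std::streamoff(rank) * rows_per_rank * ncols * sizeof(double);
    f.seekp(offset);
    f.write(reinterpret_cast<const char*>(block.data()),
            block.size() * sizeof(double));
}
```

The same arithmetic carries over directly to the offset argument of MPI_File_write_at if you move to MPI-IO.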
In scientific environments with huge outputs, binary data is strongly recommended. However:
1- You have to learn about the concept of endianness (big endian, little endian).
2- You have to document your work properly for reuse (purpose, size, number of elements, dimensionality). I face huge misunderstandings when I forget to document stuff (I'm a PhD physicist who programs simulations).
So ASCII is not the right choice for this kind of data.
Luckily, there's a full library specialized in organizing this for you, called HDF5. It handles endianness and portability for you; however, it's not easy to deal with and has a steep learning curve. I think that's a harder story for later times.
What I would recommend, is that you learn how to deal with binary files and how to read them, understand their issues and problems. I think that you're professional enough to deal with binary files, since you use MPI.
Here's a quick tutorial to binary files:
http://courses.cs.vt.edu/cs2604/fall02/binio.html
Cheers.
You could have each process write its output in some format that can be merged and cleaned up after the last one is done, like (x, y, z), (x, y, z), ... where x is the row index, y the column index, and z the value.
This is a good job for memory-mapped files. They are system-dependent, but they're implemented in both POSIX and Windows OS families, so if you use a modern OS, they'd work. There is a portable and C++-friendly implementation of them in boost (classes mapped_file_source, mapped_file_sink and mapped_file). Interprocess output is a classical example of their usage.
They are binary, so most of what Samer said in his answer applies too; the only difference is that you use pointer arithmetic instead of seeking.

How to exchange data between C++ and MATLAB?

For the time being I am developing a C++ program based on some MATLAB code. During development I need to output intermediate results to MATLAB in order to compare the C++ implementation with the MATLAB result. What I am doing now is writing a binary file with C++ and then loading it with MATLAB. The following code shows an example:
#include <fstream>
using namespace std;

int main()
{
    ofstream abcdef;
    abcdef.open("C:/test.bin", ios::out | ios::trunc | ios::binary);
    for (int i = 0; i < 10; i++)
    {
        float x_cord = i * 1.38f;
        float y_cord = i * 10.0f;
        // Note: operator<< performs formatted output, so despite ios::binary
        // the file actually contains ASCII numbers -- which is why
        // MATLAB's load() can read it.
        abcdef << x_cord << " " << y_cord << endl;
    }
    abcdef.close();
    return 0;
}
When I have the file test.bin, I can load the file automatically with MATLAB command:
data = load('test.bin');
This method works well when the output is numerical data; however, it fails if the output is a class with many member variables. I was wondering whether there are better ways to do the job, not only for simple numerical data but also for complicated data structures. Thanks!
I would suggest using the MATLAB Engine, through which you can pass data to MATLAB in real time and even visualize it using the various graph-plotting facilities available in MATLAB.
All you have to do is invoke the MATLAB Engine from your C/C++ program; then you can execute MATLAB commands directly from C/C++ and/or exchange data between MATLAB and C/C++, in both directions, i.e. from C++ to MATLAB and vice versa.
You can have a look at a working example for the same as shown here.
I would suggest using the fread command in MATLAB. I do this all the time for exchanging data between MATLAB and other programs. For instance:
fd = fopen('datafile.bin','r');
a = fread(fd,3,'*uint32');
b = fread(fd,1,'float32');
With fread you have full flexibility to read any type of data. By placing a * in the type name, as above, you also say that you want to store into that data type instead of the default MATLAB data type (double). So the first call reads three 32-bit unsigned integers and stores them as integers; the second reads a single-precision floating-point number but stores it as the default double precision.
You need to control the way the data is written in your C++ code, but that is inevitable. You can write a class method in C++ that packs the data in a deterministic way.
Dustin
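For reference, a matching C++ writer for those fread calls might look like this (the file name and values are hypothetical, chosen only to line up with the MATLAB snippet above):

```cpp
#include <cstdint>
#include <fstream>
#include <string>

// Writes three 32-bit unsigned integers followed by one single-precision
// float, matching the MATLAB reads:
//   a = fread(fd, 3, '*uint32');
//   b = fread(fd, 1, 'float32');
void write_data(const std::string& path) {
    std::uint32_t a[3] = {10, 20, 30};
    float b = 1.5f;
    std::ofstream out(path, std::ios::binary);
    out.write(reinterpret_cast<const char*>(a), sizeof(a));
    out.write(reinterpret_cast<const char*>(&b), sizeof(b));
}
```

Note that both sides must agree on endianness; fopen in MATLAB accepts a machine-format argument ('ieee-le', 'ieee-be') if the C++ side runs on different hardware.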

A good way to output array values from Python and then take them in through C++?

Due to annoying overflow problems with C++, I want to instead use Python to precompute some values. I have a function f(a,b) that spits out a value. I want to output all the values I need, based on ranges of a and b, into a file, and then in C++ read that file and populate a vector or array or whatever's better.
What is a good format to output f(a,b) in?
What's the best way to read this back into C++?
Vector or multidim array?
You can use Python to write out a .h file that is compatible with C++ source syntax.
h_file.write('{')
for a in range(a_size):
    h_file.write('{' + ','.join(str(f(a, b)) for b in range(b_size)) + '},\n')
h_file.write('}')
You will probably want to modify that code to throw some extra newlines in, and in fact I have such code that I can show later (don't have access to it now).
You can use Python to write out C++ source code that contains your data. E.g:
def f(a, b):
    # Your function here, e.g:
    return pow(a, b, 65537)

num_a_values = 50
num_b_values = 50

# Write source file
with open('data.cpp', 'wt') as cpp_file:
    cpp_file.write('/* Automatically generated file, do not hand edit */\n\n')
    cpp_file.write('#include "data.hpp"\n')
    cpp_file.write('const int f_data[%d][%d] =\n'
                   % (num_a_values, num_b_values))
    cpp_file.write('{\n')
    for a in range(num_a_values):
        values = [f(a, b) for b in range(num_b_values)]
        cpp_file.write('  {' + ','.join(map(str, values)) + '},\n')
    cpp_file.write('};\n')

# Write corresponding header file
with open('data.hpp', 'wt') as hpp_file:
    hpp_file.write('/* Automatically generated file, do not hand edit */\n\n')
    hpp_file.write('#ifndef DATA_HPP_INCLUDED\n')
    hpp_file.write('#define DATA_HPP_INCLUDED\n')
    hpp_file.write('#define NUM_A_VALUES %d\n' % num_a_values)
    hpp_file.write('#define NUM_B_VALUES %d\n' % num_b_values)
    hpp_file.write('extern const int f_data[%d][%d];\n'
                   % (num_a_values, num_b_values))
    hpp_file.write('#endif\n')
You then compile the generated source code as part of your project. You can then use it by #including the header and accessing the f_data[] array directly.
This works really well for small to medium size data tables, e.g. icons. For larger data tables (millions of entries) some C compilers will fail, and you may find that the compile/link is unacceptably slow.
If your data is more complicated, you can use this same method to define structures.
[Based on Mark Ransom's answer, but with some style differences and more explanation].
If there is megabytes of data, then I would read the data in by memory mapping the data file, read-only. I would arrange things so I can use the data file directly, without having to read it all in at startup.
The reason for doing it this way is that you don't want to read megabytes of data at startup if you're only going to use some of the values. By using memory mapping, your OS will automatically read just the parts of the file that you need. And if you run low on RAM, your OS can reuse the memory allocated for that file without having to waste time writing it to the swap file.
If the output of your function is a single number, you probably just want an array of ints. You'll probably want a 2D array, e.g.:
#include <stdio.h>
#include <fcntl.h>
#include <sys/mman.h>

#define DATA_SIZE (50 * 25 * sizeof(int))  /* mmap length is in bytes */
typedef const int (*data_table_type)[50];

int fd = open("my_data_file.dat", O_RDONLY);
data_table_type data_table = (data_table_type)mmap(0, DATA_SIZE,
        PROT_READ, MAP_SHARED, fd, 0);
printf("f(5, 11) = %d\n", data_table[5][11]);
For more info on memory mapped files, see Wikipedia, or the UNIX mmap() function, or the Windows CreateFileMapping() function.
If you need more complicated data structures, you can put C/C++ structures and arrays into the file. But you can't embed pointers or any C++ class that has a virtual anything.
Once you've decided how you want to read the data, the next question is how to generate it. struct.pack() is very useful for this: it lets you convert Python values into a properly packed byte string, which you can then write to a file.

Unexpected "padding" in a Fortran unformatted file

I don't understand the format of unformatted files in Fortran.
For example:
open (3,file=filename,form="unformatted",access="sequential")
write(3) matrix(i,:)
outputs a column of a matrix into a file. I've discovered that it pads the data with 4 bytes on either end, but I don't really understand why, or how to control this behavior. Is there a way to remove the padding?
For unformatted I/O, Fortran compilers typically write the length of the record at the beginning and end of the record. Most, but not all, compilers use four bytes. This aids in reading records; e.g., the length at the end assists with a BACKSPACE operation. You can suppress these markers with the new stream I/O mode of Fortran 2003, which was added for compatibility with other languages: use access='stream' in your OPEN statement.
I never used sequential access with unformatted output for this exact reason. However, it depends on the application, and sometimes it is convenient to have a record-length indicator (especially for unstructured data). As suggested by steabert in "Looking at binary output from fortran on gnuplot", you can avoid the markers by using the keyword argument ACCESS='DIRECT', in which case you need to specify the record length. This method is convenient for efficient storage of large multi-dimensional structured data (constant record length). The following example writes an unformatted file whose size equals the size of the array:
REAL(KIND=4),DIMENSION(10) :: a = 3.141
INTEGER :: reclen
INQUIRE(iolength=reclen)a
OPEN(UNIT=10,FILE='direct.out',FORM='UNFORMATTED',&
ACCESS='DIRECT',RECL=reclen)
WRITE(UNIT=10,REC=1)a
CLOSE(UNIT=10)
END
Note that this is not the ideal approach in the sense of portability. In an unformatted file written with direct access, there is no information about the size of each element. A readme text file that describes the data size does the job fine for me, and I prefer this method over the padding of sequential mode.
Fortran I/O is record based, not stream based. Every time you write something through write() you write not only the data but also begin and end markers for that record; both markers hold the size of the record. This is why writing a bunch of reals in a single write (one record: one begin marker, the reals, one end marker) produces a different file size from writing each real in a separate write (multiple records, each with its own begin marker, one real, and an end marker). This is extremely important if you are writing large matrices, as improper writing can balloon the file size.
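A small C++ sketch of parsing such records from another language, assuming the common 4-byte-marker convention described above (function names are hypothetical; real files may also differ in endianness):

```cpp
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Builds one sequential unformatted record the way a typical compiler
// writes it: 4-byte length, payload, 4-byte length again.
std::string make_record(const std::string& payload) {
    std::uint32_t len = static_cast<std::uint32_t>(payload.size());
    std::string rec(reinterpret_cast<const char*>(&len), sizeof(len));
    rec += payload;
    rec.append(reinterpret_cast<const char*>(&len), sizeof(len));
    return rec;
}

// Reads one record; returns an empty vector on EOF or a marker mismatch.
std::vector<char> read_record(std::istream& in) {
    std::uint32_t len = 0;
    if (!in.read(reinterpret_cast<char*>(&len), sizeof(len)))
        return {};
    std::vector<char> payload(len);
    in.read(payload.data(), len);
    std::uint32_t trailer = 0;
    in.read(reinterpret_cast<char*>(&trailer), sizeof(trailer));
    // The trailing marker should repeat the leading one.
    if (trailer != len) payload.clear();
    return payload;
}
```

Checking that the trailing marker matches the leading one is also a cheap way to detect that a file uses a different record format than you assumed.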
Fortran unformatted I/O: I am quite familiar with the differing outputs of the Intel and GNU compilers. Fortunately my vast experience dating back to 1970s IBM systems allowed me to decode things. GNU pads records with 4-byte integer counters giving the record length. Intel uses a 1-byte counter plus a number of embedded coding values to signify a continuation record or the end of a count; one can still have very long records even though only 1 byte is used.
I have software compiled with the GNU compiler that I had to modify so it could read an unformatted file generated by either compiler, so it has to detect which format it finds. Reading an unformatted file generated by the Intel compiler (which follows the "old" IBM convention) takes "forever" using GNU's fgetc or opening the file in stream mode; converting the file to what GNU expects makes it up to 100 times faster. Whether detection and conversion are worth the bother depends on your file size. I reduced my program's startup time (it opens a large unformatted file) from 5 minutes to 10 seconds, and I had to add options to convert back again if the user wants to take the file back to an Intel-compiled program. It's all a pain, but there you go.