We are developing a scientific application which has its interface in Python 2.7 and its computation routines written in Intel Visual Fortran. The source files are read using Python, and then only the data required for the computations has to be passed to the standalone Fortran algorithms. Once the computations are done, the data has to be read back by Python.
Using formatted text files seems to take too long and is not efficient. Furthermore, we would like to have a standard intermediate format. There can be about 20 arrays, and they are huge (written to formatted text, the file is about 500 MB).
Q1. In a situation like this, where data exchange between Python and Fortran is necessary, what is the recommended way of interacting? (e.g. writing intermediate data to be read by the other side, calling Fortran from within Python, using numpy to create compatible arrays, etc.)
Q2. If writing intermediate structures is recommended, what format is good for data exchange? (We came across CDF, NetCDF, and binary streaming, but haven't tried any so far.)
The standard way of wrapping Fortran code in Python is with f2py (included in the numpy module).
For the output of intermediate results, a number of formats could work; it really depends on your requirements.
For simple datasets, from python, just use numpy.save.
If your datasets become large, HDF5 could be used, with for instance PyTables on the Python side and libhdf5 on the Fortran side.
Otherwise, if you don't want to link your code to an external library, custom binary files written from Fortran and parsed with numpy could work too.
I would interface directly between Python and Fortran. It is relatively straightforward to allocate memory using Numpy and pass a pointer through to Fortran. You use iso_c_binding to write C-compatible wrapper routines for your Fortran routines and ctypes to load the Fortran .dll and call the wrappers. If you're interested I can throw together a simple example (but I am busy right this moment).
I want to use my C++ code with Python's multiprocessing so that my C++ code is called from different processes in parallel. The code does not save any state and no memory sharing is required between the different processes. I decided to use Boost/Python to allow my C++ library to be imported into Python.
However, this FAQ says that Boost/Python is not compatible with multiple interpreters. I am trying to understand what this means exactly. Specifically, does this means that calling my C++ code through Boost/Python with multiprocessing would be problematic?
Multiple processes don't require more than 1 interpreter per process.
Also, the way you describe the situation, you are using a native module from Python. In that case Python is supplying the interpreter anyway.
The way I understand it, the one-interpreter limitation applies to embedding Python from within C++, which is a rather limited subset of Boost.Python's features.
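To make that concrete, here is a minimal sketch of such a native module, assuming a stateless function heavy_computation and a module name mylib (both hypothetical). Each worker process started by multiprocessing imports the compiled module into its own interpreter, so the single-interpreter caveat never comes into play.

    // mylib.cpp -- build as a Python extension module, e.g.
    //   g++ -shared -fPIC mylib.cpp -o mylib.so -lboost_python -lpython2.7
    // (the exact flags depend on your Boost and Python installation)
    #include <boost/python.hpp>

    // Stateless computation: no globals, nothing shared between processes.
    double heavy_computation(double x) {
        return x * x;   // placeholder for the real work
    }

    BOOST_PYTHON_MODULE(mylib) {   // "mylib" is a hypothetical module name
        boost::python::def("heavy_computation", heavy_computation);
    }

    // Python side, for illustration:
    //   from multiprocessing import Pool
    //   import mylib
    //   def work(x):                  # plain Python wrapper keeps pickling simple
    //       return mylib.heavy_computation(x)
    //   pool = Pool(4)
    //   print(pool.map(work, range(100)))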
I wrote a TCP/IP socket connection with server and client in C++, which works quite nicely in Visual Studio. Now I want to use the C++ client in MATLAB/Simulink through MEX files, and later in an S-Function.
I found two descriptions of MEX files:
C++ MEX File Applications (C++ only)
C/C++ MEX Files (C and C++)
Now I am confused about which one to take. I wrote some simple programs with the second, but always ran into problems with data types. I think it is because the given examples and functions are only for C and not for C++.
I appreciate any help! Thank you very much!
The differences:
The C interface described in the second link is much, much older (I used this interface way back in 1998). You can create a MEX-file with this interface and have it run on a large set of different versions of MATLAB. You can use it from C as well as C++ code.
The C++-only interface described in the first link is new in MATLAB R2018a (the C++ classes to manipulate MATLAB arrays were introduced in R2017b, but the ability to write a MEX-file was new in R2018a). MEX-files you write with this interface will not run on prior versions of MATLAB.
Additionally, this interface (finally!) allows for creating shared-data copies, in-place operations, etc. (the stuff we have been asking for for many years, but they didn't want to put into the old C interface because they worried it would be too complex for the average MEX-file writer).
Another change to be aware of:
In R2018a, MATLAB also changed the way that complex arrays are stored in memory. In older versions of MATLAB, the real and imaginary components were stored in separate memory blocks. In R2018a and later, they are stored interleaved in the same memory block, in the same fashion as you would likely use in your own code.
This affects MEX-files! If your MEX-file uses complex data, it needs to read and write them in the way that MATLAB stores them. If you run a MEX-file compiled for an older version of MATLAB, or compile a MEX-file using the current default build options in R2018a, a complex array will be copied to the old storage model before being passed to the MEX-file. A new compile option to the mex command, -R2018a, creates MEX-files that receive the data in the new storage model, unchanged. But those MEX-files will not be compatible with previous versions of MATLAB.
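To make the two storage models concrete, here is a sketch of how a MEX-file might read a complex double input under each model (mxGetPr/mxGetPi for the old separate-block layout, mxGetComplexDoubles when built with -R2018a); it computes sum(abs(z).^2) and is meant as an illustration only:

    #include "mex.h"

    void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[]) {
        if (nrhs != 1 || !mxIsDouble(prhs[0]) || !mxIsComplex(prhs[0]))
            mexErrMsgIdAndTxt("demo:input", "Expected one complex double array.");
        mwSize n = mxGetNumberOfElements(prhs[0]);
        double acc = 0.0;
    #if MX_HAS_INTERLEAVED_COMPLEX              /* defined as 1 when built with -R2018a */
        const mxComplexDouble *z = mxGetComplexDoubles(prhs[0]);  /* interleaved storage */
        for (mwSize i = 0; i < n; ++i)
            acc += z[i].real * z[i].real + z[i].imag * z[i].imag;
    #else                                       /* separate real/imaginary memory blocks */
        const double *re = mxGetPr(prhs[0]);
        const double *im = mxGetPi(prhs[0]);
        for (mwSize i = 0; i < n; ++i)
            acc += re[i] * re[i] + im[i] * im[i];
    #endif
        plhs[0] = mxCreateDoubleScalar(acc);
    }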
How to choose?
If you need your MEX-files to run on versions of MATLAB prior to R2018a, use the old C interface; you don't have a choice.
If you want to program in C, use the old C interface.
If you need to use complex data, and don't want to incur the cost of the copy, you need to target R2018a and newer, and R2017b and older, separately. You need to write separate code for these two "platforms". The older versions can only be targeted with the C interface. For the newer versions you can use either interface.
If you appreciate the advantages of modern C++ and would like to take advantage of them, and are targeting only the latest and greatest MATLAB version, then use the new C++ interface. I haven't tried it out yet, but from the documentation it looks to be very well designed.
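To make the difference concrete, here is the same trivial MEX-file (doubling a real double array) sketched in both interfaces. Names are illustrative and error checking is minimal.

Classic C interface (works on old and new MATLAB versions):

    /* times2.c */
    #include "mex.h"

    void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[]) {
        if (nrhs != 1 || !mxIsDouble(prhs[0]) || mxIsComplex(prhs[0]))
            mexErrMsgIdAndTxt("times2:input", "Expected one real double array.");
        mwSize n = mxGetNumberOfElements(prhs[0]);
        plhs[0] = mxCreateDoubleMatrix(mxGetM(prhs[0]), mxGetN(prhs[0]), mxREAL);
        const double *in = mxGetPr(prhs[0]);
        double *out = mxGetPr(plhs[0]);
        for (mwSize i = 0; i < n; ++i)
            out[i] = 2.0 * in[i];
    }

New C++ interface (R2018a and newer only):

    // times2.cpp
    #include "mex.hpp"
    #include "mexAdapter.hpp"

    class MexFunction : public matlab::mex::Function {
    public:
        void operator()(matlab::mex::ArgumentList outputs, matlab::mex::ArgumentList inputs) {
            // Take ownership of the input array, double it in place, hand it back.
            matlab::data::TypedArray<double> in = std::move(inputs[0]);
            for (auto &v : in)
                v *= 2.0;
            outputs[0] = std::move(in);
        }
    };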
I am trying to import some protobuf binaries into Matlab. I see two ways to do it:
1) Use the protobuf Matlab plugin
2) Use the C++ APIs provided by Google and then import the data into Matlab using MEX files
Since I am working with large-scale data, which one would be faster to run?
Of those two, C++ is likely to be faster to run.
I can tell you definitely that coming up to the Matlab-language level for one-byte, two-byte, or four-byte reads or buffer accesses runs a lot slower than you might expect.
And looking in protobuf-matlab/source/browse/protobuflib, I see tons of that type of operation being done in Matlab code.
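If you do go the C++ route, the usual pattern is a thin MEX gateway that lets the generated protobuf classes do all the byte-level work and only hands finished arrays back to Matlab. A rough sketch, assuming a hypothetical message type MyMessage generated from your .proto with a repeated double field named values:

    // read_proto.cpp -- sketch of a MEX gateway around the protobuf C++ API
    #include <fstream>
    #include "mex.h"
    #include "my_message.pb.h"   // hypothetical header generated by protoc

    void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[]) {
        char *fname = mxArrayToString(prhs[0]);
        std::ifstream input(fname, std::ios::binary);
        mxFree(fname);

        MyMessage msg;                          // hypothetical message type
        if (!msg.ParseFromIstream(&input))
            mexErrMsgIdAndTxt("readproto:parse", "Failed to parse protobuf file.");

        // Copy the repeated field into a Matlab array in one pass.
        plhs[0] = mxCreateDoubleMatrix(1, msg.values_size(), mxREAL);
        double *out = mxGetPr(plhs[0]);
        for (int i = 0; i < msg.values_size(); ++i)
            out[i] = msg.values(i);
    }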
You didn't list what I would think to be the best of both worlds: the Java API.
If you have any Java code in your project, or a modest level of comfort with Java, I'd strongly consider using the Java API. Matlab has really good interoperability with Java; it has the big advantage that you can work through examples interactively with methodsview(), etc. There is some learning curve with javaObject()/javaMethod(), compared to being able to use a more native syntax.
I am currently working on a project in C++, and I am actually interested in using Matlab data structures, instead of having to create my own data types (such as matrices, arrays, etc.)
Is there a way to seamlessly use Matlab objects in C++? I don't mind having to run Matlab in the background while my program runs.
EDIT: A starting point is this: http://www.mathworks.co.uk/help/matlab/calling-matlab-engine-from-c-c-and-fortran-programs.html. I will continue reading this.
You can instead use the Armadillo C++ maths library, which is used by NASA, Boeing, Siemens, Deutsche Bank, MIT, CMU, Stanford, etc.
It has good documentation and examples for those more familiar with MATLAB/Octave:
http://arma.sourceforge.net/docs.html#syntax
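A small taste of how close the syntax stays to MATLAB (a generic sketch, nothing project-specific assumed):

    #include <iostream>
    #include <armadillo>

    int main() {
        arma::mat A = arma::randu<arma::mat>(4, 4);   // like rand(4,4)
        arma::vec b = arma::randu<arma::vec>(4);

        arma::vec x = arma::solve(A, b);              // like x = A \ b
        arma::mat B = A.t() * A;                      // like B = A' * A

        std::cout << x << B;                          // Armadillo objects print directly
        return 0;
    }

Compile with something like g++ example.cpp -o example -larmadillo (the exact link line depends on your installation).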
I would prefer using native C++ library of some sort and not Matlab. This is likely to be faster for both development and execution.
From writing C++ extensions for Matlab I learned one thing: Using Matlab objects in C++ is likely to give you considerable headache.
Matlab data structures are not exposed as C++ classes. Instead, you get pointers that you can manipulate with C-like API functions.
I recommend using a native C++ library such as Eigen3.
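For example, a minimal Eigen3 sketch of the kind of matrix work you would otherwise reach to Matlab for (Eigen is header-only, so there is nothing extra to link):

    #include <iostream>
    #include <Eigen/Dense>

    int main() {
        Eigen::MatrixXd A = Eigen::MatrixXd::Random(3, 3);
        Eigen::VectorXd b = Eigen::VectorXd::Random(3);

        // Solve A x = b with a rank-revealing QR decomposition.
        Eigen::VectorXd x = A.colPivHouseholderQr().solve(b);

        std::cout << "x =\n" << x << "\n";
        return 0;
    }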
The functionality you are looking at is not really intended to be used as seamless objects. In the past, when I have used it, I found it much simpler to do the C parts using either native arrays or a third-party matrix library, and then convert the result into a Matlab matrix to return.
Mixing Matlab and C++ is typically done in one of two ways:
Having a C++ program call Matlab to do some specialist processing. This is chiefly useful for rapid development of complex matrix algorithms. You can do this either by calling the full Matlab engine, or by packaging your snippet of Matlab code into a shared library for distribution. (The distributed version packages a distributable copy of the Matlab runtime, which is called with your scripts.)
Having a Matlab script call a C++ function to do some specialist processing. This is often used to embed C++ implementations of algorithms (such as machine learning models) or to handle specific optimizations.
Both of these use cases have some overhead transferring the data to/from Matlab.
If you are simply looking for some matrix code to use in C++ you would be better off looking into the various C++ matrix libraries, such as the one implemented in Boost.
You can do mixed programming with C++ and Matlab. There are two possible ways:
Call the MATLAB Engine directly: refer to this post for more info. Matlab will run in the background (see the sketch after this list).
Compile the MATLAB code into an independent shared library: check here on how to do this (with detailed steps and an example).
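For the first option, a bare-bones sketch using the C Engine API (engine.h), which works fine from C++; the variable names and the evaluated expression are just placeholders:

    // engine_demo.cpp -- link against the MATLAB eng/mx libraries (see the MathWorks docs for build flags)
    #include <cstring>
    #include "engine.h"

    int main() {
        Engine *ep = engOpen(NULL);           // NULL starts MATLAB on the local machine
        if (!ep) return 1;

        double data[3] = {1.0, 2.0, 3.0};
        mxArray *a = mxCreateDoubleMatrix(1, 3, mxREAL);
        std::memcpy(mxGetPr(a), data, sizeof(data));

        engPutVariable(ep, "A", a);           // push data into the MATLAB workspace
        engEvalString(ep, "B = 2 * A;");      // run arbitrary MATLAB code
        mxArray *b = engGetVariable(ep, "B"); // pull the result back into C++

        mxDestroyArray(a);
        mxDestroyArray(b);
        engClose(ep);
        return 0;
    }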
I am creating a NetCDF file with mostly NaN values. Is there a way I can specify it to be compressed rather than taking up large disk space? I am using the University Corporation for Atmospheric Research C++ NetCDF library.
thanks!
Yes, but it depends on which netCDF C++ API you are using: the legacy (netcdf-cxx-4.2) C++ API or the newer netcdf-cxx4-4.2 C++ API.
With the netCDF-4 C++ library, documented here, just use the NcVar::setCompression method.
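A minimal sketch with the netCDF-4 C++ API (the dimension sizes and deflate level here are arbitrary):

    #include <cmath>
    #include <vector>
    #include <netcdf>
    using namespace netCDF;

    int main() {
        NcFile file("out.nc", NcFile::replace);        // netCDF-4/HDF5 file by default

        NcDim x = file.addDim("x", 1000);
        NcDim y = file.addDim("y", 1000);
        std::vector<NcDim> dims{x, y};
        NcVar var = file.addVar("data", ncDouble, dims);

        // shuffle filter on, deflate on, compression level 9
        var.setCompression(true, true, 9);

        std::vector<double> data(1000 * 1000, NAN);    // a mostly-NaN field compresses very well
        var.putVar(data.data());
        return 0;                                      // file is closed by the NcFile destructor
    }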
With the legacy netCDF-3 C++ library, there is no C++ method to do what you want directly. But that library is implemented as a thin layer over the netCDF C library, so by adding an NcVar constructor that sets the compression level by calling the C function nc_def_var_deflate, it ought to be fairly straightforward. Of course, you would have to make sure your legacy C++ library was built to use a previously installed netCDF-4 library.
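If you stay with the legacy wrapper, the underlying C calls it would need to make look roughly like this (pure C API shown for clarity; names and sizes are placeholders, and error handling is omitted):

    #include <netcdf.h>

    /* Sketch: define a deflate-compressed variable via the C API.
       Every call returns NC_NOERR on success. */
    void create_compressed(void) {
        int ncid, xdim, ydim, varid, dimids[2];

        nc_create("out.nc", NC_NETCDF4, &ncid);        /* compression requires a netCDF-4 file */
        nc_def_dim(ncid, "x", 1000, &xdim);
        nc_def_dim(ncid, "y", 1000, &ydim);
        dimids[0] = xdim; dimids[1] = ydim;
        nc_def_var(ncid, "data", NC_DOUBLE, 2, dimids, &varid);
        nc_def_var_deflate(ncid, varid, 1, 1, 9);      /* shuffle on, deflate on, level 9 */
        nc_enddef(ncid);
        /* ... nc_put_var_double(ncid, varid, data); ... */
        nc_close(ncid);
    }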
Obviously it is better to produce netcdf4 compressed files at creation time, but I just wanted to add that if one already has code written that produces the file in an uncompressed way, it is possible to use CDO on existing files to convert to netcdf4 conventions and compress at the same time:
cdo -f nc4c -z zip_9 copy in.nc out.nc