I am having a hard time determining whether the files I am using are little-endian or big-endian. Is there any Fortran code I can run over a file to determine its endianness?
Thank you,
Pradeep
You could look at the file with a hex editor. Most hex editors allow you to switch the interpretation of the bits between big- and little-endian. You will have to have some idea of what values to expect. Interpreted with the wrong endianness, the values will probably be implausible. If the files were written as Fortran sequential files, typically each record will begin and end with an integer specifying the length of the record. This might also help you deduce the endianness of the file.
Or, as Rob said, just try reading the file. If it doesn't work, try the other endianness.
Related
I am running a DCT code in Matlab and I would like to read the compressed file (.mat) into a C program. However, I'm not sure this is right. I have not yet finished my code, but I would like to request an explanation of how to create a C++-readable file from my .mat file.
I'm kind of confused when it comes to .mat, .txt and then the binary and float details of files. Could someone please explain this to me?
It seems that you have a lot of options here, depending on your exact needs, time, and skill level (in both Matlab and C++). The obvious ones are:
ASCII files
You can generate ASCII files in Matlab either using the save(filename, variablename, '-ascii') syntax, or you can create a more custom format using C-style fprintf commands. Then, within a C or C++ program, the files are read using fscanf.
This is often easiest, and good enough in many cases. The fact that a human can read the files using Notepad++, Emacs, etc. is a nice sanity check (although this is often overrated).
There are two big downsides. First, the files are very large (an 8-byte double requires about 19 bytes to store in ASCII). Second, you have to be very careful to minimize the inevitable loss of precision.
Bytes-on-a-disk
For a simple array of numbers (for example, a 32-by-32 array of doubles) you can simply use the fwrite Matlab function to write the array to a disk. Then within C/C++ use the parallel fread function.
This has no loss of precision, is pretty fast, and takes relatively little space on disk.
The downside with this approach is that complex Matlab structures cannot necessarily be saved.
Mathworks provided C library
Since this is a pretty common problem, MathWorks has actually solved it with a direct C implementation of the functions needed to read/write *.mat files. I have not used this particular library, but generally the libraries they provide are pretty easy to integrate. Some documentation can be found starting here: http://www.mathworks.com/help/matlab/read-and-write-matlab-mat-files-in-c-c-and-fortran.html
This should be a pretty robust solution, and relatively insensitive to changes, since it is part of the mainstream, supported Matlab toolset.
HDF5 based *.mat file
With recent versions of Matlab, you can use the notation save(filename, variablename, '-v7.3') to force Matlab to save the file in an HDF5-based format. Then you can use tools from the HDF5 Group to handle the file. Note that there is a decent Java-based GUI viewer (http://www.hdfgroup.org/hdf-java-html/hdfview/index.html#download_hdfview) as well as libraries for C, C++ and Fortran.
This is a non-fragile way to store binary data, though it is a bit of work to get the libraries working in your code.
One downside is that the Mathworks may change the details of how they map Matlab data types into the HDF5 file. If you really want to be robust, you may want to try ...
Custom HDF5 file
Instead of just taking whatever format MathWorks decides to use, it's not that hard to create an HDF5 file directly and push data into it from Matlab. This lets you control things like compression, chunk sizing, dataset hierarchy and names. It also insulates you from any future changes in the default *.mat file format. See the h5write command in Matlab.
It is still a bit of effort to get running from the C/C++ end, so I would only go down this path if your project warranted it.
.mat is a special format for MATLAB itself.
What you can do is to load your .mat file in the MATLAB workspace:
load file.mat
Then use fopen and fprintf to write the data to file.txt and then you can read the content of that file in C.
You can also use Matlab's dlmwrite to write to a delimited ASCII file, which is easy to read in C (and human-readable too), although it may not be as compact if that is core to the issue.
Adding to what has already been mentioned, you can save your data from MATLAB using -ascii.
save x.mat x
Becomes:
save x.txt x -ascii
We need to support three hardware platforms: Windows (little-endian) and embedded Linux (big- and little-endian). Our data stream depends on the machine it uses, and the data needs to be broken into bit fields.
I would like to write a single macro (if possible) to abstract away the detail. On Linux I can use bswap_16/bswap_32/bswap_64 for the little-endian conversions.
However, I can't find this in my Visual C++ includes.
Is there a generic built-in for both platforms (Windows and Linux)?
If not, then what can I use in Visual C++ to do byte swapping (other than writing it myself - hoping some machine optimized built-in)?
Thanks.
On both platforms you have
for short (16bit): htons() and ntohs()
for long (32bit): htonl() and ntohl()
The missing htonll() and ntohll() for long long (64-bit) can easily be built from those two. See this implementation for example.
Update-0:
For the example linked above, Simon Richter mentions in a comment that it does not necessarily work. The reason is that the compiler might introduce padding bytes somewhere in the unions used. To work around this, the unions need to be packed, which might lead to performance loss.
So here's another fail-safe approach to build the *ll functions: https://stackoverflow.com/a/955980/694576
Update-0.1:
From bames53's comment I tend to conclude that the first example linked above should not be used with C++, only with C.
Update-1:
To achieve the functionality of the *ll functions on Linux, this approach might be the 'best'.
htons and htonl (and similar macros) are good if you insist on dealing with byte order.
However, it's much better to sidestep the issue by outputting your data in ASCII or similar. It takes a little more room, and it transmits over the net a little more slowly, but the simplicity and futureproofing is worth it.
Another option is to numerically take apart your ints and shorts: mask with & 0xff and divide by 256 repeatedly. This gives a single format on all architectures. But ASCII still has the edge because it's easier to debug with.
Not the same names, but the same functionality does exist.
EDIT: Archived Link -> https://web.archive.org/web/20151207075029/http://msdn.microsoft.com/en-us/library/a3140177(v=vs.80).aspx
_byteswap_uint64, _byteswap_ulong, _byteswap_ushort
I'm working on a program that writes binary data to a file. On Windows, the resulting file is slightly smaller than on Linux for some reason. The size in bytes and the MD5 hash are both different. How can this happen with the same code?
I already added the ifstream::binary flag and I made sure I set noskipws...
ofstream output("output", ifstream::binary);
output << std::noskipws;
I ran Application Verifier on my program and it didn't generate any errors or warnings regarding possible memory corruption.
Are there any other reasons why file output might be different?
The difference is probably due to different development environments. Different compilers, hardware and operating systems can all change the format of the underlying data. For example, the different compilers may pack your data structures with varying amounts of efficiency. Also, your basic types (ints, longs, floats, etc) may be different sizes due to different processors.
In short, programs that require cross-platform compatibility of binary data develop very precise rules for packing structures and values into binary format (often down to the individual field), and equally precise rules for reading the data back.
Without seeing exactly what you're writing to the file and how you are calling it, it's hard to give a truly useful answer, but if you're using the stream insertion operator (aka formatted output operator) to do your file writing, then any non-string data will be converted into strings according to your locale settings. If that is what you are actually doing, using ofstream::binary seems somewhat pointless, because you are just writing text anyway.
I would recommend creating the smallest possible example for which there is a difference and examining the output in a hex editor to see what is going on.
I was wondering if it was a good idea to load/save an array of a certain type of structure using fstream. Note, I am talking about loading/saving to a binary file. Should I be loading/saving independent variables such as int, float and boolean rather than a struct? The reason I ask is that I've heard a structure might contain some kind of padding, which might throw off the save/load.
A structure may contain padding, which will be written to the file. This is no big deal if the file is going to be read back on the same platform, using code emitted by the same compiler that did the write. However, this is difficult to guarantee, and if you cannot guarantee it, you should normally write the data in some textual format, such as XML, json or whatever.
Without serialization, your binary data will not be portable across different platforms (and compilers). So if you need portability, you need to serialize the data before storing it in a file as binary.
Have a look at these:
Boost Serialization Tutorial
Boost Serializable Concept
It's not deprecated (it's not part of any formal spec, so where would it be deprecated?), but it's extremely non-portable and probably the worst way to go about serialising stuff. Use Boost.Serialization, or a similar library.
As you pointed out in your answer, this will happen when writing structs this way. If you want your files to be portable across platforms (e.g. a file written on Linux i686 then opened on Solaris on SPARC), then even writing individual floats won't work.
Try writing your data to something like text or XML and then zip/tar the files to make one document of them.
As Neil said, prefer textual representation of data. The XML format may be overkill. Simpler versions are Comma Separated Value (CSV) and one value per text line.
I have a compressed file.
Let's ignore the tar command because I'm not sure it is compressed with that.
All I know is that it was compressed with Fortran 77 and that is what I should use to decompress it.
How can I do it?
Is decompression a one way road or do I need a certain header file that will lead (direct) the decompression?
It's not a .Z file. It ends in something else.
What do I need to decompress it? I know the format of the final decompressed archive.
Is it possible that the file is compressed thru a simple way but it appears with a different extension?
First, let's get the "Fortran" part out of the equation. There is no standard (and by that, I mean the Fortran standard) way to either compress or decompress files, since Fortran doesn't have a compression utility as part of the language. Maybe someone has written their own, but that's entirely up to them.
So you're stuck with publicly available compression utilities and such. On systems where those are available, and with compilers that support it (it varies), you can use the SYSTEM function, which executes a system command by passing a command string to the operating system's command interpreter (I know it exists in CVF, probably IVF; you should look it up in your compiler's help).
Since you asked a similar question already, I assume you're still having problems with this. You mentioned that "it was compressed with Fortran 77". What do you mean by that? That someone built a compression utility in F77 and used it? That would make it a custom solution?
If it's some kind of custom solution, then it could be practically anything, since a lot of algorithms can serve as "compression algorithms" (writing a file as binary rather than plain text will save a few bytes; voila, "compression").
Or have I misunderstood something? Please elaborate a little.
My guess is that you have a binary file, which is output by a Fortran program. These can look like compressed files because they are not readable in a text editor.
Fortran allows you to write the in-memory data out to a file without formatting it, so that you can reload it later without having to parse it. The problem, however, is that you need that original source code in order to see what types of variables are written in the file.
If you have no access to the Fortran source code, but a lot of time to spare, you could write a simple Fortran program and guess what types of variables are written in the file. I wouldn't advise it, though, as Fortran is not very forgiving.
If you want some simple source code to try, look at this page which details binary read and write in Fortran, and includes a code sample. Just start by replacing reclength=reclength*4 with reclength=reclength*2 for a double precision real.
There is no standard decompression method, there are tons. You will need to know the method used to compress it in order to decompress it.
You said that the file extension was not .Z, but something else. What was that something else?
If it's .gz (which is very common on Unix systems), "gunzip" is the proper command. If it's .tgz, you can gunzip and untar it. (Or you can read the man page for tar(1), since it probably has the ability to gunzip and extract together.)
If it's on Windows, see if Windows can read it directly, as the file system itself appears to support the ZIP format.
If something else, please just list the file name (or, if there are security implications, the file name beginning with the first period), and we might be able to figure it out.
You can check whether it's a known compressed file type with the file command. If file returns something generic like "data", then you're almost certainly looking at plain binary data.