converting binary to text in linux - c++

I have a big binary file that I produced by writing an array of float numbers in binary format.
Now how can I simply convert that binary file to text?

Use the UNIX od command with the -t f4 option to read the file as 4-byte floating-point values. The -A n option is also useful to avoid printing the file offsets. Here is the output for an example file that I created.
/tmp> od -A n -t f4 b.dump
-999.876 -998.876 -997.876 -996.876
-995.876 -994.876 -993.876 -992.876
-991.876 -990.876 -989.876 -988.876
-987.876 -986.876 -985.876 -984.876

You will need to reverse the process.
Read the file back into an array of floats.
Print the array using printf() or your favorite I/O function.
Any other approach will be ugly and painful; not that this isn't ugly to start with.
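For reference, a minimal C++ sketch of that round trip might look like the following (the file name floats.bin and the four-values-per-line layout are just assumptions for the example):

#include <cstdio>
#include <vector>

int main() {
    // Open the binary file that was produced by writing raw floats.
    std::FILE* f = std::fopen("floats.bin", "rb");
    if (!f) { std::perror("fopen"); return 1; }

    // Read the 4-byte floats back into an array.
    std::vector<float> values;
    float v;
    while (std::fread(&v, sizeof(float), 1, f) == 1)
        values.push_back(v);
    std::fclose(f);

    // Print the array as text, four values per line like od does.
    for (std::size_t i = 0; i < values.size(); ++i)
        std::printf("%g%c", values[i], (i % 4 == 3) ? '\n' : ' ');
    std::printf("\n");
    return 0;
}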

Related

Converting / Flattening RMS indexed files from OpenVMS

I was attempting to convert some indexed files created on OpenVMS to plain flat sequential files to be used on Windows or Linux.
Each indexed file contains some quantity of POD structures (2594 bytes each).
I converted the files using a simple program such as this:
      PROGRAM MAKE_FLAT
      BYTE byte_array(2594)
      PARAMETER FILE_IN = 1
      PARAMETER FILE_OUT = 2
      OPEN(UNIT=FILE_IN, FORM='UNFORMATTED',
     1     FILE='input.data',
     1     ORGANIZATION='INDEXED',
     1     ACCESS='SEQUENTIAL',
     1     KEY=(1:8:INTEGER), RECL=649)
      OPEN(UNIT=FILE_OUT, FORM='UNFORMATTED',
     1     FILE='output.data')
      DO WHILE (.TRUE.)
          READ(FILE_IN, END=999) byte_array
          WRITE(FILE_OUT) byte_array
      END DO
 999  CONTINUE
      CLOSE(FILE_IN)
      CLOSE(FILE_OUT)
      END
If there are 1000 records in the file, I should be expecting a file that is
~ 1000*2594 bytes, but instead it resulted in 1000*2044 bytes, as shown by:
DIR/FULL output.data
Why is the program writing fewer bytes per record? Did I do something wrong?
Using the built-in utility of OpenVMS gives me the expected flat file.
ANAL/RMS/FDL FILE.FDL input.data
EDIT/FDL/ANALY=FILE.FDL FILE.FDL
After changing the organization from 'INDEXED' to 'SEQUENTIAL' and contiguous to 'YES', the following command gives me a flat file of the correct size (including padding per record).
CONVERT/FDL=FILE.FDL input.data output.data
If you do not really need to do this in a program, just use CONVERT
$ CONVERT/FDL=FIXED.FDL IN-FILE OUT-FILE
You can use $ EDIT/FDL FIXED.FDL and follow the prompts for making a sequential file.
2044 looks like the maximum record size FORTRAN on VMS uses to write the data. If the file size is really 1000*2044 bytes, something is wrong.
What's the output of DUMP/HEADER/BLOCKS=COUNT=0 FOR002.DAT in the lines 'Record size', 'End of file block' and 'End of file byte'?
I would expect that the 2594 bytes are written in two records. Given that there are two bytes for a flag, you will see records with length 2044 and 554. (You can confirm this with a DUMP/RECORD FOR002.DAT/PAGE.) Each record has a record length field of two bytes. That is, you should have a file size of 1000*(2044+2+554+2) = 2602000.
You can double check that with the "End of file" data from the first DUMP command: (End of file block-1)*512 + End of file byte.

Piping to provide a file as input to a C program

I have this set of .gz files, and inside each of them is a single text file. This text file needs to be used in a C program. The following command solves the problem; parameter1 and parameter2 are integers that I receive as arguments to the C program (argc, argv[]) in main().
gzip -dc xyz.txt.gz | ./program parameter1 parameter2
Can someone explain how the above command works on the command line?
How does the text file automatically get passed to the program?
Do I need to write extra code in the C program to receive this text file?
The shell connects the stdout of one command directly to the stdin of the other command through a pipe(7). Neither program has to do anything out of the ordinary to take advantage of this.
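For illustration, here is a minimal sketch of what the receiving program might look like (written in C++, though a plain C program reading stdin works the same way; the parameter handling and the line counting are assumptions, not the asker's actual code). It reads the decompressed text from std::cin exactly as gzip -dc delivers it through the pipe:

#include <cstdlib>
#include <iostream>
#include <string>

int main(int argc, char* argv[]) {
    if (argc < 3) {
        std::cerr << "usage: program parameter1 parameter2\n";
        return 1;
    }
    // The two integers arrive as ordinary command-line arguments.
    int parameter1 = std::atoi(argv[1]);
    int parameter2 = std::atoi(argv[2]);

    // The decompressed text arrives on standard input, line by line,
    // with no extra code needed to "receive" the file.
    std::string line;
    long count = 0;
    while (std::getline(std::cin, line))
        ++count;

    std::cerr << "read " << count << " lines (parameters "
              << parameter1 << ", " << parameter2 << ")\n";
    return 0;
}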

How to properly compare results between matlab/octave and C++

I'm porting a piece of code written in Matlab/Octave to C++. I only have Octave, so I will just say Octave from now on.
I want to properly compare the results between the octave code and the C++ code. The algorithms I'm writing take as input a 2D matrix, and output another 2D matrix.
To compare the results, I write the input matrix A from Octave using the command save A.mat A, with default options. This creates an ASCII file A.mat which starts like:
# Created by Octave 3.8.1, Tue May 27 12:12:53 2014 CEST <remi@desktop>
# name: values
# type: matrix
# rows: 25
# columns: 5
43.0656 6.752420000000001 68.39323 35.75617 98.85446
...
I run the algorithm using octave and save the output matrix B similarly.
In my C++ code, I load the matrices A and B using the following piece of code:
// I opened the file A.mat with std::ifstream infile(filename);
// and read the first lines starting with # to get the matrix dimensions.
std::string buffer;
double* matBuffer = new double[rows*cols];
double* cursor = matBuffer;  // write position; keeps matBuffer pointing at the start
while (std::getline(infile, buffer)) {
    std::istringstream iss(buffer);
    while (cursor != matBuffer + rows*cols && iss >> *cursor) {
        ++cursor;
    }
}
Then I run the C++ code with the values read from A.mat and compare the results with the values read from B.mat by computing the mean squared error (MSE) on the coefficients of B read vs. B computed.
However, with such a design, can I expect the MSE to be 0 between the C++ and Octave code? Of course I do the computation in Octave and C++ on the same machine. But what about the loss of precision due to writing/reading the matrices to/from files? Also, I assume that the coefficients of Octave matrices are stored as doubles by default; is this correct?
Can I expect the MSE to be 0 between the C++ and Octave code?
I don't think so; because of the many levels of conversion, a loss of precision is hard to avoid.
Also, I assume that the coefficients of Octave matrices are stored as doubles by default; is this correct?
Octave uses double precision for internal representation of the values, but again there can be a loss in precision when storing the values in ASCII.
I'd suggest you try using a binary format for storing the values, which will avoid any precision problems. You can go with the HDF5 format by using
save -hdf5 A.mat A
You can then use the HDF5 API to read the values in your C++ application.
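As a rough sketch, reading the matrix back with the HDF5 C API could look something like this (the dataset path /A/value follows Octave's usual variable/value layout, but check it with h5dump; the row/column ordering may also need transposing, since Octave stores matrices column-major):

#include <hdf5.h>
#include <iostream>
#include <vector>

int main() {
    // Open the file written by "save -hdf5 A.mat A".
    hid_t file = H5Fopen("A.mat", H5F_ACC_RDONLY, H5P_DEFAULT);
    if (file < 0) { std::cerr << "cannot open A.mat\n"; return 1; }

    // Octave stores each variable as a group holding a "value" dataset.
    hid_t dset = H5Dopen2(file, "/A/value", H5P_DEFAULT);
    if (dset < 0) { std::cerr << "dataset /A/value not found\n"; return 1; }

    // Query the matrix dimensions.
    hid_t space = H5Dget_space(dset);
    hsize_t dims[2] = {0, 0};
    H5Sget_simple_extent_dims(space, dims, nullptr);

    // Read everything as native doubles, with no ASCII round trip.
    std::vector<double> A(static_cast<size_t>(dims[0] * dims[1]));
    H5Dread(dset, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL, H5P_DEFAULT, A.data());
    std::cout << "read " << dims[0] << " x " << dims[1] << " values\n";

    H5Sclose(space);
    H5Dclose(dset);
    H5Fclose(file);
    return 0;
}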

Output of one c++ file as input of the other?

I have two C++ programs: one generates an array for the specified input, while the other has to use that array for its execution. I want to know how I can link the two C++ files so that the output of the first is the input of the second.
Since they're separate programs, that means they each have a main() function. Because of that you can't link them together. What you can do, however, is use the shell to redirect the output from one program to the input of another. For example:
program1 | program2
The above creates a so-called "pipe". What it does is feed program2 with the output of program1. Only the standard input and standard output are redirected that way. In C++ that means std::cin and std::cout. Anything printed on std::cerr or std::clog is not redirected, so make sure to never print errors, warnings or other status/informational messages on std::cout. Only print the payload data and use std::cerr or std::clog for anything else.
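As a toy illustration (the program names and the data format are made up for this example), the first program writes its array to std::cout and the second reads it back from std::cin:

// program1.cpp -- generates the array and prints it to standard output.
#include <iostream>

int main() {
    for (int i = 0; i < 10; ++i)
        std::cout << i * i << "\n";
    return 0;
}

// program2.cpp -- reads the values from standard input into an array.
#include <iostream>
#include <vector>

int main() {
    std::vector<int> values;
    int v;
    while (std::cin >> v)   // stops when the piped input ends
        values.push_back(v);
    std::cerr << "received " << values.size() << " values\n";
    return 0;
}

Compiled separately and run as program1 | program2, the second program sees exactly what the first one printed.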
Linux: Compile both files and push the output of the first binary to the second with a pipe in the terminal, or else use a socket. You can output the data as a binary stream, and the second binary can use the same technique to read it into an array. I hope that helps.

how do I print a binary double array from commandline (unix)

I've got a binary file that contains doubles. How do I print it to a terminal?
I've tried the octal dump utility 'od' but can't figure out the syntax. I've tried something like
head -c80 | od -f
but that doesn't work, and the man page for od is extremely bad.
I've made a C program that does what I want, something like the following, assuming chunks of 10 doubles:
double tmp[10];
size_t n;
while ((n = fread(tmp, sizeof(double), 10, stdin)) > 0)
    for (size_t i = 0; i < n; i++) printf("%f\t", tmp[i]);
thanks.
Have you tried the hexdump utility?
hexdump -e ' [iterations]/[byte_count] "[format string]" ' filename
where the format string should be "%f", the byte count should be 8, and iterations is the number of doubles you want to read.
The od command you're looking for is
od -t fD
(That means "floating point values, of double size").