How to read binary file data using D (dlang), as in C/C++

I'm trying to read struct data from a specific position in a binary file.
I found out that I can import std.stdio and use its File, but everything I seem to find is about string handling.
I have data written by C code to binary files; the files are composed of several different structs which, as far as I understand, are laid out back to back. To find a specific struct I need to, like in old C:
Open the file for reading (binary read?)
Use sizeof to move to the start position of the struct data to read
Read the data (struct.sizeof bytes) into a receiving buffer
Close the file
The documentation for std.stdio.File # read talks about reading the entire file or up to a given size, but I can't find how to do the same as the C code below:
fseek(filehandle, sizeof(firstStructData), SEEK_SET);
read(filehandle, (char *)nextReceivingBuffer, sizeof(nextReceivingBuffer));
Any ideas or hints?

Try File.seek and File.rawRead. They work like their C counterparts, but rawRead determines the read count from the size of the output buffer you hand it.

Related

Pass binary string/file content from C++ to Node.js

I'm trying to pass the content of a binary file from C++ to Node using the node-gyp library. I have a process that creates a binary file in the .fit format, and I need to pass the content of the file to JS to process it. My first approach was to extract the content of the file into a string and try to pass it to Node like this:
char c;
std::string content = "";
while (file.get(c)) {
    content += c;
}
I'm using the following code to pass it to Node
v8::Local<v8::ArrayBuffer> ab = v8::ArrayBuffer::New(args.GetIsolate(), (void*)content.data(), content.size());
args.GetReturnValue().Set(ab);
In Node I get an ArrayBuffer, but when I print its content to a file, it is different from what a C++ cout shows.
How can I pass the binary data successfully?
Thanks.
Probably the best approach is to write your data to a binary disk file. Write to disk in C++; read from disk in NodeJS.
Very importantly, make sure you specify BINARY MODE.
For example:
myFile.open ("data2.bin", ios::out | ios::binary);
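In plain C, the equivalent is the "b" flag to fopen. A minimal sketch, with a made-up payload:
#include <stdio.h>

/* Write raw bytes in binary mode; "wb" avoids any newline translation. */
int main(void)
{
    const unsigned char payload[] = {0x00, 0xFF, 0x10, 0x0A};  /* placeholder data */
    FILE *fp = fopen("data2.bin", "wb");
    if (!fp) return 1;
    fwrite(payload, 1, sizeof payload, fp);
    fclose(fp);
    return 0;
}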
Do not use "strings" (at least not unless you want to uuencode). Use buffers. Here is a good example:
How to read binary files byte by byte in Node.js
var fs = require('fs');
fs.open('file.txt', 'r', function(status, fd) {
    if (status) {
        console.log(status.message);
        return;
    }
    var buffer = new Buffer(100);
    fs.read(fd, buffer, 0, 100, 0, function(err, num) {
        ...
    });
});
You might also find these links helpful:
https://nodejs.org/api/buffer.html
<= Has good examples for specific Node APIs
http://blog.paracode.com/2013/04/24/parsing-binary-data-with-node-dot-js/
<= Good discussion of some of the issues you might face, including "endianness" and "interpreting numbers"
ADDENDUM:
The OP clarified that he's considering using C++ as a NodeJS add-on (not a standalone C++ program).
Consequently, using buffers is definitely an option. Here is a good tutorial:
https://community.risingstack.com/using-buffers-node-js-c-plus-plus/
If you choose to go this route, I would DEFINITELY download the example code and play with it first, before implementing buffers in your own application.
It depends, but you could for example use Redis:
Values can be strings (including binary data) of every kind, for
instance you can store a jpeg image inside a value. A value can't be
bigger than 512 MB.
If the file is bigger than 512MB, then you can store it in chunks.
But I wouldn't suggest that, since Redis is an in-memory data store.
It's easy to implement in both C++ and Node.js.
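For the C/C++ side, a rough sketch using the hiredis client (host, port, key name, and file name are all assumptions here; hiredis's %b format is binary-safe):
#include <stdio.h>
#include <stdlib.h>
#include <hiredis/hiredis.h>

/* Sketch: store a file's raw bytes under a Redis key using the
 * binary-safe %b format of hiredis. */
int main(void)
{
    redisContext *c = redisConnect("127.0.0.1", 6379);  /* assumed host/port */
    if (c == NULL || c->err) return 1;

    FILE *fp = fopen("activity.fit", "rb");             /* placeholder file */
    if (!fp) return 1;
    fseek(fp, 0, SEEK_END);
    long size = ftell(fp);
    rewind(fp);

    char *data = malloc(size);
    if (!data || fread(data, 1, size, fp) != (size_t)size) return 1;
    fclose(fp);

    redisReply *reply = redisCommand(c, "SET %s %b", "fitfile", data, (size_t)size);
    if (reply) freeReplyObject(reply);

    free(data);
    redisFree(c);
    return 0;
}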

Correct way to extract array data from binary?

There is a classic way to embed resource files as a C language array in a binary, so that external resource files such as .jpeg or .txt files can be stored inside the executable itself.
For example, in the header file we can define an array:
const unsigned char xd_data[] = {
77,90,144,0,3,0,0,0,4,0,0,0,255,255,0,0,184,0,0,0,0,0,0,0,64,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,240,0,0,
0,14,31,186,14,0,180,9,205,33,184,1,76,205,33,84,104,105,115,32,112,114,
111,103,114,97,109,32,99,97,110,110,111,116,32,98,101,32,114,117,110,
32,105,110,32,68,79,83,32,109,111,100,101,46,13,13,10,36,0,0,0,0,0,0,
0,66,163,223,218,6,194,177,137,6,194,177,137,6,194,177,137,105,221,187,
137,13,194,177,137,133,222,191,137,3,194,177,137,105,221,181,137,4,194,
177,137,136,202,238,137,4,194,177,137,6,194,176,137,73,194,177,137,133,
202,236,137,13,194,177,137,48,228,187,137,11,194,177,137,193,196,183,
137,7,194,177,137,82,105,99,104,6,194,177,137,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,80,69,0,0,76,1,4,0,65,162,32,86,0,0,0,0,0,0,0,
0,224,0,47,1,11,1,6,0,0,100,0,0,0,74,0,0,0,0,0,0,228,113,0,0,0,16,0,0,
0,128,0,0,0,0,64,0,0,16,0,0,0,2,0,0,4,0,0,0,0,0,0,0,4,0,0,0,0,0,0,0,0,
224,0,0,0,4,0,0,0,0,0,0,2,0,0,0,0,0,16,0,0,16,0,0,0,0,16,0,0,16,0,0,0,
0,0,0,16,0,0,0,0,0,0,0,0,0,0,0,124,140,0,0,140,0,0,0,0,208,0,0,0,16,0
};
which contains the contents of the resource file, and it will be compiled into the final binary.
There are lots of tools and tutorials on the web about this old trick, such as: http://www.rowleydownload.co.uk/arm/documentation/index.htm?http://www.rowleydownload.co.uk/arm/documentation/embed.htm, https://www.fourmilab.ch/xd/ and http://gareus.org/wiki/embedding_resources_in_executables#c_include_method.
However, it looks like most of these pages talk about how to embed the data into the binary using a C-style array.
My question is: what is the correct way to find the start address of the resource files in the compiled binary, in order to extract them? I.e., how can I find the start address of xd_data in the compiled binary?
If you mean finding the byte address in the file where a data block starts, just like objdump does but programmatically, then you can use the Binary File Descriptor library (BFD); see here and here.
If you stored data such as an image and you want to load it (for display or whatever), then, if you have a function (library) that loads it from memory, e.g. void loadResImage(void *mem);, just call loadResImage(xd_data). If you only have a function that loads it from a file, save the data to a temporary file first, e.g.:
#include <fcntl.h>    /* for open */
#include <unistd.h>   /* for write, close */
int fd = open("tmpfile", O_WRONLY | O_CREAT, 0644);
int ret = write(fd, xd_data, sizeof(xd_data));
close(fd);
loadImageFile("tmpfile");
But if you want to access the data from outside the program itself (with a hex editor, for example, or another program), then you have to add a start marker, and optionally an end marker or the size of the data. E.g.:
const unsigned char xd_data[] = {
    ...
    'M','A','G','I','C'};
In the example above the end of the data is known: you just search for the marker to find it. In the same way, play around and find a suitable way to store the size of the data. But beware of compiler optimizations.
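As a rough illustration of the marker approach, here is a minimal C sketch that scans a binary for the 'MAGIC' end marker (the file name and marker are placeholders):
#include <stdio.h>
#include <string.h>

/* Scan a binary file byte by byte for the end marker and report the
 * offset where the marker starts. */
int main(void)
{
    const char marker[] = "MAGIC";          /* placeholder marker */
    const size_t mlen = strlen(marker);
    FILE *fp = fopen("program.bin", "rb");  /* placeholder binary */
    if (!fp) return 1;

    long found = -1;
    size_t matched = 0;
    int c;
    for (long pos = 0; (c = fgetc(fp)) != EOF; pos++) {
        if ((char)c == marker[matched]) {
            if (++matched == mlen) {        /* full marker matched */
                found = pos - (long)mlen + 1;
                break;
            }
        } else {
            matched = ((char)c == marker[0]) ? 1 : 0;
        }
    }
    fclose(fp);

    if (found >= 0)
        printf("marker starts at offset %ld\n", found);
    return 0;
}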

Looking for data in EXIF format

I have a problem with my program for extracting the DateTimeOriginal data from a .JPG file. I found this document about it on the internet:
https://ExifTool.org/TagNames/EXIF.html
I see that the data I'm looking for is at address 0x9003.
So right now what I'm trying to do is:
temp = fopen(name, "rb");
to open the file in binary mode,
fseek(temp, 0x9003, SEEK_SET);
to move the file pointer to the address,
fscanf(temp, "%s", str);
and to load the data into the char[] buffer.
Is at least any of that correct? I still think the problem is with the address, because after compiling the program I see only some garbage from the file.
The EXIF data is embedded in the JPEG tag APP1 (0xE1).
The first thing to do is to find the JPEG tag 0xE1 in the stream; you have to scan all the JPEG tags (marked by 0xFF + tag, in your case 0xFF, 0xE1). After you find the tag, read its length from the next 2 bytes (and adjust for big-endian byte order), then get the tag's content.
After you get the tag's content, look in it for the EXIF tag you are interested in (0x9003).
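A bare-bones C sketch of that scan (not a robust JPEG parser: it naively looks for the first 0xFF 0xE1 pair, and the file name is a placeholder):
#include <stdio.h>

/* Naive sketch: look for the APP1 marker (0xFF 0xE1) in a JPEG and
 * print the segment length. A real parser would walk the segments
 * properly instead of scanning for the byte pair. */
int main(void)
{
    FILE *fp = fopen("photo.jpg", "rb");   /* placeholder file name */
    if (!fp) return 1;

    int c;
    while ((c = fgetc(fp)) != EOF) {
        if (c != 0xFF) continue;           /* JPEG markers start with 0xFF */
        int marker = fgetc(fp);
        if (marker == 0xE1) {              /* APP1: carries the EXIF block */
            int hi = fgetc(fp);
            int lo = fgetc(fp);
            long len = ((long)hi << 8) | lo;   /* 2-byte length, big-endian */
            printf("APP1 found, %ld content bytes\n", len - 2);
            /* next: read len-2 bytes and search them for tag 0x9003 */
            break;
        }
    }
    fclose(fp);
    return 0;
}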
The method readStream in the jpeg class of the open source project Imebra gives you an example on how to parse jpeg tags: https://bitbucket.org/binarno/imebra/src/2eb33b2170e76b5ad2737d1c2d81c1dcaccd19e5/project_files/library/imebra/src/jpegCodec.cpp?at=default#cl-867
Given the OP's style of programming, I'd recommend Easyexif at https://github.com/mayanklahiri/easyexif
It's relatively easy to integrate. Note that fseek() goes to a file position; it does not search for a certain number.

Reading the content of a file other than a ".txt" file

How can I read the content of a file which is not a simple text file in C/C++? For example, I want to read an image file such as .jpg/.png/.bmp and see the value at a certain index, to check what colour it is. Or, if I have a .exe/.rar/.zip, how do I find out what value is stored at different indices?
I am aware of the C-style way of reading a file, which is:
FILE *fp;
fp = fopen("example.txt", "r"); /* open for reading */
int c;                          /* int, not char, so EOF can be detected */
c = getc(fp);
I want to know: if I replace "example.txt" with "image.png" or the like, will it work? Will I get the correct data?
When you open a non-text file, you'll want to specify binary (untranslated) mode:
FILE *fp = fopen("example.png", "rb");
In a typical case, you do most of your reading from binary files by defining structs that mirror the structures in the file, and then using fread to read from the file into the struct (but this has to be done carefully, to ensure that things like padding in the struct don't make the in-memory representation differ from the one on disk).
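For instance, a minimal sketch of that pattern; the record layout here is entirely made up:
#include <stdio.h>
#include <stdint.h>

/* Hypothetical on-disk record; fixed-width types keep the in-memory
 * layout predictable (padding would still need checking in real code). */
struct Record {
    uint32_t id;
    uint32_t length;
};

int main(void)
{
    FILE *fp = fopen("data.bin", "rb");    /* placeholder file name */
    if (!fp) return 1;

    struct Record rec;
    if (fread(&rec, sizeof rec, 1, fp) == 1)
        printf("id=%u length=%u\n", (unsigned)rec.id, (unsigned)rec.length);

    fclose(fp);
    return 0;
}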
You would need to open the file in binary mode. This allows you to read the bytes in a "raw" mode where they are unchanged from what was in the file.
However, determining the color of a particular pixel, etc. requires that you fully understand the meaning of the bytes in the file and how they are arranged for the file being read. This second requirement is much more difficult. You'll need to do some research on the format of that file type in order to do that.
Yes, of course you can open any file in binary mode in C, and if you are interested you can then read the first bytes of any such non-text file.
In most cases, each file format has a fixed header, so based on that you can identify the type of the file.
Open any Matroska (.mkv) file and read the first 4 bytes; you will always get
0x1A 0x45 0xDF 0xA3
You can also view any file in binary representation with the hexdump utility on Linux.
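A small C sketch of such a magic-number check (the file name is a placeholder):
#include <stdio.h>
#include <string.h>

/* Check whether a file starts with the Matroska/EBML magic number. */
int main(void)
{
    const unsigned char mkv[4] = {0x1A, 0x45, 0xDF, 0xA3};
    unsigned char head[4];

    FILE *fp = fopen("video.mkv", "rb");   /* placeholder file name */
    if (!fp) return 1;

    if (fread(head, 1, 4, fp) == 4 && memcmp(head, mkv, 4) == 0)
        printf("looks like a Matroska file\n");

    fclose(fp);
    return 0;
}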
Edit:
such as .jpg/.png/.bmp and see the value at certain index, to check what colour it is?
Here you need to understand the format of that file; based on that, you can know what information the data at each position represents.

How do I read a huge .gz file (more than 5 GB uncompressed) in C?

I have some .gz compressed files which are around 5-7 GB uncompressed.
These are flat files.
I've written a program that takes an uncompressed file and reads it line by line, which works perfectly.
Now I want to be able to open the compressed files in memory and run my little program on them.
I've looked into zlib but I can't find a good solution.
Loading the entire file at once is impossible using gzread(gzFile, void *, unsigned), because of the 32-bit unsigned int limitation.
I've tried gzgets, but this almost doubles the execution time compared with reading using gzread. (I tested on a 2 GB sample.)
I've also looked into "buffering", such as splitting the gzread process into multiple 2 GB chunks, finding the last newline using strrchr, and then setting the position with gzseek.
But gzseek emulates the seek by decompressing everything up to the target position, which is very slow.
I fail to see any sane solution to this problem.
I could always check whether or not the current line actually ends with a newline (which should only fail for the last, partially read line), and then read more data from the point in the program where this occurs.
But this could get very ugly.
Does anyone have any suggestions?
Thanks.
Edit:
I don't need to have the entire file at once, just one line at a time, but I have a fairly big machine, so if that were the easiest way I'd have no problem with it.
For all those who suggest piping from stdin: I've experienced extreme slowdowns compared to opening the file directly. Here is a small code snippet I made some months ago that illustrates it:
time ./a.out 59846/59846.txt
# 59846/59846.txt
18255221
real 0m4.321s
user 0m2.884s
sys 0m1.424s
time ./a.out <59846/59846.txt
18255221
real 1m56.544s
user 1m55.043s
sys 0m1.512s
And the source code
#include <iostream>
#include <fstream>
#include <cstdio>   // for printf
#define LENS 10000

int main(int argc, char **argv){
    std::istream *pFile;
    if(argc==2) // if a filename argument is supplied
        pFile = new std::ifstream(argv[1], std::ios::in);
    else        // otherwise use stdin
        pFile = &std::cin;
    char line[LENS];
    if(argc==2) // if we are using a filename, print it
        printf("#\t%s\n", argv[1]);
    if(!*pFile){ // check the stream state, not the pointer
        printf("Do you have permission to open the file?\n");
        return 0;
    }
    int numRow=0;
    while(!pFile->eof()){
        numRow++;
        pFile->getline(line, LENS);
    }
    if(argc==2)
        delete pFile;
    printf("%d\n", numRow);
    return 0;
}
Thanks for your replies; I'm still waiting for the golden apple.
Edit 2:
Using C-style FILE pointers instead of C++ streams is much, much faster, so I think this is the way to go.
Thanks for all your input.
gzip -cd compressed.gz | yourprogram
Just go ahead and read it line by line from stdin as if it were uncompressed.
EDIT: In response to your remarks about performance: you're saying that reading STDIN line by line is slow compared to reading an uncompressed file directly. The difference lies in buffering. Normally a pipe yields to STDIN as soon as the output becomes available (with no, or very small, buffering there). You can do "buffered block reads" from STDIN and parse the read blocks yourself to gain performance.
You can achieve the same result, with possibly better performance, by using gzread() as well. (Read a big chunk, parse the chunk, read the next chunk, repeat.)
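A minimal sketch of such a buffered block read from stdin; here it only counts lines, and the real parsing would go inside the loop:
#include <stdio.h>

#define CHUNK 65536

/* Read stdin in big blocks instead of line by line; here we just
 * count newlines as a stand-in for real parsing. */
int main(void)
{
    static char buf[CHUNK];
    size_t n;
    long lines = 0;

    while ((n = fread(buf, 1, sizeof buf, stdin)) > 0)
        for (size_t i = 0; i < n; i++)
            if (buf[i] == '\n')
                lines++;

    printf("%ld\n", lines);
    return 0;
}
You would invoke it as, for example, gzip -cd compressed.gz | ./a.out.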
gzread only reads chunks of the file; you loop on it as you would with a normal read() call.
Do you need to read the entire file into memory?
If what you need is to read lines, you'd gzread() a sizable chunk (say 8192 bytes) into a buffer, loop through that buffer to find all '\n' characters, and process those as individual lines. You'd have to save the last piece in case it is only part of a line, and prepend it to the data you read the next time.
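A minimal sketch of that chunk-and-split loop with zlib (compile with -lz; the file name, chunk size, and line-length cap are assumptions, and it uses a byte loop rather than strchr so embedded NUL bytes don't matter):
#include <stdio.h>
#include <zlib.h>

#define CHUNK 8192
#define MAXLINE 65536   /* assumed upper bound on line length */

/* Stand-in for the real per-line processing. */
static void handle_line(const char *line)
{
    puts(line);
}

int main(void)
{
    gzFile gz = gzopen("big.gz", "rb");   /* placeholder file name */
    if (!gz) return 1;

    static char line[MAXLINE];
    size_t used = 0;                      /* bytes of a partial line held over */
    char buf[CHUNK];
    int n;

    while ((n = gzread(gz, buf, CHUNK)) > 0) {
        for (int i = 0; i < n; i++) {
            if (buf[i] == '\n') {
                line[used] = '\0';
                handle_line(line);
                used = 0;
            } else if (used < MAXLINE - 1) {
                line[used++] = buf[i];    /* overlong lines get truncated */
            }
        }
    }

    if (used > 0) {                       /* final line without a newline */
        line[used] = '\0';
        handle_line(line);
    }
    gzclose(gz);
    return 0;
}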
You could also read from stdin and invoke your app like
zcat bigfile.gz | ./yourprogram
in which case you can use fgets and the like on stdin. This is also beneficial in that you run the decompression on one processor and the processing of the data on another processor :-)
I don't know if this will be an answer to your question, but I believe it's more than a comment:
Some months ago I discovered that the contents of Wikipedia can be downloaded in much the same way as the StackOverflow data dump. Both decompress to XML.
I came across a description of how the multi-gigabyte compressed dump file could be parsed. It was actually done by Perl scripts, but the relevant part for you is that bzip2 compression was used.
bzip2 is a block compression scheme, so the compressed file can be split into manageable pieces and each piece decompressed individually.
Unfortunately, I don't have a link to share with you, and I can't suggest how you would search for it, except to say that it was described on a Wikipedia 'data dump' or 'blog' page.
EDIT: Actually, I do have a link