Reading the content of file other than ".txt" file


How can I read the contents of a file that is not a plain text file in C/C++? For example, I want to read an image file such as .jpg/.png/.bmp and look at the value at a certain index to check what colour it is. Or, if I have a .exe/.rar/.zip, I want to know what value is stored at different indices.
I am aware of C-style file reading, which is
FILE *fp;
fp = fopen("example.txt","r"); /* open for reading */
char c;
c = getc(fp) ;
I want to know: if I replace "example.txt" with "image.png" or similar, will it work? Will I get the correct data?

When you open a non-text file, you'll want to specify binary (untranslated) mode:
FILE *fp = fopen("example.png", "rb");
In a typical case, you do most of your reading from binary files by defining structs that mirror the structures in the file, and then using fread to read from the file into the structure (but this has to be done carefully, to ensure that things like padding in the struct don't differ between the representation in-memory and on-disk).
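A minimal sketch of the careful variant hinted at above: rather than fread-ing straight into a struct, where compiler padding could shift fields, it reads the fixed-size header into a byte array and decodes each field explicitly. It assumes the common BMP layout (a 14-byte file header followed by a BITMAPINFOHEADER, all little-endian), which is not the only BMP variant. The names BmpInfo, le16, le32 and readBmpInfo are made up for illustration.

```cpp
#include <cstdint>
#include <cstdio>

// Hypothetical helper type: the handful of BMP header fields decoded below.
struct BmpInfo {
    std::uint32_t dataOffset;    // file offset where the pixel data starts
    std::int32_t  width;
    std::int32_t  height;
    std::uint16_t bitsPerPixel;
};

// Decode little-endian integers from raw bytes, independent of host layout.
static std::uint16_t le16(const unsigned char* p) {
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}
static std::uint32_t le32(const unsigned char* p) {
    return static_cast<std::uint32_t>(p[0]) |
           (static_cast<std::uint32_t>(p[1]) << 8) |
           (static_cast<std::uint32_t>(p[2]) << 16) |
           (static_cast<std::uint32_t>(p[3]) << 24);
}

// Returns true and fills `out` if the file starts with a plausible BMP header.
bool readBmpInfo(const char* path, BmpInfo& out) {
    std::FILE* fp = std::fopen(path, "rb");    // binary mode: no translation
    if (!fp) return false;

    unsigned char h[30];   // 14-byte file header + first 16 bytes of the info header
    const bool ok = std::fread(h, 1, sizeof h, fp) == sizeof h;
    std::fclose(fp);
    if (!ok || h[0] != 'B' || h[1] != 'M') return false;

    out.dataOffset   = le32(h + 10);
    out.width        = static_cast<std::int32_t>(le32(h + 18));
    out.height       = static_cast<std::int32_t>(le32(h + 22));
    out.bitsPerPixel = le16(h + 28);
    return true;
}
```

Decoding byte by byte costs a few extra lines, but the result does not depend on struct padding or on the byte order of the machine running the code.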

You would need to open the file in binary mode. This allows you to read the bytes in a "raw" mode where they are unchanged from what was in the file.
However, determining the color of a particular pixel, etc. requires that you fully understand the meaning of the bytes in the file and how they are arranged for the file being read. This second requirement is much more difficult. You'll need to do some research on the format of that file type in order to do that.
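To make the "raw bytes" part concrete, here is a small sketch that returns the byte stored at a given index of any file. The function name byteAt is hypothetical. It says nothing about what that byte means; interpreting it still requires knowing the file format.

```cpp
#include <cstdio>

// Returns the byte at `index` as a value in 0..255, or -1 if the file cannot
// be opened or is shorter than index + 1 bytes.
int byteAt(const char* path, long index) {
    std::FILE* fp = std::fopen(path, "rb");   // "rb": bytes arrive untranslated
    if (!fp) return -1;

    int value = -1;
    if (std::fseek(fp, index, SEEK_SET) == 0) {
        const int c = std::getc(fp);   // int, not char, so EOF differs from 0xFF
        if (c != EOF) value = c;
    }
    std::fclose(fp);
    return value;
}
```

For example, byteAt("image.png", 0) would give the first byte of the file, whatever the file type is.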

Yes, of course you can open any file in binary mode in C. If you are interested, you can also read the first few bytes of any such non-text file.
In most cases a file format has a fixed header, so you can identify the type of the file from it.
Open any Matroska (.mkv) file and read the first 4 bytes; you will always get this:
0x1A 0x45 0xDF 0xA3
You can also view any file in binary representation with the hexdump utility on Linux.
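A short sketch of that header check, comparing the first four bytes of a file with the Matroska signature quoted above. The function name looksLikeMatroska is made up. Note that these four bytes are the EBML signature, so other EBML-based files such as WebM start the same way; a match is a hint, not proof.

```cpp
#include <cstdio>
#include <cstring>

// True if the file begins with the four bytes 0x1A 0x45 0xDF 0xA3.
bool looksLikeMatroska(const char* path) {
    static const unsigned char magic[4] = {0x1A, 0x45, 0xDF, 0xA3};
    unsigned char head[4];

    std::FILE* fp = std::fopen(path, "rb");
    if (!fp) return false;
    const std::size_t n = std::fread(head, 1, sizeof head, fp);
    std::fclose(fp);

    // A file shorter than four bytes cannot match.
    return n == sizeof head && std::memcmp(head, magic, sizeof magic) == 0;
}
```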
====================
Edit:
such as .jpg/.png/.bmp and see the value at certain index, to check what colour it is?
Here you need to understand the format of the file; based on that, you can tell what information the data at each position represents.

Related

Knowing current compressed file size using gzwrite (zlib)

I'm using zlib for c++.
Quote from
http://refspecs.linuxbase.org/LSB_3.0.0/LSB-PDA/LSB-PDA/zlib-gzwrite-1.html regarding gzwrite function:
The gzwrite() function shall write data to the compressed file referenced by file, which shall have been opened in a write mode (see gzopen() and gzdopen()). On entry, buf shall point to a buffer containing len bytes of uncompressed data. The gzwrite() function shall compress this data and write it to file. The gzwrite() function shall return the number of uncompressed bytes actually written.
I interpret this as the return value will NOT tell me how much larger the file became when writing. Only how much data was compressed into the file.
The only way to know how large the file is would then be to close it, and read the size from the file system. I have a requirement to only continue to write to the file until it reaches a certain size. Can this be achieved without closing the file?
A workaround would be to write until the uncompressed size reaches my limit and then close the file, read the size from file system and update my best guess of file size based on that, and then re-open the file and continue writing. This would make me close and open the file a few times towards the end (as I'm approaching the size limit).
Another workaround, which would give more of an estimate (not really what I want), would be to write until the uncompressed size reaches the limit, close the file, read the file size from the file system and calculate the compression ratio so far. Then I can use this compression ratio to calculate a new limit for the uncompressed size at which compression should bring me down to the limit for the compressed file size. If I repeat this, the estimate would improve, but again, it is not what I'm looking for.
Are there better options?
The preferred option would be for zlib to tell me the compressed file size while the file is still open. I don't see why this information would not be available inside zlib at this point, since compression happens when I call gzwrite and not when I close the file.

zlib provides the function gzoffset(), which does exactly what you're asking.
If for some reason you are stuck with a version of zlib from before gzoffset() was added (more than about eight years old), then this is easy to do with gzdopen(). Open the output file with fopen() or open(), and pass its file descriptor to gzdopen(); if you used fopen(), obtain the descriptor with fileno() and dup(). Then you can use ftell() or lseek() at any time to see how much has been written. Be careful not to double-close the descriptor. See the comments for gzdopen().

You can work around this issue by using a pipe. The idea is to write the compressed data into a pipe. After that, you read the data from the other end of the pipe, count it and write it to the actual file.
To set this up, first open the file to write to via a simple open. Then create a pipe via pipe2 and initialize zlib by passing the write end of the pipe to gzdopen:
int out = open("/path/to/file", O_WRONLY | O_CREAT | O_TRUNC, 0644); // O_CREAT requires a mode argument
int p[2];
pipe2(p, O_NONBLOCK); // p[0] is the read end, p[1] is the write end
gzFile zFile = gzdopen(p[1], "w");
You can now write the data first to the pipe and then splice it from the pipe to the out file:
gzwrite(zFile, buf, 1024); // or any other length; zlib may buffer output until gzflush() or gzclose()
ssize_t bytesWritten = 0; // splice returns ssize_t; -1 signals an error
do {
bytesWritten = splice(p[0], NULL, out, NULL, 1024, SPLICE_F_NONBLOCK | SPLICE_F_MORE);
} while(bytesWritten == 1024);
As you can see, bytesWritten now tells you how much data was actually written. Simply sum it up in another variable and stop splicing as soon as you have written as much data as you need. Alternatively, splice in one go: write everything to zFile and then splice once, passing the amount of data you are allowed to store as the fifth parameter. If you want to avoid compressing unnecessary data, do it in chunks as shown above.
A note on splice: splice is Linux-specific and is basically just a very efficient copy. You can always replace it with a simple "read and write" combo, i.e. read data from p[0] into a buffer and then write the data from that buffer into out; splice is just faster and less code.
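The "read and write" replacement for splice mentioned above might look like the sketch below. It uses only POSIX read() and write(), copies at most a given number of bytes between two descriptors, and reports how many bytes it moved. The function name copyCounted and the 4096-byte buffer size are arbitrary choices for illustration, and zlib is deliberately left out so the counting logic stands on its own.

```cpp
#include <unistd.h>
#include <cstddef>

// Copies at most `limit` bytes from inFd to outFd using plain read()/write().
// Returns the number of bytes actually written to outFd.
std::size_t copyCounted(int inFd, int outFd, std::size_t limit) {
    char buf[4096];
    std::size_t total = 0;
    while (total < limit) {
        std::size_t want = limit - total;
        if (want > sizeof buf) want = sizeof buf;

        const ssize_t got = read(inFd, buf, want);
        if (got <= 0) break;   // EOF, error, or nothing available on a non-blocking fd

        ssize_t done = 0;
        while (done < got) {   // write() may accept fewer bytes than requested
            const ssize_t w = write(outFd, buf + done,
                                    static_cast<std::size_t>(got - done));
            if (w <= 0) return total;
            done += w;
            total += static_cast<std::size_t>(w);
        }
    }
    return total;
}
```

Summing the return values across calls gives the running size of the output, and passing the remaining byte budget as `limit` keeps the copy from overshooting it.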

how to read binary file data using dlang

I'm trying to read struct data from a specific position in a binary file.
I found out that I can use import std.stdio and its File, but everything I find is about string handling.
I have binary files written by C code. They consist of several different structs which, as far as I understand, are all laid out one after another. To find a specific struct I need to, like in old C:
Open file for reading .... (binary read??)
using sizeof and move to startposition of struct data to read
read data (struct.sizeof data) into receivingbuffer and
Close file
The documentation for std.stdio.File's read functions talks about reading the entire file or up to a given size, but I can't find how to do the equivalent of the C code below.
fseek(filehandle, sizeof(firstStructData), SEEK_SET);
fread(&nextReceivingBuffer, sizeof(nextReceivingBuffer), 1, filehandle);
Any ideas or hints?

Try File.seek and File.rawRead. They work like their C counterparts, but rawRead determines the read count from the size of the output buffer you hand it.

C++ - Missing end of line characters in file read

I am using C++ streams to read a bunch of files in a directory and then write them to another directory. Since these files may be of different types, I am using the generic ios::binary flag when reading and writing them.
Example code below:
std::fstream inf( "ex.txt", std::ios::in | std::ios::binary);
char c;
while( inf >> c ) {
// writing to another file in binary format
}
The issue I have is that in the case of files containing text, the end of line characters in these text files are not being written to the output file.
Edit: Or at least they do not appear to be as when the newly written file is opened, there is only a single continuous line of characters.
Edit again: The problem (of the continuous string) appears to persist even when the read / write is made in text mode.
Thus, I was wondering if there was a way to check if a file has text or binary and then read/write it appropriately. Else, is there any way to preserve the end of line characters even when opening the file in binary format?
Edit: I am using the g++ 4.8.2 compiler

When you want to manipulate bytes, you need to use the read and write methods, not the >> and << operators.
You can get the intended behavior with inf.flags(inf.flags() & ~std::ios_base::skipws);, though.
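A minimal sketch of the read/write approach recommended above: it copies a file in chunks with istream::read and ostream::write, so whitespace and line endings pass through untouched. The function name copyFileRaw and the 4096-byte chunk size are illustrative choices.

```cpp
#include <fstream>
#include <string>

// Copies src to dst byte for byte. Returns false if either file fails to
// open or if writing fails.
bool copyFileRaw(const std::string& src, const std::string& dst) {
    std::ifstream in(src, std::ios::binary);
    std::ofstream out(dst, std::ios::binary);
    if (!in || !out) return false;

    char buf[4096];
    // read() fails on the final short chunk, so also check gcount().
    while (in.read(buf, sizeof buf) || in.gcount() > 0) {
        out.write(buf, in.gcount());   // gcount() is the size of the last read
    }
    out.flush();
    return static_cast<bool>(out);
}
```

Unlike `inf >> c`, unformatted reads never skip whitespace, so no flag changes are needed.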

Looking for data in EXIF format

I have a problem with my program, which is meant to extract the DateTimeOriginal data from a .JPG file. I found this document about it on the internet:
https://ExifTool.org/TagNames/EXIF.html
I see that the data I'm looking for is at address 0x9003.
So right now what I'm trying to do is:
temp = fopen(name, "rb");
open the file in binary mode
fseek (temp, 0x9003, SEEK_SET);
move the file pointer to the address
fscanf(temp, "%s", str);
and load the data into the char[] buffer.
Is at least some of that correct? I still think the problem is with the address, because after compiling the program I see only garbage from the file.

The EXIF data is embedded into the jpeg tag APP1 (0xE1).
The first thing to do is to find the JPEG tag 0xE1 in the stream; you have to scan all the JPEG tags (each marked by 0xFF followed by the tag, in your case 0xFF, 0xE1). After you find the tag, get its length by reading the next 2 bytes (adjusting for big-endian byte order), then read the tag's content.
After you get the tag's content, then look in it for the EXIF tag you are interested in (0x9003).
The method readStream in the jpeg class of the open source project Imebra gives you an example on how to parse jpeg tags: https://bitbucket.org/binarno/imebra/src/2eb33b2170e76b5ad2737d1c2d81c1dcaccd19e5/project_files/library/imebra/src/jpegCodec.cpp?at=default#cl-867
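The marker scan described in this answer can be sketched as follows. This is not the Imebra code, just a simplified illustration working on a file already loaded into memory; the names Segment and findApp1 are hypothetical. It assumes markers follow each other without 0xFF fill bytes and returns the first APP1 segment it meets, whose payload for EXIF data begins with the bytes "Exif" followed by two zero bytes. Locating tag 0x9003 inside that payload is a further parsing step not shown here.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical result type: where the APP1 payload sits inside the buffer.
struct Segment {
    std::size_t offset;   // index of the first payload byte (after the length field)
    std::size_t size;     // payload size in bytes
};

// Walks the JPEG segments in `data` looking for APP1 (0xFF 0xE1).
// Returns true and fills `out` on success; gives up at start-of-scan,
// end-of-image, or on malformed input.
bool findApp1(const std::vector<std::uint8_t>& data, Segment& out) {
    if (data.size() < 4 || data[0] != 0xFF || data[1] != 0xD8) return false;  // no SOI
    std::size_t pos = 2;
    while (pos + 4 <= data.size()) {
        if (data[pos] != 0xFF) return false;                  // not at a marker
        const std::uint8_t marker = data[pos + 1];
        if (marker == 0xDA || marker == 0xD9) return false;   // SOS or EOI reached

        // The two length bytes are big-endian and include themselves.
        const std::size_t len =
            (static_cast<std::size_t>(data[pos + 2]) << 8) | data[pos + 3];
        if (len < 2 || pos + 2 + len > data.size()) return false;

        if (marker == 0xE1) {
            out.offset = pos + 4;
            out.size = len - 2;
            return true;
        }
        pos += 2 + len;   // skip the marker bytes plus the whole segment
    }
    return false;
}
```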

Given the style of programming of the OP, I'd recommend Easyexif at https://github.com/mayanklahiri/easyexif
It's relatively easy to integrate. Note that fseek() goes to a file position; it does not search for a certain number.

is there a way to fopen a file that allows me to edit just a few bytes?

I am writing a class that compresses binary data using a zlib stream. I have a buffer that I fill from the output stream, and once it becomes full I dump the buffer out to a file using fopen(filename, "ab");... This means my program only opens the file when it has a full buffer of data to dump; it writes the data and immediately closes the file.
The issue is that my format uses an 8-byte header at the beginning of each file, containing the original length and the compressed length, but I do not know these values until the end of the whole compression process.
What I wanted to do was write 8 bytes of zeros, then append all my compressed data, then come back at the end during cleanup to fill in those 8 bytes with the size data, but I can't seem to find a way to open the file without bringing it all back into memory. I just want to edit the first 8 bytes of the file. Do I need to use mmap?

Since you're using the file in append mode, you do need to close and re-open it:
open with fopen(filename, "r+b");
write the 8 bytes;
close the file using fclose().
The r+ means
Open for reading and writing. The stream is positioned at the
beginning of the file.
and the b is needed to open in binary mode.
You can use this method to change the data at any position in the file, not just at the beginning: simply use fseek() to seek to the required position before writing.
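The steps above can be sketched as a small helper that reopens the finished file in "r+b" mode and overwrites only its first 8 bytes. The function name patchHeader is made up, and storing the two lengths as 32-bit little-endian values is an assumption for the sketch; the question does not say how the 8 bytes are split.

```cpp
#include <cstdint>
#include <cstdio>

// Overwrites the first 8 bytes of an existing file with two 32-bit sizes,
// stored little-endian (a format choice assumed for this sketch).
bool patchHeader(const char* path, std::uint32_t originalLen,
                 std::uint32_t compressedLen) {
    unsigned char hdr[8];
    for (int i = 0; i < 4; ++i) {
        hdr[i]     = static_cast<unsigned char>(originalLen   >> (8 * i));
        hdr[4 + i] = static_cast<unsigned char>(compressedLen >> (8 * i));
    }

    std::FILE* fp = std::fopen(path, "r+b");   // update in place, no truncation
    if (!fp) return false;
    const bool ok = std::fseek(fp, 0, SEEK_SET) == 0 &&
                    std::fwrite(hdr, 1, sizeof hdr, fp) == sizeof hdr;
    return (std::fclose(fp) == 0) && ok;
}
```

The rest of the file is left untouched, so nothing beyond the 8 header bytes has to be read into memory.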

Use rewind() to take the file pointer back to the start of the file after you write out the last few bytes of data. You can then output your 8 bytes of length info.

If you have flexibility in changing your format, I might suggest this. Define your compressed stream such that it is a sequence of an unknown number of blocks, and each block is preceded by a fixed length integer specifying the number of bytes in the block. The stream is finished when the next block has a size of zero.
The drawback to this format is that there is no way for the reader of the stream to know how much data is coming until it has all been read. But the advantage is that it avoids the problem you are trying to solve.
More importantly, it allows you to send a compressed stream of data somewhere as you read the input and you don't have to save it all before sending it. For example, you could write a compression Unix filter that you could put in a pipe stream:
prog1 | yourprog -compress | rsh host yourprog -expand | prog2
Good luck.
