I am currently trying to read .BMP files using C++, but after reading only a few bytes the end of the file is reached (fgetc() returns -1). I've reduced the problem to a minimal example:
#include <cstdio>
#include <iostream>
int main()
{
// Open file
FILE* file = fopen("C:/Path/to/file", "r");
// Get and print file size in bytes
fseek(file, 0L, SEEK_END);
std::cout << ftell(file) << std::endl;
rewind(file);
int byte, i = 0;
do
{
// Get each byte
byte = fgetc(file);
// Print i and value of byte
std::cout << i << ", " << byte << std::endl;
i++;
}
// Stop reading when end of file is reached
while (byte != EOF);
std::cin.get();
return 0;
}
When I use this to read .BMP files (the problem does not occur with simple formats like .txt files), it reports the file length correctly, but fgetc() hits EOF well before the end of the file.
For example, using this file, it reads a file length of 120054, but fgetc() returns -1 at i=253.
What exactly am I doing wrong, and how can I fix this?
Reading a file in plain "r" mode on DOS/Windows may treat ASCII 26 (^Z) as "end of file". It may also convert line endings from CR LF (13 10) to LF (10), which you also don't want.
Looking at your sample file, I do indeed see that character (it's 1A in hex):
0000340 0c 1f 15 0e 1f 15 0e 1f 14 10 1f 14 10 21 17 10
0000360 21 17 10 22 18 11 23 19 12 25 19 13 26[1a]14 26
The position is 375 octal, which is 253 decimal. Sound familiar? :)
Use "rb":
FILE* file = fopen("C:/Path/to/file", "rb");
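To see the effect in isolation, here is a small self-contained sketch (not from the original post; the file name ctrlz_test.bin is just for illustration). It writes a ^Z byte between two letters and counts how many bytes each mode reads back; on Windows the "r" run typically stops at the ^Z, while "rb" reads all three bytes:
#include <cstdio>
#include <iostream>

// Count how many bytes fgetc() delivers before EOF for a given open mode.
static int countBytes(const char* path, const char* mode)
{
    FILE* f = std::fopen(path, mode);
    if (!f) return -1;
    int n = 0;
    while (std::fgetc(f) != EOF)
        ++n;
    std::fclose(f);
    return n;
}

int main()
{
    // Write 'A', ^Z (0x1A), 'B' in binary mode so nothing gets translated.
    FILE* f = std::fopen("ctrlz_test.bin", "wb");
    std::fputc('A', f);
    std::fputc(0x1A, f);
    std::fputc('B', f);
    std::fclose(f);

    std::cout << "bytes read in \"r\":  " << countBytes("ctrlz_test.bin", "r") << std::endl;
    std::cout << "bytes read in \"rb\": " << countBytes("ctrlz_test.bin", "rb") << std::endl;
    return 0;
}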
Change
FILE* file = fopen("C:/Path/to/file", "r");
to
FILE* file = fopen("C:/Path/to/file", "rb");
to read the file in binary mode. That usually helps to avoid such strange errors.
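For completeness, a minimal sketch of the question's loop with binary mode applied and the EOF check folded into the loop condition, so the trailing -1 is no longer printed (same placeholder path as in the question):
#include <cstdio>
#include <iostream>

int main()
{
    // Open in binary mode so 0x1A is not treated as end of file
    // and CR LF pairs are not translated.
    FILE* file = std::fopen("C:/Path/to/file", "rb");
    if (!file)
        return 1;

    // Report the file size in bytes.
    std::fseek(file, 0L, SEEK_END);
    std::cout << std::ftell(file) << std::endl;
    std::rewind(file);

    int byte, i = 0;
    // Read until fgetc() reports EOF; the check happens before printing.
    while ((byte = std::fgetc(file)) != EOF)
    {
        std::cout << i << ", " << byte << std::endl;
        ++i;
    }
    std::fclose(file);
    return 0;
}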
Related
I have a utility that should copy files from one location to another.
The problem I have is that when reading X bytes using the QDataStream and writing them, the number of bytes read/written ends up exceeding the number of bytes the file has. I see this happen with a number of files.
I am using QDataStream::readRawData() and QDataStream::writeRawData() to read from and write to the files, as shown below:
QDataStream in(&sourceFile);
QDataStream out(&newFile);
// Read/Write byte containers
qint64 fileBytesRead = 0;
quint64 fileBytesWritten = 0;
qint64 bytesWrittenNow = 0;
quint8* buffer = new quint8[bufSize];
while ((fileBytesRead = in.readRawData((char*)buffer, bufSize)) != 0) {
// Check if we have a read/write mismatch
if (fileBytesRead == -1) {
printCritical(TAG, QString("Mismatch read/write: [R:%1/W:%2], total file write/max [W:%3/M:%4]. File may be corrupted, skipping...").arg(QString::number(fileBytesRead), QString::number(bytesWrittenNow), QString::number(fileBytesWritten), QString::number(storageFile.size)));
// close source file handle
sourceFile.close();
// Close file handle
newFile.close();
return BackupResult::IOError;
}
// Write buffer to file stream
bytesWrittenNow = out.writeRawData((const char*)buffer, fileBytesRead);
// Check if we have a read/write mismatch
if (bytesWrittenNow == -1) {
printCritical(TAG, QString("Mismatch read/write: [R:%1/W:%2], total file write/max [W:%3/M:%4]. File may be corrupted, skipping...").arg(QString::number(fileBytesRead), QString::number(bytesWrittenNow), QString::number(fileBytesWritten), QString::number(storageFile.size)));
// close source file handle
sourceFile.close();
// Close file handle
newFile.close();
return BackupResult::IOError;
}
// Add current buffer size to written bytes
fileBytesWritten += bytesWrittenNow;
if(fileBytesWritten > storageFile.size) {
qWarning() << "Extra bytes read/written exceeding file length"; <================= this line is hit every now and then
}
//...
This problem isn't consistent; it only happens every now and then, and I have no idea why. Does anyone have thoughts on a possible cause?
The name of the function QDataStream::writeRawData() sounds ideal for writing binary data. Unfortunately, that's only half of the story.
The open-mode of the file is relevant as well under certain conditions – e.g. if the QFile is opened on Windows with QIODevice::Text:
QIODevice::Text
When reading, the end-of-line terminators are translated to '\n'. When writing, the end-of-line terminators are translated to the local encoding, for example '\r\n' for Win32.
I prepared an MCVE to demonstrate that:
// Qt header:
#include <QtCore>
void write(const QString &fileName, const char *data, size_t size, QIODevice::OpenMode mode)
{
qDebug() << "Open file" << fileName;
QFile qFile(fileName);
qFile.open(mode | QIODevice::WriteOnly);
QDataStream out(&qFile);
const int ret = out.writeRawData(data, size);
qDebug() << ret << "bytes written.";
}
// main application
int main(int argc, char **argv)
{
const char data[] = {
'\x00', '\x01', '\x02', '\x03', '\x04', '\x05', '\x06', '\x07',
'\x08', '\x09', '\x0a', '\x0b', '\x0c', '\x0d', '\x0e', '\x0f'
};
const size_t size = sizeof data / sizeof *data;
write("data.txt", data, size, 0);
write("test.txt", data, size, QIODevice::Text);
}
Built and tested in VS2017 on Windows 10:
Open file "data.txt"
16 bytes written.
Open file "test.txt"
16 bytes written.
Result inspected with the help of cygwin:
$ ls -l *.txt
-rwxrwx---+ 1 scheff Domänen-Benutzer 427 Jun 23 08:24 CMakeLists.txt
-rwxrwx---+ 1 scheff Domänen-Benutzer 16 Jun 23 08:37 data.txt
-rwxrwx---+ 1 scheff Domänen-Benutzer 17 Jun 23 08:37 test.txt
$
data.txt has 16 bytes as expected but test.txt has 17 bytes. Oops!
$ hexdump -C data.txt
00000000 00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f |................|
00000010
$ hexdump -C test.txt
00000000 00 01 02 03 04 05 06 07 08 09 0d 0a 0b 0c 0d 0e |................|
00000010 0f |.|
00000011
$
Obviously, the underlying Windows file function “corrected” the \n to \r\n – 09 0a 0b became 09 0d 0a 0b. Hence, there occurs one additional byte which was not part of the originally written data.
Similar effects may happen when the QFile is opened for reading with QIODevice::Text involved.
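Applied to the copy utility from the question, the fix is simply to make sure neither QFile is opened with QIODevice::Text. A rough sketch, not from the original answer; the file names and the helper openForRawCopy are made up, and the error handling from the question is omitted:
#include <QtCore>

// Hypothetical helper: open both files for a raw byte-for-byte copy.
// Crucially, neither open() call includes QIODevice::Text, so no
// end-of-line translation can add or remove bytes.
bool openForRawCopy(QFile &sourceFile, QFile &newFile)
{
    return sourceFile.open(QIODevice::ReadOnly)
        && newFile.open(QIODevice::WriteOnly);
}

int main()
{
    QFile sourceFile("source.bin");   // made-up file names
    QFile newFile("copy.bin");
    if (!openForRawCopy(sourceFile, newFile))
        return 1;

    QDataStream in(&sourceFile);
    QDataStream out(&newFile);

    // Plain read/write loop; the byte counts now match the file size.
    char buffer[4096];
    int bytesRead;
    while ((bytesRead = in.readRawData(buffer, sizeof(buffer))) > 0)
        out.writeRawData(buffer, bytesRead);
    return 0;
}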
How can I read the header of an ADTS encoded AAC file? I need it to get the buffer length for each frame so I can read out the whole AAC file, but I can't get the right values. Here is my code to read the header and get the buffer length for each frame (bits 30 - 42), assuming big endian:
#include <fstream>
#include <iostream>
using namespace std;

int main(){
ifstream file("audio_adts.m4a", ios::binary);
char header[7],buf[1024];
int framesize;
while(file.read(header,7)) {
memset(buf ,0 , 1024);
/* Get header bit 30 - 42 */
framesize = (header[3]&240|header[4]|header[5]&1);
cout << "Framesize including header: "<<framesize<<endl;
file.read(buf,framesize);
/*Do something with buffer*/
}
return 0;
}
The framesize I get with this code is 65, 45, 45, 45, -17, and then it stops because of the negative value. The actual framesizes are around 200.
Hexdump of first header:
0x000000: ff f9 50 40 01 3f fc
Your extraction of the framesize appears to have the shifts (<<) missing that are needed to get the extracted bits into the right locations.
The bit masks do not look like they match the /* bit 30 - 42 */ comment either.
Also, change the char to unsigned char, as you will otherwise run into all kinds of sign-extension issues when doing this type of bit manipulation (which is the cause of your negative value error).
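A tiny illustration of that sign-extension issue (not part of the original answer): a plain char holding a byte with the high bit set becomes negative when widened to int, and the sign bits then leak into the masking and OR arithmetic:
#include <iostream>

int main()
{
    char s = '\xE0';            // plain char: may be signed, value -32
    unsigned char u = '\xE0';   // unsigned char: value 224

    // Widening to int sign-extends the plain char but not the unsigned one.
    std::cout << static_cast<int>(s) << '\n';            // typically -32
    std::cout << static_cast<int>(u) << '\n';            // 224
    std::cout << (0xF0 | static_cast<int>(s)) << '\n';   // -16: the OR result goes negative
    return 0;
}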
The way I calculated it:
unsigned int AAC_frame_len = ((AAC_44100_buf[3]&0x03)<<11|(AAC_44100_buf[4]&0xFF)<<3|(AAC_44100_buf[5]&0xE0)>>5);
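Putting the pieces together, a sketch of the corrected read loop under those assumptions: an unsigned header buffer, the shifts in place, and the ADTS frame length taken to include the 7-byte header itself (protection_absent set, so no CRC). The file name is the one from the question, and this only works if the file really starts with raw ADTS frames rather than an MP4/M4A container:
#include <fstream>
#include <iostream>
using namespace std;

int main(){
    ifstream file("audio_adts.m4a", ios::binary);
    unsigned char header[7];
    char buf[8192];   // 13-bit frame length => at most 8191 bytes per frame
    while (file.read(reinterpret_cast<char*>(header), 7)) {
        /* Bits 30 - 42: frame length, including the 7-byte header */
        int framesize = ((header[3] & 0x03) << 11) |
                        (header[4] << 3) |
                        ((header[5] & 0xE0) >> 5);
        cout << "Framesize including header: " << framesize << endl;
        int payload = framesize - 7;            // the header bytes were already consumed
        if (payload < 0 || payload > (int)sizeof(buf))
            break;                              // lost sync or corrupt header
        file.read(buf, payload);
        /* Do something with buf / payload */
    }
    return 0;
}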
Just wondering: if I read a PNG file as a binary file and write the hex numbers into another plain txt (or whatever) file, how can I recreate the PNG file from those hex numbers?
This is the code I use to read from a PNG file and write to another plain txt file:
unsigned char x;
ifile.open("foo.png",ios::binary);
ifile>>noskipws>>hex;
while(ifile>>x){
ofile<<setw(2)<<setfill('0')<<(int)x;
//do some formatting stuff to the ofile, ofile declaration omitted
//some ifs to see if IEND is read in, which is definitely correct
//if IEND, break, so the last four hex numbers in ofile are 49 45 4E 44
}
//read another 4 bytes and write to ofile, which are AE 42 60 82, the check sum
The reason I am doing this is that I have some PNG files which have irrelevant messages after the IEND chunk, and I want to get rid of them, keep only the chunks related to the actual picture, and split the extra data into different files. By "irrelevant messages" I mean data that is not actually part of the picture but that I have some other use for.
It's easy, you just need to read every 2 characters and convert them from hex back to binary.
unsigned char x;
char buf[3] = {0};
ifile.open("foo.hex");
ofile.open("recreated.png", ios::binary); // hypothetical output name for the rebuilt PNG
while (ifile >> buf[0] >> buf[1]) {
    char *end;
    x = (unsigned char) strtol(buf, &end, 16);
    if (*end == 0)       // no conversion error
        ofile.put(x);    // output the byte
}
I'm trying to extract the parameter with which an app was called by using the data inside cmdline.
If I start an application instance like this:
myapp 1 2
and then cat the cmdline of myapp, I will see something like myapp12.
I need to extract these values, so I used this piece of code to do it:
pid_t proc_id = getpid();
sprintf(buf,"/proc/%i/cmdline",proc_id);
FILE * pFile;
pFile = fopen (buf,"r");
if (pFile!=NULL)
{
fread(buf,100,100,pFile);
cout << "PID " << proc_id << endl;
string str = buf;
cout << buf << endl;
size_t found=str.find_last_of("/\\");
cout << " file: " << str.substr(found+1) << endl;
fclose (pFile);
}
But what I am getting is only the app name and no parameters...
Update copied from answer:
Well, my question now seems to be how to read the cmdline file without it stopping at the first NULL character...
fopen(cmdline, "rb")
doesn't do anything different, so...
/usr/bin/strings /proc/1337/cmdline usually does the job for me.
All of the command line parameters (what would come through as the argv[] array) are actually null-separated strings in /proc/XXX/cmdline.
abatkin#penguin:~> hexdump -C /proc/28460/cmdline
00000000 70 65 72 6c 00 2d 65 00 31 20 77 68 69 6c 65 20 |perl.-e.1 while |
00000010 74 72 75 65 00 |true.|
This explains why, when you cat'ed cmdline, the arguments were all "stuck" together (the NULL separators don't show up in the terminal), and why your cout stopped after the first command line argument (the process name): it treated the buffer as a null-terminated string and stopped looking for more characters at the first NULL byte.
Processing Command Line Arguments
To process the command line arguments, you have a couple of options. If you just want the entire command line as one giant string, loop from 0 to (numRead - 2) (where numRead is the number of characters read) and replace any NULL bytes (curByte == 0) with spaces. Then just make sure to set the last character to be a NULL byte too (in case things got truncated due to the fixed-size buffer).
If you instead want an array with all of the arguments, you need to be more creative. One option would be to loop from 0 to (numRead - 1) and count all of the NULL bytes that you find. Then allocate an array of char*'s of that length. Then loop back through the command line, setting consecutive elements of the char* array to the beginning of each string (i.e. the first byte of the buffer, plus each byte following a NULL byte).
Just know that since you read into a fixed-size buffer, anything beyond that buffer is truncated. So whatever you do, you probably need to manually make sure that the last string ends up NULL terminated, otherwise most string handling functions won't know where the string ends and will keep going forever.
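A minimal sketch of both options in C++ (not from the original answer; the buffer size and file/variable names are made up for illustration):
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

int main()
{
    char buf[4096];
    FILE* fp = std::fopen("/proc/self/cmdline", "rb");
    if (!fp) return 1;
    size_t numRead = std::fread(buf, 1, sizeof(buf), fp);
    std::fclose(fp);

    // Option 1: one big string, NULL separators replaced by spaces.
    std::string oneLine(buf, numRead);
    for (char& c : oneLine)
        if (c == '\0') c = ' ';

    // Option 2: an argv-like array; each argument ends at a NULL byte.
    std::vector<std::string> args;
    size_t start = 0;
    for (size_t i = 0; i < numRead; ++i) {
        if (buf[i] == '\0') {
            args.emplace_back(buf + start, buf + i);
            start = i + 1;
        }
    }
    if (start < numRead)                        // last argument truncated, no trailing NULL
        args.emplace_back(buf + start, buf + numRead);

    std::printf("%s\n", oneLine.c_str());
    for (const std::string& a : args)
        std::printf("arg: %s\n", a.c_str());
    return 0;
}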
Many files like those under /proc report an incorrect size, hence the file is read sequentially as opposed to a one-shot read.
std::string cmdline;
char buf[1024];
size_t len;
FILE *fp = fopen("/proc/self/cmdline", "rb");
if (fp) {
while ((len = fread(buf, 1, sizeof(buf), fp)) > 0) {
cmdline.append(buf, len); // note: `len` here is very important
}
  fclose(fp);
}
This way you get the whole command line, including the parameters. The reason you only get the app name (or the file path) and no parameters when you try to print it is that the command line is separated with '\0', something like "./exefile_name\0-param1=p1\0-param2=p2". You only see the part ahead of the first '\0', because the string is assumed to end there. You need to tokenize it on '\0' first, then print or use the pieces:
std::stringstream tokens(cmdline);
std::string tmp;
while (getline(tokens, tmp, '\0')){
std::cout << tmp << std::endl;
}
EDIT: Apparently, the problem is in the read function: I checked the data in a hex editor
02 00 00 00 01 00 00 00 00 00 00 00
So the zero is being stored as zero, just not read as zero.
Because when I use my normal store-in-bin file function:
int a = 0;
file.write(reinterpret_cast<char*>(&a), sizeof(a));
It stores 0 as the char version, or "\0", which obviously isn't stored (because it's a null value?), so when I call my function to read the zero value, it reads the value right after it (or right before, if it were the last in the file). So how can I store zero in a .bin file properly?
EDIT: Here are some of the functions relating to the read/write process:
//Init program: creates a sector.bin for another program to read from.
#include<fstream>
using namespace std;
int main()
{
fstream file;
file.open("sector.bin", ios::out | ios::binary);
if(!file.is_open())
{
file.open("sector.bin", ios::out | ios::binary);
file.close();
file.open("sector.bin", ios::out | ios::binary);
if(!file.is_open())
{
return -1;
}
}
file.seekp(file.beg);
int a = 2;
int b = 1;
int c = 0;
file.write(reinterpret_cast<char*>(&a), sizeof(a));
file.write(reinterpret_cast<char*>(&b), sizeof(b));
file.write(reinterpret_cast<char*>(&c), sizeof(c));
file.close();
return 0;
}
//Read function: part of another program that initializes variables
//based on sector.bin
void sector::Init(std::fstream& file)
{
int top_i = FileRead(file,0);
std::cout<<top_i<<std::endl;
for(int i = 0; i < top_i; i++)
{
accessLV[i] = FileRead(file,i+1);
std::cout<<accessLV[i]<<std::endl;
}
std::cin.ignore();
viral_data.add(new X1(5,5,'X'));
viral_data.add(new X1(9,9,'X'));
player.set(0,0,'O');
return;
}
//the FileRead used in init
int FileRead(std::fstream& file, int pos)
{
int data;
file.seekg(file.beg + pos);
file.read(reinterpret_cast<char*>(&data), sizeof(data));
return data;
}
Also, the output for using sector::Init is as follows:
2
1
1
The output that I was trying to write into the bin was
2
1
0
So either the 0 is being read/written as a 1, or it's not being written and Init is reading the last value twice.
int num = 0;
write( fd, &num, sizeof( int ));
It's not clear what you mean by "storing integer value 0" in a file. Files contain bytes, not integers. Do you need to store sizeof(int) zero bytes, or just one '\0' byte?
P.S. I would also guess the problem might be in your read code. Did you look at your .bin file in a hex editor?
P.P.S. Your problem is in how you use seekg(). Instead of passing the offset in bytes, you pass pos, which is an index. It should be pos * sizeof(int) instead.
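A sketch of FileRead with that fix applied (not from the original answer), treating the file as a flat array of ints so the byte offset is the index times sizeof(int):
#include <fstream>

//FileRead with the seek offset expressed in bytes rather than in ints
int FileRead(std::fstream& file, int pos)
{
    int data;
    // Element `pos` starts at byte offset pos * sizeof(int).
    file.seekg(pos * static_cast<std::streamoff>(sizeof(data)), std::ios::beg);
    file.read(reinterpret_cast<char*>(&data), sizeof(data));
    return data;
}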
I'm not sure what you want to do; to me it seems the code you provided already does what you're asking for:
#include <fstream>

int main() {
std::ofstream file("/tmp/tst.out");
int a = 0;
file.write(reinterpret_cast<char*>(&a), sizeof(a));
return 0;
}
This results in a file of four bytes size that contains the binary representation of a zero integer:
$ hexdump /tmp/tst.out
0000000 0000 0000
0000004
If you want to store the integer as its ASCII representation, you should use formatted stream output with <<:
std::ofstream file("/tmp/tst.out");
int a = 0;
file << a << std::endl;
This way you get:
$ cat /tmp/tst.out
0
You need to decide what format the binary file should have, something you don't have to think about in the same way with text files, which is why a text file is often used instead.
Assuming a (32-bit) machine where sizeof(int) == 4 (and CHAR_BIT == 8), you can store 4 bytes that are all zero at the current file location in native format, and then what you've got there should work, I think. You could experiment with other values such as 0x01020304; you will see the byte layout on your machine.
Of course, you need to be careful reading it back in, reversing the procedure used for writing. And don't forget to reposition the file before trying to re-read the data just written.
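A sketch of that experiment (the file name layout.bin is made up): write 0x01020304, reposition to the start, read it back, and dump the individual bytes to see the byte order on your machine:
#include <fstream>
#include <iostream>

int main()
{
    std::fstream file("layout.bin",
                      std::ios::in | std::ios::out | std::ios::binary | std::ios::trunc);

    int out = 0x01020304;
    file.write(reinterpret_cast<char*>(&out), sizeof(out));

    // Reposition before re-reading the data just written.
    file.seekg(0, std::ios::beg);

    int in = 0;
    file.read(reinterpret_cast<char*>(&in), sizeof(in));
    std::cout << std::hex << in << std::endl;   // prints 1020304 again

    // Dump the raw bytes to see the layout on this machine
    // (e.g. 4 3 2 1 on a little-endian machine).
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&in);
    for (unsigned i = 0; i < sizeof(in); ++i)
        std::cout << static_cast<int>(p[i]) << ' ';
    std::cout << std::endl;
    return 0;
}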