File size not equal to memory size? - c++

I'm trying to write an array containing 11.26 million uint16_t values to disk. The total memory size should be ~22 MB. However, my file is 52 MB. I'm using fprintf to write the array to disk. I thought maybe the values were being promoted, so I tried to be explicit with the format specifier, but it makes no difference: the file size is stubbornly unchanged.
What am I doing wrong? Code follows.
#define __STDC_FORMAT_MACROS
...
uint32_t dbsize_ = 11262336;
uint16_t* db_ = new uint16_t[dbsize_];
...
char fname[256] = "foo";
FILE* f = fopen(fname, "wb");
if(f == NULL)
{
return;
}
fprintf(f, "%" PRIu32 "\t", dbsize_);
for(uint32_t i = 0; i < dbsize_; i++)
{
fprintf(f, "%" PRIu16, db_[i]); // PRI* macros are for printf; SCN* macros are for scanf
}
fclose(f);

You're writing ASCII text to your file, not binary.
Try writing your array like this instead of calling fprintf in a loop:
fwrite(db_, sizeof(db_[0]), dbsize_, f);
fprintf always formats numbers and other types to text, whether you've opened the file in binary mode or not. Binary mode just keeps the runtime from doing things like converting \n to \r\n.
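A minimal sketch of the fwrite approach, using hypothetical helper names (write_u16_array, file_size_bytes) and a file path chosen by the caller. It writes the array's bytes in a single call, so the file should be exactly count * sizeof(uint16_t) bytes: 22,524,672 bytes for 11,262,336 values. The bytes are stored in the host's byte order, so reading them on a machine with different endianness would need a conversion.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

// Hypothetical helper: write `count` uint16_t values as raw binary.
// Returns true only if every element was written and the file closed cleanly.
bool write_u16_array(const char* path, const std::uint16_t* data, std::size_t count) {
    std::FILE* f = std::fopen(path, "wb");
    if (f == nullptr) {
        return false;
    }
    // One call: fwrite copies the in-memory bytes verbatim (host byte order).
    std::size_t written = std::fwrite(data, sizeof(data[0]), count, f);
    bool ok = (written == count);
    if (std::fclose(f) != 0) {
        ok = false;
    }
    return ok;
}

// Hypothetical helper: size of a file in bytes, or -1 on error.
long file_size_bytes(const char* path) {
    std::FILE* f = std::fopen(path, "rb");
    if (f == nullptr) {
        return -1;
    }
    std::fseek(f, 0, SEEK_END);
    long size = std::ftell(f);
    std::fclose(f);
    return size;
}
```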

fprintf converts your number to a series of ASCII characters and writes them to the file. Depending on its value, an unsigned 32-bit integer takes from 1 to 10 characters when expressed as a string (a uint16_t takes 1 to 5). You need to use fwrite to write raw binary values to a file.
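To see where the extra size comes from, here is a sketch (text_length and total_text_length are hypothetical names) that counts how many characters the fprintf call would emit per value, using snprintf with a null buffer, which returns the length without writing anything. Each uint16_t costs 1 to 5 bytes as text but always 2 bytes in binary, and the loop in the question also writes no separator between values.

```cpp
#define __STDC_FORMAT_MACROS
#include <cinttypes>
#include <cstddef>
#include <cstdint>
#include <cstdio>

// Hypothetical helper: number of characters fprintf(f, "%" PRIu16, value) emits.
// snprintf with a null buffer and size 0 only computes the length.
int text_length(std::uint16_t value) {
    return std::snprintf(nullptr, 0, "%" PRIu16, value);
}

// Total text bytes for an array, versus sizeof(uint16_t) * count in binary.
std::size_t total_text_length(const std::uint16_t* data, std::size_t count) {
    std::size_t total = 0;
    for (std::size_t i = 0; i < count; ++i) {
        total += static_cast<std::size_t>(text_length(data[i]));
    }
    return total;
}
```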

The source of confusion is likely that the "b" in FILE* f = fopen(fname, "wb"); does not do what you think it does.
Most significantly, it doesn't make any of the print or scan functions use binary values instead of ASCII text. As others have said, use fwrite instead.

Related

how to copy binary data of a file

Basically, I am trying to read the binary data of a file using fread() and print it on screen using printf(). The problem is that when it prints, it doesn't show the data as binary 1s and 0s; instead it prints symbols I don't recognize.
This is how I am doing it:
#include <stdio.h>
#include <windows.h>
int main(){
size_t sizeForB, sizeForT;
char ForBinary[BUFSIZ], ForText[BUFSIZ];
char RFB [] = "C:\\users\\(Unknown)\\Desktop\\hi.mp4" ; // Step 1
FILE *ReadBFrom = fopen(RFB , "rb" );
if(ReadBFrom == NULL){
printf("Following File were Not found: %s", RFB);
return -1;
} else {
printf("Following File were found: %s\n", RFB); // Step 2
while(sizeForB = fread(ForBinary, 1, BUFSIZ, ReadBFrom)){ // Step 1
printf("%s", ForBinary);
}
fclose(ReadBFrom);
}
return 0;
}
I would really appreciate it if someone could help me read the actual binary data of a file as binary (0s and 1s).
while(sizeForB = fread(ForBinary, 1, BUFSIZ, ReadBFrom)){
printf("%s", ForBinary); }
This is wrong on several levels. First, you said it is a binary file, which means there might not be any text in it at all, yet you are using the %s format specifier, which prints null-terminated strings. Second, fread does not null-terminate the buffer, so printf("%s", ...) can run past the bytes that were read, or stop early at the first zero byte. And even if the file did contain text, there is no guarantee that fread would read a "complete" null-terminated string that you could pass to printf with %s.
What you probably want to do is read each byte from the file, convert it to a binary representation (search for how to convert an integer to a binary string), and print the binary representation of each byte.
In pseudocode:
foreach (byte b in FileContents)
{
string s = convertToBinary(b);
println(s);
}
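In C++ the pseudocode maps directly onto std::bitset, which formats a fixed-width string of '0' and '1' characters, most significant bit first. A sketch, assuming the file's bytes have already been read into a std::vector<unsigned char>; the helper names to_binary and dump_binary are made up for illustration.

```cpp
#include <bitset>
#include <climits>
#include <string>
#include <vector>

// Convert one byte to its CHAR_BIT-digit binary form, most significant bit first.
std::string to_binary(unsigned char b) {
    return std::bitset<CHAR_BIT>(b).to_string();
}

// Binary form of every byte, one byte per line (like the pseudocode's println).
std::string dump_binary(const std::vector<unsigned char>& contents) {
    std::string out;
    for (unsigned char b : contents) {
        out += to_binary(b);
        out += '\n';
    }
    return out;
}
```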
How to view files in binary in the terminal?
Either
hexdump -C yourfile.bin
which shows the bytes in hex alongside their ASCII form (unless you want to edit the file, of course). Most Linux distros have hexdump by default, but not all.
or
xxd -b file
which prints each byte as eight binary digits.
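To simply read a file and print it in binary (ones and zeros), read it one char at a time. Then, for each bit, print '0' or '1'. You can print either the most or the least significant bit first; I suggest MSb first.
#include <limits.h> // CHAR_BIT
#include <stdio.h>  // fgetc, putchar
/* ... open ReadBFrom with fopen(..., "rb") as in the question ... */
if (ReadBFrom) {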
int ch;
while ((ch = fgetc(ReadBFrom)) != EOF) {
unsigned mask = 1u << (CHAR_BIT - 1); // CHAR_BIT is typically 8
while (mask) {
putchar(mask & ch ? '1' : '0');
mask >>= 1;
}
}
fclose(ReadBFrom);
}

Reading in raw encoded nrrd data file into double

Does anyone know how to read a file with raw encoding? I'm stumped. I am trying to read in floats or doubles (I think). I have been stuck on this for a few weeks. Thank you!
File that I am trying to read from:
http://www.sci.utah.edu/~gk/DTI-data/gk2/gk2-rcc-mask.raw
Description of raw encoding:
http://teem.sourceforge.net/nrrd/format.html#encoding
- "raw" - The data appears on disk exactly the same as in memory, in terms of byte values and byte ordering. Produced by write() and fwrite(), suitable for read() or fread().
Header of the file:
http://www.sci.utah.edu/~gk/DTI-data/gk2/gk2-rcc-mask.nhdr - I think the only things that matter here are the big-endian byte order (I'm still trying to understand what that means) and the raw encoding.
My current approach, uncertain if it's correct:
//Function ripped off from example of c++ ifstream::read reference page
void scantensor(string filename){
ifstream tdata(filename, ifstream::binary); // not sure if I should put ifstream::binary here
// other things I tried
// ifstream tdata(filename) ifstream tdata(filename, ios::in)
if(tdata){
tdata.seekg(0, tdata.end);
int length = tdata.tellg();
tdata.seekg(0, tdata.beg);
char* buffer = new char[length];
tdata.read(buffer, length);
tdata.close();
double* d;
d = (double*) buffer;
} else cerr << "failed" << endl;
}
/* P.S. I attempted to print the first 100 elements of the array.
Then I printed 100 other elements at some arbitrary array indices (i.e. 9,900 - 10,000). I kept increasing the number of 0's until I ran out of bounds at 100,000,000 (I don't think that's how it works lol, but I was just playing around to see what happens).
Here's the part that makes me suspicious: the ifstream class has different constructors, like the ones I tried above.
The first 100 values are always the same.
If I use ifstream::binary, then I get some values for the 100 arbitrary elements.
If I use the other two options, then I get -6.27744e+066 for all 100 of them.
So for now I am going to assume that ifstream::binary is the correct one. The thing is, I am not sure whether the file I provided is what a binary file actually looks like. I am also unsure whether these are the actual numbers I am supposed to read or just casting gone wrong. I do realize that my cast from char* to double* can be unsafe; I got it from one of the threads.
*/
I really appreciate it!
Edit 1: The data read in with the above method is apparently incorrect, since in ParaView the values are:
Dxx,Dxy,Dxz,Dyy,Dyz,Dzz
[0, 1], [-15.4006, 13.2248], [-5.32436, 5.39517], [-5.32915, 5.96026], [-17.87, 19.0954], [-6.02961, 5.24771], [-13.9861, 14.0524]
It's a 3 x 3 symmetric matrix, which has 6 distinct values; together with the first component (range [0, 1]), that gives the 7 ranges above.
The floats I am currently parsing from the file have wildly wrong magnitudes (e.g. -4.68855e-229, -1.32351e+120).
Perhaps somebody knows how to extract the floats from ParaView?
Since you want to work with doubles, I recommend reading the data from the file as a buffer of doubles:
const long machineMemory = 0x40000000; // 1 GB
FILE* file = fopen("c:\\data.bin", "rb");
if (file)
{
size_t size = machineMemory / sizeof(double); // number of doubles per chunk
if (size > 0)
{
double* data = new double[size];
size_t count;
while ((count = fread(data, sizeof(double), size, file)) > 0)
{
// Process data here (count = number of doubles read)
}
delete [] data;
}
fclose(file);
}
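Neither the cast in the question nor the answer above handles byte order. The .nhdr header says the data is big endian, while x86 machines are little endian, so each value's bytes have to be reversed before it is interpreted; absurd magnitudes like the ones in the question can be a symptom of skipping that step. A sketch, assuming 32-bit IEEE 754 floats (check the type field in the .nhdr; for doubles the same idea applies with 8 bytes and uint64_t). The helper names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Decode one big-endian IEEE 754 single-precision value from 4 bytes,
// independent of the host's own byte order.
float be_bytes_to_float(const unsigned char* p) {
    std::uint32_t bits = (static_cast<std::uint32_t>(p[0]) << 24) |
                         (static_cast<std::uint32_t>(p[1]) << 16) |
                         (static_cast<std::uint32_t>(p[2]) << 8) |
                         static_cast<std::uint32_t>(p[3]);
    float value;
    static_assert(sizeof(value) == sizeof(bits), "expects a 32-bit float");
    std::memcpy(&value, &bits, sizeof(value));  // avoids the unsafe pointer cast
    return value;
}

// Decode a whole raw buffer (e.g. filled by fread or ifstream::read) into floats.
std::vector<float> decode_be_floats(const std::vector<unsigned char>& raw) {
    std::vector<float> out;
    out.reserve(raw.size() / 4);
    for (std::size_t i = 0; i + 4 <= raw.size(); i += 4) {
        out.push_back(be_bytes_to_float(&raw[i]));
    }
    return out;
}
```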

Storing an image file into a buffer (gif,jpeg etc).

I'm trying to load an image file into a buffer in order to send it through a socket. The problem I'm having is that the program creates a buffer of the right size, but it does not copy the whole file into it. My code is as follows:
//imgload.cpp
#include <iostream>
#include <stdlib.h>
#include <stdio.h>
using namespace std;
int main(int argc,char *argv){
FILE *f = NULL;
char filename[80];
char *buffer = NULL;
long file_bytes = 0;
char c = '\0';
int i = 0;
printf("-Enter a file to open:");
gets(filename);
f = fopen(filename,"rb");
if (f == NULL){
printf("\nError opening file.\n");
}else{
fseek(f,0,SEEK_END);
file_bytes = ftell(f);
fseek(f,0,SEEK_SET);
buffer = new char[file_bytes+10];
}
if (buffer != NULL){
printf("-%d + 10 bytes allocated\n",file_bytes);
}else{
printf("-Could not allocate memory\n");
// Call exit?.
}
while (c != EOF){
c = fgetc(f);
buffer[i] = c;
i++;
}
c = '\0';
buffer[i-1] = '\0'; // helps remove randome characters in buffer when copying is finished..
i = 0;
printf("buffer size is now: %d\n",strlen(buffer));
//release buffer to os and cleanup....
return 0;
}
> output
c:\Users\Desktop>imgload
-Enter a file to open:img.gif
-3491 + 10 bytes allocated
buffer size is now: 9
c:\Users\Desktop>imgload
-Enter a file to open:img2.gif
-1261 + 10 bytes allocated
buffer size is now: 7
From the output I can see that it allocates the correct size for each image, 3491 and 1261 bytes (I double-checked the file sizes in Windows, and the sizes being allocated are correct), but the buffer sizes after supposedly copying are 9 and 7 bytes. Why is it not copying all of the data?
You are treating binary data as a string. An image is binary data, not string data. So there are two errors:
1) You can't reliably detect the end of the file by comparing against EOF the way this code does. EOF is a negative int (usually -1), but c is declared as char, so a 0xFF byte, which is perfectly valid in a binary file, becomes indistinguishable from EOF where char is signed (and where char is unsigned, c never equals EOF at all). Declare c as int, or use feof() to check for end of file. You can also compare the current position in the file with the size you already got from ftell().
2) Since the file is binary, it may contain \0 in the middle, so you can't use string functions to work with such data.
Also, I see that you are using C++. Why do you use C-style file I/O? I think C++ features such as file streams, containers and iterators would simplify your program.
P.S. Your program will also have problems with really big files, in case you ever need to work with them. If so, replace ftell/fseek with their 64-bit (long long) equivalents, such as _ftelli64/_fseeki64 on Windows or ftello/fseeko with a 64-bit off_t on POSIX, and widen the array counter too. Another good idea is to read the file in blocks; reading byte by byte is dramatically slower.
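Following the suggestion to use file streams and containers, here is a minimal sketch (read_all_bytes is a hypothetical name) that loads the whole file into a std::vector<char> with a single block read. The vector's size() is the byte count, so there is no need for strlen or for testing each byte against EOF. For very large files you would loop over fixed-size blocks instead of allocating everything at once.

```cpp
#include <cstddef>
#include <fstream>
#include <ios>
#include <vector>

// Read an entire file into memory as raw bytes with one block read.
// The length comes from the file position, not strlen, so embedded '\0'
// and 0xFF bytes are kept. Returns an empty vector on failure.
std::vector<char> read_all_bytes(const char* path) {
    std::ifstream in(path, std::ios::binary | std::ios::ate);  // open positioned at the end
    if (!in) return {};
    std::streamoff size = in.tellg();  // position at end == file size in bytes
    if (size <= 0) return {};
    std::vector<char> buffer(static_cast<std::size_t>(size));
    in.seekg(0, std::ios::beg);
    in.read(buffer.data(), static_cast<std::streamsize>(size));
    buffer.resize(static_cast<std::size_t>(in.gcount()));  // keep only the bytes actually read
    return buffer;
}
```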
All this is unneeded and actually makes no sense:
c = '\0';
buffer[i-1] = '\0';
i = 0;
printf("buffer size is now: %d\n",strlen(buffer));
Don't use strlen on binary data. strlen stops at the first NUL (\0) byte, and a binary file may contain many such bytes, so NUL can't be used as a terminator.
-3491 + 10 bytes allocated /* There are 3491 bytes in the file. */
buffer size is now: 9 /* Index of the first byte with the value 0. */
In conclusion, drop that part. You already have the size of the file.
You are reading a binary file as if it were a text file. Comparing a char against EOF doesn't work, because the byte 0xFF can appear anywhere in a binary file.

What is the proper method of reading and parsing data files in C++?

What is an efficient, proper way of reading in a data file with mixed content? For example, I have a data file that contains a mixture of data loaded from other files, 32-bit integers, characters and strings. Currently I am using an fstream object, but it stops once it hits an int32 or the end of a string. If I add random data onto the end of the string in the data file, it seems to continue through the rest of the file. This leads me to believe that the null terminator added to strings is messing it up. Here's an example of loading the file:
void main()
{
fstream fin("C://mark.dat", ios::in|ios::binary|ios::ate);
char *mymemory = 0;
int size;
size = 0;
if (fin.is_open())
{
size = static_cast<int>(fin.tellg());
mymemory = new char[static_cast<int>(size+1)];
memset(mymemory, 0, static_cast<int>(size + 1));
fin.seekg(0, ios::beg);
fin.read(mymemory, size);
fin.close();
printf(mymemory);
std::string hithere;
hithere = cin.get();
}
}
Why might this code stop after reading in an integer or a string? How might one get around this? Is this the wrong approach when dealing with these types of files? Should I be using fstream at all?
Have you ever considered that the file reading is working perfectly and it is printf(mymemory) that is stopping at the first null?
Have a look with the debugger and see if I am right.
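Also, if you want to print someone else's buffer, use puts(mymemory) or printf("%s", mymemory); both still stop at the first NUL, but neither interprets the data. Never use someone else's input as the format string: any % sequences in it are treated as conversions, which can crash your program.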
Try
for (int i = 0; i < size ; ++i)
{
// 0 - pad with 0s
// 2 - minimum width of two characters
// X - a hex value with capital A-F (0A, 1B, etc.)
// Cast through unsigned char so bytes >= 0x80 print as 80..FF, not FFFFFF80.
printf("%02X ", (unsigned int)(unsigned char)mymemory[i]);
if ((i + 1) % 32 == 0)
printf("\n"); // new line every 32 bytes
}
as a way to dump your data file back out as hex.

Fastest way to create large file in c++?

Create a flat text file in C++ of around 50 - 100 MB, in which the content 'Added first line' is inserted 4 million times.
Using old-style file I/O:
fopen the file for writing.
fseek to the desired file size - 1.
fwrite a single byte.
fclose the file.
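The preallocation steps above, sketched with C stdio from C++ (preallocate is a hypothetical name). It assumes the platform lets you seek past the end of a file opened for writing, which POSIX systems and Windows allow; using long for the size limits it to about 2 GB where long is 32 bits.

```cpp
#include <cstdio>

// Create (or truncate) `path` and extend it to `size` bytes by seeking to
// offset size - 1 and writing one byte. Returns true on success.
// On POSIX systems the skipped region reads back as zero bytes.
bool preallocate(const char* path, long size) {
    if (size <= 0) return false;
    std::FILE* f = std::fopen(path, "wb");
    if (f == nullptr) return false;
    bool ok = std::fseek(f, size - 1, SEEK_SET) == 0 &&
              std::fputc(0, f) != EOF;
    if (std::fclose(f) != 0) ok = false;
    return ok;
}
```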
The fastest way to create a file of a certain size is to create a zero-length file using creat() or open() and then change its size using chsize(). This simply allocates space for the file on the disk, and it's very fast since your program doesn't need to write any data buffers. The added region reads back as zero bytes, not as whatever happened to be in those blocks.
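chsize() is a DOS/Windows function (Microsoft's CRT spells it _chsize); on POSIX systems the equivalent call is ftruncate(). A POSIX-only sketch of the same idea, with hypothetical helper names create_sized_file and file_length:

```cpp
#include <fcntl.h>      // open, O_* flags
#include <sys/stat.h>   // stat, struct stat
#include <sys/types.h>  // off_t
#include <unistd.h>     // ftruncate, close

// POSIX sketch: create an empty file, then set its length with ftruncate().
// The extended region reads back as zero bytes; the program writes no data.
bool create_sized_file(const char* path, off_t size) {
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return false;
    bool ok = ftruncate(fd, size) == 0;
    if (close(fd) != 0) ok = false;
    return ok;
}

// Helper for checking the result: file length in bytes, or -1 on error.
off_t file_length(const char* path) {
    struct stat st;
    return stat(path, &st) == 0 ? st.st_size : -1;
}
```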
Not sure I understand the question. Do you want to ensure that every character in the file is a printable ASCII character? If so, how about this? It fills the file with "abcdefghabc...."
#include <stdio.h>
int main ()
{
const int FILE_SIZE = 50000; // size in KB
const int BUFFER_SIZE = 1024;
char buffer [BUFFER_SIZE + 1];
int i;
for(i = 0; i < BUFFER_SIZE; i++)
buffer[i] = (char)(i%8 + 'a');
buffer[BUFFER_SIZE] = '\0';
FILE *pFile = fopen ("somefile.txt", "w");
if (pFile == NULL)
return 1;
for (i = 0; i < FILE_SIZE; i++)
fputs(buffer, pFile); // not fprintf: the buffer is data, not a format string
fclose(pFile);
return 0;
}
You haven't mentioned the OS, but I'll assume creat/open/close/write are available.
For truly efficient writing and assuming, say, a 4k page and disk block size and a repeated string:
open the file.
allocate 4k * number of chars in your repeated string, ideally aligned to a page boundary.
print the repeated string into the memory 4k times, filling the blocks precisely.
Use write() to write out the blocks to disk as many times as necessary. You may wish to write a partial piece for the last block to get the size to come out right.
close the file.
This bypasses the buffering of fopen() and friends, which is good and bad: their buffering makes them nice and fast, but they still won't be as efficient as this approach, which avoids the overhead of copying through their buffer.
This can easily be written in C++ or C, but it assumes you're going to use POSIX calls rather than iostream or stdio for efficiency's sake, so it's outside the core library specification.
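The steps above can be sketched with POSIX calls from C++. write_repeated is a hypothetical name; the block holds 4096 copies of the line, so its size is a whole number of 4 KB pages, but unlike the description the buffer is not explicitly page-aligned. The loop also handles write() returning fewer bytes than requested, and the last block is written partially so the line count comes out exact.

```cpp
#include <cstddef>
#include <string>
#include <fcntl.h>    // open, O_* flags
#include <unistd.h>   // write, close, ssize_t

// POSIX sketch: write `line` exactly `total_lines` times using large write() calls.
// The block holds 4096 copies of the line (4096 * line length bytes = whole pages).
// Returns the number of bytes written, or -1 on failure.
long long write_repeated(const char* path, const std::string& line, long total_lines) {
    const long copies_per_block = 4096;
    std::string block;
    block.reserve(line.size() * copies_per_block);
    for (long i = 0; i < copies_per_block; ++i) block += line;

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return -1;
    long long total = 0;
    bool ok = true;
    for (long remaining = total_lines; ok && remaining > 0; ) {
        long n = remaining < copies_per_block ? remaining : copies_per_block;
        std::size_t bytes = static_cast<std::size_t>(n) * line.size();  // last block may be partial
        std::size_t done = 0;
        while (done < bytes) {  // write() may write fewer bytes than requested
            ssize_t w = write(fd, block.data() + done, bytes - done);
            if (w <= 0) { ok = false; break; }
            done += static_cast<std::size_t>(w);
        }
        total += static_cast<long long>(done);
        remaining -= n;
    }
    if (close(fd) != 0) ok = false;
    return ok ? total : -1;
}
```

For the question's numbers, write_repeated(path, "Added first line\n", 4000000) would produce 4,000,000 * 17 = 68,000,000 bytes, inside the requested 50 - 100 MB range.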
I faced the same problem: creating a ~500 MB file on Windows very fast.
The larger the buffer you pass to fwrite(), the faster it will be.
size_t i;
FILE *fp;
fp = fopen(fname,"wb");
if (fp != NULL) {
// create one big block of data
static uint8_t b[278528]; // some big chunk size; static keeps it off the stack
for( i = 0; i < sizeof(b); i++ ) // custom initialization if != 0x00
{
b[i] = 0xFF;
}
// write all blocks to the file (TOT_BLOCKS = number of blocks needed)
for( i = 0; i < TOT_BLOCKS; i++ )
fwrite(b, sizeof(b), 1, fp);
fclose (fp);
}
On my Win7 machine with MinGW, this creates the file almost instantly.
By comparison, calling fwrite() one byte at a time takes 10 seconds to complete,
and passing a 4k buffer takes 2 seconds.
OK. I assume the fastest way means the one that takes the least run time.
Create a flat text file in C++ of around 50 - 100 MB, in which the content 'Added first line' is inserted 4 million times.
preallocate the file using old style file io
fopen the file for write.
fseek to the desired file size - 1.
fwrite a single byte
fclose the file
create a string containing "Added first line\n" a thousand times.
find its length.
preallocate the file using old style file io
fopen the file for write.
fseek to (the string length * 4000) - 1
fwrite a single byte
fclose the file
open the file for read/write
loop 4000 times,
writing the string to the file.
close the file.
That's my best guess.
I'm sure there are a lot of ways to do it.
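The steps above can be sketched with stdio as follows. write_lines_preallocated is a hypothetical name and the counts are parameters: 1000 lines per chunk and 4000 chunks reproduces the question's 4 million lines. Whether preallocating first actually saves time depends on the file system; this only shows the mechanics.

```cpp
#include <cstdio>
#include <string>

// Sketch of the steps above: build a chunk of `lines_per_chunk` copies of `line`,
// preallocate the final size, then overwrite the file chunk by chunk.
// Returns the final file size in bytes, or -1 on failure.
long write_lines_preallocated(const char* path, const std::string& line,
                              long lines_per_chunk, long chunks) {
    std::string chunk;
    for (long i = 0; i < lines_per_chunk; ++i) chunk += line;
    const long total = static_cast<long>(chunk.size()) * chunks;
    if (total <= 0) return -1;

    // Preallocate: seek to total - 1 and write a single byte.
    std::FILE* f = std::fopen(path, "wb");
    if (f == nullptr) return -1;
    bool ok = std::fseek(f, total - 1, SEEK_SET) == 0 && std::fputc(0, f) != EOF;
    if (std::fclose(f) != 0 || !ok) return -1;

    // Reopen for read/write without truncating, then write every chunk.
    f = std::fopen(path, "r+b");
    if (f == nullptr) return -1;
    for (long i = 0; i < chunks && ok; ++i) {
        ok = std::fwrite(chunk.data(), 1, chunk.size(), f) == chunk.size();
    }
    std::fseek(f, 0, SEEK_END);
    long size = std::ftell(f);
    if (std::fclose(f) != 0 || !ok) return -1;
    return size;
}
```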