I would like to know if I can use fread to read data into an integer buffer.
I see fread() takes void * as the first parameter. So can't I just pass an integer
buffer (typecast to void *) and then use it to read however many bytes I want from the file, as long as the buffer is big enough?
I.e., can't I do:
int buffer[10];
fread((void *)buffer, sizeof(int), 10, somefile);
// print contents of buffer
for(int i = 0; i < 10; i++)
cout << buffer[i] << endl;
What is wrong here?
Thanks
This should work if you wrote the ints to the file using something like fwrite (a "binary" write). If the file is human-readable (you can open it with a text editor and see numbers that make sense), you probably want fscanf / cin.
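For instance, a minimal sketch of the text-file case using fscanf (the file name "numbers.txt" is made up here):
#include <stdio.h>
int main() {
    FILE *f = fopen("numbers.txt", "r");
    if (!f) return 1;
    int buffer[10];
    int count = 0;
    // fscanf returns the number of items converted; stop at EOF or bad input
    while (count < 10 && fscanf(f, "%d", &buffer[count]) == 1)
        ++count;
    for (int i = 0; i < count; ++i)
        printf("%d\n", buffer[i]);
    fclose(f);
    return 0;
}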
As others have mentioned, fread should be able to do what you want,
provided the input is in the binary format you expect. One caveat
I would add is that the code will have platform dependencies and
will not function correctly if the input file is moved between
platforms with differently sized integers or different
endiannesses.
Also, you should always check your return values; fread could fail.
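One way to sidestep those portability caveats (a sketch, assuming you control the on-disk format and store each value as 4 little-endian bytes) is to decode explicitly instead of fread-ing raw ints; this also reports short reads:
#include <stdio.h>
#include <stdint.h>
// Read one 4-byte little-endian integer regardless of the host's int size
// or byte order; returns false on a short read.
bool read_le32(FILE *f, int32_t &out) {
    unsigned char b[4];
    if (fread(b, 1, 4, f) != 4)
        return false;
    out = (int32_t)((uint32_t)b[0] | ((uint32_t)b[1] << 8) |
                    ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24));
    return true;
}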
Yes, you can use fread to read into an array of integers:
int buffer[10];
size_t readElements = fread((void *)buffer, sizeof(int), 10, somefile);
for(size_t i = 0; i < readElements; i++)
cout << buffer[i] << endl;
You can check the number of elements fread returns and print only that many.
EDIT: provided you are reading from a file opened in binary mode and the values were written with fwrite, as cnicutar mentioned.
I was trying the same thing and got the same result as yours: a large int value when trying to read an integer from a file using fread(). I finally found the reason for it.
So suppose your input file contains only:
"5"
or:
"5 5 5"
The details I got from http://www.programmersheaven.com/mb/beginnercpp/396198/396198/fread-returns-invalid-integer/
fread() reads binary data (even if the file is opened in 'text' mode). The number 540352565 in hex is 0x20352035; 0x20 is the ASCII code of a space and 0x35 is the ASCII code of a '5' (they appear in reversed order because this is a little-endian machine).
So what fread does is read the ASCII codes from the file and build an int from them, expecting binary data. This explains the behavior when reading the '5 5 5' file. The same happens when reading the file with a single '5', but then only one byte can be read (or two if it is followed by a newline), and fread should fail if it reads fewer than sizeof(int) bytes, which is 4 in this case.
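A quick way to see this for yourself (a sketch, assuming a 4-byte int on a little-endian machine) is to assemble the first four bytes of the "5 5 5" file by hand and print the result:
#include <stdio.h>
#include <string.h>
int main() {
    const unsigned char bytes[4] = { 0x35, 0x20, 0x35, 0x20 }; // '5',' ','5',' '
    int value;
    memcpy(&value, bytes, sizeof value);  // effectively what fread did
    printf("%d (0x%08X)\n", value, (unsigned)value); // 540352565 (0x20352035)
    return 0;
}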
Since the reaction to the response was that it still does not work, I will provide complete code here, so you can try it out.
Please note that the following code does NOT contain proper checks and CAN crash if the file does not exist, there is no memory left, no rights, etc.
A check should be added for each open, close, read and write operation.
Moreover, I would allocate the buffer dynamically:
int* buffer = new int[10];
That is because I do not feel comfortable treating a plain array as a pointer. But whatever. Please also note that the correct type (uint32_t, int16_t, int8_t, int, short...) should be chosen according to the number range, to save space.
The following code will create a file and write correct data into it that you can then read back.
FILE* somefile;
somefile = fopen("/root/Desktop/CAH/scripts/cryptor C++/OUT/TOCRYPT/wee", "wb");
int buffer[10];
for(int i = 0; i < 10; i++)
buffer[i] = 15;
fwrite((void *)buffer, sizeof(int), 10, somefile);
// print contents of buffer
for(int i = 0; i < 10; i++)
cout << buffer[i] << endl;
fclose(somefile);
somefile = fopen("/root/Desktop/CAH/scripts/cryptor C++/OUT/TOCRYPT/wee", "rb");
fread((void *)buffer, sizeof(int), 10, somefile);
// print contents of buffer
for(int i = 0; i < 10; i++)
cout << buffer[i] << endl;
fclose(somefile);
I want to read double values from a binary file and store them in a vector. My values have the following form: 73.6634, 73.3295, 72.6764 and so on. I have this code that reads and stores data in memory. It works perfectly with char types, since the read function takes a char type input (istream& read (char* s, streamsize n)). When I convert the chars to double I obviously get integer values such as 74, 73, 73 and so on. Is there any function that lets me read double values directly, or any other way of doing this?
If I change char * memblock to double * memblock and memblock = new char[] to memblock = new double[], I get compile errors because, again, the read function can only take a char type variable...
Thanks, I will appreciate your help :)
// reading an entire binary file
#include <iostream>
#include <fstream>
using namespace std;
int main () {
streampos size;
char * memblock;
int i=0;
ifstream file ("example.bin", ios::in|ios::binary|ios::ate);
if (file.is_open())
{
size = file.tellg();
cout << "size=" << size << "\n";
memblock = new char [size];
file.seekg (0, ios::beg);
file.read (memblock, size);
file.close();
cout << "the entire file content is in memory \n";
for(i=0; i<=10; i++)
{
double value = memblock [i];
cout << "value ("<<i<<")=" << value << "\n";
}
delete[] memblock;
}
else cout << "Unable to open file";
return 0;
}
(sorry about the "Like I'm 5" tone, I have no idea how much you know or don't)
Intro Binary Data
As you probably know, your computer doesn't think about numbers the way you do.
To start, the computer thinks about all numbers in a "base 2" system. But it doesn't stop there. Your computer also associates a fixed size with all the numbers; it gives the numbers a fixed "width". This size is (almost always) measured in bytes, groups of 8 bits. This is (pretty close to) the equivalent of, when you do math on the numbers [1,15,30002], looking at all the numbers as
[
00000001
00000015
00030002
]
(doubles are a little weirder, but I'll get to that in a second).
Let's pretend, for demonstrative purposes, that each 2 characters above represent a single byte of data. This means that, in the computer, it thinks about the numbers like this:
[
00,00,00,01
00,00,00,15
00,03,00,02
]
File IO is all done at a "byte" (char) granularity: the machinery typically has no idea what it is reading. It is up to YOU to figure that out. When writing binary data to a file (from an array, at least) we just dump it all. So in the example above, if we write it all to the file, we get this:
[00,00,00,01,00,00,00,15,00,03,00,02]
But you'll have to reinterpret it, back into groups of 4 bytes.
Luckily, this is stupidly easy to do in C++:
size = file.tellg();
cout << "size=" << size << "\n";
memblock = new char [size];
file.seekg (0, ios::beg);
file.read (memblock, size);
file.close();
cout << "the entire file content is in memory \n";
double* double_values = (double*)memblock;//reinterpret as doubles
for(i=0; i<=10; i++)
{
double value = double_values[i];
cout << "value ("<<i<<")=" << value << "\n";
}
What this basically does is say, interpret those bytes (char) as double.
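One caveat worth adding: the pointer cast works in practice, but strictly speaking it can run afoul of alignment and aliasing rules. A memcpy per element is the fully portable way to express the same reinterpretation (a variant of the loop body above):
#include <string.h>
double value;
memcpy(&value, memblock + i * sizeof(double), sizeof(double)); // instead of double_values[i]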
edit: Endianness
Endianness is (again, LI5) the order in which the computer writes a number. You are used to twenty-five being written left to right (25), but it would be just as valid to write it from right to left (52). We have big-endian (Most Significant Byte at the lowest address) and little-endian (MSB at the highest address).
This was never standardized between architectures or virtual machines... but if they disagree you can get weird results.
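If you do hit a mismatch, the fix is to swap the bytes after reading. A sketch for one 32-bit value:
#include <stdint.h>
// Reverse the byte order of a 32-bit value, e.g. 0x11223344 -> 0x44332211.
uint32_t swap32(uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}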
A special case: doubles
Not really in line with your question, but I have to point out that doubles are a special case: while reading and writing looks the same, the underlying data isn't just a simple number. I like to think of doubles as the "scientific notation" of computers. The double standard uses a base and a power to get your number: in the same amount of space as a long, it stores (sign)(a^x). This gives a much larger dynamic range of representable values, BUT you lose a certain sense of "human readability" of the bytes, and you get the SAME number of distinct values, so you can lose precision (though it's relative precision, just like scientific notation, so you may not be able to distinguish a billion and one from a billion and two, but that 1 and 2 are TINY compared to the numbers).
writing data in C++
We might as well point out one quirk of C++: you've got to make sure that when you write the data, the stream doesn't try to reformat it as ASCII text. http://www.cplusplus.com/forum/general/21018/
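A minimal sketch of that (write_doubles is a made-up helper name; ios::binary is the part that suppresses text-mode translation):
#include <cstddef>
#include <fstream>
void write_doubles(const char* path, const double* data, std::size_t n) {
    std::ofstream out(path, std::ios::out | std::ios::binary);
    // write raw bytes; nothing gets reformatted to ASCII
    out.write(reinterpret_cast<const char*>(data),
              static_cast<std::streamsize>(n * sizeof(double)));
}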
The issue is this -- there is no guarantee that binary data written by another program (you said Matlab) can be read back by another program by merely casting, unless you know that the data written by that program has the same layout as data written by your own program.
It may not be sufficient to just cast -- you need to know the exact form of the data that is written. You need to know the binary format (for example IEEE), the number of bytes each value occupies, the endianness, etc., so that you can interpret the data correctly.
What you should do is this -- write a small program that writes the numbers you claim this file holds out to another file. Then look at the file you just wrote in a hex editor. Then take the file you're attempting to read, the one created by MatLab, and compare its contents side by side with the one you just wrote. Do you see a pattern? If not, then either you have to find one, or forget about it and get the two files to be the same.
I wrote a simple program to read a TXT file. The problem is the file contains some '\0' characters. Here's a sample:
And here's the solution I found to solve my problem:
FILE *pInput = fopen("Encoded.txt", "rb");
fseek(pInput, 0, SEEK_END);
size_t size = ftell(pInput);
fseek(pInput, 0, SEEK_SET);
char *buffer = new char[size];
for (int i = 0; i < size; i++)
buffer[i] = fgetc(pInput);
I would like to replace the following code :
for (int i = 0; i < size; i++)
buffer[i] = fgetc(pInput);
By just a simple function call. Is there a function which can do this job ?
I tried fread and fgets, but they stop reading at the first '\0' character.
Thanks a lot in advance for your help.
fread is fine for reading arbitrary binary; it returns the number of elements read, which is a value you should store and use in all dealings with your buffer. (Read some documentation on fread to find out how it works.)
(On the other hand, with fgets you won't be able to find out how many characters were read because a pointer to a [assumedly null-terminated] C-string is all you get out of it.)
You need to ensure that your handling of your resultant buffer is zero-safe. That means no strlen or the like, which are all designed to work on ASCII input (more or less).
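In other words (a sketch that plugs into your own buffer/size/pInput variables):
size_t bytesRead = fread(buffer, 1, size, pInput);
for (size_t i = 0; i < bytesRead; i++)
    printf("%02X ", (unsigned char)buffer[i]); // safe even across '\0' bytes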
Quoting cplusplus.com and removing the plumbing that you'll find in the link:
// Open the file with the pointer at the end
ifstream file("example.bin", ios::in|ios::binary|ios::ate);
// Get the file size
streampos size = file.tellg();
// Allocate a block
char* memblock = new char [size];
// We were at the end; go back to the beginning
file.seekg(0, ios::beg);
// Read the whole file
file.read(memblock, size);
Et voilà!
I'm trying to implement an I/O-intensive quicksort (C++ qsort) on a very large dataset. In the interests of speed, I'd like to read a chunk of data at a time into a buffer and then use qsort to sort it inside the buffer. (I am currently working with text files but would like to move to binary soon.) However, my data is composed of variable-length records, and qsort needs to be told the length of the record in order to sort. Is there any way to standardize this? The only thing I could think of is rather convoluted: my program currently reads from the buffer until it hits a linefeed character (10 in ASCII), transferring each character over to another array. When it finds a linefeed (the delimiter in the input file), it fills the number of spaces remaining in the buffer for that record (the record size is set to 30) with null characters. This way, I should end up with a buffer full of fixed-size records to give qsort.
I know there are several problems with my approach, one being that it's just clumsy, another that the record size might conceivably be larger than 30, though it is generally much less. Is there a better way of doing this?
As well, my current code doesn't even work. When I debug it, it seems to be transferring characters from one buffer to the other, but when I try to print out the buffer, it contains only the first record.
Here is my code:
FILE *fp;
unsigned char *buff;
unsigned char *realbuff;
FILE *inputFiles[NUM_INPUT_FILES];
buff = (unsigned char *) malloc(2048);
realbuff = (unsigned char *) malloc(NUM_RECORDS * RECORD_SIZE);
fp = fopen("postings0.txt", "r");
if(fp)
{
fread(buff, 1, 2048, fp);
/*for(int i=0; i <30; i++)
cout << buff[i] <<endl;*/
int y=0;
int recordcounter = 0;
//cout << buff;
for(int i=0;i <100; i++)
{
if(buff[i] != char(10))
{
realbuff[y] = buff[i];
y++;
recordcounter++;
}
else
{
if(recordcounter < RECORD_SIZE)
for(int j=recordcounter; j < RECORD_SIZE;j++)
{
realbuff[y] = char(0);
y++;
}
recordcounter = 0;
}
}
cout << realbuff <<endl;
cout << buff;
}
else
cout << "sorry";
Thank you very much,
bsg
The qsort function can only work on fixed-length records (as you say). In order to sort variable-length records, you need an array of pointers to them and then have qsort sort the array of pointers. This may be more efficient too, as pointers are much faster to move around than large chunks of data.
The same goes for std::sort, which would be recommended because it is type-safe. Just be sure to supply a comparison predicate (a less-than function) taking pointers as its arguments as the third parameter.
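A sketch of what that looks like with std::sort, assuming each record is a '\0'-terminated string sitting somewhere in your big buffer:
#include <algorithm>
#include <cstring>
#include <vector>
bool recordLess(const char* a, const char* b) {
    return std::strcmp(a, b) < 0; // compare the records, not the pointer values
}
void sortRecords(std::vector<const char*>& records) {
    std::sort(records.begin(), records.end(), recordLess);
}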
How about using C++ file streams for parsing your file?
Check out this example (website name is strange, no offense!!), which returns the records as an STL vector,
and then you can use the STL sort algorithm.
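Roughly like this (a sketch; "postings0.txt" is the file from your code, and each line is taken as one record):
#include <algorithm>
#include <fstream>
#include <string>
#include <vector>
int main() {
    std::ifstream in("postings0.txt");
    std::vector<std::string> records;
    std::string line;
    while (std::getline(in, line)) // one variable-length record per line
        records.push_back(line);
    std::sort(records.begin(), records.end());
    return 0;
}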
What is an efficient, proper way of reading in a data file with mixed characters? For example, I have a data file that contains a mixture of data loaded from other files, 32-bit integers, characters and strings. Currently, I am using an fstream object, but it gets stopped once it hits an int32 or the end of a string. If I add random data onto the end of the string in the data file, it seems to follow through with the rest of the file. This leads me to believe that the null termination added onto strings is messing it up. Here's an example of loading in the file:
int main()
{
fstream fin("C://mark.dat", ios::in|ios::binary|ios::ate);
char *mymemory = 0;
int size;
size = 0;
if (fin.is_open())
{
size = static_cast<int>(fin.tellg());
mymemory = new char[static_cast<int>(size+1)];
memset(mymemory, 0, static_cast<int>(size + 1));
fin.seekg(0, ios::beg);
fin.read(mymemory, size);
fin.close();
printf(mymemory);
std::string hithere;
hithere = cin.get();
}
}
Why might this code stop after reading in an integer or a string? How might one get around this? Is this the wrong approach when dealing with these types of files? Should I be using fstream at all?
Have you ever considered that the file reading is working perfectly and it is printf(mymemory) that is stopping at the first null?
Have a look with the debugger and see if I am right.
Also, if you want to print someone else's buffer, use puts(mymemory) or printf("%s", mymemory). Don't accept someone else's input for the format string, it could crash your program.
Try
for (int i = 0; i < size ; ++i)
{
// 0 - pad with 0s
// 2 - to a minimum width of two digits
// X - a hex value with capital A-F (0A, 1B, etc)
printf("%02X ", (unsigned char)mymemory[i]); // cast so negative chars don't print as FFFFFFxx
if ((i + 1) % 32 == 0)
printf("\n"); // new line every 32 bytes
}
as a way to dump your data file back out as hex.
Create a flat text file in C++, around 50 - 100 MB,
where the content 'Added first line' is inserted into the file 4 million times,
using old-style file I/O:
fopen the file for write.
fseek to the desired file size - 1.
fwrite a single byte
fclose the file
The fastest way to create a file of a certain size is to simply create a zero-length file using creat() or open() and then change the size using chsize(). This will simply allocate blocks on the disk for the file, the contents will be whatever happened to be in those blocks. It's very fast since no buffer writing needs to take place.
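A sketch of that idea with POSIX calls (ftruncate plays the role of chsize here; on Windows the counterpart is _chsize):
#include <fcntl.h>
#include <unistd.h>
// Create a zero-length file, then grow it to `size` bytes without writing data.
int preallocate(const char* path, off_t size) {
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return -1;
    int rc = ftruncate(fd, size);
    close(fd);
    return rc;
}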
Not sure I understand the question. Do you want to ensure that every character in the file is a printable ASCII character? If so, what about this? Fills the file with "abcdefghabc...."
#include <stdio.h>
int main ()
{
const int FILE_SIZE = 50000; //size in KB
const int BUFFER_SIZE = 1024;
char buffer [BUFFER_SIZE + 1];
int i;
for(i = 0; i < BUFFER_SIZE; i++)
buffer[i] = (char)(i%8 + 'a');
buffer[BUFFER_SIZE] = '\0';
FILE *pFile = fopen ("somefile.txt", "w");
for (i = 0; i < FILE_SIZE; i++)
fputs(buffer, pFile); // not fprintf(pFile, buffer): a stray '%' in a buffer would be misinterpreted
fclose(pFile);
return 0;
}
You haven't mentioned the OS but I'll assume creat/open/close/write are available.
For truly efficient writing and assuming, say, a 4k page and disk block size and a repeated string:
open the file.
allocate 4k * number of chars in your repeated string, ideally aligned to a page boundary.
print repeated string into the memory 4k times, filling the blocks precisely.
Use write() to write out the blocks to disk as many times as necessary. You may wish to write a partial piece for the last block to get the size to come out right.
close the file.
This bypasses the buffering of fopen() and friends, which is good and bad: their buffering means that they're nice and fast, but they are still not going to be as efficient as this, which has no overhead of working with the buffer.
This can easily be written in C++ or C, but does assume that you're going to use POSIX calls rather than iostream or stdio for efficiency's sake, so it's outside the core library specification.
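For concreteness, a sketch of the block-writing scheme above (POSIX calls; "big.txt" is a made-up name and error handling is trimmed), using the 17-byte string from the question so whole copies fill each block exactly:
#include <fcntl.h>
#include <string.h>
#include <unistd.h>
int main() {
    const char* s = "Added first line\n";   // 17 bytes
    static char block[4096 * 17];           // 17 pages hold exactly 4096 copies
    for (size_t i = 0; i < sizeof(block); i += 17)
        memcpy(block + i, s, 17);
    int fd = open("big.txt", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return 1;
    size_t remaining = 4000000UL * 17;      // 4 million copies, 68 MB total
    while (remaining > 0) {
        size_t chunk = remaining < sizeof(block) ? remaining : sizeof(block);
        if (write(fd, block, chunk) < 0)
            break;
        remaining -= chunk;
    }
    close(fd);
    return 0;
}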
I faced the same problem: creating a ~500MB file on Windows very fast.
The larger the buffer you pass to fwrite(), the faster you'll be.
int i;
FILE *fp;
fp = fopen(fname,"wb");
if (fp != NULL) {
// create one big block of data
uint8_t b[278528]; // some big chunk size
for( i = 0; i < sizeof(b); i++ ) // custom initialization if != 0x00
{
b[i] = 0xFF;
}
// write all blocks to file
for( i = 0; i < TOT_BLOCKS; i++ )
fwrite(&b, sizeof(b), 1, fp);
fclose (fp);
}
Now, at least on my Win7 with MinGW, this creates the file almost instantly.
Compared to that, fwrite() one byte at a time completes in 10 secs,
and passing a 4k buffer completes in 2 secs.
Fastest way to create a large file in C++?
OK. I assume the fastest way means the one that takes the smallest run time.
Create a flat text file in C++, around 50 - 100 MB, where the content 'Added first line' is inserted into the file 4 million times.
preallocate the file using old-style file I/O:
fopen the file for write.
fseek to the desired file size - 1.
fwrite a single byte
fclose the file
create a string containing "Added first line\n" a thousand times.
find its length.
preallocate the file using old-style file I/O:
fopen the file for write.
fseek to the string length * 4000
fwrite a single byte
fclose the file
open the file for read/write
loop 4000 times,
writing the string to the file.
close the file.
That's my best guess.
I'm sure there are a lot of ways to do it.
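Put together, the recipe above might look roughly like this (a sketch; "flat.txt" is a made-up name and error checks are minimal):
#include <stdio.h>
#include <string.h>
int main() {
    // build "Added first line\n" a thousand times (17,000 bytes)
    static char chunk[17 * 1000 + 1];
    chunk[0] = '\0';
    for (int i = 0; i < 1000; i++)
        strcat(chunk, "Added first line\n");
    long len = (long)strlen(chunk);
    // preallocate: seek to total size - 1 and write a single byte
    FILE* fp = fopen("flat.txt", "wb");
    if (!fp) return 1;
    fseek(fp, len * 4000 - 1, SEEK_SET);
    fputc('\0', fp);
    fclose(fp);
    // overwrite with the real content: 4000 chunks = 4 million lines
    fp = fopen("flat.txt", "r+b");
    if (!fp) return 1;
    for (int i = 0; i < 4000; i++)
        fwrite(chunk, 1, (size_t)len, fp);
    fclose(fp);
    return 0;
}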