fread equivalent with fstream - c++

In C one can write (disregarding any checks on purpose)
const int bytes = 10;
FILE* fp = fopen("file.bin","rb");
char* buffer = malloc(bytes);
int n = fread( buffer, sizeof(char), bytes, fp );
...
and n will contain the actual number of bytes read, which could be smaller than 10 (bytes).
How do you do the equivalent in C++?
I have this, but it seems suboptimal (it feels verbose and does extra I/O). Is there a better way?
const int bytes = 10;
ifstream pf("file.bin",ios::binary);
vector<char> v(bytes);
pf.read(&v[0],bytes);
if ( pf.fail() )
{
pf.clear();
pf.seekg(0,SEEK_END);
n = static_cast<int>(pf.tellg());
}
else
{
n = bytes;
}
...

Call the gcount member function directly after your call to read.
pf.read(&v[0],bytes);
int n = pf.gcount();

According to http://www.cplusplus.com/reference/iostream/istream/read/:
istream& read(char* s, streamsize n);
Read block of data
Reads a block of data of n characters and stores it in the array pointed by s.
If the End-of-File is reached before n characters have been read, the array will contain all the elements read until it, and the failbit and eofbit will be set (which can be checked with members fail and eof respectively).
Notice that this is an unformatted input function and what is extracted is not stored as a c-string format, therefore no ending null-character is appended at the end of the character sequence.
Calling member gcount after this function the total number of characters read can be obtained.
So pf.gcount() tells you how many bytes were read.

pf.read(&v[0],bytes);
streamsize bytesread = pf.gcount();

Related

Weird seek behaviour in C and C++ [duplicate]

I did a sample project to read a file into a buffer.
When I use the tellg() function, it gives me a larger value than the
read function actually reads from the file. I think that there is a bug.
Here is my code:
EDIT:
void read_file (const char* name, int *size , char*& buffer)
{
ifstream file;
file.open(name,ios::in|ios::binary);
*size = 0;
if (file.is_open())
{
// get length of file
file.seekg(0,std::ios_base::end);
int length = *size = file.tellg();
file.seekg(0,std::ios_base::beg);
// allocate buffer in size of file
buffer = new char[length];
// read
file.read(buffer,length);
cout << file.gcount() << endl;
}
file.close();
}
main:
void main()
{
int size = 0;
char* buffer = NULL;
read_file("File.txt",&size,buffer);
for (int i = 0; i < size; i++)
cout << buffer[i];
cout << endl;
}
tellg does not report the size of the file, nor the offset
from the beginning in bytes. It reports a token value which can
later be used to seek to the same place, and nothing more.
(It's not even guaranteed that you can convert the type to an
integral type.)
At least according to the language specification: in practice,
on Unix systems, the value returned will be the offset in bytes
from the beginning of the file, and under Windows, it will be
the offset from the beginning of the file for files opened in
binary mode. For Windows (and most non-Unix systems), in text
mode, there is no direct and immediate mapping between what
tellg returns and the number of bytes you must read to get to
that position. Under Windows, all you can really count on is
that the value will be no less than the number of bytes you have
to read (and in most real cases, won't be too much greater,
although it can be up to two times more).
If it is important to know exactly how many bytes you can read,
the only way of reliably doing so is by reading. You should be
able to do this with something like:
#include <limits>
file.ignore( std::numeric_limits<std::streamsize>::max() );
std::streamsize length = file.gcount();
file.clear(); // Since ignore will have set eof.
file.seekg( 0, std::ios_base::beg );
Finally, two other remarks concerning your code:
First, the line:
*buffer = new char[length];
shouldn't compile: you have declared buffer to be a char*,
so *buffer has type char, and is not a pointer. Given what
you seem to be doing, you probably want to declare buffer as
a char**. But a much better solution would be to declare it
as a std::vector<char>& or a std::string&. (That way, you
don't have to return the size as well, and you won't leak memory
if there is an exception.)
Second, the loop condition at the end is wrong. If you really
want to read one character at a time,
while ( file.get( buffer[i] ) ) {
++ i;
}
should do the trick. A better solution would probably be to
read blocks of data:
while ( file.read( buffer + i, N ) || file.gcount() != 0 ) {
i += file.gcount();
}
or even:
file.read( buffer, size );
size = file.gcount();
EDIT: I just noticed a third error: if you fail to open the
file, you don't tell the caller. At the very least, you should
set the size to 0 (but some sort of more precise error
handling is probably better).
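The remarks above (use a std::vector<char>&, read in blocks, trust gcount rather than tellg, and tell the caller when the open fails) might be combined as in the following sketch. The signature and the 4096-byte block size are assumptions for illustration, not the original poster's code.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical replacement for read_file: fills `out` with the file contents.
// Returns false if the file could not be opened or an I/O error occurred.
bool read_file(const std::string& name, std::vector<char>& out)
{
    out.clear();
    std::ifstream file(name, std::ios::in | std::ios::binary);
    if (!file.is_open())
        return false;

    const std::size_t block = 4096;  // arbitrary block size (assumption)
    std::size_t used = 0;
    for (;;) {
        out.resize(used + block);
        file.read(out.data() + used, static_cast<std::streamsize>(block));
        used += static_cast<std::size_t>(file.gcount());
        if (!file)                   // short read: end-of-file or error
            break;
    }
    out.resize(used);                // keep only the bytes actually read
    return !file.bad();
}
```

Because the vector owns the memory and knows its own size, no separate size parameter is needed and nothing leaks if an exception is thrown.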
In C++17 there are std::filesystem file_size methods and functions, so that can streamline the whole task.
std::filesystem::file_size - cppreference.com
std::filesystem::directory_entry::file_size - cppreference.com
With those functions there is no need to open the file, and cached data may be used instead (especially with the std::filesystem::directory_entry::file_size method).
They also require only read permission on the directory, not read permission on the file itself (which tellg() needs, since the file has to be opened).
void read_file (int *size, char* name,char* buffer)
*buffer = new char[length];
These lines do look like a bug: you create a char array and store the result in buffer[0], which is a single char. Then you read the file into buffer, which is still uninitialized.
You need to pass buffer by pointer:
void read_file (int *size, char* name,char** buffer)
*buffer = new char[length];
Or by reference, which is the c++ way and is less error prone:
void read_file (int *size, char* name,char*& buffer)
buffer = new char[length];
...
fseek(fptr, 0L, SEEK_END);
filesz = ftell(fptr);
will give the file size if the file was opened through fopen.
Using ifstream,
in.seekg(0,ifstream::end);
filesz = in.tellg();
would do something similar.

fstream::read() read empty if input size too big

I have tried to read a file by using istream& read (char* s, streamsize n). I have read the description at: http://www.cplusplus.com/reference/istream/istream/read/ saying
If the input sequence runs out of characters to extract (i.e., the end-of-file is reached) before n characters have been successfully read, the array pointed to by s contains all the characters read until that point, and both the eofbit and failbit flags are set for the stream.
Because of that, I passed a very large number for n, since I trust the caller to allocate a buffer large enough for the read. But I always get 0 bytes read. I tried the following code to read a 90-byte txt file:
std::wstring name(L"C:\\Users\\dle\\Documents\\01_Project\\01_VirtualMachine\\99_SharedFolder\\lala.txt");
std::ifstream ifs;
ifs.open(name, ifstream::binary | ifstream::in);
if (ifs)
{
// get length of file:
ifs.seekg(0, ifs.end);
int length = ifs.tellg();
ifs.seekg(0, ifs.beg);
char *buffer = new char[length];
ifs.read(buffer, UINT32_MAX);
int success = ifs.gcount();
cout << "success: " << success << endl;
cout << "size: " << size;
ifs.close();
}
I even tried with a smaller number, e.g. 500,000, and it still failed. I have realized that "n" and the size of the file are related somehow: "n" cannot be much larger than the file size, or else nothing is read....
I know we could fix that easily by passing the correct size to read(), but I wonder why it happens. It should read until EOF and then stop, right? Could anyone explain why, please?
EDIT: I simply want to read to EOF using istream& read without caring about the file size. According to the definition of istream& read(char* s, streamsize n), it should work.
ifs.read(buffer, UINT32_MAX);
The second parameter to fstream::read is std::streamsize, which is defined as (emphasis mine)...
...a signed integral type...
I therefore guess (as I don't have a Windows environment to test on at this point) that you're working on a machine where std::streamsize is 32 bits wide, and your UINT32_MAX ends up as -1 (and that @john is testing on a machine where sizeof( std::streamsize ) > 4, so that his UINT32_MAX doesn't wrap into the negative).
Try again with std::numeric_limits< std::streamsize >::max()... or even better yet, use length because, well, you have the file size right there and don't have to rely on the EOF behavior of fstream::read to save you.
I am not sure whether C++ changed the definition of streams from what the C standard says, but note that C's definition on binary streams states that they...
...may, however, have an implementation-defined number of null characters appended to the end of the stream.
So your, or the user's, assumption that a buffer big enough to hold the data written earlier is big enough to hold the data read till EOF might actually fail.

searching an unsigned char array for characters

I have a binary data file that I am trying to read. The values in the file are 8-bit unsigned integers, with "record" delimiters that are ASCII text ($MSG, $GRP, for example). I read the data as one big chunk, as follows:
unsigned char *inBuff = (unsigned char*)malloc(file_size*sizeof(unsigned char));
result = fread(inBuff, sizeof(unsigned char), file_size, pFile);
I need to search this array to find records that start with $GRP (so I can then read the data that follows), can someone suggest a good way to do this? I have tried several things, and none of them have worked. For example, my most recent attempt was:
std::stringstream str1;
str1 << inBuff;
std::string strTxt = str1.str();
However, when I check the length on this, it is only 5. I looked at the file in Notepad, and noticed that the sixth character is a NULL. So it seems like it is cutting off there because of the NULL. Any ideas?
fread returns the number of elements it actually read (a short count, possibly zero, on error or end of file), so the value in result tells you how many bytes are available to search.
It is unreasonable to expect to be able to do a string search on binary data, as there may be NUL characters in the binary data which will cause the length function to terminate early.
One possible way to search for the data is to use memcmp on the buffer, with your search key and the length of the search key.
(As per my comment)
C str functions assume zero-terminated strings. Any C string function will stop at the very first binary 0. Use memchr to locate the $ and then use strncmp or memcmp. In particular, do not assume the byte immediately after the 4-byte identifier is a binary 0.
In code (C, not tested):
#include <string.h>

/* recordId should point to a simple string such as "$GRP" */
unsigned char *find_record (unsigned char *data, size_t max_length, const char *recordId)
{
    unsigned char *ptr = data;
    size_t id_length = strlen(recordId);
    size_t remaining_length = max_length;
    if (id_length == 0 || id_length > max_length)
        return NULL;
    /* stop once fewer than id_length bytes remain, so memcmp cannot overrun */
    while (remaining_length >= id_length)
    {
        /* fast scan for the first character only, limited to positions
           where the whole identifier still fits */
        ptr = memchr (ptr, recordId[0], remaining_length - id_length + 1);
        if (!ptr)
            return NULL;
        /* first character matches, test entire string */
        if (!memcmp (ptr, recordId, id_length))
            return ptr;
        /* no match; test onwards from the next possible position */
        ptr++;
        /* ptr never passes data+max_length here, so this subtraction
           cannot go negative (size_t does not like to be negative) */
        remaining_length = max_length - (size_t)(ptr - data);
    }
    return NULL;
}
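For C++ code, the same search can be sketched with std::search, which works on byte ranges and does not stop at NUL bytes. The function name find_marker is hypothetical, and the marker is assumed to be a short null-terminated string such as "$GRP".

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>

// Hypothetical helper: returns a pointer to the first occurrence of the
// marker (e.g. "$GRP") inside data[0..length), or nullptr if it is absent.
const unsigned char* find_marker(const unsigned char* data, std::size_t length,
                                 const char* marker)
{
    const std::size_t marker_length = std::strlen(marker);
    if (marker_length == 0)
        return nullptr;
    const unsigned char* end = data + length;
    // Compare as unsigned char so bytes above 0x7F match consistently.
    const unsigned char* hit = std::search(
        data, end, marker, marker + marker_length,
        [](unsigned char byte, char m) {
            return byte == static_cast<unsigned char>(m);
        });
    return hit == end ? nullptr : hit;
}
```

Here length would be the value returned by fread, and the record data starts marker_length bytes after the returned pointer.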

C++ Char pointer to char array

None of the posted answers I've read work, so I'm asking again.
I'm trying to copy the string data pointed to by a char pointer into a char array.
I have a function that reads from an ifstream into a char array
char* FileReader::getNextBytes(int numberOfBytes) {
char *buf = new char[numberOfBytes];
file.read(buf, numberOfBytes);
return buf;
}
I then have a struct :
struct Packet {
char data[MAX_DATA_SIZE]; // can hold file name or data
} packet;
I want to copy what is returned from getNextBytes(MAX_DATA_SIZE) into packet.data;
EDIT: Let me show you what I'm getting with all the answers gotten below (memcpy, strcpy, passing as parameter). I'm thinking the error comes from somewhere else. I'm reading a file as binary (it's a png). I'll loop while the fstream is good() and read from the fstream into the buf (which might be the data array). I want to see the length of what I've read :
cout << strlen(packet.data) << endl;
This returns different sizes every time:
8
529
60
46
358
66
156
After that, apparently there are no bytes left to read although the file is 13K + bytes long.
This can be done using the standard library function strcpy, which is declared in <string.h> / <cstring>:
strcpy(packet.data, buf);
This requires that file.read produces a proper char sequence ending with '\0'. You might also want to ensure numberOfBytes is big enough to accommodate the whole string. Otherwise you could get a segmentation fault.
// if buf is not properly null-terminated, put a null char in its last element
buf[numberOfBytes - 1] = '\0';
// copy the string from buf to the struct
strcpy(packet.data, buf);
// or, bounded by the size of the destination
strncpy(packet.data, buf, MAX_DATA_SIZE);
Edit:
Whether or not this is being handled as a string is a very important distinction. In your question, you referred to it as a "string", which is what got us all confused.
Without any library assistance:
char* result = reader.getNextBytes(MAX_DATA_SIZE);
for (int i = 0; i < MAX_DATA_SIZE; ++i) {
packet.data[i] = result[i];
}
delete [] result;
Using #include <cstring>:
memcpy(packet.data, result, MAX_DATA_SIZE);
Or for extra credit, rewrite getNextBytes so it has an output parameter:
char* FileReader::getNextBytes(int numberOfBytes, char* buf) {
file.read(buf, numberOfBytes);
return buf;
}
Then it's just:
reader.getNextBytes(MAX_DATA_SIZE, packet.data);
Edit 2:
To get the length of a file:
file.seekg (0, ios::end);
int length = file.tellg();
file.seekg (0, ios::beg);
And with that in hand...
char* buffer = new char[length];
file.read(buffer, length);
Now you have the entire file in buffer.
strlen is not a valid way to determine the amount of binary data. strlen just reads until it finds '\0', nothing more. If you want to read a chunk of binary data, just use a std::vector, resize it to the amount of bytes you read from the file, and return it as value. Problem solved.

How to set maximum read length for a stream in C++?

I'm reading data from a stream into a char array of a given length, and I'd like to limit the maximum width of a read so that the result fits in that char array.
The reason I use a char array is that part of my specification is that the length of any individual token cannot exceed a certain value, so I'm saving myself some constructor calls.
I thought width() did what I wanted, but I was apparently wrong...
EDIT: I'm using the stream extraction operators to perform the extraction, since these are flat text files with values separated by whitespace.
If you're processing text, you're looking for the get function: http://cppreference.com/wiki/io/get
const int size = 200;
char myArray[size] = {};
cin.get(myArray, size);
Note: only size - 1 characters are read, which leaves a NULL terminator in myArray.
If it's raw data, you'd probably prefer read: http://cppreference.com/wiki/io/read
const int size = 200;
char myArray[size] = {};
cin.read(myArray, size);
size bytes are read.
char x[4];
cin.width(4);
cin >> x;
cout << x;
Input: "abcdef"
Output: "abc"
(x[3] is null terminating char)
Width works fine in this case.
Note: Empirical testing indicates that the cin.width call only lasts for one stream operation. It may be more convenient to use cin >> setw(4) >> x; instead, though this requires iomanip.
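Because the width applies to one extraction only, a token loop has to set it again on every pass. The following sketch assumes a limit of 8 (terminator included) and a hypothetical helper name, read_tokens; it uses std::setw with the extraction operator as suggested above.

```cpp
#include <iomanip>
#include <istream>
#include <string>
#include <vector>

const int MAX_TOKEN = 8;  // assumed limit, null terminator included

// Hypothetical helper: extracts whitespace-separated tokens, each capped at
// MAX_TOKEN - 1 characters. A longer token is split across several reads.
std::vector<std::string> read_tokens(std::istream& in)
{
    std::vector<std::string> tokens;
    char token[MAX_TOKEN];
    // setw is repeated on every pass: the width resets after each extraction.
    while (in >> std::setw(MAX_TOKEN) >> token)
        tokens.push_back(token);
    return tokens;
}
```

Note that an over-long token is not rejected, only split, so code that must enforce the specification would still need to detect that case, for example by checking whether the next character in the stream is whitespace.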