How should I think about the parameters of fread()? - c++

It's easy to find popular conventions for C-style I/O. What's more difficult is finding explanations of why they are the way they are. It's common to see a read with statements like:
fread(buffer, sizeof(buffer), 1, ptr);
How should a programmer think about using the parameters size and n of fread()?
For example, if my input file is 100 bytes, should I opt for a larger size with fewer n or read more objects of a smaller size?
If the size-to-be-read and n exceed the byte-size of an input file, what happens? Are the excess bytes that were read composed, colloquially speaking, of "junk values"?

size_t fread(void * restrict ptr, size_t size, size_t n, FILE * restrict stream);
How should a programmer think about using the parameters size and n of fread()?
When reading into an array:
size is the size of the destination pointer's dereferenced type (the element size).
n is the maximum number of elements to read.
some_type destination[max_number_of_elements];
size_t num_read = fread(destination, sizeof *destination, max_number_of_elements, inf);
printf("Number of elements read %zu\n", num_read);
if (num_read == 0) {
    if (feof(inf)) puts("Nothing read as at end-of-file");
    else if (ferror(inf)) puts("Input error occurred");
    else {
        // Since sizeof *destination and max_number_of_elements cannot be 0 here,
        // something strange has occurred (UB somewhere prior?)
    }
}
For example, if my input file is 100 bytes, should I opt for a larger size with fewer n or read more objects of a smaller size?
In that case, the size of each element is 1 and the maximum count is 100.
#define MAX_FILE_SIZE 100
uint8_t destination[MAX_FILE_SIZE];
size_t num_read = fread(destination, sizeof *destination, MAX_FILE_SIZE, inf);
If the size-to-be-read and n exceed the byte-size of an input file, what happens?
The destination is not completely filled. Use the return value to determine how many whole elements were actually read.
Are the excess bytes that were read composed, colloquially speaking, of "junk values"?
No. Their values from before the fread() call remain the same (as long as the return value was not 0 and ferror() was not set). If the destination was never initialized/assigned, then yes, those bytes may be thought of as junk.
Having separate size and n parameters allows fread() to function as desired even when the product size * n would overflow size_t math. With current flat memory models, this is rarely needed.

First, the while (!feof(ptr)) is wrong and a really bad anti-pattern. There are situations where it can work, but it's almost always gratuitously more complicated than correct idiomatic usage. The return value of fread or other stdio read functions already tells you if it didn't succeed, and you usually need to be able to handle that immediately rather than waiting for the next loop iteration to start. If whatever resource you're learning from is teaching this while (!feof(ptr)) thing, you should probably stop trusting it as a source for learning C.
Now, on to your specific question about the size and n arguments: having them separate is completely gratuitous and not useful. Just pass the desired length to read for one of them, and 1 for the other. If you want to be able to determine how many bytes were already read if you hit end-of-file or an error, you need to pass 1 for size and the requested number of bytes as n. Otherwise, if any read shorter than expected is an error, it sometimes makes sense to switch them; then the only possible return values are 1 and 0 (success and error, respectively).
To understand why it doesn't matter how you split these two arguments: all the stdio read functions, including fread, are specified as if they happened via repeated calls to fgetc. It does not matter whether there are size*n such calls or n*size such calls, because multiplication of numbers commutes.
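As a concrete illustration of the two conventions above, here is a minimal sketch (the file name and buffer size are arbitrary assumptions) showing how the return value differs depending on which argument carries the byte count:

#include <cstdio>

int main() {
    unsigned char buffer[100];
    std::FILE *f = std::fopen("input.bin", "rb");   // hypothetical input file
    if (!f) return 1;

    // Convention 1: size = 1, n = byte count.
    // The return value is the number of bytes actually read,
    // so a short read still tells you how much data you got.
    std::size_t bytes_read = std::fread(buffer, 1, sizeof buffer, f);
    std::printf("got %zu bytes\n", bytes_read);

    std::rewind(f);

    // Convention 2: size = byte count, n = 1.
    // The return value is either 1 (all bytes read) or 0 (anything less),
    // which is convenient when any short read is simply an error.
    std::size_t blocks_read = std::fread(buffer, sizeof buffer, 1, f);
    std::printf("complete block read: %s\n", blocks_read == 1 ? "yes" : "no");

    std::fclose(f);
    return 0;
}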

Related

How to write vector<bool> to binary file?

I need to write a boolean vector to a binary file. I searched stackoverflow for similar questions and didn't find anything that worked. I tried to write it myself, here's what I came up with:
vector<bool> bits = { 1, 0, 1, 1 };
ofstream file("file.bin", ios::out | ios::binary);
uint32_t size = (bits.size() / 8) + ((bits.size() % 8 == 0) ? 0 : 1);
file.write(reinterpret_cast<const char*>(&bits[0]), size);
I was hoping it would write 1011**** (where * is a random 1/0). But I got an error:
error C2102: '&' requires l-value
Theoretically, I could do some kind of loop and add 8 bools to a char one by one, then write the char to a file, and then repeat many times. But I have a rather large vector, so it would be very slow and inefficient. Is there any other way to write the entire vector at once? Bitset is not suitable since I need to add bits to the vector.
vector<bool> may or may not be packed and you can't access the internal data directly, at least not portably.
So you have to iterate over the bits one by one and combine them into bytes (yes, bytes: C++ has std::byte now; don't use char, use uint8_t for older C++).
As you say, writing out each byte individually is slow. But why would you write out each byte? You know how big the vector is, so create a suitable buffer, fill it, and write it out in one go, as sketched below. At a minimum, write out chunks of bytes at once.
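A minimal sketch of that packing approach (the file name, the LSB-first bit order within a byte, and the use of std::ofstream are assumptions of mine, not something prescribed above):

#include <cstdint>
#include <fstream>
#include <vector>

int main() {
    std::vector<bool> bits = { 1, 0, 1, 1 };

    // Pack 8 bits per byte; unused trailing bits stay 0.
    std::vector<std::uint8_t> packed((bits.size() + 7) / 8, 0);
    for (std::size_t i = 0; i < bits.size(); ++i) {
        if (bits[i])
            packed[i / 8] |= static_cast<std::uint8_t>(1u << (i % 8));  // LSB-first within each byte
    }

    // Write the whole packed buffer in one call.
    std::ofstream file("file.bin", std::ios::out | std::ios::binary);
    file.write(reinterpret_cast<const char*>(packed.data()),
               static_cast<std::streamsize>(packed.size()));
    return 0;
}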
Since vector<bool> doesn't have a data() function, getting the address of its internal storage requires some ugly hacks (it happens to work with libstdc++, but I strongly discourage it):
file.write(
reinterpret_cast<const char*>(
*reinterpret_cast<std::uint64_t* const*>(&bits)),
size);

Aligning buffer to an N-byte boundary but not a 2N-byte one?

I would like to allocate some char buffers0, to be passed to an external non-C++ function, that have a specific alignment requirement.
The requirement is that the buffer be aligned to an N-byte1 boundary, but not to a 2N boundary. For example, if N is 64, then the pointer p to this buffer should satisfy ((uintptr_t)p) % 64 == 0 and ((uintptr_t)p) % 128 != 0 - at least on platforms where pointers have the usual interpretation as a plain address when cast to uintptr_t.
Is there a reasonable way to do this with the standard facilities of C++11?
If not, is there a reasonable way to do this outside the standard facilities2 that works in practice for modern compilers and platforms?
The buffer will be passed to an outside routine (adhering to the C ABI but written in asm). The required alignment will usually be greater than 16, but less than 8192.
Over-allocation or any other minor wasted-resource issues are totally fine. I'm more interested in correctness and portability than wasting a few bytes or milliseconds.
Something that works on both the heap and stack is ideal, but anything that works on either is still pretty good (with a preference towards heap allocation).
0 This could be with operator new[] or malloc or perhaps some other method that is alignment-aware: whatever makes sense.
1 As usual, N is a power of two.
2 Yes, I understand an answer of this type causes language-lawyers to become apoplectic, so if that's you just ignore this part.
Logically, to satisfy "aligned to N, but not 2N", we align to 2N then add N to the pointer. Note that this will over-allocate N bytes.
So, assuming we want to allocate B bytes, if you just want stack space, alignas would work, perhaps.
alignas(N*2) char buffer[B+N];
char *p = buffer + N;
If you want a reusable type with that alignment, std::aligned_storage might do (note, though, that before C++17 dynamically allocating an over-aligned type with new is not guaranteed to honour its alignment):
typedef std::aligned_storage<B+N,N*2>::type ALIGNED_CHAR;
ALIGNED_CHAR buffer;
char *p = reinterpret_cast<char *>(&buffer) + N;
I've not tested either out, but the documentation suggests it should be OK.
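If only standard C++11 facilities are available and the buffer must live on the heap, another option along the same lines (a sketch of mine, not from the answer above) is to over-allocate a plain char array and let std::align from <memory> locate the 2N boundary before adding N:

#include <cstddef>
#include <memory>

// Returns a pointer aligned to N but deliberately not aligned to 2N.
// The caller keeps 'storage' alive for as long as the buffer is in use.
// Assumes N is a power of two and B is the number of usable bytes needed.
char* alloc_odd_aligned(std::size_t B, std::size_t N, std::unique_ptr<char[]>& storage) {
    std::size_t space = B + 3 * N;            // slack for the 2N alignment plus the +N offset
    storage.reset(new char[space]);
    void* p = storage.get();
    if (!std::align(2 * N, B + N, p, space))  // find a 2N-aligned address with B+N bytes left
        return nullptr;                       // cannot happen with this much slack
    return static_cast<char*>(p) + N;         // aligned to N, guaranteed misaligned to 2N
}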
You can use _aligned_malloc(nbytes,alignment) (in MSVC) or _mm_malloc(nbytes,alignment) (on other compilers) to allocate (on the heap) nbytes of memory aligned to alignment bytes, which must be an integer power of two.
Then you can use the trick from Ken's answer to avoid alignment to 2N:
void*ptr_alloc = _mm_malloc(nbytes+N,2*N);
void*ptr = static_cast<void*>(static_cast<char*>(ptr_alloc) + N);
/* do your number crunching */
_mm_free(ptr_alloc);
We must be sure to keep the pointer returned by _mm_malloc() for later de-allocation, which must be done via _mm_free().

Why bad_alloc() exception thrown in case of size_t

I am working on the below piece of code and when I'm executing this code, I'm getting a std::bad_alloc exception:
int _tmain(int argc, _TCHAR* argv[])
{
    FILE * pFile;
    size_t state;
    pFile = fopen("C:\\shared.tmp", "rb");
    if (pFile != NULL)
    {
        size_t rt = fread(&state, sizeof(int), 1, pFile);
        char *string = NULL;
        string = new char[state + 1];
        fclose(pFile);
    }
    return 0;
}
This below line causing exception to be thrown:
string = new char[state + 1];
Why is this happening and how can I fix it?
You're passing the address of an uninitialized 64-bit (8 bytes, on modern 64-bit systems) variable, state, and telling fread to read sizeof(int) (32 bits, 4 bytes on those same systems) bytes from the file into this variable.
This will overwrite 4 bytes of the variable with the value read, but leave the other 4 uninitialized. Which 4 bytes it overwrites depends on the architecture (the least significant on Intel CPUs, the most significant on big-endian-configured ARMs), but the result will most likely be garbage either way, because 4 bytes were left uninitialized and could contain anything.
In your case, most likely they are the most significant bytes, and contain at least one non-zero bit, meaning that you then try to allocate far beyond 4GB of memory, which you don't have.
The solution is to make state a std::uint32_t (since you apparently expect the file to contain 4 bytes representing an unsigned integer; don't forget to include <cstdint>) and to pass sizeof(std::uint32_t). In general, make sure that for every fread and similar call where you pass in a pointer and a size, the object the pointer points to actually has exactly the size you pass along. Passing a size_t* together with sizeof(int) does not fulfill this requirement on 64-bit systems, and since the sizes of C++'s basic types are not guaranteed, you generally don't want to use them for binary I/O at all.
There are various things which you could improve in your C++ code, but there are a number of reasons why you end up with this behaviour:
First, the variable state is of type size_t, but your code attempts to initialize its value using fread(&state, sizeof(int), 1, pFile);. Now, if sizeof(state) != sizeof(int) then you have undefined behaviour. If sizeof(state) < sizeof(int), then the fread statement usually overwrites some arbitrary memory after the storage for variable state. This leads to undefined behaviour (e.g. state might have some random large value, and allocation fails).
Second, if sizeof(state) > sizeof(int), then state is only partially initialized and its actual value depends on both the initialized (by fread) and the uninitialized bits. So its value can be a large number and allocation may fail.
Third, if sizeof(state) == sizeof(int), then it just might be that the value read is too large, and allocation simply fails because you run out of memory.
Fourth, the value you read from the file might have a different encoding or endianness. For example, if the value was written to the file in big-endian format but is read with fread on a little-endian CPU, the bytes end up in the wrong order. You might need to swap the bytes before using the value read.
I suggest you instead use some fixed-width integer type from <cstdint> (or <stdint.h> for pre-C++11), such as std::uint64_t for variable state, read the value using fread(&state, sizeof(state), 1, pFile);, and then byte-swap state if the endianness of your CPU doesn't match the endianness of the value stored in the file.
You should also decide what the maximum number of characters you are willing to allocate is, and error out if state is greater than that. With an uninitialized or garbage value, it almost certainly is.
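Putting those suggestions together, a corrected read might look something like the sketch below (the 4-byte size field and the 1 MiB sanity limit are assumptions for illustration; add a byte swap if the file's endianness differs from the CPU's):

#include <cstdint>
#include <cstdio>

int main() {
    std::FILE* pFile = std::fopen("C:\\shared.tmp", "rb");
    if (pFile == nullptr)
        return 1;

    std::uint32_t state = 0;                      // fixed-width type matching the assumed file format
    std::size_t rt = std::fread(&state, sizeof state, 1, pFile);
    std::fclose(pFile);

    if (rt != 1)                                  // short read or error: do not trust state
        return 1;

    const std::uint32_t kMaxLen = 1024u * 1024u;  // assumed upper bound on a sane length
    if (state > kMaxLen)                          // reject absurd values instead of hitting bad_alloc
        return 1;

    char* string = new char[state + 1];
    // ... use the buffer ...
    delete[] string;
    return 0;
}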

C signed-integer-based attacks

I was reading this question and one of the comments mentioned C signed-integer-based attacks.
I know what an int overflow is, but I don't understand how it can be used to attack a program.
What exactly is meant by attacking a program? And if you know the program has this bug, how can you use it?
Is this limited to signed int only? If yes, then why?
And what is the case in C++?
My apologies if the question is trivial.
For example, there was a bug in the getpeername function from FreeBSD.
To illustrate it, let's take a function copy_from_kernel(void *user_dest, int maxlen) that copies maxlen bytes from a restricted memory area.
As you might already know, the memcpy function is declared like that:
void * memcpy ( void * destination, const void * source, size_t num );
Where size_t is an unsigned type. If in our function, we do something like:
void copy_from_kernel(void *user_dest, int maxlen) {
    int len = KSIZE < maxlen ? KSIZE : maxlen;
    memcpy(user_dest, kbuf, len);
}
Here KSIZE is the maximum number of bytes we want to allow the user to copy. If the caller passes a positive value for maxlen, the function works as expected. But if the caller passes a negative value for maxlen, the comparison passes and memcpy's third parameter ends up being that negative value. Since it is converted to unsigned, the number of bytes to copy becomes huge, and the caller may get hold of restricted data.
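One common way to close the hole (a sketch of my own, not part of the original answer; KSIZE and kbuf are assumed stand-ins for the kernel-side buffer) is to reject negative lengths before any conversion to an unsigned type happens:

#include <cstddef>
#include <cstring>

constexpr int KSIZE = 1024;   // assumed size of the kernel-side buffer
static char kbuf[KSIZE];      // assumed kernel-side data

int copy_from_kernel_safe(void *user_dest, int maxlen) {
    if (maxlen < 0)           // reject negative lengths before the size_t conversion in memcpy
        return -1;
    std::size_t len = (maxlen < KSIZE) ? static_cast<std::size_t>(maxlen)
                                       : static_cast<std::size_t>(KSIZE);
    std::memcpy(user_dest, kbuf, len);
    return static_cast<int>(len);
}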
A very simple case could be an overflow on id in the following example. Imagine that id is the id of a user, that you can create a ton of fake users (or do something similar) to make the counter overflow, and that 0 is the id of the administrator.
if (id > 0) {
    // you don't have many privileges
} else {
    // you are basically root and can do evil stuff.
}
Most "anti-overflow" code is a combination of a range check followed by using the index to access memory. So if you can use int wrap-around to pass the range check (e.g. "if (index < max)" with index being negative) then you can access memory outside the intended target (e.g. "array[index] = value"). This coding mistake is less likely using unsigned.
Read about the bit representations of signed and unsigned integers.
Basically, if the int is signed, and it is accessible by the user (for example, it is loaded from user input), and the user puts in a bigger number than the integer can contain, the int will end up negative.
If this int is the size of something in the program, that size will turn out negative.
There is no such problem with unsigned ints.
You can compare it with SQL injection.
You supply a very large number which the integer can't store, and the program may show undefined behavior.
Whenever we store a number bigger than the capacity of int, it wraps around and can turn out to be negative, which can be used by an attacker. For example:
int j;
cin >> j;
for (int i = 0; i < j; ++i)
{
    // Do some stuff
}
Now if the attacker enters a big enough number, j becomes negative because of the wrap-around, and the for loop body is skipped by the program.

Convert short int[] to char*

How can I convert short int[] to char*?
short int test[4000];
char* test2;
I tried this:
test2 = (char*)test[4000]
Error--> PTR is not valid
Like this:
test2 = (char*)test;
test[4000] means the 4001st element of array test (which is out of bounds here), not the array itself.
In general though, this is not a good idea. At the very least, your program won't be portable between big-endian and little-endian systems. Nevertheless, if you are coding for a specific microcontroller, for example, it's OK.
What you are doing is likely a bad idea, but...
test2 = (char*) test;
So you have a buffer in the form of an array, and you want to write its binary contents into a file. You do it like this:
if (fwrite(test, sizeof(test), 1, f) < 1)
{
    // handle error here (write failed)
}
The fwrite() function is used to write binary data to files (and fread() to read it). It takes a void* pointer, so it can work with any type (C++ implicitly converts any other object pointer/array to it).
The sizeof(test) determines the exact size of the array. If you don't want to write the whole of it (i.e. just filled part of it), you want to use sizeof(short) * N, where N is the number of filled elements.
The 1 here means that there is one block of data to write, so fwrite() will write all of the data at once. f is the file you're writing to. fwrite() returns the number of blocks written (so 1 on success and 0 on failure).
For completeness, I should note that's only one of the approaches to use of fwrite(). It may be a bit more semantic to use something like:
fwrite(test, sizeof(short), N, f)
but then the fwrite() may actually write only part of the data, and you will need to care about that. In other words, if it returned less than N, you'd have to retry writing the remaining part.
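A sketch of such a retry loop (assuming a valid FILE* and that a return of 0 with no progress should be treated as a hard error):

#include <cstddef>
#include <cstdio>

// Writes 'count' shorts from 'data' to 'f', resuming after partial writes.
// Returns true on success, false on a write error.
bool write_all(const short* data, std::size_t count, std::FILE* f) {
    std::size_t written = 0;
    while (written < count) {
        std::size_t n = std::fwrite(data + written, sizeof(short), count - written, f);
        if (n == 0)
            return false;       // no progress: a real error (check ferror(f))
        written += n;
    }
    return true;
}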