seekg, tellg, zero-based counting and file size - c++

So, I've been using this rather useful approach to get the size of a file, but something bothers me. Opening a stream to a file on the file system and calling
fileStream.seekg(0, std::ios::beg);
int beginning = fileStream.tellg();
yields 0. That's to be expected; we enjoy the benefits of zero-based counting. What is interesting to me is that a file of 512 bytes would have positions in the range [0, 511]; therefore,
fileStream.seekg(0, std::ios::end);
int end = (int)fileStream.tellg(); // We don't care for +4GB here
should yield 511 for end, because that's the last byte, the last position within the loaded file. So any buffer sized from this value would only hold 511 bytes rather than 512.
But it works, so you can see my confusion. What gives? I'm at a loss. Where does the +1 come from?

After
fileStream.seekg(0, std::ios::end);
the file pointer is positioned just after the last byte (byte #511). That is what the value 512 indicates. A value of 511 would mean the pointer is just before the last byte.
Let's consider a file that's two bytes long:
position 0 is before the first byte;
position 1 is before the second byte;
position 2 is before the (non-existent) third byte, i.e. at the end of the file.

Related

Is there a way to convert an array of bytes to a number in C++?

I have read 4 bytes as an array from a file on an SD card on an Arduino Mega. Now I want to convert this array into one number so that I can work with it as an integer (the bytes are the length of the next file section). Is there an existing function for this, or must I write my own?
I read the File into the byte array with the file.read() function from SDFat:
byte array[4]; //creates the byte array
file.read(array,4); //reads 4 bytes from the file and stores it in the array
I hope you can understand my problem.
It depends on the endianness of the stored bytes.
If the endianness matches that of your target system (avr-gcc on the ATmega stores multi-byte values little-endian) you can just do
int32_t number = *(int32_t*)array;
to get a 32 bit integer.
If the endianness does not match, you have to shift the bytes around yourself; for a little-endian encoded number:
int32_t number = uint32_t(array[3]) << 24 | uint32_t(array[2]) << 16 | uint32_t(array[1]) << 8 | uint32_t(array[0]);

C++/C: Prepend length to Char[] in bytes (binary/hex)

I'm looking to send UDP datagrams from a client, to a server, and back.
The server needs to add a header to each datagram (represented as a char[]) in byte format, which I've struggled to find examples of. I know how to send the length as text characters, but I want to send it in "effectively" binary form: e.g., if the length were 40 bytes, I'd want to prepend 0x28 (or its 2-byte unsigned equivalent) rather than the ASCII characters '0028', which would take 4 bytes instead of a potential 2.
As far as I can work out my best option is below:
unsigned int length = dataLength; //length of the data received
char test[512] = { (char)length };
Is this approach valid, or will it cause problems later?
Further, this gives me a hard limit of 255 if I'm not mistaken. How can I best represent the length as 2 bytes to extend my maximum?
EDIT: I need the length of each datagram prepended because I will be building each datagram into a larger frame, and the recipient needs to be able to take the frame apart into its information elements. I think that means I need the length included so the recipient can work out where each element ends and the next begins.
You probably need something like this:
char somestring[] = "Hello World!";
char sendbuffer[1000];
int length = strlen(somestring);
sendbuffer[0] = length & 0xff; // put LSB of length (note: '&', not '%')
sendbuffer[1] = (length >> 8) & 0xff; // put MSB of length
strcpy(&sendbuffer[2], somestring); // copy the string right after the length
sendbuffer is the buffer that will be sent; I fixed it at a maximum length of 1000, allowing strings up to a length of 997 to be sent (1000 - 2 bytes for the length - 1 byte for the NUL terminator).
LSB means least significant byte and MSB means most significant byte. Here we put the LSB first and the MSB second; this convention is called little endian, and the other way round would be big endian. You need to be sure the length is correctly decoded on the receiver side. If the receiver's architecture has a different endianness from the sender's, the length may be decoded incorrectly depending on the code. Google "endianness" for more details.
sendbuffer will look like this in memory:
0x0c 0x00 0x48 0x65 0x6c 0x6c ...
| 12 |  0 | 'H'| 'e'| 'l'| 'l'| ...
//... Decoding (assuming short is a 16 bit type on the receiver side)
// first method (won't work if endianness differs on the receiver side)
int decodedlength = *((unsigned short*)sendbuffer);
// second method (endianness-safe)
int decodedlength2 = (unsigned char)sendbuffer[0] | (unsigned char)sendbuffer[1] << 8;
char decodedstring[1000];
strcpy(decodedstring, &sendbuffer[2]);
Possible optimisation:
If the majority of the strings you send are shorter than 255 bytes, you can optimize by prepending only one byte most of the time instead of always two, but that's another story.

C++ - Reading number of bits per pixel from BMP file

I am trying to get the number of bits per pixel in a BMP file. According to Wikipedia, it is supposed to be at byte offset 28. So after opening the file:
// Seek to the byte where the number of bits per pixel is stored
plik.seekg(28, ios::beg);
// Read the number of bits used per pixel
int liczbaBitow;
plik.read((char*)&liczbaBitow, 2);
cout << "liczba bitow " << liczbaBitow << endl;
But liczbaBitow (variable that is supposed to hold number of bits per pixel value) is -859045864. I don't know where it comes from... I'm pretty lost.
Any ideas?
To clarify @TheBluefish's answer, this code has the bug
// Read the number of bits used per pixel
int liczbaBitow;
plik.read((char*)&liczbaBitow, 2);
When you use (char*)&liczbaBitow, you're taking the address of a 4-byte integer and telling the code to put 2 bytes there.
The other two bytes of that integer are left unspecified and uninitialized. In this case they're 0xCC, because that's the stack fill value used by your compiler's debug runtime.
But if you're calling this from another function or repeatedly, you can expect the stack to contain other bogus values.
If you initialize the variable, you'll get the value you expect.
But there's another bug: byte order matters here too. This code assumes that the machine's native byte order exactly matches the byte order in the file specification. There are a number of different bitmap formats, but the Wikipedia article you reference says:
All of the integer values are stored in little-endian format (i.e. least-significant byte first).
That happens to match your machine, which is x86 and therefore also little endian. Not every field elsewhere is defined to be little endian, so as you proceed to decode the image, you'll have to watch for it.
Ideally, you'd read into a byte array and put the bytes where they belong.
See Convert Little Endian to Big Endian
int liczbaBitow;
unsigned char bpp[2];
plik.read(reinterpret_cast<char*>(bpp), 2);
liczbaBitow = bpp[0] | (bpp[1] << 8);
-859045864 can be represented in hexadecimal as 0xCCCC0018.
Reading the lower 16 bits gives us 0x0018 = 24 bpp.
What is most likely happening here is that liczbaBitow is being initialized to 0xCCCCCCCC, while your plik.read only writes the lower 16 bits and leaves the upper 16 bits unchanged. Changing that line should fix the issue:
int liczbaBitow = 0;
Though, especially with something like this, it's best to use a data type that exactly matches your data:
int16_t liczbaBitow = 0;
This can be found in <cstdint>.

C++ Modulo to align my data

I am compiling several smaller files into one big file.
I am trying to make it so that each small file begins at a certain granularity, in my case 4096.
Therefore I am filling the gap between each file.
To do that I used
//Have a look at the current file size
unsigned long iStart=ftell(outfile);
//Calculate how many bytes we have to add to fill the gap to fulfill the granularity
unsigned long iBytesToWrite=iStart % 4096;
//write some empty bytes to fill the gap
vector <unsigned char>nBytes;
nBytes.resize(iBytesToWrite+1);
fwrite(&nBytes[0],iBytesToWrite,1,outfile);
//Now have a look at the file size again
iStart=ftell(outfile);
//And check granularity
unsigned long iCheck=iStart % 4096;
if (iCheck!=0)
{
DebugBreak();
}
However iCheck returns
iCheck = 3503
I expected it to be 0.
Does anybody see my mistake?
iStart % 4096 is the number of bytes since the previous 4k-boundary. You want the number of bytes until the next 4k-boundary, which is (4096 - iStart % 4096) % 4096.
You could replace the outer modulo operator with an if, since its only purpose is to correct 4096 to 0 and leave all the other values untouched. That would be worthwhile if the granularity were, say, a prime. But since 4096 is a power of 2, the compiler will implement the modulo with a bit mask (at least, provided that iStart is unsigned), so the expression above will probably be more efficient.
By the way, you're allowed to fseek a file to a position beyond the end, and the gap will be filled with NUL bytes. So you actually don't have to do all that work yourself:
The fseek() function shall allow the file-position indicator to be set beyond the end of existing data in the file. If data is later written at this point, subsequent reads of data in the gap shall return bytes with the value 0 until data is actually written into the gap.
(Posix 2008)

sprintf_s problem

I have a funny problem using this function.
I use it as follow:
int nSeq = 1;
char cBuf[8];
int j = sprintf_s(cBuf, sizeof(cBuf), "%08d", nSeq);
And every time I get an exception saying the buffer is too small.
When I changed the second argument to sizeof(cBuf) + 1, it worked.
Why do I need to add one if I only want to copy 8 bytes and I have an array that contains 8 bytes?
Your buffer holds 8 characters, but your string consists of 8 characters plus a null character to terminate it.
Your string requires 8 characters of data ("00000001", due to %08d) plus the terminating '\0'.
So you have to size the buffer as 9.
All sprintf functions append a null terminator, so in effect your string is 9 characters long: 8 bytes of text plus the ending zero.