C++ Modulo to align my data - c++

I am compiling several smaller files into one big file.
I am trying to make each small file begin at a multiple of a certain granularity, in my case 4096 bytes.
Therefore I am filling the gap between each file.
To do that I used
//Have a look at the current file size
unsigned long iStart=ftell(outfile);
//Calculate how many bytes we have to add to fill the gap to fulfill the granularity
unsigned long iBytesToWrite=iStart % 4096;
//write some empty bytes to fill the gap
vector <unsigned char>nBytes;
nBytes.resize(iBytesToWrite+1);
fwrite(&nBytes[0],iBytesToWrite,1,outfile);
//Now have a look at the file size again
iStart=ftell(outfile);
//And check granularity
unsigned long iCheck=iStart % 4096;
if (iCheck!=0)
{
DebugBreak();
}
However iCheck returns
iCheck = 3503
I expected it to be 0.
Does anybody see my mistake?

iStart % 4096 is the number of bytes since the previous 4k-boundary. You want the number of bytes until the next 4k-boundary, which is (4096 - iStart % 4096) % 4096.
You could replace the outer modulo operator with an if, since its only purpose is to turn 4096 into 0 and leave all the other values untouched. That would be worthwhile if the value of 4096 were, say, a prime. But since 4096 is actually 4096, which is a power of 2, the compiler will do the modulo operation with a bit mask (at least, provided that iStart is unsigned), so the above expression will probably be more efficient.
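The corrected padding calculation can be sketched as two small C++ helpers, assuming 64-bit file positions. padding_to_boundary and padding_pow2 are hypothetical names, not from the question; the second shows the mask form that the answer says a compiler can use for a power-of-two alignment such as 4096.

```cpp
#include <cassert>
#include <cstdint>

// Bytes needed to move `pos` forward to the next multiple of `alignment`.
// Returns 0 when `pos` is already aligned: the outer % turns `alignment` into 0.
std::uint64_t padding_to_boundary(std::uint64_t pos, std::uint64_t alignment)
{
    assert(alignment > 0);
    return (alignment - pos % alignment) % alignment;
}

// Equivalent form for power-of-two alignments, using a mask instead of %.
std::uint64_t padding_pow2(std::uint64_t pos, std::uint64_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    const std::uint64_t mask = alignment - 1;
    return (alignment - (pos & mask)) & mask;
}
```

In the question's code, iBytesToWrite would become padding_to_boundary(iStart, 4096), and the check afterwards should then see 0.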
By the way, you're allowed to fseek a file to a position beyond the end, and the gap will read back as NUL bytes once data is written after it. So you actually don't have to do all that work yourself:
The fseek() function shall allow the file-position indicator to be set beyond the end of existing data in the file. If data is later written at this point, subsequent reads of data in the gap shall return bytes with the value 0 until data is actually written into the gap.
(Posix 2008)
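A minimal sketch of the fseek approach, assuming a stream opened for binary writing; align_file_position is a hypothetical helper. POSIX guarantees the zero-filled gap quoted above; relying on the same behavior on other platforms (for example Windows) is an assumption worth checking for your toolchain.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical helper: moves the position of `f` forward to the next multiple
// of `alignment` by seeking instead of writing padding bytes. Returns false if
// ftell or fseek fails. The skipped bytes read back as 0 once something is
// written after them.
bool align_file_position(std::FILE* f, long alignment)
{
    assert(f != nullptr && alignment > 0);
    const long pos = std::ftell(f);
    if (pos < 0) {
        return false;
    }
    const long padding = (alignment - pos % alignment) % alignment;
    return std::fseek(f, padding, SEEK_CUR) == 0;
}
```

You would call it before writing each small file into the big one.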

Related

C++ - Reading number of bits per pixel from BMP file

I am trying to get the number of bits per pixel in a BMP file. According to Wikipedia, it is stored at byte offset 28. So after opening the file:
// Go to the byte at which the number of bits per pixel is stored
plik.seekg(28, ios::beg);
// Read the number of bits used per pixel
int liczbaBitow;
plik.read((char*)&liczbaBitow, 2);
cout << "number of bits " << liczbaBitow << endl;
But liczbaBitow (the variable that is supposed to hold the number of bits per pixel) is -859045864. I don't know where that value comes from, and I'm pretty lost.
Any ideas?
To clarify @TheBluefish's answer, this code has the bug:
// Read the number of bits used per pixel
int liczbaBitow;
plik.read((char*)&liczbaBitow, 2);
When you use (char*)&liczbaBitow, you're taking the address of a 4-byte integer and telling the code to put 2 bytes there.
The other two bytes of that integer are uninitialized, so their value is indeterminate. In this case, they're 0xCC, most likely because that's the fill pattern Visual C++ debug builds use for uninitialized stack memory.
But if you're calling this from another function, calling it repeatedly, or building in a different configuration, you can expect the stack to contain other bogus values.
If you initialize the variable to 0, you'll get the value you expect.
But there's another bug: byte order matters here too. This code assumes that the machine's native byte order exactly matches the byte order in the file specification. There are a number of different bitmap formats, but your reference, the Wikipedia article, says:
All of the integer values are stored in little-endian format (i.e. least-significant byte first).
That matches your machine's byte order, since x86 is also little-endian. Other fields or formats may not be defined as little-endian, so as you proceed to decode the image, you'll have to watch for it.
Ideally, you'd read into a byte array and put the bytes where they belong.
See Convert Little Endian to Big Endian
int liczbaBitow;
unsigned char bpp[2];
plik.read(reinterpret_cast<char*>(bpp), 2); // istream::read takes a char*
liczbaBitow = bpp[0] | (bpp[1]<<8); // least-significant byte first
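The byte-array approach can be wrapped in a small helper. This is a sketch that assumes the file uses a BITMAPINFOHEADER (or a later header version), where the bits-per-pixel field is a 16-bit little-endian value at byte offset 28; read_bmp_bits_per_pixel and bits_from_buffer are hypothetical names. Because the two bytes are combined explicitly, the result does not depend on the host's byte order.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: reads the 16-bit little-endian bits-per-pixel field
// at byte offset 28. Returns -1 if the stream is too short or not seekable.
int read_bmp_bits_per_pixel(std::istream& in)
{
    unsigned char bytes[2] = {0, 0};
    in.seekg(28, std::ios::beg);
    if (!in.read(reinterpret_cast<char*>(bytes), 2)) {
        return -1;
    }
    // Least-significant byte first, independent of the host byte order.
    return bytes[0] | (bytes[1] << 8);
}

// Convenience wrapper for header bytes already loaded into memory.
int bits_from_buffer(const std::string& data)
{
    std::istringstream in(data);
    return read_bmp_bits_per_pixel(in);
}
```

For a real file you would pass an std::ifstream opened with std::ios::binary instead of the in-memory stream.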
-859045864 can be represented in hexadecimal as 0xCCCC0018.
The lower two bytes are 0x0018 = 24, i.e. 24 bpp.
What is most likely happening here is that liczbaBitow starts out holding 0xCCCCCCCC (it was never initialized, so it holds the debug fill pattern), while your plik.read writes only the lower 16 bits and leaves the upper 16 bits unchanged. Initializing the variable should fix this issue:
int liczbaBitow = 0;
Though, especially with something like this, it's best to use a datatype that exactly matches your data:
int16_t liczbaBitow = 0;
This can be found in <cstdint>.

seekg, tellg, zero-based counting and file size

So, I am curious: I've been using this rather useful approach to get the size of a file, but something bothers me. Opening a stream on a file in the file system and calling
fileStream.seekg(0, std::ios::beg);
int beginning = fileStream.tellg();
yields 0. That's to be expected, since positions are counted from zero. What is interesting to me is that a file of 512 bytes would have positions in the range [0, 511], so I would expect the value of
fileStream.seekg(0, std::ios::end);
int end = (int)fileStream.tellg(); // We don't care for +4GB here
to be 511 for end, because that's the position of the last byte in the file. Any buffer sized from it would then receive only 511 bytes rather than 512.
But it works, so you can see my confusion. What gives? I'm at a loss. Where does the +1 come from?
After,
fileStream.seekg(0, std::ios::end);
the file pointer is positioned just after the last byte (#511). That is what the value 512 indicates. Here, 511 would mean just before the last byte.
Let's consider a file that's two bytes long:
position 0 is before the first byte;
position 1 is before the second byte;
position 2 is before the (non-existent) third byte, i.e. at the end of the file.
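To see the one-past-the-end position in action, here is a small sketch that measures a stream the same way the question does and then reads exactly that many bytes. It uses std::istringstream as a stand-in for a file so that it is self-contained; an std::ifstream opened in binary mode should report positions the same way. stream_size and read_all_count are hypothetical helper names.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Size of a seekable stream, measured as in the question: after seeking to
// the end, tellg() reports the position one past the last byte, which equals
// the number of bytes (readable positions run from 0 to size - 1).
long long stream_size(std::istream& in)
{
    in.seekg(0, std::ios::end);
    const long long size = static_cast<long long>(in.tellg());
    in.seekg(0, std::ios::beg);
    return size;
}

// Reads the whole stream into a buffer of exactly stream_size() bytes and
// returns how many bytes were actually read.
long long read_all_count(std::istream& in)
{
    const long long size = stream_size(in);
    assert(size >= 0); // tellg() returns -1 on failure
    std::string buffer(static_cast<std::string::size_type>(size), '\0');
    in.read(&buffer[0], size);
    return in.gcount();
}
```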

Question on Infinite Loop in C++

This is kind of a curiosity.
I'm studying C++. I was asked to reproduce an infinite loop, for example one that prints a series of powers of two:
#include <iostream>
int main()
{
int powerOfTwo = 1;
while (true)
{
powerOfTwo *= 2;
std::cout << powerOfTwo << std::endl;
}
}
The result kind of troubled me. With the Python interpreter, for example, I used to get an effectively infinite loop printing a power of two on each iteration (until the IDE stopped it for exceeding its iteration limit, of course). With this C++ program, instead, I get a series of 0s. But if I change this to a finite loop, that is, if I only change the condition to:
(powerOfTwo <= 100)
the code works well, printing 2, 4, 8, ..., 128.
So my question is: why does an infinite loop in C++ behave this way? Why does it seem not to evaluate the while body at all?
Edit: I'm using Code::Blocks and compiling with g++.
In the infinite loop case you see 0 because the int overflows to 0 after 32 iterations, and 0*2 == 0.
Look at the first few lines of output. http://ideone.com/zESrn
2
4
8
16
32
64
128
256
512
1024
2048
4096
8192
16384
32768
65536
131072
262144
524288
1048576
2097152
4194304
8388608
16777216
33554432
67108864
134217728
268435456
536870912
1073741824
-2147483648
0
0
0
In Python, integers can hold an arbitrary number of digits. C++ does not work this way: its integers have limited precision (normally 32 bits for int, but this depends on the platform). Multiplying by 2 is equivalent to shifting an integer one bit to the left. What is happening is that initially only the lowest bit of the integer is set:
powerOfTwo = 1; // 0x00000001 = 0b00000000000000000000000000000001
After your loop iterates 31 times, the bit will have shifted to the very last (highest) position in the integer. That is the sign bit, which is why the value prints as negative:
powerOfTwo = -2147483648; // 0x80000000 = 0b10000000000000000000000000000000
On the next multiplication by two, the bit is shifted all the way out of the integer (since it has limited precision), and you end up with zero.
powerOfTwo = 0; // 0x00000000 = 0b00000000000000000000000000000000
From then on, you are stuck, since 0 * 2 is always 0. If you watch your program in "slow motion", you would see an initial burst of powers of 2, followed by an infinite loop of zeroes.
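The bit walking described above can be observed with std::bitset. This sketch uses std::uint32_t rather than int, because unsigned overflow is defined to wrap, while the signed overflow in the question is technically undefined behavior; bits_after_doublings is a hypothetical helper.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>
#include <string>

// Bit pattern of the value after `doublings` multiplications by 2, starting
// from 1. Each doubling moves the single set bit one place to the left;
// after 32 doublings it has left the 32-bit value entirely.
std::string bits_after_doublings(unsigned doublings)
{
    std::uint32_t value = 1;
    for (unsigned i = 0; i < doublings; ++i) {
        value *= 2u; // same effect as value <<= 1
    }
    return std::bitset<32>(value).to_string();
}
```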
In Python, on the other hand, your code would work as expected - Python integers can expand to hold any arbitrary number of digits, so your single set bit will never "shift off the end" of the integer. The number will simply keep expanding so that the bit is never lost, and you will never wrap back around and get trapped at zero.
Actually, it prints powers of two until powerOfTwo overflows and becomes 0. Then 0*2 = 0, and so on. http://ideone.com/XUuHS
In C++, an int has a limited size, so the loop keeps computing even after the value overflows (no error is raised);
the while (true) condition is what keeps it going.
In C++ you will cause an overflow pretty soon: your int variable can't hold big numbers.
A 4-byte signed int can hold values in the range -2,147,483,648 to 2,147,483,647.
So, as @freerider said, your compiler may be optimizing the code for you.
I assume you know the data-type concepts in C and C++: you are declaring powerOfTwo as an int, so its value is limited to the range of int.
If you want a continuous loop, you can use char as the data type, and by using data conversion you can get an infinite loop for your function.
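Carefully examine the output of the program. You don't really get an infinite series of zeroes. You get 31 nonzero numbers, followed by an infinite series of zeroes.
The 31 numbers are the powers of two from 2^1 to 2^31 (the loop doubles before it prints, so 1 itself never appears, and 2^31 shows up as -2147483648 because it does not fit in a 32-bit signed int):
2
4
8
...
(2 raised to the 30th)
(2 raised to the 31st)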
0
0
0
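The problem is how C and C++ represent numbers: as finite quantities. Since your mathematical quantity is no longer representable in an int, some other number ends up in its place. In practice, it is the true value modulo 2^32. But 2^32 mod 2^32 is zero, so there you are. (Strictly speaking, only unsigned arithmetic is guaranteed to wrap this way; overflowing a signed int is undefined behavior, and the wrapping you see is simply what typical compilers and hardware do.)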
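The modulo rule can be checked directly: converting a wider value to a 32-bit unsigned type keeps it modulo 2^32, and 32-bit unsigned multiplication wraps the same way. A sketch with hypothetical helpers wrap32 and doubled32, using unsigned types so that the wrap is guaranteed by the language:

```cpp
#include <cassert>
#include <cstdint>

// What a 32-bit unsigned variable keeps of a larger true value:
// conversion to an unsigned type reduces the value modulo 2^32.
std::uint32_t wrap32(std::uint64_t true_value)
{
    return static_cast<std::uint32_t>(true_value);
}

// Doubling in 32-bit unsigned arithmetic; the product also wraps modulo 2^32.
std::uint32_t doubled32(std::uint32_t a)
{
    return static_cast<std::uint32_t>(a * 2u);
}
```

For instance, 3,000,000,000 doubled is 6,000,000,000, which a 32-bit unsigned value stores as 6,000,000,000 - 4,294,967,296 = 1,705,032,704.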

C/C++ Bit Array or Bit Vector

I am learning C/C++ programming and have encountered the usage of 'Bit arrays' or 'Bit Vectors'. I am not able to understand their purpose. Here are my questions:
Are they used as boolean flags?
Can one use int arrays instead? (more memory of course, but..)
What's this concept of Bit-Masking?
If bit-masking is simple bit operations to get an appropriate flag, how does one program with them? Isn't it difficult to do this operation in your head to see what the flag would be, as opposed to decimal numbers?
I am looking for applications, so that I can understand better. For example:
Q. You are given a file containing integers in the range 1 to 1 million. There are some duplicates, and hence some numbers are missing. What is the fastest way of finding the missing numbers?
For the above question, I have read solutions telling me to use bit arrays. How would one store each integer in a bit?
I think you've got yourself confused between arrays and numbers, specifically what it means to manipulate binary numbers.
I'll explain by example. Say you have a number of error conditions and you want to report them in the return value of a function. Now, you might label your errors 1, 2, 3, 4..., which makes sense to your mind, but then how do you, given just one number, work out which errors have occurred?
Now, try labelling the errors 1, 2, 4, 8, 16..., that is, increasing powers of two. Why does this work? Well, when you work in base 2, you are manipulating a number like 00000000 where each digit stands for a power of 2 determined by its position from the right. So let's say errors 1, 4 and 8 occur. Then that can be represented as 00001101. Reading from the right, the first digit is 1*2^0, the third digit 1*2^2 and the fourth digit 1*2^3. Adding them all up gives you 13.
Now, we are able to test whether such an error has occurred by applying a bitmask. For example, if you wanted to work out whether error 8 has occurred, use the bit representation of 8 = 00001000. Then, to extract whether or not that error has occurred, use a bitwise AND like so:
00001101
& 00001000
= 00001000
I'm sure you know how AND works, or can deduce it from the above: working digit by digit, if both digits are 1, the result is 1, else it is 0.
Now, in C:
#include <stdint.h>
int func(void)
{
int retval = 0;
if ( 0 /* replace with some test that means error 1 occurred */ )
{
retval |= 1; /* |= sets the bit; unlike +=, it is harmless if done twice */
}
if ( 0 /* replace with some test that means error 2 occurred */ )
{
retval |= 2;
}
return retval;
}
void anotherfunc(void)
{
uint8_t x = func();
/* bitwise AND with 8 and shift 3 places to the right
* so that the resultant expression is either 1 or 0 */
if ( ( ( x & 0x08 ) >> 3 ) == 1 )
{
/* that error occurred */
}
}
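The same flag idea in C++, as a sketch with hypothetical names (ErrorFlag, set_flag, clear_flag, toggle_flag, has_flag). It uses OR rather than += to set a flag, so setting the same flag twice does no harm, and it tests a flag by checking the AND result against zero, which avoids the shift.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical error flags, one bit each (powers of two), as in the answer.
enum ErrorFlag : std::uint8_t {
    kError1 = 1u << 0, // 1
    kError2 = 1u << 1, // 2
    kError4 = 1u << 2, // 4
    kError8 = 1u << 3, // 8
};

// Set a flag with OR.
inline std::uint8_t set_flag(std::uint8_t flags, ErrorFlag f)
{
    return static_cast<std::uint8_t>(flags | f);
}

// Clear a flag with AND of the inverted mask.
inline std::uint8_t clear_flag(std::uint8_t flags, ErrorFlag f)
{
    return static_cast<std::uint8_t>(flags & ~f);
}

// Flip a flag with XOR.
inline std::uint8_t toggle_flag(std::uint8_t flags, ErrorFlag f)
{
    return static_cast<std::uint8_t>(flags ^ f);
}

// A flag is set when the AND result is nonzero.
inline bool has_flag(std::uint8_t flags, ErrorFlag f)
{
    return (flags & f) != 0;
}
```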
Now, to practicalities. When memory was scarce and protocols didn't have the luxury of verbose XML etc., it was common to define a field as being so many bits wide. In that field, you assign various bits (flags, powers of 2) a certain meaning and apply binary operations to deduce whether they are set, then act on them.
I should also add that binary operations are close in idea to the underlying electronics of a computer. Imagine if the bit fields corresponded to the output of various circuits (carrying current or not). By using enough combinations of said circuits, you make... a computer.
Regarding the usage of the bit array:
If you know there are "only" 1 million numbers, you use an array of 1 million bits. In the beginning all bits are zero, and every time you read a number, you use this number as an index and set the bit at this index to one (if it's not one already).
After reading all the numbers, the missing numbers are the indices of the zeros in the array.
For example, if we only had numbers between 0 and 4, the array would look like this in the beginning: 0 0 0 0 0.
If we read the numbers 3, 3, 2, the array would look like this:
read 3 --> 0 0 0 1 0
read 3 (again) --> 0 0 0 1 0
read 2 --> 0 0 1 1 0
Checking the indices of the zeros gives 0, 1, 4: those are the missing numbers.
By the way, of course you can use integers instead of bits, but that may take (depending on the system) 32 times as much memory.
Sivan
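The procedure above can be sketched in C++ with std::vector<bool>, which implementations typically store as one bit per element. find_missing is a hypothetical helper that assumes every input lies in [0, max_value]; with the example input 3, 3, 2 and max_value 4 it returns 0, 1, 4, matching the walk-through.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Given numbers in [0, max_value] (duplicates allowed), return the values
// that never appeared, in increasing order.
std::vector<unsigned> find_missing(const std::vector<unsigned>& numbers, unsigned max_value)
{
    std::vector<bool> seen(static_cast<std::size_t>(max_value) + 1, false);
    for (unsigned n : numbers) {
        assert(n <= max_value);
        seen[n] = true; // setting an already-set bit is harmless
    }
    std::vector<unsigned> missing;
    for (std::size_t i = 0; i < seen.size(); ++i) {
        if (!seen[i]) {
            missing.push_back(static_cast<unsigned>(i));
        }
    }
    return missing;
}
```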
Bit arrays or bit vectors can be thought of as arrays of boolean values. Normally a boolean variable needs at least one byte of storage, but in a bit array/vector only one bit is needed.
This comes in handy when you have lots of such data, because you save a lot of memory.
Another use is for numbers which do not exactly fit in the standard variable sizes of 8, 16, 32 or 64 bits. This way, you could store in a 16-bit vector a number of 4 bits, one of 2 bits and one of 10 bits. Otherwise you would have to use 3 variables with sizes of 8, 8 and 16 bits, so half of the storage would be wasted.
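A sketch of the 4 + 2 + 10 bit example with explicit shifts and masks. The field order (a in the low 4 bits, then b, then c in the top 10 bits) is an arbitrary choice for illustration, and pack, unpack_a, unpack_b and unpack_c are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Layout: bits 0-3 hold a (4 bits), bits 4-5 hold b (2 bits),
// bits 6-15 hold c (10 bits): 16 bits in total.
std::uint16_t pack(unsigned a, unsigned b, unsigned c)
{
    assert(a < (1u << 4) && b < (1u << 2) && c < (1u << 10));
    return static_cast<std::uint16_t>(a | (b << 4) | (c << 6));
}

unsigned unpack_a(std::uint16_t v) { return v & 0xFu; }          // low 4 bits
unsigned unpack_b(std::uint16_t v) { return (v >> 4) & 0x3u; }   // next 2 bits
unsigned unpack_c(std::uint16_t v) { return (v >> 6) & 0x3FFu; } // top 10 bits
```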
But these techniques are rarely used in business applications; they come up often when interfacing with drivers through P/Invoke/interop functions and in low-level programming.
Bit arrays or bit vectors are used as a mapping from position to some bit value. Yes, it's basically the same thing as an array of bool, but a typical bool implementation is one to four bytes long, so it uses too much space.
We can store the same amount of data much more efficiently by using arrays of words and binary masking operations and shifts to store and retrieve the bits (less overall memory used, fewer memory accesses, fewer cache misses, fewer page swaps). The code to access individual bits is still quite straightforward.
There is also some bit-field support built into the C language (you write things like int i:1; to say "only consume one bit"), but it is not available for arrays, and you have less control over the overall result (the details of the implementation depend on the compiler and on alignment issues).
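For comparison, here are the 4 + 2 + 10 bit fields from the previous answer written with the built-in bit-field syntax. The compiler generates the shifts and masks, but the memory layout (field order, padding, total size) is implementation-defined, so this sketch relies only on the field values. It uses unsigned fields, because a 1-bit signed field such as int i:1 can typically hold only 0 and -1. PackedFields is a hypothetical name.

```cpp
#include <cassert>

// Bit-field version: each member occupies only the declared number of bits.
// Assigning an out-of-range value to an unsigned bit-field keeps it modulo
// 2^width (for example, 17 stored in a 4-bit field becomes 1).
struct PackedFields {
    unsigned a : 4;
    unsigned b : 2;
    unsigned c : 10;
};
```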
Below is a possible way to answer your "search missing numbers" question. I fixed the word size at 32 bits to keep things simple, but it could be written using sizeof to make it portable. And (depending on the compiler and target processor) the code could be made faster using >> 5 instead of / 32 and & 31 instead of % 32, though for unsigned operands compilers usually do that themselves; this is just to give the idea.
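#include <stdio.h>
#include <stdint.h>
#define MAX_VALUE 1000000u /* the numbers are in the range [1, MAX_VALUE] */
#define WORDS (MAX_VALUE / 32 + 1) /* enough 32-bit words for bits 0..MAX_VALUE */
int main(void){
/* put all numbers from 1 to 1000000 in a file, except 765, 777760 and 777791 */
{
printf("writing test file\n");
unsigned int x = 0;
FILE * f = fopen("testfile.txt", "w");
if (f == NULL){
perror("fopen");
return 1;
}
for (x = 1; x <= MAX_VALUE; ++x){
if (x == 765 || x == 777760 || x == 777791){
continue;
}
fprintf(f, "%u\n", x);
}
fprintf(f, "%d\n", 57768); /* this one is a duplicate */
fclose(f);
}
/* static storage: the array starts zeroed (a plain local array would hold garbage) */
static uint32_t bitarray[WORDS];
/* read file containing integers in the range [1,1000000] */
/* numbers are separated by whitespace */
/* the goal is to find missing numbers */
printf("Reading test file\n");
{
unsigned int x = 0;
FILE * f = fopen("testfile.txt", "r");
if (f == NULL){
perror("fopen");
return 1;
}
while (1 == fscanf(f, " %u", &x)){
if (x >= 1 && x <= MAX_VALUE){ /* ignore out-of-range values instead of writing past the array */
bitarray[x / 32] |= UINT32_C(1) << (x % 32); /* unsigned shift: 1 << 31 on an int is undefined */
}
}
fclose(f);
}
/* find missing numbers in bitarray */
{
unsigned int w = 0;
for (w = 0; w < WORDS; ++w){
uint32_t n = bitarray[w];
if (n != UINT32_MAX){ /* skip words where all 32 numbers are present */
unsigned int b;
for (b = 0; b < 32; ++b){
unsigned int v = w * 32 + b;
if (v == 0 || v > MAX_VALUE){
continue; /* bit 0 and the unused tail of the last word are not in the range */
}
if (0 == (n & (UINT32_C(1) << b))){
printf("missing number is %u\n", v);
}
}
}
}
}
return 0;
}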
Bit arrays are used for storing bit flags, as well as for parsing the fields of various binary protocols, where one byte is divided into a number of bit fields. This is widely used in protocols from TCP/IP up to ASN.1 encodings, OpenPGP packets, and so on.
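As one concrete protocol example, the first byte of an IPv4 header packs two 4-bit fields: the version in the high nibble and the header length (IHL, counted in 32-bit words) in the low nibble. A minimal parsing sketch; Ipv4FirstByte and parse_ipv4_first_byte are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

struct Ipv4FirstByte {
    unsigned version;   // high nibble
    unsigned ihl_words; // low nibble: header length in 32-bit words
};

// Split one byte into its two 4-bit fields with a shift and a mask.
Ipv4FirstByte parse_ipv4_first_byte(std::uint8_t b)
{
    return Ipv4FirstByte{static_cast<unsigned>(b >> 4),
                         static_cast<unsigned>(b & 0x0F)};
}
```

For the common value 0x45 this gives version 4 and a 5-word (20-byte) header.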

How is a pipe reading with a size of 4 bytes into a 4 byte int returning more data?

Reading from a pipe:
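unsigned int sample_in = 0; //4 bytes - 32bits, right?
unsigned int len = sizeof(sample_in); // = 4 in debugger
DWORD bytesRead = 0;
while (len > 0)
{
// read into the part of sample_in that has not been filled yet
if (0 == ReadFile(hRead,
(char*)&sample_in + (sizeof(sample_in) - len),
len,
&bytesRead,
0))
{
printf("ReadFile failed\n");
break; // avoid looping forever on a broken pipe
}
len-= bytesRead; //bytesRead always = 4, so far
}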
In the debugger, first iteration through:
sample_in = 536739282 //36 bits?
How is this possible if sample_in is an unsigned int? I think I'm missing something very basic, so go easy on me!
Thanks
Judging from your comment that says //36 bits?, I suspect that you're expecting the data to be sent in a BCD-style format: in other words, where each decimal digit takes up four bits, two digits per byte. That encoding wastes space, however: each digit uses four bits, but the values 10 to 15 are never used.
In fact, integers are represented in binary internally, which allows a 32-bit number to represent 2^32 different values. For an unsigned int, that means a maximum of 2^32 - 1 = 4,294,967,295, which is rather larger than the number you saw in sample_in.
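The difference between the two encodings is easy to quantify. This sketch (binary_bits and bcd_bits are hypothetical helpers) counts the bits 536739282 needs in plain binary and in BCD: 29 bits versus 36, which is likely where the //36 bits? guess came from.

```cpp
#include <cassert>
#include <cstdint>

// Bits needed in plain binary: position of the highest set bit, plus one.
unsigned binary_bits(std::uint64_t v)
{
    unsigned bits = 0;
    while (v != 0) {
        ++bits;
        v >>= 1;
    }
    return bits;
}

// Bits needed in BCD: four bits per decimal digit.
unsigned bcd_bits(std::uint64_t v)
{
    unsigned digits = 1;
    while (v >= 10) {
        ++digits;
        v /= 10;
    }
    return 4 * digits;
}
```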
536739282 is well within the maximum boundary of an unsigned 4 byte integer, which is upwards of 4 billion.
536,739,282 will easily fit in a 32-bit unsigned int. The cap on an unsigned int is about 4.29 billion.
unsigned int, your 4 byte unsigned integer, allows for values from 0 to 4,294,967,295. This will easily fit your value of 536,739,282. (This would, in fact, even fit in a standard signed int.)
For details on allowable ranges, see MSDN's Data Type Ranges page for C++.