Started working on screen capturing software specifically targeted for Windows. While looking through an example on MSDN for Capturing an Image I found myself a bit confused.
Keep in mind when I refer to the size of the bitmap that does not include headers and so forth associated with an actual file. I'm talking about raw pixel data. I would have thought that the formula should be (width*height)*bits-per-pixel. However, according to the example this is the proper way to calculate the size:
DWORD dwBmpSize = ((bmpScreen.bmWidth * bi.biBitCount + 31) / 32) * 4 * bmpScreen.bmHeight;
or, in general terms: ((width * bits-per-pixel + 31) / 32) * 4 * height
I don't understand why there's the extra calculations involving 31, 32 and 4. Perhaps padding? I'm not sure but any explanations would be quite appreciated. I've already tried Googling and didn't find any particularly helpful results.
The bits representing the bitmap pixels are packed in rows. The size of each row is rounded up to a multiple of 4 bytes (a 32-bit DWORD) by padding.
(bits_per_row + 31) / 32 rounds up to the next whole number of 32-bit DWORDs. The answer is wanted in bytes rather than bits, hence the final * 4 rather than * 32.
See: https://en.wikipedia.org/wiki/BMP_file_format
Under Bitmap Header Types you'll find the following:
The scan lines are DWORD aligned [...]. They must be padded for scan line widths, in bytes, that are not evenly divisible by four [...]. For example, a 10- by 10-pixel 24-bpp bitmap will have two padding bytes at the end of each scan line.
The formula
((bmpScreen.bmWidth * bi.biBitCount + 31) / 32) * 4
establishes DWORD-alignment (in bytes). The trailing * 4 is really the result of * 32 / 8, where the multiplication with 32 produces a value that's a multiple of 32 (in bits), and the division by 8 translates it back to bytes.
Although this does produce the desired result, I prefer a different implementation. A DWORD is 32 bits, i.e. 2^5, a power of 2. Rounding up to a multiple of a power of 2 can be implemented using the following formula:
(value + ((1 << n) - 1)) & ~((1 << n) - 1)
Adding (1 << n) - 1 pushes the initial value to or past the next multiple of 2^n (unless it already is a multiple of 2^n). (1 << n) - 1 evaluates to a value where the n least significant bits are set; ~((1 << n) - 1) negates that, i.e. all bits but the n least significant bits are set. This serves as a mask to clear the n least significant bits of the adjusted value, rounding it down to a multiple of 2^n.
Applied to this specific case, where a DWORD is 32 bits, n is 5, and (1 << n) - 1 evaluates to 31. value is the raw scanline width in bits:
auto raw_scanline_width_in_bits{ bmpScreen.bmWidth * bi.biBitCount };
auto aligned_scanline_width_in_bits{ (raw_scanline_width_in_bits + 31) & ~31 };
auto aligned_scanline_width_in_bytes{ aligned_scanline_width_in_bits / 8 };
This produces the same results, but provides a different perspective, that may be more accessible to some.
What does & ~(minOffsetAlignment - 1) mean?
Does it mean address of destructor?
This is the code snippet I got it from.
VkDeviceSize is uint64_t.
VkDeviceSize getAlignment(VkDeviceSize instanceSize, VkDeviceSize minOffsetAlignment)
{
    if (minOffsetAlignment > 0)
    {
        return (instanceSize + minOffsetAlignment - 1) & ~(minOffsetAlignment - 1);
    }
    return instanceSize;
}
It's a common trick to align a certain number to a certain boundary. It assumes that minOffsetAlignment is a power of two.
If minOffsetAlignment is a power of two, its binary form will be a one followed by zeros. For example if 8, the binary will be b00001000.
If you subtract one, it will become a mask where all the bits that can be changed will be flagged as one. For the same example, it will be b00000111 (all numbers from 0 to 7).
If you take the complement of this number, it becomes a mask for clearing. In the example, b11111000.
If you AND (&) any number against this mask, it will have the effect to zero all the bits relative to numbers below the alignment.
For example, say I have the number 9, b00001001. Doing 9 & ~7 is b00001001 & b11111000, which results in b00001000, or 8.
The resulting value is the computed value aligned for the given amount.
What does &~ mean?
In this case, & is the bitwise AND operator. It is a binary operator. Each bit of the result is set if the corresponding bit is set in both operands; otherwise the bit is unset. Here, the left-hand operand is (instanceSize + minOffsetAlignment - 1) and the right-hand operand is ~(minOffsetAlignment - 1).
~ is the bitwise NOT operator. It is a unary operator. Each bit of the result is set if the corresponding bit is unset in the operand; otherwise the bit is unset. Here, the operand is (minOffsetAlignment - 1).
The idea of this code is to say: "Given a particular (object) instance size, how many bytes of memory are used if the allocator has to round object sizes up to some power-of-two multiple?" Note that if the instance size is already a multiple of the power of two (e.g. 8), then no rounding up is needed.
It is often the case that memory needs to be allocated in multiples of the hardware's word size in bytes. On a 32-bit platform that would mean a multiple of 4, on 64-bit, a multiple of 8. Note however that the multiple (and thus minOffsetAlignment) must be a power of 2 (the author of this code, for better or worse, is 100% assuming the person calling this function knows this).
Let's assume for the rest of my answer that we're working with a hardware word size of 64-bit, and that our platform must allocate memory in chunks of 8 bytes (64 bits divided by 8 bits per byte == 8 bytes). So minOffsetAlignment will be passed as 8.
Consequently, if the instance size is 1-8 bytes, our function must return 8. If it's 9-16 bytes, it must return 16, and so on.
This answer shows how to round up a value to some nearest integer provided the value being rounded isn't already a multiple of that integer.
In the case of rounding up to the nearest multiple of 8, the procedure is to add 7, then do integer division to divide by 8, then multiply by 8.
So:
(0 + 7) / 8 * 8 == 0 // a zero-byte instance requires zero memory bytes
(1 + 7) / 8 * 8 == 8 // a one-byte instance requires eight memory bytes
(2 + 7) / 8 * 8 == 8 // a two-byte instance requires eight memory bytes
...
(8 + 7) / 8 * 8 == 8 // an eight-byte instance requires eight memory bytes
(9 + 7) / 8 * 8 == 16 // a nine-byte instance requires sixteen memory bytes
...
What the code in your question is doing is exactly the above algorithm, but it's using bitwise operations instead of addition, division, and multiplication which on some hardware is faster.
Now, how does it do this?
First we have to get the (instanceSize + 7) part. That's here:
(instanceSize + minOffsetAlignment - 1)
Then we have to divide it by 8, truncate the remainder, and multiply the result of the division by 8.
The last part does that all in one step, and this explains why minOffsetAlignment had to be a power of two.
First, we see:
minOffsetAlignment - 1
If minOffsetAlignment is 8, then its binary value is 0b00001000.
Subtracting 1 from 8 gives 7, which in binary is 0b00000111.
Now the complement of 7 is taken:
~(minOffsetAlignment - 1)
This inverts all the bits so we get ...1111000 (for a 64-bit integer such as VkDeviceSize there are 61 leading 1's and 3 trailing 0's).
Now, putting the whole statement together:
(instanceSize + minOffsetAlignment - 1) & ~(minOffsetAlignment - 1)
We see the & operator will clear the last three bits of whatever the result of (instanceSize + minOffsetAlignment - 1) is.
This forces the return value to be a multiple of 8 (since any binary integer with the last three bits 0 is a multiple of 8), but given that we already added 7 it also rounded it up to the nearest multiple of 8 provided instanceSize wasn't already a multiple of 8.
I have a file which is nothing but an array of longs (8-byte integers).
I know that each consecutive long is larger in value than its predecessor.
What simple and complicated ways are there to compress this data?
What I have thought of:
Assessing the largest difference and storing only the difference between the longs, assuming it takes fewer bits per long to represent the difference.
Any suggestions are welcome.
Overall, your idea seems correct.
Let's say your unsigned longs are L0, L1, ..., Ln-1, Ln (that's n + 1 values and n differences), such that
Lk+1 >= Lk, and obviously by definition L0 >= 0 and Lk <= Lmax.
Let's call diffk = Lk+1 - Lk the difference function which represents the data you plan on compressing.
If we study the sum of diff0 .. diff1 ... diffn-2 .. diffn-1 :
diff0 + diff1 + .. + diffn-2 + diffn-1 = L1 - L0 + L2 - L1 + .. + Ln-1 - Ln-2 + Ln - Ln-1,
This simplifies to Ln - L0 : the sum of all differences is equal to Ln - L0.
So on average a difference diffk has the value (Ln - L0) / n, whereas Lk has an average value of roughly (L0 + Ln) / 2.
Since n >> 2, we can therefore say that average(diffk) << average(Lk), which means you will need fewer bits to compress the diffk information.
As of how to do such a compression, without more information on the data distribution it's difficult to assess what the best possible algorithm could be.
You can start with a trivial scheme like this :
if(diffk < 128) return a byte with the first bit set, and the 7 remaining bits containing the value of diffk
else if(diffk < 16384) return a byte with the first bit unset, and the 7 remaining bits containing the first 7 bits of the value of diffk, then another byte with the first bit set and the 7 bits remaining containing the rest of the value of diffk, etc...
To decompress it's very easy, you read a byte, if the first bit is set, you know you only need to read the next 7 bits, if not, you read and store 7 bits and you go on to the next byte, etc.
I'm reading about loading DDS textures. I read this article and saw this posting. (I also read the wiki about S3TC)
I understood most of the code, but there's two lines I didn't quite get.
blockSize = (format == GL_COMPRESSED_RGBA_S3TC_DXT1_EXT) ? 8 : 16;
and:
size = ((width + 3) / 4) * ((height + 3) / 4) * blockSize;
and:
bufsize = mipMapCount > 1 ? linearSize * 2 : linearSize;
What is blockSize? and why are we using 8 for DXT1 and 16 for the rest?
What is happening exactly when we're calculating size? More specifically, why are we adding 3, dividing by 4, then multiplying by blockSize?
Why are we multiplying by 2 if mipMapCount > 1?
DXT1-5 formats are also called BCn formats (the numbering differs: BC1 = DXT1, BC2 = DXT2/3, BC3 = DXT4/5), where BC stands for block compression. Pixels are not stored separately; the format only stores one block of data for each 4x4 group of pixels.
The 1st line checks if it's DXT1, because it has a size of 8 bytes per block. DXT3 and DXT5 use 16 bytes per block. (Note that newer formats exist and at least one of them is 8 bytes/block: BC4.)
The 2nd rounds up the dimensions of the texture to a multiple of the dimensions of a block. This is required since these formats can only store blocks, not individual pixels. For example, if you have a texture of 15x6 pixels, then since BCn blocks are 4x4 pixels, you will need a grid of 4 blocks across and 2 blocks down, even if the last column/row of blocks is only partially filled.
One way of rounding up a positive integer (let's call it i) to a multiple of another positive integer (let's call it m), is:
(i + m - 1) / m * m
Here, we need to get the number of blocks along each dimension and then multiply by the size of a block to get the total size of the texture. To do that we round width and height up to the next multiple of 4, divide by 4 to get the number of blocks, and finally multiply by the size of a block:
size = ((width + 3) / 4 * 4 / 4) * ((height + 3) / 4 * 4 / 4) * blockSize;
In each factor, the * 4 rounds up to a multiple of 4 and the / 4 that follows converts that to a block count. If you look closely, there's a *4 followed by a /4 that can be simplified; if you do that, you'll get exactly the same code you had. The conclusion to all this could be: comment any code that's not perfectly obvious :P
The 3rd line may be an approximation to calculate a buffer size big enough to store the whole mipmap chain easily. But I'm not sure what this linearSize is; it corresponds to dwPitchOrLinearSize in the DDS header. In any case, you don't really need this value since you can calculate the size of each level easily with the code above.
I am not able to understand the code below with respect to the comment provided. What does this code do, and what would be the equivalent code for 8-aligned?
/* segment size must be 4-aligned */
attr->options.ssize &= ~3;
Here, ssize is of unsigned int type.
Since 4 in binary is 100, any value aligned to 4-byte boundaries (i.e. a multiple of 4) will have the last two bits set to zero.
3 in binary is 11, and ~3 is the bitwise negation of those bits, i.e., ...1111100. Performing a bitwise AND with that value keeps every bit the same, except the last two, which are cleared (bit & 1 == bit, and bit & 0 == 0). This gives us the next lower or equal value that is a multiple of 4.
To do the same operation for 8 (1000 in binary), we need to clear out the lowest three bits. We can do that with the bitwise negation of the binary 111, i.e., ~7.
All powers of two (1, 2, 4, 8, 16, 32...) can be aligned to with a simple AND operation.
This gives the size rounded down:
size &= ~(alignment - 1);
or if you want to round up:
size = (size + alignment-1) & ~(alignment-1);
The "alignment-1", as long as it's a value that is a power of two, will give you "all ones" up to the bit just under the power of two. ~ inverts all the bits, so you get ones for zeros and zeros for ones.
You can check that something is a power of two by:
bool power_of_two = !(alignment & (alignment-1))
This works because, for example 4:
4 = 00000100
4-1 = 00000011
& --------
0 = 00000000
or for 16:
16 = 00010000
16-1 = 00001111
& --------
0 = 00000000
If we use 5 instead:
5 = 00000101
5-1 = 00000100
& --------
4 = 00000100
So not a power of two!
Perhaps a more understandable comment would be
/* make segment size 4-aligned
by zeroing two least significant bits,
effectively rounding down */
Then, at least for me, an immediate question pops into my mind: should it really be rounded down, when it is a size? Wouldn't rounding up be more appropriate:
attr->options.ssize = (attr->options.ssize + 3) & ~3;
As already said in other answers, to make it 8-aligned, 3 bits need to be zeroed, so use 7 instead of 3. So, we might make it into a function:
unsigned size_align(unsigned size, unsigned bit_count_to_zero)
{
    unsigned bits = (1 << bit_count_to_zero) - 1;
    return (size + bits) & ~bits;
}
~3 is the bit pattern ...111100. When you do a bitwise AND with that pattern, it clears the bottom two bits, i.e. rounds down to the nearest multiple of 4.
~7 does the same thing for 8-aligned.
The code ensures the bottom two bits of ssize are cleared, guaranteeing that ssize is a multiple of 4. Equivalent code for 8-aligned would be
attr->options.ssize &= ~7;
number = number & ~3
The number is rounded down to the nearest multiple of 4 that is less than or equal to number.
Ex:
if number is 0, 1, 2 or 3, number is rounded down to 0
similarly, if number is 4, 5, 6 or 7, number is rounded down to 4
But if this is related to memory alignment, the memory must be aligned upwards and not downwards.
I need some division algorithm which can handle big integers (128-bit).
I've already asked how to do it via bit-shifting operators. However, my current implementation seems to call for a better approach.
Basically, I store numbers as two long long unsigned int's in the format
A * 2 ^ 64 + B with B < 2 ^ 64.
This number is divisible by 24 and I want to divide it by 24.
My current approach is to transform it like
(A * 2^64 + B) / 24 = (A / 24) * 2^64 + B / 24
                    = floor(A / 24) * 2^64 + ((A mod 24) / 24.0) * 2^64
                      + floor(B / 24) + (B mod 24) / 24.0
However, this is buggy.
(Note that floor is A / 24 and that mod is A % 24. The normal divisions are stored in long double; the integers are stored in long long unsigned int.)
Since 24 is equal to 11000 in binary, the second summand shouldn't change something in the range of the fourth summand since it is shifted 64 bits to the left.
So, if A * 2 ^ 64 + B is divisible by 24, and B is not, it shows easily that it bugs since it returns some non-integral number.
What is the error in my implementation?
The easiest way I can think of to do this is to treat the 128-bit numbers as four 32-bit numbers:
A_B_C_D = A*2^96 + B*2^64 + C*2^32 + D
And then do long division by 24:
E = A/24 (with remainder Q)
F = Q_B/24 (with remainder R)
G = R_C/24 (with remainder S)
H = S_D/24 (with remainder T)
Where X_Y means X*2^32 + Y.
Then the answer is E_F_G_H with remainder T. At any point you only need division of 64-bit numbers, so this should be doable with integer operations only.
Could this possibly be solved with inverse multiplication? The first thing to note is that 24 == 8 * 3 so the result of
a / 24 == (a >> 3) / 3
Let x = (a >> 3) then the result of the division is 8 * (x / 3). Now it remains to find the value of x / 3.
Modular arithmetic states that there exists a number n such that n * 3 == 1 (mod 2^128). This gives:
x / 3 = (x * n) / (n * 3) = x * n (mod 2^128). This is exact because x is a multiple of 3 here (the original number is divisible by 24).
It remains to find the constant n. There's an explanation of how to do this on Wikipedia. You'll also have to implement functionality to multiply two 128-bit numbers.
Hope this helps.
/A.B.
You shouldn't be using long double for your "normal divisions" but integers there as well. long double doesn't have enough significant figures to get the answer right (and anyway the whole point is to do this with integer operations, correct?).
Since 24 is equal to 11000 in binary, the second summand shouldn't change something in the range of the fourth summand since it is shifted 64 bits to the left.
Your formula is written in real numbers. (A mod 24) / 24 can have an arbitrary number of decimals (1/24 is for instance 0.041666666...) and can therefore interfere with the fourth term in your decomposition, even once multiplied by 2^64.
The property that Y*2^64 does not interfere with the lower weight binary digits in an addition only works when Y is an integer.
Don't.
Go grab a library to do this stuff - you'll be incredibly thankful you chose to when debugging weird errors.
Snippets.org had a C/C++ BigInt library on its site a while ago; Google also turned up the following: http://mattmccutchen.net/bigint/