understanding vtu file size - c++

I am having a problem with understanding/managing sizes of .vtu files in VTK. I need to write CFD output for hexahedral meshes with millions of cells and nodes. So, I am looking at ways to improve the efficiency of storage. I started with simple test cases.
Case1: 80x40x40 hexahedral mesh with 8 points for each hexahedron. So, 128000 cells and 1024000 points in total. Let's call it C1.vtu.
Case2: 80x40x40 hexahedral mesh with only unique points. So, 128000 cells and 136161 points in total. Let's call it C2.vtu.
I store one vector field (velocity) at each point in both cases, using vtkFloatArray for this data. The size of C1.vtu is 7.5 MB, and C2.vtu is 3.0 MB.
This is not what I expected when I created C2.vtu. As I store only about 13% of points (of Case1) in Case2, I expected that C2.vtu would be reduced accordingly (at least 5 times). However, the reduction is only 2.5 times.
I would like to understand what is going on internally. Also, I appreciate any insights on reducing the file size further.
I am using vtk6.2 with C++ on Ubuntu12.04.

It sounds like you have compression enabled in the writer; does writer->GetCompressor() return a non-NULL pointer? If so, that is almost surely the reason for the difference in file sizes. Without compression, I would expect larger file sizes than you are reporting. As the comments above noted, unstructured storage adds connectivity overhead. Consider your meshes C1 and C2:
C1
connectivity size = 128000 * (1 cell type + 1 cell offset + 8 point IDs) * (4 or 8 bytes per integer)
point coordinate size = 1024000 * (3 coords) * (4 or 8 bytes per coord)
vector field size = 1024000 * (3 components per tuple) * (4 or 8 bytes per component)
That comes to 28.32 MiB at a minimum (all int32/float32), yet you report 7.5 MB.
C2
connectivity size = 128000 * (1 cell type + 1 cell offset + 8 point IDs) * (4 or 8 bytes per integer)
point coordinate size = 136161 * (3 coords) * (4 or 8 bytes per coord)
vector field size = 136161 * (3 components per tuple) * (4 or 8 bytes per component)
That comes to 8 MiB at a minimum, yet you report 3 MB.
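
To confirm, you can query the writer and emit an uncompressed file for comparison. A minimal sketch (names are illustrative; grid stands for your existing vtkUnstructuredGrid):

#include <vtkSmartPointer.h>
#include <vtkUnstructuredGrid.h>
#include <vtkXMLUnstructuredGridWriter.h>

// Write the grid twice: once with the writer's default compressor,
// once with compression disabled, so the on-disk sizes can be compared.
void writeForComparison(vtkUnstructuredGrid* grid)
{
    vtkSmartPointer<vtkXMLUnstructuredGridWriter> writer =
        vtkSmartPointer<vtkXMLUnstructuredGridWriter>::New();
    writer->SetInputData(grid);

    // A non-NULL compressor (zlib by default) means compressed output.
    if (writer->GetCompressor() != NULL)
    {
        writer->SetFileName("C2_compressed.vtu");
        writer->Write();
    }

    writer->SetCompressorTypeToNone();   // raw data, no compression
    writer->SetFileName("C2_uncompressed.vtu");
    writer->Write();
}

The uncompressed file should land near the estimates above; the ratio between the two files tells you how much of the difference is due to the compressor.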

Related

How to calculate bitmap size?

Started working on screen capturing software specifically targeted for Windows. While looking through an example on MSDN for Capturing an Image I found myself a bit confused.
Keep in mind that when I refer to the size of the bitmap, that does not include headers and so forth associated with an actual file; I'm talking about raw pixel data. I would have thought that the formula should be (width * height) * bits-per-pixel. However, according to the example, this is the proper way to calculate the size:
DWORD dwBmpSize = ((bmpScreen.bmWidth * bi.biBitCount + 31) / 32) * 4 * bmpScreen.bmHeight;
or, equivalently: ((width * bits-per-pixel + 31) / 32) * 4 * height
I don't understand why there's the extra calculations involving 31, 32 and 4. Perhaps padding? I'm not sure but any explanations would be quite appreciated. I've already tried Googling and didn't find any particularly helpful results.
The bits representing the bitmap pixels are packed in rows. The size of each row is rounded up to a multiple of 4 bytes (a 32-bit DWORD) by padding.
(bits_per_row + 31) / 32 * 4 rounds up to the next multiple of 32 bits. The answer is in bytes rather than bits, hence the * 4 rather than * 32.
See: https://en.wikipedia.org/wiki/BMP_file_format
Under Bitmap Header Types you'll find the following:
The scan lines are DWORD aligned [...]. They must be padded for scan line widths, in bytes, that are not evenly divisible by four [...]. For example, a 10- by 10-pixel 24-bpp bitmap will have two padding bytes at the end of each scan line.
The formula
((bmpScreen.bmWidth * bi.biBitCount + 31) / 32) * 4
establishes DWORD-alignment (in bytes). The trailing * 4 is really the result of * 32 / 8, where the multiplication with 32 produces a value that's a multiple of 32 (in bits), and the division by 8 translates it back to bytes.
Although this does produce the desired result, I prefer a different implementation. A DWORD is 32 bits, i.e. a power of 2. Rounding up to a power of 2 can be implemented using the following formula:
(value + ((1 << n) - 1)) & ~((1 << n) - 1)
Adding (1 << n) - 1 pushes the initial value past the next multiple of 2^n (unless it already is a multiple of 2^n). (1 << n) - 1 evaluates to a value where the n least significant bits are set; ~((1 << n) - 1) negates that, i.e. all bits but the n least significant bits are set. This serves as a mask to remove the n least significant bits of the adjusted initial value.
Applied to this specific case, where a DWORD is 32 bits, i.e. n is 5, and (1 << n) - 1 evaluates to 31. value is the raw scanline width in bits:
auto raw_scanline_width_in_bits{ bmpScreen.bmWidth * bi.biBitCount };
auto aligned_scanline_width_in_bits{ (raw_scanline_width_in_bits + 31) & ~31 };
auto aligned_scanline_width_in_bytes{ aligned_scanline_width_in_bits / 8 };
This produces the same results but provides a different perspective that may be more accessible to some.
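
To make the arithmetic concrete, here is a small self-contained sketch (my own example, not from the MSDN sample) that computes the DWORD-aligned stride and reproduces the 10- by 10-pixel 24-bpp case quoted above:

#include <cstdio>

// Bytes per scan line of a DIB, rounded up to a DWORD (4-byte) boundary.
unsigned stride(unsigned widthPx, unsigned bitsPerPixel)
{
    return (widthPx * bitsPerPixel + 31) / 32 * 4;
}

int main()
{
    // 10 pixels * 24 bpp = 30 bytes of pixel data per row,
    // padded with 2 bytes to the next multiple of 4 -> 32 bytes.
    unsigned s = stride(10, 24);
    std::printf("stride = %u bytes, pixel data = %u bytes\n", s, s * 10);
}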

Loading DDS textures?

I'm reading about loading DDS textures. I read this article and saw this posting. (I also read the wiki about S3TC)
I understood most of the code, but there are three lines I didn't quite get.
blockSize = (format == GL_COMPRESSED_RGBA_S3TC_DXT1_EXT) ? 8 : 16;
and:
size = ((width + 3) / 4) * ((height + 3) / 4) * blockSize;
and:
bufsize = mipMapCount > 1 ? linearSize * 2 : linearSize;
What is blockSize? and why are we using 8 for DXT1 and 16 for the rest?
What is happening exactly when we're calculating size? More specifically, why are we adding 3, dividing by 4, then multiplying by blockSize?
Why are we multiplying by 2 if mipMapCount > 1?
The DXT1-5 formats are also called BCn formats (the numbering differs slightly), where BC stands for block compression. Pixels are not stored separately; instead, a single block of data encodes a 4x4 group of pixels.
The 1st line checks if it's DXT1, because DXT1 uses 8 bytes per block, while DXT3 and DXT5 use 16 bytes per block. (Note that newer formats exist, and at least one of them, BC4, is also 8 bytes per block.)
The 2nd line rounds the dimensions of the texture up to a multiple of the dimensions of a block. This is required since these formats can only store blocks, not individual pixels. For example, if you have a 15x6-pixel texture, and since BCn blocks are 4x4 pixels, you will need 4 blocks across and 2 blocks down, even though the last column and row of blocks are only partially filled.
One way of rounding up a positive integer (let's call it i) to a multiple of another positive integer (let's call it m), is:
(i + m - 1) / m * m
Here, we need to get the number of blocks in each dimension and then multiply by the size of a block to get the total size of the texture. To do that, we round width and height up to the next multiple of 4, divide by 4 to get the number of blocks, and finally multiply by the size of a block:
size = ((width + 3) / 4 * 4) * ((height + 3) / 4 * 4) / (4 * 4) * blockSize;
//      ^ padded width         ^ padded height          ^ pixels per block
If you look closely, each * 4 is followed by a / 4 that cancels it out; simplify, and you get exactly the code you had. The conclusion to all this could be: comment any code that's not perfectly obvious :P
The 3rd line may be an approximation to get a buffer big enough to hold the whole mipmap chain easily: since each mip level has at most a quarter of the pixels of the previous one, the full chain sums to less than 4/3 of the base size, so doubling is a safe overestimate. But I'm not sure what this linearSize is; it corresponds to dwPitchOrLinearSize in the DDS header. In any case, you don't really need this value, since you can calculate the size of each level easily with the code above.
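
As a sanity check of both formulas, here is a short sketch (my own, with assumed dimensions) that sums the block-compressed size of every mip level; note the chain totals well under twice the base level:

#include <cstdio>

// Size in bytes of one BCn-compressed mip level.
// blockSize is 8 for DXT1 and 16 for DXT3/DXT5.
unsigned levelSize(unsigned w, unsigned h, unsigned blockSize)
{
    return ((w + 3) / 4) * ((h + 3) / 4) * blockSize;
}

int main()
{
    unsigned w = 256, h = 256, blockSize = 8;   // DXT1 example texture
    unsigned total = 0;
    for (unsigned level = 0; ; ++level)
    {
        unsigned s = levelSize(w, h, blockSize);
        std::printf("level %u: %ux%u -> %u bytes\n", level, w, h, s);
        total += s;
        if (w == 1 && h == 1) break;
        w = w > 1 ? w / 2 : 1;                  // mip dimensions halve,
        h = h > 1 ? h / 2 : 1;                  // clamped at 1
    }
    std::printf("whole chain: %u bytes\n", total);
}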

packing an array of 3 values in buffer

I have the following problem I am unable to solve gracefully.
I have a data type that can take 3 possible values (0,1,2).
I have an array of 20 element of this data type.
As I want to encode the information in the least amount of memory, I did the following:
consider that each element can take up to 4 values (2 bits)
each char holds 8 bits, so I can fit 4 elements per char
5 chars hold 40 bits, so I can store all 20 elements
I have done this and it works fine.
However, I'm interested in evaluating the space gained by using the fact that my elements can only take 3 values, not 4.
Every possible combination gives 3 to the 20th power, which is 3,486,784,401. However, 256 to the 4th power gives 4,294,967,296, which is greater. This means I could encode my data in 4 chars.
Is there a generic method to implement the 2nd idea? The 1st idea is simple to implement with bit masks / bit shifts. However, since 3 values don't fit in an integer number of bits, I have no idea how to encode / decode them into an array of 4 chars.
Do you have any idea or reference on how it's done? I think there must be a general method. If anything, I'm interested in the feasibility of this.
edit: this could be simplified to: how to store 5 values from 0 to 2 in a single byte (since 256 >= 3^5 = 243)
You should be able to do what you said using 4 bytes. Assume that you store the 20 values in a single uint32_t called value (it must be unsigned: 3^20 - 1 = 3,486,784,400 does not fit in an int32_t). Here is how you would extract any particular element:
element[0] = value % 3;
element[1] = (value / 3) % 3;
element[2] = (value / 9) % 3;
...
element[19] = (value / 1162261467) % 3; // 1162261467 = 3 ^ 19
Or as a loop:
for (int i = 0; i < 20; i++) {
    element[i] = value % 3;
    value /= 3;
}
To build value from element, you would just do the reverse, something like this:
value = 0;
for (int i = 19; i >= 0; i--)
    value = value * 3 + element[i];
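
Putting both directions together, a minimal self-contained version (note the unsigned type, for the range reason mentioned above):

#include <cstdint>
#include <cstdio>

// Pack 20 base-3 digits (each 0..2) into one 32-bit word.
uint32_t pack(const uint8_t element[20])
{
    uint32_t value = 0;
    for (int i = 19; i >= 0; i--)
        value = value * 3 + element[i];
    return value;
}

// Unpack the 20 digits again.
void unpack(uint32_t value, uint8_t element[20])
{
    for (int i = 0; i < 20; i++)
    {
        element[i] = value % 3;
        value /= 3;
    }
}

int main()
{
    uint8_t in[20] = {0,1,2,2,1,0,1,2,0,1,2,0,0,1,2,1,0,2,1,0};
    uint8_t out[20];
    unpack(pack(in), out);
    for (int i = 0; i < 20; i++)
        std::printf("%d ", out[i]);   // prints the original sequence
    std::printf("\n");
}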
There is a generic way to figure out how many bits you need:
If your data type has N different values, then you need log(N) / log(2) bits to store this value. For instance in your example, log(3) / log(2) equals 1.585 bits.
Of course, in reality you will pack a fixed number of values into an integer number of bits, so you have to multiply this 1.585 by that count and round up. For instance, if you pack 5 of them:
1.585 × 5 = 7.925, meaning that 5 of your values just fit in one 8-bit char.
The way to unpack the values has been shown in JS1's answer. The generic formula for unpacking is element[i] = (value / N^i) mod N.
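
A quick check of that bit-count formula in code (my own sketch):

#include <cmath>
#include <cstdio>

int main()
{
    // Bits needed per value when each value has 3 possible states.
    double bitsPerValue = std::log(3.0) / std::log(2.0);   // ~1.585
    // Packing 5 such values: round the total up to whole bits.
    int bitsForFive = (int)std::ceil(5 * bitsPerValue);    // 8 -> one 8-bit char
    std::printf("%.3f bits/value, %d bits for 5 values\n",
                bitsPerValue, bitsForFive);
}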
Final note: this is only worthwhile if you really need to optimize memory usage. For comparison, here are some popular ways people store such values; most of the time the extra space is not a problem.
an array of bool: uses 8 bits to store one bool. And a lot of people really dislike the behavior of std::vector<bool>.
enum Bla { BLA_A, BLA_B, BLA_C}; an array or vector of Bla probably uses 32 bits per element (sizeof(Bla) == sizeof(int)).

Concurrent matrix sum - past Exam paper

I'm in my 3rd year of university, revising for my Computer Systems and Concurrency exam, and I'm confused about a past-paper question. Nobody, not even the lecturer, has answered it.
Question:
Consider the following GPU that consists of 8 multiprocessors clocked at 1.5 GHz, each of which contains 8 multithreaded single-precision floating-point units and integer processing units. It has a memory system that consists of 8 partitions of 1 GHz Graphics DDR3 DRAM, each 8 bytes wide and with 256 MB of capacity. Making reasonable assumptions (state them) and using a naive matrix multiplication algorithm, compute how much time the computation C = A * B would take. A, B, and C are n * n matrices, and n is determined by the amount of memory the system has.
Answer given in solutions:
Assuming it has a single-precision FP multiply-add instruction:

Single-precision FP multiply-add performance
= #MPs * #SPs/MP * #FLOPs/instr/SP * #instr/clock * #clocks/sec
= 8 * 8 * 2 * 1 * 1.5 G = 192 GFLOPs/second

Total DDR3 DRAM size = 8 * 256 MB = 2048 MB

Peak DDR3 bandwidth
= #partitions * #bytes/transfer * #transfers/clock * #clocks/sec
= 8 * 8 * 2 * 1 G = 128 GB/sec

Modern computers use 32-bit single precision, so if we want 3 n*n SP matrices, the maximum n satisfies 3n^2 * 4 <= 2048 * 1024 * 1024, which gives nmax = n = 13377.

The number of operations that a naive mm algorithm (triply nested loop) needs is calculated as follows: each element of the result needs n multiply-adds, each row of the result needs n * n multiply-adds, and the entire result matrix needs n * n * n multiply-adds. Thus, approximately 2393 GFLOPs.

Assuming no cache, we have loading of 2 matrices and storing of 1 to the graphics memory. That is 3 * n^2 = 512 GB of data. This process will take 512 / 128 = 4 seconds. Also, the processing will take 2393 / 192 = 12.46 seconds. Thus the entire matrix multiplication will take 16.46 seconds.
Now my question is: how does the calculation 3 * 13377^2 = 536,832,387 translate to 512 GB?
That is 536.8 million values, each 4 bytes long, about 2 GB in total. The memory interface is 8 bytes wide; assuming the GPU cannot fetch 2 values and split them, that effectively doubles the size of the reads and writes (8 bytes are read and 4 are ignored), so the 2 GB of memory used is effectively read/written twice. That would mean only 4 GB of data passes between the RAM and the GPU.
Can someone please tell me where I am going wrong? The only explanation I can think of is that the 536.8 million figure counts memory operations in KB, which is not stated anywhere.
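
For what it's worth, here is a small sketch (my own arithmetic check, not part of the model answer) that reproduces nmax and the asker's figure; at 4 bytes per value, 3n^2 really is about 2 GB, not 512 GB:

#include <cmath>
#include <cstdio>

int main()
{
    const double memBytes = 2048.0 * 1024 * 1024;        // 8 partitions x 256 MB
    // 3 n*n single-precision matrices must fit: 3 * n^2 * 4 <= memBytes
    const double n = std::floor(std::sqrt(memBytes / (3 * 4)));
    const double values = 3 * n * n;                     // elements of A, B and C
    std::printf("n_max  = %.0f\n", n);                   // 13377
    std::printf("values = %.0f\n", values);              // ~536.8 million
    std::printf("bytes  = %.2f GB\n", values * 4 / 1e9); // ~2.15 GB
}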

Coding an enhanced LSB reverser

I'm stumbling upon a steganographied .PNG image whose IDAT data is divided into 12 chunks (the last one slightly smaller). I'll elaborate a bit on the structure of the issue before I get to the real point of my question, since I need to clarify some things, so please do not mark it as off-topic; it is not. I just have to explain the notion behind the script so that I can get to the issue itself.

The image definitely has data embedded in it. The data seems to have been concealed by altering the enhanced LSB values: the high-level bits of each pixel are eliminated, leaving only the least significant bit. So every byte starts out as 0 or 1, which on a 0-255 range gives no visible color; the enhancement then maps a 0 to 0 and a 1 to the maximum value, 255. I've been analyzing this image in many different ways, but I don't see anything odd beyond the utter lack of one value in any of the three color channels (RGB) and the heightened presence of another value in a third of the channel values. Studying these and replacing bytes has given me nothing, however, and I am at a loss as to whether this avenue is even worth pursuing.
Hence, I'm looking into developing a script, in Python, PHP, or C/C++, that would reverse the process and 'restore' the enhanced LSBs.
I've converted it to a 24-bit .BMP, and tracking the red curve from a chi-square steganalysis, it's clear that there is steganographic data within the file.
First, there are a little more than 8 vertical zones, which suggests a little more than 8 kB of hidden data. One pixel can be used to hide three bits (one in the LSB of each RGB color tone), so we can hide (98x225)x3 bits. To get the number of kilobytes, we divide by 8 and by 1024: ((98x225)x3)/(8x1024), which is around 8.1 kilobytes. But that ain't the case here.
The analysis of the APP0 and APP1 markers of a .JPG version of the file also gives some awkward output:
Start Offset: 0x00000000
*** Marker: SOI (xFFD8) ***
OFFSET: 0x00000000
*** Marker: APP0 (xFFE0) ***
OFFSET: 0x00000002
length = 16
identifier = [JFIF]
version = [1.1]
density = 96 x 96 DPI (dots per inch)
thumbnail = 0 x 0
*** Marker: APP1 (xFFE1) ***
OFFSET: 0x00000014
length = 58
Identifier = [Exif]
Identifier TIFF = x[4D 4D 00 2A 00 00 00 08 ]
Endian = Motorola (big)
TAG Mark x002A = x[002A]
EXIF IFD0 # Absolute x[00000026]
Dir Length = x[0003]
[IFD0.x5110 ] =
[IFD0.x5111 ] = 0
[IFD0.x5112 ] = 0
Offset to Next IFD = [00000000]
*** Marker: DQT (xFFDB) ***
Define a Quantization Table.
OFFSET: 0x00000050
Table length = 67
----
Precision=8 bits
Destination ID=0 (Luminance)
DQT, Row #0: 2 1 1 2 3 5 6 7
DQT, Row #1: 1 1 2 2 3 7 7 7
DQT, Row #2: 2 2 2 3 5 7 8 7
DQT, Row #3: 2 2 3 3 6 10 10 7
DQT, Row #4: 2 3 4 7 8 13 12 9
DQT, Row #5: 3 4 7 8 10 12 14 11
DQT, Row #6: 6 8 9 10 12 15 14 12
DQT, Row #7: 9 11 11 12 13 12 12 12
Approx quality factor = 94.02 (scaling=11.97 variance=1.37)
I'm nearly convinced that no encryption algorithm was applied, and therefore no key is involved in the concealment. My idea is to code a script that would shift the LSB values and return the originals. I've run the file through several structure analyses, statistical attacks, BPCS analysis, and more.
The histogram of the image shows a specific color with an unusual spike. I've manipulated that as best I can to try to view any hidden data, but to no avail. (The RGB histograms themselves are not reproduced here.)
Then there are the multiple IDAT chunks. But I've put together a similar image by assigning random color values at each pixel location, and I too wound up with several of these, so I've found very little inside them so far. Even more interesting is the way that color values are repeated in the image: it seems the frequency of reused colors could hold some clue, but I have yet to fully understand that relationship, if one exists. Additionally, there is only a single column and a single row of pixels that do not have the full value of 255 in their alpha channel. I've even interpreted the X, Y, A, R, G, and B values of every pixel in the image as ASCII, but wound up with nothing legible. Even the green curve of the average of the LSBs tells us nothing; there is no evident break. (Several further histograms, which show the weird curve of the blue value from the RGB, are omitted here.)
But the red curve, the output of the chi-square analysis, shows some difference: it can see something that we cannot. Statistical detection is more sensitive than our eyes, and I guess that was my final point. However, there is also a sort of latency in the red curve. Even without hidden data, it starts at the maximum and stays there for some time; it's close to a false positive. The LSBs in the image look very close to random, and the algorithm needs a large population (remember, the analysis is done on an incrementally growing population of pixels) before reaching a threshold where it can decide that, actually, they are not random after all, at which point the red curve starts to go down. The same sort of latency happens with hidden data: you hide 1 or 2 kB, but the red curve does not go down right after that amount of data; it waits a little, here at around 1.3 kB and 2.6 kB respectively. Here is a representation of the data types from a hex editor:
byte = 166
signed byte = -90
word = 40,358
signed word = -25,178
double word = 3,444,481,446
signed double word = -850,485,850
quad = 3,226,549,723,063,033,254
signed quad = 3,226,549,723,063,033,254
float = -216652384.
double = 5.51490063721e-093
word motorola = 42,653
double word motorola = 2,795,327,181
quad motorola = 12,005,838,827,773,085,484
Here's another spectrum to confirm the behavior of the blue (RGB) value.
Please note that I needed to go through all of this in order to clarify the situation and the programming matter that I'm in pursuit of. This by itself makes my question NOT off-topic so I'd be glad if it doesn't get marked as such. Thank you.
For an image with LSB enhancement applied, I cannot think of a way to reverse it to its original state, because there is no clue about the original RGB values: they are set to either 255 or 0 depending on their least significant bit. The only other possibility I see here is that this is some protocol involving quantum steganography.
Matlab and some steganalysis techniques could be the key to your issue though.
Here's a Java chi-square class for some statistical analysis:
private long[] pov = new long[256];   // PoV = pairs-of-values histogram, one counter per byte value
and three methods:
// Expected frequency of each pair of values: the average of the
// counts of the two values forming the pair.
public double[] getExpected() {
    double[] result = new double[pov.length / 2];
    for (int i = 0; i < result.length; i++) {
        result[i] = (pov[2 * i] + pov[2 * i + 1]) / 2.0;  // 2.0 avoids integer division
    }
    return result;
}

// Count one occurrence of byte value i.
public void incPov(int i) {
    pov[i]++;
}

// Observed frequency: the count of the odd member of each pair.
public long[] getPov() {
    long[] result = new long[pov.length / 2];
    for (int i = 0; i < result.length; i++) {
        result[i] = pov[2 * i + 1];
    }
    return result;
}
Or try some bitwise shift operations, as in:
int pRGB = image.getRGB(x, y);       // BufferedImage.getRGB packs 0xAARRGGBB
int alpha = (pRGB >> 24) & 0xFF;
int red = (pRGB >> 16) & 0xFF;
int green = (pRGB >> 8) & 0xFF;
int blue = pRGB & 0xFF;
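
If the goal is just to recover the bit stream from the enhanced channels (the original high-order bits are gone for good, as noted above), a minimal C++ sketch over a raw 8-bit RGB buffer might look like this; the buffer layout and the MSB-first packing are my assumptions:

#include <cstdint>
#include <vector>

// Collapse an LSB-enhanced image (each channel is 0 or 255) back into a
// bit stream: 255 -> 1, 0 -> 0, channel by channel in R, G, B order.
// `rgb` is assumed to hold raw 8-bit RGB data from any image decoder.
std::vector<uint8_t> extractBits(const std::vector<uint8_t>& rgb)
{
    std::vector<uint8_t> bytes((rgb.size() + 7) / 8, 0);
    for (std::size_t i = 0; i < rgb.size(); ++i)
    {
        uint8_t bit = rgb[i] >= 128 ? 1 : 0;   // 255 -> 1, 0 -> 0
        bytes[i / 8] |= bit << (7 - i % 8);    // pack MSB-first
    }
    return bytes;                              // candidate hidden payload
}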