I've had to deal with bitfields in structs recently, and came across a behaviour I can't explain.
According to the individual sizeof values, the following struct should be 9 bytes. But sizeof of the whole struct yields 10 bytes.
The following program yields "10; 1 1 2 1 2 1 1 =9"
#include <cstdint>
#include <iostream>
int main(){
struct{
uint8_t doubleoscillator;
struct{
char monophonic : 1;
char hold : 1;
char padding : 6;
} test;
int16_t osc1_multisound; //int
int8_t osc1_octave; // -2..1
int16_t osc2_multisound; //int
int8_t osc2_octave; // -2..1
int8_t intervall;
}osc;
std::cout << sizeof(osc) << "; ";
int a[7];
a[0] = sizeof(osc.doubleoscillator);
a[1] = sizeof(osc.test);
a[2] = sizeof(osc.osc1_multisound);
a[3] = sizeof(osc.osc1_octave);
a[4] = sizeof(osc.osc2_multisound);
a[5] = sizeof(osc.osc2_octave);
a[6] = sizeof(osc.intervall);
int total = 0;
for(int i=0;i<7;i++){
std::cout << a[i] << " ";
total += a[i];
}
std::cout << " = " << total << std::endl;
return 0;
}
Why does the sum of the individual sizeof() values of the struct's members differ from sizeof() of the osc struct itself?
Primarily for performance reasons, the compiler may insert padding before a struct member so that the member is properly aligned in the structure's memory layout. Thus osc2_multisound likely has a padding byte before it: osc1_octave sits at offset 4, so the next free offset is 5, and osc2_multisound is moved to offset 6, a multiple of 2 (because int16_t has an alignment of 2).
Additionally, the structure's total size is rounded up to a multiple of its strictest alignment requirement (i.e. the largest alignment of any member). This ensures that, e.g., every element of an array of that type is properly aligned.
The alignment of a type can be checked at compile-time via alignof(T) where T is the type.
The increased size is unavoidable in this case (the members total 9 bytes, and the size must be a multiple of 2), but the common advice for cutting down on padding bytes is to declare struct members in order of decreasing alignment. Because alignments are powers of two and every type's size is a multiple of its alignment, each member then starts at an offset that already satisfies its (equal or weaker) alignment, so no padding is needed between members. If any padding is added, it will only be to pad the total size of the structure, rather than (wasted) padding between fields.
Nowadays the reason for alignment is primarily efficiency. On hardware that supports unaligned access, reading an unaligned value can be noticeably slower (sometimes roughly twice as slow) because the access may span two aligned memory blocks, both of which must be read before the needed bytes are extracted. However, there is also hardware that simply will not work if you try to read or write unaligned memory; such hardware typically raises a hardware exception in that case.
Related
I'm using Visual Studio 2019, and I noticed that in Debug builds the local variables are placed very far apart from one another. I looked through Project Properties and tried searching online but could not find anything. I ran the code below in both Debug and Release mode; here are the respective outputs.
#include <iostream>
int main() {
int a = 3;
int b = 5;
int c = 8;
int d[5] = { 10,10,10,10,10 };
int e = 14;
std::cout << "a: " << &a
<< "\nb: " << &b
<< "\nc: " << &c
<< "\nd_start: " << &d[0]
<< "\nd_end: " << &d[4] + 1
<< "\ne: " << &e
<< std::endl;
}
As you can see below, in the Release build the variables are allocated as you would expect (one after the other) with no wasted memory in between. Even the last variable, e, is slotted in between c and d.
// Release_x64 Build Output
a: 0000003893EFFC40
b: 0000003893EFFC44
c: 0000003893EFFC48
d_start: 0000003893EFFC50
d_end: 0000003893EFFC64
e: 0000003893EFFC4C // e is optimized in between c and d
Below is the output that confuses me. Here you can see that a and b are allocated 32 bytes apart, so there are 28 bytes of unused, uninitialized memory between them. The same thing happens for the other variables, except around the array int d[5]: there are 32 unused bytes between the end of c and the start of d, but only 24 between the end of d and e.
// Debug_x64 Build Output
a: 00000086D7EFF3F4
b: 00000086D7EFF414
c: 00000086D7EFF434
d_start: 00000086D7EFF458
d_end: 00000086D7EFF46C
e: 00000086D7EFF484
My question is: why is this happening? Why does MSVC allocate these variables so far apart from one another, and what determines how much space separates them, such that the spacing is different for arrays?
The debug version of the allocator allocates storage differently than the release version. In particular, the debug version adds some space at the beginning and end of each block of storage, so its allocation patterns are somewhat different.
The debug allocator also checks the storage at the start and end of the block it allocated to see if it has been damaged in any way.
Storage is allocated in quantized chunks, where the quantum is unspecified but is something like 16 or 32 bytes. Thus, if you allocate a DWORD array of six elements (size = 6 * sizeof(DWORD) bytes = 24 bytes), the allocator will actually deliver 32 bytes (one 32-byte quantum or two 16-byte quanta). So if you write element [6] (the seventh element), you overwrite some of the "dead space" and the error is not detected. But in the release version the quantum might be 8 bytes; then three 8-byte quanta would be allocated, and writing element [6] would overwrite part of the storage allocator's data structure that belongs to the next chunk. After that it is all downhill. The error might not even show up until the program exits! You can construct similar "boundary condition" situations for any quantum size.
Even where the quantum size is the same for both versions of the allocator, the debug version adds hidden space for its own purposes, so you will get different storage allocation patterns in debug and release mode.
I get a bus error in the following code:
char* mem_original;
int int_var = 987411;
mem_original = new char [250];
memcpy(&mem_original[250-sizeof(int)], &int_var, sizeof(int));
...
const unsigned char* mem_u_const = (unsigned char*)mem_original;
...
const unsigned char *location = mem_u_const + 250 - sizeof(int);
std::cout << "sizeof(int) = " << sizeof(int) << std::endl;//it's printed out as 4
std::cout << "byte 0 = " << int(*location) << std::endl;
std::cout << "byte 1 = " << int(*(location+1)) << std::endl;
std::cout << "byte 2 = " << int(*(location+2)) << std::endl;
std::cout << "byte 3 = " << int(*(location+3)) << std::endl;
int original_var = *((const int*)location);
std::cout << "original_var = " << original_var << std::endl;
That works a few times, printing out:
sizeof(int) = 4
byte 0 = 0
byte 1 = 15
byte 2 = 17
byte 3 = 19
original_var = 987411
And then it fails with:
sizeof(int) = 4
byte 0 = 0
byte 1 = 15
byte 2 = 17
byte 3 = 19
Bus Error
It is built and run on Solaris (C++ 5.12).
The same code works fine on Linux (gcc 4.12) and Windows (msvc-9.0).
We can see:
memory was allocated on the heap by new[].
memory is accessible (we can read it byte by byte)
memory contains exactly what there should be, not corrupted.
So what could be the reason for the bus error? Where should I look?
UPDATE:
If I instead memcpy(...) from location into original_var at the end, it works. But what is the problem with *((const int*)location)?
This is a common issue for developers with no experience on hardware that has alignment restrictions - such as SPARC. x86 hardware is very forgiving of misaligned access, albeit with performance impacts. Other types of hardware? SIGBUS.
This line of code:
int original_var = *((const int*)location);
invokes undefined behavior. You're taking an unsigned char * and interpreting what it points to as an int. You can't do that safely. Period. It's undefined behavior - for the very reason you're experiencing.
You're violating the strict aliasing rule. See What is the strict aliasing rule? Put simply, you can't refer to an object of one type as another type. A char * may be used to inspect the bytes of an int, but not the other way around: the storage here holds char objects, and reading them through an int lvalue is not allowed.
Oracle's Solaris Studio compilers actually provide a command-line option that lets you get away with that on SPARC hardware: -xmemalign=1i (see https://docs.oracle.com/cd/E19205-01/819-5265/bjavc/index.html). To be fair to GCC, though: without that option, the cast you do in your code will still SIGBUS under the Studio compiler too.
Or, as you've already noted, you can use memcpy() to copy bytes around no matter what they are - as long as you know the source object is safe to copy into the target object - yes, there are cases when that's not true.
I get the following warning when I compile your code:
main.cpp:19:26: warning: cast from 'const unsigned char *' to 'const int *' increases required alignment from 1 to 4 [-Wcast-align]
int original_var = *((const int*)location);
^~~~~~~~~~~~~~~~~~~~
This seems to be the cause of the bus error, because improperly aligned access can cause a bus error.
Although I don’t have access to a SPARC right now to test this, I’m pretty sure from my experiences on that platform that this line is your problem:
const unsigned char *location = mem_u_const + 250 - sizeof(int);
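The mem_u_const block was originally allocated by new for an array of characters, so its start is suitably aligned for an int. Since sizeof(unsigned char) is 1 and sizeof(int) is 4, you are adding 246 bytes to that start. 246 is not a multiple of 4 (246 = 4 * 61 + 2), so location is misaligned for an int.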
On SPARC, the CPU can only read 4-byte words if they are aligned to 4-byte boundaries. Your attempt to read a misaligned word is what causes the bus error.
I recommend allocating a struct with an array of unsigned char followed by an int, rather than a bunch of pointer math and casts like the one that caused this bug.
I am trying to store a very big number in a uint64_t like this:
#include <cstdint>
#include <iostream>
using namespace std;
int main(int argc, char** argv) {
uint64_t ml = sizeof(void*)*(1<<63);
cout << "ml=" << ml << "\n";
const char* r;
const char* mR=r+ml;
return 0;
}
But I don't understand why the output is 0, even though I am storing the value in a uint64_t.
EDIT: char* mR is my memory buffer, and I can grow it to at most ml bytes. I want to make use of a machine with 64GB of RAM. So, how far should I increment mR, given that I want to use all the available RAM? That is, what value should I set ml to?
Try
uint64_t ml = ((uint64_t)1)<<63;
or just
uint64_t ml = 0x8000000000000000;
Plain 1 << 63 is computed in int, and shifting by at least the width of the type is undefined behavior. In your case, it may result in 0.
Please note that if you multiply 0x8000000000000000 by sizeof(void*), you'll likely get overflow too.
If you want to allocate 64G of memory, that would be:
char* buffer = new char[64ULL * 1024 * 1024 * 1024];
or simply:
char* buffer = new char[1ULL << 36];
Note that 64G is 2^36 bytes, which is far, far less than the 2^63 number that you're trying to use. Although, typically when you use that much memory, it's because your program organically uses it through various operations... not by just allocating it in one large chunk.
Just use:
uint64_t ml = sizeof(void*) * (1ULL << 63);
Because, as AlexD already said, 1 << 63 is computed in int, so it is undefined behavior and in practice often yields 0.
Even after you correct the shift to (uint64_t)1 << 63, if sizeof(void*) is any even number (and it assuredly is), then your product will be divisible by 2^64, and thus be zero when stored in a uint64_t.
I could not fully understand the consequences of what I read here: Casting an int pointer to a char ptr and vice versa
In short, would this work?
void set4Bytes(unsigned char* buffer) {
const uint32_t MASK = 0xffffffff;
if ((uintmax_t)buffer % 4) {//misaligned
for (int i = 0; i < 4; i++) {
buffer[i] = 0xff;
}
} else {//4-byte alignment
*((uint32_t*) buffer) = MASK;
}
}
Edit
There was a long discussion (it was in the comments, which mysteriously got deleted) about what type the pointer should be cast to in order to check the alignment. The subject is now addressed here.
This conversion is safe if you are filling the same value into all 4 bytes. If byte order matters, then this conversion is not safe.
That is because when you use an integer to fill 4 bytes at a time, all 4 bytes get written, but the order in which the integer's bytes land in memory depends on endianness.
No, it won't work in every case. Aside from endianness, which may or may not be an issue, you assume that the alignment of uint32_t is 4. But this quantity is implementation-defined (C11 Draft N1570 Section 6.2.8). You can use the _Alignof operator to get the alignment in a portable way.
Second, the effective type (ibid. Sec. 6.5) of the location pointed to by buffer may not be compatible with uint32_t (e.g. if buffer points to an unsigned char array). In that case you break strict aliasing rules once you try reading through the array itself or through a pointer of different type.
Assuming that the pointer actually points to an array of unsigned char, the following code will work
#include <stddef.h>
#include <stdint.h>
typedef union { unsigned char chr[sizeof(uint32_t)]; uint32_t u32; } conv_t;
void set4Bytes(unsigned char* buffer) {
const uint32_t MASK = 0xffffffffU;
if ((uintptr_t)buffer % _Alignof(uint32_t)) {// misaligned
for (size_t i = 0; i < sizeof(uint32_t); i++) {
buffer[i] = 0xffU;
}
} else { // correct alignment
conv_t *cnv = (conv_t *) buffer;
cnv->u32 = MASK;
}
}
This code might be of help to you. It shows a 32-bit number being built by assigning its contents a byte at a time, forcing misalignment. It compiles and works on my machine.
#include<stdint.h>
#include<stdio.h>
#include<inttypes.h>
#include<stdlib.h>
int main () {
uint32_t *data = (uint32_t*)malloc(sizeof(uint32_t)*2);
char *buf = (char*)data;
uintptr_t addr = (uintptr_t)buf;
int i,j;
i = !(addr%4) ? 1 : 0;
uint32_t x = (1<<6)-1;
for( j=0;j<4;j++ ) buf[i+j] = ((char*)&x)[j];
printf("%" PRIu32 "\n",*((uint32_t*) (addr+i)) ); // misaligned read: tolerated on x86, may trap elsewhere
free(data);
}
As mentioned by @Learner, endianness must be taken into account in general. In the code above the bytes of x are copied in their native order, so the value itself survives on either byte order; what makes the code non-portable is the misaligned read, which can fail on hardware with strict alignment requirements.
Note that my compiler reports the error "cast from ‘char*’ to ‘unsigned int’ loses precision [-fpermissive]" when casting a char* to an unsigned int, as done in the original post. This post explains that uintptr_t should be used instead.
In addition to the endian issue, which has already been mentioned here:
CHAR_BIT - the number of bits per char - should also be considered.
It is 8 on most platforms, where for (int i=0; i<4; i++) should work fine.
A safer way of doing it would be for (int i=0; i<sizeof(uint32_t); i++).
Alternatively, you can include <limits.h> and use for (int i=0; i<32/CHAR_BIT; i++).
Use reinterpret_cast<>() if you want to ensure the underlying data does not "change shape".
As Learner has mentioned, when you store data in machine memory, endianness becomes a factor. If you know how the data is stored correctly in memory (correct endianness) and you are specifically testing its layout as an alternate representation, then you would want to use reinterpret_cast<>() to test that memory, as a specific type, without modifying the original storage.
Below, I've modified your example to use reinterpret_cast<>():
#include <cstdint>
void set4Bytes(unsigned char* buffer) {
const std::uint32_t MASK = 0xffffffff;
// test the address itself (not the pointed-to value) for alignment
if (reinterpret_cast<std::uintptr_t>(buffer) % alignof(std::uint32_t)) {//misaligned
for (int i = 0; i < 4; i++) {
buffer[i] = 0xff;
}
} else {//4-byte alignment
*reinterpret_cast<std::uint32_t *>(buffer) = MASK;
}
}
It should also be noted that your function sets the first 4 bytes (32 bits) of the buffer to 0xFF, regardless of which branch it takes.
Your code works fine on any architecture with 32-bit or wider integers. There is no issue with byte ordering since all your source bytes are 0xFF.
On x86 or x64 machines, the extra work needed to handle possibly unaligned RAM accesses is done by the CPU and is transparent to the programmer (since Pentium II), at some performance cost per access. So, if you are just setting the first four bytes of a buffer a few times, you can simplify your function:
void set4Bytes(unsigned char* buffer) {
const uint32_t MASK = 0xffffffff;
*((uint32_t *)buffer) = MASK;
}
Some readings:
A Linux kernel doc about UNALIGNED MEMORY ACCESSES
Intel Architecture Optimization Manual, section 3.4
Windows Data Alignment on IPF, x86, and x64
A Practical 'Aligned vs. unaligned memory access', by Alexander Sandler
I have a long list of numbers between 0 and 67600. I want to store them using an array that is 67600 elements long: an element is set to 1 if the number is in the set and to 0 if it is not. That is, for each number I need only 1 bit of information to record its presence. Is there any hack in C/C++ that helps me achieve this?
In C++ you can use std::vector<bool> if the size is dynamic (it's a special case of std::vector, see this) otherwise there is std::bitset (prefer std::bitset if possible.) There is also boost::dynamic_bitset if you need to set/change the size at runtime. You can find info on it here, it is pretty cool!
In C (and C++) you can implement this manually with bitwise operators. A good summary of common operations is here. One thing I want to mention is that it's a good idea to use unsigned integers when you are doing bit operations: left-shifting a negative value is undefined behavior (before C++20), and right-shifting one gives an implementation-defined result. You will need to allocate an array of some integral type like uint32_t. If you want to store N bits, it will take (N + 31) / 32 of these uint32_ts (N/32 rounded up). Bit i is stored in bit i % 32 of uint32_t number i / 32. You may want to use a differently sized integral type depending on your architecture and other constraints. Note: prefer using an existing implementation (e.g. as described in the first paragraph for C++; search Google for C solutions) over rolling your own (unless you specifically want to, in which case I suggest learning more about binary/bit manipulation elsewhere before tackling this). This kind of thing has been done to death and there are "good" solutions.
There are a number of tricks that may consume only one bit per value, e.g. arrays of bitfields (applicable in C as well), but whether less space gets used is up to the compiler. See this link.
Please note that whatever you do, you will almost surely never be able to use exactly N bits to store N bits of information - your computer very likely can't allocate less than 8 bits: if you want 7 bits you'll have to waste 1 bit, and if you want 9 you will have to take 16 bits and waste 7 of them. Even if your computer (CPU + RAM etc.) could "operate" on single bits, if you're running in an OS with malloc/new it would not be sane for your allocator to track data to such a small precision due to overhead. That last qualification was pretty silly - you won't find an architecture in use that allows you to operate on less than 8 bits at a time I imagine :)
You should use std::bitset.
std::bitset functions like an array of bool (actually like std::array, since it copies by value), but only uses 1 bit of storage for each element.
Another option is vector<bool>, which I don't recommend because:
It uses slower pointer indirection and heap memory to enable resizing, which you don't need.
That type is often maligned by standards-purists because it claims to be a standard container, but fails to adhere to the definition of a standard container*.
*For example, a standard-conforming function could expect &container.front() to produce a pointer to the first element of any container type, which fails with std::vector<bool>. Perhaps a nitpick for your usage case, but still worth knowing about.
There is in fact! std::vector<bool> has a specialization for this: http://en.cppreference.com/w/cpp/container/vector_bool
See the documentation: the standard permits (but does not require) an implementation to store each element as a single bit, for space efficiency.
Edit: as somebody else said, std::bitset is also available: http://en.cppreference.com/w/cpp/utility/bitset
If you want to write it in C, use an array of char that is 67601 bits in length (67601 / 8 rounded up = 8451 chars, assuming 8-bit chars) and then turn the appropriate bit on or off for each value.
Others have given the right idea. Here's my own implementation of a bitsarr, or 'array' of bits. An unsigned char is one byte, so it's essentially an array of unsigned chars that stores information in individual bits. I added the option of storing TWO or FOUR bit values in addition to ONE bit values, because those both divide 8 (the number of bits in a byte), and would be useful if you want to store a huge number of integers that range from 0-3 or 0-15.
When setting and getting, the math is done in the functions, so you can just give it an index as if it were a normal array--it knows where to look.
Also, it's the user's responsibility to not pass a value to set that's too large, or it will screw up other values. It could be modified so that overflow loops back around to 0, but that would just make it more convoluted, so I decided to trust myself.
#include<stdio.h>
#include <stdlib.h>
#define BYTE 8
typedef enum {ONE=1, TWO=2, FOUR=4} numbits;
typedef struct bitsarr{
unsigned char* buckets;
numbits n;
} bitsarr;
bitsarr new_bitsarr(int size, numbits n)
{
int b = sizeof(unsigned char)*BYTE;
int numbuckets = (size*n + b - 1)/b;
bitsarr ret;
ret.buckets = calloc(numbuckets, sizeof *ret.buckets); // one unsigned char per bucket, zeroed so unset elements read as 0
ret.n = n;
return ret;
}
void bitsarr_delete(bitsarr xp)
{
free(xp.buckets);
}
void bitsarr_set(bitsarr *xp, int index, int value)
{
int buckdex, innerdex;
buckdex = index/(BYTE/xp->n);
innerdex = index%(BYTE/xp->n);
xp->buckets[buckdex] = (value << innerdex*xp->n) | ((~(((1 << xp->n) - 1) << innerdex*xp->n)) & xp->buckets[buckdex]);
//longer version
/*unsigned int width, width_in_place, zeros, old, newbits, new;
width = (1 << xp->n) - 1;
width_in_place = width << innerdex*xp->n;
zeros = ~width_in_place;
old = xp->buckets[buckdex];
old = old & zeros;
newbits = value << innerdex*xp->n;
new = newbits | old;
xp->buckets[buckdex] = new; */
}
int bitsarr_get(bitsarr *xp, int index)
{
int buckdex, innerdex;
buckdex = index/(BYTE/xp->n);
innerdex = index%(BYTE/xp->n);
return ((((1 << xp->n) - 1) << innerdex*xp->n) & (xp->buckets[buckdex])) >> innerdex*xp->n;
//longer version
/*unsigned int width = (1 << xp->n) - 1;
unsigned int width_in_place = width << innerdex*xp->n;
unsigned int val = xp->buckets[buckdex];
unsigned int retshifted = width_in_place & val;
unsigned int ret = retshifted >> innerdex*xp->n;
return ret; */
}
int main()
{
bitsarr x = new_bitsarr(100, FOUR);
for(int i = 0; i<16; i++)
bitsarr_set(&x, i, i);
for(int i = 0; i<16; i++)
printf("%d\n", bitsarr_get(&x, i));
for(int i = 0; i<16; i++)
bitsarr_set(&x, i, 15-i);
for(int i = 0; i<16; i++)
printf("%d\n", bitsarr_get(&x, i));
bitsarr_delete(x);
}