Let's say I have a struct of type A that is POD, and a void pointer p.
Can I safely cast p to a pointer to A, then read/write the A structure pointed to by p?
Is it guaranteed to work every time, even if the alignment of A is 8 and p points to an odd memory address (worst case)?
I am not concerned about performance issues; I just want to know if it's supposed to work according to the standard and/or if it's portable enough on mainstream platforms.
Edit: I'm also interested in knowing whether there's any difference between x86 and 64-bit architectures.
Thanks!
Yes, you can cast a pointer to class A to a pointer to class B.
Essentially, you are telling the compiler to use class B as a stencil when interpreting the memory location of the class A variable.
Generally, this is not safe, because the values at those locations will have different meanings and positions.
Usually, this kind of cast is used for interpreting a buffer of uint8_t as a structured object. Another use case is unions.
So whether it is "safe" depends on the context in which the operation is used.
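To make that concrete, here is a minimal sketch (A here is a hypothetical POD struct, not one from the question): the direct cast is only valid when p is suitably aligned, while copying the bytes with memcpy stays well-defined even for a misaligned source.

#include <cstring>  // std::memcpy

struct A {       // hypothetical POD struct for illustration
    int x;
    double y;    // alignof(A) is typically 8 here
};

void use(void* p) {
    // Direct cast: fine if p really points to a properly aligned A.
    A* a = static_cast<A*>(p);
    a->x = 42;                         // undefined behavior if p is misaligned

    // Safer for possibly misaligned buffers: copy the bytes out first.
    A copy;
    std::memcpy(&copy, p, sizeof(A));  // well-defined for any readable bytes
    copy.x = 42;
    std::memcpy(p, &copy, sizeof(A));  // write the modified bytes back
}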
Edit 1: Alignment
Most modern processors can handle misaligned accesses, but the processor may require extra operations to fetch the data, which slows down performance.
For example, with a 16-bit processor, a 16-bit value aligned on an odd address will require two fetches (since it only fetches at even addresses):
+----------+----------------------------+
| address | value |
+----------+----------------------------+
| 16 | N/A |
+----------+----------------------------+
| 17 | 1st 8 bits of 16-bit value |
+----------+----------------------------+
| 18 | 2nd 8 bits of 16-bit value |
+----------+----------------------------+
| 19 | N/A |
+----------+----------------------------+
Since the processor only fetches values at even addresses, fetching the value requires 2 fetches. The first fetch, at address 16, obtains the first 8 bits of the 16-bit variable (stored at address 17). The second fetch, at address 18, obtains the second 8 bits.
The processor also has to perform some operations to get the bits in the preferred order in one "word". These operations will also negatively affect the performance.
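If you need to detect the misaligned worst case before dereferencing, one common approach is to inspect the pointer's numeric value. A sketch (the conversion to uintptr_t is implementation-defined, though it behaves as expected on mainstream x86/x86-64 platforms):

#include <cstddef>
#include <cstdint>

// True if p is suitably aligned for a type with the given alignment,
// e.g. is_aligned(p, alignof(A)) before casting p to A*.
bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}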
What is stack alignment?
Why is it used?
Can it be controlled by compiler settings?
The details of this question are taken from a problem faced when trying to use ffmpeg libraries with msvc, however what I'm really interested in is an explanation of what is "stack alignment".
The Details:
When running my msvc compiled program which links to avcodec I get the
following error: "Compiler did not align stack variables. Libavcodec has
been miscompiled", followed by a crash in avcodec.dll.
avcodec.dll was not compiled with msvc, so I'm unable to see what is going on inside.
When running ffmpeg.exe and using the same avcodec.dll everything works well.
ffmpeg.exe was not compiled with msvc; it was compiled with gcc / mingw (same as avcodec.dll)
Thanks,
Dan
Alignment of variables in memory (a short history).
In the past, computers had an 8-bit data bus. This meant that 8 bits of information could be processed each clock cycle, which was fine then.
Then came 16-bit computers. For backward compatibility and other reasons, the 8-bit byte was kept and the 16-bit word was introduced. Each word was 2 bytes, and each clock cycle 16 bits of information could be processed. But this posed a small problem.
Let's look at a memory map:
+----+
|0000|
|0001|
+----+
|0002|
|0003|
+----+
|0004|
|0005|
+----+
| .. |
At each address there is a byte which can be accessed individually.
But words can only be fetched at even addresses. So if we read a word at 0000, we read the bytes at 0000 and 0001. But if we want to read the word at position 0001, we need two read accesses. First 0000,0001 and then 0002,0003 and we only keep 0001,0002.
Of course, this took extra time, and that was not appreciated. That's why alignment was invented: store word variables at word boundaries and byte variables at byte boundaries.
For example, if we have a structure with a byte field (B) and a word field (W) (and a very naive compiler), we get the following:
+----+
|0000| B
|0001| W
+----+
|0002| W
|0003|
+----+
Which is not fun. But when using word alignment we find:
+----+
|0000| B
|0001| -
+----+
|0002| W
|0003| W
+----+
Here memory is sacrificed for access speed.
You can imagine that with double words (4 bytes) or quad words (8 bytes) this is even more important. That's why most modern compilers let you choose which alignment to use when compiling the program.
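You can observe the padding a compiler inserts directly. A minimal sketch (the values printed are typical for mainstream platforms, not guaranteed by the standard):

#include <cstdio>

struct Naive {   // a byte field followed by a word field
    char b;      // 1 byte
    short w;     // 2 bytes, placed on a 2-byte boundary
};               // sizeof(Naive) is typically 4: one padding byte after b

int main() {
    std::printf("sizeof(Naive) = %zu, alignof(Naive) = %zu\n",
                sizeof(Naive), alignof(Naive));
}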
Some CPU architectures require specific alignment of various datatypes, and will throw exceptions if you don't honor this rule. In standard mode, x86 doesn't require this for the basic data types, but can suffer performance penalties (check www.agner.org for low-level optimization tips).
However, the SSE instruction set (often used for high-performance audio/video processing) has strict alignment requirements, and will throw exceptions if you attempt to use it on unaligned data (unless you use the unaligned versions, which on some processors are much slower).
Your issue is probably that one compiler expects the caller to keep the stack aligned, while the other expects the callee to align the stack when necessary.
EDIT: as for why the exception happens, a routine in the DLL probably wants to use SSE instructions on some temporary stack data, and fails because the two different compilers don't agree on calling conventions.
IIRC, stack alignment is when variables are placed on the stack "aligned" to a particular number of bytes. So if you are using 16-bit stack alignment, each variable on the stack starts at a byte that is a multiple of 2 bytes from the current stack pointer within a function.
This means that if you use a variable that is < 2 bytes, such as a char (1 byte), there will be 8 bits of unused "padding" between it and the next variable. This allows certain optimisations based on assumptions about variable locations.
When calling functions, one method of passing arguments to the next function is to place them on the stack (as opposed to placing them directly into registers). Whether or not alignment is being used here is important, as the calling function places the variables on the stack, to be read off by the called function using offsets. If the calling function aligns the variables and the called function expects them to be non-aligned, then the called function won't be able to find them.
It seems that the msvc compiled code is disagreeing about variable alignment. Try compiling with all optimisations turned off.
As far as I know, compilers don't typically align variables that are on the stack. The library may be depending on some set of compiler options that isn't supported on your compiler. The normal fix is to declare the variables that need to be aligned as static, but if you go about doing this in other people's code, you'll want to be sure that the variables in question are initialized later in the function rather than in the declaration.
// Some compilers won't align this as it's on the stack...
int __declspec(align(32)) needsToBe32Aligned = 0;
// Change to
static int __declspec(align(32)) needsToBe32Aligned;
needsToBe32Aligned = 0;
Alternately, find a compiler switch that aligns the variables on the stack. Obviously the "__declspec" align syntax I've used here may not be what your compiler uses.
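If your compiler supports C++11, alignas is a portable spelling of the same request (a sketch; whether large alignments are honored for stack variables is still compiler-specific):

void f() {
    // Request 32-byte alignment for a stack variable, portably (C++11).
    alignas(32) int needsToBe32Aligned = 0;
    // gcc/clang spelling of the same thing as __declspec(align(32)):
    // int needsToBe32Aligned __attribute__((aligned(32))) = 0;
    (void)needsToBe32Aligned;  // silence unused-variable warnings
}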
I am a university student currently studying computer science and programming. While reading chapter 2 of C++ Primer by Stanley B. Lippman, a question popped into my mind: if computer memory is divided into tiny storage locations called bytes (8 bits), each byte of memory is assigned a unique address, and an integer variable uses up 4 bytes of memory, shouldn't my console, when using the address-of operator, print out 4 unique addresses instead of 1?
I doubt that the textbook is incorrect, so there must be a flaw in my understanding of computer memory. As a result, I would like clarification on this question. Thanks in advance people :)
shouldn't my console, when using the address-of operator print out 4 unique addresses instead of 1?
No.
The address of an object is the address of its starting byte. A 4-byte int has a unique address, the address of its first byte, but it occupies the next three bytes as well. Those next three bytes have different addresses, but they are not the address of the int.
Each variable is located in memory somewhere, so each variable gets an address you can get with the address-of operator.
That each byte in a multi-byte variable also have their addresses doesn't matter, the address-of operator gives you a pointer to the variable.
Some "graphics" to hopefully explain it...
Let's say we have an int variable named i, and that the type int takes four bytes (32 bits; this is typical for int). Then you have something like
+---+---+---+---+
| | | | |
+---+---+---+---+
Some place is reserved for the four bytes; where exactly doesn't matter, the compiler will handle all that for you.
Now if you use the address-of operator to get a pointer to the variable i i.e. you do &i, then you have something like
+---+---+---+---+
| | | | |
+---+---+---+---+
^
|
&i
The expression &i gives you the memory position where the byte sequence of the variable begins. It can't possibly give you multiple pointers, one for each byte; that's impossible, and not needed either.
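To see both views at once, here is a minimal sketch: &i yields a single address, while casting to unsigned char* lets you visit each byte's own address:

#include <cstdio>

int main() {
    int i = 42;
    std::printf("&i = %p\n", static_cast<void*>(&i));  // one address: the first byte

    // Each of the int's bytes still has its own address:
    unsigned char* bytes = reinterpret_cast<unsigned char*>(&i);
    for (unsigned j = 0; j < sizeof i; ++j)
        std::printf("byte %u at %p = 0x%02x\n",
                    j, static_cast<void*>(bytes + j), bytes[j]);
}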
Yes, an integer type requires four bytes. All four bytes are allocated as one block of memory for your integer, and that block has a unique address. This unique address is simply the address of the block's first byte.
In http://www.parashift.com/c++-faq-lite/intrinsic-types.html#faq-26.6, it is written that
"Another valid approach would be to define a "byte" as 9 bits, and simulate a char* by two words of memory: the first could point to the 36-bit word, the second could be a bit-offset within that word. In that case, the C++ compiler would need to add extra instructions when compiling code using char* pointers."
I couldn't understand what is meant by "simulating char* by two words" and the rest of the quote.
Could somebody please explain it by giving an example ?
I think this is what they were describing:
The PDP-10 referenced in the second paragraph had 36-bit words and was unable to address anything inside of those words. The following text is a description of one way that this problem could have been solved while fitting within the restrictions of the C++ language spec (that are included in the first paragraph).
Let's assume that you want to make 9-bit-long bytes (for some reason). By the spec, a char* must be able to address individual bytes. The PDP-10 can't do this, because it can't address anything smaller than a 36-bit word.
One way around the PDP-10's limitations would be to simulate a char* using two words of memory. The first word would be a pointer to the 36-bit word containing the char (this is normally as precise as the PDP-10's pointers allow). The second word would indicate an offset (in bits) within that word. Now, the char* can access any byte in the system and complies with the C++ spec's limitations.
ASCII-art visual aid:
| Byte 1 | Byte 2 | Byte 3 | Byte 4 | Byte 5 | Byte 6 | Byte 7 | Byte 8 |
-------------------------------------------------------------------------
| Word 1 | Word 2 |
| (Address) | (Offset) |
-------------------------------------------------------------------------
Say you had a char* with word1 = 0x0100 and word2 = 0x12. This would point to the 18th bit (the start of the third byte) of the 256th word of memory.
If this technique were really used to generate a conforming C++ implementation on the PDP-10, then the C++ compiler would have to do some extra work juggling the extra bits required by this rather funky internal format.
The whole point of that article is to illustrate that a char isn't always 8 bits. It is at least 8 bits, but there is no defined maximum. The internal representation of data types is dependent on the platform architecture and may be different than what you expect.
Since the C++ spec says that a char* must point to individual bytes, and the PDP-6/10 does not allow addressing individual bytes within a word, you have a problem with char* (which is a byte pointer) on the PDP-6/10.
So one workaround is: define a byte as 9 bits; then you essentially have 4 bytes in a word (4 * 9 = 36 bits = 1 word).
You still can't have char* point to individual bytes on the PDP-6/10, so instead char* is made up of two 36-bit words. The lower word would be the actual address, and the upper word would be some byte-mask magic that the C++ compiler could use to point to the right 9 bits in the lower word.
In this case,
sizeof(int*) (36 bits) is different from sizeof(char*) (72 bits).
It's just a contrived example that shows how the spec doesn't constrain primitives to specific bit/byte sizes.
data: [char1|char2|char3|char4]
To access char1:
ptrToChar = &data
index = 0
To access char2:
ptrToChar = &data
index = 9
To access char3:
ptrToChar = &data
index = 18
...
then to access a char, you would:
(*ptrToChar >> index) & 0x001ff
but ptrToChar and index would be saved in some sort of structure that the compiler creates so they would be associated with each other.
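To make that concrete, here is a hedged sketch in modern C++ (not actual PDP-10 code; FatCharPtr, load_char, and next are made-up names) of such a two-word char pointer, along with the extra shift-and-mask work the compiler would have to emit for every access:

#include <cstdint>

// Simulated char* on a word-addressed machine: two "words" of state.
struct FatCharPtr {
    const std::uint64_t* word;  // stand-in for a 36-bit word address
    unsigned bit_offset;        // 0, 9, 18, or 27 within the word
};

// The extra instructions the compiler would add for every char load:
unsigned load_char(FatCharPtr p) {
    return (*p.word >> p.bit_offset) & 0x1ff;  // extract one 9-bit byte
}

// Incrementing moves to the next 9-bit byte, spilling into the next word:
FatCharPtr next(FatCharPtr p) {
    p.bit_offset += 9;
    if (p.bit_offset >= 36) { p.bit_offset = 0; ++p.word; }
    return p;
}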
Actually, the PDP-10 can address (load, store) 'bytes' smaller than a (36-bit) word with a single word pointer. On the -10, a byte pointer includes the word address containing the 'byte', the width (in bits) of the 'byte', and the position (in bits from the right) of the 'byte' within the word. Incrementing the pointer (with an explicit increment, or an increment-and-load/deposit instruction) increments the position part (by the size part) and handles overflow to the next word address. (No decrementing, though.) A byte pointer can e.g. address individual bits, but 6, 8, 9, 18(!) were probably common, as there were specially-formatted versions of byte pointers (global byte pointers) that made their use somewhat easier.
Suppose a PDP-10 implementation wanted to get as close to having 8-bit bytes as possible. The most reasonable way to split up a 36-bit word (the smallest unit of memory that the machine's assembly language can address) is to divide the word into four 9-bit bytes. To access a particular 9-bit byte, you need to know which word it's in (you'd use the machine's native addressing mode for that, using a pointer which takes up one word), and you'd need extra data to indicate which of the 4 bytes inside the word you're interested in. This extra data would be stored in a second machine word. The compiler would generate lots of extra instructions that use this second word to pull the right byte out of the first.
I have some code here, and don't really understand the ">>" and the "&". Can someone clarify?
buttons[0] = indata[byteindex]&1;
buttons[1] = (indata[byteindex]>>1)&1;
rawaxes[7] = (indata[byteindex]>>4)&0xf;
These are bitwise operators, meaning they operate on the binary bits that make up a value. See Bitwise operation on Wikipedia for more detail.
& is for AND
If indata[byteindex] is the number 4, then in binary it looks like 00000100. ANDing this number with 1 gives 0, because bit 0 (the lowest bit) is not set:
00000100 AND 00000001 = 0
If the value is 5 however, then you will get this:
00000101 AND 00000001 = 1
Any bit matched with the mask is allowed through.
>> is for right-shifting
Right-shifting shifts bits along to the right!
00010000 >> 4 = 00000001
One of the standard patterns for extracting a bit field is (reg >> offset) & mask, where reg is the register (or other memory location) you're reading, offset is how many least-significant bits you skip over, and mask is the set of bits that matter. The >> offset step can be omitted if offset is 0. mask is usually equal to 2^width - 1, or (1 << width) - 1 in C, where width is the number of bits in the field.
So, looking at what you have:
buttons[0] = indata[byteindex]&1;
Here, offset is 0 (it was omitted) and mask is 1. So this gets just the least-significant bit in indata[byteindex]:
bit number -> 7 6 5 4 3 2 1 0
+-+-+-+-+-+-+-+-+
indata[byteindex] | | | | | | | |*|
+-+-+-+-+-+-+-+-+
|
\----> buttons[0]
Next:
buttons[1] = (indata[byteindex]>>1)&1;
Here, offset is 1 and width is 1...
bit number -> 7 6 5 4 3 2 1 0
+-+-+-+-+-+-+-+-+
indata[byteindex] | | | | | | |*| |
+-+-+-+-+-+-+-+-+
|
\------> buttons[1]
And, finally:
rawaxes[7] = (indata[byteindex]>>4)&0xf;
Here, offset is 4 and width is 4 (2^4 - 1 = 16 - 1 = 15 = 0xf):
bit number -> 7 6 5 4 3 2 1 0
+-+-+-+-+-+-+-+-+
indata[byteindex] |*|*|*|*| | | | |
+-+-+-+-+-+-+-+-+
| | | |
\--v--/
|
\---------------> rawaxes[7]
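Putting the three extractions together in runnable form (with a made-up sample byte):

#include <cstdio>

int main() {
    unsigned char in = 0xB6;              // made-up sample: binary 1011 0110

    unsigned buttons0 = in & 1;           // offset 0, width 1 -> 0
    unsigned buttons1 = (in >> 1) & 1;    // offset 1, width 1 -> 1
    unsigned rawaxes7 = (in >> 4) & 0xf;  // offset 4, width 4 -> 11 (binary 1011)

    std::printf("%u %u %u\n", buttons0, buttons1, rawaxes7);  // prints: 0 1 11
}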
EDIT...
but I don't understand what the point of it is...
Mike pulls up a rocking chair and sits down.
Back in the old days of 8-bit CPUs, a computer typically had 64K (65 536 bytes) of address space. Now we wanted to do as much as we could with our fancy whiz-bang machines, so we would do things like buy 64K of RAM and map everything to RAM. Shazam, 64K of RAM and bragging rights all around.
But a computer that can only access RAM isn't much good. It needs some ROM for an OS (or at least a BIOS), and some addresses for I/O. (You in the back--siddown. I know Intel chips had separate address space for I/O, but it doesn't help here because the I/O space was much, much smaller than the memory space, so you ran into the same constraints.)
Address space used for ROM and I/O was space that wasn't accessible as RAM, so you wanted to minimize how much space wasn't used for RAM. So, for example, when your I/O peripheral had five different things whose status amounted to a single bit each, rather than give each one of those bits its own byte (and, hence, address), they got the brilliant idea of packing all five of those bits into one byte, leaving three bits that did nothing. Voila, the Interrupt Status Register was born.
The hardware designers were also impressed with how fewer addresses resulted in fewer address bits (since address bits is ceiling of log-base-2 of number of addresses), meaning fewer address pins on the chip, freeing pins for other purposes. (These were the days when 48-pin chips were considered large, and 64-pins huge, and grid array packages were out of the question because multi-layer circuit boards were prohibitively expensive. These were also the days before multiplexing the address and data on the same pins became commonplace.)
So the chips were taped out and fabricated, and hardware was built, and then it fell to the programmers to make the hardware work. And lo, the programmers said, "WTF? I just want to know if there is a byte to read in the bloody serial port, but there are all these other bits like "receiver overrun" in the way." And the hardware guys considered this, and said, "tough cookies, deal with it."
So the programmers went to the Guru, the guy who hadn't forgotten his Boolean algebra and was happy not to be writing COBOL. And the Guru said, "use the Bit AND operation to force those bits you don't care about to 0. If you need a number, and not just a zero-or-nonzero, use a logical shift right (LSR) on the result." And they tried it. It worked, and there was much rejoicing, though the wiser ones started wondering about things like race conditions in a read-modify-write cycle, but that's a story for another time.
And so the technique of packing loosely or completely unrelated bits into registers became commonplace. People developing protocols, which always want to use fewer bits, jumped on these techniques as well. And so, even today, with our gigabytes of RAM and gigabits of bandwidth, we still pack and unpack bitfields with expressions whose legibility borders on keyboard head banging.
(Yes, I know bit fields probably go back to the ENIAC, and maybe even the Difference Engine if Lady Ada needed to stuff two data elements into one register, but I haven't been alive that long, okay? I'm sticking with what I know.)
(Note to hardware designers out there: There really isn't much justification anymore for packing things like status flags and control bits that a driver writer will want to use independently. I've done several designs with one bit per 32-bit register in many cases. No bit shifting or masking, no races, driver code is simpler to write and understand, and the address decode logic is trivially more complex. If the driver software is complex, simplifying flag and bitfield handling can save you a lot of ROM and CPU cycles.)
(More random trivia: The Atmel AVR architecture (used in the Arduino, among many other places) has some specialized bit-set and bit-clear instructions. The avr-libc library used to provide macros for these instructions, but now the gcc compiler is smart enough to recognize that reg |= (1 << bitNum); is a bit set and reg &= ~(1 << bitNum); is a bit clear, and puts in the proper instruction. I'm sure other architectures have similar optimizations.)
These are bitwise operators.
& ANDs two arguments bit by bit.
>> shifts the first argument's bit string to the right by the second argument; << does the opposite.
| is bitwise OR and ^ is bitwise XOR, just as & is bitwise AND.
In English, the first line grabs only the lowest bit (bit 0) of indata[byteindex] and stores it in buttons[0]. Basically, if the value is odd it will be 1; if even, it will be 0.
The second grabs the second bit (bit 1). If that bit is set, it yields 1, else 0. It could also have been written as
buttons[1] = (indata[byteindex]&2)>>1;
and it would have done the same thing.
The last (3rd) line grabs the 5th through 8th bits (bits 4-7). Basically, it will be a number from 0 to 15 when it is complete. It also could have been written as
rawaxes[7] = (indata[byteindex]&0xf0) >> 4;
and done the same thing. I'd also guess from context that these arrays are unsigned char arrays. Just a guess though.
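If you want to convince yourself that the two spellings really are equivalent, here's a quick sketch that checks every byte value:

#include <cassert>

int main() {
    for (unsigned x = 0; x < 256; ++x) {
        assert(((x >> 1) & 1) == ((x & 2) >> 1));       // shift-then-mask == mask-then-shift
        assert(((x >> 4) & 0xf) == ((x & 0xf0) >> 4));  // same for the high nibble
    }
}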
The '&' (in this case) is the bitwise AND operator and '>>' is the bit-shift operator (so x>>y yields x shifted right by y bits).
So, they're taking the least significant bit of indata[byteindex] and putting it into buttons[0]. They're taking the next least significant bit and putting it into buttons[1].
The last one probably needs to be looked at in binary to make sense. 0xf is binary 1111, so they're taking the input, shifting it right 4 bits, then retaining the 4 least significant bits of that result.