Stack Little endian addresses offsets - gdb

I'm trying to understand how addresses work in memory.
For example I have this stack:
Let's say we want to get the address of the 'a' character.
Would it be at address 0xbffffc6c? Or did I get it absolutely wrong?

That is correct:
the hexadecimal representation of data is always written most significant bits first
in little endian, the most significant byte is stored at the highest memory address
PS. The Wikipedia article on endianness has some images that show this quite well, for whenever I mix these up in my head.
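For instance, here is a small check you can compile and run (a minimal sketch; the output shown assumes a little-endian machine such as x86):

#include <cstdio>
#include <cstdint>
#include <cstring>

int main() {
    uint32_t value = 0x11223344;               // written most significant byte first
    unsigned char bytes[sizeof value];
    std::memcpy(bytes, &value, sizeof value);  // look at the raw object representation

    // On a little-endian machine this prints "44 33 22 11":
    // the least significant byte sits at the lowest address.
    for (unsigned i = 0; i < sizeof value; ++i)
        std::printf("%02x ", bytes[i]);
    std::printf("\n");
}

On a big-endian machine the same program prints the bytes in the opposite order.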


Encode additional information in pointer

My problem:
I need to encode additional information about an object in a pointer to the object.
What I thought I could do is use part of the pointer to do so. That is, use a few bits to encode bool flags. As far as I know, the same thing is done with certain types of handles in the Windows kernel.
Background:
I'm writing a small memory management system that can garbage-collect unused objects. To reduce the memory consumption of object references and speed up copying, I want to use pointers with additional encoded data, e.g. the state of the object (alive or ready to be collected), a lock bit, and similar things that can be represented by a single bit.
My question:
How can I encode such information into a 64-bit pointer without actually overwriting the important bits of the pointer?
Since x64 Windows has a limited address space, not all 64 bits of the pointer are used, so I believe it should be possible. However, I wasn't able to find which bits Windows actually uses for the pointer and which it does not. To clarify, this question is about user mode on 64-bit Windows.
Thanks in advance.
This is heavily dependent on the architecture, OS, and compiler used, but if you know those things, you can do some things with it.
x86_64 defines a 48-bit[1] byte-oriented virtual address space in the hardware, which means essentially all OSes and compilers will use that. What that means is:
the top 17 bits of all valid addresses must be all the same (all 0s or all 1s)
the bottom k bits of any 2^k-byte aligned address must be all 0s
in addition, pretty much all OSes (Windows, Linux, and OSX at least) reserve the addresses with the upper bits set as kernel addresses -- all user addresses must have the upper 17 bits all 0s
So this gives you a variety of ways of packing a valid pointer into less than 64 bits, and then later reconstructing the original pointer with shift and/or mask instructions.
If you only need 3 bits and always use 8-byte aligned pointers, you can use the bottom 3 bits to encode extra info, and mask them off before using the pointer.
If you need more bits, you can shift the pointer up (left) by 16 bits, and use those lower 16 bits for information. To reconstruct the pointer, just right shift by 16.
To do shifting and masking operations on pointers, you need to cast them to intptr_t or int64_t (those will be the same type on any 64-bit implementation of C or C++)
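For example, here is a minimal sketch of the low-bits variant, using uintptr_t so the mask operations are done on an unsigned type (the helper names are just for illustration; it assumes the pointers you store are at least 8-byte aligned):

#include <cstdint>
#include <cassert>

// Pack up to 3 flag bits into the low bits of an 8-byte aligned pointer.
inline std::uintptr_t tagPointer(void* p, unsigned flags) {
    std::uintptr_t bits = reinterpret_cast<std::uintptr_t>(p);
    assert((bits & 0x7u) == 0);   // pointer must be 8-byte aligned
    assert(flags <= 0x7u);        // only 3 spare bits in this scheme
    return bits | flags;
}

inline void* untagPointer(std::uintptr_t tagged) {
    return reinterpret_cast<void*>(tagged & ~std::uintptr_t(7)); // mask the flags off
}

inline unsigned tagFlags(std::uintptr_t tagged) {
    return static_cast<unsigned>(tagged & 0x7u);
}

The shift-by-16 variant works the same way, except you reconstruct the pointer with a right shift instead of a mask.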
[1] There are hints that there may soon be hardware that extends this to 56 bits, so only the top 9 bits would need to be 0s or 1s, but it will be a while before any OS supports this

What is "bit padding" or "padding bits" exactly?

I do not want to bother you with this, but I just cannot find a well-described explanation anywhere on the internet for what "bit padding" really is, nor in any answer to the bit padding-related threads here on StackOverflow.
I also searched ISO 9899-1990 for it, in which "bit padding" is referred to but not explained in the way I need.
The only content on the web I found about this was here, where only one ridiculously short explanation of one sentence was given, saying:
bit padding:
Bit padding is the addition of one or more extra bits to a transmission or storage unit to make it conform to a standard size.
Some sources identify bit padding as a type of bit stuffing.
Which is at least some sort of information, but not enough of an explanation for me. I don't quite understand what that means exactly. It also refers to the term
"bit stuffing".
When I look at the related tag here on StackOverflow for "padding", padding is described as:
Extra space inserted into memory structures to achieve address alignment -or- extra space between the frame and the content of an HTML element -or- extra spaces or zeros when printing out values using formatting print commands like, in C, the printf*-family of functions.
Background:
I often find the term "bit padding" in relation to data types, but don't understand what it is nor what it does exactly with those.
Thank you very much for any topic-based answer.
I often find the term "bit padding" in relation to data types, but don't understand what it is nor what it does exactly with those.
The gist of it is they are "wasted" space. I say "wasted" because while having padding bits makes the object bigger, it can make working with the object much easier (which means faster) and the small space waste can generate huge performance gains. In some cases it is essential because the CPU can't handle working with objects of that size.
Let's say you have a struct like this (all numbers are just an example; different platforms can have different values):
struct foo
{
    short a; // 16 bits
    char b;  // 8 bits
};
and the machine you are working with reads 32 bits of data in a single read operation. Reading a single foo is not a problem, since the entire object fits into that 32 bit chunk. What does become a problem is when you have an array. The important thing to remember about arrays is that they are contiguous; there is no space between elements. It's just one object immediately followed by another. So, if you have an array like
foo array[10]{};
With this, the first foo object is in a 32 bit bucket. The next element of the array, though, will span the first 32 bit bucket and the second 32 bit bucket. This means that its member a is in two separate buckets. Some processors can handle this (at a cost) and other processors will just crash if you try it. To solve both of those problems the compiler will add padding bits to the end of foo to pad out its size. This means foo actually becomes
struct foo
{
    short a; // 16 bits
    char b;  // 8 bits
    char _;  // 8 bits of padding
};
And now it is easy for the processor to handle foo objects by themselves or in an array. It doesn't need to do any extra work and you've only added 8 bits per object. You'd need a lot of objects for that to start to matter on a modern machine.
There are also times when you need padding between members of the type because of unaligned access. Let's say you have
struct bar
{
    char c; // 8 bits
    int d;  // 32 bits
};
Now bar is 40 bits wide and d, more often than not, will be stored across two different buckets again. To fix this the compiler adds padding bits between c and d like
struct bar
{
    char c;    // 8 bits
    char _[3]; // 24 bits
    int d;     // 32 bits
};
and now d is guaranteed to go into a single 32 bit bucket.
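To see the padding for yourself, you can print the sizes and offsets (a small sketch; the exact numbers are implementation-defined, but on a typical platform with a 16 bit short and a 32 bit int you get the values in the comments):

#include <cstdio>
#include <cstddef>

struct foo { short a; char b; };  // 1 byte of trailing padding expected
struct bar { char c; int d; };    // 3 bytes of padding between c and d expected

int main() {
    std::printf("sizeof(foo) = %zu\n", sizeof(foo));            // typically 4
    std::printf("sizeof(bar) = %zu\n", sizeof(bar));            // typically 8
    std::printf("offsetof(bar, d) = %zu\n", offsetof(bar, d));  // typically 4
}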
So imagine you have an 8 bit number, a uint8_t, and its value is set to 4. This would probably be stored as a = 0000 0100. Now, let's say you wish to convert this into a 16 bit number. What would happen? You have to assign some values to the 'new' bits in this number. How would you assign them? You can't randomly assign zeros or ones, or the value of the original variable will change. Depending on the architecture etc. you have to pad the value with extra bits. In my case, that would mean eight extra zeros added in front of the original MSB (most significant bit), making our number a = 0000 0000 0000 0100.
Value is still 4, but now you can assign anything in [0, 2^16) range, instead of [0, 2^8) range.
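As a tiny sketch of that (the conversion pads the new high bits with zeros, so the value is unchanged):

#include <cstdint>
#include <cstdio>

int main() {
    std::uint8_t  a = 4;   // 0000 0100
    std::uint16_t b = a;   // 0000 0000 0000 0100, still 4
    std::printf("%u %u\n", unsigned(a), unsigned(b));   // prints "4 4"
}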
bit padding:
Bit padding is the addition of one or more extra bits to a transmission or storage unit to make it conform to a standard size.
As the definition you posted is already correct, I'll try to explain with an example:
Suppose you have to store data that occupies less than 32 bits but you have 4 byte slots. It is easier to access that data by accessing each slot whole, so you just complete all 32 bits. The additional bits needed to fill 'the given space', but which are not part of the data, constitute the bit padding.
I'm sure there may be better examples of this in multiple contexts. Anybody, feel free to edit and/or complete the answer with new improvements or examples.
Hope this helps!
Bit padding can be used in multiple contexts. Two common examples are networking and encryption. I believe the encryption context is more relevant here.
Padding is used in encryption to make it harder to decipher messages that share a common part. If multiple messages are known to have the same prefix (e.g., "hello"), it makes it easier to break the key. By "padding" the message with a variable-length bit field, it becomes much harder to break the key.
It is said that British intelligence was able to speed up the analysis of Enigma messages because the Germans started their messages with the same heading.
For a more technical, accurate description see https://en.wikipedia.org/wiki/Padding_(cryptography) and look for the section about block ciphers and bit padding.

Is this the correct way of writing bits to big endian?

Currently, it's for a Huffman compression algorithm that assigns binary codes to characters used in a text file: fewer bits for more frequent characters and more bits for less frequent ones.
Currently, I'm trying to save the binary code big-endian in a byte.
So let's say I'm using an unsigned char to hold it.
00000000
And I want to store some binary code that's 1101.
In advance, I want to apologize if this seems trivial or is a dupe but I've browsed dozens of other posts and can't seem to find what I need. If anyone could link or quickly explain, it'd be greatly appreciated.
Would this be the correct syntax?
I'll have some external method like
int length = 0;
unsigned char byte = (some default value);

void pushBit(unsigned int bit){
    if (bit == 1){
        byte |= 1;
    }
    byte <<= 1;
    length++;
    if (length == 8) {
        //Output the byte
        length = 0;
    }
}
I've seen some videos explaining endianness and my understanding is the most significant bit (the first one) is placed in the lowest memory address.
Some videos showed the byte from left to right, which makes me think I need to left shift everything over, but whenever I set, toggle, or erase a bit, it's from the rightmost, is it not? I'm sorry once again if this is trivial.
So after my method finishes pushing the 1101 into this method, byte would be something like 00001101. Is this big endian? My knowledge of address locations is very weak and I'm not sure whether
-->00001101 or 00001101<--
location is considered the most significant.
Would I need to left shift the remaining amount?
So since I used 4 bits, I would left shift 4 bits to make 11010000. Is this big endian?
First off, as the Killzone Kid noted, endianness and the bit ordering of a binary code are two entirely different things. Endianness refers to the order in which a multi-byte integer is stored in the bytes of memory. For little endian, the least significant byte is stored first. For big endian, the most significant byte is stored first. The bits in the bytes don't change order. Endianness has nothing to do with what you're asking.
As for accumulating bits until you have a byte's worth to write, you have the basic idea, but your code is incorrect. You need to shift first, and then or the bit. The way you're doing it, you are losing the first bit you put in off the top, and the low bit of what you write is always zero. Just put the byte <<= 1; before the if.
You also need to deal with ending the stream somehow, writing out the last bits if there are fewer than eight left. So you'll need a flushBits() that writes out your bit buffer if it has any bits in it. Your bit stream would need to be self-terminating, or you need to first send the number of bits, so that you don't misinterpret the filler bits in the last byte as a code or codes.
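Here's a rough sketch of what I mean (outputByte() is a hypothetical stand-in for however you actually write a byte; filler bits in the last byte are zeros):

int length = 0;
unsigned char byte = 0;

void outputByte(unsigned char b);    // stand-in: writes one byte to your output

void pushBit(unsigned int bit) {
    byte = (byte << 1) | (bit & 1);  // shift first, then or in the new bit
    if (++length == 8) {
        outputByte(byte);
        byte = 0;
        length = 0;
    }
}

void flushBits(void) {
    if (length > 0) {
        byte <<= (8 - length);       // left-justify, filling the low bits with zeros
        outputByte(byte);
        byte = 0;
        length = 0;
    }
}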
There are two types of endianness, Big-endian and Little-endian (technically there are more, like middle-endian, but big and little are the most common). If you want to have the big-endian format, (as it seems like you do), then the most significant byte comes first, with little-endian the least significant byte comes first.
Wikipedia has some good examples
It looks like what you are trying to do is store the bits themselves within the byte to be in reverse order, which is not what you want. A byte is endian agnostic and does not need to be flipped. Multi-byte types such as uint32_t may need their byte order changed, depending on what endianness you want to achieve.
Maybe what you are referring to is bit numbering, in which case the code you have should largely work (although you should compare length to 7, not 8). The order you place the bits in pushBit would end up with the first bit you pass being the most significant bit.
Bits aren't addressable by definition (if we're talking about C++, not C51 or its C++ successor), so from the point of view of a high-level language, or even of assembler pseudo-code, no matter which direction LSB -> MSB goes, a bit-wise << performs a shift from LSB towards MSB. Bit order is referred to as bit numbering and is a separate feature from endianness, related to the hardware implementation.
Bit fields in C++ may change order because in the most common use cases bits often do have an opposite order, e.g. in network communication, but in fact the way bit fields are packed into a byte is implementation-dependent; there is no guarantee that there are no gaps or that the order is preserved.
The minimal addressable unit of memory in C++ is of char size, and that's where your concern with endianness ends. In the rare case where you actually need to change bit order (when? working with some incompatible hardware?), you have to do so explicitly.
Note that when working with Ethernet or other network protocols you should not do so; the order change is done by hardware (the first bit sent over the wire is the least significant one on the platform).

Which bit is first and when you bit shift, does it actually shift in that direction?

So... wrestling with bits and bytes, it occurred to me that if I say "first bit of the nth byte", it might not mean what I think it means. So far I have assumed that if I have some data like this:
00000000 00000001 00001000
then the
First byte is the leftmost of the groups and has the value of 0
First bit is the leftmost of all 0's and has the value of 0
Last byte is the rightmost of the groups and has the value of 8
Last bit of the second byte is the rightmost of the middle group and has the value of 1
Then I learned that the byte order in a typed collection of bytes is determined by the endianness of the system. In my case it should be little endian (Windows, Intel, right?), which would mean that something like 01 10 as a 16-bit unsigned integer should be 2551, while in most programs dealing with memory it would be represented as 265... no idea what's going on there.
I also learned that bits in a byte could be ordered as whatever and there seems to be no clear answer as to which bit is the actual first one, since they could also be subject to bit-endianness and people's definitions of what is first differ. For me it's left to right, for somebody else it might be what appears first when you add 1 to 0, or right to left.
Why does any of this matter? Well, curiosity mostly, but I was also trying to write a class that would be able to extract X number of bits, starting from bit-address Y. I envisioned it sorta like a .NET string where I can go and type ".SubArray(12(position), 5(length))", then in the case of data like at the top of this post it would retrieve "0001 0" or 2.
So could somebody clarify what is first and last in terms of bits and bytes in my environment? Does it go right to left or left to right or both, wut? And why does this question exist in the first place; why couldn't the coding ancestors have agreed on something and stuck with it?
A shift is an arithmetic operation, not a memory-based operation: it is intended to work on the value, rather than on its representation. Shifting left by one is equivalent to a multiplication by two, and shifting right by one is equivalent to a division by two. These rules hold first, and if they conflict with the arrangement of the bits of a multibyte type in memory, then so much for the arrangement in memory. (Since shifts are the only way to examine bits within one byte, this is also why there is no meaningful notion of bit order within one byte.)
As long as you keep your operations within a single data type (rather than byte-shifting long integers and then examining them as character sequences), the results will stay predictable. Examining the same chunk of memory through different integer types is, in this case, a bit like performing integer operations and then reading the bits as a float; there will be some change, but it's not the place of the integer arithmetic definitions to say exactly what. It's out of their scope.
You have some understanding, but a couple misconceptions.
First off, arithmetic operations such as shifting are not concerned with the representation of the bits in memory, they are dealing with the value. Where memory representation comes into play is usually in distributed environments where you have cross-platform communication in the mix, where the data on one system is represented differently on another.
Your first comment...
I also learned that bits in a byte could be ordered as whatever and there seems to be no clear answer as to which bit is the actual first one, since they could also be subject to bit-endianness and people's definitions of what is first differ
This isn't entirely true. Though the bits are only given meaning by the reader and the writer of data, generally bits within an 8-bit byte are read from left (MSB) to right (LSB). The byte order is what is determined by the endianness of the system architecture. It has to do with the representation of the data in memory, not the arithmetic operations.
Second...
And why does this question exist in the first place, why couldn't the coding ancestors have agreed on something and stuck with it?
From Wikipedia:
The initial endianness design choice was (is) mostly arbitrary, but later technology revisions and updates perpetuate the same endianness (and many other design attributes) to maintain backward compatibility. As examples, the Intel x86 processor represents a common little-endian architecture, and IBM z/Architecture mainframes are all big-endian processors. The designers of these two processor architectures fixed their endiannesses in the 1960s and 1970s with their initial product introductions to the market. Big-endian is the most common convention in data networking (including IPv6), hence its pseudo-synonym network byte order, and little-endian is popular (though not universal) among microprocessors in part due to Intel's significant historical influence on microprocessor designs. Mixed forms also exist, for instance the ordering of bytes within a 16-bit word may differ from the ordering of 16-bit words within a 32-bit word. Such cases are sometimes referred to as mixed-endian or middle-endian. There are also some bi-endian processors which can operate either in little-endian or big-endian mode.
Finally...
Why does any of this matter? Well, curiosity mostly, but I was also trying to write a class that would be able to extract X number of bits, starting from bit-address Y. I envisioned it sorta like a .NET string where I can go and type ".SubArray(12(position), 5(length))", then in the case of data like at the top of this post it would retrieve "0001 0" or 2.
Many programming languages and libraries offer functions that allow you to convert to/from network (big endian) and host order (system dependent) so that you can ensure data you're dealing with is in the proper format, if you need to care about it. Since you're asking specifically about bit shifting, it doesn't matter in this case.
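For example, in C or C++ you can use htonl()/ntohl() (declared in arpa/inet.h on POSIX systems, in winsock2.h on Windows) so that the bytes always go out most significant first, regardless of the host's endianness. A minimal sketch:

#include <arpa/inet.h>   // htonl/ntohl; use <winsock2.h> on Windows
#include <cstdint>
#include <cstdio>
#include <cstring>

int main() {
    std::uint32_t host = 0x01020304;
    std::uint32_t net  = htonl(host);       // network order is big endian

    unsigned char b[4];
    std::memcpy(b, &net, sizeof net);
    // Prints "01 02 03 04" on any platform: most significant byte first.
    std::printf("%02x %02x %02x %02x\n", b[0], b[1], b[2], b[3]);
}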
Read this post for more info

Byte order for packed images

So from http://en.wikipedia.org/wiki/RGBA_color_space, I learned that the byte order for ARGB is, from lowest address to highest address, BGRA, on a little endian machine in certain interpretations.
How does this affect the naming convention of packed data, e.g. a uint8_t ar[]={R,G,B,R,G,B,R,G,B}?
Little endian by definition stores the bytes of a number in reverse order, least significant byte first. This does not matter if you are treating them as byte arrays; however, any vaguely efficient code base will actually treat the 4 bytes as a 32 bit unsigned integer. This will speed up software blitting by a factor of almost 4.
Now the real question is why. This comes from the fact that, when treating a pixel as a 32 bit int as described above, coders want to be able to run arithmetic and shifts in a predictable way. This relies on the bytes being in reverse order.
In short, this is not actually odd, as on little endian machines the last byte (highest address) is actually the most significant byte and the first the least significant. Thus a field like this will naturally be in reverse order, so it is the correct way around when treated as a number (as a number it will appear ARGB but as a byte array it will appear BGRA).
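To make that concrete, here is a small sketch (assuming a little endian machine):

#include <cstdint>
#include <cstdio>
#include <cstring>

int main() {
    // Pack one pixel as a 32 bit integer: A in the top byte, then R, G, B.
    std::uint32_t argb = (0xAAu << 24) | (0x11u << 16) | (0x22u << 8) | 0x33u;

    unsigned char bytes[4];
    std::memcpy(bytes, &argb, sizeof argb);
    // On a little endian machine this prints "33 22 11 aa":
    // viewed as a byte array the pixel reads B, G, R, A.
    std::printf("%02x %02x %02x %02x\n", bytes[0], bytes[1], bytes[2], bytes[3]);
}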
Sorry if this is unclear, but I hope it helps. If you do not understand or I have missed something please comment.
If you are storing data in a byte array like you have specified, you are using BGR format which is basically RGB reversed:
(image: bgr-color-space)