compiler and endians - c++

I have an ISA which is "kind" of little endian.
The basic memory unit is an integer and not byte.For example
00000000: BEFC03FF 00008000
Represents that the "low" integer is BEFC03FF and "high" integer is 00008000.
I need to read the value represented by some bits.For example bits 31 till 47.
What I am doing in VS10 (c++) generate uint64_t var = 0x00008000BEFC03FF
after it use relevant mask and check the value of var & mask.
Is it legal to do that way?I do some assumption about uint64_t bits arrangement - is it legal?
Can I suppose that for very compiler and for every OS (without dependency on hw) the arrangement of bits in the uint64_t will be this way?

You are right to be concerned, It does matter.
However, in this particular case, since ISA is little endian, i.e. if it has AD[31:0], the least significant bit of an integer is packed to bit 0. Assuming your processor is also little endian, then nothing to worry about. when the data written to memory, it should have the right byte order
0000 FF
0001 03
0002 ..
suppose, if your external bus protocol is big endian and your processor is little endian. then a 16 bit integer in your processor, say 0x1234 would be 0001_0010_0011_0100 in native format, but 0010_1100_0100_1000 on the bus (assuming it's 16 bit).
In this case, multi byte data crosses endian boundary, the hardware will only swap bits inside a byte, because it must preserve the memory contiguousness between bytes. after hardware swap, it becomes:
0000 0001_0010
0001 0011_0100
then it is up to the software to swap the byte order

Related

How can i reverse the order of hex in c++

To start with, I have a char array that store data
unsigned char dat[3];
memset(dat, 0, sizeof(dat));
memcpy(dat, &no, 2);
when I inspect dat, it contain a hex of 0xfd 0x01
as the value of no is 509, the hex should be 0x01 0xfd
I'm wondering should I be concern of the order of the hexadecimal,
should I change the order. Many thanks
Your system is little endian. It's hardware dependent and on little endian platform first byte is the least significant one when treated as part of multi-byte value. Look up: https://en.wikipedia.org/wiki/Endianness
Essentially if CPU is little endian, then value 0x12345689 would be represented as set of bytes starting with 0x89. On big endian it's opposite order and on mixed endian it may change during run-time.
The question really is what do you want to do next ? On your current hardware (Little Endian) this is how the system orders bytes of a numeric. The least significant byte comes first: 0xfd 0x01.
In case you really want to swap this byte order, for whatever reason, checkout: How do I convert between big-endian and little-endian values in C++?

Is this the correct way of writing bits to big endian?

Currently, it's for a Huffman compression algorithm that assigns binary codes to characters used in a text file. Fewer bits for more frequent- and more bits for less-frequent characters.
Currently, I'm trying to save the binary code big-endian in a byte.
So let's say I'm using an unsigned char to hold it.
00000000
And I want to store some binary code that's 1101.
In advance, I want to apologize if this seems trivial or is a dupe but I've browsed dozens of other posts and can't seem to find what I need. If anyone could link or quickly explain, it'd be greatly appreciated.
Would this be the correct syntax?
I'll have some external method like
int length = 0;
unsigned char byte = (some default value);
void pushBit(unsigned int bit){
if (bit == 1){
byte |= 1;
}
byte <<= 1;
length++;
if (length == 8) {
//Output the byte
length = 0;
}
}
I've seen some videos explaining endianess and my understanding is the most significant bit (the first one) is placed in the lowest memory address.
Some videos showed the byte from left to right which makes me think I need to left shift everything over but whenever I set, toggle, erase a bit, it's from the rightmost is it not? I'm sorry once again if this is trivial.
So after my method finishes pushing the 1101 into this method, byte would be something like 00001101. Is this big endian? My knowledge of address locations is very weak and I'm not sure whether
**-->00001101 or 00001101<-- **
location is considered the most significant.
Would I need to left shift the remaining amount?
So since I used 4 bits, I would left shift 4 bits to make 11010000. Is this big endian?
First off, as the Killzone Kid noted, endianess and the bit ordering of a binary code are two entirely different things. Endianess refers to the order in which a multi-byte integer is stored in the bytes of memory. For little endian, the least significant byte is stored first. For big endian, the most significant byte is stored first. The bits in the bytes don't change order. Endianess has nothing to do with what you're asking.
As for accumulating bits until you have a byte's worth to write, you have the basic idea, but your code is incorrect. You need to shift first, and then or the bit. The way you're doing it, you are losing the first bit you put in off the top, and the low bit of what you write is always zero. Just put the byte <<= 1; before the if.
You also need to deal with ending the stream somehow, writing out the last bits if there are less than eight left. So you'll need a flushBits() to write out you bit buffer if it has more than one bit in it. Your bit stream would need to be self terminating, or you need to first send the number of bits, so that you don't misinterpret the filler bits in the last byte as a code or codes.
There are two types of endianness, Big-endian and Little-endian (technically there are more, like middle-endian, but big and little are the most common). If you want to have the big-endian format, (as it seems like you do), then the most significant byte comes first, with little-endian the least significant byte comes first.
Wikipedia has some good examples
It looks like what you are trying to do is store the bits themselves within the byte to be in reverse order, which is not what you want. A byte is endian agnostic and does not need to be flipped. Multi-byte types such as uint32_t may need their byte order changed, depending on what endianness you want to achieve.
Maybe what you are referring to is bit numbering, in which case the code you have should largely work (although you should compare length to 7, not 8). The order you place the bits in pushBit would end up with the first bit you pass being the most significant bit.
Bits aren't addressable by definition (if we're talking about C++, not C51 or its C++ successor), so from point of high level language, even from POV of assembler pseudo-code, no matter what the direction LSB -> MSB is, bit-wise << would perform shift from LSB to MSB. Bit order referred as bit numbering and is a separate feature from endian-ness, related to hardware implementation.
Bit fields in C++ change order because in most common use-cases usually bits do have an opposite order, e.g. in network communication, but in fact way how bit fields are packed into byte is implementation dependent, there is no consistency guarantee that there is no gaps or that order is preserved.
Minimal addressable unit of memory in C++ is of char size , and that's where your concern with endian-ness ends. The rare case if you actually should change bit order (when? working with some incompatible hardware?), you have to do explicitly so.
Note, that when working with Ethernet or other network protocol you should not do so, order change is done by hardware (first bit sent over wire is least significant one on the platform).

Bit order in C/C++

I have to implement a protocol which defines data in 8bit words, which starts with the least significant bit (LSB) first. I want to realize this data with unsigned char, but I don't know what's the bit order of LSB and most significant bit (MSB) in C/C++, that could possible require swapping the bits.
Can anybody explain me how to find out an unsigned char is encoded: with MSB-LSB or LSB-MSB?
Example:
unsigned char b = 1;
MSB-LSB: 0000 0001
LSB-MSB: 1000 0000
Endian-ness is platform dependent. Anyway, you don't have to worry about actual bit order unless you are serializing the bytes, which you may be trying to do. In which case, you still don't need to worry about how individual bytes are stored while they're on the machine, since you will have to dig the bits out individually anyway. Fortunately, if you bitwise AND with 1, you get the LSB, regardless of storage order; bit-AND with 2 and you get the next most significant bit, and so on. The compiler will sort out what constants to generate in the machine code, so that level of detail is abstracted away.
There is no such thing in C/C++. The least significant bit is -- well -- the least significant bit. Since the bits don't have addresses, there is no other ordering.

big endian little endian conversion

I saw a question on stack overflow how to convert from one endian to another. And the solution was like this:
template <typename T>
void swap_endian(T& pX)
{
char& raw = reinterpret_cast<char&>(pX);
std::reverse(&raw, &raw + sizeof(T));
}
My question is. Will this solution swap the bits correctly? It will swaps the bytes in the correct order but it will not swap the bits.
Yes it will, because there is no need to swap the bits.
Edit:
Endianness has effect on the order in which the bytes are written for values of 2 bytes or more. Little endian means the least significant byte comes first, big-endian the other way around.
If you receive a big-eindian stream of bytes written by a little endian system, there is no debate what the most significant bit is within the bytes. If the bit order was affected you could not read each others byte streams reliably (even if it was just plain 8 bit ascii).
This can not be autmatically determined for 2-byte or bigger values, as the file system (or network layer) does not know if you send data a byte at a time, or if you are sending ints that are (e.g.) 4 bytes long.
If you have a direct 1-bit serial connection with another system, you will have to agree on little or big endian bit ordering at the transport layer.
bigendian vs little endian concerns itself with how bytes are ordered within a larger unit, such as an int,long, etc. The ordering of bits within a byte is the same.
"Endianness" generally refers to byte order, not the order of the bits within those bytes. In this case, you don't have to reverse the bits.
You are correct, that function would only swap the byte order, not individual bits. This is usually sufficient for networking. Depending on your needs, you may also find the htons() family of functions useful.
From Wikipedia:
Most modern computer processors agree
on bit ordering "inside" individual
bytes (this was not always the
case). This means that any single-byte
value will be read the same on almost
any computer one may send it to."

Endianness in casting an array of two bytes into a single short

Problem: I cannot understand the number 256 (2^8) in the extract of the IBM article:
On the other hand, if it's a
big-endian system, the high byte is 1
and the value of x is 256.
Assume each element in an array consumes 4 bites, then the processor should read somehow: 1000 0000. If it is a big endian, it is 0001 0000 because endianness does not affect bits inside bytes. [2] Contradiction to the 256 in the article!?
Question: Why is the number 256_dec (=1000 0000_bin) and not 32_dec (=0001 0000_bin)?
[2] Endian issues do not affect sequences that have single bytes, because "byte" is considered an atomic unit from a storage point of view.
Because a byte is 8 bits, not 4. The 9th least significant bit in an unsigned int will have value 2^(9-1)=256. (the least significant has value 2^(1-1)=1).
From the IBM article:
unsigned char endian[2] = {1, 0};
short x;
x = *(short *) endian;
They're correct; the value is (short)256 on big-endian, or (short)1 on little-endian.
Writing out the bits, it's an array of {00000001_{base2}, 00000000_{base2}}. Big endian would interpret that bit array read left to right; little endian would swap the two bytes.
256dec is not 1000_0000bin, it's 0000_0001_0000_0000bin.
With swapped bytes (1 byte = 8 bits) this looks like 0000_0000_0000_0001bin, which is 1dec.
Answering your followup question: briefly, there is no "default size of an element in an array" in most programming languages.
In C (perhaps the most popular programming language), the size of an array element -- or anything, really -- depends on its type. For an array of char, the elements are usually 1 byte. But for other types, the size of each element is whatever the sizeof() operator gives. For example, many C implementations give sizeof(short) == 2, so if you make an array of short, it will then occupy 2*N bytes of memory, where N is the number of elements.
Many high-level languages discourage you from even attempting to discover how many bytes an element of an array requires. Giving a fixed number of bytes ties the designers' hands to always using that many bytes, which is good for transparency and code that relies on its binary representation, but bad for backward compatibility whenever some reason comes along to change the representation.
Hope that helps. (I didn't see the other comments until after I wrote the first version of this.)