C++: the fastest way to access specific octet of int - c++

Assuming we have a 32-bit integer, an 8-bit char, the gcc compiler and an Intel architecture:
What would be the fastest way (with no assembler usage) to extract, say, the third octet of an integer variable? To store it into a char at some specific place in a char[], for example?

For the 3rd octet (little endian):
int i = 0xdeadbeef;
char c = (char) (i>>16); // c = 0xad

Use a union:
union myCharredInt
{
    int myInt;
    struct {
        char char1;
        char char2;
        char char3;
        char char4;
    }; // anonymous struct: a gcc extension, not standard C++
};

myCharredInt a;
a.myInt = 5;
char c = a.char3; // reading a member other than the one last written is
                  // technically undefined behaviour, but gcc supports it

Shift the octet down to the least significant position and store it.
Somewhat like this, though it depends on exactly what you mean by the 3rd octet, as the majority of my experience has been on big-endian architectures:
char *ptr;
....
*ptr = val >> 8;
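Putting the answers together, a minimal sketch (my own illustration, not from the answers above; the helper name extract_octet is made up) of pulling out an arbitrary octet with a shift and a mask and dropping it into a char array, assuming 8-bit bytes and octets numbered from the least significant end:

unsigned char extract_octet(unsigned int value, unsigned n) // n = 0 is the least significant octet
{
    return static_cast<unsigned char>((value >> (8 * n)) & 0xFF);
}

int main()
{
    unsigned int i = 0xdeadbeef;
    char buffer[16];
    buffer[5] = static_cast<char>(extract_octet(i, 2)); // stores 0xad
    return 0;
}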

Whenever you are looking for the "fastest" or "best" way to do something in very particular circumstances, the answer almost always will be: experiment, and find out.
While there are rules of thumb to follow, they will not conclusively give you the best answer for your particular system, architecture, compiler, etc.
You will notice there are a few different answers to your question already, using different techniques.
How will you know which is best?
Answer: Try them out. Profile them.
N.b.: I'm being a little facetious. I suspect what you really want to know is how to do this at all, and not how to do it fastest.

Related

Is there a way to initialize a char using bits?

I'm trying to represent the 52 cards in a deck of playing cards.
I need a total of 6 bits; 2 for the suit and 4 for the rank.
I thought I would use a char and have the first 2 bits be zero since I don't need them. The problem is I don't know if there's a way to initialize a char using bits.
For example, what I'd like to do is:
char aceOfSpades = 00000000;
char queenOfHearts = 00011101;
I know once I've initialized char I can manipulate the bits but it would be easier if I could initialize it from the beginning as shown in my example. Thanks in advance!
Yes you can, using binary literals (standard since C++14, and available earlier as a gcc extension). For example:
char aceOfSpades = 0b00000000;
char queenOfHearts = 0b00011101;
The easier way, as Captain Oblivious said in comments, is to use a bit field
struct SixBits
{
    unsigned int suit : 2;
    unsigned int rank : 4;
};

int main()
{
    struct SixBits card;
    card.suit = 0; /* You need to specify what the values mean */
    card.rank = 10;
}
You could try using various bit-fiddling operations on a char, but that is more difficult to work with. There is also a potential problem: it is implementation-defined whether char is signed or unsigned, and if it is signed, bit-fiddling operations give undefined behaviour in some circumstances (e.g. when operating on a negative value).
Personally, I wouldn't bother trying to pack everything into a char. I'd make the code comprehensible (e.g. use an enum to represent the suit and an int to represent the rank, as sketched below) unless there is a demonstrable need (e.g. trying to get the program to work on a machine with extremely limited memory, which is unlikely in practice with hardware less than 20 years old). Otherwise, all you are really achieving is code that is hard to maintain, with few real-world advantages.
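A rough sketch of that readable alternative (my own illustration; the Suit and Card names are not from the answer):

#include <iostream>

enum Suit { Clubs, Diamonds, Hearts, Spades };

struct Card
{
    Suit suit;
    int  rank; // 1 = Ace ... 13 = King
};

int main()
{
    Card queenOfHearts = { Hearts, 12 };
    std::cout << queenOfHearts.suit << " " << queenOfHearts.rank << "\n";
}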

c++ best way to compare byte array to struct

I need help. I have an unsigned char * and say I have a struct
struct {
    int a = 3;
    char b = 'd';
    double c = 3.14;
    char d = 'e';
} cmp;

unsigned char input[1000];
l = recv(sockfd, input, sizeof(cmp), 0);
I want to compare cmp and input. What is the fastest way?
Thanks a lot in advance.
If the compiler guarantees that there are no gaps between fields in the struct (gaps usually appear due to padding) or you can use a #pragma to remove any such gaps, then you can compare by either:
memcmp(&cmp, input, sizeof(cmp));
Or, my preferred:
cmp == *(struct TheStruct *)input // provided the struct doesn't contain pointers (and has an operator== defined)
But a much safer way would be to compare on a field-by-field basis. Even better, prepare special functions for extracting ints, floats, etc. from the raw input. For example, extracting an int at index n may be as simple as
*(int *)&input[n]
But it might be more complicated, like assembling the value from chars shifted by 8, 16 or 24 bits.
In short, accessing the communication data must be done in the most robust way possible, checking every basic element and not assuming anything.
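A minimal sketch of that field-by-field approach (my own illustration; the helper names, offsets and byte order are assumptions, since the question doesn't specify a wire format):

#include <cstddef>
#include <cstring>

// Read an int from the raw buffer at byte index n without an unaligned cast.
int extract_int(const unsigned char *buf, std::size_t n)
{
    int v;
    std::memcpy(&v, buf + n, sizeof v); // assumes the sender used the same byte order
    return v;
}

bool matches_cmp(const unsigned char *input)
{
    // Offsets must match the sender's actual layout; in real code derive them
    // with offsetof from a shared struct definition instead of hard-coding.
    return extract_int(input, 0) == 3 && input[sizeof(int)] == 'd';
}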
Give reinterpret_cast a try. This will allow you to arbitrarily cast the char * to a cmp *
http://msdn.microsoft.com/en-us/library/e0w9f63b.aspx
In the general case James Kantzes' comment is correct: you can't compare like that. This is, among other things, due to byte padding.
However, in the specific case with the following assumptions:
The sender is on the same CPU architecture as the receiver
The sender is using the same compiler and linker as the receiver
The applications are compiled with the same compiler/linker flags
...other things... you get the gist.
The sender is sending it straight from the struct:
cmp c{ ...set variables... };
send(sockfd, (char*)&c, sizeof(c));
So in short, this is a very brittle way of transporting structs and you shouldn't do it for anything except simple tests or quick hacks.

c++ - inverted byte order when casting from char[2] to short

What I have is this
struct Record
{
    unsigned char cat;
    unsigned char len[2] = {0x00, 0x1b}; // can't put a short here because that
                                         // would change the size of the struct
    unsigned char dat[253];
};
Record record;
unsigned short recordlen = *((unsigned short*)record.len);
This results in recordlen == 0x1b00 instead of 0x001b.
Same with *reinterpret_cast<unsigned short*>(record.len)
Can you explain why ? How should I be doing this ?
What you encounter is called "endianness". In x86, all numeric variables are stored "little endian", meaning the least-significant byte comes first.
From the Wikipedia page:
The little-endian system has the property that the same value can be read from memory at different lengths without using different addresses.
This depends on the endianness of your CPU. See Wikipedia.
In your case you have "little endian", which means the least significant bytes come first. This is convenient when you want to convert numbers between different byte sizes: if you use a long int to represent a short number, its representation is the same as it would be as a short number, only with additional zero bytes at the end.
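A small sketch (my own illustration) of that property: on a little-endian machine, copying just the first bytes in memory of a wider integer yields the same small value.

#include <cstdint>
#include <cstring>
#include <iostream>

int main()
{
    std::uint32_t wide = 27; // 0x0000001b
    std::uint16_t narrow = 0;
    std::memcpy(&narrow, &wide, sizeof narrow); // take the first two bytes in memory

    // Prints "27 27" on a little-endian CPU; on a big-endian CPU the second
    // value would be 0, because the first bytes there are the high-order ones.
    std::cout << wide << " " << narrow << "\n";
}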
Can you explain why?
Because you cannot assume a specific endianness of your computer architecture.
The natural follow-up question is what do you do about it. Fortunately, you can force a specific byte order by calling one of these functions htonl, htons, ntohl, or ntohs. They work regardless of the computer architecture on which you run them:
On the sending end, you convert from host order to network order; on the receiving end, you convert from network order to host order.
// Sending end
unsigned short recordlen = calculate_len();
*reinterpret_cast<unsigned short*>(record.len) = htons(recordlen);
// Receiving end
unsigned short recordlen = ntohs(*reinterpret_cast<unsigned short*>(record.len));
unsigned short recordlen = *((unsigned short*)record.len);
This is broken. record.len doesn't point to an unsigned short. Telling the compiler it does is just lying.
I presume you want:
unsigned short recordlen = static_cast<unsigned short>(record.len[0]) * 256 +
static_cast<unsigned short>(record.len[1]);
Or, if you like it better:
unsigned short recordlen = (static_cast<unsigned short>(record.len[0]) << 8) |
static_cast<unsigned short>(record.len[1]);
If not, code whatever it is you actually want.

Worst side effects from chars signedness. (Explanation of signedness effects on chars and casts)

I frequently work with libraries that use char when working with bytes in C++. The alternative is to define a "Byte" as unsigned char, but that is not the standard they decided to use. I frequently pass bytes from C# into the C++ DLLs and cast them to char to work with the library.
When casting ints to chars or chars to other simple types what are some of the side effects that can occur. Specifically, when has this broken code that you have worked on and how did you find out it was because of the char signedness?
Luckily I haven't run into this in my own code; I only used a signed-char casting trick back in an embedded systems class in school. I'm looking to better understand the issue since I feel it is relevant to the work I am doing.
One major risk is if you need to shift the bytes. A signed char keeps the sign-bit when right-shifted, whereas an unsigned char doesn't.
Here's a small test program:
#include <stdio.h>

int main(void)
{
    signed char a = -1;
    unsigned char b = 255;
    printf("%d\n%d\n", a >> 1, b >> 1);
    return 0;
}
It should print -1 and 127, even though a and b start out with the same bit pattern (given 8-bit chars, two's-complement and signed values using arithmetic shift).
In short, you can't rely on shift working identically for signed and unsigned chars, so if you need portability, use unsigned char rather than char or signed char.
The most obvious gotchas come when you need to compare the numeric value of a char with a hexadecimal constant when implementing protocols or encoding schemes.
For example, when implementing telnet you might want to do this.
// Check for IAC (hex FF) byte
if (ch == 0xFF)
{
// ...
Or when testing for UTF-8 multi-byte sequences.
if (ch >= 0x80)
{
// ...
Fortunately these errors don't usually survive very long as even the most cursory testing on a platform with a signed char should reveal them. They can be fixed by using a character constant, converting the numeric constant to a char or converting the character to an unsigned char before the comparison operator promotes both to an int. Converting the char directly to an unsigned won't work, though.
if (ch == '\xff') // OK
if ((unsigned char)ch == 0xff) // OK, so long as char has 8-bits
if (ch == (char)0xff) // Usually OK, relies on implementation defined behaviour
if ((unsigned)ch == 0xff) // still wrong
I've been bitten by char signedness in writing search algorithms that used characters from the text as indices into state trees. I've also had it cause problems when expanding characters into larger types, and the sign bit propagates causing problems elsewhere.
I found out when I started getting bizarre results and segfaults arising from searching texts other than the ones I'd used during the initial development (obviously characters with values >127 or <0 will cause this, and they won't necessarily be present in your typical text files).
Always check a variable's signedness when working with it. Generally I now make types signed unless I have a good reason otherwise, casting when necessary. This fits in nicely with the ubiquitous use of char in libraries to simply represent a byte. Keep in mind that the signedness of char is not defined (unlike the other integer types), so give it special treatment and be mindful.
The one that most annoys me:
typedef char byte;

byte b = 12;
cout << b << endl;
This prints the character with code 12 (a control character), not the number 12, because operator<< treats char as a character type. Sure, it's cosmetic, but arrr...
When casting ints to chars or chars to other simple types
The critical point is that casting a signed value from one primitive type to another (larger) type does not retain the bit pattern (assuming two's complement). A signed char with bit pattern 0xff is -1, while a signed short with the decimal value -1 is 0xffff. Casting an unsigned char with value 0xff to an unsigned short, however, yields 0x00ff. Therefore, always think about the proper signedness before you cast to a larger or smaller data type. Never carry unsigned data in signed data types if you don't need to; if an external library forces you to do so, do the conversion as late as possible (or as early as possible if the external code acts as a data source).
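A tiny sketch (my own illustration) of that difference:

#include <cstdio>

int main()
{
    signed char   sc = -1;   // bit pattern 0xff
    unsigned char uc = 0xff;

    // Sign extension fills the new high bits with copies of the sign bit.
    std::printf("%04x\n", (unsigned)(unsigned short)(short)sc); // ffff
    std::printf("%04x\n", (unsigned)(unsigned short)uc);        // 00ff
}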
The C and C++ language specifications define 3 data types for holding characters: char, signed char and unsigned char. The latter 2 have been discussed in other answers. Let's look at the char type.
The standard(s) say that the char data type may be signed or unsigned and is an implementation decision. This means that some compilers or versions of compilers, can implement char differently. The implication is that the char data type is not conducive for arithmetic or Boolean operations. For arithmetic and Boolean operations, signed and unsigned versions of char will work fine.
In summary, there are 3 versions of the char data type. The char data type performs well for holding characters, but is not suited for arithmetic across platforms and translators since its signedness is implementation-defined.
You will fail miserably when compiling for multiple platforms because the C++ standard doesn't define char to be of a certain "signedness".
Therefore GCC introduces -fsigned-char and -funsigned-char options to force certain behavior. More on that topic can be found here, for example.
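A quick sketch (my own illustration) of code whose result flips with those flags:

#include <iostream>

int main()
{
    char c = '\xFF';
    // Prints 1 when compiled with -fsigned-char (c is -1),
    // and 0 when compiled with -funsigned-char (c is 255).
    std::cout << (c < 0) << "\n";
}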
EDIT:
As you asked for examples of broken code, there are plenty of possibilities to break code that processes binary data. For example, imagine you process 8-bit audio samples (range -128 to 127) and you want to halve the volume. Now imagine this scenario (in which the naive programmer assumes char == signed char):
char sampleIn;
// If the sample is -1 (= almost silent), and the compiler treats char as unsigned,
// then the value of 'sampleIn' will be 255
read_one_byte_sample(&sampleIn);
// Ok, halve the volume. With char treated as unsigned, the value will be 127!
char sampleOut = sampleIn / 2;
// And write the processed sample to the output file, for example.
// (unsigned char)127 has the exact same bit pattern as (signed char)127,
// so this will write a sample with the loudest volume!!
write_one_byte_sample_to_output_file(&sampleOut);
I hope you like that example ;-) But to be honest I've never really come across such problems, not even as a beginner, as far as I can remember...
Hope this answer is sufficient for you downvoters. What about a short comment?
Sign extension. The first version of my URL encoding function produced strings like "%FFFFFFA3".
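A sketch of how that happens (my own reconstruction, not the poster's actual code): a negative char is sign-extended to int before being formatted.

#include <cstdio>

int main()
{
    char ch = '\xA3'; // -93 where char is signed

    std::printf("%%%02X\n", ch);                // prints %FFFFFFA3 (ch sign-extends to int)
    std::printf("%%%02X\n", (unsigned char)ch); // prints %A3
}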

C++: how to cast 2 bytes in an array to an unsigned short

I have been working on a legacy C++ application and am definitely outside of my comfort-zone (a good thing). I was wondering if anyone out there would be so kind as to give me a few pointers (pun intended).
I need to cast 2 bytes in an unsigned char array to an unsigned short. The bytes are consecutive.
For an example of what I am trying to do:
I receive a string from a socket and place it in an unsigned char array. I can ignore the first byte; the next 2 bytes should then be converted to an unsigned short. This will be on Windows only, so there are no big/little endian issues (that I am aware of).
Here is what I have now (not working obviously):
//packetBuffer is an unsigned char array containing the string "123456789" for testing
//I need to convert bytes 2 and 3 into the short, 2 being the most significant byte
//so I would expect to get 515 (2*256 + 3); instead all the code I have tried gives me
//either errors or 2 (only converting one byte)
unsigned short myShort;
myShort = static_cast<unsigned short>(packetBuffer[1]);
Well, you are widening the char into a short value. What you want is to interpret two bytes as a short. static_cast cannot cast from unsigned char* to unsigned short*. You have to cast to void*, then to unsigned short*:
unsigned short *p = static_cast<unsigned short*>(static_cast<void*>(&packetBuffer[1]));
Now, you can dereference p and get the short value. But the problem with this approach is that you cast from unsigned char*, to void* and then to some different type. The Standard doesn't guarantee the address remains the same (and in addition, dereferencing that pointer would be undefined behavior). A better approach is to use bit-shifting, which will always work:
unsigned short p = (packetBuffer[1] << 8) | packetBuffer[2];
This is probably well below what you care about, but keep in mind that you could easily get an unaligned access doing this. x86 is forgiving and the abort that the unaligned access causes will be caught internally and will end up with a copy and return of the value so your app won't know any different (though it's significantly slower than an aligned access). If, however, this code will run on a non-x86 (you don't mention the target platform, so I'm assuming x86 desktop Windows), then doing this will cause a processor data abort and you'll have to manually copy the data to an aligned address before trying to cast it.
In short, if you're going to be doing this access a lot, you might look at making adjustments to the code so as not to have unaligned reads, and you'll see a performance benefit.
unsigned short myShort = *(unsigned short *)&packetBuffer[1];
The bit shift above has a bug:
unsigned short p = (packetBuffer[1] << 8) | packetBuffer[2];
if packetBuffer holds 8-bit values, the worry is that shifting an 8-bit operand left by 8 would turn packetBuffer[1] into zero, leaving you with only packetBuffer[2] (in practice the usual integer promotions convert the operand to int before the shift, but it is safer not to rely on that);
Despite that, this is still preferred to pointers. To avoid the above problem I waste a few lines of code; at anything other than quite literally zero optimization it results in the same machine code:
unsigned short p;
p = packetBuffer[1]; p <<= 8; p |= packetBuffer[2];
Or to save some clock cycles and not shift the bits off the end:
unsigned short p;
p = (((unsigned short)packetBuffer[1])<<8) | packetBuffer[2];
You have to be careful with pointers, the optimizer will bite you, as well as memory alignments and a long list of other problems. Yes, done right it is faster, done wrong the bug can linger for a long time and strike when least desired.
Say you were lazy and wanted to do some 16-bit math on an 8-bit array (little-endian):
unsigned short *s;
unsigned char b[10];

s = (unsigned short *)&b[0];

if (b[0] & 7)
{
    *s = *s + 8;
    *s &= ~7;
}
do_something_with(b);
*s = *s + 8;
do_something_with(b);
*s = *s + 8;
do_something_with(b);
There is no guarantee that a perfectly bug free compiler will create the code you expect. The byte array b sent to the do_something_with() function may never get modified by the *s operations. Nothing in the code above says that it should. If you don't optimize your code then you may never see this problem (until someone does optimize or changes compilers or compiler versions). If you use a debugger you may never see this problem (until it is too late).
The compiler doesn't see the connection between s and b, they are two completely separate items. The optimizer may choose not to write *s back to memory because it sees that *s has a number of operations so it can keep that value in a register and only save it to memory at the end (if ever).
There are three basic ways to fix the pointer problem above:
Declare s as volatile.
Use a union.
Use a function or functions whenever changing types.
You should not cast an unsigned char pointer into an unsigned short pointer (or, for that matter, cast a pointer to a smaller data type to a pointer to a larger data type). This is because it is assumed that the address will be aligned correctly. A better approach is to shift the bytes into a real unsigned short object, or to memcpy into an unsigned short.
No doubt, you can adjust the compiler settings to get around this limitation, but this is a very subtle thing that will break in the future if the code gets passed around and reused.
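A minimal sketch of the memcpy approach (my own illustration), which sidesteps both the alignment and the aliasing problems, though the result is still in the host's byte order:

#include <cstring>

unsigned short read_u16(const unsigned char *buf)
{
    unsigned short v;
    std::memcpy(&v, buf, sizeof v); // compilers typically turn this into a single load
    return v;
}

// usage: unsigned short myShort = read_u16(&packetBuffer[1]);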
Maybe this is a very late solution, but I just want to share it with you. When you want to convert primitives or other types you can use a union. See below:
union CharToStruct {
    char charArray[2];
    unsigned short value;
};

short toShort(char* value){
    CharToStruct cs;
    cs.charArray[0] = value[1]; // the most significant byte of the short is not the first byte of the char array
    cs.charArray[1] = value[0];
    return cs.value;
}
When you create an array with the hex values below and call the toShort function, you will get a short with the value 3.
char array[2];
array[0] = 0x00;
array[1] = 0x03;
short i = toShort(array);
cout << i << endl; // or printf("%hd", i);
static_cast has a different syntax, plus you need to work with pointers; note that a conversion between these unrelated pointer types actually requires reinterpret_cast:
unsigned short *myShort = reinterpret_cast<unsigned short*>(&packetBuffer[1]);
Did nobody see that the input was a string?!
/* If it is a string as explicitly stated in the question.
*/
int byte1 = packetBuffer[1] - '0'; // convert 1st byte from char to number.
int byte2 = packetBuffer[2] - '0';
unsigned short result = (byte1 * 256) + byte2;
/* Alternatively if is an array of bytes.
*/
int byte1 = packetBuffer[1];
int byte2 = packetBuffer[2];
unsigned short result = (byte1 * 256) + byte2;
This also avoids the problems with alignment that most of the other solutions may have on certain platforms. Note: a short is at least two bytes. Most systems will give you a memory error if you try to dereference a short pointer that is not 2-byte aligned (or whatever sizeof(short) is on your system)!
char packetBuffer[] = {1, 2, 3};
unsigned short myShort = * reinterpret_cast<unsigned short*>(&packetBuffer[1]);
I (had to) do this all the time. Big-endian is an obvious problem. What will really get you is incorrect data when the machine dislikes misaligned reads (and writes)!
You may want to write a test case and an assert to see that it reads properly. Then, when run on a big-endian machine, or more importantly on a machine that dislikes misaligned reads, an assert failure occurs instead of a weird, hard-to-trace 'bug' ;)
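A sketch of what such a check might look like (my own illustration of the suggestion, not code from the answer):

#include <cassert>

void self_test_raw_read()
{
    unsigned char buf[3] = { 0x00, 0x34, 0x12 };
    // The cast below is exactly the kind of read being tested; the assert
    // holds only on little-endian, alignment-tolerant machines.
    unsigned short v = *reinterpret_cast<unsigned short*>(&buf[1]);
    assert(v == 0x1234);
}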
On Windows you can use:
unsigned short i = MAKEWORD(lowbyte,hibyte);
I realize this is an old thread, and I can't say that I tried every suggestion made here. I'm just making myself comfortable with MFC, and I was looking for a way to convert a uint into two bytes, and back again at the other end of a socket.
There are a lot of bit-shifting examples you can find on the net, but none of them seemed to actually work. A lot of the examples seem overly complicated; I mean, we're just talking about grabbing 2 bytes out of a uint, sending them over the wire, and plugging them back into a uint at the other end, right?
This is the solution I finally came up with:
class ByteConverter
{
public:
    static void uIntToBytes(unsigned int theUint, char* bytes)
    {
        unsigned int tInt = theUint;
        void *uintConverter = &tInt;
        char *theBytes = (char*)uintConverter;
        // Only the first two bytes are copied, so this assumes the value
        // fits in 16 bits and that both ends use the same byte order.
        bytes[0] = theBytes[0];
        bytes[1] = theBytes[1];
    }

    static unsigned int bytesToUint(char *bytes)
    {
        unsigned int theUint = 0;
        void *uintConverter = &theUint;
        char *theBytes = (char*)uintConverter;
        theBytes[0] = bytes[0];
        theBytes[1] = bytes[1];
        return theUint;
    }
};
Used like this:
unsigned int theUint;
char bytes[2];
CString msg;
ByteConverter::uIntToBytes(65000,bytes);
theUint = ByteConverter::bytesToUint(bytes);
msg.Format(_T("theUint = %d"), theUint);
AfxMessageBox(msg, MB_ICONINFORMATION | MB_OK);
Hope this helps someone out.