I'm floored that Visual Studio 2015 insists on promoting a WORD (unsigned short) to an unsigned int when only WORD values are involved in a bit manipulation (i.e. it promotes 16 bits to 32 bits when doing 16-bit | 16-bit).
e.g.
// where WORD is a 'unsigned short'
const WORD kFlag = 1;
WORD old = 2;
auto value = old | kFlag; // why the blazes is value an unsigned int (32 bits)
Moreover, is there a way to get x86 intrinsics for WORD | WORD? I surely do not want to pay for (16->32 | 16->32)->16. Nor does this code need to consume more than a couple of 16-bit registers, rather than a few 32-bit ones.
But the register use is really just an aside. The optimizer is welcome to do as it pleases, so long as the results are indistinguishable to me (i.e. it should not change the size in a visible way).
The main problem for me is that flags | kFlagValue results in a wider entity, and then pumping that into a template gives me a type-mismatch error. (The template is rather longer than I want to get into here, but the point is that it takes two arguments, which should match in type or be trivially convertible, but aren't, due to this automatic size-promotion rule.)
If I had access to a "conservative bit processing function set" then I could use:
flag non-promoting-bit-operator kFlagValue
To achieve my ends.
I guess I have to go write that, or use casts all over the place, because of this unfortunate rule.
C++ should not promote in this instance. It was a poor language choice.
Why is value promoted to a larger type? Because the language spec says it is (a 16-bit unsigned short will be converted to a 32-bit int). 16-bit ops on x86 actually incur a penalty over the corresponding 32-bit ones (due to a prefix opcode), so the 32-bit version may well run faster.
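One way to avoid scattering casts everywhere (a sketch of the "conservative bit processing function set" idea; the helper name bit_or is mine, not anything the compiler or SDK provides) is a tiny template that performs the operation and immediately narrows the result back to the operand type:

#include <type_traits>

// Hypothetical helper: OR two values of the same unsigned type and
// narrow the (promoted) result straight back to that type.
template <typename T>
constexpr T bit_or(T a, T b)
{
    static_assert(std::is_unsigned<T>::value, "intended for unsigned types");
    return static_cast<T>(a | b);   // a | b is computed as int/unsigned int,
                                    // then cast back down to T
}

// usage: WORD value = bit_or(old, kFlag);   // value stays a WORD

With something like this, flags | kFlagValue becomes bit_or(flags, kFlagValue), the deduced type stays WORD, and the template arguments match again.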
I'm looking at static_cast with bounded types.
Is the behavior implementation-specific? In other words (given 16-bit shorts and 32-bit longs) is
long x = 70000;
short y = static_cast<short>(x);
guaranteed to produce y = 4464 (the low-order 16 bits of x)? Or only on a little-endian machine?
I have always assumed it would but I am getting odd results on a big-endian machine and trying to figure them out.
Here's the actual problem. I have two time_t's (presumably 64 bits) that I "know" will always be within some reasonable number of seconds of each other. I want to display that difference with printf. The code is multi-platform, so rather than worry about what the underlying type of time_t is, I am doing a printf("%d") passing static_cast<int>(time2-time1). I'm seeing a zero, despite the fact that the printf is in a block conditioned on (time2 != time1). (The printf is in a library; no reasonable possibility of using cout instead.)
Is static_cast possibly returning the high 32 bits of time_t?
Is there a better way to do this?
Thanks,
I think perhaps the problem was unrelated to the static_cast. #ifdef platform confusion. I'd still be interested if someone definitively knows the answer.
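For the display part, one portable option (just a sketch; it assumes nothing about the underlying type of time_t) is to let difftime do the subtraction and cast its double result to a type with a known printf format:

#include <cstdio>
#include <ctime>

void print_elapsed(std::time_t time1, std::time_t time2)
{
    // difftime returns the difference in seconds as a double,
    // regardless of how time_t is represented on this platform.
    long long seconds = static_cast<long long>(std::difftime(time2, time1));
    std::printf("%lld seconds\n", seconds);
}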
I would like to store a signed int32 in an unsigned uint32 such that I can extract an int32 from it later. The value of the uint32 itself isn't used when it stores an integer like this, but unfortunately I can't use a union in this case. The way I currently do this is by simply converting it:
int32 signedVar = -500;
uint32 unsignedVar = uint32(signedVar);
func(int32(unsignedVar)); // should supply the function with -500
This appears to work, but I'm afraid it might not be portable, and that there might be needless conversions happening behind the scenes for what I'm hoping is a no-op.
Is -500 or any other negative number guaranteed to survive this conversion? If not, is there a "painless" (none of the types are changed, only the conversion method) alternative?
Edit: I'm most concerned with making sure the value in the int32 is preserved in a uint32 so it can be used as an int32 of the same value later; the uint32 version is never used. (Editing because SO asked me to explain how this isn't a duplicate of another question: that question doesn't specify what the result of the conversion should look like, or that it needs to be symmetrical like this, making it too general to apply here.)
If your machine uses 1s-complement or sign-magnitude arithmetic (which is allowed by the C standard at least; not sure about C++), then this double conversion will convert the value -0 into 0. If your machine uses 2s complement, then this will be fine.
That said, I'm not aware of any machine built in the last 30 years that uses anything other than 2s complement for arithmetic.
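If you want the round trip to be byte-exact without relying on the conversion rules at all, copying the object representation is a common hedge (a sketch, assuming int32/uint32 correspond to the <cstdint> fixed-width types; on C++20, std::bit_cast does the same job more directly):

#include <cstdint>
#include <cstring>

// Stash an int32_t's bytes in a uint32_t and recover them unchanged.
uint32_t store(int32_t value)
{
    uint32_t out;
    std::memcpy(&out, &value, sizeof out);   // copy the object representation
    return out;
}

int32_t load(uint32_t stored)
{
    int32_t out;
    std::memcpy(&out, &stored, sizeof out);
    return out;
}

// load(store(-500)) == -500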
I was having a look over this page: http://www.devbistro.com/tech-interview-questions/Cplusplus.jsp, and didn't understand this question:
What’s potentially wrong with the following code?
long value;
//some stuff
value &= 0xFFFF;
Note: Hint to the candidate about the base platform they’re developing for. If the person still doesn’t find anything wrong with the code, they are not experienced with C++.
Can someone elaborate on it?
Thanks!
Several answers here state that if an int has a width of 16 bits, 0xFFFF is negative. This is not true. 0xFFFF is never negative.
A hexadecimal literal is represented by the first of the following types that is large enough to contain it: int, unsigned int, long, and unsigned long.
If int has a width of 16 bits, then 0xFFFF is larger than the maximum value representable by an int. Thus, 0xFFFF is of type unsigned int, which is guaranteed to be large enough to represent 0xFFFF.
When the usual arithmetic conversions are performed for evaluation of the &, the unsigned int is converted to a long. The conversion of a 16-bit unsigned int to long is well-defined because every value representable by a 16-bit unsigned int is also representable by a 32-bit long.
There's no sign extension needed because the initial type is not signed, and the result of using 0xFFFF is the same as the result of using 0xFFFFL.
Alternatively, if int is wider than 16 bits, then 0xFFFF is of type int. It is a signed, but positive, number. In this case both operands are signed, and long has the greater conversion rank, so the int is again promoted to long by the usual arithmetic conversions.
As others have said, you should avoid performing bitwise operations on signed operands because the numeric result is dependent upon how signedness is represented.
Aside from that, there's nothing particularly wrong with this code. I would argue that it's a style concern that value is not initialized when it is declared, but that's probably a nit-pick level comment and depends upon the contents of the //some stuff section that was omitted.
It's probably also preferable to use a fixed-width integer type (like uint32_t) instead of long for greater portability, but really that too depends on the code you are writing and what your basic assumptions are.
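For example, one more explicit spelling (a sketch of the fixed-width style, not the only correct one; UINT32_C comes from <cstdint>) would be:

#include <cstdint>

void keep_low_16(uint32_t& value)
{
    value &= UINT32_C(0xFFFF);   // keep the low 16 bits; the operand widths and
                                 // signedness no longer depend on how wide long is
}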
I think that, depending on the size of a long, the 0xffff literal (-1) could be promoted to a larger size, and being a signed value it would be sign-extended, potentially becoming 0xffffffff (still -1).
I'll assume it's because there's no predefined size for a long, other than it must be at least as big as the preceding size (int). Thus, depending on the size, you might either truncate value to a subset of bits (if long is more than 32 bits) or overflow (if it's less than 32 bits).
Yeah, longs (per the spec, and thanks for the reminder in the comments) must be able to hold at least -2147483647 to 2147483647 (LONG_MIN and LONG_MAX).
For one thing, value isn't initialized before the AND is performed, so I think the behaviour is undefined; value could be anything.
The size of the long type is platform/compiler specific.
What you can say here is:
It is signed.
We can't know the result of value &= 0xFFFF;, since it could for example be value &= 0x0000FFFF; and not do what was expected.
While one could argue that since it's not a buffer-overflow or some other error that's likely to be exploitable, it's a style thing and not a bug, I'm 99% confident that the answer that the question-writer is looking for is that value is operated on before it's assigned to. The value is going to be arbitrary garbage, and that's unlikely to be what was meant, so it's "potentially wrong".
Using MSVC, I think the statement would do what was most likely intended - that is, clear all but the least significant 16 bits of value - but I have encountered other platforms which would interpret the literal 0xffff as equivalent to (short)-1, then sign-extend it when converting to long, in which case the statement "value &= 0xFFFF" would have no effect.
"value &= 0x0FFFF" is more explicit and robust.
Should a buffer of bytes be signed char or unsigned char or simply a char buffer?
Any differences between C and C++?
Thanks.
If you intend to store arbitrary binary data, you should use unsigned char. It is the only data type that is guaranteed by the C Standard to have no padding bits. Every other data type may contain padding bits in its object representation (the representation that comprises all bits of an object, rather than only those that determine its value). The padding bits' state is unspecified and they are not used to store values.
So if you read some binary data through char, things would be cut down to the value range of a char (only the value bits are interpreted), but there may still be bits that are simply ignored yet present, and which memcpy still reads, much like padding bits in real struct objects. Type unsigned char is guaranteed not to contain those. That follows from 5.2.4.2.1/2 (C99 TC2, n1124 here):
If the value of an object of type char is treated as a signed integer when used in an expression, the value of CHAR_MIN shall be the same as that of SCHAR_MIN and the value of CHAR_MAX shall be the same as that of SCHAR_MAX. Otherwise, the value of CHAR_MIN shall be 0 and the value of CHAR_MAX shall be the same as that of UCHAR_MAX. The value UCHAR_MAX shall equal 2^CHAR_BIT − 1.
From the last sentence it follows that there is no space left for any padding bits. If you use char as the type of your buffer, you also have a potential overflow problem: assigning a value that fits in 8 bits (so you might expect the assignment to be OK) but is not within the range of char, which is CHAR_MIN..CHAR_MAX, is a conversion that overflows and produces implementation-defined results, which may include raising a signal.
Even if the problems above would probably not show up in real implementations (that would be a very poor quality of implementation), you are best off using the right type from the beginning, which is unsigned char.
For strings, however, the data type of choice is char, which will be understood by string and print functions. Using signed char for these purposes looks like a wrong decision to me.
For further information, read this proposal, which contains a fix for the next version of the C Standard that will eventually require signed char not to have any padding bits either. It's already incorporated into the working paper.
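As a small illustration of the value-range point (a sketch that assumes CHAR_BIT == 8 and a signed plain char), a raw byte such as 0xAB always fits in unsigned char but is out of range for such a char:

unsigned char raw = 0xAB;   // fine: 0..255 always fits in unsigned char
char maybe = 0xAB;          // if plain char is signed and 8 bits wide, 171 is out
                            // of range: the result is implementation-defined
                            // (and C even allows a signal to be raised)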
Should a buffer of bytes be signed char or unsigned char or simply a char buffer? Any differences between C and C++?
A minor difference in how the language treats it. A huge difference in how convention treats it.
char = ASCII (or UTF-8, but the signedness gets in the way there) textual data
unsigned char = byte
signed char = rarely used
And there is code that relies on such a distinction. Just a week or two ago I encountered a bug where JPEG data was getting corrupted because it was being passed to the char* version of our Base64 encode function — which "helpfully" replaced all the invalid UTF-8 in the "string". Changing to BYTE aka unsigned char was all it took to fix it.
It depends.
If the buffer is intended to hold text, then it probably makes sense to declare it as an array of char and let the platform decide for you whether that is signed or unsigned by default. That will give you the least trouble passing the data in and out of the implementation's runtime library, for example.
If the buffer is intended to hold binary data, then it depends on how you intend to use it. For example, if the binary data is really a packed array of data samples that are signed 8-bit fixed point ADC measurements, then signed char would be best.
In most real-world cases, the buffer is just that, a buffer, and you don't really care about the types of the individual bytes because you filled the buffer in a bulk operation, and you are about to pass it off to a parser to interpret the complex data structure and do something useful. In that case, declare it in the simplest way.
If it actually is a buffer of 8-bit bytes, rather than a string in the machine's default locale, then I'd use uint8_t. Not that there are many machines around where a char is not a byte (or a byte an octet), but making the statement 'this is a buffer of octets' rather than 'this is a string' is often useful documentation.
You should use either char or unsigned char but never signed char. The standard has the following in 3.9/2
For any object (other than a base-class subobject) of POD type T, whether or not the object holds a valid value of type T, the underlying bytes (1.7) making up the object can be copied into an array of char or unsigned char. If the content of the array of char or unsigned char is copied back into the object, the object shall subsequently hold its original value.
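In code, the round trip that the quoted passage describes looks roughly like this (a sketch; the POD struct Pod is hypothetical):

#include <cstring>

struct Pod { int x; double y; };   // hypothetical POD type

void round_trip()
{
    Pod original = { 42, 3.14 };

    unsigned char buffer[sizeof(Pod)];
    std::memcpy(buffer, &original, sizeof original);   // copy the underlying bytes out

    Pod restored;
    std::memcpy(&restored, buffer, sizeof restored);   // copy them back

    // restored now holds the same value as original
}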
It is better to define it as unsigned char. In fact, the Win32 type BYTE is defined as unsigned char. There is no difference between C and C++ in this regard.
For maximum portability always use unsigned char. There are a couple of situations where this comes into play. Serialized data shared across systems with different endianness immediately comes to mind. Performing shifts or bit masking on the values is another.
The choice of int8_t vs uint8_t is similar to the choice between comparing a pointer to NULL or to 0.
From a functionality point of view, comparing to NULL is the same as comparing to 0 because NULL is a #define for 0.
But personally, from a coding-style point of view, I choose to compare my pointers to NULL, because the NULL #define connotes to the person maintaining the code that you are checking for a bad pointer, whereas a comparison to 0 connotes that you are checking for a specific value.
For the above reason, I would use uint8_t.
If you fetch an element into a wider variable, it will of course be sign-extended or not, depending on whether the element type is signed.
"Should" and "shouldn't" aside, I tend to prefer unsigned, since it feels more "raw", less inviting to say "hey, that's just a bunch of small ints", when I want to emphasize the binary-ness of the data.
I don't think I've ever used an explicit signed char to represent a buffer of bytes.
Of course, a third option is to represent the buffer as void * as much as possible. Many common I/O functions work with void *, so sometimes the decision of which integer type to use can be fully encapsulated, which is nice.
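For instance, a thin wrapper along these lines (a sketch; write_all is a hypothetical name) keeps the element type out of the interface entirely, since std::fwrite already takes const void *:

#include <cstddef>
#include <cstdio>

// Callers can pass a buffer of char, unsigned char, uint8_t, or a whole struct,
// with no casting at the call site.
bool write_all(std::FILE* out, const void* data, std::size_t size)
{
    return std::fwrite(data, 1, size, out) == size;
}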
Several years ago I had a problem with a C++ console application that printed colored characters for character values above 128, and it was solved by switching from char to unsigned char, but I think it would have been solvable while keeping the char type, too.
For now, most C/C++ functions use char and I understand both languages much better now, so I use char in most cases.
Do you really care? If you don't, just use the default (char) and don't clutter your code with unimportant matters. Otherwise, future maintainers will be left wondering why you used signed (or unsigned). Make their life simpler.
If you lie to the compiler, it will punish you.
If the buffer contains data that is just passing through, and you will not manipulate them in any way, it doesn't matter.
However, if you have to operate on the buffer contents then the correct type declaration will make your code simpler. No "int val = buf[i] & 0xff;" nonsense.
So, think about what the data actually is and how you need to use it.
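For example (a sketch with hypothetical helper names): with a plain char buffer you need the mask whenever char is signed, while an unsigned char buffer gives you the byte value directly:

// If plain char is signed, buf[0] may be negative and would sign-extend,
// so the & 0xff mask is needed to recover the byte value 0..255.
int first_byte_signed(const char* buf) { return buf[0] & 0xff; }

// With unsigned char the element is already 0..255; no masking required.
int first_byte_unsigned(const unsigned char* buf) { return buf[0]; }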
typedef char byte;
Now you can make your array be of bytes. It's obvious to everyone what you meant, and you don't lose any functionality.
I know it's somewhat silly, but it makes your code read 100% as you intended.