I have an array of bitmasks, the idea being to use them to clear a specified number of the least significant bits of an integer that is being used as a set of flags. It is defined as follows:
int clearLow[10]=
{
0xffffffff, 0xfffffffe, 0xfffffffc, 0xfffffff8, 0xfffffff0, 0xffffffe0, 0xffffffc0, 0xffffff80, 0xffffff00, 0xfffffe00
};
I recently switching to using gcc 4.8 I have found that this array starts throwing warnings,
warning: narrowing conversion of ‘4294967295u’ from ‘unsigned int’ to ‘int’ inside { } is ill-formed in C++11
etc
etc
Clearly my hexadecimal literals are being taken as unsigned ints and the fix is easy as, honestly, I do not care if this array is int or unsigned int it just needs to have the appropriate bits set in each cell, but my question is this:
Are there any ways to set literals in hexadecimal, for the purposes of simply setting bits, without the compiler assuming them to be unsigned?
You describe that you just want to use the values as operands to bit operations. As that is the case, just always use unsigned datatypes. That's the simple solution.
It looks like you just want an array of unsigned int to use for your bit masking:
const unsigned clearLow[] = {
0xffffffff, 0xfffffffe, 0xfffffffc, 0xfffffff8, 0xfffffff0, 0xffffffe0, 0xffffffc0, 0xffffff80, 0xffffff00, 0xfffffe00
};
Related
I use utf8 and have to save a constant in a char array:
const char s[] = {0xE2,0x82,0xAC, 0}; //the euro sign
However it gives me error:
test.cpp:15:40: error: narrowing conversion of ‘226’ from ‘int’ to ‘const char’ inside { } [-fpermissive]
I have to cast all the hex numbers to char, which I feel tedious and don't smell good. Is there any other proper way of doing this?
char may be signed or unsigned (and the default is implementation specific). You probably want
const unsigned char s[] = {0xE2,0x82,0xAC, 0};
or
const char s[] = "\xe2\x82\xac";
or with many recent compilers (including GCC)
const char s[] = "€";
(a string literal is an array of char unless you give it some prefix)
See -funsigned-char (or -fsigned-char) option of GCC.
On some implementations a char is unsigned and CHAR_MAX is 255 (and CHAR_MIN is 0). On others char-s are signed so CHAR_MIN is -128 and CHAR_MAX is 127 (and e.g. things are different on Linux/PowerPC/32 bits and Linux/x86/32 bits). AFAIK nothing in the standard prohibits 19 bits signed chars.
The short answer to your question is that you are overflowing a char. A char has the range of [-128, 127]. 0xE2 = 226 > 127. What you need to use is an unsigned char, which has a range of [0, 255].
unsigned char s = {0xE2,0x82,0xAC, 0};
While it may well be tedious to be putting lots of casts in your code, it actually smells extremely GOOD to me to use as strong of typing as possible.
As noted above, when you specify type "char" you are inviting a compiler to choose whatever the compiler writer preferred (signed or unsigned). I'm no expert on UTF-8, but there is no reason to make your code non-portable if you don't need to.
As far as your constants, I've used compilers that default constants written that way to signed ints, as well as compilers that consider the context and interpret them accordingly. Note that converting between signed and unsigned can overflow EITHER WAY. For the same number of bits, a negative overflows an unsigned (obviously) and an unsigned with the top bit set overflows a signed, because the top bit means negative.
In this case, your compiler is taking your constants as unsigned 8 bit--OR LARGER--which means they don't fit as signed 8 bit. And we are all grateful that the compiler complains (at least I am).
My perspective is, there is nothing at all bad about casting to show exactly what you intend to happen. And if a compiler lets you assign between signed and unsigned, it should require that you cast regardless of variables or constants. eg
const int8_t a = (int8_t) 0xFF; // will be -1
although in my example, it would be better to assign -1. When you are having to add extra casts, they either make sense, or you should code your constants so they make sense for the type you are assigning to.
Is there a way to mix these? I want a define macro FX_RGB(R,G,B) that makes a const string "\x01\xRR\xGG\xBB" so I can do the following:
const char* LED_text = "Hello " FX_RGB(0xff, 0xff, 0x80) "World";
and get a sting: const char* LED_text = "Hello \x01\xff\xff\x80World";
Lets say I want to store the low 16 bits of a uint32_t in a uint16_t on windows, I could do it either
uint32_t value = 123456789;
uint16_t low1 = value; //like this
uint16_t low2 = value & 0xFFFF; //or this
There appears to be no difference in the results but I couldn't find any documentation explicitly stating that this is defined behavior. Could it be different under circumstances X or Y? Or is this just how it works?
The C++ standard guarantees that assignment to and initialization of unsigned types gives you the value modulo 2n, where n is the number of bits in the value representation of the unsigned type.
In Windows all bits participate in the value representation.
Hence using a bitmask serves no purpose other than to put a little stumbling block in the way for the future, when one might change the types.
If you absolutely want to use a mask, e.g. to avoid a compilation warning from an over-zealous compiler, then you can do it in a type-independent way like this, assuming that the type is unsigned:
uint16_t low2 = value & uint16_t(-1);
Which relies on the aforementioned modulo-2n guarantee.
The compiler should give you a warning, that value will be truncated if you use -Wconversion.
warning: conversion to 'uint16_t {aka short unsigned int}' from 'uint32_t {aka unsigned int}' may alter its value [-Wconversion]
uint16_t low1 = value;
^
With bitmask g++ do not produce a warning...
This question already has answers here:
Difference between unsigned and unsigned int in C
(5 answers)
Closed 9 years ago.
I saw in some C++ code the keyword "unsigned" in the following form:
const int HASH_MASK = unsigned(-1) >> 1;
and later:
unsigned hash = HASH_SEED;
(it is taken from the CS106B/X reader - of Stanford - by Eric S. Roberts - on the topic of "implementation of the hash code function for strings").
Can someone tell me please what does that keyword mean and when do I use it anyway?
Thanks!
Take a look: https://stackoverflow.com/a/7176690/1758762
unsigned is a modifier which can apply to any integral type (char,
short, int, long, etc.) but on its own it is identical to unsigned
int.
It's a short version of unsigned int. Syntactically, you can use it anywhere you would use any other datatype like float or short.
Unsigned types are types that can't represent negative numbers; only zero and positive numbers. In C++, they use modular arithmetic; the modulus for an N-bit type is 2^N. It's a good idea to use unsigned rather than signed types when messing around with bit patterns (for example, when calculating hash codes), since C++ allows several different representations of negative numbers which could lead to portability issues.
unsigned can be used as a qualifier for any integer type (e.g. unsigned int or unsigned long long); or on its own as shorthand for unsigned int.
So the first converts -1 into unsigned int. Due to modular arithmetic, this gives the largest representable value. This could also be written (more clearly, in my opinion) as std::numeric_limits<unsigned>::max().
The second declares and initialises a variable of type unsigned int.
Values are signed by default, which means they can be positive or negative. The unsigned keyword is used to specify that a value must be positive.
Signed variables use 1 bit to specify whether the value is positive or not. The unsigned keyword actualy makes this bit part of the value (thus allowing bigger numbers to be stored).
Lastly, unsigned hash is interpreted by compilers as unsigned int hash (int being the default type in C programming).
To get a good idea what unsigned means, one has to understand signed and unsigned integers. For a full explanation of twos-compliment, search Wikipedia, but in a nutshell, a computer stores negative numbers by subtracting negative numbers from 2^32 (for a 32-bit integer). In this way, -1 is stored as 2^32-1. This does mean that you only have 2^31 positive numbers, but that is by the by. This is known as signed integers (as it can have positive or negative sign)
Unsigned tells the compiler that you don't want twos compliment and are dealing only in positive numbers. When -1 is typecast (as it is in the code) to an unsigned int it becomes
2^32-1 = 0b111111111...
Thus that is an easy way of getting a whole lot of 1s in binary.
Use unsigned rarely. If you need to do bit operations, or for some reason need only positive integers bigger than 2^31. Otherwise, if you leave it out, c++ assumes signed integers.
C allows chars to be signed or unsigned, depending on which is more efficient for the host computer. if you want to be sure your char is unsigned, you can declare your variable to be unsigned char. You can use signed char if you want the ensure signed interpretation.
Incidentally, the C and C++ compilers treatd char, signed char, and unsigned char as three distinct types, even though char is compiled into one of the other two.
I just asked this question and it got me thinking if there is any reason
1)why you would assign a int variable using hexidecimal or octal instead of decimal and
2)what are the difference between the different way of assignment
int a=0x28ff1c; // hexideciaml
int a=10; //decimal (the most commonly used way)
int a=012177434; // octal
You may have some constants that are more easily understood when written in hexadecimal.
Bitflags, for example, in hexadecimal are compact and easily (for some values of easily) understood, since there's a direct correspondence 4 binary digits => 1 hex digit - for this reason, in general the hexadecimal representation is useful when you are doing bitwise operations (e.g. masking).
In a similar fashion, in several cases integers may be internally divided in some fields, for example often colors are represented as a 32 bit integer that goes like this: 0xAARRGGBB (or 0xAABBGGRR); also, IP addresses: each piece of IP in the dotted notation is two hexadecimal digits in the "32-bit integer" notation (usually in such cases unsigned integers are used to avoid messing with the sign bit).
In some code I'm working on at the moment, for each pixel in an image I have a single byte to use to store "accessory information"; since I have to store some flags and a small number, I use the least significant 4 bits to store the flags, the 4 most significant ones to store the number. Using hexadecimal notations it's immediate to write the appropriate masks and shifts: byte & 0x0f gives me the 4 LS bits for the flags, (byte & 0xf0)>>4 gives me the 4 MS bits (re-shifted in place).
I've never seen octal used for anything besides IOCCC and UNIX permissions masks (although in the last case they are actually useful, as you probably know if you ever used chmod); probably their inclusion in the language comes from the fact that C was initially developed as the language to write UNIX.
By default, integer literals are of type int, while hexadecimal literals are of type unsigned int or larger if unsigned int isn't large enough to hold the specified value. So, when assigning a hexadecimal literal to an int there's an implicit conversion (although it won't impact the performance, any decent compiler will perform the cast at compile time). Sorry, brainfart. I checked the standard right now, it goes like this:
decimal literals, without the u suffix, are always signed; their type is the smallest that can represent them between int, long int, long long int;
octal and hexadecimal literals without suffix, instead, may also be of unsigned type; their actual type is the smallest one that can represent the value between int, unsigned int, long int, unsigned long int, long long int, unsigned long long int.
(C++11, §2.14.2, ¶2 and Table 6)
The difference may be relevant for overload resolution1, but it's not particularly important when you are just assigning a literal to a variable. Still, keep in mind that you may have valid integer constants that are larger than an int, i.e. assignment to an int will result in signed integer overflow; anyhow, any decent compiler should be able to warn you in these cases.
Let's say that on our platform integers are in 2's complement representation, int is 16 bit wide and long is 32 bit wide; let's say we have an overloaded function like this:
void a(unsigned int i)
{
std::cout<<"unsigned";
}
void a(int i)
{
std::cout<<"signed";
}
Then, calling a(1) and a(0x1) will produce the same result (signed), but a(32768) will print signed and a(0x10000) will print unsigned.
It matters from a readability standpoint - which one you choose expresses your intention.
If you're treating the variable as an integral type, you know, like 2+2=4, you use the decimal representation. It's intuitive and straight-forward.
If you're using it as a bitmask, you can use hexa, octal or even binary. For example, you'll know
int a = 0xFF;
will have the last 8 bits set to 1. You'll know that
int a = 0xF0;
is (...)11110000, but you couldn't directly say the same thing about
int a = 240;
although they are equivalent. It just depends on what you use the numbers for.
well the truth is it doesn't matter if you want it on decimal, octal or hexadecimal its just a representation and for your information, numbers in computers are stored in binary(so they are just 0's and 1's) which you can use also to represent a number. so its just a matter of representation and readability.
NOTE:
Well in some of C++ debuggers(in my experience) I assigned a number as a decimal representation but in my debugger it is shown as hexadecimal.
It's similar to the assignment of and integer this way:
int a = int(5);
int b(6);
int c = 3;
it's all about preference, and when it breaks down you're just doing the same thing. Some might choose octal or hex to go along with their program that manipulates that type of data.
Is it really necessary to use unsigned char to hold binary data as in some libraries which work on character encoding or binary buffers? To make sense of my question, have a look at the code below -
char c[5], d[5];
c[0] = 0xF0;
c[1] = 0xA4;
c[2] = 0xAD;
c[3] = 0xA2;
c[4] = '\0';
printf("%s\n", c);
memcpy(d, c, 5);
printf("%s\n", d);
both the printf's output 𤭢 correctly, where f0 a4 ad a2 is the encoding for the Unicode code-point U+24B62 (𤭢) in hex.
Even memcpy also correctly copied the bits held by a char.
What reasoning could possibly advocate the use of unsigned char instead of a plain char?
In other related questions unsigned char is highlighted because it is the only (byte/smallest) data type which is guaranteed to have no padding by the C-specification. But as the above example showed, the output doesn't seem to be affected by any padding as such.
I have used VC++ Express 2010 and MinGW to compile the above. Although VC gave the warning
warning C4309: '=' : truncation of constant value
the output doesn't seems to reflect that.
P.S. This could be marked a possible duplicate of Should a buffer of bytes be signed or unsigned char buffer? but my intent is different. I am asking why something which seems to be working as fine with char should be typed unsigned char?
Update: To quote from N3337,
Section 3.9 Types
2 For any object (other than a base-class subobject) of trivially
copyable type T, whether or not the object holds a valid value of type
T, the underlying bytes (1.7) making up the object can be copied into
an array of char or unsigned char. If the content of the array of char
or unsigned char is copied back into the object, the object shall
subsequently hold its original value.
In view of the above fact and that my original example was on Intel machine where char defaults to signed char, am still not convinced if unsigned char should be preferred over char.
Anything else?
In C the unsigned char data type is the only data type that has all the following three properties simultaneously
it has no padding bits, that it where all storage bits contribute to the value of the data
no bitwise operation starting from a value of that type, when converted back into that type, can produce overflow, trap representations or undefined behavior
it may alias other data types without violating the "aliasing rules", that is that access to the same data through a pointer that is typed differently will be guaranteed to see all modifications
if these are the properties of a "binary" data type you are looking for, you definitively should use unsigned char.
For the second property we need a type that is unsigned. For these all conversion are defined with modulo arihmetic, here modulo UCHAR_MAX+1, 256 in most 99% of the architectures. All conversion of wider values to unsigned char thereby just corresponds to truncation to the least significant byte.
The two other character types generally don't work the same. signed char is signed, anyhow, so conversion of values that don't fit it is not well defined. char is not fixed to be signed or unsigned, but on a particular platform to which your code is ported it might be signed even it is unsigned on yours.
You'll get most of your problems when comparing the contents of individual bytes:
char c[5];
c[0] = 0xff;
/*blah blah*/
if (c[0] == 0xff)
{
printf("good\n");
}
else
{
printf("bad\n");
}
can print "bad", because, depending on your compiler, c[0] will be sign extended to -1, which is not any way the same as 0xff
The plain char type is problematic and shouldn't be used for anything but strings. The main problem with char is that you can't know whether it is signed or unsigned: this is implementation-defined behavior. This makes char different from int etc, int is always guaranteed to be signed.
Although VC gave the warning ... truncation of constant value
It is telling you that you are trying to store int literals inside char variables. This might be related to the signedness: if you try to store an integer with value > 0x7F inside a signed character, unexpected things might happen. Formally, this is undefined behavior in C, though practically you'd just get a weird output if attempting to print the result as an integer value stored inside a (signed) char.
In this specific case, the warning shouldn't matter.
EDIT :
In other related questions unsigned char is highlighted because it is the only (byte/smallest) data type which is guaranteed to have no padding by the C-specification.
In theory, all integer types except unsigned char and signed char are allowed to contain "padding bits", as per C11 6.2.6.2:
"For unsigned integer types other than unsigned char, the bits of the
object representation shall be divided into two groups: value bits and
padding bits (there need not be any of the latter)."
"For signed integer types, the bits of the object representation shall
be divided into three groups: value bits, padding bits, and the sign
bit. There need not be any padding bits; signed char shall not have
any padding bits."
The C standard is intentionally vague and fuzzy, allowing these theoretical padding bits because:
It allows different symbol tables than the standard 8-bit ones.
It allows implementation-defined signedness and weird signed integer formats such as one's complement or "sign and magnitude".
An integer may not necessarily use all bits allocated.
However, in the real world outside the C standard, the following applies:
Symbol tables are almost certainly 8 bits (UTF8 or ASCII). Some weird exceptions exist, but clean implementations use the standard type wchar_t when implementing symbols tables larger than 8 bits.
Signedness is always two's complement.
An integer always uses all bits allocated.
So there is no real reason to use unsigned char or signed char just to dodge some theoretical scenario in the C standard.
Bytes are usually intended as unsigned 8 bit wide integers.
Now, char doesn't specify the sign of the integer: on some compilers char could be signed, on other it may be unsigned.
If I add a bit shift operation to the code you wrote, then I will have an undefined behaviour. The added comparison will also have an unexpected result.
char c[5], d[5];
c[0] = 0xF0;
c[1] = 0xA4;
c[2] = 0xAD;
c[3] = 0xA2;
c[4] = '\0';
c[0] >>= 1; // If char is signed, will the 7th bit go to 0 or stay the same?
bool isBiggerThan0 = c[0] > 0; // FALSE if char is signed!
printf("%s\n", c);
memcpy(d, c, 5);
printf("%s\n", d);
Regarding the warning during the compilation: if the char is signed then you are trying to assign the value 0xf0, which cannot be represented in the signed char (range -128 to +127), so it will be casted to a signed value (-16).
Declaring the char as unsigned will remove the warning, and is always good to have a clean build without any warning.
The signed-ness of the plain char type is implementation defined, so unless you're actually dealing with character data (a string using the platform's character set - usually ASCII), it's usually better to specify the signed-ness explicitly by either using signed char or unsigned char.
For binary data, the best choice is most probably unsigned char, especially if bitwise operations will be performed on the data (specifically bit shifting, which doesn't behave the same for signed types as for unsigned types).
I am asking why something which seems to be working as fine with char should be typed unsigned char?
If you do things which are not "correct" in the sense of the standard, you rely on undefined behaviour. Your compiler might do it the way you want today, but you don't know what it does tomorrow. You don't know what GCC does or VC++ 2012. Or even if the behaviour depends on external factors or Debug/Release compiles etc. As soon as you leave the safe path of the standard, you might run into trouble.
Well, what do you call "binary data"? This is a bunch of bits, without any meaning assigned to them by that specific part of software that calls them "binary data". What's the closest primitive data type, which conveys the idea of the lack of any specific meaning to any one of these bits? I think unsigned char.
Is it really necessary to use unsigned char to hold binary data as in some libraries which work on character encoding or binary buffers?
"really" necessary? No.
It is a very good idea though, and there are many reasons for this.
Your example uses printf, which not type-safe. That is, printf takes it's formatting cues from the format string and not from the data type. You could just as easily tried:
printf("%s\n", (void*)c);
... and the result would have been the same. If you try the same thing with c++ iostreams, the result will be different (depending on the signed-ness of c).
What reasoning could possibly advocate the use of unsigned char instead of a plain char?
Signed specifies that the most significant bit of the data (for unsigned char the 8-th bit) represents the sign. Since you obviously do not need that, you should specify your data is unsigned (the "sign" bit represents data, not the sign of the other bits).