Gimpel Software's PC-lint and Flexelint have a rule, "971: Use of 'char' without 'signed' or 'unsigned'", that forbids using the plain char type without specifying signedness.
http://www.gimpel.com/html/pub/msg.txt
I think this is misguided. If char is used as an integer type then it might make sense to specify signedness explicitly, but not when it is used for textual characters. Standard library functions like printf take pointers to plain char, and using signed or unsigned char is a type mismatch. One can of course cast between the types, but that can lead to just the kind of mistakes that lint is trying to prevent.
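For example, a minimal sketch of the kind of friction described here (buffer contents are illustrative):

#include <cstdio>
#include <cstring>

int main()
{
    unsigned char buf[16];
    // strcpy and printf expect char*, so an unsigned char buffer needs a cast at every call:
    std::strcpy(reinterpret_cast<char *>(buf), "hello");
    std::printf("%s\n", reinterpret_cast<char *>(buf));
    return 0;
}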
Is this lint rule against the plain char type wrong?
The messages PC Lint provides in the 900-999 range (and 1900-1999 for C++) are called "Elective Notes" and are off by default. They are intended for use if you have a coding guideline that is restrictive in some specific way. You may then activate one or more of these notes to help you find potential violations. I do not think that anyone has activated all the 9xx messages in a real development effort.
You are right about char: it is (almost always) a byte used for a real character. However, a char is treated by a C compiler as either signed or unsigned. In C++, char is a type distinct from both unsigned char and signed char.
In many embedded C environments I have worked in, it was customary to have a coding rule stating that a plain char is not allowed. That's when this PC Lint message should be activated. Exceptions, like interfacing with other libraries, had to be explicitly permitted and then marked with a Lint comment suppressing the individual message.
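As a sketch of what such an exception might look like (the suppression comment is written from memory; check the PC-lint manual for the exact syntax):

// Interface to a third-party library that expects plain char;
// the elective note is suppressed for this one line only.
extern int legacy_print(const char *text);   //lint !e971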
I would assume the reason that they chose to force you to pick signed or unsigned is because the C standard does not. The C standard states char, unsigned char, and signed char are three unique types.
gcc for example makes the default signed but that can be modified with the flag -funsigned-char
So IMO I would say no, the rule against it is not wrong; it's just trying to tighten up the C spec.
Related
I'm drowning in the sea of casts between char* and unsigned char*, it's just ridiculous. There has to be some way around this.
Yes, -fpermissive will allow this.
No. C++ pointers are more strongly typed than C's. You are required to cast explicitly between pointer types, except for upcasting, which can happen implicitly. What you want is intentionally difficult in C++ to discourage people from doing it.
C allows these potentially unsafe conversions to occur silently, but if you enable warnings in gcc, the issues will be brought to your attention.
On another note: having a char * point to an unsigned char is well-defined behavior, though it could be error prone in the long run depending on how the memory is used.
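If the casts become pervasive, one common workaround is to centralize them in small helpers; a sketch (the name as_chars is made up):

inline char *as_chars(unsigned char *p)
{
    return reinterpret_cast<char *>(p);
}

inline const char *as_chars(const unsigned char *p)
{
    return reinterpret_cast<const char *>(p);
}

// e.g. std::printf("%s\n", as_chars(buffer)); instead of a raw cast at every call site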
Consider following code:
void foo(unsigned int x)
{
}

int main()
{
    foo(-5);
    return 0;
}
This code compiles with no problems. Errors like this can cause lots of problems and are hard to find. Why does C++ allow such conversion?
The short answer is because C supported such conversions originally and they didn't want to break existing software in C++.
Note that some compilers will warn on this. For example g++ -Wconversion will warn on that construct.
In many cases the implicit conversion is useful, for example when an int was used in intermediate calculations but the end result will never be negative (known from the algorithm and optionally asserted upon).
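A sketch of that pattern (function name and values are illustrative):

// The intermediate result can dip below zero, but the returned value never does.
unsigned int shrink(unsigned int base, int delta)
{
    int result = static_cast<int>(base) + delta;   // may be negative here
    if (result < 0)
        result = 0;
    return result;   // implicit int -> unsigned int conversion, known to be non-negative
}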
EDIT: Additional probable explanation: Remember that originally C was a much looser-typed language than C++ is now. With K&R style function declarations there would have been no way for the compiler to detect such implicit conversions, so why bother restricting it in the language. For example your code would look roughly like this:
int foo(x)
unsigned int x;
{
}

int main()
{
    foo(-5);
    return 0;
}
while the declaration alone would have been int foo(); with no parameter information at all.
The compiler actually relied on the programmer to pass the right types into each function call and did no conversions at the call site. Then when the function actually got called the data on the stack (etc) was interpreted in the way the function declaration indicated.
Once code was written that relied on that sort of implicit conversion it would have become much harder to remove it from ANSI C even when function prototypes were added with actual type information. This is likely why it remains in C even now. Then C++ came along and again decided to not break backwards compatibility with C, continuing to allow such implicit conversions.
Just another quirk of a language that has lots of silly quirks.
The conversion is well-defined to wrap around, which may be useful in some cases.
It's backward-compatible with C, which does it for the above reasons.
Take your pick.
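To make the first point concrete, assuming a 32-bit unsigned int:

unsigned int x = -5;   // x == 4294967291, i.e. 2^32 - 5; the wrap-around is well defined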
@user168715 is right. C++ was initially designed to be a superset of C, pretending to be as backward-compatible as possible.
The "C" philosophy is to deliver most of the responsibility to the programmer, instead of disallowing dangerous things. For C programmers it is heaven, for Java programmers, it is hell... a matter of taste.
I will dig the standards to see where exactly it is written, but I have no time for this right now. I'll edit my answer as soon as I can.
I also agree that some of the inherited freedom can lead to errors that are really hard to debug, so I am adding to what was said that in g++ you can turn on a warning to keep you from making this kind of mistake: the -Wconversion flag.
-Wconversion
Warn for implicit conversions that may alter a value. This includes conversions between real and integer, like abs (x) when x is double; conversions between signed and unsigned, like unsigned ui = -1; and conversions to smaller types, like sqrtf (M_PI). Do not warn for explicit casts like abs ((int) x) and ui = (unsigned) -1, or if the value is not changed by the conversion like in abs (2.0). Warnings about conversions between signed and unsigned integers can be disabled by using -Wno-sign-conversion.
For C++, also warn for confusing overload resolution for user-defined conversions; and conversions that will never use a type conversion operator: conversions to void, the same type, a base class or a reference to them. Warnings about conversions between signed and unsigned integers are disabled by default in C++ unless -Wsign-conversion is explicitly enabled.
Other compilers may have similar flags.
By the time of the original C standard, the conversion was already allowed by many (all?) compilers. Based on the C rationale, there appears to have been little (if any) discussion of whether such implicit conversions should be allowed. By the time C++ came along, such implicit conversions were sufficiently common that eliminating them would have rendered the language incompatible with a great deal of C code. It would probably have made C++ cleaner; it would certainly have made it much less used -- to the point that it would probably never have gotten beyond the "C with Classes" stage, and even that would just be a mostly-ignored footnote in the history of Bell Labs.
The only real question along this line was between "value preserving" and "unsigned preserving" rules when promoting unsigned values "smaller" than int. The difference between the two arises when you have (for example) an unsigned short being added to an unsigned char.
Unsigned preserving rules say that you promote both to unsigned int. Value preserving rules say that you promote both values to int, if it can represent all values of the original type (e.g., the common case of 8-bit char, 16-bit short, and 32-bit int). On the other hand, if int and short are both 16 bits, so int cannot represent all values of unsigned short, then you promote the unsigned short to unsigned int (note that it's still considered a promotion, even though it only happens when it's really not a promotion -- i.e., the two types are the same size).
For better or worse, (and it's been argued both directions many times) the committee chose value preserving rather than unsigned preserving promotions. Note, however, that this deals with a conversion in the opposite direction: rather than from signed to unsigned, it's about whether you convert unsigned to signed.
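A sketch of the difference, assuming the common 8-bit char, 16-bit short, 32-bit int layout:

#include <cstdio>

int main()
{
    unsigned char uc = 200;
    unsigned short us = 60000;
    // Value-preserving rules promote both operands to (signed) int, since int can
    // represent every value of both types, so the sum is a plain int: 60200.
    // Unsigned-preserving rules would have promoted both to unsigned int instead.
    std::printf("%d\n", uc + us);
    return 0;
}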
Because the standard allows implicit conversion from signed to unsigned types.
Also, (int)a + (unsigned)b results in an unsigned value; this is required by the C++ standard.
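A sketch of that rule, assuming a 32-bit unsigned int:

#include <iostream>

int main()
{
    int a = -1;
    unsigned int b = 0;
    std::cout << a + b << "\n";     // prints 4294967295: a is converted to unsigned first
    std::cout << (a < b) << "\n";   // prints 0: the comparison is also done in unsigned
    return 0;
}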
Why does C++ (and probably C as well) allow me to assign and int to a char without at least giving me a warning?
Is it okay to directly assign the value, like in
int i = 12;
char c = i;
i.e. do an implicit conversion, or shall I use a static_cast<>?
EDIT
Btw, I'm using gcc.
It was allowed in C before an explicit cast syntax was invented.
Then it remained a common practice, so C++ inherited it in order to not break a huge amount of code.
Actually most compilers issue a warning. If yours doesn't, try changing its settings.
C as originally designed wasn't really a strongly-typed language. The general philosophy was that you the programmer must know what you are doing, and the compiler is just there to help you do it. If you asked to convert between float, int, and unsigned char six or seven times in a single expression, well that must be what you wanted.
C++ sort of picked that up just so that all the existing C code wouldn't be too much of a bear to port. They are slowly trying to make it stronger with each revision though. Today just about any C++ compiler will give you a warning for that if you turn the warning levels up (which I highly recommend you do).
Otherwise, perhaps you should look into true strongly-typed languages, like Java and Ada. The equivalent Ada code would not compile without an explicit conversion.
Short answer: It's okay (by the C++ standard) in your example.
Slightly longer answer: It's not okay if char is signed and you are trying to assign it a value outside its range. It's okay if it is unsigned, though (whether or not char is signed depends on your environment); then you get modulo arithmetic. Compilers usually have a switch to warn you because of the first case, but as long as you stay in bounds it's perfectly fine (however, an explicit cast to make your intentions clear does not hurt).
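A sketch of those cases, assuming an 8-bit char (g++ -Wconversion typically flags the first two initializations):

int main()
{
    int big = 300;
    unsigned char u = big;   // well defined: 300 modulo 256 == 44
    signed char s = big;     // out of range for signed char: implementation-defined result
    char c = 100;            // fits either way, so fine regardless of char's signedness
    return 0;
}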
char is a small integer type, much like short. So there should be a warning about possible loss of information. Maybe you have warnings switched off; try configuring your compiler/IDE.
Do you prefer to see something like t_byte* (with typedef unsigned char t_byte) or unsigned char* in code?
I'm leaning towards t_byte in my own libraries, but have never worked on a large project where this approach was taken, and am wondering about pitfalls.
If you're using C99 or newer, you should use stdint.h for this. uint8_t, in this case.
C++ didn't get this header until C++11, calling it cstdint. Old versions of Visual C++ didn't let you use C99's stdint.h in C++ code, but pretty much every other C++98 compiler did, so you may have that option even when using old compilers.
As with so many other things, Boost papers over this difference in boost/integer.hpp, providing things like uint8_t if your compiler's standard C++ library doesn't.
I suggest that if your compiler supports it use the C99 <stdint.h> header types such as uint8_t and int8_t.
If your compiler does not support it, create one. Here's an example for VC++, older versions of which do not have stdint.h. GCC does support stdint.h, and indeed most of C99.
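A minimal sketch of such a fallback (the version cutoff and the set of typedefs are assumptions; adjust to your toolchain):

/* Use the real header where available, otherwise define just what we need. */
#if defined(_MSC_VER) && _MSC_VER < 1600   /* assuming Visual C++ 2010 was the first with <stdint.h> */
typedef signed char    int8_t;
typedef unsigned char  uint8_t;
typedef short          int16_t;
typedef unsigned short uint16_t;
#else
#include <stdint.h>
#endif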
One problem with your suggestion is that the signedness of char is implementation-defined, so if you do create a type alias, you should at least be explicit about the sign. There is some merit in the idea, since in C#, for example, a char is 16-bit. But it has a byte type as well.
Additional note...
There was no problem with your suggestion, you did in fact specify unsigned.
I would also suggest that plain char is used if the data is in fact character data, i.e. is a representation of plain text such as you might display on a console. This will present fewer type agreement problems when using standard and third-party libraries. If on the other hand the data represents a non-character entity such as a bitmap, or if it is numeric 'small integer' data upon which you might perform arithmetic manipulation, or data that you will perform logical operations on, then one of the stdint.h types (or even a type defined from one of them) should be used.
I recently got caught out on a TI C54xx compiler where char is in fact 16-bit, which is why using stdint.h where possible, even if you then use it to define a byte type, is preferable to assuming that unsigned char is a suitable alias.
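A short sketch of that split (the function names are made up):

#include <cstddef>
#include <cstdint>
#include <cstdio>

// Textual data: plain char, so standard and third-party string APIs need no casts.
void printLabel(const char *label)
{
    std::printf("%s\n", label);
}

// Raw octets / small integers: a fixed-width type from <cstdint>.
std::uint8_t checksum(const std::uint8_t *bytes, std::size_t n)
{
    std::uint8_t sum = 0;
    for (std::size_t i = 0; i < n; ++i)
        sum = static_cast<std::uint8_t>(sum + bytes[i]);
    return sum;
}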
I prefer for types to convey the meaning of the values stored in them. If I need a type describing a byte as it is on my machine, I very much prefer byte_t over unsigned char, which could mean just about anything. (I have been working in a code base that used either signed char or unsigned char to store UTF-8 strings.) The same goes for uint8_t. It could just be used as exactly that: an 8-bit unsigned integer.
With byte_t (as with any other aptly named type), there rarely ever is a need to look up what it is defined as (and if so, a good editor will take 3 seconds to look it up for you; maybe 10 seconds if the code base is huge), and just by looking at it, it's clear what's stored in objects of that type.
Personally I prefer boost::int8_t and boost::uint8_t.
If you don't want to use boost you could borrow boost\cstdint.hpp.
Another option is to use portable version of stdint.h (link from this answer).
Besides your awkward naming convention, I think that might be okay. Keep in mind boost does this for you, to help with cross-platform-ability:
#include <boost/integer.hpp>
typedef boost::uint8_t byte_t;
Note that usually types are suffixed with _t, as in byte_t.
I prefer to use standard types, unsigned char, uint8_t, etc., so any programmer looking at the source does not have to refer back to headers to grok the code. The more typedefs you use the more time it takes for others to get to know your typing conventions. For structures, absolutely use typedefs, but for primitives use them sparingly.
In both Microsoft VC2005 and g++ compilers, the following results in an error:
On win32 VC2005: sizeof(wchar_t) is 2
wchar_t *foo = 0;
static_cast<unsigned short *>(foo);
Results in
error C2440: 'static_cast' : cannot convert from 'wchar_t *' to 'unsigned short *' ...
On Mac OS X or Linux g++: sizeof(wchar_t) is 4
wchar_t *foo = 0;
static_cast<unsigned int *>(foo);
Results in
error: invalid static_cast from type 'wchar_t*' to type 'unsigned int*'
Of course, I can always use reinterpret_cast. However, I would like to understand why it is deemed illegal by the compiler to static_cast to the appropriate integer type. I'm sure there is a good reason...
You cannot cast between unrelated pointer types. The size of the type pointed to is irrelevant. Consider the case where the types have different alignment requirements: allowing a cast like this could generate illegal code on some processors. It is also possible for pointers to different types to have different sizes. This could result in the pointer you obtain being invalid and/or pointing at an entirely different location. reinterpret_cast is one of the escape hatches you have if you know that, for your program, compiler, architecture, and OS, you can get away with it.
As with char, the signedness of wchar_t is not defined by the standard. Put this together with the possibility of non-2's-complement integers, and for a wchar_t value c,
*reinterpret_cast<unsigned short *>(&c)
may not equal:
static_cast<unsigned short>(c)
In the second case, on implementations where wchar_t is a sign+magnitude or 1's complement type, any negative value of c is converted to unsigned using modulo 2^N, which changes the bits. In the former case the bit pattern is picked up and used as-is (if it works at all).
Now, if the results are different, then there's no realistic way for the implementation to provide a static_cast between the pointer types. What could it do, set a flag on the unsigned short* pointer, saying "by the way, when you load from this, you have to also do a sign conversion", and then check this flag on all unsigned short loads?
That's why it's not, in general, safe to cast between pointers to distinct integer types, and I believe this unsafety is why there is no conversion via static_cast between them.
If the type you're casting to happens to be the so-called "underlying type" of wchar_t, then the resulting code would almost certainly be OK for the implementation, but would not be portable. So the standard doesn't offer a special case allowing you a static_cast just for that type, presumably because it would conceal errors in portable code. If you know reinterpret_cast is safe, then you can just use it. Admittedly, it would be nice to have a straightforward way of asserting at compile time that it is safe, but as far as the standard is concerned you should design around it, since the implementation is not required even to dereference a reinterpret_casted pointer without crashing.
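With C++11 one can at least express that assertion at compile time; a sketch (it only checks the size, not that the standard blesses the aliasing):

// Fail the build if the assumption behind the reinterpret_cast is visibly wrong.
static_assert(sizeof(wchar_t) == sizeof(unsigned short),
              "wchar_t and unsigned short differ in size here; the cast is not safe");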
By the spec, the use of static_cast is restricted to related types, e.g. std::ostream& to std::ofstream&. In fact wchar_t is just a distinct type of its own, but a widely used one.
Your case (if you really need it) should be fixed with reinterpret_cast.
By the way, MSVC++ has an option to treat wchar_t either as a typedef for unsigned short or as a stand-alone datatype.
Pointers are not magic "no limitations, anything goes" tools.
They are, by the language specification, actually very constrained. They do not allow you to bypass the type system or the rest of the C++ language, which is what you're trying to do.
You are trying to tell the compiler to "pretend that the wchar_t you stored at this address earlier is actually an int. Now read it."
That does not make sense. The object stored at that address is a wchar_t, and nothing else. You are working in a statically typed language, which means that every object has one, and just one, type.
If you're willing to wander into implementation-defined behavior-land, you can use a reinterpret_cast to tell the compiler to just pretend it's ok, and interpret the result as it sees fit. But then the result is not specified by the standard, but by the implementation.
Without that cast, the operation is meaningless. A wchar_t is not an int or a short.