Disable default numeric types in compiler

Disable default numeric types in compiler - c++

When creating custom typedefs for integers, is it possible for compiler to warn when you when using a default numeric type?
For example,
typedef int_fast32_t kint;
int_fast32_t test=0;//Would be ok
kint test=0; //Would be ok
int test=0; //Would throw a warning or error
We're converting a large project and the default int size on platform is 32767 which is causing some issues. This warning would warn a user to not use ints in the code.
If possible, it would be great if this would work on GCC and VC++2012.

I'm reasonably sure gcc has no such option, and I'd be surprised if VC did.
I suggest writing a program that detects references to predefined types in source code, and invoking that tool automatically as part of your build process. It would probably suffice to search for certain keywords.
Be sure you limit this to your own source files; predefined and third-party headers are likely to make extensive use of predefined types.
But I wouldn't make the prohibition absolute. There are a number of standard library functions that use predefined types. For example, in c = getchar() it makes no sense to declare c as anything other than int. And there's no problem for something like for (int i = 0; i <= 100; i ++) ...
Ideally, the goal should be to use predefined types properly. The language has never guaranteed that an int can exceed 32767. (But "proper" use is difficult or impossible to verify automatically.)

I'd approach this by doing a replace-all first and then documenting this thoroughly.
You can use a preprocessor directive:
#define int use kint instead
Note that technically this is undefined behavior and you'll run into trouble if you do this definition before including third-party headers.

I would recommend to make bulk replacement int -> old_int_t at the very beginning of your porting. This way you can continue modifying your code without facing major restrictions and at the same time have access to all places that are not yet updated.
Eventually, at the end of your work, all occurencies of old_int_t should go away.

Even if one could somehow undefine the keyword int, that would do nothing to prevent usage of that type, since there are many cases where the compiler will end up using that type. Beyond the obvious cases of integer literals, there are some more subtle cases involving integer promotion. For example, if int happens to be 64 bits, operations between two variables of type uint32_t will be performed using type int rather than uint32_t. As nice as it would be to be able to specify that some variables represent numbers (which should be eagerly promoted when practical) while others represent members of a wrapping algebraic ring (which should not be promoted), I know of no facility to do such a thing. Consequently, int is unavoidable.

Related

Purpose of using UINT64_C?

I found this line in boost source:
const boost::uint64_t m = UINT64_C(0xc6a4a7935bd1e995);
I wonder what is the purpose of using a MACRO here?
All this one does is to add ULL to the constant provided.
I assume it may be used to make it harder for people to make mistake of typing UL instead of ULL, but I wonder if there is any other reason to use it.

If you look at boost/cstdint.h, you can see that the definition of the UINT64_C macro is different on different platforms and compilers.
On some platforms it's defined as value##uL, on others it's value##uLL, and on yet others it's value##ui64. It all depends on the size of unsigned long and unsigned long long on that platform or the presence of compiler-specific extensions.
I don't think using UINT64_C is actually necessary in that context, since the literal 0xc6a4a7935bd1e995 would already be interpreted as a 64-bit unsigned integer. It is necessary in some other context though. For example, here the literal 0x00000000ffffffff would be interpreted as a 32-bit unsigned integer if it weren't specifically specified as a 64-bit unsigned integer by using UINT64_C (though I think it would be promoted to uint64_t for the bitwise AND operation).
In any case, explicitly declaring the size of literals where it matters serves a valuable role in code-clarity. Sometimes, even if an operation is perfectly well-defined by the language, it can be difficult for a human programmer to tell what types are involved. Saying it explicitly can make code easier to reason about, even if it doesn't directly alter the behavior of the program.

Which to use? int32_t vs uint32_t [duplicate]

When is it appropriate to use an unsigned variable over a signed one? What about in a for loop?
I hear a lot of opinions about this and I wanted to see if there was anything resembling a consensus.
for (unsigned int i = 0; i < someThing.length(); i++) {
SomeThing var = someThing.at(i);
// You get the idea.
}
I know Java doesn't have unsigned values, and that must have been a concious decision on Sun Microsystems' part.

I was glad to find a good conversation on this subject, as I hadn't really given it much thought before.
In summary, signed is a good general choice - even when you're dead sure all the numbers are positive - if you're going to do arithmetic on the variable (like in a typical for loop case).
unsigned starts to make more sense when:
You're going to do bitwise things like masks, or
You're desperate to to take advantage of the sign bit for that extra positive range .
Personally, I like signed because I don't trust myself to stay consistent and avoid mixing the two types (like the article warns against).

In your example above, when 'i' will always be positive and a higher range would be beneficial, unsigned would be useful. Like if you're using 'declare' statements, such as:
#declare BIT1 (unsigned int 1)
#declare BIT32 (unsigned int reallybignumber)
Especially when these values will never change.
However, if you're doing an accounting program where the people are irresponsible with their money and are constantly in the red, you will most definitely want to use 'signed'.
I do agree with saint though that a good rule of thumb is to use signed, which C actually defaults to, so you're covered.

I would think that if your business case dictates that a negative number is invalid, you would want to have an error shown or thrown.
With that in mind, I only just recently found out about unsigned integers while working on a project processing data in a binary file and storing the data into a database. I was purposely "corrupting" the binary data, and ended up getting negative values instead of an expected error. I found that even though the value converted, the value was not valid for my business case.
My program did not error, and I ended up getting wrong data into the database. It would have been better if I had used uint and had the program fail.

C and C++ compilers will generate a warning when you compare signed and unsigned types; in your example code, you couldn't make your loop variable unsigned and have the compiler generate code without warnings (assuming said warnings were turned on).
Naturally, you're compiling with warnings turned all the way up, right?
And, have you considered compiling with "treat warnings as errors" to take it that one step further?
The downside with using signed numbers is that there's a temptation to overload them so that, for example, the values 0->n are the menu selection, and -1 means nothing's selected - rather than creating a class that has two variables, one to indicate if something is selected and another to store what that selection is. Before you know it, you're testing for negative one all over the place and the compiler is complaining about how you're wanting to compare the menu selection against the number of menu selections you have - but that's dangerous because they're different types. So don't do that.

size_t is often a good choice for this, or size_type if you're using an STL class.

Forcing types to a specific size

I've been learning C++ and one thing that I'm not really comfortable with is the fact that datatype sizes are not consistent. Depending on what system something is deployed on an int could be 16 bits or 32 bits, etc.
So I was thinking it might be a good idea to make my own header file with data types like byte, word, etc. that are defined to be a specific size and will maintain that size on any platform.
Two questions. First is this a good idea? Or is it going to create other problems I'm not aware of? Second, how do you define a type as being, say, 8 bits? I can't just say #define BYTE char, cause char would vary across platforms.

Fortunately, other people have noticed this same problem. In C99 and C++11 (so set your compiler to compatibility with one of those two modes, there should be a switch in your compiler settings), they added the header stdint.h (for C) and cstdint (for C++). If you #include <cstdint>, you get the types int8_t, int16_t, int32_t, int64_t, and the same prefixed with a u for unsigned versions. If your platform supports those types, they will be defined in the header, along with several others.
If your compiler does not yet support that standard (or you are forced by reasons out of your control to remain on C++03), then there is also Boost.
However, you should only use this if you care exactly about the size of the type. int and unsigned are fine for throw-away variables in most cases. size_t should be used for indexing std::vector, etc.

First you need to figure out if you really care what sizes things are. If you are using an int to count the number of lines in a file, do you really care if it's 32-bit or 64? You need BYTE, WORD, etc if you are working with packed binary data, but generally not for any other reason. So you may be worrying over something that doesn't really matter.

Better yet, use the already defined stuff in stdint.h. See here for more details. Similar question here.
Example:
int32_t is always 32 bits.

Many libraries have their own .h with a lots of typedef to have constant size types. This is useful when making portable code, and avoid relying on the headers of the platform you are currently working with.

If you only want to make sure the builtin data types have a minimum size you can use std::numeric_limits in the header to check.
std::numeric_limits<int>::digits
will give you, for example, the number of bits of an int without the sign bit. And
std::numeric_limits<int>::max()
will give you the max value.

Is there any tool for C++ which will check for common unspecified behavior?

Often one makes assumptions about a particular platform one is coding on, for example that signed integers use two's complement storage, or that (0xFFFFFFFF == -1), or things of that nature.
Does a tool exist which can check a codebase for the most common violations of these kinds of things (for those of us who want portable code but don't have strange non-two's-complement machines)?
(My examples above are specific to signed integers, but I'm interested in other errors (such as alignment or byte order) as well)

There are various levels of compiler warnings that you may wish to have switched on, and you can treat warnings as errors.
If there are other assumptions you know you make at various points in the code you can assert them. If you can do that with static asserts you will get failure at compile time.

I know that CLang is very actively developing a static analyzer (as a library).
The goal is to catch errors at analysis time, however the exact extent of the errors caught is not that clear to me yet. The library is called "Checker" and T. Kremenek is the responsible for it, you can ask about it on clang-dev mailing list.
I don't have the impression that there is any kind of reference about the checks being performed, and I don't think it's mature enough yet for production tool (given the rate of changes going on) but it may be worth a look.

Maybe a static code analysis tool? I used one a few years ago and it reported errors like this. It was not perfect and still limited but maybe the tools are better now?
update:
Maybe one of these:
What open source C++ static analysis tools are available?
update2:
I tried FlexeLint on your example (you can try it online using the Do-It-Yourself Example on http://www.gimpel-online.com/OnlineTesting.html) and it complains about it but perhaps not in a way you are looking for:
5 int i = -1;
6 if (i == 0xffffffff)
diy64.cpp 6 Warning 650: Constant '4294967295' out of range for operator '=='
diy64.cpp 6 Info 737: Loss of sign in promotion from int to unsigned int
diy64.cpp 6 Info 774: Boolean within 'if' always evaluates to False [Reference: file diy64.cpp: lines 5, 6]

Very interesting question. I think it would be quite a challenge to write a tool to flag these usefully, because so much depends on the programmer's intent/assumptions
For example, it would be easy to recognize a construct like:
x &= -2; // round down to an even number
as being dependent on twos-complement representation, but what if the mask is a variable instead of a constant "-2"?
Yes, you could take it a step further and warn of any use of a signed int with bitwise &, any assignment of a negative constant to an unsigned int, and any assignment of a signed int to an unsigned int, etc., but I think that would lead to an awful lot of false positives.
[ sorry, not really an answer, but too long for a comment ]

Why do C programmers use typedefs to rename basic types?

So I'm far from an expert on C, but something's been bugging me about code I've been reading for a long time: can someone explain to me why C(++) programmers use typedefs to rename simple types? I understand why you would use them for structs, but what exactly is the reason for declarations I see like
typedef unsigned char uch;
typedef uch UBYTE;
typedef unsigned long ulg;
typedef unsigned int u32;
typedef signed short s16;
Is there some advantage to this that isn't clear to me (a programmer whose experience begins with Java and hasn't ventured far outside of strictly type-safe languages)? Because I can't think of any reason for it--it looks like it would just make the code less readable for people unfamiliar with the project.
Feel free to treat me like a C newbie, I honestly know very little about it and it's likely there are things I've misunderstood from the outset. ;)

Renaming types without changing their exposed semantics/characteristics doesn't make much sense. In your example
typedef unsigned char uch;
typedef unsigned long ulg;
belong to that category. I don't see the point, aside from making a shorter name.
But these ones
typedef uch UBYTE;
typedef unsigned int u32;
typedef signed short s16;
are a completely different story. For example, s16 stands for "signed 16 bit type". This type is not necessarily signed short. Which specific type will hide behind s16 is platform-dependent. Programmers introduce this extra level of naming indirection to simplify the support for multiple platforms. If on some other platform signed 16 bit type happens to be signed int, the programmer will only have to change one typedef definition. UBYTE apparently stands for an unsigned machine byte type, which is not necessarily unsigned char.
It's worth noting that the C99 specification already provides a standard nomenclature for integral types of specific width, like int16_t, uint32_t and so on. It probably makes more sense to stick with this standard naming convention on platforms that don't support C99.

This allows for portability. For example you need an unsigned 32-bit integer type. Which standard type is that? You don't know - it's implementation defined. That's why you typedef a separate type to be 32-bit unsigned integer and use the new type in your code. When you need to compile on another C implementation you just change the typedefs.

Sometimes it is used to reduce an unwieldy thing like volatile unsigned long to something a little more compact such as vuint32_t.
Other times it is to help with portability since types like int are not always the same on each platform. By using a typedef you can set the storage class you are interested in to the platform's closest match without changing all the source code.

There are many reasons to it. What I think is:
Typename becomes shorter and thus code also smaller and more readable.
Aliasing effect for longer structure names.
Convention used in particular team/companies/style.
Porting - Have same name across all OS and machine. Its native data-structure might be slightly different.

Following is a quote from The C Programming Language (K&R)
Besides purely aesthetic issues, there are two main reasons for using
typedefs.
First- to parameterize a program
The first is to parameterize a program against portability problems.
If typedefs are used for data types
that may be machine-dependent, only
the typedefs need change when the
program is moved.
One common situation is to use typedef names for various integer
quantities, then make an appropriate
set of choices of short, int, and long
for each host machine. Types like
size_t and ptrdiff_t from the standard library are examples.
The italicized portions tells us that programmers typedef basic type for portability. If I want to make sure my program works on different platforms, using different compiler, I will try to ensure that its portability in every possible way and typedef is one of them.
When I started programming using Turbo C compiler on Windows platform, it gave us the size of int 2. When I moved to Linux platform and GCC complier, the size I get is 4. If I had developed a program using Turbo C which relied on the assertion that sizeof( int ) is always two, it would have not ported properly to my new platform.
Hope it helps.
Following quote from K&R is not related to your query but I have posted it too for the sake of completion.
Second- to provide better documentation
The second purpose of typedefs is to provide better documentation for a
program - a type called Treeptr may be easier to understand than one declared only as a
pointer to a complicated structure.

Most of these patterns are bad practices that come from reading and copying existing bad code. Often they reflect misunderstandings about what C does or does not require.
Is akin to #define BEGIN { except it saves some typing instead of making for more.
Is akin to #define FALSE 0. If your idea of "byte" is the smallest addressable unit, char is a byte by definition. If your idea of "byte" is an octet, then either char is the octet type, or your machine has no octet type.
Is really ugly shorthand for people who can't touch type...
Is a mistake. It should be typedef uint32_t u32; or better yet, uint32_t should just be used directly.
Is the same as 4. Replace uint32_t with int16_t.
Please put a "considered harmful" stamp on them all. typedef should be used when you really need to create a new type whose definition could change over the life cycle of your code or when the code is ported to different hardware, not because you think C would be "prettier" with different type names.

We use it to make it Project/platform specific, everything has a common naming convention
pname_int32, pname_uint32, pname_uint8 -- pname is project/platform/module name
And some #defines
pname_malloc, pname_strlen
It easier to read and shortens long datatypes like unsigned char to pname_uint8 also making it a convention across all modules.
When porting you need to just modify the single file , thus making porting easy.

To cut the long story short,
you might want to do that to make your code portable (with less effort/editing).
This way you don't depend to 'int', instead you are using INTEGER that can be anything you want.

All [|u]intN_t types, where N=8|16|32|64 and so forth, are defined per architecture in this exact manner. This is a direct consequence of the fact that the standard does not mandate that char,int,float, etc. have exactly N bits - that would be insane. Instead, the standard defines minimum and maximum values of each type as guarantees to the programmer, and in various architectures types may well exceed those boundaries. It is not an uncommon sight.
The typedefs in your post are used to defined types of a certain length, in a specific architecture. It's probably not the best choice of naming; u32 and s16 are a bit too short, in my opinion. Also, it's kind of a bad thing to expose the names ulg and uch, one could prefix them with an application specific string since they obviously will not be exposed.
Hope this helps.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js