Using a flag number within unsigned integers - c++

Many times people will fold a boolean check into an int variable they already have, checking for -1 to indicate whether something exists or not.
However, what if someone wants to use unsigned integers, still use this method, and also have 0 keep a meaning of its own rather than standing in for "doesn't exist"?
Is there a way to have a data range be -1 to 4,294,967,294?
The obvious choice here is to just use a bool that tracks what you are after, but it is my understanding that a bool is a byte, which can really add to the storage size if you have an array of structs. That is why I wondered if there was a way to get the most useful numbers you can (the positives) while leaving just one number to act as a flag.
In fact, if it is possible to do something like shifting the data range of a type, shifting it to something like -10 to 4,294,967,285 would seem to give you ten boolean flags at no additional cost (in bits).
The obvious hacky method is to just add some offset to whatever you're storing and remember to account for it later on, but I wanted to keep it a bit more readable (I guess in that case I shouldn't even be using -1, but meh).

If you simply want to pick a value which cannot exist in your interpretation of the variable, and to use it to indicate an exception or error value, why not simply do it? You can take such a value, define it as a macro and use it. For example, if you are sure that your variable never reaches the maximum, put:
#include <climits>   /* for UINT_MAX */

#define MY_FUN_ERROR_VALUE (UINT_MAX)
then you can use it as:
unsigned r = my_function_maybe_returning_error();
if (r == MY_FUN_ERROR_VALUE) {handle error}
You should also ensure that my_function_maybe_returning_error does not return MY_FUN_ERROR_VALUE under normal conditions, when no error has actually happened. For this you may use an assert:
#include <cassert>

unsigned my_function_maybe_returning_error() {
    ...
    // branch that returns the normal (non-error) value r
    assert(r != MY_FUN_ERROR_VALUE);
    return r;
}
I do not see anything wrong with this.

You are asking how a variable whose every representable value already has a meaning (0 means something, and so does everything above it) can also record an extra state, "does not exist". So no (by the pigeonhole principle, I guess), it's not possible.
Nor should it be. Overloading a variable is bad practice unless you're down to your last 3 bytes left of RAM, which you almost certainly aren't. So yes, please use another variable with a correct name and clear purpose.
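If readability is the priority and a little extra storage per element is acceptable, a minimal sketch of that "separate variable" idea using std::optional (C++17) might look like the following; note that the wrapper does cost additional bytes per element, which is exactly the overhead the question was trying to avoid:
#include <cstdint>
#include <optional>

struct Record {
    std::optional<std::uint32_t> id;   // disengaged means "does not exist"
};

int main() {
    Record r;
    if (!r.id) { /* nothing assigned yet */ }
    r.id = 0;              // 0 keeps its own, separate meaning
    r.id = 4294967294u;    // the full unsigned range stays usable
}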

Which to use? int32_t vs uint32_t

When is it appropriate to use an unsigned variable over a signed one? What about in a for loop?
I hear a lot of opinions about this and I wanted to see if there was anything resembling a consensus.
for (unsigned int i = 0; i < someThing.length(); i++) {
    SomeThing var = someThing.at(i);
    // You get the idea.
}
I know Java doesn't have unsigned values, and that must have been a conscious decision on Sun Microsystems' part.
I was glad to find a good conversation on this subject, as I hadn't really given it much thought before.
In summary, signed is a good general choice - even when you're dead sure all the numbers are positive - if you're going to do arithmetic on the variable (like in a typical for loop case).
unsigned starts to make more sense when:
You're going to do bitwise things like masks, or
You're desperate to take advantage of the sign bit for that extra positive range.
Personally, I like signed because I don't trust myself to stay consistent and avoid mixing the two types (like the article warns against).
In your example above, where 'i' will always be positive and a higher range would be beneficial, unsigned would be useful. The same goes for constants defined with #define, such as:
#define BIT1  (1u)
#define BIT32 (1u << 31)
Especially when these values will never change.
However, if you're doing an accounting program where the people are irresponsible with their money and are constantly in the red, you will most definitely want to use 'signed'.
I do agree with saint though that a good rule of thumb is to use signed, which C actually defaults to, so you're covered.
I would think that if your business case dictates that a negative number is invalid, you would want to have an error shown or thrown.
With that in mind, I only just recently found out about unsigned integers while working on a project processing data in a binary file and storing the data into a database. I was purposely "corrupting" the binary data, and ended up getting negative values instead of an expected error. I found that even though the value converted, the value was not valid for my business case.
My program did not error, and I ended up getting wrong data into the database. It would have been better if I had used uint and had the program fail.
C and C++ compilers will generate a warning when you compare signed and unsigned types; in your example code, you couldn't make your loop variable unsigned and have the compiler generate code without warnings (assuming said warnings were turned on).
Naturally, you're compiling with warnings turned all the way up, right?
And, have you considered compiling with "treat warnings as errors" to take it that one step further?
The downside of using signed numbers is the temptation to overload them so that, for example, the values 0..n are the menu selection and -1 means nothing is selected - rather than creating a class with two variables, one to indicate whether something is selected and another to store what that selection is. Before you know it, you're testing for negative one all over the place, and the compiler complains when you compare the menu selection against the number of menu selections you have, because they're different types. So don't do that.
size_t is often a good choice for this, or size_type if you're using an STL class.
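As a minimal sketch of the mixing problem described above (the function and container here are illustrative): with -Wall -Wextra, a signed loop counter compared against an unsigned size() draws a sign-comparison warning on common compilers, while size_t or the container's size_type keeps both sides unsigned.
#include <cstddef>
#include <vector>

void process(const std::vector<int>& values) {
    // for (int i = 0; i < values.size(); ++i)   // typically warns: signed/unsigned comparison
    for (std::size_t i = 0; i < values.size(); ++i) {
        // ... use values[i]
    }
}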

Since I'm not supposed to have magic numbers in my code, should I #define ZERO 0

I want to optimize readability of my code. I don't want people reading my code to be confused when they see 0. What 0 means could be extremely ambiguous. For instance, if I had a statement like if (myVector.size() > 0), that might be confusing because, after all, what is zero supposed to mean in this context? I'm wondering if I should put
#define ZERO 0
at the top of my code.
You miss the point here. Magic is not about using a number - it's about being able to understand why that particular number is used. When you compare size() with 0, your intent is pretty obvious. Had you compared it with any other number - 42, for example - you'd have to explain why that particular number was chosen.
As for this particular change (0 => ZERO), it'll just introduce tautology in your code.
0 used in comparisons or initialization of counters, indexes, accumulators, arithmetic expressions... is perfectly understandable as such.
It makes sense to create an identifier in situations where the exact value is irrelevant and could be changed without doing any harm, and where it has an interpretation specific to the context, like FREE_CLUSTER or NO_ERROR.
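For instance, a minimal sketch of that idea (FREE_CLUSTER here is a hypothetical constant, not any real API):
constexpr unsigned FREE_CLUSTER = 0;   // the exact value is an implementation detail

bool cluster_is_free(unsigned entry) {
    return entry == FREE_CLUSTER;      // the name, not the literal, carries the meaning
}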
No, don't do that. It's a bad choice and utterly unnecessary. In this specific case, your code is pretty obvious:
if (myVector.size() > 0)
No one can get confused with that. But if it was something like:
if (myVector.size() > 23*66)
Then that would be confusing, because someone reading your code would be left wondering what the hell 23*66 means. This is a magic number. In a context like this, a #define could help, accompanied by a comment, for example:
/* We have at most 23 files with 66 records each */
#define MAX_RECORDS_NO (23*66)
And then you'd have:
if (myVector.size() > MAX_RECORDS_NO)
That is the kind of situation where you can, and surely want to, avoid magic numbers. Note that the #define is a description of what that number means; if I were to follow your logic, I would have chosen this instead:
#define TWENTY_THREE_TIMES_SIXTY_SIX (23*66)
Now, that's not very helpful, is it? #define ZERO 0 misses the point entirely: it adds nothing for someone reading your code, it is needlessly verbose, and it's useless because it is very unlikely that the value of 0 will ever change. What if all of a sudden you wanted to compare myVector.size() to another number? You'd have to replace the constant name, because defining ZERO to anything other than 0 is asking for trouble. So, after all, you gained nothing.
I'm wondering if I should put
#define ZERO 0
at the top of my code.
As long as you're asking just about the 0 magic number: no, don't do this!
You should simply use
if (myVector.size()) // ...
which behaves exactly like your original statement.
It is guaranteed in both C and C++ that 0 evaluates to false and any other value evaluates to true in a conditional expression.
You don't need to do so.
The original code is already straightforward, as you can see.
Also, just think about it: if you define ZERO as 0, are you ever going to change it to some value other than 0? The answer is no, of course.

Default value of an integer?

My program requires several integers to be set to a default value when the program launches. As the program runs, these integers will be set to their true values. These true values, however, can be any number. My program will constantly be checking these numbers to see whether their value has been changed from the default.
For example, let's say I have integers A, B and C. All these integers will be set to a default value at the start (let's say -1). Then, as the program progresses, let's say A and B are set to 3 and 2 respectively. Since C is still at the default value, the program can conclude that C hasn't been assigned a non-default value yet.
The problem arises when trying to find a unique default value. Since the numbers can be set to anything, if a true value happens to be identical to the default value, my program won't be able to tell whether the variable still holds the default or was genuinely assigned that value.
I considered NULL as a default value, but NULL is equal to 0 in C++, leading to the same problem!
I could create a whole object consisting of a bool and an int as members, where the bool indicates whether the number has been assigned its own value yet or not. This, however, seems like overkill. Is there a default value I can set my integers to such that the value isn't identical to any other value? (Examples include infinity or i.)
I am asking for C/C++ solutions.
I could create a whole object consisting of a bool and an integer as members, where the bool indicates whether the number has been assigned its own value yet or not. This, however, seems like overkill.
What you described is called a "nullable type" in .NET. A C++ implementation is boost::optional:
#include <boost/optional.hpp>

boost::optional<int> A;
if (A)
    do_something(*A);
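Since C++17 the standard library offers the same facility as std::optional, so a sketch without Boost (do_something here is just a stand-in) would be:
#include <iostream>
#include <optional>

void do_something(int x) { std::cout << x << '\n'; }   // placeholder for real work

int main() {
    std::optional<int> a;      // disengaged: no value assigned yet
    a = 42;                    // any int, including -1 or 0, is a valid assigned value
    if (a)
        do_something(*a);
}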
On a two's complement machine there's an integer value that is less useful than the others: INT_MIN. You can't make a valid positive value by negating it. Since it's the least useful value in the integer range, it makes a good choice for a marker value. It also has an easily recognizable hex value, 0x80000000.
There is no bit pattern you can assign to an int that isn't an actual int. You need to keep separate flags if you really have no integer values that are out of bounds.
If the domain of valid int values is unlimited, the only choice is a management bit indicating whether it is assigned or not.
But are you sure you really need INT_MAX as a valid value?
There is no way to guarantee that whatever value you pick won't later be equal to a legitimately assigned int. The only way to ensure the behaviour you want is to create a separate bool to account for changes.
No, you will have to create your own data type which contains the information about whether it has been assigned or not.
If as you say, no integer value is off limits, then you cannot assign a default "uninitialised" value. Just use a struct with an int and a bool as you suggest in your question.
I could create a whole object consisting of a bool and an integer as members, where the bool indicates whether the number has been assigned its own value yet or not. This, however, seems like overkill.
My first guess would be to effectively use a flag and mark each variable. But this is not your only choice of course.
You can use pointers (which can be NULL) and allocate the memory dynamically. Not very convenient.
You can pick a custom value which is almost never used and define it to be the default value. Of course, from time to time you will need to assign that value to one of your variables for real, but this won't happen often and you just need to keep track of those variables. Given how rarely that case occurs, a simple linked list should do.
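A minimal sketch of the value-plus-flag approach that the question and several answers describe (the names are illustrative):
struct TrackedInt {
    int  value    = -1;     // the default; it may later collide with a real value...
    bool assigned = false;  // ...so the flag, not the value, is the source of truth
};

int main() {
    TrackedInt c;
    if (!c.assigned) {
        // still untouched since program start
    }
    c.value    = 3;
    c.assigned = true;
}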

Magic Numbers In Arrays? - C++

I'm a fairly new programmer, and I apologize if this information is easily available out there, I just haven't been able to find it yet.
Here's my question:
Is it considered a magic number when you use a literal number to access a specific element of an array?
For example:
arrayOfNumbers[6] // Is six a magic number in this case?
I ask this question because one of my professors is adamant that all literal numbers in a program are magic numbers. It would be nice for me to just access an element of an array with a plain number, instead of using a named constant for each element.
Thanks!
That really depends on the context. If you have code like this:
arr[0] = "Long";
arr[1] = "sentence";
arr[2] = "as";
arr[3] = "array.";
...then 0..3 are not considered magic numbers. However, if you have:
int doStuff()
{
    return my_global_array[6];
}
...then 6 is definitively a magic number.
It's pretty magic.
I mean, why are you accessing the 6th element? What are the semantics that should be applied to that number? As it stands, all we know is "the 6th (zero-based) number". If we knew the declaration of arrayOfNumbers, we would further know its type (e.g. an int or a double).
But if you said:
arrayOfNumbers[kDistanceToSaturn];
...now it has much more meaning to someone reading the code.
In general one iterates over an array, performing some operation on each element, because one doesn't know how long the array is and you can't just access it in a hardcoded manner.
However, sometimes array elements have specific meanings, for example, in graphics programming. Sometimes an array is always the same size because the data demands it (e.g. certain transform matrices). In these cases it may or may not be okay to access the specific element by number: domain experts will know what you're doing, but generalists probably won't. Giving the magic index number a name makes it more obvious to those who have to maintain your code, and helps you to prevent typing the wrong one accidentally.
In my example above I assumed your array holds distances from the sun to a planet. The sun would be the zeroth element, thus arrayOfNumbers[kDistanceToSun] = 0. Then as you increment, each element contains the distance to the next farthest planet: mercury, venus, etc. This is much more readable than just typing the number of the planet you want. In this case the array is of a fixed size because there are a fixed number of planets (well, except the whole Pluto debacle).
The other problem is that "arrayOfNumbers" tells us nothing about the contents of the array. We already know it's an array of numbers because we saw the declaration somewhere where you said int arrayOfNumbers[12345]; or however you declared it. Instead, something like:
int distanceToPlanetsFromSol[kNumberOfPlanets];
...gives us a much better idea of what the data actually is and what its semantics are. One of your goals as a programmer should be to write code that is self-documenting in this manner.
And then we can argue elsewhere if kNumberOfPlanets should be 8 or 9. :)
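A minimal sketch of that naming idea, with illustrative enumerators following the planet example (kNumberOfBodies is used here because the sun occupies index 0):
enum PlanetIndex {
    kDistanceToSun = 0,
    kDistanceToMercury,
    kDistanceToVenus,
    kDistanceToEarth,
    kDistanceToMars,
    kDistanceToJupiter,
    kDistanceToSaturn,      // happens to be index 6
    kNumberOfBodies
};

double distanceToPlanetsFromSol[kNumberOfBodies] = {};
double toSaturn = distanceToPlanetsFromSol[kDistanceToSaturn];   // far clearer than [6]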
You should ask yourself why you are accessing that particular position. In this case, I assume that if you are doing arrayOfNumbers[6], the sixth position has some special meaning. If you think about what that meaning is, you'll probably realize that a magic number is hiding it.
Another way to look at it:
What if, after some change, the program needs to access the 7th element instead of the 6th? How would you or a maintainer know that? If, for example, the 6th entry is the count of trees in CA, it would be a good thing to put
#define CA_STATE_ENTRY 6
Then, if the table is later reordered, somebody can see that they need to change this to 9 (say). By the way, I am not saying this is the best way to maintain an array of tree counts by state - it probably isn't.
Likewise, if people later want to change the program to deal with trees in Oregon, they know to replace
trees[CA_STATE_ENTRY]
with
trees[OR_STATE_ENTRY]
The point is
trees[6]
is not self-documenting
Of course, for C++ it should be an enum, not a #define.
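A minimal sketch of the enum version (the state names and index values are illustrative only):
enum StateEntry {
    CA_STATE_ENTRY = 6,
    OR_STATE_ENTRY = 7
};

int trees[50] = {};
int californiaTrees = trees[CA_STATE_ENTRY];   // self-documenting, unlike trees[6]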
You'd have to provide more context for a meaningful answer. Not all literal numbers are magic, but many are. In a case like this there is no way to tell for sure, though most cases I can think of off-hand with an explicit array index much greater than 1 probably qualify as magic.
Not all literals in a program really qualify as "magic numbers" -- but this one certainly seems to. The 6 gives us no clue of why you're accessing that particular element of the array.
To not be a magic number, you need its meaning to be quite clear even on first examination (or at least minimal examination) why that value is being used. Just for example, a lot of code will do things like: &x[0]. In this case, it's typically pretty clear that the '0' really just means "the beginning of the array."
If you need to access a particular element of the array, chances are you're doing it wrong.
You should almost always be iterating over the entire array.
It's only not a magic number if your program is doing something very special involving the number six specifically. Could you provide some context?
That's the problem with professors: they're often too academic. In theory he's right, as usual, but the term "magic number" is usually used in a stricter context, when the number is embedded in a data stream, allowing you to detect certain properties of the stream (like the signature header of a file type, for instance).
See also this Wikipedia entry.
Not all constant values in software are usually called magic numbers. A Java class file always starts with the hex value 0xCAFEBABE, and a Windows .exe file starts with MZ (0x4D, 0x5A); this allows you to quickly (though not with certainty) identify the content of a binary file.
In a MISRA compliant system, all values except 0 and 1 are considered magic numbers. My opinion has always been if the constant value is obvious or likely won't change then leave it as a number. If in doubt create a unique constant since long term maintenance will be easier.

Rationale behind return 0 as default value in C/C++

Is there a reason why zero is used as a "default" function return value? I noticed that several functions from the stdlib and almost everywhere else, when not returning a proper number (e.g. pow(), strcpy()) or an error (negative numbers), simply return zero.
I just became curious after seeing several tests performed with negated logic. Very confusing.
Why not return 1, or 0xff, or any positive number for that matter?
The rationale is that you want to distinguish the set of all possible (negative) return values corresponding to different errors from the single situation in which everything went OK. The simplest, most concise and most C-ish way to make that distinction is a logical test, and since in C all integers are "true" except for zero, you want to return zero to mean "that one situation", i.e. you want zero as the "good" value.
The same line of reasoning applies to the return values of Unix programs, but indeed in the tests within Unix shell scripts the logic is inverted: a return value of 0 means "true" (for example, look at the return value of /bin/true).
Originally, C did not have "void". If a function didn't return anything, you just left the return type blank in the declaration. But that meant that it returned an int.
So, everything returned something, even if it didn't mean anything. And, if you didn't specifically provide a return value, whatever value happened to be in the register the compiler used to return values became the function's return value.
// Perfectly good K&R C code.
NoReturn()
{
    // do stuff;
    return;
}

int unknownValue = NoReturn();
People took to clearing that to zero to avoid problems.
In shell scripting, 0 represents true, where another number typically represents an error code. Returning 0 from a main application means everything went successfully. The same logic may be being applied to the library code.
It could also just be that they return nothing, which is interpreted as 0. (Essentially the same concept.)
Another (minor) reason has to do with machine-level speed and code size.
In most processors, any operation that results in a zero automatically sets the zero flag, and there is a very cheap operation to jump against the zero flag.
In other words, if the last machine operation (e.g., a subtraction or decrement) got us to zero, all we need is a jump-on-zero or a jump-not-zero.
On the other hand, if we test things against some other value, then we have to move that value into the register, run a compare operation that essentially subtracts the two numbers, and equality results in our zero.
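A minimal sketch of that difference; the instruction sequences in the comments are typical of x86 output but not guaranteed, since the actual code depends on the compiler and optimization settings:
bool failed_zero_convention(int status) {
    return status != 0;     // typically: test eax, eax / setne al
}

bool failed_other_convention(int status) {
    return status != 1;     // typically: cmp eax, 1 / setne al (needs the extra immediate)
}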
Because Bash and most other UNIX shell environments regard 0 as success, and -x as a system error, and x as a user-defined error.
There's probably a bunch of forgotten history dating back to the days when everything was written in asm. In general it is much easier to test for zero than for other specific values.
I may be wrong about this, but I think that it's mainly for historical reasons (hysterical raisins?). I believe that K&R C (pre-ANSI) didn't have a void type, so the logical way to write a function that didn't return anything interesting was to have it return 0.
Surely somebody here will correct me if I'm wrong... :)
My understanding is that it was related to the behaviour of system calls.
Consider the open() system call; if it is successful, it returns a non-negative integer, which is the file descriptor that was created. However, down at the assembler level (where there's a special, non-C instruction that traps into the kernel), when an error is returned, it is returned as a negative value. When it detects an error return, the C code wrapper around the system call stores the negated value into errno (so errno has a positive value), and the function returns -1.
For some other system calls, the negative return code at the assembler level is still negated and placed into errno and -1 is returned. However, these system calls have no special value to return, so zero was chosen to indicate success. Clearly, there is a large variety of system calls, but most manage to fit these conventions. For example, stat() and its relatives return a structure, but a pointer to that structure is passed as an input parameter, and the return value is a 0 or -1 status. Even signal() manages it; SIG_DFL was 0 and SIG_IGN was 1 (with -1, SIG_ERR, reserved for the error return), and other values were function pointers. There are a few system calls with no error return - getpid(), getuid() and so on.
This zero-indicates-success mechanism was then emulated by other functions which were not actually system calls.
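A minimal sketch of that convention in use, assuming a POSIX system (the file name is illustrative):
#include <cerrno>
#include <cstdio>
#include <fcntl.h>
#include <unistd.h>

int main() {
    int fd = open("data.txt", O_RDONLY);   // non-negative descriptor on success
    if (fd == -1) {
        std::perror("open");               // errno holds the positive error code
        return 1;                          // non-zero exit status reports failure to the shell
    }
    close(fd);
    return 0;                              // zero exit status means success
}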
Conventionally, a return code of 0 specifies that your program has ended normally and all is well. (You can remember this as "zero errors", although for technical reasons you cannot use the number of errors your program found as the return code.) A return code other than 0 indicates that some sort of error has occurred. If your code terminates when it encounters an error, use exit and specify a non-zero return code. (Source)
Because 0 is false and null in C/C++, and you can make handy shortcuts when that happens.
It is because when used from a UNIX shell a command that returns 0 indicates success.
Any other value indicates a failure.
As Paul Betts indicates, positive and negative values delineate where the error probably originated, but this is only a convention and not an absolute. A user application may return a negative value without any bad consequence (other than indicating to the shell that the application failed).
Besides all the fine points made by previous posters, it also cleans up the code considerably when a function returns 0 on success.
Consider:
if ( somefunc() ) {
    // handle error
}
is much cleaner than:
if ( !somefunc() ) {
    // handle error
}
or:
if ( somefunc() == somevalue ) {
    // handle error
}