I have learnt that enums are data types as opposed to data structures. I have also learnt that they are usually nominal in nature rather than ordinal. Consequently, it does not make sense to iterate through an enum, or to obtain one of its constants by index as you would with an array, e.g. week[3].
However, I have come across instances where an index is used to obtain the value from an enum:
#include <iostream>

enum week {Mon=5, Tues, Wed};

int main()
{
    week day = (week)0;
    std::cout << day << "\n"; // outputs 0

    day = (week)13;
    std::cout << day << "\n"; // outputs 13

    return 0;
}
How is this working?
I assumed (week)13 would not work, given there is nothing at this index. Is it that the cast is failing and falling back to the type to be cast?
As a side note: I believe some confusion with this style of code in C/C++ may occur due to other languages (C#) handling this case differently.
Edit:
The linked solutions don't answer the question I'm asking.
Question 1 is about comparing integers to enums; here I'm asking why casting an int to an enum gives a certain result. Question 2 (Assigning an int value to enum and vice versa in C++) mentions "Old enums (unscoped enums) can be converted to integer but the other way is not possible (implicitly).", but going by my code, it does seem possible. Lastly, question 3 has nothing to do with this question: I'm not talking about assignment of enums, but rather casting an integer to an enum.
The rule, roughly, is that any value that fits in the bits needed to represent all of the enumerators can be held in an enumerated type. In the example in the question, Wed has the value 7, which requires three bits. So any value from 0 to 7 can be stored in an object of type week. Even if Wed wasn't there, the remaining two values would need the same three low bits, so the values 0 to 7 would all still work. 13, on the other hand, requires a fourth bit, which isn't needed for any of the enumerators, so it is not required to work.
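As a rough illustration of that rule (a sketch, not normative wording; it reuses the enum from the question), the in-range and out-of-range casts look like this:

#include <iostream>

enum week { Mon = 5, Tues, Wed }; // enumerator values 5, 6, 7: three bits suffice

int main()
{
    // 0 through 7 fit in the three bits needed for the enumerators,
    // so these casts are guaranteed to preserve the value.
    week ok = (week)7;
    std::cout << ok << "\n"; // 7

    // 13 needs a fourth bit, so the standard does not require this cast
    // to produce a usable value (in practice most compilers just keep 13).
    week out_of_range = (week)13;
    std::cout << out_of_range << "\n";
    return 0;
}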
Related
i need to work with "indicator" variables I that may take on one of 3 integer values {-1, 0, 1}.
so i would rather not declare
int I;
but rather
indicator_t I;
where the type indicator_t ~ {-1, 0, 1}
if furthermore i can later use I in numerical expressions, as an integer (without casting?), that would be excellent.
question:
how should i define the type indicator_t?
The simplest approach would be to convert the tri-state input into an enum class with 3 fixed discriminators:
enum class indicator_t {
    negative = -1,
    zero     = 0,
    positive = 1,
};
There are several nice things with this approach:
It's simple, which makes it easy to understand and maintain
enum class makes it a distinct type from int, which allows both the enum and an integer to appear as part of an overload set, if needed (see the sketch after the snippets below)
The enum logically has a cardinality of 3 (the type accepts 3 logical inputs).¹
This cardinality prevents code that would otherwise look like a != 2,² even though 2 is never a possible input
When inputs are passed to a function accepting indicator_t, the intention is clear from the call-site. Consider the difference between the following two code snippets:
accept(1); // is this an indicator?
accept(indicator_t::positive); // ah, it's an indicator
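For instance, a minimal sketch of such an overload set (the name accept is just the placeholder from the snippets above):

#include <iostream>

enum class indicator_t { negative = -1, zero = 0, positive = 1 };

void accept(int n)        { std::cout << "plain int: " << n << "\n"; }
void accept(indicator_t)  { std::cout << "indicator\n"; }

int main()
{
    accept(1);                      // picks the int overload
    accept(indicator_t::positive);  // picks the indicator_t overload
}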
If you need to convert to / from numeric values, you can create simple wrappers for this as well:
#include <stdexcept> // for std::out_of_range, if you choose to throw

auto to_int(indicator_t indicator) -> int
{
    return static_cast<int>(indicator);
}

auto to_indicator(int indicator) -> indicator_t
{
    if (indicator > 1 || indicator < -1) {
        // handle error. Throw an exception? One option:
        throw std::out_of_range{"not a valid indicator_t value"};
    }
    return static_cast<indicator_t>(indicator);
}
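A quick usage sketch, assuming the indicator_t and wrapper definitions above (what happens on out-of-range input is whatever you chose inside to_indicator):

int main()
{
    indicator_t i = to_indicator(-1);  // in range, fine
    int n = to_int(i);                 // n == -1

    // to_indicator(7) would take the error branch above.
    return n == -1 ? 0 : 1;
}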
¹ Technically C++ enums can take on any integral value that fits in std::underlying_type_t<the_enum>, but this doesn't change that the logical set of valid inputs is fixed, which prevents developer bugs. Compilers will even try to warn on checks outside of the logical range.
² Technically this could still be done, but you'd need an explicit static_cast every time, so it would stand out as a code smell rather than a silently missed logic bug.
I'm trying to store bytes of data in an enum,
enum bufferHeaders
{   // output_input
    void_void   = 0x0003010000,
    output_void = 0x00030B8000,
    void_input  = 0x0006010000,
    led_on      = 0xff1A01,
    led_off     = 0x001A01,
};
The leading zeros of my values are being ignored and the compiler stores my values as int.
Later I need to be able to find out exactly how many bytes each set of data is.
If you write
enum bufferHeaders : int
{
// ToDo - your values here
};
Then it's guaranteed that the backing type is int, so the number of bytes used to store each value is sizeof(int). But then the onus is on you to make sure that the enum values can fit into an int: 0xff1A01, for example, might not on a platform where int is only 16 bits wide.
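A sketch of what that might look like with the values from the question; the choice of a fixed-width std::uint32_t underlying type here is just an assumption about how wide the values need to be:

#include <cstdint>
#include <iostream>

enum bufferHeaders : std::uint32_t
{
    void_void   = 0x03010000,
    output_void = 0x030B8000,
    void_input  = 0x06010000,
    led_on      = 0xff1A01,
    led_off     = 0x001A01,
};

int main()
{
    // Every value of this enum occupies sizeof(std::uint32_t) == 4 bytes,
    // regardless of how many leading zeros were written in the source.
    std::cout << sizeof(bufferHeaders) << "\n"; // 4
}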
Numerically speaking, preserving leading zeros makes no sense at all.
You are mixing up numerical value and representation!
There are always multiple ways to represent one and the same value: 77, 0x4d, 0x04d (guess: it is my birth year). I won't get older just because you change the representation.
Internally, the CPU has its own representation for a value, which is a specific bit pattern in one of its registers (or somewhere in RAM).
When printing out integer or enum values to console, the internal representation of the CPU is converted to some default representation suitable for most situations. If you want to have another representation, you need to explicitly tell which formatting to apply to retrieve such a representation:
// needs <iostream> and <iomanip>
std::cout << std::hex << std::showbase << std::internal << std::setfill('0')
          << std::setw(10) << 77 << std::endl;
Be aware that all the stream modifiers above except std::setw modify the stream persistently (until the next modification), so you need to revert them explicitly if you want a different output format later; std::setw only applies to the next value output and has to be re-applied for every subsequent value.
Is there a specific need for you to use enums? By default every value will have the same type, and you seem to be focused on the numerical value stored. If you are really concerned about the representation, a good approach might be to keep the enum values as plain ints and store the data in a std::map, using the enum value as the key and the numerical value, stored whichever way you want, as the mapped value (one possible reading is sketched below).
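One possible reading of that suggestion, sketched under the assumption that what you really want to keep is the exact byte sequence, leading zeros included (names reused from the question, values written out as explicit bytes):

#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

enum bufferHeaders { void_void, output_void, void_input, led_on, led_off };

int main()
{
    // The enumerator is only the key; the byte sequence (with its leading
    // zeros) is stored explicitly, so its exact length is never lost.
    std::map<bufferHeaders, std::vector<std::uint8_t>> headers = {
        { void_void,   { 0x00, 0x03, 0x01, 0x00, 0x00 } },
        { output_void, { 0x00, 0x03, 0x0B, 0x80, 0x00 } },
        { led_on,      { 0xFF, 0x1A, 0x01 } },
    };

    std::size_t n = headers[led_on].size(); // 3 bytes for this header
    return n == 3 ? 0 : 1;
}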
I'm working on a microcontroller with only 2KB of SRAM and desperately need to conserve some memory. Trying to work out how I can put 8 0/1 values into a single byte using a bitfield but can't quite work it out.
#include <cstdint>
#include <iostream>
using namespace std;

struct Bits
{
    int8_t b0:1, b1:1, b2:1, b3:1, b4:1, b5:1, b6:1, b7:1;
};

int main(){
    Bits b;
    b.b0 = 0;
    b.b1 = 1;
    cout << (int)b.b0; // outputs 0, correct
    cout << (int)b.b1; // outputs -1, should be outputting 1
}
What gives?
All of your bitfield members are signed 1-bit integers. On a two's complement system, that means they can represent only either 0 or -1. Use uint8_t if you want 0 and 1:
struct Bits
{
    uint8_t b0:1, b1:1, b2:1, b3:1, b4:1, b5:1, b6:1, b7:1;
};
As a word of caution - the standard doesn't really enforce an implementation scheme for bitfields. There is no guarantee that Bits will be 1 byte, and hypothetically it is entirely possible for it to be larger.
In practice, however, actual implementations usually follow the obvious logic and it will "almost always" be 1 byte in size; but again, there is no guarantee. If you want to be sure, you could do the packing manually, as sketched below.
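A manual sketch (plain shifts and masks on a single uint8_t, so there is no bit-field layout for the compiler to decide; the class name and interface are just placeholders):

#include <cstdint>

struct ManualBits
{
    std::uint8_t data = 0;

    void set(int pos, bool value)
    {
        if (value) data |= static_cast<std::uint8_t>(1u << pos);
        else       data &= static_cast<std::uint8_t>(~(1u << pos));
    }
    bool get(int pos) const { return (data >> pos) & 1u; }
};

static_assert(sizeof(ManualBits) == 1, "a single byte");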
BTW, -1 is still true, but -1 != true.
As noted, these variables consist of only a sign bit, so the only available values are 0 and -1.
A more appropriate type for these bitfields would be bool. C++14 §9.6/4:
If the value true or false is stored into a bit-field of type bool of any size (including a one bit bit-field), the original bool value and the value of the bit-field shall compare equal.
Yes, std::uint8_t will do the job, but you might as well use the best fit. You won't need things like the cast for std::cout << (int)b.b0;.
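For example, a minimal sketch of the same struct with bool bit-fields (the size caveat from the other answer still applies):

#include <iostream>

struct Bits
{
    bool b0:1, b1:1, b2:1, b3:1, b4:1, b5:1, b6:1, b7:1;
};

int main()
{
    Bits b{};   // zero-initialise all the bits
    b.b0 = 0;
    b.b1 = 1;
    std::cout << b.b0 << b.b1 << "\n"; // prints 01, no cast needed
}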
Signed and unsigned integers are the answer.
Keep in mind that signedness is just an interpretation of bits: whether you see -1 or 1 is just the 'print' code interpreting the variable's declared type as it was revealed to cout by the compiler (see operator overloading). The bit itself is the same, and so is its value (on/off), since you have only 1 bit.
It usually doesn't matter, but it is good practice to be explicit, so prefer to declare your variable as unsigned; that tells the compiler to generate the right code when the value is handed to a 'serializer' like print (cout).
"COUT" OPERATOR OVERLOADING:
cout works through a series of overloaded functions, and the parameter type tells the compiler which one to call. There is one overload that receives an unsigned value and another that receives a signed one, so they can interpret the same data differently, and you can switch between them with a cast. See cout << myclass
Just read on an internal university thread:
#include <iostream>
using namespace std;

union zt
{
    bool b;
    int i;
};

int main()
{
    zt w;
    bool a, b;
    a = 1;
    b = 2;
    cerr << (bool)2 << static_cast<bool>(2) << endl; // 11
    cerr << a << b << (a == b) << endl;              // 111
    w.i = 2;
    int q = w.b;
    cerr << (bool)q << q << w.b << ((bool)((int)w.b)) << w.i << (w.b == a) << endl; // 122220
    cerr << ((w.b == a) ? 'T' : 'F') << endl;        // F
}
So a,b and w.b are all declared as bool. a is assigned 1, b is assigned 2, and the internal representation of w.b is changed to 2 (using a union).
This way all of a,b and w.b will be true, but a and w.b won't be equal, so this might mean that the universe is broken (true!=true)
I know this problem is more theoretical than practical (a sane programmer doesn't want to change the internal representation of a bool), but here are the questions:
Is this okay? (this was tested with g++ 4.3.3) I mean, should the compiler be aware that during boolean comparison any non-zero value might mean true?
Do you know any case where this corner case might become a real issue? (For example while loading binary data from a stream)
EDIT:
Three things:
bool and int have different sizes; that's okay. But what if I use char instead of int? Or what about when sizeof(bool) == sizeof(int)?
Please give answers to the two questions I asked if possible. I'm actually interested in answers to the second question too, because in my honest opinion, in embedded systems (which might be 8-bit systems) this might be a real problem (or not).
New question: Is this really undefined behavior? If yes, why? If not, why? Aren't there any assumptions on the boolean comparison operators in the specs?
If you read a member of a union that is a different member than the last member which was written then you get undefined behaviour. Writing an int member and then reading the union's bool member could cause anything to happen at any subsequent point in the program.
The only exception is where the union is a union of structs and all the structs contain a common initial sequence, in which case the common sequence may be read.
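A sketch of that exception: both structs below begin with the same int member, so that member forms a common initial sequence and may be read through either union member.

#include <iostream>

struct A { int id; char   a_data; };
struct B { int id; double b_data; };

union U { A a; B b; };

int main()
{
    U u;
    u.a.id = 42;
    // 'id' belongs to the common initial sequence of A and B,
    // so inspecting it through the other member is permitted.
    std::cout << u.b.id << "\n"; // 42
}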
Is this okay? (this was tested with g++ 4.3.3) I mean, should the compiler be aware that during boolean comparison any non-zero value might mean true?
Any integer value that is non-zero (or pointer that is non-NULL) represents true.
But when comparing integers and bool the bool is converted to int before comparison.
Do you know any case where this corner case might become a real issue? (For example while binary loading of data from a stream)
It is always a real issue.
Is this okay?
I don't know whether the specs specify anything about this. A compiler might always generate code like this: ((a!=0) && (b!=0)) || ((a==0) && (b==0)) when comparing two booleans, although this might decrease performance.
In my opinion this is not a bug but undefined behaviour, although I think that every implementor should tell users how boolean comparisons are made in their implementation.
If we go by your last code sample, both a and b are bool and set to true by assigning 1 and 2 respectively (note that the 1 and 2 disappear; they are now just true).
So breaking down your expression:
a!=0 // true (a converted to 1 because of auto-type conversion)
b!=0 // true (b converted to 1 because of auto-type conversion)
((a!=0) && (b!=0)) => (true && true) // true ( no conversion done)
a==0 // false (a converted to 1 because of auto-type conversion)
b==0 // false (b converted to 1 because of auto-type conversion)
((a==0) && (b==0)) => (false && false) // false ( no conversion done)
((a!=0) && (b!=0)) || ((a==0) && (b==0)) => (true || false) => true
So I would always expect the above expression to be well defined and always true.
But I am not sure how this applies to your original question. When assigning an integer to a bool, the integer is converted to bool (as described several times). The actual representation of true is not defined by the standard and could be any bit pattern that fits in a bool (you may not assume any particular bit pattern).
When comparing the bool to int the bool is converted into an int first then compared.
Any real-world case
The only thing that pops into my mind: if someone reads binary data from a file into a struct that has bool members, the problem might arise if the file was made by another program that wrote 2 instead of 1 into the place of the bool (maybe because it was written in another programming language).
But this might mean bad programming practice.
Writing data in a binary format is not portable without knowledge of the target platform.
There are problems with the size of each object.
There are problems with representation:
Integers (have endianess)
Float (representation undefined; usually depends on the underlying hardware)
Bool (Binary representation is undefined by the standard)
Struct (Padding between members may differ)
With all these you need to know the underlying hardware and the compiler. Different compilers or different versions of the compiler or even a compiler with different optimization flags may have different behaviors for all the above.
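A small sketch of the padding point (the exact numbers are implementation-dependent, which is exactly the problem):

#include <iostream>

struct Record
{
    char tag;    // 1 byte
    int  value;  // typically 4 bytes, usually aligned to 4
    bool flag;   // 1 byte; the bit pattern used for true is not specified
};

int main()
{
    // On a typical 32-bit implementation this prints 12, not 6: the compiler
    // inserts padding after 'tag' and after 'flag'. A different compiler,
    // target or set of flags may choose a different layout.
    std::cout << sizeof(Record) << "\n";
}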
The problem with Union
union X
{
    int  a;
    bool b;
};
As people mention writing to 'a' and then reading from 'b' is undefined.
Why? Because we do not know how 'a' or 'b' is represented on this hardware. Writing to 'a' will fill out the bits in 'a', but how does that reflect on the bits in 'b'? If your system uses a 1-byte bool and a 4-byte int with the lowest byte in low memory and the highest byte in high memory, then writing 1 to 'a' will put 1 in 'b'. But then how does your implementation represent a bool? Is true represented by 1 or 255? What happens if you put a 1 in 'b' while every other use of true produces 255?
So unless you understand both your hardware and your compiler the behavior will be unexpected.
Thus these uses are undefined but not disallowed by the standard. The reason they are allowed is that you may have done the research and found that on your system, with this particular compiler, you can do some freaky optimization by making these assumptions. But be warned: any change in the assumptions will break your code.
Also when comparing two types the compiler will do some auto-conversions before comparison, remember the two types are converted into the same type before comparison. For comparison between integers and bool the bool is converted into an integer and then compared against the other integer (the conversion converts false to 0 and true to 1). If the objects being converted are both bool then no conversion is required and the comparison is done using boolean logic.
Normally, when assigning an arbitrary value to a bool the compiler will convert it for you:
int x = 5;
bool z = x; // automatic conversion here
The equivalent code generated by the compiler will look more like:
bool z = (x != 0) ? true : false;
However, the compiler will only do this conversion once. It would be unreasonable for it to assume that any nonzero bit pattern in a bool variable is equivalent to true, especially for doing logical operations like and. The resulting assembly code would be unwieldy.
Suffice to say that if you're using union data structures, you know what you're doing and you have the ability to confuse the compiler.
The boolean is one byte, and the integer is four bytes. When you assign 2 to the integer, the fourth byte has a value of 2, but the first byte has a value of 0. If you read the boolean out of the union, it's going to grab the first byte.
Edit: D'oh. As Oleg Zhylin points out, this only applies to a big-endian CPU. Thanks for the correction.
I believe what you're doing is called type punning:
http://en.wikipedia.org/wiki/Type_punning
Hmm strange, I am getting different output from codepad:
11
111
122222
T
The code also seems right to me, maybe it's a compiler bug?
See here
Just to write down my points of view:
Is this okay?
I don't know whether the specs specify anything about this. A compiler might always generate code like this: ((a!=0) && (b!=0)) || ((a==0) && (b==0)) when comparing two booleans, although this might decrease performance.
In my opinion this is not a bug but undefined behaviour, although I think that every implementor should tell users how boolean comparisons are made in their implementation.
Any real-world case
The only thing that pops into my mind: if someone reads binary data from a file into a struct that has bool members, the problem might arise if the file was made by another program that wrote 2 instead of 1 into the place of the bool (maybe because it was written in another programming language).
But this might mean bad programming practice.
One more: in embedded systems this bug might be a bigger problem, than on a "normal" system, because the programmers usually do more "bit-magic" to get the job done.
Addressing the questions posed, I think the behavior is ok and shouldn't be a problem in the real world. As we don't have ^^ in C++, I would suggest !bool == !bool as a safe bool comparison technique (sketched below).
This way every non-zero value in a bool variable will be converted to zero, and every zero to some non-zero value, most probably one and the same for every negation operation.
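The same idea shown with plain ints standing in for the suspect bools (avoiding the union, since reading the mangled bool at all is already questionable):

#include <iostream>

int main()
{
    int a = 1;
    int b = 2;
    // Both are "true" when used as conditions, yet a == b is false.
    // Normalising through ! compares their truthiness instead.
    std::cout << (a == b)   << "\n"; // 0
    std::cout << (!a == !b) << "\n"; // 1
}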
I want to define my own datatype that can hold a single one of six possible values in order to learn more about memory management in C++. In numbers, I want to be able to hold 0 through 5. In binary, three bits would suffice (101 = 5), although some values (6 and 7) won't be used. The datatype should also consume as little memory as possible.
I'm not sure how to accomplish this. First, I tried an enum with defined values for all the fields. As far as I know, the values are in hex there, so one "hexbit" should allow me to store 0 through 15. But comparing it to a char (with sizeof), it turns out to be 4 times the size of a char, and a char holds 0 through 255 if I'm not mistaken.
#include <iostream>

enum Foo
{
    a = 0x0,
    b = 0x1,
    c = 0x2,
    d = 0x3,
    e = 0x4,
    f = 0x5,
};

int main()
{
    Foo myfoo = a;
    char mychar = 'a';

    std::cout << sizeof(myfoo);  // prints 4
    std::cout << sizeof(mychar); // prints 1

    return 1;
}
I've clearly misunderstood something, but fail to see what, so I turn to SO. :)
Also, when writing this post I realised that I clearly lack some parts of the vocabulary. I've made this post a community wiki; please edit it so I can learn the correct words for everything.
A char is the smallest possible type.
If you happen to know that you need several such 3-bit values in a single place, you can use a structure with bit-field syntax:
struct foo {
    unsigned int val1:3;
    unsigned int val2:3;
};
and hence get 2 of them within one byte. In theory you could pack 10 such fields into a 32-bit "int" value.
C++0x will contain strongly typed enumerations where you can specify the underlying datatype (in your example char), but current C++ does not support this. The standard is not clear about the use of a char here (the examples are with int, short and long), but it mentions the underlying integral type, and that would include char as well.
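A sketch of what that would look like once the feature is available (using the scoped-enum spelling; the exact syntax may differ in the final standard):

enum class Foo : char   // underlying type is char
{
    a, b, c, d, e, f
};

static_assert(sizeof(Foo) == 1, "one byte");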
As of today Neil Butterworth's answer to create a class for your problem seems the most elegant, as you can even extend it to contain a nested enumeration if you want symbolical names for the values.
C++ does not express units of memory smaller than bytes. If you're producing them one at a time, that's the best you can do; your own example works well. If you need just a few, you can use bit-fields as Alnitak suggests. If you're planning on allocating them one at a time, you're even worse off: most allocators hand out memory in fixed-size chunks, 16 bytes being a common minimum.
Another choice might be to wrap std::bitset to do your bidding. This will waste very little space if you need many such values: only about 1 bit for every 8 (a rough sketch follows).
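A rough sketch of that idea for many 3-bit values (the index arithmetic is the only real work; the class name and interface are just placeholders):

#include <bitset>
#include <cstddef>

// Stores N values, each in the range 0..7, using 3 bits apiece.
template <std::size_t N>
class PackedValues
{
public:
    void set(std::size_t i, unsigned v)
    {
        for (std::size_t b = 0; b < 3; ++b)
            bits_[3 * i + b] = (v >> b) & 1u;
    }
    unsigned get(std::size_t i) const
    {
        unsigned v = 0;
        for (std::size_t b = 0; b < 3; ++b)
            v |= bits_[3 * i + b] << b;
        return v;
    }
private:
    std::bitset<3 * N> bits_;
};

int main()
{
    PackedValues<10> vals;  // 10 values of 3 bits each in one bitset<30>
    vals.set(3, 5);
    return vals.get(3) == 5 ? 0 : 1;
}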
If you think about your problem as a number expressed in base 6, and convert that number to base two, possibly using an unlimited-precision integer (for example GMP), you won't waste any bits at all.
This assumes, of course, that your values have a uniform, random distribution. If they follow a different distribution, your best bet will be general compression of the first example, with something like gzip.
You can store values smaller than 8 or 32 bits. You just need to pack them into a struct (or class) and use bit fields.
For example:
struct example
{
    unsigned int a : 3;  //< Three bits, can be 0 through 7.
    bool b : 1;          //< One bit, stores 0 or 1.
    unsigned int c : 10; //< Ten bits, can be 0 through 1023.
    unsigned int d : 19; //< 19 bits, can be 0 through 524287.
};
In most cases, your compiler will round up the total size of your structure to 32 bits on a 32 bit platform. The other problem is, like you pointed out, that your values may not have a power of two range. This will make for wasted space. If you read the entire struct as one number, you will find values that will be impossible to set, if your input ranges aren't all powers of 2.
Another feature you may find interesting is a union. They work like a struct, but share memory. So if you write to one field it overwrites the others.
Now, if you are really tight for space, and you want to push each bit to the maximum, there is a simple encoding method. Let's say you want to store 3 numbers, each can be from 0 to 5. Bit fields are wasteful, because if you use 3 bits each, you'll waste some values (i.e. you could never set 6 or 7, even though you have room to store them). So, lets do an example:
//Here are three example values, each can be from 0 to 5:
const int one = 3, two = 4, three = 5;
To pack them together most efficiently, we should think in base 6 (since each value is from 0-5). So packed into the smallest possible space is:
//This packs all the values into one int, from 0 - 215.
//pack could be any value from 0 - 215. There are no 'wasted' numbers.
int pack = one + (6 * two) + (6 * 6 * three);
See how it looks like we're encoding in base six? Each number is multiplied by its place value, 6^n, where n is the place (starting at 0).
Then to decode:
const int one = pack % 6;
pack /= 6;
const int two = pack % 6;
pack /= 6;
const int three = pack;
These schemes are extremely handy when you have to encode some fields in a bar code or in an alphanumeric sequence for human typing. Just saving those few partial bits can make a huge difference. Also, the fields don't all have to have the same range: if one field is from 0 through 7, you'd use 8 instead of 6 in the proper place (see the sketch below). There is no requirement that all fields have the same range.
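A sketch of that mixed-range case, where the first and third fields run 0 through 5 and the middle one runs 0 through 7 (each field's place value is the product of the ranges packed before it):

#include <iostream>

int main()
{
    const int one = 3;   // range 0..5
    const int two = 6;   // range 0..7
    const int three = 5; // range 0..5

    // Pack: place values are 1, 6, and 6*8. The result fits in 0..287.
    int pack = one + 6 * two + 6 * 8 * three;

    // Unpack in the same order, dividing by each field's own range.
    int a = pack % 6; pack /= 6;
    int b = pack % 8; pack /= 8;
    int c = pack;

    std::cout << a << " " << b << " " << c << "\n"; // 3 6 5
}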
The minimal size you can use is 1 byte.
But if you work with a group of enum values (writing to a file or storing in a container, ...), you can pack that group at 3 bits per value.
You don't have to enumerate the values of the enum:
enum Foo
{
    a,
    b,
    c,
    d,
    e,
    f,
};

Foo myfoo = a;
Here Foo's underlying type is int, which on your machine takes 4 bytes.
The smallest type is char, which is defined as the smallest addressable data on the target machine. The CHAR_BIT macro yields the number of bits in a char and is defined in limits.h.
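For example, a couple of lines (using nothing beyond what <climits> already provides) show both facts:

#include <climits>
#include <iostream>

int main()
{
    std::cout << CHAR_BIT << "\n";      // bits in a char, usually 8
    std::cout << sizeof(char) << "\n";  // always 1, by definition
}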
[Edit]
Note that generally speaking you shouldn't ask yourself such questions. Always use [unsigned] int if it's sufficient, except when you allocate quite a lot of memory (e.g. int[100*1024] vs char[100*1024], but consider using std::vector instead).
The size of an enumeration is defined to be the same as that of an int. But depending on your compiler, you may have the option of creating a smaller enum. For example, in GCC, you may declare:
enum Foo {
a, b, c, d, e, f
}
__attribute__((__packed__));
Now, sizeof(Foo) == 1.
The best solution is to create your own type implemented using a char. This should have sizeof(MyType) == 1, though this is not guaranteed.
#include <iostream>
using namespace std;
class MyType {
public:
MyType( int a ) : val( a ) {
if ( val < 0 || val > 5 ) {
throw( "bad value" );
}
}
int Value() const {
return val;
}
private:
char val;
};
int main() {
MyType v( 2 );
cout << sizeof(v) << endl;
cout << v.Value() << endl;
}
It is likely that packing oddly sized values into bitfields will incur a sizable performance penalty due to the architecture not supporting bit-level operations (thus requiring several processor instructions per operation). Before you implement such a type, ask yourself if it is really necessary to use as little space as possible, or if you are committing the cardinal sin of programming that is premature optimization. At most, I would encapsulate the value in a class whose backing store can be changed transparently if you really do need to squeeze every last byte for some reason.
You can use an unsigned char, probably typedef'd as BYTE. It will occupy only one byte.