Union struct produces garbage and general question about struct nomenclature

I read about unions today and tried the sample code that came with the explanation. It looked easy enough, but the output was utter garbage.
The first example is:
union Test
{
    int Int;
    struct
    {
        char byte1;
        char byte2;
        char byte3;
        char byte4;
    } Bytes;
};
where an int is assumed to have 32 bits. After I set a value with Test t; t.Int = 7; and then print the individual bytes with
cout << t.Bytes.byte1 << etc...
nothing is displayed, but my computer beeps, which is fairly odd.
The second example gave me even worse results.
union SwitchEndian
{
    unsigned short word;
    struct
    {
        unsigned char hi;
        unsigned char lo;
    } data;
} Switcher;
Looks a little wonky in my opinion. Anyway, according to the description, this should automatically store the value in big- or little-endian format when I set it like
Switcher.word = 7656; and print it with cout << Switcher.data.hi << endl
The result was symbols that are not even defined in the ASCII chart. Not sure why those are showing up.
Finally, I got an error when I tried to correct the example by moving the name Bytes from after the closing brace of the struct to right after the struct keyword. So instead of
struct {} Bytes;
I wanted to write
struct Bytes {};
This tossed me a big ol' error. What's the difference between these? Since C++ cannot have unnamed structs it seemed, at the time, pretty obvious that the Bytes positioned at the beginning and at the end are the things that name it. Except no, that's not the entire answer I guess. What is it then?

The beeps and weird symbols appear because cout prints a char as a character, not as a number, and the byte values here are ASCII control characters. In your first example (the beeps), you are printing the character with ASCII code 7, which is the bell character.
You can cast your data to int to print out the actual decimal representation, e.g.:
cout << (int)t.Bytes.byte1 << endl << (int)t.Bytes.byte2 << endl << (int)t.Bytes.byte3 << endl << (int)t.Bytes.byte4 << endl;
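You can do something similar for your second example to see the decimal values of those unsigned char members. For 7656 (0x1DE8) the two bytes are 29, which is a control character, and 232, which lies outside 7-bit ASCII; that is why the symbols are not in the ASCII chart.
The reason for the difference is that basic_ostream, the class template behind cout, overloads operator<< for the various basic types: the char overloads write the character itself, while the int overload writes the number as decimal text.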
For your last issue, what compiler error are you getting? Both struct definitions compile fine for me when using VS2008.
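For the second example in the question, here is a minimal sketch of getting at the two bytes with shifts and masks instead of a union. The names high_byte, low_byte and print_bytes are hypothetical and not part of the question's code. Unlike the union, the result does not depend on the byte order of the machine.

```cpp
#include <cstdint>
#include <iostream>

// Hypothetical helpers: pick the high and low byte out of a 16-bit value.
// The result is returned as unsigned, so streaming it prints a number.
constexpr unsigned high_byte(std::uint16_t word) { return (word >> 8) & 0xFFu; }
constexpr unsigned low_byte(std::uint16_t word) { return word & 0xFFu; }

// Prints both bytes as decimal numbers rather than as characters.
void print_bytes(std::uint16_t word)
{
    std::cout << high_byte(word) << ' ' << low_byte(word) << '\n';
}
```

Calling print_bytes(7656) should print 29 and 232, since 7656 is 0x1DE8.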

Note that, technically, reading from a member of a union other than the member that was last written to results in undefined behavior, so if you last assigned a value to Int, you cannot read a value from Bytes (there's some discussion of this on StackOverflow, for example, in this answer to another question).
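If the goal is only to look at the bytes of an int, one way that stays within defined behavior is to copy the object representation into an array of unsigned char with std::memcpy. This is a sketch of that alternative, not something from the question; bytes_of is a hypothetical name.

```cpp
#include <array>
#include <cassert>   // only needed for the checks in the usage note
#include <cstring>

// Hypothetical helper: copies the object representation of an int
// into a byte array. The order of the bytes depends on the machine.
std::array<unsigned char, sizeof(int)> bytes_of(int value)
{
    std::array<unsigned char, sizeof(int)> out{};
    std::memcpy(out.data(), &value, sizeof value);
    return out;
}
```

On common little-endian or big-endian machines, a check such as assert(bytes_of(7).front() == 7 || bytes_of(7).back() == 7) should hold; which of the two ends holds the 7 tells you the byte order.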
Chris Schmich gives a good explanation of why you are hearing beeps and seeing control characters, so I won't repeat that.
For your final question, struct {} Bytes; declares an instance named Bytes of an unnamed struct. It is similar to saying:
struct BytesType {};
BytesType Bytes;
except that you cannot refer to BytesType elsewhere. struct Bytes {}; defines a struct named Bytes but declares no instances of it.
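Applied to the union from the question, the two spellings can be sketched like this (WithMember and WithNamedType are hypothetical names). With the second form the union has no member called Bytes until one is declared, which is a likely cause of an error when the rest of the code still says t.Bytes.byte1.

```cpp
// Form 1: an unnamed struct type with one member named Bytes.
union WithMember
{
    int Int;
    struct { char byte1, byte2, byte3, byte4; } Bytes;
};

// Form 2: a nested type named Bytes. This alone adds no member,
// so a member of that type is declared separately.
union WithNamedType
{
    int Int;
    struct Bytes { char byte1, byte2, byte3, byte4; };
    Bytes bytes;
};
```

Access then looks like a.Bytes.byte1 for a WithMember object and b.bytes.byte1 for a WithNamedType object.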

Related

Sign & Unsigned Char is not working in C++

In C++ Primer 5th Edition I saw this
when I tried to use it---
It didn't work: the program printed a weird symbol for the unsigned char and nothing at all for the signed char. The compiler also gave some warnings when I tried to compile it. But C++ Primer and many websites say it should work, so I don't think they give wrong information. Did I do something wrong?
I am a newbie, by the way :)

But C++ primer ... said it should work
No, it doesn't. The quote from C++ Primer doesn't use std::cout at all. The output that you see doesn't contradict what the book says.
So I don't think they give the wrong information
No. [1]
did I do something wrong?
It seems that you've possibly misunderstood what the value of a character means, or possibly misunderstood how character streams work.
Character types are integer types (but not all integer types are character types). The values of unsigned char are 0..255 (on systems where a byte has 8 bits). Each [2] of those values represents some textual symbol. The mapping from a set of values to a set of symbols is called a "character set" or "character encoding".
std::cout is a character stream, and << is the stream insertion operator. When you insert a character into a stream, the behaviour is not to show the numerical value. Instead, the behaviour is to show the symbol that the value is mapped to [3] in the character set that your system uses. In this case, it appears that the value 255 is mapped to whatever strange symbol you saw on the screen.
If you wish to print the numerical value of a character, what you can do is convert it to a non-character integer type and insert that into the character stream:
int i = c;
std::cout << i;
[1] At least, there's no wrong information regarding your confusion. The quote is a bit inaccurate and outdated in the case of c2. Before C++20, the value was "implementation defined" rather than "undefined". Since C++20, the value is actually defined, and the value is 0, which is the null terminator character that signifies the end of a string. If you try to print this character, you'll see no output.
[2] This was a bit of a lie for simplicity's sake. Some characters are not visible symbols. For example, there is the null terminator character as well as other control characters. The situation becomes even more complex in the case of variable-width encodings such as the ubiquitous Unicode encodings, where symbols may consist of a sequence of several char. In such an encoding, an individual char cannot necessarily be interpreted correctly without the other char that are part of the sequence.
[3] And this behaviour should feel natural once you grok the purpose of character types. Consider the following program:
unsigned char c = 'a';
std::cout << c;
It would be highly confusing if the output were a number that is the value of the character (such as 97, which may be the value of the symbol 'a' on the system) rather than the symbol 'a'.
For extra meditation, think about what this program might print (and feel free to try it out):
char c = 57;
std::cout << c << '\n';
int i = c;
std::cout << i << '\n';
c = '9';
std::cout << c << '\n';
i = c;
std::cout << i << '\n';

This is due to the behavior of the << operator for the char type on the character stream cout. Note that << is known as formatted output, which means it does some implicit formatting.
We can say that the value of a variable is not the same as its representation in certain contexts. For example:
#include <iostream>

int main() {
    bool t = true;
    std::cout << t << std::endl; // Prints 1, not "true"
}
Think of it this way: why would we need char if it still behaved like a number when printed? Why not just use int or unsigned? In essence, we have different types so that we get different behaviors, which can be deduced from those types.
So the underlying numeric value of a char is probably not what we are looking for when we print one.
Check this for example:
#include <iostream>

int main() {
    unsigned char c = -1;  // wraps around to 255 for an 8-bit char
    int i = c;
    std::cout << i << std::endl; // Prints 255
}
If I recall correctly, you are close to the part of the Primer that covers conversions between built-in types; things will become clearer once you know those rules better. In any case, I'm sure you will benefit greatly from looking into this article, especially the "Printing chars as integers via type casting" part.
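As a small sketch of the difference between a char's symbol and its numeric value, the two hypothetical helpers below stream the same unsigned char once as is and once after unary plus, which promotes it to int. Neither name comes from the book or from the question.

```cpp
#include <cassert>   // only needed for the checks in the usage note
#include <sstream>
#include <string>

// Streams the char itself: the char overload writes the symbol.
std::string symbol_text(unsigned char c)
{
    std::ostringstream out;
    out << c;
    return out.str();
}

// Unary plus promotes c to int, so the int overload writes the number.
std::string numeric_text(unsigned char c)
{
    std::ostringstream out;
    out << +c;
    return out.str();
}
```

For example, assert(symbol_text('a') == "a") and assert(numeric_text(255) == "255") should both pass.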

Working with chars and only solution I've found is to static_cast twice. Is there a way around this?

This code seems a bit ridiculous, but it's the only way I found to deal with my problem...
char word[10];
cout << std::hex << static_cast<int>(static_cast<unsigned char>(word[i]));
This is my way of cout-ing a char as a hex value (including signed chars). It seems to work great (to my knowledge), but I feel it's a very stupid way to do it.
I should add, I'm reading a file, that's why my data type is char initially.

You are already doing it the right way, although for the outer cast int and unsigned int work equally well. You could make a function or a functor if you'll be doing this in several places, e.g.:
int char_to_int(char ch)
{
    return static_cast<unsigned char>(ch);
}
// ...
cout << hex << char_to_int(word[i]);
As noted in comments, another option is word[i] & 0xFF with no casting. This is actually implementation-defined but most likely will give the intended result. But again, if you will be doing this in several places I would suggest wrapping it up in a function so that it is more obvious what is going on.
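A sketch of such a wrapper that also pads to two hex digits; to_hex is a hypothetical name, and returning a std::string is just one possible design. It keeps the double conversion from the question in one place.

```cpp
#include <cassert>   // only needed for the checks in the usage note
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: formats one char as two lowercase hex digits.
// Going through unsigned char first avoids sign extension for
// chars with the high bit set.
std::string to_hex(char ch)
{
    std::ostringstream out;
    out << std::hex << std::setw(2) << std::setfill('0')
        << static_cast<int>(static_cast<unsigned char>(ch));
    return out.str();
}
```

With this, cout << to_hex(word[i]) replaces the nested casts at each call site; for instance to_hex('\x07') should give "07".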

A story of stringstream, hexadecimal and characters

I have a string containing hexadecimal values (two characters representing a byte). I would like to use std::stringstream to make the conversion as painless as possible, so I came up with the following code:
std::string a_hex_number = "e3";
{
    unsigned char x;
    std::stringstream ss;
    ss << std::hex << a_hex_number;
    ss >> x;
    std::cout << x << std::endl;
}
To my biggest surprise this prints out "e" ... Of course I don't give up so easily, and I modify the code to be:
{
    unsigned short y;
    std::stringstream ss;
    ss << std::hex << a_hex_number;
    ss >> y;
    std::cout << y << std::endl;
}
This, as expected, prints out 227 ...
I looked at http://www.cplusplus.com/reference/istream/istream/operator%3E%3E/ and http://www.cplusplus.com/reference/ios/hex/ but I could not find a reference that explains this behaviour. (Yes, I feel that it is right, because extracting a character should take one character, but I am a little confused that std::hex is ignored for characters.) Is this situation mentioned somewhere?
(http://ideone.com/YHt7Fz)
Edit: I am specifically interested in whether this behaviour is mentioned in any of the C++ standards.

If I understand correctly, you're trying to convert a string in
hex to an unsigned char. So for starters, since this is
"input", you should be using std::istringstream:
std::istringstream ss( a_hex_number );
ss >> std::hex >> variable;
Beyond that, you want the input to parse the string as an
integral value. Streams do not consider character types as
numeric values; they read a single character into them (after
skipping leading white space). To get a numeric value, you
should input to an int, and then convert that to unsigned
char. Characters don't have a base, so std::hex is
irrelevant for them. (The same thing holds for strings, for
example, and even for floating point.)
With regards to the page you cite: the page doesn't mention
inputting into a character type (strangely enough, because it
does talk about all other types, including some very special
cases). The documentation for the std::hex manipulator is
also weak: in the running text, it only says that "extracted
values are also expected to be in hexadecimal base", which isn't
really correct; in the table, however, it clearly talks about
"integral values". In the standard, this is documented in
§27.7.2.2.3. (The >> operators for character types are not
member functions, but free functions, so are defined in
a different section.) What we are missing, however, is a good
document which synthesizes these sort of things: whether the
>> operator is a member or a free function doesn't really
affect the user much; you want to see all of the >> available,
with their semantics, in one place.
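A minimal sketch of the approach described above: read into an integer with std::hex, then narrow to unsigned char. The name parse_hex_byte and the range check are my own additions for illustration.

```cpp
#include <cassert>   // only needed for the checks in the usage note
#include <ios>
#include <sstream>
#include <string>

// Hypothetical helper: parses text such as "e3" as a hexadecimal byte.
// Returns false if no hex number can be read or the value exceeds 0xFF.
// Trailing characters after the number are not checked.
bool parse_hex_byte(const std::string& text, unsigned char& out)
{
    std::istringstream in(text);
    unsigned int value = 0;
    if (!(in >> std::hex >> value) || value > 0xFFu)
    {
        return false;
    }
    out = static_cast<unsigned char>(value);
    return true;
}
```

For the string from the question, parse_hex_byte("e3", x) should return true and leave 227 in x.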

Let's put it simply: the variable type is 'stronger' than 'hex'. That's why 'hex' is ignored for a 'char' variable.
Longer story:
'hex' modifies the internal state of the stringstream object, telling it how to treat subsequent operations on integers. However, this does not apply to chars.

When you print out a character (i.e. unsigned char), it's printed as a character, not as a number.

C++ cout and enum representations

I have an enum that I'm doing a cout on, as in: cout << myenum.
When I debug, I can see the value as an integer value but when cout spits it out, it shows up as the text representation.
Any idea what cout is doing behind the scenes? I need that same type of functionality, and there are examples out there converting enum values to string but that seems like we need to know what those values are ahead of time. In my case, I don't. I need to take any ol' enum and get its text representation. In C# it's a piece of cake; C++.. not easy at all.
I can take the integer value if I need to and convert it appropriately, but the string would give me exactly what I need.
UPDATE:
Many thanks to everyone who contributed to this question. Ultimately, I found my answer in some buried code. There was a method to convert the enum value to a string representing the actual move, like "exd5". In this method, though, they were doing some pretty wild stuff which I'm staying away from at the moment. My main goal was to get to the string representation.

Enum.hpp:
enum Enum {
    FOO,
    BAR,
    BAZ,
    NUM_ENUMS
};
extern const char* enum_strings[];
Enum.cpp:
const char* enum_strings[] = {
    "FOO",
    "BAR",
    "BAZ",
    "NUM_ENUMS",
    0 };
Then when I want to output the symbolic representation of the enum, I use std::cout << enum_strings[x].
Thus, you do need to know the string values, but only in one place—not everywhere you use this.
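A variation on the same idea, sketched under the assumption that you control the enum: keep the names in one function and overload operator<< so that cout << value prints the name directly. Fruit, to_string and the enumerator names are hypothetical.

```cpp
#include <cassert>   // only needed for the checks in the usage note
#include <ostream>
#include <sstream>
#include <string>

enum Fruit { APPLE, PEAR, PLUM, NUM_FRUITS };

// The only place that knows the names of the enumerators.
const char* to_string(Fruit f)
{
    switch (f)
    {
        case APPLE: return "APPLE";
        case PEAR:  return "PEAR";
        case PLUM:  return "PLUM";
        case NUM_FRUITS: break;
    }
    return "UNKNOWN";
}

// With this overload, streaming a Fruit prints its name, not its number.
std::ostream& operator<<(std::ostream& os, Fruit f)
{
    return os << to_string(f);
}
```

Without the overload, streaming an unscoped enum converts it to an integer and prints the number; the names still have to be written down once by hand, since the language does not provide them.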

This functionality comes from the IOStreams library. std::cout is an std::ostream.
std::stringstream is an std::ostream too.
int x = 5;
std::stringstream ss;
ss << x;
// ss.str() is a string containing the text "5"

Serializing struct containing char*

I'm getting an error when serializing a char* string: error C2228: left of '.serialize' must have class/struct/union. I could use a std::string and then get a const char* from it, but I require the char* string.

The error message says it all, there's no support in boost serialization to serialize pointers to primitive types.
You can do something like this in the store code:
int len = strlen(string) + 1;
ar & len;
ar & boost::serialization::make_binary_object(string, len);
and in the load code:
int len;
ar & len;
string = new char[len]; //Don't forget to deallocate the old string
ar & boost::serialization::make_binary_object(string, len);

There is no way to serialize a pointer to something in boost::serialization (and, I suspect, no real way to do it in general either). A pointer is just a memory address. These addresses are generally specific to one instance of an object and, what's really important, an address carries no information about where the serialization should stop.
You can't just say to your serializer: "Hey, take something out of this pointer and serialize it. I don't care what size it has, just do it..."
The first and best solution to your problem is to wrap your char* in std::string or in your own string implementation. The second would be to write a special serializing routine for char*, which, I suspect, would end up doing much the same as the first method.

Try this:
#include <cstddef>
#include <cstring>
#include <iostream>

struct Example
{
    int i;
    char c;
    char * text; // Prefer std::string to char *

    // Start with a null pointer so that delete[] in Input() is safe.
    Example() : i(0), c(0), text(NULL) {}

    void Serialize(std::ostream& output)
    {
        output << i << "\n";
        output << c << "\n";
        // Output the length of the text member,
        // followed by the actual text.
        std::size_t text_length = 0;
        if (text)
        {
            text_length = std::strlen(text);
        }
        output << text_length << "\n";
        if (text_length)
        {
            output.write(text, text_length);
        }
        output << "\n";
    }

    void Input(std::istream& input)
    {
        input >> i;
        input.ignore(1000, '\n'); // Eat any characters after the integer.
        input >> c;
        input.ignore(1000, '\n');
        // Read the size of the text data.
        std::size_t text_length = 0;
        input >> text_length;
        input.ignore(1000, '\n');
        delete[] text; // Destroy previous contents, if any.
        text = NULL;
        if (text_length)
        {
            // One extra char for the terminating null that strlen() relies on.
            text = new char[text_length + 1];
            input.read(text, text_length);
            text[text_length] = '\0';
        }
    }
};
Since pointer values are not portable, the data they point to must be written instead.
The text is known as a variable length field. Variable length fields are commonly output (serialized) in two data structures: length followed by data OR data followed by terminal character. Specifying the length first allows usage of block reading. With the latter data structure, the data must be read one unit at a time until the terminal character is read. Note: the latter data structure also implies that the terminal character cannot be part of the set of data items.
Some important issues to think about for serialization:
1. Use a format that is platform independent, such as ASCII text for numbers.
2. If a platform-independent format is not available or allowed, define the exact specification for numbers, including endianness and maximum length.
3. For floating point numbers, the specification should treat the components of a floating point number as individual numbers that have to abide by the specification for a number (i.e. exponent, magnitude and mantissa).
4. Prefer fixed length records to variable length records.
5. Prefer serializing to a buffer. Users of the object can then create a buffer of one or more objects and write the buffer as one block (using one operation). Likewise for input.
6. Prefer using a database to serializing. Although this may not be possible for networking, try every effort to have a database manage the data. The database may be able to send the data over the network.
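To make the "length followed by data" layout and the fixed-endianness advice concrete, here is a minimal sketch that writes a std::string with a 4-byte big-endian length prefix and reads it back. The function names and the choice of a 32-bit big-endian prefix are assumptions for illustration, not a standard format.

```cpp
#include <cassert>   // only needed for the checks in the usage note
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Writes a 4-byte big-endian length, then the raw characters.
// Assumes the text is shorter than 2^32 bytes.
void write_string(std::ostream& out, const std::string& text)
{
    const std::uint32_t n = static_cast<std::uint32_t>(text.size());
    const char prefix[4] = {
        static_cast<char>((n >> 24) & 0xFF),
        static_cast<char>((n >> 16) & 0xFF),
        static_cast<char>((n >> 8) & 0xFF),
        static_cast<char>(n & 0xFF)
    };
    out.write(prefix, 4);
    out.write(text.data(), static_cast<std::streamsize>(text.size()));
}

// Reads what write_string wrote. Returns false on truncated input.
// A real reader should also cap n before allocating.
bool read_string(std::istream& in, std::string& text)
{
    unsigned char prefix[4];
    if (!in.read(reinterpret_cast<char*>(prefix), 4))
    {
        return false;
    }
    const std::uint32_t n = (static_cast<std::uint32_t>(prefix[0]) << 24)
                          | (static_cast<std::uint32_t>(prefix[1]) << 16)
                          | (static_cast<std::uint32_t>(prefix[2]) << 8)
                          |  static_cast<std::uint32_t>(prefix[3]);
    std::string buffer(n, '\0');
    if (n != 0 && !in.read(&buffer[0], static_cast<std::streamsize>(n)))
    {
        return false;
    }
    text.swap(buffer);
    return true;
}
```

Because the length comes first, the text may contain any character, including newlines and null bytes, and the reader can fetch it in one block. On platforms that distinguish text and binary streams, the file should be opened in binary mode.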
