Why does a bool array have an int output that is not 1 or 0? [duplicate] - c++

Many compilers seem to keep only 0 or 1 in bool values, but I'm not sure this will always work:
int a = 2;
bool b = a;
int c = 3 + b; // 4 or 5?

Yes:
In C++ (§4.5/4):
An rvalue of type bool can be converted to an rvalue of type int, with false becoming zero and true becoming one.
In C, when a value is converted to _Bool, it becomes 0 or 1 (§6.3.1.2/1):
When any scalar value is converted to _Bool, the result is 0 if the value compares equal to 0; otherwise, the result is 1.
When converting to int, it's straightforward: int can hold 0 and 1, so there's no change in value (§6.3.1.3).
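A minimal runnable sketch of exactly that round trip (nothing here beyond the three lines already quoted from the question):
#include <iostream>
int main()
{
    int a = 2;
    bool b = a;      // any non-zero value converts to true
    int c = 3 + b;   // true converts back to exactly 1, so c == 4
    std::cout << c << '\n';   // prints 4
}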

Well, not always...
const int n = 100;
bool b[n]; // uninitialized: reading these bools is undefined behavior
for (int i = 0; i < n; ++i)
{
    int x = b[i];
    if (x & ~1) // print only the values that are neither 0 nor 1
    {
        std::cout << x << ' ';
    }
}
Output on my system:
28 255 34 148 92 192 119 46 165 192 119 232 26 195 119 44 255 34 96 157 192 119 8 47 78 192 119 41 78 192 119 8 250 64 2 194 205 146 124 192 73 64 4 255 34 56 255 34 224 255 34 148 92 192 119 80 40 190 119 255 255 255 255 41 78 192 119 66 78 192 119 192 73 64 240 255 34 25 74 64 192 73 64
The reason for this apparently weird output is laid out in the standard, 3.9.1 §6:
Values of type bool are either true or false. Using a bool value in ways described by this International Standard as "undefined", such as by examining the value of an uninitialized automatic object, might cause it to behave as if it is neither true nor false.

Is C/C++ .......
There's no language named C/C++.
bool type always guaranteed to be 0 or 1 when typecast'ed to int?
In C++, yes, because section §4.5/4 says:
An rvalue of type bool can be converted to an rvalue of type int, with false becoming zero and true becoming one.
int c = 3 + b; // 4 or 5?
The value of c will be 4, because b holds true and converts to exactly 1.

One more example of what can happen once you are out of the safe boat:
bool b = false;
*(reinterpret_cast<char*>(&b)) = 0xFF;   // scribble an out-of-range bit pattern into b's storage
int from_bool = b;
cout << from_bool << " is " << (b ? "true" : "false");
Output (g++ (GCC) 4.4.7):
255 is true
This complements FredOverflow's example above.

There is no bool type in C prior to C99 (such as C90); however, the bool type in C99 and in C++ is always guaranteed to be 0 or 1.
In C, all boolean operations are guaranteed to return either 0 or 1, whether the bool type is defined or not.
So a && b, !a, and a || b will always return 0 or 1 in C or C++, regardless of the types of a and b.
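A quick illustration of that guarantee (a minimal sketch; the operand values are arbitrary):
#include <iostream>
int main()
{
    int a = 42, b = -7;
    // Logical operators yield exactly 0 or 1 (an int in C, a bool in C++ that
    // converts to 0 or 1), no matter what values the operands hold.
    std::cout << (a && b) << ' '   // 1
              << (b || 0) << ' '   // 1
              << !a << '\n';       // 0
}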

Types with padding bits may behave strangely if the padding bits don't hold the values expected for the type. Most C89 implementations didn't use padding bits with any of their integer types, but C99 requires that implementations define such a type: _Bool. Reading a _Bool when all of its bits are zero will yield zero. Writing any non-zero value to a _Bool will set its bits to some pattern which will yield 1 when read. Writing zero will set the bits to a pattern (which may or may not be all-bits-zero) which will yield 0 when read.
Unless specified otherwise in an implementation's documentation, any bit pattern other than all-bits-zero which could not have been produced by storing a zero or non-zero value to a _Bool is a trap representation; the Standard says nothing about what will happen if an attempt is made to read such a value. Given, e.g.
union boolChar { _Bool b; unsigned char c; } bc;
storing zero to bc.c and reading bc.b will yield zero. Storing zero or one to bc.b will set bc.c to a value which, if written back to bc.c, will cause bc.b to hold zero or one. Storing any other value to bc.c and then reading bc.b is undefined behavior.

Related

Why does the Russian Peasant Algorithm work with negative numbers when using Logical Right Shift?

I was tasked to write the Russian Multiplication Algorithm in any programming language without using any multiplication/division operators (or math methods from libraries etc.) except the shift operators. I used Java and this is my method:
public static int multiply(int a, int b) {
    System.out.printf("%d %d\n", a, b);
    if (b == 1 || b == -1 || a == 0)
        return a;
    int a0 = a << 1;
    int b0 = b >>> 1;
    int recursionResult = multiply(a0, b0);
    if ((b & 1) == 1)
        recursionResult += a;
    return recursionResult;
}
This works for me without any problems. However, I don't understand why it works for negative b.
I tried using the arithmetic right shift for dividing b by 2 at first. It failed, and when I looked at the output, I could see exactly why. Then I tried using the logical right shift just for fun (before maybe trying to invert b if it is negative, and inverting it back at the end), and it suddenly worked!
The output looks like this (for a=11, b=-50, as an example):
11 -50
22 2147483623
44 1073741811
88 536870905
176 268435452
352 134217726
704 67108863
1408 33554431
2816 16777215
5632 8388607
11264 4194303
22528 2097151
45056 1048575
90112 524287
180224 262143
360448 131071
720896 65535
1441792 32767
2883584 16383
5767168 8191
11534336 4095
23068672 2047
46137344 1023
92274688 511
184549376 255
369098752 127
738197504 63
1476395008 31
-1342177280 15
1610612736 7
-1073741824 3
-2147483648 1
This looks quite random to me, but in the end, you get -550, the correct result. Can anyone explain this to me?
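One way to see why the unsigned shift helps, sketched in C++ with uint32_t standing in for Java's wrapping 32-bit int (the function below is an illustrative iterative re-creation, not the original Java): >>> effectively treats b as the unsigned value b + 2^32, and because every addition and doubling wraps modulo 2^32, the accumulated result is a*(b + 2^32), which is congruent to a*b modulo 2^32 and therefore the right signed product whenever it fits in 32 bits.
#include <cstdint>
#include <iostream>
// Iterative peasant multiplication on wrapping 32-bit unsigned values.
uint32_t peasant_mod32(uint32_t a, uint32_t b)
{
    uint32_t result = 0;
    while (b != 0) {
        if (b & 1)
            result += a;   // wraps modulo 2^32, like Java int addition
        a <<= 1;           // doubling wraps modulo 2^32 as well
        b >>= 1;           // unsigned shift, the counterpart of Java's >>>
    }
    return result;
}
int main()
{
    int32_t a = 11, b = -50;
    uint32_t product = peasant_mod32(static_cast<uint32_t>(a), static_cast<uint32_t>(b));
    // Reinterpreting the wrapped result as signed recovers -550 on the usual
    // two's-complement platforms (guaranteed since C++20).
    std::cout << static_cast<int32_t>(product) << '\n';   // -550
}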

Why does Microsoft's implementation of std::string require 40 bytes on the stack?

Having recently watched this video about facebook's implementation of string, I was curious to see the internals of Microsoft's implementation. Unfortunately, the string file (in %VisualStudioDirectory%/VC/include) doesn't seem to contain the actual definition, but rather just conversion functions (e.g. atoi) and some operator overloads.
I decided to do some poking and prodding at it from user-level programs. The first thing I did, of course, was to test sizeof(std::string). To my surprise, std::string takes 40 bytes! (On 64-bit machines anyways.) The previously mentioned video goes into detail about how facebook's implementation only requires 24 bytes and gcc's takes 32 bytes, so this was shocking to say the least.
We can dig a little deeper by writing a simple program that prints off the contents of the data byte-by-byte (including the c_str address), as such:
#include <cstdint>
#include <iostream>
#include <string>
int main()
{
    std::string test = "this is a very, very, very long string";
    // Print contents of std::string test.
    char* data = reinterpret_cast<char*>(&test);
    for (size_t wordNum = 0; wordNum < sizeof(std::string); wordNum = wordNum + sizeof(uint64_t))
    {
        for (size_t i = 0; i < sizeof(uint64_t); i++)
            std::cout << (int)(data[wordNum + i]) << " ";
        std::cout << std::endl;
    }
    // Print the value of the address returned by test.c_str().
    // (Doing this byte-by-byte to match the above values).
    const char* testAddr = test.c_str();
    char* dataAddr = reinterpret_cast<char*>(&testAddr);
    std::cout << "c_str address: ";
    for (size_t i = 0; i < sizeof(const char*); i++)
        std::cout << (int)(dataAddr[i]) << " ";
    std::cout << std::endl;
}
This prints out:
48 33 -99 -47 -55 1 0 0
16 78 -100 -47 -55 1 0 0
-52 -52 -52 -52 -52 -52 -52 -52
38 0 0 0 0 0 0 0
47 0 0 0 0 0 0 0
c_str address: 16 78 -100 -47 -55 1 0 0
Examining this, we can see that the second word contains the address that points to the allocated data for the string, the third word is garbage (a buffer for Short String Optimization), the fourth word is the size, and the fifth word is the capacity. But what about the first word? It appears to be an address, but what for? Shouldn't everything already be accounted for?
For the sake of completeness, the following output shows SSO (the string is set to "Short String"). Note that the first word still seems to represent a pointer:
0 36 -28 19 123 1 0 0
83 104 111 114 116 32 83 116
114 105 110 103 0 -52 -52 -52
12 0 0 0 0 0 0 0
15 0 0 0 0 0 0 0
c_str address: 112 -9 79 -108 23 0 0 0
EDIT: Ok, so having done more testing, it appears that the size of std::string actually decreases down to 32 bytes when compiled for release, and the first word is no longer there. But I'm still really interested in knowing why that is the case, and what that extra pointer is used for in debug mode.
Update: As per the tip by the user Yuushi, the extra word appears to be related to Debug Iterator Support. This was verified when I turned off Debug Iterator Support (an example for doing this is shown here) and the size of std::string was reduced to 32 bytes, with the first word now missing.
However, it would still be really interesting to see how Debug Iterator Support uses that additional pointer to check for incorrect iterator use.
Visual Studio 2015 uses xstring instead of string to define std::basic_string.
NOTE: This answer applies to VS2015 only; VS2013 uses a different implementation, though they are more or less the same.
It's implemented as:
template<class _Elem,
    class _Traits,
    class _Alloc>
class basic_string
    : public _String_alloc<_String_base_types<_Elem, _Alloc> >
{
    // This class has no member data
};
_String_alloc uses a _Compressed_pair<_Alty, _String_val<_Val_types> > to store its data. In std::string, _Alty is std::allocator<char> and _Val_types is _Simple_types<char>. Because std::is_empty<std::allocator<char>>::value is true, sizeof _Compressed_pair<_Alty, _String_val<_Val_types> > is the same as sizeof _String_val<_Val_types>.
The class _String_val inherits from _Container_base, which is a typedef of _Container_base0 when _ITERATOR_DEBUG_LEVEL == 0 and of _Container_base12 otherwise. The difference between them is that _Container_base12 contains a pointer to a _Container_proxy for debugging purposes. Besides that, _String_val also has these members:
union _Bxty
{ // storage for small buffer or pointer to larger one
    _Bxty()
    { // user-provided, for fancy pointers
    }
    ~_Bxty() _NOEXCEPT
    { // user-provided, for fancy pointers
    }
    value_type _Buf[_BUF_SIZE];
    pointer _Ptr;
    char _Alias[_BUF_SIZE]; // to permit aliasing
} _Bx;
size_type _Mysize; // current length of string
size_type _Myres; // current storage reserved for string
Here _BUF_SIZE is 16, and pointer_type and size_type are well aligned with each other on this system, so no extra alignment padding is necessary.
Hence, when _ITERATOR_DEBUG_LEVEL == 0, sizeof std::string is:
_BUF_SIZE + 2 * sizeof size_type   (16 + 2 * 8 = 32 bytes on x64)
otherwise it's
sizeof pointer_type + _BUF_SIZE + 2 * sizeof size_type   (8 + 16 + 2 * 8 = 40 bytes)
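A quick way to sanity-check that arithmetic on your own build (the comments assume MSVC targeting x64; _ITERATOR_DEBUG_LEVEL is the same MSVC macro discussed above):
#include <iostream>
#include <string>
int main()
{
    // Expected on MSVC x64: 32 in release (_ITERATOR_DEBUG_LEVEL == 0),
    // 40 in debug builds, where the extra _Container_proxy pointer is present.
    std::cout << "sizeof(std::string) = " << sizeof(std::string) << '\n';
#ifdef _ITERATOR_DEBUG_LEVEL
    std::cout << "_ITERATOR_DEBUG_LEVEL = " << _ITERATOR_DEBUG_LEVEL << '\n';
#endif
}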

UDP receive data as unsigned char

I am trying to receive some data from the network using UDP and parse it.
Here is the code:
char recvline[1024];
int n = recvfrom(sockfd, recvline, 1024, 0, NULL, NULL);
for (int i = 0; i < n; i++)
    cout << hex << static_cast<short int>(recvline[i]) << " ";
This printed the output:
19 ffb0 0 0 ff88 d 38 19 48 38 0 0 2 1 3 1 ff8f ff82 5 40 20 16 6 6 22 36 6 2c 0 0 0 0 0 0 0 0
But I am expecting output like:
19 b0 0 0 88 d 38 19 48 38 0 0 2 1 3 1 8f 82 5 40 20 16 6 6 22 36 6 2c 0 0 0 0 0 0 0 0
The ff shouldn't be there on printed output.
Actually I have to parse this data based on each character, like:
parseCommand(recvline);
and the parse code looks,
void parseCommand(char *msg) {
    int commId = *(msg + 1);
    switch (commId) {
    case 0xb0: // do some operation
        break;
    case 0x20: // do another operation
        break;
    }
}
And while debugging I am getting commId=-80 on watch.
Note:
On Linux I am getting successful output with the code; note that I have used unsigned char instead of char for the read buffer.
unsigned char recvline[1024];
int n=recvfrom(sockfd,recvline,1024,0,NULL,NULL);
Whereas on Windows, recvfrom() does not allow the second argument to be unsigned char and gives a build error, so I chose char.
Looks like you might be getting the correct values, but your cast to short int during printing sign-extends your char value, causing ff to be propagated to the top byte if the top bit of your char is 1 (i.e. it is negative). You should first cast it to an unsigned type, then extend to int, so you need 2 casts:
cout << hex << static_cast<short int>(static_cast<uint8_t>(recvline[i]))<<" ";
I have tested this and it behaves as expected.
In response to your extension: the data read is fine, it is a matter of how you interpret it. To parse correctly you should:
uint8_t commId = static_cast<uint8_t>(*(msg + 1));
switch (commId) {
case 0xb0: // do some operation
    break;
case 0x20: // do another operation
    break;
}
Because you store your data in a signed data type, conversions/promotions to bigger data types will first sign-extend the value (filling the high-order bits with the value of the MSB), even if it then gets converted to an unsigned data type.
One solution is to define recvline as uint8_t[] in the first place and cast it to char* when passing it to the recvfrom function. That way, you only have to cast it once and you are using the same code in your Windows and Linux versions. Also, uint8_t[] is (at least to me) a clear indication that you are using the array as raw memory instead of a string of some kind.
Another possibility is to simply perform a bitwise And: (recvline[i] & 0xff). Thanks to automatic integral promotion this doesn't even require a cast.
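Putting the three behaviours side by side in a minimal sketch (the 0xb0 byte is a made-up stand-in for one element of recvline, and the ffb0 comment assumes a platform where plain char is signed):
#include <cstdint>
#include <iostream>
int main()
{
    char ch = static_cast<char>(0xb0);   // one "negative" received byte
    // Widening the signed char directly sign-extends, which is where the ff comes from:
    std::cout << std::hex << static_cast<short int>(ch) << '\n';                   // ffb0
    // Option 1: go through an unsigned type first, then extend:
    std::cout << std::hex << static_cast<int>(static_cast<uint8_t>(ch)) << '\n';   // b0
    // Option 2: mask after the implicit promotion to int:
    std::cout << std::hex << (ch & 0xff) << '\n';                                  // b0
}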
Personal Note:
It is really annoying that the C and C++ standards don't provide a separate type for raw memory (yet), but with any luck we'll get a byte type in a future standard revision.

Weird list output of __int64

I'm having a problem with __int64 and %I64u, or maybe there is a problem with my formula. I am trying to mimic the output below, but something weird happens on some line items. I can't understand what happened, since the others are printed fine.
NOTE: the main source of this list is raw binary data, so I fetch the bytes and try to convert them to __int64. My list consists of 120 line items, which output correctly up to line 73 and fail from line 74, with an expected value of 2276812558 but a displayed value of 18446744071691396878. From line 74 up to line 120 the results are intermittent: some lines are OK and other lines fail.
Can anyone help?
SOURCE:
74 2276812558 <-- expected output
...
110 88343310421 <-- expected output
111 101677534814 <-- expected output
112 116372862414
113 132547934111 <-- expected output
114 150330130721
115 169856101434 <-- expected output
116 193905458276
117 220253625665
118 249089120712 <-- expected output
119 280613529205
120 315042247217
Here is my code:
longint = (__int64)((col[3] << 24) | (col[2] << 16) | (col[1] << 8)) | ((col[0]) | (__int64)((col[7] << 56) | (col[6] << 48) | (col[5] << 40) | (col[4] << 32)) << 32);
sprintf(longintbuf,"%I64u", longint );
OUTPUT GENERATED:
74 18446744071691396878 <-- err
...
110 18446744071858548821 <-- err
111 18446744072307871326 <-- err
112 116372862414
113 18446744073113499551 <-- err
114 150330130721
115 18446744071766961210 <-- err
116 193905458276
117 220253625665
118 18446744073690569160 <-- err
119 280613529205
120 315042247217
If col isn't an array of 64-bit types, your shifts cause undefined behaviour. Cast before shifting:
(__int64)col[7] << 56
It's also undefined behaviour if the shift would cause a sign change, so be careful about that when using signed types (as you are).
From C11 6.5.7 Bitwise shift operators (emphasis mine):
The integer promotions are performed on each of the operands. The type of the result is that of the promoted left operand. If the value of the right operand is negative or is greater than or equal to the width of the promoted left operand, the behavior is undefined.
The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are filled with zeros. ... If E1 has a signed type and nonnegative value, and E1 × 2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.
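For illustration, a sketch of one way to assemble the value with every byte widened before it is shifted; the col bytes below are hypothetical, chosen to reproduce the expected line-74 value:
#include <iostream>
int main()
{
    // Hypothetical little-endian bytes of 2276812558 (0x87B5670E).
    unsigned char col[8] = { 0x0E, 0x67, 0xB5, 0x87, 0x00, 0x00, 0x00, 0x00 };
    // Widen each byte to an unsigned 64-bit type *before* shifting, so the shift
    // never happens in plain int and no sign extension can creep in.
    unsigned long long longint = 0;
    for (int i = 0; i < 8; ++i)
        longint |= static_cast<unsigned long long>(col[i]) << (8 * i);
    std::cout << longint << '\n';   // 2276812558
}
With MSVC, unsigned __int64 and %I64u would play the same role as the unsigned long long used here.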

c++ array sorting with some specifications

I'm using C++. Using sort from STL is allowed.
I have an array of int, like this:
1 4 1 5 145 345 14 4
The numbers are stored in a char* (I read them from a binary file, 4 bytes per number).
I want to do two things with this array :
swap each number with the one after that
4 1 5 1 345 145 4 14
sort it by group of 2
4 1 4 14 5 1 345 145
I could code it step by step, but it wouldn't be efficient. What I'm looking for is speed. O(n log n) would be great.
Also, this array can be bigger than 500MB, so memory usage is an issue.
My first idea was to sort the array starting from the end (to swap the numbers 2 by 2) and treating it as a long* (to force the sorting to take 2 int each time). But I couldn't manage to code it, and I'm not even sure it would work.
I hope I was clear enough, thanks for your help : )
This is the most memory efficient layout I could come up with. Obviously the vector I'm using would be replaced by the data blob you're using, assuming endian-ness is all handled well enough. The premise of the code below is simple.
Generate 1024 random values in pairs, each pair consisting of the first number between 1 and 500, the second number between 1 and 50.
Iterate the entire list, flipping all even-index values with their following odd-index brethren.
Send the entire thing to std::qsort with an item width of two (2) int32_t values and a count of half the original vector.
The comparator function simply sorts on the immediate value first, and on the second value if the first is equal.
The sample below does this for 1024 items. I've tested it without output for 134217728 items (exactly 536870912 bytes) and the results were pretty impressive for a measly MacBook Air laptop: about 15 seconds, only about 10 of that on the actual sort. What is ideally most important is that no additional memory allocation is required beyond the data vector. Yes, to the purists, I do use call-stack space, but only because qsort does.
I hope you get something out of it.
Note: I only show the first part of the output, but I hope it shows what you're looking for.
#include <iostream>
#include <fstream>
#include <algorithm>
#include <iterator>
#include <vector>
#include <cstdint>
#include <cstdlib>
#include <ctime>
using namespace std;
// a most-wacked-out random generator. every other call will
// pull from a rand modulo either the first, or second template
// parameter, in alternation.
template<int N, int M>
struct randN
{
    int i = 0;
    int32_t operator ()()
    {
        i = (i + 1) % 2;
        return (i ? rand() % N : rand() % M) + 1;
    }
};
// compare two integer pairs given their addresses: order by the first
// value, falling back to the second value on ties.
int pair_cmp(const void* arg1, const void* arg2)
{
    const int32_t *left = (const int32_t*)arg1;
    const int32_t *right = (const int32_t*)arg2;
    return (left[0] == right[0]) ? left[1] - right[1] : left[0] - right[0];
}
int main(int argc, char *argv[])
{
    // a crapload of int values
    static const size_t N = 1024;
    // seed rand()
    srand((unsigned)time(0));
    // get a huge array of random crap: pairs of 1..500 and 1..50 values
    vector<int32_t> data;
    data.reserve(N);
    std::generate_n(back_inserter(data), N, randN<500, 50>());
    // flip all the values
    for (size_t i = 0; i < data.size(); i += 2)
    {
        int32_t tmp = data[i];
        data[i] = data[i + 1];
        data[i + 1] = tmp;
    }
    // now sort in pairs. using qsort only because it lends itself
    // *very* nicely to performing block-based sorting.
    std::qsort(&data[0], data.size() / 2, sizeof(data[0]) * 2, pair_cmp);
    cout << "After sorting..." << endl;
    std::copy(data.begin(), data.end(), ostream_iterator<int32_t>(cout, "\n"));
    cout << endl << endl;
    return EXIT_SUCCESS;
}
Output
After sorting...
1
69
1
83
1
198
1
343
1
367
2
12
2
30
2
135
2
169
2
185
2
284
2
323
2
325
2
347
2
367
2
373
2
382
2
422
2
492
3
286
3
321
3
364
3
377
3
400
3
418
3
441
4
24
4
97
4
153
4
210
4
224
4
250
4
354
4
356
4
386
4
430
5
14
5
26
5
95
5
145
5
302
5
379
5
435
5
436
5
499
6
67
6
104
6
135
6
164
6
179
6
310
6
321
6
399
6
409
6
425
6
467
6
496
7
18
7
65
7
71
7
84
7
116
7
201
7
242
7
251
7
256
7
324
7
325
7
485
8
52
8
93
8
156
8
193
8
285
8
307
8
410
8
456
8
471
9
27
9
116
9
137
9
143
9
190
9
190
9
293
9
419
9
453
With some additional constraints on both your input and your platform, you can probably use an approach like the one you are thinking of. These constraints would include
Your input contains only positive numbers (i.e. can be treated as unsigned)
Your platform provides uint8_t and uint64_t in <cstdint>
You address a single platform with known endianness.
In that case you can divide your input into groups of 8 bytes, do some byte shuffling to arrange each group as one uint64_t with the "first" number from the input in the lower-valued half, and run std::sort on the resulting array. Depending on endianness you may need to do more byte shuffling to rearrange each sorted 8-byte group as a pair of uint32_t in the expected order.
If you can't code this on your own, I'd strongly advise you not to take this approach.
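For illustration only, a rough sketch of that packed-uint64_t idea under the stated assumptions (little-endian, non-negative values); note that the reinterpret_cast leans on alignment and aliasing guarantees the standard doesn't actually give you, which is one more reason to prefer the portable version below:
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <utility>
#include <vector>
int main()
{
    // Hypothetical input, matching the question's example.
    std::vector<uint32_t> v = { 1, 4, 1, 5, 145, 345, 14, 4 };
    // On a little-endian platform each adjacent pair (x, y) already sits in
    // memory as one 64-bit word with y in the high half, so sorting those
    // words orders the pairs by y first and x second -- the order wanted after the swap.
    uint64_t* packed = reinterpret_cast<uint64_t*>(v.data());
    std::sort(packed, packed + v.size() / 2);
    // Finish by swapping inside each pair so y is stored before x.
    for (std::size_t i = 0; i + 1 < v.size(); i += 2)
        std::swap(v[i], v[i + 1]);
    for (uint32_t n : v) std::cout << n << ' ';   // 4 1 4 14 5 1 345 145
    std::cout << '\n';
}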
A better and more portable approach (you have some inherent non-portability by starting from a not clearly specified binary file format) would be:
std::vector<int> swap_and_sort_int_pairs(const unsigned char buffer[], size_t buflen) {
    const size_t intsz = sizeof(int);
    // We have to assume that the binary format in buffer is compatible with our int representation;
    // we also require an even number of integers
    assert(buflen % (2 * intsz) == 0);
    // load pairwise
    std::vector< std::pair<int,int> > pairs;
    pairs.reserve(buflen / (2 * intsz));
    for (const unsigned char* bufp = buffer; bufp < buffer + buflen; bufp += 2 * intsz) {
        // It would be better to have a more portable binary -> int conversion
        int first_value = *reinterpret_cast<const int*>(bufp);
        int second_value = *reinterpret_cast<const int*>(bufp + intsz);
        // swap each pair here
        pairs.emplace_back(second_value, first_value);
    }
    // less<pair<..>> does lexicographical ordering, which is what you are looking for
    std::sort(pairs.begin(), pairs.end());
    // convert back to a linear vector
    std::vector<int> result;
    result.reserve(2 * pairs.size());
    for (auto& entry : pairs) {
        result.push_back(entry.first);
        result.push_back(entry.second);
    }
    return result;
}
Both the initial parse/swap pass (which you need anyway) and the final conversion are O(N), so the total complexity is still O(N log(N)).
If you can continue to work with pairs, you can save the final conversion. The other way to save that conversion would be to use a hand-coded sort with two-int strides and two-int swap: much more work - and possibly still hard to get as efficient as a well-tuned library sort.
Do one thing at a time. First, give your data some *struct*ure. It seems that each 8 bytes form a unit of the form
struct unit {
    int key;
    int value;
};
If the endianness is right, you can do this in O(1) with a reinterpret_cast. If it isn't, you'll have to live with an O(n) conversion effort. Both vanish compared to the O(n log n) sorting effort.
When you have an array of these units, you can use std::sort like:
bool compare_units(const unit& a, const unit& b) {
    return a.key < b.key;
}
std::sort(array, array + length, compare_units);
The key to this solution is that you do the "swapping" and byte-interpretation first and then do the sorting.
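To tie it together, a minimal end-to-end sketch of that order of operations (the buffer here is a made-up in-memory stand-in for the file data, the reinterpret_cast relies on the matching-layout assumption discussed above, and the tie-break on value is added so the output matches the question's example):
#include <algorithm>
#include <cstddef>
#include <iostream>
#include <tuple>
#include <utility>
struct unit {
    int key;
    int value;
};
int main()
{
    // Hypothetical raw input: the eight ints from the question, 4 bytes each.
    int raw[] = { 1, 4, 1, 5, 145, 345, 14, 4 };
    char* buffer = reinterpret_cast<char*>(raw);
    std::size_t length = sizeof(raw) / sizeof(unit);
    // 1. Give the bytes structure (O(1) when the endianness already matches).
    unit* units = reinterpret_cast<unit*>(buffer);
    // 2. Swap within each unit so the second number of each pair becomes the key.
    for (std::size_t i = 0; i < length; ++i)
        std::swap(units[i].key, units[i].value);
    // 3. Sort the units by key, breaking ties by value.
    std::sort(units, units + length, [](const unit& a, const unit& b) {
        return std::tie(a.key, a.value) < std::tie(b.key, b.value);
    });
    for (std::size_t i = 0; i < length; ++i)
        std::cout << units[i].key << ' ' << units[i].value << ' ';
    std::cout << '\n';   // 4 1 4 14 5 1 345 145
}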