Why don't identical bitsets convert to identical unsigned longs? (C++)

I am dealing with data in a vector of std::bitset<16>, which I have to convert both to and from unsigned long (through std::bitset::to_ulong()) and to and from strings using a self-made function (the exact algorithm is irrelevant for this question).
The conversion between the bitset vector and the string at first seems to work fine: if I convert a vector of bitsets to a string and then back to bitsets, the result is identical. I have verified this with a program that includes the following:
for (std::bitset<16>& B : my_bitset16vector) std::cout<<B<<std::endl;//print bitsets before conversion
bitset_to_string(my_bitset16vector,my_str);
string_to_bitset(my_bitset16vector,my_str);
std::cout<<std::endl;
for (std::bitset<16>& B : my_bitset16vector) std::cout<<B<<std::endl;//print bitsets after conversion
the output could look somewhat like this (in this case with only 4 bitsets):
1011000011010000
1001010000011011
1110100001101111
1001000011001111
1011000011010000
1001010000011011
1110100001101111
1001000011001111
Judging by this, the bitsets before and after conversion are clearly identical. Despite this, the very same bitsets convert to completely different values when I ask them to convert to unsigned long, in a program that looks like this:
for (std::bitset<16>& B : my_bitset16vector) std::cout<<B<<".to_ulong()="<<B.to_ulong()<<std::endl;//print bitsets before conversion
bitset_to_string(my_bitset16vector,my_str);
string_to_bitset(my_bitset16vector,my_str);
std::cout<<std::endl;
for (std::bitset<16>& B : my_bitset16vector) std::cout<<B<<".to_ulong()="<<B.to_ulong()<<std::endl;//print bitsets after conversion
the output could look somewhat like this:
1011000011010000.to_ulong()=11841744
1001010000011011.to_ulong()=1938459
1110100001101111.to_ulong()=22472815
1001000011001111.to_ulong()=18649295
1011000011010000.to_ulong()=45264
1001010000011011.to_ulong()=37915
1110100001101111.to_ulong()=59503
1001000011001111.to_ulong()=37071
Firstly, it is obvious that the bitsets are still, beyond all reasonable doubt, identical when displayed as binary, but when converted to unsigned long the identical bitsets return completely different values (completely ruining my program).
Why is this? Can the bitsets be different even though they print as the same? Can the error lie in my bitset-to/from-string converters, despite the bitsets being identical?
Edit: not all programs that include my conversions have this problem; it only happens when I have modified the bitsets after creating them (from a string), in my case in an attempt to encrypt them. That part simply cannot be cut down to something short and simple, but my most compressed way of writing it looks like this (and that is even without the definition of the public-key struct and the modular power function):
int main(int argc, char** argv)
{
    if (argc != 3)
    {
        std::cout<<"only 2 arguments allowed: plaintext user"<<std::endl;
        return 1;
    }
    unsigned long k=123456789;//any huge number loaded from an external file
    unsigned long m=123456789;//any huge number loaded from an external file
    std::vector< std::bitset<16> > data;
    std::string datastring=std::string(argv[1]);
    string_to_bitset(data,datastring);//string_to_bitset and bitset_to_string also empty the string and the bitset vector; this is not the cause of the problem
    for (std::bitset<16>& C : data)
    {
        C = std::bitset<16>(modpow(C.to_ulong(),k,m));//repeated squaring to compute C.to_ulong()^k % m
    }
    //and now the problem happens
    for (std::bitset<16>& C : data) std::cout<<C<<".to_ulong()="<<C.to_ulong()<<std::endl;
    std::cout<<std::endl;
    bitset_to_string(data,datastring);
    string_to_bitset(data,datastring);
    //bitset_to_string(data,datastring);
    for (std::bitset<16>& C : data) std::cout<<C<<".to_ulong()="<<C.to_ulong()<<std::endl;
    std::cout<<std::endl;
    return 0;
}
I am well aware that you are all now thinking that I am doing the modular power function wrong (which I guarantee I am not), but what I am doing to make this happen doesn't actually matter. My question was not "what is wrong in my program"; my question was: why don't identical bitsets (which print identical binary 1s and 0s) convert to identical unsigned longs?
Other edit: I must also point out that the first printed unsigned long values are "correct" in that, when used, they allow me to decrypt the bitsets perfectly, whereas the unsigned long values printed afterwards are "wrong" in that they produce a completely wrong result.

The "11841744" value is correct in the lower 16 bits, but has some extra set bits above the 16th. This could be a bug in your STL implementation where to_long accesses bits past the 16 it should be using.
Or (from your comment above) you're adding more bits to the bitset than it can hold and you're experiencing Undefined Behavior.
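For reference, a bounds-checked sketch of what such a conversion pair could look like (the names come from the question, but the bodies and exact signatures are my guesses, since the real implementations aren't shown, and unlike the question's versions these clear the destination rather than the source). The point is that no bit index past 15 is ever touched:

#include <bitset>
#include <cstddef>
#include <string>
#include <vector>

void bitset_to_string(const std::vector< std::bitset<16> >& v, std::string& s)
{
    s.clear();
    for (const std::bitset<16>& b : v)
        s += b.to_string();                            // exactly 16 chars per bitset
}

void string_to_bitset(std::vector< std::bitset<16> >& v, const std::string& s)
{
    v.clear();
    for (std::size_t i = 0; i + 16 <= s.size(); i += 16)
        v.push_back(std::bitset<16>(s.substr(i, 16))); // bitset(string) constructor, 16 bits exactly
}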

Related

Reading a char array as a float array in C++, without having to copy it using memcpy

I have a question really similar to this:
Building a 32-bit float out of its 4 composite bytes.
Specifically I have an array of unsigned char composed of 8 elements:
unsigned char c[8] = {0b01001000, 0b11100001, 0b00100110, 0b01000001, 0b01111011, 0b00010100, 0b10000110, 0b01000000};
Under a little-endian convention, this corresponds to two floats, namely { 10.4300f, 4.19000f }.
I know that I could obtain the latter with:
float f[2];
memcpy(&f, &c, sizeof(f));
//f = { 10.4300f, 4.19000f }
But this involves a copy.
Is there a way to cast the c array inplace, changing its type so that I can avoid copying?
Is there a way to cast the c array inplace
No. However, if the array is sufficiently aligned to hold a float, what you can do after memcpy is to placement-new a copy of that float onto the array.
Optimisers are smart and typically know that you copied the same value back; two copies in the abstract machine can end up as zero copies on the CPU.
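A sketch of that suggestion (the helper name is made up; it assumes the buffer is suitably aligned for float, which the alignas below provides, plus a little-endian IEEE-754 host for the expected value):

#include <cstring>
#include <iostream>
#include <new>

float* float_in_place(unsigned char* p)
{
    float tmp;
    std::memcpy(&tmp, p, sizeof tmp);   // read the bytes as a float value
    return ::new (p) float(tmp);        // re-create a float object in the same storage
}

int main()
{
    alignas(float) unsigned char c[8] =
        {0x48, 0xe1, 0x26, 0x41, 0x7b, 0x14, 0x86, 0x40};
    float* f = float_in_place(c);
    std::cout << *f << '\n';            // ~10.43 on a little-endian IEEE-754 system
}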
This, with a little endianness convention corresponds
I know that I could obtain the latter with
Note that memcpy always gives you native byte order, so you only get a little-endian result on little-endian systems; assuming the data can be interpreted as little-endian is therefore not portable.
If you want to avoid assuming native endianness, you'll need to read the bytes in the correct order, shift/mask them into an unsigned integer, then memcpy (or bit_cast) that integer into a float.
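A sketch of that portable route (helper name made up; it still assumes IEEE-754 floats, just not a particular host byte order):

#include <cstdint>
#include <cstring>

float float_from_le_bytes(const unsigned char* p)
{
    // Assemble the 32-bit pattern from explicitly little-endian byte order.
    std::uint32_t bits =  static_cast<std::uint32_t>(p[0])
                       | (static_cast<std::uint32_t>(p[1]) << 8)
                       | (static_cast<std::uint32_t>(p[2]) << 16)
                       | (static_cast<std::uint32_t>(p[3]) << 24);
    float f;
    std::memcpy(&f, &bits, sizeof f);   // or std::bit_cast<float>(bits) in C++20
    return f;
}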

Effectively get the most significant byte of mpz_t struct

I wasn't able to find how gmpxx stores the mpz_t structs under the hood. Thus the only way to get the most significant byte of a number stored as mpz_t is using the mpz_get_str method, but I would expect it to be very slow.
Do you know of a more effective (and simple) way of doing this?
I mean the 'most significant byte' of the number (which in my case is saved as an mpz_t) in binary, i.e. for 12345 (base 10) = 11000000111001 (base 2) it would be 11000000, no matter how gmpxx actually stores it.
Two functions to look at here:
size_t mpz_sizeinbase(mpz_t op, int base): this returns the length in a base, and for base=2, it gives the number of bits.
void mpz_tdiv_q_2exp(mpz_t q, const mpz_t n, mp_bitcnt_t b): this is equivalent to q = n >> b;.
Combined, the operation you are looking for is to bit-shift right by exactly sizeinbase - 8 bits:
size_t bit_length = mpz_sizeinbase(number, 2);
mpz_tdiv_q_2exp(last_byte, number, bit_length-8);
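Putting the two calls together, a small sketch using the C-style GMP API (it assumes the number is positive and at least 8 bits long):

#include <gmp.h>

int main()
{
    mpz_t number, msb;
    mpz_init_set_ui(number, 12345);                  // 11000000111001 in binary
    mpz_init(msb);

    size_t bit_length = mpz_sizeinbase(number, 2);   // 14 bits for 12345
    mpz_tdiv_q_2exp(msb, number, bit_length - 8);    // keep only the top 8 bits

    gmp_printf("%Zd\n", msb);                        // prints 192 == 0b11000000
    mpz_clear(number);
    mpz_clear(msb);
    return 0;
}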
As a sidenote, the mpz_t struct is stored in "limbs", which are primitives that are chained together. These limbs can have leading 0's to make editing the number easier for small value changes - so accessing them directly is not recommended.
A limb means the part of a multi-precision number that fits in a single machine word. (We chose this word because a limb of the human body is analogous to a digit, only larger, and containing several digits.) Normally a limb is 32 or 64 bits. The C data type for a limb is mp_limb_t.
~https://gmplib.org/manual/Nomenclature-and-Types.html#Nomenclature-and-Types
You can create a union without padding (see the effect of #pragma pack) with 2 members, your struct and a byte, then assign a value to your struct and read the value of the byte. However, I am not sure it fits your definition of MSB.

Differences in assignment of integer variable

I just asked this question, and it got me thinking whether there is any reason
1) why you would assign an int variable using hexadecimal or octal instead of decimal, and
2) what the differences are between the different ways of assignment:
int a=0x28ff1c; // hexadecimal
int a=10; //decimal (the most commonly used way)
int a=012177434; // octal
You may have some constants that are more easily understood when written in hexadecimal.
Bitflags, for example, in hexadecimal are compact and easily (for some values of easily) understood, since there's a direct correspondence 4 binary digits => 1 hex digit - for this reason, in general the hexadecimal representation is useful when you are doing bitwise operations (e.g. masking).
In a similar fashion, in several cases integers may be internally divided in some fields, for example often colors are represented as a 32 bit integer that goes like this: 0xAARRGGBB (or 0xAABBGGRR); also, IP addresses: each piece of IP in the dotted notation is two hexadecimal digits in the "32-bit integer" notation (usually in such cases unsigned integers are used to avoid messing with the sign bit).
In some code I'm working on at the moment, for each pixel in an image I have a single byte to store "accessory information"; since I have to store some flags and a small number, I use the 4 least significant bits for the flags and the 4 most significant ones for the number. Using hexadecimal notation it's immediate to write the appropriate masks and shifts: byte & 0x0f gives me the 4 LS bits (the flags), (byte & 0xf0)>>4 gives me the 4 MS bits (re-shifted in place).
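A quick sketch of both ideas with made-up values; hex makes the field boundaries visible, so the masks and shifts can be read straight off the literal:

#include <cstdint>
#include <iostream>

int main()
{
    std::uint32_t color = 0xFF2040C0;            // 0xAARRGGBB
    unsigned a = (color >> 24) & 0xFF;           // alpha = 0xFF
    unsigned r = (color >> 16) & 0xFF;           // red   = 0x20
    unsigned g = (color >> 8)  & 0xFF;           // green = 0x40
    unsigned b =  color        & 0xFF;           // blue  = 0xC0

    unsigned char byte = 0xA7;                   // 4 flag bits + a 4-bit number
    unsigned flags  =  byte & 0x0F;              // low 4 bits  -> 0x7
    unsigned number = (byte & 0xF0) >> 4;        // high 4 bits -> 0xA

    std::cout << a << ' ' << r << ' ' << g << ' ' << b
              << " flags=" << flags << " number=" << number << '\n';
}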
I've never seen octal used for anything besides IOCCC and UNIX permissions masks (although in the last case they are actually useful, as you probably know if you ever used chmod); probably their inclusion in the language comes from the fact that C was initially developed as the language to write UNIX.
As for the type of the literal, I checked the standard, and it goes like this:
decimal literals, without the u suffix, are always signed; their type is the smallest that can represent them between int, long int, long long int;
octal and hexadecimal literals without suffix, instead, may also be of unsigned type; their actual type is the smallest one that can represent the value between int, unsigned int, long int, unsigned long int, long long int, unsigned long long int.
(C++11, §2.14.2, ¶2 and Table 6)
The difference may be relevant for overload resolution, but it's not particularly important when you are just assigning a literal to a variable. Still, keep in mind that you may have valid integer constants that are larger than an int; assigning one of those to an int won't preserve its value. Anyhow, any decent compiler should be able to warn you in these cases.
Let's say that on our platform integers use 2's complement representation, int is 16 bits wide and long is 32 bits wide; let's say we have an overloaded function like this:
#include <iostream>

void a(unsigned int i)
{
    std::cout<<"unsigned";
}

void a(int i)
{
    std::cout<<"signed";
}
Then, calling a(1) and a(0x1) will produce the same result (signed), but a(32768) will print signed and a(0x10000) will print unsigned.
It matters from a readability standpoint - which one you choose expresses your intention.
If you're treating the variable as an integral type, you know, like 2+2=4, you use the decimal representation. It's intuitive and straightforward.
If you're using it as a bitmask, you can use hex, octal or even binary. For example, you'll know that
int a = 0xFF;
will have the low 8 bits set to 1. You'll know that
int a = 0xF0;
is (...)11110000, but you couldn't directly say the same thing about
int a = 240;
although they are equivalent. It just depends on what you use the numbers for.
The truth is that it doesn't matter whether you write it in decimal, octal or hexadecimal; it's just a representation. For your information, numbers in computers are stored in binary (so they are just 0's and 1's), which is itself yet another representation. So it's just a matter of representation and readability.
Note: in some C++ debuggers (in my experience), a number I assigned in decimal is shown as hexadecimal.
It's similar to assigning an integer in these ways:
int a = int(5);
int b(6);
int c = 3;
It's all about preference; when it comes down to it, you're doing the same thing. Some might choose octal or hex to match the kind of data their program manipulates.

Converting string to int (C++)

I looked everywhere and can't find an answer to this specific question :(
I have a string date, which contains the date with all the special characters stripped away (i.e. yyyymmddhhmm, e.g. 201212031204).
I'm trying to convert this string into an int to be able to sort the dates later. I tried atoi; it did not work because the value is too high for the function. I tried streams, but it always returns -858993460, and I suspect this is because the string is too large as well. I tried atol and atoll and they still don't give the right answer.
I'd rather not use Boost, since this is for homework; I don't think I'd be allowed.
Am I out of options to convert a large string to an int?
Thank you!
What I'd like to be able to do:
int dateToInt(string date)
{
    date = date.substr(6,4) + date.substr(3,2) + date.substr(0,2) + date.substr(11,2) + date.substr(14,2);
    int d;
    d = atoi(date.c_str());
    return d;
}
You get negative numbers because 201212031204 is too large to fit in an int. Consider using long long.
BTW, you may also sort the strings themselves; for this fixed-width yyyymmddhhmm format, lexicographic order is the same as chronological order.
You're on the right track that the value is too large, but it's not just too large for those functions; it's too large for an int in general. An int is commonly 32 bits, giving a maximum value of 2147483647 (4294967295 if unsigned). A long long is guaranteed to be large enough for the numbers you're using. If you happen to be on a 64-bit system, a long may be too.
Now, if you use one of these larger integer types, a stream will convert properly. Or, if you want to use a function to do it, have a look at atoll for a long long or atol for a long (although for better error checking, you should really consider strtoll or strtol).
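For example, a sketch of the strtoll route with basic error checking (how you signal failure is up to you; returning -1 here is just a placeholder):

#include <cerrno>
#include <cstdlib>
#include <string>

long long dateToInt(const std::string& date)
{
    errno = 0;
    char* end = nullptr;
    long long value = std::strtoll(date.c_str(), &end, 10);
    if (errno == ERANGE || end == date.c_str() || *end != '\0')
        return -1;   // overflow, no digits, or trailing junk
    return value;
}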
Completely alternatively, you could also use a time_t. They're integer types under the hood, so you can compare and sort them, and there are some nice functions for them in <ctime> (have a look at http://www.cplusplus.com/reference/ctime/).
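A sketch of that idea (it assumes the input really is "yyyymmddhhmm" with valid digits; the function name is made up). The resulting time_t values compare and sort like any other arithmetic type:

#include <ctime>
#include <string>

std::time_t dateToTime(const std::string& date)
{
    std::tm tm = {};
    tm.tm_year  = std::stoi(date.substr(0, 4)) - 1900;  // std::tm counts years from 1900
    tm.tm_mon   = std::stoi(date.substr(4, 2)) - 1;     // ...and months from 0
    tm.tm_mday  = std::stoi(date.substr(6, 2));
    tm.tm_hour  = std::stoi(date.substr(8, 2));
    tm.tm_min   = std::stoi(date.substr(10, 2));
    tm.tm_isdst = -1;                                    // let mktime figure out DST
    return std::mktime(&tm);
}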
typedef long long S64;

S64 dateToInt(char * s) {
    // Assumes s contains only decimal digits (no sign, separators or whitespace).
    S64 retval = 0;
    while (*s) {
        retval = retval * 10 + (*s - '0');
        ++s;
    }
    return retval;
}
Note that as has been stated, the numbers you're working with will not fit into 32 bits.

A warning - comparison between signed and unsigned integer expressions

I am currently working through Accelerated C++ and have come across an issue in exercise 2-3.
A quick overview of the program - the program basically takes a name, then displays a greeting within a frame of asterisks, i.e. Hello! framed by *'s.
The exercise - In the example program, the authors use const int to determine the padding (blank spaces) between the greeting and the asterisks. They then ask the reader, as part of the exercise, to ask the user for input as to how big they want the padding to be.
All this seems easy enough: I go ahead and ask the user for two integers (int), store them, and change the program to use these integers, removing the ones used by the author. When compiling, though, I get the following warning:
Exercise2-3.cpp:46: warning: comparison between signed and unsigned integer expressions
After some research it appears to be because the code attempts to compare one of the above integers (int) to a string::size_type, which is fine. But I was wondering - does this mean I should change one of the integers to unsigned int? Is it important to explicitly state whether my integers are signed or unsigned?
cout << "Please enter the size of the frame between top and bottom you would like ";
int padtopbottom;
cin >> padtopbottom;
cout << "Please enter size of the frame from each side you would like: ";
unsigned int padsides;
cin >> padsides;
string::size_type c = 0; // definition of c in the program
if (r == padtopbottom + 1 && c == padsides + 1) { // where the error occurs
Above are the relevant bits of code, the c is of type string::size_type because we do not know how long the greeting might be - but why do I get this problem now, when the author's code didn't get the problem when using const int? In addition - to anyone who may have completed Accelerated C++ - will this be explained later in the book?
I am on Linux Mint using g++ via Geany, if that helps or makes a difference (as I read that it could when determining what string::size_type is).
It is usually a good idea to declare variables as unsigned or size_t if they will be compared to sizes, to avoid this issue. Whenever possible, use the exact type you will be comparing against (for example, use std::string::size_type when comparing with a std::string's length).
Compilers give warnings about comparing signed and unsigned types because the ranges of signed and unsigned ints are different, and when they are compared to one another, the results can be surprising. If you have to make such a comparison, you should explicitly convert one of the values to a type compatible with the other, perhaps after checking to ensure that the conversion is valid. For example:
unsigned u = GetSomeUnsignedValue();
int i = GetSomeSignedValue();

if (i >= 0)
{
    // i is non-negative, so it is safe to cast it to unsigned
    if ((unsigned)i >= u)
        iIsGreaterThanOrEqualToU();
    else
        iIsLessThanU();
}
else
{
    iIsNegative();
}
I had the exact same problem yesterday working through problem 2-3 in Accelerated C++. The key is to change all variables you will be comparing (using Boolean operators) to compatible types. In this case, that means string::size_type (or unsigned int, but since this example is using the former, I will just stick with that even though the two are technically compatible).
Notice that in their original code they did exactly this for the c counter (page 30 in Section 2.5 of the book), as you rightly pointed out.
What makes this example more complicated is that the different padding variables (padsides and padtopbottom), as well as all counters, must also be changed to string::size_type.
Getting to your example, the code that you posted would end up looking like this:
cout << "Please enter the size of the frame between top and bottom";
string::size_type padtopbottom;
cin >> padtopbottom;
cout << "Please enter size of the frame from each side you would like: ";
string::size_type padsides;
cin >> padsides;
string::size_type c = 0; // definition of c in the program
if (r == padtopbottom + 1 && c == padsides + 1) { // where the error no longer occurs
Notice that in the previous conditional, you would get the error if you didn't initialize variable r as a string::size_type in the for loop. So you need to initialize the for loop using something like:
for (string::size_type r=0; r!=rows; ++r) //If r and rows are string::size_type, no error!
So, basically, once you introduce a string::size_type variable into the mix, any time you want to perform a boolean operation on that item, all operands must have a compatible type for it to compile without warnings.
The important difference between signed and unsigned ints is the interpretation of the last bit. The last bit in signed types represents the sign of the number, e.g.:
0001 is 1, both signed and unsigned
1001 is -1 signed and 9 unsigned
(I have avoided the whole two's-complement issue for clarity of explanation; this is not exactly how ints are represented in memory!)
You can imagine that it makes a difference to know whether you are comparing with -1 or with +9. In many cases, programmers are just too lazy to declare counting ints as unsigned (it bloats the for-loop head, for instance). It is usually not an issue, because with ints you have to count up to 2^31 before your sign bit bites you. That's why it is only a warning: because we are too lazy to write 'unsigned' instead of 'int'.
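A tiny demonstration of the surprise (this snippet itself triggers the warning, which is exactly the point):

#include <iostream>

int main()
{
    int i = -1;
    unsigned int u = 1;
    if (i > u)                  // -1 is converted to unsigned and becomes UINT_MAX
        std::cout << "surprisingly, -1 > 1u here\n";
}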
At the extreme ranges, an unsigned int can hold values larger than any int, so comparing the two can give surprising results; therefore the compiler generates a warning. If you are sure that this is not a problem, feel free to cast one side to the other's type so the warning disappears (use a C++-style cast such as static_cast so that it is easy to spot).
Alternatively, make the variables the same type to stop the compiler from complaining.
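For instance, a minimal, self-contained sketch of the cast approach (made-up values; the cast is only valid once you know the int is non-negative):

#include <iostream>
#include <string>

int main()
{
    std::string greeting = "Hello, world!";
    int padsides = 2;   // in the real program this would be read from the user

    std::string::size_type c = 0;
    // Without the cast this comparison would trigger the signed/unsigned warning;
    // the cast is safe here only because padsides has been checked to be >= 0.
    if (padsides >= 0 && c == static_cast<std::string::size_type>(padsides) + 1)
        std::cout << "padding column\n";
}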
I mean, is it possible to have a negative padding? If so then keep it as an int. Otherwise you should probably use unsigned int and let the stream catch the situations where the user types in a negative number.
The primary issue is that the underlying hardware, the CPU, only has instructions to compare two signed values or to compare two unsigned values. If you pass the unsigned comparison instruction a signed, negative value, it will treat it as a large positive number. So -1, the bit pattern with all bits set (two's complement), becomes the maximum unsigned value for the same number of bits.
8 bits: -1 signed is the same bit pattern as 255 unsigned
16 bits: -1 signed is the same bit pattern as 65535 unsigned
etc.
So, if you have the following code:
int fd;
fd = open( .... );
int cnt;
SomeType buf;
cnt = read( fd, &buf, sizeof(buf) );
if( cnt < sizeof(buf) ) {
    perror("read error");
}
you will find that if the read(2) call fails because the file descriptor has become invalid (or for some other error), cnt will be set to -1. When comparing it to sizeof(buf), an unsigned value, the if() condition will be false, because 0xffffffff is not less than sizeof() of some (reasonable, not concocted to be max size) data structure.
Thus, to remove the signed/unsigned warning, you have to write the above if as:
if( cnt < 0 || (size_t)cnt < sizeof(buf) ) {
    perror("read error");
}
This just speaks loudly to two problems:
1. The introduction of size_t and other such data types was crafted to mostly work, not engineered (with language changes) to be explicitly robust and foolproof.
2. Overall, C/C++ data types should simply be signed, as Java correctly implemented.
If you have values so large that you cannot find a signed type that works, you are using too small a processor or too large a magnitude of values in your language of choice. If, as with money, every digit counts, there are arbitrary-precision facilities in most languages; C/C++ just doesn't do this well, and you have to be very explicit about everything around types, as mentioned in many of the other answers here.
Or use this header library and write:
// |notEqual|less|lessEqual|greater|greaterEqual
if(sweet::equal(valueA,valueB))
and not care about signed/unsigned or different sizes.