C++ - Operating on the bits in a floating point value

I'm trying to write a function that reinterprets a set of bytes as a float. A Stack Overflow question led me to try reinterpret_cast<>() on an array of characters, so I started experimenting with splitting a float into chars and then reassembling it. That only gives me seemingly random numbers rather than the value I expect, and the output is different each time. A few examples:
1.58661e-038
3.63242e-038
2.53418e-038
The float variable should contain the value 5.2.
Compiling the code:
float f = 5.2;
unsigned char* bytes = reinterpret_cast<unsigned char*>(&f);
float f1 = reinterpret_cast<float&>(bytes);
std::cout << f1 << std::endl;
gave me
1.75384e-038
And then of course, a different number each time the program is run. Interestingly, however, if I put the code in a loop and execute it several times in a single run, the output stays consistent. This leads me to think it might be a memory location, but if so, I'm not sure how to access the real value of the variable - the dereference operator doesn't work, because it's not a pointer.
So my question is: how can I split a variable of type float (and later on, a double) into individual bytes (allowing me to modify the bits), and then reassemble it?
Any help would be greatly appreciated.

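bytes is a pointer. reinterpret_cast<float&>(bytes) reinterprets the pointer variable itself as a float, so the printed number comes from the address stored in bytes, not from f.
Change
float f1 = reinterpret_cast<float&>(bytes);
to
float f1 = *reinterpret_cast<float*>(bytes);
// reinterpret_cast<float*> casts to a different pointer type...
// ...and the leading * dereferences that pointer.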
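For the broader goal in the question (split a float into bytes, modify bits, reassemble), a minimal sketch using std::memcpy is shown here. It is not part of the original answer. The names FloatBytes, to_bytes, from_bytes and flip_sign are made up for illustration, and flip_sign assumes an IEEE 754 32-bit float whose byte order matches that of std::uint32_t.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Assumption of this sketch: float is 32 bits wide.
static_assert(sizeof(float) == sizeof(std::uint32_t), "sketch assumes a 32-bit float");

// Hypothetical alias: the bytes that make up one float.
using FloatBytes = std::array<unsigned char, sizeof(float)>;

// Copy the object representation of a float into a byte array.
FloatBytes to_bytes(float value) {
    FloatBytes bytes{};
    std::memcpy(bytes.data(), &value, sizeof value);
    return bytes;
}

// Rebuild a float from bytes previously produced by to_bytes.
float from_bytes(const FloatBytes& bytes) {
    float value;
    std::memcpy(&value, bytes.data(), sizeof value);
    return value;
}

// Example bit edit: toggle the sign bit (bit 31 in IEEE 754 binary32).
float flip_sign(float value) {
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    bits ^= 0x80000000u;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}
```

Calling from_bytes(to_bytes(f)) should give back f unchanged. Copying with memcpy sidesteps the question of which pointer casts are allowed, and compilers typically optimise the copy away.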

Related

C++/Address Space: 2 Bytes per adress?

I was just trying something and I was wondering how this could be. I have the following code:
int var1 = 132;
int var2 = 200;
int *secondvariable = &var2;
cout << *(secondvariable+2) << endl << sizeof(int) << endl;
I get the Output
132
4
So how is it possible that the second int is only 2 addresses higher? I mean shouldn't it be 4 addresses? I'm currently under WIN10 x64.
Regards

With cout << *(secondvariable+2) you don't print a pointer, you print the value at secondvariable[2], which is invalid indexing and leads to undefined behavior.
If you want to print a pointer then drop the dereference and print secondvariable+2.

While you already are far in the field of undefined behaviour (see Some programmer dude's answer) due to indexing an array out of bounds (a single variable is considered an array of length 1 for such matters), some technical background:
Alignment! Compilers are allowed to place variables at addresses such that they can be accessed most efficiently. As you seem to have gotten valid output by adding 2*sizeof(int) to the second variable's address, you apparently have reached the first one by accident. Apparently, the compiler decided to leave a gap between the two variables so that both can be aligned to addresses divisible by 8.
Be aware, though, that you don't have any guarantee for such alignment, different compilers might decide differently (or same compiler on another system), and alignment even might be changed via compiler flags.
On the other hand, arrays are guaranteed to occupy contiguous memory, so you would have gotten the expected result in the following example:
int array[2];
int* a0 = &array[0];
int* a1 = &array[1];
// Converting a pointer to an integer requires reinterpret_cast; static_cast does not compile here.
uintptr_t diff = reinterpret_cast<uintptr_t>(a1) - reinterpret_cast<uintptr_t>(a0);
std::cout << diff;
The cast to uintptr_t (or alternatively to char*) ensures that you get the address difference in bytes rather than in units of sizeof(int).

This is not how C++ works.
You can't "navigate" your scope like this.
Such pointer antics have completely undefined behaviour and shall not be relied upon.
You are not punching holes in tape now, you are writing a description of a program's semantics, that gets converted by your compiler into something executable by a machine.
Code to these abstractions and everything will be fine.

Casting/dereferencing char pointers to a double array

Is there anything wrong with casting a double pointer to a char pointer? The goal of the following code is to change element 1 in three different ways.
double vec1[100];
double *vp = vec1;
char *yp = (char*) vp;
vp++;
vec1[1] = 19.0;
*vp = 12.0;
*((double*) (yp + (1*sizeof (vec1[0])))) = 34.0;

Casts of this type fall into the category of "OK if you know what you're doing but dangerous if you don't".
For example, in this case you already know the pointer value of "yp" (it was pointing to a double) so it is technically safe to increase its value by the size of a double and re-cast back to a double*.
A counter-example: suppose you didn't know where the char* came from...say, it was given to you as a function parameter. Now, your cast would be a big problem: since char* is technically 1-byte-aligned and a double is usually 8-byte-aligned, you can't be sure if you were given an 8-byte-aligned address. If it's aligned, your arithmetic would produce a valid double*; if not, it would crash when dereferenced.
This is just one example of how casts can go wrong. What you're doing (at first glance) looks like it will work but in general you really have to pay attention when you cast things.
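When the origin of the char* is unknown, one way to avoid the alignment question is to copy the bytes instead of casting the pointer. The following is a rough sketch under that assumption. The helper names is_aligned_for_double, store_double and load_double are invented here and do not come from the question.

```cpp
#include <cstdint>
#include <cstring>

// Report whether an address satisfies the alignment requirement of double.
bool is_aligned_for_double(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % alignof(double) == 0;
}

// Write a double at any byte address, aligned or not.
void store_double(char* dst, double value) {
    std::memcpy(dst, &value, sizeof value);
}

// Read a double back from any byte address, aligned or not.
double load_double(const char* src) {
    double value;
    std::memcpy(&value, src, sizeof value);
    return value;
}
```

With these, store_double(yp + 4, 34.0) is well defined even though yp + 4 may not be suitably aligned for a double, as long as the destination has at least sizeof(double) bytes available.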

With newer Intel processors the main problem you can run into is alignment. Say you were to write something like this:
*((double*) (yp + 4)) = 34.0;
Then you are likely to have a runtime error because a double should be aligned on 8 bytes. This was also true on processors such as 68k, or MIPS.
This is similar to having a structure and doing casts on that structure. You are not unlikely to break things.
In most cases, if you can avoid such, your code will be a lot stronger. Personally, I do not even use such casts when reading a file. Instead, I get the data from the file and put it in a structure as required. Say I read 4 bytes in a buffer to convert to an integer, I'd write something like this:
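unsigned char buf[4];
...
fread(buf, 1, 4, f);
// Cast the top byte before shifting: buf[3] promotes to int, and shifting a value >= 0x80 left by 24 overflows a 32-bit int.
my_struct.integer = buf[0] | (buf[1] << 8) | (buf[2] << 16) | ((unsigned)buf[3] << 24);
Now I did not do an ugly cast, and I control the endianness of the integer in the file regardless of the endianness of the processor the code runs on.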
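The same byte-by-byte assembly can be wrapped in a small function. This is only a sketch: read_le32 is a hypothetical name, and it assumes the four bytes are stored least significant byte first (little-endian file layout).

```cpp
#include <cstdint>

// Assemble a 32-bit value from four bytes stored least significant byte first.
// Each byte is widened to an unsigned 32-bit type before shifting, so no
// shift can overflow a signed int.
std::uint32_t read_le32(const unsigned char* buf) {
    return static_cast<std::uint32_t>(buf[0])
         | (static_cast<std::uint32_t>(buf[1]) << 8)
         | (static_cast<std::uint32_t>(buf[2]) << 16)
         | (static_cast<std::uint32_t>(buf[3]) << 24);
}
```

Because the result depends only on the byte values and their order in the buffer, it is the same on little-endian and big-endian hosts.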

C++ union to represent data memory vs C scalar variable type

Today I've a weird question.
The Code(C++)
#include <iostream>
union name
{
int num;
float num2;
}oblong;
int main(void)
{
oblong.num2 = 27.881;
std::cout << oblong.num << std::endl;
return 0;
}
The Code(C)
#include <stdio.h>
int main(void)
{
float num = 27.881;
printf("%d\n" , num);
return 0;
}
The Question
As we know, a C++ union can declare members of more than one type but holds only one of them at a time. So the object oblong reserves a single 32-bit portion of memory (the largest members of the union, int and float, are both 32-bit), and this portion can hold either an integer or a float.
So I assign the value 27.881 to oblong.num2 (as you can see in the code above). Out of curiosity, I then read the memory through oblong.num, which refers to the same memory location.
As expected, it gave me a value that is not 27, because floats and integers are represented differently in memory: reading through oblong.num treats that portion of memory as an integer and interprets it using the integer representation.
I know this phenomenon also happens in C, which is why I initialize a float variable with a value and later read it using %d. I tried it with the same value, 27.881, as shown above. But when I run it, something weird happens: the value I get in C is different from the one I get in C++.
Why does this happen? As far as I know, the two values I get from the two programs are not garbage values, so why are they different? I also used sizeof to verify that int and float are both 32-bit in C and in C++. So memory size isn't the cause; what prompts this difference in values?

First of all, having the wrong printf() format string is undefined behavior. Now that said, here is what is actually happening in your case:
In vararg functions such as printf(), integers smaller than int are promoted to int and floats smaller than double are promoted to double.
The result is that your 27.881 is being converted to an 8-byte double as it is passed into printf(). Therefore, the binary representation is no longer the same as a float.
Format string %d expects a 4-byte integer. So in effect, you will be printing the lower 4-bytes of the double-precision representation of 27.881. (assuming little-endian)
*Actually (assuming strict-FP), you are seeing the bottom 4-bytes of 27.881 after it is cast to float, and then promoted to double.
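To see the two bit patterns side by side without relying on undefined behaviour, something like the following sketch can help. It assumes IEEE 754 float and double (32 and 64 bits); float_bits and double_low_word are made-up helper names.

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == 4 && sizeof(double) == 8,
              "sketch assumes 32-bit float and 64-bit double");

// The 32 bits that a union read of the float would expose.
std::uint32_t float_bits(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}

// The numerically low 32 bits of the double's representation. On a
// little-endian machine these are the four bytes at the lowest addresses.
std::uint32_t double_low_word(double d) {
    std::uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);
    return static_cast<std::uint32_t>(bits & 0xFFFFFFFFu);
}
```

Comparing float_bits(27.881f) with double_low_word(27.881f) shows why the union read and the printf call disagree: the float argument is promoted to double in the second call, and the two helpers return different 32-bit patterns for it.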

In both cases you are encountering undefined behaviour. Your implementation just happens to do something strange.

Integer to Character conversion in C

Let us consider this snippet:
int s;
scanf("%c",&s);
Here I have used int, not char, for the variable s. To use s safely as a character I have to convert it to char again, because when scanf reads a character it overwrites only one byte of the variable it is assigned to, not all four bytes of the int.
For the conversion I could use s = (char)s; as the next line, but is it possible to do the same by subtracting something from s?

What you've done is technically undefined behaviour. The %c format calls for a char*, you've passed it an int* which will (roughly speaking) be reinterpreted. Even assuming that the pointer value is still good after reinterpreting, storing an arbitrary character to the first byte of an int and then reading it back as int is undefined behaviour. Even if it were defined, reading an int when 3 bytes of it are uninitialized, is undefined behaviour.
In practice it probably does something sensible on your machine, and you just get garbage in the top 3 bytes (assuming little-endian).
Writing s = (char)s converts the value from int to char and then back to int again. This is implementation-defined behaviour: converting an out-of-range value to a signed type. On different implementations it might clean up the top 3 bytes, it might return some other result, or it might raise a signal.
The proper way to use scanf is:
char c;
scanf("%c", &c);
And then either int s = c; or int s = (unsigned char)c;, according to whether you want negative-valued characters to result in a negative integer, or a positive integer (up to 255, assuming 8-bit char).
I can't think of any good reason for using scanf improperly. There are good reasons for not using scanf at all, though:
int s = getchar();
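The two conversions mentioned above can be written as small helpers. This is an illustrative sketch in C++ (the question itself is C); as_signed_value and as_unsigned_value are invented names.

```cpp
// Keep the sign of char: a negative-valued character gives a negative int
// on platforms where plain char is signed.
int as_signed_value(char c) {
    return c;
}

// Go through unsigned char first: the result is always in 0..UCHAR_MAX
// (0..255 with 8-bit char).
int as_unsigned_value(char c) {
    return static_cast<unsigned char>(c);
}
```

The second form matches what getchar() returns for a successfully read character, which is the character converted to unsigned char and then to int.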

Are you trying to convert a digit to its decimal value? If so, then
char c = '8';
int n = c - '0';
n should be 8 at this point.

That's probably not a good idea; GCC gives me a warning for that code:
main.c:10: warning: format ‘%c’ expects type ‘char *’, but
argument 2 has type ‘int *’
In this case you're ok since you're passing a pointer to more space than you need (for most systems), but what if you did it the other way around? Could be crash city. If you really want to do something like what you have there, just do the typecast or mask it - the mask will be endian-dependent.

As written, this won't work reliably. The argument to scanf, &s, is a pointer to int, and scanf is expecting a pointer to char. The two data types (int and char) have different sizes (at least on most architectures), so the data may get put in the wrong spot in memory, and the other part of s may not get properly cleared.
The answers suggesting manipulation of the result after using a pointer to int rely on unspecified behavior (i.e. that scanf will put the character value it has in the least significant byte of the int you're pointing to), and are not safe.

No, but you could use the following:
s = s & 0xFF;
That will blank out all of the data except the lowest byte. But in general all these ideas (and the ones above) are bad ideas, since not all systems store the lowest part of the integer in memory first. So if you ever have to port this code to a big-endian system, you'll be screwed.
True, you may never have to port the code, but why write unportable code to begin with?
See this for more info:
http://en.wikipedia.org/wiki/Endianness

Explicit Address Manipulation in C++

Please check out the following func and its output
void main()
{
Distance d1;
d1.setFeet(256);
d1.setInches(2.2);
char *p=(char *)&d1;
*p=1;
cout<< d1.getFeet()<< " "<< d1.getInches()<< endl;
}
The class Distance gets its values through setFeet and setInches, passing int and float arguments respectively. It displays the values through the getFeet and getInches methods.
However, the output of this function is 257 2.2. Why am I getting these values?

This is a really bad idea:
char *p=(char *)&d1;
*p=1;
Your code should never make assumptions about the internal structure of the class. If your class had any virtual functions, for example, that code would cause a crash when you called them.
I can only conclude that your Distance class looks like this:
class Distance {
short feet;
float inches;
public:
void setFeet(...
};
When you setFeet(256), it sets the high byte (MSB) to 1 (256 = 1 * 2^8) and the low byte (LSB) to 0. When you assign the value 1 to the char at the address of the Distance object, you're forcing the first byte of the short representing feet to 1. On a little-endian machine, the low byte is at the lower address, so you end up with a short with both bytes set to 1, which is 1 * 2^8 + 1 = 257.
On a big-endian machine, you would still have the value 256, but it would be purely coincidental because you happen to be forcing a value of 1 on a byte that would already be 1.
However, because you're using undefined behavior, depending on the compiler and the compile options, you might end up with literally anything. A famous expression from comp.lang.c is that such undefined behavior could "cause demons to fly out of your nose".
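The 257 versus 256 outcome can be reproduced in a controlled way on a plain 16-bit integer, without poking at a class object. This is a sketch only: overwrite_first_byte and is_little_endian are hypothetical helpers, and a standalone std::uint16_t stands in for the guessed feet member.

```cpp
#include <cstdint>
#include <cstring>

// Replace the byte at the lowest address of a 16-bit value.
std::uint16_t overwrite_first_byte(std::uint16_t value, unsigned char new_byte) {
    unsigned char bytes[sizeof value];
    std::memcpy(bytes, &value, sizeof value);
    bytes[0] = new_byte;               // same effect as *p = 1 in the question
    std::memcpy(&value, bytes, sizeof value);
    return value;
}

// Detect the byte order of the host at run time.
bool is_little_endian() {
    const std::uint16_t probe = 1;
    unsigned char first;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}
```

On a little-endian host overwrite_first_byte(256, 1) yields 257; on a big-endian host it yields 256, matching the explanation above.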

You are illegally munging memory via the 'p' pointer.
The output of the program is undefined, as you are directly manipulating memory that is owned by an object through a pointer of another type without regard to the underlying types.
Your code is somewhat like this:
struct Dist
{
int x;
float y;
};
union Plop
{
Dist s; // Your class
char p; // The type you are pretending to use via 'p'
};
int main()
{
Plop p;
p.s.x = 5; // Set up the Dist structure.
p.s.y = 2.3;
p.p = 1; // The value of s is now undefined.
// As you have scribbled over the memory used by s.
}

The behaviour based on the code given is going to be very unpredictable. Setting the first byte of d1's data could potentially clobber a vptr, compiler-specific memory, the sign/exponent of a floating point value, or LSB or MSB of an integer, all depending on the definition of Distance.

I assume you think doing *p = 1 will set one of the internal data members (presumably 'feet') in the Distance object. It may work, but (afaik) you've got no guarantees that the feet member is at the first address of the object, is of the correct size (unless its type is also char) or that it's aligned correctly.
If you want to do that why not make the 'feet' member public and do:
d1.feet = 1;

Another thing, to comment on the program: don't use void main(). It isn't standard, and it offers you no benefits. It will make people not take you as seriously when asking C or C++ questions, and could cause programs to not compile, or not work properly.
The C++ Standard, in 3.6.1 paragraph 2, says that main() always returns int, although the implementation may offer variations with different arguments.
This would be a good time to break the habit. If you're learning from a book that uses void main(), the book is unreliable. See about getting another book, if only for reference.

It looks like you are new to programming and could use some help with basic concepts.
It's good that you are looking for that, but SO may not be the right place to get it.
Good luck.

The definition of the class is
class Distance{
int feet;
float inches;
public:
//...functions
};
Now the low two bytes of the int feet would be 00000001 00000000, where the zero byte occupies the lower address on a little-endian machine, so char *p points to 00000000. When you set *p=1, the lower byte becomes 00000001, so the int variable is now 00000001 00000001, which is exactly 257!
