Explicit Address Manipulation in C++

Please check out the following function and its output
void main()
{
Distance d1;
d1.setFeet(256);
d1.setInches(2.2);
char *p=(char *)&d1;
*p=1;
cout<< d1.getFeet()<< " "<< d1.getInches()<< endl;
}
The class Distance gets its values through setFeet and setInches, which take int and float arguments respectively. It displays the values through the getFeet and getInches methods.
However, the output of this function is 257 2.2. Why am I getting these values?

This is a really bad idea:
char *p=(char *)&d1;
*p=1;
Your code should never make assumptions about the internal structure of the class. If your class had any virtual functions, for example, that code would cause a crash when you called them.
I can only conclude that your Distance class looks like this:
class Distance {
short feet;
float inches;
public:
void setFeet(...
};
When you setFeet(256), it sets the high byte (MSB) to 1 (256 = 1 * 2^8) and the low byte (LSB) to 0. When you assign the value 1 to the char at the address of the Distance object, you're forcing the first byte of the short representing feet to 1. On a little-endian machine, the low byte is at the lower address, so you end up with a short with both bytes set to 1, which is 1 * 2^8 + 1 = 257.
On a big-endian machine, you would still have the value 256, but it would be purely coincidental because you happen to be forcing a value of 1 on a byte that would already be 1.
However, because you're using undefined behavior, depending on the compiler and the compile options, you might end up with literally anything. A famous expression from comp.lang.c is that such undefined behavior could "cause demons to fly out of your nose".
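If you want to poke at the bytes without invoking undefined behaviour, you can inspect the object representation through unsigned char, which the standard does allow. A minimal sketch of the effect described above, assuming feet really is a 2-byte short (my guess, not your actual class):
#include <cstdio>

int main()
{
    short feet = 256;                                  // 0x0100
    unsigned char *bytes = reinterpret_cast<unsigned char *>(&feet);
    std::printf("%02x %02x\n", bytes[0], bytes[1]);    // 00 01 on little-endian
    bytes[0] = 1;                                      // what *p = 1 effectively did
    std::printf("feet = %d\n", feet);                  // 257 on little-endian
    return 0;
}
On a big-endian machine the same program would print 01 00 and feet would stay 256, because you would be rewriting the byte that was already 1.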

You are illegally munging memory via the 'p' pointer.
The output of the program is undefined: you are directly manipulating memory that is owned by an object through a pointer of another type, without regard to the underlying types.
Your code is somewhat like this:
struct Dist
{
int x;
float y;
};
union Plop
{
Dist s; // Your class
char p; // The type you are pretending to use via 'p'
};
int main()
{
Plop p;
p.s.x = 5; // Set up the Dist structure.
p.s.y = 2.3;
p.p = 1; // The value of s is now undefined.
// As you have scribbled over the memory used by s.
}

The behaviour based on the code given is going to be very unpredictable. Setting the first byte of d1's data could potentially clobber a vptr, compiler-specific memory, the sign/exponent of a floating point value, or LSB or MSB of an integer, all depending on the definition of Distance.
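You can see that a polymorphic object holds more than its declared members by comparing sizes; a small sketch (the exact numbers are implementation-specific):
#include <iostream>

struct Plain   { short feet; float inches; };
struct Virtual { short feet; float inches; virtual ~Virtual() {} };

int main()
{
    // On a typical implementation Virtual is larger than Plain because a
    // vptr is stored in the object (usually at the front). Writing to byte 0
    // of a Virtual would clobber that pointer, not the feet member.
    std::cout << sizeof(Plain) << ' ' << sizeof(Virtual) << '\n';
    return 0;
}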

I assume you think doing *p = 1 will set one of the internal data members (presumably 'feet') in the Distance object. It may work, but (afaik) you've got no guarantees that the feet member is at the first address of the object, is of the correct size (unless its type is also char), or that it's aligned correctly.
If you want to do that, why not make the 'feet' member public and do:
d1.feet = 1;

Another thing, to comment on the program: don't use void main(). It isn't standard, and it offers you no benefits. It will make people take you less seriously when you ask C or C++ questions, and it could cause programs not to compile or not to work properly.
The C++ Standard, in 3.6.1 paragraph 2, says that main() always returns int, although the implementation may offer variations with different arguments.
This would be a good time to break the habit. If you're learning from a book that uses void main(), the book is unreliable. See about getting another book, if only for reference.
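For reference, the conforming skeleton is simply:
int main()
{
    // ... your code here ...
    return 0;   // optional: falling off the end of main also returns 0
}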

It looks like you are new to programming and could use some help with basic concepts.
It's good that you are looking for that, but SO may not be the right place to get it.
Good luck.

The definition of the class is:
class Distance{
int feet;
float inches;
public:
//...functions
};
Now the int feet would be 00000001 00000000 (its two low bytes), where the zeros occupy the lower address on a little-endian machine, so char *p points at 00000000. When you make *p = 1, that low byte becomes 00000001, so the variable is now 00000001 00000001, which is exactly 257!

Related

ULP comparison code

The following code snippet is scattered all over the web and seems to be used in multiple different projects with very little changes:
union Float_t {
Float_t(float num = 0.0f) : f(num) {}
// Portable extraction of components.
bool Negative() const { return (i >> 31) != 0; }
int RawMantissa() const { return i & ((1 << 23) - 1); }
int RawExponent() const { return (i >> 23) & 0xFF; }
int i;
float f;
};
inline bool AlmostEqualUlpsAndAbs(float A, float B, float maxDiff, int maxUlpsDiff)
{
// Check if the numbers are really close -- needed
// when comparing numbers near zero.
float absDiff = std::fabs(A - B);
if (absDiff <= maxDiff)
return true;
Float_t uA(A);
Float_t uB(B);
// Different signs means they do not match.
if (uA.Negative() != uB.Negative())
return false;
// Find the difference in ULPs.
return (std::abs(uA.i - uB.i) <= maxUlpsDiff);
}
See, for example, here, here, or here.
However, I don't understand what is going on here. To my (maybe naive) understanding, the floating-point member variable f is initialized in the constructor, but the integer member i is not.
I'm not terribly familiar with the binary operators that are used here, but I fail to understand how accesses of uA.i and uB.i produce anything but random numbers, given that no line in the code actually connects the values of f and i in any meaningful way.
If somebody could enlighten me on why (and how) exactly this code produces the desired result, I would be very delighted!
A lot of undefined behaviour is being exploited here. The first assumption is that fields of a union can be accessed in place of each other, which is, in itself, UB. Furthermore, the coder assumes that sizeof(int) == sizeof(float), that floats have a given length of mantissa and exponent, that all union members are aligned to offset zero, and that the binary representation of float coincides with the binary representation of int in a very specific way. In short, this will work as long as you're on x86, have specific int and float types, and you say a prayer at every sunrise and sunset.
What you probably didn't notice is that this is a union, so int i and float f are typically laid out on top of one another in the same memory by most compilers. This is, in general, still UB, and you can't even safely assume that the same physical bits of memory will be used without restricting yourself to a specific compiler and a specific architecture. All that's guaranteed is that the address of both members will be the same (though there might be alignment and/or typedness issues). Assuming your compiler uses the same physical bits (which is by no means guaranteed by the standard) and both members start at offset 0 with the same size, then i will represent the binary storage format of f... as long as nothing changes in your architecture. Word of advice? Do not use this unless you have to. Stick to floating-point operations for AlmostEquals(); you can implement it that way. Considering these representation details is the very final pass of optimization, usually done in a separate branch; you shouldn't plan your code around it.
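If you do need the bit pattern, you can obtain it without the union read by copying the bytes with memcpy, which is well-defined. The size and representation assumptions remain, so treat this as a sketch of the same comparison, not a portable guarantee:
#include <cmath>
#include <cstdint>
#include <cstdlib>
#include <cstring>

inline std::int32_t FloatBits(float f)
{
    static_assert(sizeof(float) == sizeof(std::int32_t),
                  "this sketch assumes 32-bit floats");
    std::int32_t i;
    std::memcpy(&i, &f, sizeof i);   // well-defined, unlike the union read
    return i;
}

inline bool AlmostEqualUlpsAndAbs(float A, float B, float maxDiff, int maxUlpsDiff)
{
    if (std::fabs(A - B) <= maxDiff)
        return true;                              // near-zero case
    std::int32_t iA = FloatBits(A), iB = FloatBits(B);
    if ((iA < 0) != (iB < 0))
        return false;                             // different signs never match
    return std::abs(iA - iB) <= maxUlpsDiff;      // same sign: safe to subtract
}
In C++20 you could replace FloatBits with std::bit_cast<std::int32_t>(f), which says the same thing directly.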

Faster execution in c++ structure creation

I am working in C++. I have a structure with almost 100 float variables in it, and I am going to initialize them to 0 in a no-argument constructor. Which way is faster?
Type 1:
typedef struct info
{
//no argument constructor
info();
float x1;
float x2;
.
.
.
float x100;
}Info;
info::info()
{
float x1 = 0;
float x2 =0;
.
.
.
.
.
float x100 = 0;
}
//creation
Info* info1 = new Info();
Type 2:
typedef struct info
{
float x1;
float x2;
.
.
.
.
float x100;
}Info;
Info* infoIns = new Info;
memset(infoIns,0,sizeof(Info));
One hundred variables called x1 .. x100 just CALL out to be an array (or, if the number varies, perhaps a vector).
In which case std::fill(x, x+100, 0.0f) would probably beat all of the choices above.
A better solution is probably to just initialize the whole object:
Info* infoIns = new Info();
or
Info infoIns = {}; // C++11
or
Info infoIns = Info();
Whenever it's a question of performance, the ONLY answer that applies is "what you can measure". I can sit here and explain exactly why in my experience, on my machine (or my machines) method A is faster than method B or method C. But if you are using a different compiler, or have a different processor, that may not apply at all, because the compiler you use is doing something different.
Regardless of which is "faster", you should use a constructor to set the values to zero. If you want to use memset, then by all means do so, but inside the constructor. That way, you won't find some place in the code where you FORGOT to set one of your structures to zero before trying to use it. Bear in mind that setting a struct/class to zero using memset is very dangerous. If the class or struct has virtual member functions (or contains some object that does), this will most likely overwrite the VPTR, which describes the virtual functions. That is a bad thing. So if you want to use memset, use it with x1 and a size of 100 * sizeof(float) (but using an array is probably a better choice again).
In the absence of measurement, clearing will likely be faster but much, much nastier. And if the code you've written is what you actually implemented, the clearing will actually work a lot better: your constructor declares new local variables named x1 .. x100 and initializes those, leaving the members untouched and uninitialized. The 2nd way you have is not particularly good style in any case; you should use member initialisation.
I'd be more inclined to use a std::array (C++11) or a std::vector or even a plain array in any case.
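For instance, a sketch of what that might look like (the names Info and x here are placeholders):
#include <array>

struct Info
{
    std::array<float, 100> x{};   // value-initialized: every element is 0.0f
};

int main()
{
    Info info1;          // all of info1.x is already zero
    info1.x[0] = 1.5f;   // index instead of x1, x2, ...
    return 0;
}
No memset, no hundred assignments, and the zeroing cannot be forgotten at a call site.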
The 2nd version, apart from the already-mentioned flaws, doesn't actually guarantee that the float values will be 0.0 after memset.
3.9.1 Fundamental types [basic.fundamental], paragraph 8:
[...] The value representation of floating-point types is implementation-defined. [...]
Thus you can end up with non-zero values in your floats in the case of a weird float representation, since all-bits-zero is not required to represent 0.0.

Data member offset in C++

"Inside the C++ Object Model" says that the offset of a data member in a class is always 1 more than the actual offset in order to distinguish between the pointer to 0 and the pointer to the first data member, here is the example:
class Point3d {
public:
virtual ~Point3d();
public:
static Point3d origin;
float x, y, z;
};
//to be used after, ignore it for the first question
int main(void) {
/*cout << "&Point3d::x = " << &Point3d::x << endl;
cout << "&Point3d::y = " << &Point3d::y << endl;
cout << "&Point3d::z = " << &Point3d::z << endl;*/
printf("&Point3d::x = %p\n", &Point3d::x);
printf("&Point3d::y = %p\n", &Point3d::y);
printf("&Point3d::z = %p\n", &Point3d::z);
getchar();
}
So in order to distinguish the two pointers below, the offset of a data member is always 1 more.
float Point3d::*p1 = 0;
float Point3d::*p2 = &Point3d::x;
The main function above attempts to get the offsets of the members to verify this argument; it is supposed to output 5, 9, 13 (considering the 4-byte vptr at the beginning). In MS Visual Studio 2012, however, the output is:
&Point3d::x = 00000004
&Point3d::y = 00000008
&Point3d::z = 0000000C
Question: So did the MS C++ compiler do some optimization or something to prevent this mechanism?
tl;dr
Inside the C++ Object Model is a very old book, and most of its contents are implementation details of a particular compiler anyway. Don't worry about comparing your compiler to some ancient compiler.
Full version
An answer to the question linked to in a comment on this question addresses this quite well.
The offset of something is how many units it is from the start. The first thing is at the start so its offset is zero.
[...]
Note that the ISO standard doesn't specify where the items are laid out in memory. Padding bytes to create correct alignment are certainly possible. In a hypothetical environment where ints were only two bytes but their required alignment was 256 bytes, they wouldn't be at 0, 2 and 4 but rather at 0, 256 and 512.
And, if that book you're taking the excerpt from is really Inside the C++ Object Model, it's getting a little long in the tooth.
The fact that it's from '96 and discusses the internals underneath C++ (waxing lyrical about how good it is to know where the vptr is, missing the whole point that that's working at the wrong abstraction level and you should never care) dates it quite a bit.
[...]
The author apparently led the cfront 2.1 and 3 teams and, while this book seems of historical interest, I don't think it's relevant to the modern C++ language (and implementation), at least those bits I've read.
The language doesn't specify how member-pointers are represented, so anything you read in a book will just be an example of how they might be represented.
In this case, as you say, it sounds like the vptr occupies the first four bytes of the object; again, this is not something specified by the language. If that is the case, no accessible members would have an offset of zero, so there's no need to adjust the offsets to avoid zero; a member-pointer could simply be represented by the member's offset, with zero reserved for "null". It sounds like that is what your compiler does.
You may find that the offsets for non-polymorphic types are adjusted as you describe; or you may find that the representation of "null" is not zero. Either would be valid.
class Point3d {
public:
virtual ~Point3d();
public:
static Point3d origin;
float x, y, z;
};
Since your class contains a virtual destructor, and most compilers typically put a pointer to the virtual function table as the first element in the object, it makes sense that the first of your data is at offset 4 (I'm guessing your compiler is a 32-bit compiler).
Note however that the C++ standard does not stipulate how data members should be stored inside the class, and even less how much space, if any, the virtual function table should take up.
[And yes, it's invalid (undefined behaviour) to take the address of an element that is not a "real" member object, but I don't think this is causing an issue in this particular example; it may with a different compiler, on a different processor architecture, etc.]
Unless you specify a different alignment, your expectation of the offsets being 5, ... would be wrong anyway. Normally the addresses of elements bigger than char are aligned on even addresses, and I would guess even to the next 4-byte boundary. The reason is efficiency of memory access in the CPU.
On some architectures, accessing an odd address could cause an exception (e.g. on the Motorola 68000), depending on the member, or at least a performance slowdown.
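If you're curious what layout your compiler actually chose, alignof and offsetof will tell you (offsetof is only guaranteed for standard-layout types, so not for the polymorphic Point3d). A small sketch with a made-up struct:
#include <cstddef>
#include <iostream>

struct Plain { char c; float x, y, z; };   // standard-layout, no vptr

int main()
{
    // The floats don't start at offset 1: the compiler inserts padding
    // after 'c' so each float lands on a suitably aligned boundary.
    std::cout << "alignof(float) = " << alignof(float)
              << ", x at " << offsetof(Plain, x)
              << ", y at " << offsetof(Plain, y)
              << ", z at " << offsetof(Plain, z) << '\n';
    return 0;
}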
While it's true that a null pointer of type "pointer to member of a given type" must be different from any non-null value of that type, offsetting non-null pointers by one is not the only way a compiler can ensure this. For example, my compiler uses a non-zero representation for null pointers-to-member.
namespace {
struct a {
int x, y;
};
}
#include <iostream>
int main() {
int a::*p = &a::x, a::*q = &a::y, a::*r = nullptr;
std::cout << "sizeof(int a::*) = " << sizeof(int a::*)
<< ", sizeof(unsigned long) = " << sizeof(long);
std::cout << "\n&a::x = " << *reinterpret_cast<long*>(&p)
<< "\n&a::y = " << *reinterpret_cast<long*>(&q)
<< "\nnullptr = " << *reinterpret_cast<long*>(&r)
<< '\n';
}
Produces the following output:
sizeof(int a::*) = 8, sizeof(unsigned long) = 8
&a::x = 0
&a::y = 4
nullptr = -1
Your compiler is probably doing something similar, if not identical. This scheme is probably more efficient for most 'normal' use cases for the implementation because it won't have to do an extra "subtract 1" every time you use a non-null pointer-to-member.
That book (available at this link) should make it much clearer that it is just describing a particular implementation of a C++ compiler. Details like the one you mention are not part of the C++ language specification -- it's just how Stanley B. Lippman and his colleagues decided to implement a particular feature. Other compilers are free to do things a different way.

short pointer to a float

I ran this code in C++:
#include <iostream>
using namespace std;
int main()
{
float f = 7.0;
short s = *(short *)&f;
cout << sizeof(float) << endl
<< sizeof(short) << endl
<< s << endl;
return 0;
}
I get the following output:
4
2
0
But in a lecture given at Stanford University, Professor Jerry Cain says he is sure the output will not be 0.
The lecture can be found here; he says that around the 48-minute mark.
Is he wrong, has the standard changed since, or is there a difference between platforms?
I'm using g++ to compile my code.
EDIT: in the next lecture he does mention "big endian" and "little endian" and says that they will affect the result.
#include <cassert>
#include <iostream>
using namespace std;

static void bitPrint(float f)
{
    assert(sizeof(unsigned) == sizeof(float));
    unsigned *data = reinterpret_cast<unsigned *>(&f);
    // Prints the bits of the float, least significant bit first.
    for (unsigned i = 0; i < sizeof(unsigned) * 8; ++i)
    {
        unsigned bit = (1u << i) & *data;
        if (bit) bit = 1;
        cout << bit;
    }
    cout << endl;
}

int main()
{
    float f = 7.0;
    bitPrint(f);
    return 0;
}
This program prints 00000000000000000000011100000010
Since sizeof(short) == 2 on your platform, you get the first two bytes, which are both zeros.
Note that since the sizes of types, and possibly the float implementation (not sure about this), are implementation-defined, different output can be seen on different platforms.
Well, let's see. First you write a float into memory. It occupies 4 bytes, and its value is 7. A float in memory looks something like "sign bit -> exponent bits -> mantissa bits"; for IEEE 754 single precision that's 1, 8, and 23 bits respectively, so 7.0 comes out as the bit pattern 0x40E00000.
The interesting bits of 7.0 (the sign, the exponent, and the top of the mantissa) all sit in the two most significant bytes; the two least significant bytes are zero.
Your short pointer points to the beginning of the float. On a little-endian machine those zero low-order bytes are stored first, so reading a 2-byte short from there gives you 0. On a big-endian machine you would instead read the two high bytes, 0x40E0, and get 16608.
I believe, though, that this result is rather UB and can differ on different platforms, compilers, etc.
Accessing data through a pointer to a different type than it was stored as gives (except in a few special cases) undefined behaviour.
Firstly, it's platform-dependent how the data is stored, so different systems may well give different values; and secondly, the compiler might well generate code that doesn't even see the value you'd expect, since it's allowed to do anything it likes when you do this (it's undefined behaviour due to the strict aliasing rules).
Having said that, there are probably reasons why the number you are seeing is valid, but you can't rely on it unless you specifically know your platform will do what you expect; it's not guaranteed by the standard.
He's "pretty" sure it's not zero, he says that explicitly.
However, given that the representation of a short can be big-endian or little-endian, I wouldn't be so certain. In any case, this is a throwaway line at the end of a fifty-minute lecture so we can forgive him a little. It may be he came back in the next lecture with a clarification.
You would need to examine the underlying bits at (at least) a byte-by-byte level to understand what's going on.
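For completeness, you can examine the two halves of the float without the undefined behaviour by copying the bytes out with memcpy. Which half you get first still depends on endianness; this sketch assumes a 4-byte float and a 2-byte short:
#include <cstring>
#include <iostream>

int main()
{
    float f = 7.0f;                    // 0x40E00000 in IEEE 754
    short lo, hi;
    std::memcpy(&lo, &f, sizeof lo);                                // bytes 0-1
    std::memcpy(&hi, reinterpret_cast<char *>(&f) + 2, sizeof hi);  // bytes 2-3
    // On a little-endian machine this prints "0 16608": the zero low half
    // of the mantissa comes first, the 0x40E0 sign/exponent half second.
    std::cout << lo << ' ' << hi << '\n';
    return 0;
}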

confusion in union concept

#include<stdio.h>
union node {
int i;
char c[2];
};
int main() {
union node n;
n.c[0] = 0;
n.c[1] = 2;
printf("%d\n", n.i);
return 0;
}
I think it gives 512 as output, because c[0]'s value is stored in the first byte and c[1]'s value in the second byte, but it gives 1965097472. Why?
I compiled this program in Code::Blocks on Windows.
Your union allocates four bytes, starting off as:
[????] [????] [????] [????]
You set the least two significant bytes:
[????] [????] [0x02] [0x00]
You then print out all four bytes as an integer. You're not going to get 512, necessarily, because anything can be in those most significant two bytes. In this case, you had:
[0x75] [0x21] [0x02] [0x00]
Because of undefined behavior. Accessing a union member that wasn't set does that, simple as that. It can do anything, print anything, and even crash.
Undefined behavior is, well... undefined.
We can try to answer why a specific result was given (and the other answers do that by guessing compiler implementation details), but we cannot say why another result was not given. For all that we know, the compiler could have printed 0, formatted your hard drive, set your house on fire or transferred 100,000,000 USD to your bank account.
The int is compiled as a 32-bit number, little-endian. By setting the two lower bytes to 0 and 2 respectively and then reading the int, you get 1965097472. If you look at the hexadecimal representation, 0x75210200, you see your bytes again. Besides, it is undefined behaviour, and part of it depends on the memory architecture of the platform the program is running on.
Note that your int is likely to be at least 4 bytes (not 2, like it was in the good ol' days). To let the sizes match, change the type of i to uint16_t.
Even after this, the standard does not really permit setting one union member, and then accessing a different one in an attempt to reinterpret the bytes. However, you could get the same effect with a reinterpret_cast.
#include <cstdint>
#include <iostream>

union node {
    std::uint16_t i;
    std::uint8_t c[2];
};

int main() {
    node n;
    n.c[0] = 0;
    n.c[1] = 2;
    // Read the union's storage back as a 16-bit integer.
    std::cout << *reinterpret_cast<std::uint16_t *>(&n) << std::endl;
    return 0;
}