const char* to int cast? - c++

I suppose the behaviour of the following snippet is supposed to be undefined but I just wanted to make sure I am understanding things right.
Let's say we have this code:
#include <iostream>
int main()
{
std::cout << "mamut" - 8 << std::endl;
return 0;
}
So what I think this does is (char*)((int)(const char*) - (int)), though the output after this is pretty strange, not that I expect it to make any real sense. So my question is about the casting between char* and int - is it undefined, or is there some logic behind it?
EDIT:
Let me just add this:
#include <iostream>
int main ()
{
const char* a = "mamut";
int b = int(a);
std::cout << b << std::endl;
std::cout << &a <<std::endl;
// seems b!= &a
for( int i = 0; i<100;i++)
{
std::cout<<(const char*)((int)a - i)<<std::endl;
}
return 0;
}
Once i gets big enough, the output contains something like _Jv_RegisterClasses etc.
Just for the record:
std::cout << a - i << std::endl;
produces the same result as:
std::cout<<(const char*)((int)a - i)<<std::endl;

There is no cast. You are merely telling cout that you want to print the string at the address of the string literal "mamut" minus 8 bytes; that is pointer arithmetic. cout will then print whatever happens to be at that address, or possibly crash & burn, since forming a pointer before the start of an array and reading through it is undefined behavior.
EDIT
Regarding the edit by the OP: converting an address to int doesn't necessarily result in a correct number identical to the address. An address doesn't necessarily fit in an int, and on top of that, int is a signed type, and it doesn't make any sense to store addresses in signed types.
To guarantee a conversion from pointer to integer without losses, you need to use uintptr_t from stdint.h (<cstdint> in C++).
To quote the C standard 6.3.2.3 (I believe C++ is identical in this case):
Any pointer type may be converted to an integer type. Except as
previously specified, the result is implementation-defined. If the
result cannot be represented in the integer type, the behavior is
undefined. The result need not be in the range of values of any
integer type.

There is no casting going on. "mamut" is an array of characters that decays to a pointer to its first character, and - 8 does pointer arithmetic on that pointer. You are right that it's undefined behavior, so even though the semantic behavior is pointer arithmetic, the runtime behavior can be literally anything.

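You are printing a string that starts at the address of "mamut" minus 8 bytes and runs until the first null terminator. If none of the 8 preceding bytes happens to be zero, that is 8 + 5 = 13 chars in total, but since this is undefined behavior nothing is guaranteed.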

Related

How is it possible that a null pointer still retains a memory address?

When you initialize a pointer with nothing (NULL), that element still has a memory address big enough for the initialising type of that pointer (4 bytes for int, 1 for char, etc.). But why, since it's technically nothing, not even the value zero? I mean, NULL can't be a fixed value like 0 because zero is still considered a value, so is it something more than that?
Example:
#include <iostream>
int *a=NULL;
int main()
{
std::cout <<&a; //it will show the address in hexadecimal system;
return 0;
}
Your program does not answer the question you were asking. It shows that, yes, the pointer has an address; it needs one to store its value (the address it is pointing to). When you print the value you see that it is indeed nullptr (since this is C++, not C).
#include <iostream>
int *a= nullptr;
int main()
{
std::cout << &a << '\n'; // Will show the address OF THE POINTER in hexadecimal system;
std::cout << a << '\n'; // Will show the address that a is pointing to.
return 0;
}
Output:
0x601180
0
I'm pretty sure there's a duplicate, but I don't see any now. You are confusing the meaning of the operators * and & in different contexts.
Here, &a means "address of a". And what is a? a is a global variable of pointer type. It is perfectly valid to take the address of any global variable.
So, to clear things up:
#include <iostream>
int *a=NULL;
int main()
{
std::cout << &a; // perfectly valid: address of a, type int** (pointer-to-pointer-to-int)
std::cout << a; // still valid: prints the address a points to, i.e. 0 (NULL)
std::cout << *a; // wrong: dereferencing a null pointer, there's no memory allocated
return 0;
}
You also seem to have few misconceptions about pointers:
"that element still has a memory address big enough for the initialising type of that pointer(4 bytes for int, 1 for char etc.)"
Not at all. A pointer is just a pointer. It doesn't care what it points to. In fact, at a lower level it is typically just an integer holding an address. It can point to an array, to an element, to nothing at all, or to some wild place where nothing was ever stored.
"NULL can't be a fixed value like 0 because zero still is considered a value, so it is something more than that?"
Again, pointer is just a pointer. Pointer doesn't know anything at all about value. Value may or may not exist, and the memory where pointer points to may or may not be valid. And in fact, NULL is defined to be exactly 0 (or nullptr in newer standards): https://en.cppreference.com/w/cpp/types/NULL

C++ Read access violation for vector

I'm getting an exception when I try to use vector[int_number], and my program stops working.
uint64_t data = 0xffeeddccbbaa5577;
uint16_t *vector = (uint16_t*) data;
int currentPosition = 0;
while (currentPosition <= 3) {
uint16_t header = vector[currentPosition]; // problem here
Visual Studio 2017 reports: Unhandled exception thrown: read access violation.
vector was 0x6111F12.
I'm stuck here. If you have any idea what I should do, I'll be grateful. Thanks in advance!
Setting aside all the undefined behaviour you get due to strict aliasing violations, on the current crop of Intel chips with the MSVC runtime, only the low 48 bits of a pointer are used for addressing.
So 0xffeeddccbbaa5577 is never a valid pointer value.
So the behaviour on dereferencing that value will be undefined.
If you wanted to break up data, into four elements of an appropriate type, then one method is to create a uint16_t foo[4] say and memcpy the data starting at &data to foo.
By accessing the data through a pointer of a different type obtained by casting, you wander off into undefined-behavior-land. Instead, try the following (note that I also replaced your while loop with a range-based for loop, which avoids having to keep a counter):
#include <cstdint>
#include <cstring>
#include <iostream>
int main() {
std::uint64_t data = 0xffeeddccbbaa5577;
std::uint16_t vector[4];
std::memcpy(vector, &data, sizeof(data)); // copy the bytes instead of aliasing them
for (std::uint16_t header : vector)
{
std::cout << std::hex << header << std::endl;
}
}
yielding
5577
bbaa
ddcc
ffee
If you use reinterpret_cast, you hold two pointers of different types pointing to the same address, which may easily lead to undefined behavior. memcpy avoids that by creating a copy of the memory, which you may then safely access through the other type. Also take a look into type punning (as pointed out by @DanielLangr).
It's really very easy, but you were so far off with your original attempt that you've confused everyone.
uint16_t vector[] = { 0x5577, 0xbbaa, 0xddcc, 0xffee };
Ask the right question; if you'd asked the question you have in the comments, we'd have got there a lot quicker.
Here's a concrete example that should avoid any undefined behavior due to strict aliasing / "illegal" casts / etc., since this seems to be what you're actually interested in.
This code takes a std::uint64_t, copies it into an array of four std::uint16_ts, modifies the values in the array, and then copies them back into the original std::uint64_t.
#include <cstdint>
#include <cstring>
#include <iostream>
int main() {
std::uint64_t data = 0xffeeddccbbaa5577;
std::uint16_t data_spliced[4];
std::memcpy(&data_spliced, &data, sizeof(data));
std::cout << "Original data:\n" << data << "\nOriginal, spliced data:\n";
for (const auto spliced_value : data_spliced) {
std::cout << spliced_value << " ";
}
std::cout << "\n\n";
data_spliced[2] = 0xd00d;
std::memcpy(&data, &data_spliced, sizeof(data));
std::cout << "Modified data:\n" << data << "\nModified, spliced data:\n";
for (const auto spliced_value : data_spliced) {
std::cout << spliced_value << " ";
}
std::cout << '\n';
}
With output (on my machine):
Original data:
18441921395520329079
Original, spliced data:
21879 48042 56780 65518
Modified data:
18441906281530414455
Modified, spliced data:
21879 48042 53261 65518
You need to take the address of that variable if you want to assign it to a pointer
const uint16_t* vector = reinterpret_cast<const uint16_t*>( &data ) ;
NOTE:
This works on MSVC 2017, but...
This is a truck load of undefined behaviour! – Bathsheba
As cppreference says for reinterpret_cast:
5) Any pointer to object of type T1 can be converted to pointer to object of another type cv T2. This is exactly equivalent to static_cast<cv T2*>(static_cast<cv void*>(expression)) (which implies that if T2's alignment requirement is not stricter than T1's, the value of the pointer does not change and conversion of the resulting pointer back to its original type yields the original value). In any case, the resulting pointer may only be dereferenced safely if allowed by the type aliasing rules (see below)
...
Type aliasing.
Whenever an attempt is made to read or modify the stored value of an object of type DynamicType through a glvalue of type AliasedType, the behavior is undefined unless one of the following is true:
AliasedType and DynamicType are similar.
AliasedType is the (possibly cv-qualified) signed or unsigned variant of DynamicType.
AliasedType is std::byte (since C++17), char, or unsigned char: this permits examination of the object representation of any object as an array of bytes.
Note that many C++ compilers relax this rule, as a non-standard language extension, to allow wrong-type access through the inactive member of a union (such access is not undefined in C)
The above code does not fulfill any of the Aliasing rules.

Is std::cout of a char reference defined by the standard?

I have read this post and the answers indicate a behavior described in a paragraph below. I am not trying to make it work on my machine, or find a workaround to make it work on my machine, it is a question of is it defined behavior according to the standard.
Consider the following code which creates an int variable, an int-reference variable, and prints out the result of calling the address-operator on the int-reference variable
#include <iostream>
int main() {
int a = 70;
int& b = a;
std::cout << &b << std::endl;
return 0;
}
It prints out what I would expect, which is an address in memory, i.e., the address of int variable a.
But now I change int to char, or unsigned char, or signed char, and both on Xcode (Version 6.4) and Visual Studio (VS 2013 Ultimate) I get unexpected behavior.
#include <iostream>
int main() {
// or unsigned char or signed char, same weird behavior
char a = 70;
char& b = a;
std::cout << &b << std::endl;
return 0;
}
In Xcode, the console prints something like F\330\367\277_\377 . I get that F is the ASCII code for 70, but I do not understand the rest of it. I assume it is also a set of ASCII characters, since on Visual Studio it prints out the F followed by some weird characters.
I tried other integer types and it worked fine. And I know that often char/signed char/unsigned char or some combination of them are implemented as the same type. The only thing I can think of is that the reference type is being implemented as a pointer type and then interpreting the call to &b as returning a pointer type, and then std::cout is taking its input to mean to print out all characters in a char array.
Is this defined behavior?
To reiterate: my question is more specifically, is this a defined behavior which is part of the standard, is this behavior not defined by the standard, is this a non-standard implementation of the compilers? Something else?

Please explain what is incorrect about this procedure to find the largest pointer

Wouldn't the highest pointer be the one which can't be incremented through pointer arithmetic?
#include <iostream>
int main()
{
// Find the largest pointer
int x = 0;
int* px = &x;
while ((px+1) != px)
++px;
std::cout << "The largest pointer is: " << px;
return 0;
}
yields
Timeout
As already mentioned, you've got an infinite loop because the condition can never be false.
That being said, what you're doing is undefined behaviour, illegal C++. Pointer arithmetic is only legal with pointers pointing to the same array (and a single object is treated as an array of one element) and right past the end of it. You can't expect a reasonable outcome of your program even if you fix the loop.
I suspect the value of std::numeric_limits<uintptr_t>::max() is the theoretical maximum value of a pointer (converted to integer), but it might not be available to your program. There are things such as virtual address space and segmented memory models to consider. Anyway, the exact values of pointers (except for nullptr) are not something you should be concerned with. You get pointers by taking addresses of existing objects or by calling allocation functions, and that's that.
N.B. I think you have a misconception that attempting to increment an integer type beyond its maximum value will just do nothing. That's incorrect - unsigned integers will wrap around to 0 and with signed integers you get undefined behaviour again (see signed integer overflow).
Hope that helps.
This will never be false and thus never quit
while ((px+1) != px)
Look at this program:
#include <iostream>
int main()
{
int *px = (int *) (~0);
std::cout << "Value: " << px;
++px;
std::cout << " Value: " << px << std::endl;
}
whose output is:
Value: 0xffffffffffffffff Value: 0x3
As you can see, when you increment a pointer that is at its maximum, its value wraps around and starts again (at least on this machine; the standard does not define this behavior).
You might want to look for the largest pointer value that occurs before wrap-around, i.e.:
while (px+1 > px)
px++;
...which will not work, of course, without the proper casts:
while ((unsigned long long)(px + 1) > (unsigned long long)px)
px++;

Why strange behavior with casting back pointer to the original class?

Assume that in my code I have to store a void* as a data member and typecast it back to the original class pointer when needed. To test its reliability, I wrote a test program (linux ubuntu 4.4.1 g++ -O4 -Wall) and I was shocked to see the behavior.
#include <iostream>
using namespace std;
struct A
{
int i;
static int c;
A () : i(c++) { cout<<"A() : i("<<i<<")\n"; }
};
int A::c;
int main ()
{
void *p = new A[3]; // good behavior for A* p = new A[3];
cout<<"p->i = "<<((A*)p)->i<<endl;
((A*&)p)++;
cout<<"p->i = "<<((A*)p)->i<<endl;
((A*&)p)++;
cout<<"p->i = "<<((A*)p)->i<<endl;
}
This is just a test program; in my actual case, it's mandatory to store any pointer as void* and then cast it back to the actual pointer (with the help of a template). So let's not worry about that part. The output of the above code is:
p->i = 0
p->i = 0 // ?? why not 1
p->i = 1
However, if you change void* p to A* p, it gives the expected behavior. Why?
Another question: I cannot get rid of (A*&), because otherwise I cannot use operator ++, but it also gives the warning "dereferencing type-punned pointer will break strict-aliasing rules". Is there any decent way to avoid the warning?
Well, as the compiler warns you, you are violating the strict aliasing rule, which formally means that the results are undefined.
You can eliminate the strict aliasing violation by using a function template for the increment:
template<typename T>
void advance_pointer_as(void*& p, int n = 1) {
T* p_a(static_cast<T*>(p));
p_a += n;
p = p_a;
}
With this function template, the following definition of main() yields the expected results on the Ideone compiler (and emits no warnings):
int main()
{
void* p = new A[3];
std::cout << "p->i = " << static_cast<A*>(p)->i << std::endl;
advance_pointer_as<A>(p);
std::cout << "p->i = " << static_cast<A*>(p)->i << std::endl;
advance_pointer_as<A>(p);
std::cout << "p->i = " << static_cast<A*>(p)->i << std::endl;
}
You have already received the correct answer and it is indeed the violation of the strict aliasing rule that leads to the unpredictable behavior of the code. I'd just note that the title of your question makes reference to "casting back pointer to the original class". In reality your code does not have anything to do with casting anything "back". Your code performs reinterpretation of raw memory content occupied by a void * pointer as a A * pointer. This is not "casting back". This is reinterpretation. Not even remotely the same thing.
A good way to illustrate the difference would be to use an int and float example. A float value declared and initialized as
float f = 2.0;
can be cast (explicitly or implicitly converted) to int type
int i = (int) f;
with the expected result
assert(i == 2);
This is indeed a cast (a conversion).
Alternatively, the same float value can be also reinterpreted as an int value
int i = (int &) f;
However, in this case the value of i will be totally meaningless and generally unpredictable. I hope it is easy to see the difference between a conversion and a memory reinterpretation from these examples.
Reinterpretation is exactly what you are doing in your code. The (A *&) p expression is nothing other than a reinterpretation of the raw memory occupied by the pointer void *p as a pointer of type A *. The language does not guarantee that these two pointer types have the same representation or even the same size. So, expecting predictable behavior from your code is like expecting the above (int &) f expression to evaluate to 2.
The proper way to really "cast back" your void * pointer would be to do (A *) p, not (A *&) p. The result of (A *) p would indeed be the original pointer value, that can be safely manipulated by pointer arithmetic. The only proper way to obtain the original value as an lvalue would be to use an additional variable
A *pa = (A *) p;
...
pa++;
...
And there's no legal way to create an lvalue "in place", as you attempted with your (A *&) p cast. The behavior of your code is an illustration of that.
As others have commented, your code appears like it should work. Only once (in 17+ years of coding in C++) I ran across something where I was looking straight at the code and the behavior, like in your case, just didn't make sense. I ended up running the code through debugger and opening a disassembly window. I found what could only be explained as a bug in VS2003 compiler because it was missing exactly one instruction. Simply rearranging local variables at the top of the function (30 lines or so from the error) made the compiler put the correct instruction back in. So try debugger with disassembly and follow memory/registers to see what it's actually doing?
As far as advancing the pointer, you should be able to advance it by doing:
p = (char*)p + sizeof( A );
VS2003 through VS2010 never give you complaints about that, not sure about g++