Cast between a pointer and integer in x86_32/64 - c++

I have a simple virtual machine which I made for fun. It works in a very low level and it doesn't have any notion of types. Everything is just an integer. There are some instructions for getting a pointer and accessing memory by a pointer. The problem is that these pointers are simply stored as uint64_t, and any pointer arithmetic is an integer arithmetic. The machine casts this to a void* when using it as a pointer. Well, this kind of code is obvious in the assembly level, but the C and C++ standard doesn't let programmers do integer-pointer cast safely, especially when the integer is used to change the value of the pointer.
I'm not making this toy VM portable everywhere. It is just expected to work under x86_32/64 machines, and it does seem to work very fine even after full compiler optimizations. I think it's because pointers are represented no differently as integers in the x86 architecture.
What kind of solution is usually applied in such situations, that the language standards doesn't declare certain code as safe, but it really should be safe in the targeted hardware, and the results does seem okay?
Or as a more practical question, how can I let a compiler (gcc) not perform breaking optimizations on code like
uint64_t registers[0x100];
registers[0] = (uint64_t)malloc(8);
registers[0] += 4;
registers[2] = 0;
memcpy(&registers[2], (void*)registers[0], 4);
The above isn't real code, but a certain sequence of bytecode instructions would actually do something similar as above.

If you really need to cast a pointer to an integer, use at least uintptr_t, for spaghetti monster's sake! This type (along with its signed counterpart) is meant to be casted safely to/from a pointer. It is, however, not to be used for operations (but might be safe for a linear model with no modifications to the representation of both values).
Still then, your code does not seem to make sense, but might hinder the compiler actively to optimize. Without deeper knowledge about what you intend to accomplish, I would say, chances are there are better ways.

Related

Do type conversions slow program running?

Th title is quite obvious.
In my case, and for the sake of simplicity, I avoid using, for instance, unsigned int instead of int, as it makes coding faster and simpler.
(BTW, Im using an Android IDE, CppDroid)
Yet, the IDE frequently alerts me to implicit conversions at, for example, For loops where the incremented variable (int) is compared with the size of a vector (size_t/unsigned int).
My questions are:
Do type conversions take time?
If so, how long do they take compared to other common operations?
In the case convertions do take some time, is it worth to correctly define variables in order to avoid convertions?
Your question is valid, although the goal is misconstrued. It is paramount to correctly define variables, but not because of mysterious performance.
It is to ensure correctness. Comparing unsigned integer with signed one is a ticking bomb, as well as (most usually) comparing size_t with integer.
For example, consider following snippet:
for (int i = 0; i < vec.size(); ++i) { }
For all you know, this code can lead to undefined behavior! If the size of the vector is bigger than maximum size signed integer can hold (which is usually the case with 64bit systems) your integer will be overflowing, which is undefined. Compiler might just remove the loop altogether, if it can proove that size of the vector is bigger than maximum int!
Similar looking (and incorrecet as well) line
for (unsigned int i = 0; i < vec.size(), ++i) { }
Is not going to cause undefined behaviour, but it will hang the program when vector size is greater than maximum int. No good thing either.
And of course, the correct way of doing this is
for (decltype(vec.size()) i = 0; i < vec.size(), ++i) { }
Depends what you convert to what.
That particular warning of signed/unsigned mismatch results in zero overhead, but you may end treating negative number as huge unsigned one (or other way around) - so as long as you are using int, and you don't expect to break into 2^31 numbers land, you are safe.
As safe, as people writing file I/O routines around 1990 (never expecting to see 3GiB file in their life). ...not very funny nowadays (still so much SW is broken on 2+GiB file size).
Some other conversions like from int to uint_8 may have tiny overhead, so it's better to avoid them - if possible (by designing the code to use the desired data type all around).
I would firstly address clarity and functionality of the code, and that usually leads to usage of particular data type for particular value all the time, without any conversion.
After the code works, you can measure the performance and consider what optimization makes sense (including usage of mismatched data types with conversions between them).
conclusion: just fix it, use proper data type.
Type conversions might give performance hits (when signedness, bitness or conversion between floating-point types are involved), but, as a general rule, the type identity of the many things in a program is merely a conceptual language front-end feature. When such hits do happen, however, it is because the types involved mean reasonably different things, and hence code must be emitted in order to properly solve the conversion.
Another thing which is completely different from the above is the invocation of type conversion operators in C++, which can run arbitrary code and, thus, most obviously influence in the final program behavior (not only performance).
As mentioned by others, correct use of the type system is most important for program correctness, at least or specially in languages such as C and C++. Using mismatched types can affect the program behavior in some corner cases, albeit can have no impact whatsoever on the execution time otherwise.
It depends.
Actually converting the data between types will require extra calculations. That much should probably be obvious. Usually those calculations take extra time, so they will have a performance impact. However, there are several factors that mitigate the actual impact of this:
The compiler can optimize types in some cases to minimize conversions.
Some platforms implement certain conversions in hardware.
The primary concern surrounding calculations with unlike types typically has much less to do with performance and more to do with safety and producing expected results. That is why the compiler is warning you; the vast majority of compilers will not tell you that you are doing something inefficient, but they will tell you that you are doing something dangerous.
For example, comparing an int with an unsigned int is asking for trouble. The int has a negative range, the unsigned int has a larger positive range. On conversion to unsigned int, the negative range of the int will appear to be larger than its own positive range. It is very easy to generate endless loops or out-of-bounds errors with such a construct.
You should normally only worry yourself about type conversion performance if you are dealing with huge amounts of data - like large vectors/arrays that need converted between formats. You would have to loop over the data in some way, so it would be a conscious action. For example, converting a 10000 element vector of chars to ints. In these cases, you might need to consider if you have a design flaw that is requiring needless conversion of data.
It is worth pointing out that in the above example, even if the conversion itself were instant, the iteration and copy is not.
As for an example of platforms where this can be done to an extent in hardware, most video cards are able to interpret integers as floats on a normalized range, of the sort 255 --> 1.0. However, many other conversions, like conversions between image formats, are still done in software.
Given the platform and optimization details vary greatly, answering how long a given conversion takes relative to other operations is effectively impossible. If you are dealing with enough data that a conversion is creating a noticeable performance bottleneck, then profile that conversion.
It is worth it to make sure your types match to the best of your ability if you are dealing with enough data for it to matter; that is a subjective measurement of value, though, and will depend on what you are doing.
It is always worth it to make sure implicit type conversions do not cause errors, as errors due to them can be some of the worst possible in C/C++ (memory leaks, buffer overflows, access violations, etc.).

Why worry about 'undefined behavior' in >> of signed type?

My question is related to this one and will contain few questions.
For me the most obvious (means I would use it in my code) solution to above problem is just this:
uint8_t x = some value;
x = (int8_t)x >> 7;
Yes, yes I hear you all .... undefined behavior and this is why I've not posted my 'solution'.
I have a feeling (maybe it is only my sick mind) that term 'undefined behavior' is overused on SO just to justify downvoting someone if question is tagged c/c++.
So - let's (for a while) put aside C/C++ standards and think about everyday life/programming, real compiler implementations and code they generate for contemporary hardware.
Taking into account the following:
As far as I remember all the hardware I had encountered had distinct instructions for arithmetic and logical shift.
All compilers that I know translate >> into arithmetic shift for signed types and logical shift for unsigned types.
I cannot recall any compiler ever emitting div-like low level instruction when >> was used in c/c++ code (and we are not talking about operator overloading here).
All the hardware I know use U2.
So ... is there anything (any contemporary compiler, hardware) that behaves differently than mentioned above? Put simply should I ever be worried about right shifting signed value not being translated to arithmetic shift?
My 'solution' compiles to just one low level instruction on many platforms while others require multiple low level instructions. What would you use in your code?
Truth please ;-)
Why worry about 'undefined behavior' in >> of signed type?
Because it doesn't really matter how well defined any particular undefined behaviour is now; the point is that it may break at any point in the future. You're relying on a side-effect that may be optimized (or un-optimized) away at any point for any reason or no reason.
Also, I don't want to have to ask somebody with detailed knowledge of many different compiler's implementations before I use something I shouldn't use in the first place, so I skip it.
Yes, there are compilers which behave different from what you assume.
In particular, optimization phases within compilers. These take advantage of the known possible values of variables, and will derive those possible values from the absence of UB. A pointer must be non-null if it's been dereferenced, an integer must be non-zero if it's been used as a divider, and a right-shifted value must be non-negative.
And that works back in time:
if (x<0) {
printf("This is dead code\n");
}
x >> 3;
What it really comes down to is, are you willing to take the risk?
"The standard doesn't guarantee yada yada" is nice and all, but let's be honest now, the risk isn't big. If you're going to run your code on some crazy platform, you generally know in advance. And if it takes you by surprise, well, that's the risk you took.
Also, the workaround is horrible. If you're not going to need it, it's just polluting your codebase with pointless "function calls instead of right shifts" that will be harder to maintain (and thus carry a cost). And you'll never to able to "paste and forget" code from other places into the project - you'd always have to check the code for the possibility of right shifting negative signed integers.

Is pointer conversion expensive or not?

Is pointer conversion considered expensive? (e.g. how many CPU cycles it takes to convert a pointer/address), especially when you have to do it quite frequently, for instance (just an example to show the scale of freqency, I know there are better ways for this particular cases):
unsigned long long *x;
/* fill data to x*/
for (int i = 0; i < 1000*1000*1000; i++)
{
A[i]=foo((unsigned char*)x+i);
};
(e.g. how many CPU cycles it takes to convert a pointer/address)
In most machine code languages there is only 1 "type" of pointer and so it doesn't cost anything to convert between them. Keep in mind that C++ types really only exist at compile time.
The real issue is that this sort of code can break strict aliasing rules. You can read more about this elsewhere, but essentially the compiler will either produce incorrect code through undefined behavior, or be forced to make conservative assumptions and thus produce slower code. (note that the char* and friends is somewhat exempt from the undefined behavior part)
Optimizers often have to make conservative assumptions about variables in the presence of pointers. For example, a constant propagation process that knows the value of variable x is 5 would not be able to keep using this information after an assignment to another variable (for example, *y = 10) because it could be that *y is an alias of x. This could be the case after an assignment like y = &x.
As an effect of the assignment to *y, the value of x would be changed as well, so propagating the information that x is 5 to the statements following *y = 10 would be potentially wrong (if *y is indeed an alias of x). However, if we have information about pointers, the constant propagation process could make a query like: can x be an alias of *y? Then, if the answer is no, x = 5 can be propagated safely.
Another optimization impacted by aliasing is code reordering. If the compiler decides that x is not aliased by *y, then code that uses or changes the value of x can be moved before the assignment *y = 10, if this would improve scheduling or enable more loop optimizations to be carried out.
To enable such optimizations in a predictable manner, the ISO standard for the C programming language (including its newer C99 edition, see section 6.5, paragraph 7) specifies that it is illegal (with some exceptions) for pointers of different types to reference the same memory location. This rule, known as "strict aliasing", sometime allows for impressive increases in performance,[1] but has been known to break some otherwise valid code. Several software projects intentionally violate this portion of the C99 standard. For example, Python 2.x did so to implement reference counting,[2] and required changes to the basic object structs in Python 3 to enable this optimisation. The Linux kernel does this because strict aliasing causes problems with optimization of inlined code.[3] In such cases, when compiled with gcc, the option -fno-strict-aliasing is invoked to prevent unwanted optimizations that could yield unexpected code.
[edit]
http://en.wikipedia.org/wiki/Aliasing_(computing)#Conflicts_with_optimization
What is the strict aliasing rule?
On any architecture you're likely to encounter, all pointer types have the same representation, and so conversion between different pointer types representing the same address has no run-time cost. This applies to all pointer conversions in C.
In C++, some pointer conversions have a cost and some don't:
reinterpret_cast and const_cast (or an equivalent C-style cast, such as the one in the question) and conversion to or from void* will simply reinterpret the pointer value, with no cost.
Conversion between pointer-to-base-class and pointer-to-derived class (either implicitly, or with static_cast or an equivalent C-style cast) may require adding a fixed offset to the pointer value if there are multiple base classes.
dynamic_cast will do a non-trivial amount of work to look up the pointer value based on the dynamic type of the object pointed to.
Historically, some architectures (e.g. PDP-10) had different representations for pointer-to-byte and pointer-to-word; there may be some runtime cost for conversions there.
unsigned long long *x;
/* fill data to x*/
for (int i = 0; i < 1000*1000*1000; i++)
{
A[i]=foo((unsigned char*)x+i); // bad cast
}
Remember, the machine only knows memory addresses, data and code. Everything else (such as types etc) are known only to the Compiler(that aid the programmer), and that does all the pointer arithmetic, only the compiler knows the size of each type.. so on and so forth.
At runtime, there are no machine cycles wasted in converting one pointer type to another because the conversion does not happen at runtime. All pointers are treated as of 4 bytes long(on a 32 bit machine) nothing more and nothing less.
It all depends on your underlying hardware.
On most machine architectures, all pointers are byte pointers, and converting between a byte pointer and a byte pointer is a no-op. On some architectures, a pointer conversion may under some circumstances require extra manipulation (there are machines that work with word based addresses for instance, and converting a word pointer to a byte pointer or vice versa will require extra manipulation).
Moreover, this is in general an unsafe technique, as the compiler can't perform any sanity checking on what you are doing, and you can end up overwriting data you didn't expect.

reinterpret_cast cast cost

My understanding is that C++ reinterpret_cast and C pointer cast is a just
a compile-time functionality and that it has no performance cost at all.
Is this true?
It's a good assumption to start with. However, the optimizer may be restricted in what it can assume in the presence of a reinterpret_cast<> or C pointer cast. Then, even though the cast itself has no associated instructions, the resulting code is slower.
For instance, if you cast an int to a pointer, the optimizer likely will have no idea what that pointer could be pointing to. As a result, it probably has to assume that a write through that pointer can change any variable. That beats very common optimizations such as storing variables in registers.
That's right. No cost other than any gain/loss in performance for performing instructions at the new width, which I might add, is only a concern in rare cases. Casting between pointers on every platform I've ever heard of has zero cost, and no performance change whatsoever.
C style casts in C++ will attempt a static_cast first and only perform a reinterpret_cast if a static cast cannot be performed. A static_cast may change the value of the pointer in the case of multiple inheritance (or when casting an interface to a concrete type), this offset calculation may involve an extra machine instruction. This will at most be 1 machine instruction so really very small.
Yes, this is true. Casting type which has runtime cost is dynamic_cast.
You're right, but think about it: reinterpret_cast means maybe a bad design or that you're doing something very low level.
dynamic-cast instead it will cost you something, because it has to look in a lookup table at runtime.
reinterpret_cast does not incur runtime cost.. however you have to be careful, as every use of reinterpret_cast is implementation defined. For example, it is possible reinterpreting a char array as an int array could cause the target architecture to throw an interrupt, because different types may have different alignment rules.
Get correct first, then worry about efficiency.
I was looking at my assembler code before and after reinterpret casting signed char as unsigned char. The instructions grew by about 3 or four more instructions.
int main()
{
signed char i = 0x80;
(unsigned char&)i >>= 7;
return i;
}
I was casting to unsigned char to make the compiler use SHL instruction, rather than SAR instruction, so that newly shifted shift in bits would be zer0s instead of var i signed bit value.
The compiler still and seems to always use SAR instruction. But the reinterpret casting made the compiler add more instructions. 3 to 4 more instructions!
I was concerned why my unicode function for converting UTF8 to UTF16 string was almost 3 times slower than Win32 MultiByteToWideChar(). Now I am worried that casting is one of the main factors.
Which is IRONIC, as we use reinterpret cast for speed.

What is the uintptr_t data type?

What is uintptr_t and what can it be used for?
First thing, at the time the question was asked, uintptr_t was not in C++. It's in C99, in <stdint.h>, as an optional type. Many C++03 compilers do provide that file. It's also in C++11, in <cstdint>, where again it is optional, and which refers to C99 for the definition.
In C99, it is defined as "an unsigned integer type with the property that any valid pointer to void can be converted to this type, then converted back to pointer to void, and the result will compare equal to the original pointer".
Take this to mean what it says. It doesn't say anything about size.
uintptr_t might be the same size as a void*. It might be larger. It could conceivably be smaller, although such a C++ implementation approaches perverse. For example on some hypothetical platform where void* is 32 bits, but only 24 bits of virtual address space are used, you could have a 24-bit uintptr_t which satisfies the requirement. I don't know why an implementation would do that, but the standard permits it.
uintptr_t is an unsigned integer type that is capable of storing a data pointer (whether it can hold a function pointer is unspecified). Which typically means that it's the same size as a pointer.
It is optionally defined in C++11 and later standards.
A common reason to want an integer type that can hold an architecture's pointer type is to perform integer-specific operations on a pointer, or to obscure the type of a pointer by providing it as an integer "handle".
It's an unsigned integer type exactly the size of a pointer. Whenever you need to do something unusual with a pointer - like for example invert all bits (don't ask why) you cast it to uintptr_t and manipulate it as a usual integer number, then cast back.
There are already many good answers to "what is uintptr_t data type?". I will try to address the "what it can be used for?" part in this post.
Primarily for bitwise operations on pointers. Remember that in C++ one cannot perform bitwise operations on pointers. For reasons see Why can't you do bitwise operations on pointer in C, and is there a way around this?
Thus in order to do bitwise operations on pointers one would need to cast pointers to type uintptr_t and then perform bitwise operations.
Here is an example of a function that I just wrote to do bitwise exclusive or of 2 pointers to store in a XOR linked list so that we can traverse in both directions like a doubly linked list but without the penalty of storing 2 pointers in each node.
template <typename T>
T* xor_ptrs(T* t1, T* t2)
{
return reinterpret_cast<T*>(reinterpret_cast<uintptr_t>(t1)^reinterpret_cast<uintptr_t>(t2));
}
Running the risk of getting another Necromancer badge, I would like to add one very good use for uintptr_t (or even intptr_t) and that is writing testable embedded code.
I write mostly embedded code targeted at various arm and currently tensilica processors. These have various native bus width and the tensilica is actually a Harvard architecture with separate code and data buses that can be different widths.
I use a test driven development style for much of my code which means I do unit tests for all the code units I write. Unit testing on actual target hardware is a hassle so I typically write everything on an Intel based PC either in Windows or Linux using Ceedling and GCC.
That being said, a lot of embedded code involves bit twiddling and address manipulations. Most of my Intel machines are 64 bit. So if you are going to test address manipulation code you need a generalized object to do math on. Thus the uintptr_t give you a machine independent way of debugging your code before you try deploying to target hardware.
Another issue is for the some machines or even memory models on some compilers, function pointers and data pointers are different widths. On those machines the compiler may not even allow casting between the two classes, but uintptr_t should be able to hold either.
-- Edit --
Was pointed out by #chux, this is not part of the standard and functions are not objects in C. However it usually works and since many people don't even know about these types I usually leave a comment explaining the trickery. Other searches in SO on uintptr_t will provide further explanation. Also we do things in unit testing that we would never do in production because breaking things is good.