Let's say we are defining a variable:
float myFloat{3};
I assume the following steps happen in memory when the variable is defined, but I am not sure.
Initial assumption: memory consists of addresses and their corresponding values, and values are stored as binary codes.
1- Create the binary code (value) for the literal 3 at address1.
2- Convert this integer binary code of 3 to the float binary code of 3 at address2. (type conversion)
3- Copy this binary code (value) from address2 to the memory reserved for myFloat.
Are these steps accurate? I would like to hear from you. Thanks.
Conceptually that’s accurate, but with any optimization, the compiler will probably generate the 3.0f value at compile time, making it just a load of that constant to the right stack address. Furthermore, the optimizer may well optimize it out entirely. If the next line says myFloat *= 0.0f; return myFloat;, the compiler will turn the whole function into essentially return 0.0f; although it may spell it in a funny way. Check out Compiler Explorer to get a sense of it.
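As a rough illustration of the point above, here is a minimal C++ sketch. The names kThree, scale_to_zero and check_scale_to_zero are made up for this example; the static_assert only shows that the int-to-float conversion can be evaluated at compile time, and what machine code a given compiler actually emits still has to be checked in the disassembly.

```cpp
#include <cassert>

// The int literal 3 is a constant expression that float represents exactly,
// so brace-initialization accepts it without a narrowing error.
constexpr float kThree{3};
static_assert(kThree == 3.0f, "the conversion is evaluated at compile time");

// Hypothetical function mirroring the example in the answer.
float scale_to_zero() {
    float myFloat{3};
    myFloat *= 0.0f;
    return myFloat;  // an optimizer is free to reduce this body to `return 0.0f;`
}

// Hypothetical usage check.
void check_scale_to_zero() {
    assert(scale_to_zero() == 0.0f);
}
```

Pasting scale_to_zero into Compiler Explorer with and without optimization enabled is one way to compare the three conceptual steps with what is actually generated.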
Related
I'm still learning C++. I'm trying to understand how evaluation is carried out, in a rather step-by-step fashion. So using this simple example, an expression statement:
int x = 8 * 5 - 5;
This is what I believe happens. Please tell me how far off the mark I am:
The operands x, 8, 5, and 5 are "evaluated." Possibly, a temporary object is created to hold each value (I am not too sure about this).
8 * 5 evaluates to 40, which is stored in a temporary.
40 (temporary) - 5 evaluates to 35 (another temporary).
35 is copied into x.
All temporary objects are destroyed in the reverse order they were created in (the value is discarded).
Am I at least close to being right?
"Thank you, sir. Hm. What would happen if all the operands were named objects, rather than literals? Would it create temporaries on the fly, so to speak, rather than at compile time?"
As Sam mentioned, you are on the right track on a high level.
In your first example the compiler would use CPU registers to store the temporaries (since they are not named objects). If they were named objects, how optimized the generated code is depends on the optimization flags set on the compiler and on the complexity of the code. You can take a look at the disassembly to really see what happens. For example, if you write
a = 5;
b = 2;
c = a * b;
the compiler will try to generate the most optimal code. Since in this case both operands are constants known at compile time, and one of them is a multiplication by 2, it can take shortcuts. Multiplications are sometimes replaced by bit operations, which are cheaper (multiplying by 2 is the same as shifting left by one bit).
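A small sketch of both shortcuts, under the assumption that a, b and c are ints (the answer does not say). The helper names times_two, shift_left_one and check_times_two are invented for this example, and the shift equivalence is only claimed here for non-negative values that do not overflow.

```cpp
#include <cassert>

// Both operands are compile-time constants, so the product can be folded
// into a single constant before the program ever runs.
constexpr int a = 5;
constexpr int b = 2;
constexpr int c = a * b;
static_assert(c == 10, "a * b is folded at compile time");
static_assert(a * 2 == (a << 1), "multiplying by 2 matches a left shift by one bit");

// Hypothetical helpers: two spellings a compiler may treat as the same operation.
int times_two(int value) { return value * 2; }
int shift_left_one(int value) { return value << 1; }  // non-negative input assumed

// Hypothetical usage check.
void check_times_two() {
    assert(times_two(21) == shift_left_one(21));
}
```

Whether a particular compiler emits a shift, an add, or something else for times_two is a code-generation detail that the disassembly will show.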
Named variables have to live somewhere, either on the stack or on the heap, and the CPU uses the addresses of named objects to pass them around and operate on them. (If they are small enough they will fit in registers and be operated on there; otherwise the code starts using memory, first the cache and then spilling out to RAM.)
You could google for 'abstract syntax tree' to get an idea of how readable C++ code is converted to machine code.
This is why it is important to learn about const correctness, aliasing, and pointers vs. references: they give the compiler the best chance of generating optimal code for you (aside from the advantages a user of the code gets from them).
I am writing some numeric code in C++ and I want to be able to swap between using double and float. I have therefore added a #define MYFLT, which I can make either float or double as needed. However, how do I deal with the various numeric literals?
For example
MYFLT someNumber = 1.2;
MYFLT someOtherNumber = 1.5f;
gives compiler warnings for the first line when MYFLT is a float and for the second line when MYFLT is a double. I know this is a trivial example, but there are other cases where I have longer expressions containing literals, and floats can end up being converted to doubles and the result converted back to float, which I think is costing me significant performance. How should I deal with this?
I could do things like
MYFLT someNumber = MYFLT(1.2);
MYFLT someOtherNumber = MYFLT(1.5);
but this is quite tedious. I'm assuming that if I do this the compiler is clever enough to just use a float when needed (can anyone confirm that?). What would be better is an MSVC++ compiler switch or #define that tells the compiler to treat all floating-point literals as floats instead of doubles. Does such a switch exist?
Even when I wrap all my literals as above my code runs 50% slower when I use float rather than double. I was expecting a performance boost through simd type operations, not a penalty!
Phil
What you'd want is #define MYFLTCONST(x) x##f or #define MYFLTCONST(x) x, depending on whether you want an f suffix appended for float.
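A minimal sketch of how the two macro definitions could be wired together. The MYFLT_IS_FLOAT switch, the use of a typedef instead of a #define for MYFLT, and the scaled and check_scaled functions are assumptions made for this example, not part of the question.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical build switch: set to 0 to build everything with double.
#define MYFLT_IS_FLOAT 1

#if MYFLT_IS_FLOAT
typedef float MYFLT;
#define MYFLTCONST(x) x##f  // token pasting: 1.2 becomes 1.2f
#else
typedef double MYFLT;
#define MYFLTCONST(x) x     // 1.2 stays a double literal
#endif

// The wrapped literal has exactly the type MYFLT, so no conversion is needed.
static_assert(std::is_same<decltype(MYFLTCONST(1.2)), MYFLT>::value,
              "literal type matches MYFLT");

// Hypothetical function using a wrapped literal.
MYFLT scaled(MYFLT value) { return value * MYFLTCONST(1.5); }

// Hypothetical usage check.
void check_scaled() {
    assert(scaled(MYFLTCONST(2.0)) == MYFLTCONST(3.0));
}
```

One caveat with the pasting form: the argument needs a decimal point or an exponent, because MYFLTCONST(1) would expand to 1f, which is not a valid literal.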
This is a (not quite complete) answer to my own question.
I found that a small function that was called many times (a fast approximation to sin) didn't have its literals cast to MYFLT. The extra computational cost of this also meant that the compiler wasn't inlining it. This function accounted for most of the difference. Some further profiling seemed to indicate that accessing std::vector<float> was slower than accessing std::vector<double> (I am using [] for the access, if it matters). Replacing the std::vectors with raw fixed-size arrays sped up the double implementation a little and closed the gap significantly for the float implementation. The float version is now only about 10% slower than the double version. But there is definitely no speed increase from either RAM access or vectorization. I guess I need to think more carefully about my loops to get any benefit there.
I guess the conclusion here (yet again) is that the compiler is pretty good at optimising code - it's much better to work with it and do careful profiling than it is to try and do your own blind "optimisations" which might actually have negative effects, like stopping the compiler performing good inlining.
I'm porting some code, and the original author was evidently quite concerned with squeezing as much performance as possible out of the code.
Throughout (and there's hundreds of source files), there are lots of things like this:
float f = (float)(6);
type_float tf = (type_float)(0); // type_float is a typedef of either float or double
In short, the author tried to make the RHS of each assignment have the same type as the variable being assigned to. The aim, I presume, was to coerce the compiler into making e.g. the 6 in the first example into 6.0f so that no conversion overhead is incurred when that value is copied into the variable.
This would actually be useful for something like the second example, where the proper form of the literal (one of {0.0f, 0.0}) isn't known, or can be changed from a line far away. However, I can see it being problematic if the literal is converted and stored into a temporary and then copied, instead of the conversion happening on copy.
Is this author onto something here? Are all these literals actually being stored with the intended type? Or is this just a massive waste of source file bits? What is the best way to handle these sorts of cases in modern code?
Note: I believe this applies to both C and C++, so I have applied both tags.
This is a complete waste. No modern optimizing compiler keeps track of intermediate values here; it directly initializes the variable with the final, correct value. There is really no point in the casts, since the default conversion should always do the right thing here. And yes, this should apply to both C and C++, and they shouldn't differ much in behavior.
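To make the claim concrete, here is a hedged C++ sketch comparing the cast and the plain form. It assumes type_float is a typedef of float (the question says it could be either), and the names kCast, kPlain, cast_init, plain_init and check_inits are invented for this example.

```cpp
#include <cassert>
#include <type_traits>

typedef float type_float;  // assumption: could equally be double

// With or without the cast, the initializer is a compile-time constant of
// the destination type; the values are identical.
constexpr float kCast = (float)(6);
constexpr float kPlain = 6;
static_assert(kCast == kPlain, "cast and implicit conversion agree");
static_assert(std::is_same<decltype((float)(6)), float>::value, "cast yields float");

constexpr type_float kZeroCast = (type_float)(0);
constexpr type_float kZeroPlain = 0;
static_assert(kZeroCast == kZeroPlain, "same for the typedef case");

// Hypothetical runtime versions of the two spellings.
float cast_init()  { float f = (float)(6); return f; }
float plain_init() { float f = 6;          return f; }

// Hypothetical usage check.
void check_inits() {
    assert(cast_init() == plain_init());
}
```

The sketch only shows that the two spellings produce the same value and type; comparing the generated assembly of cast_init and plain_init is the way to confirm there is no difference in the emitted code for a given compiler.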
I'm trying to figure out how the computer tells apart two variable types that have the same byte size.
If I have a variable that is one byte in size, how is the computer able to tell that it is a character instead of a Boolean variable? Or a character rather than half of a short integer?
The processor doesn't know. The compiler does, and generates the appropriate instructions for the processor to execute to manipulate bytes in memory in the appropriate manner, but to the processor itself a byte of data is a byte of data and it could be anything.
The language gives meaning to these things, but it's an abstraction the processor isn't really aware of.
The computer is not able to do that. The compiler is. You use the char or bool keyword to declare a variable and the compiler produces code that makes the computer treat the memory occupied by that variable in a way that makes sense for that particular type.
A 32-bit integer for example, takes up 4 bytes in memory. To increment it, the CPU has an instruction that says "increment a 32-bit integer at this address". That's what the compiler produces and the CPU blindly executes it. It doesn't care if the address is correct or what binary data is located there.
The size of the instruction for incrementing the variable is another matter. It may very well be another 4 or so bytes, but instructions (code) are stored separately from data. There may be many instructions generated for a program that deal with the same location in memory. It is not possible to formally specify the size of the instructions beforehand because of optimizations that may change the number of instructions used for a given operation. The only way to tell is to compile your program and look at the generated assembly code (the instructions).
Also, take a look at unions in C. They let you use the same memory location for different data types. The compiler lets you do that and produces code for it but you have to know what you're doing.
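The idea that the same bytes mean different things depending on the instructions applied to them can be sketched in C++ as follows. Reading an inactive union member is undefined behavior in C++ (unlike C), so this sketch swaps in std::memcpy instead of a union. It assumes a 32-bit IEEE 754 float with the same byte order as the integer type, and the function names bits_of, float_from and check_bits are made up for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(sizeof(float) == sizeof(std::uint32_t), "assumes a 32-bit float");
static_assert(std::numeric_limits<float>::is_iec559, "assumes IEEE 754 float");

// View the four bytes of a float as an unsigned 32-bit integer.
std::uint32_t bits_of(float value) {
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    return bits;
}

// View the four bytes of an unsigned 32-bit integer as a float.
float float_from(std::uint32_t bits) {
    float value;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}

// Hypothetical usage check: the same bit pattern is 3.0f or 1077936128.
void check_bits() {
    assert(bits_of(3.0f) == 0x40400000u);
    assert(float_from(0x40400000u) == 3.0f);
}
```

Nothing in memory records which interpretation is "right"; the type in the source code decides which instructions the compiler emits for those four bytes.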
Because you specify the type. C++ is a strongly typed language. You can't write $x = 10. :)
It knows
char c = 0;
is a char because of... well, the char keyword.
The computer only sees 1s and 0s. You are in command of what the variable contains.
You can also cast that data into whatever you want.
char foo = 'a';
if ( (bool)(foo) ) // true, because foo is non-zero
{
int sumA = (int)(foo) + (int)(foo);
// sumA == (97 + 97), assuming an ASCII character set
}
Also look into casting to view a memory location as different data types. This can be something as small as a char or as large as an entire struct.
In general, it can't. Look at the restrictions of dynamic_cast<>, which tries to do exactly that. dynamic_cast can only work in the special case of objects derived from polymorphic base classes. That's because such objects (and only those) have extra data in them. Chars and ints do not have this information, so you can't use dynamic_cast on them.
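A short sketch of the special case described above, where run-time type information does exist. The Shape, Circle and Square types and the is_circle and check_is_circle functions are hypothetical names chosen for this example.

```cpp
#include <cassert>

// A virtual function makes Shape polymorphic, so objects of derived types
// carry the extra data that dynamic_cast relies on.
struct Shape { virtual ~Shape() {} };
struct Circle : Shape {};
struct Square : Shape {};

// Ask at run time whether a Shape is really a Circle.
bool is_circle(const Shape& shape) {
    return dynamic_cast<const Circle*>(&shape) != nullptr;
}

// Hypothetical usage check.
void check_is_circle() {
    Circle circle;
    Square square;
    assert(is_circle(circle));
    assert(!is_circle(square));
}
```

A plain char or int has no such hidden data, which is why the same question cannot be asked of a raw byte.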
Recently I changed some code
double d0, d1;
// ... assign things to d0/d1 ...
double result = f(d0, d1);
to
double d[2];
// ... assign things to d[0]/d[1] ...
double result = f(d[0], d[1]);
I did not change any of the assignments to d, nor the calculations in f, nor anything else apart from the fact that the doubles are now stored in a fixed-length array.
However when compiling in release mode, with optimizations on, result changed.
My question is, why, and what should I know about how I should store doubles? Is one way more efficient, or better, than the other? Are there memory alignment issues? I'm looking for any information that would help me understand what's going on.
EDIT: I will try to get some code demonstrating the problem, however this is quite hard as the process that these numbers go through is huge (a lot of maths, numerical solvers, etc.).
However there is no change when compiled in Debug. I will double check this again to make sure but this is almost certain, i.e. the double values are identical in Debug between version 1 and version 2.
Comparing Debug to Release, results have never ever been the same between the two compilation modes, for various optimization reasons.
You probably have a 'fast math' compiler switch turned on, or are doing something in the "assign things" (which we can't see) which allows the compiler to legally reorder calculations. Even though the sequences are equivalent, it's likely the optimizer is treating them differently, so you end up with slightly different code generation. If it's reordered, you end up with slight differences in the least significant bits. Such is life with floating point.
You can prevent this by not using 'fast math' (if that's turned on), or by forcing the ordering through the way you construct the formulas and intermediate values. Even that's hard (impossible?) to guarantee. The question is really "Why is the compiler generating different code for arrays vs numbered variables?", but that's basically an analysis of the code generator.
No, these are equivalent. You have something else wrong.
Check the /fp:precise flags (or equivalent). The processor's floating-point hardware can run in a more accurate or a faster mode, and the default may differ in an optimized build.
With regard to floating-point semantics, these are equivalent. However, it is conceivable that the compiler might decide to generate slightly different code sequences for the two, and that could result in differences in the result.
Can you post a complete code example that illustrates the difference? Without that to go on, anything anyone posts as an answer is just speculation.
To your concerns: memory alignment cannot affect the value of a double, and a compiler should be able to generate equivalent code for either example, so you don't need to worry that you're doing something wrong (at least, not in the limited example you posted).
The first way is more efficient, in a very theoretical way. It gives the compiler slightly more leeway in assigning stack slots and registers. In the second example, the compiler has to pick 2 consecutive slots - except of course if the compiler is smart enough to realize that you'd never notice.
It's quite possible that the double[2] causes the array to be allocated as two adjacent stack slots where it wasn't before, and that in turn can cause code reordering to improve memory access efficiency. IEEE 754 floating-point math doesn't obey the regular math rules: addition is not associative, so a+b+c can differ from c+b+a.
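A minimal sketch of why reordering changes results. The function names sum_left, sum_right and check_sums are invented for this example, the inputs are chosen so that the rounding is unambiguous, and the expected values assume the code is built without a fast-math style option that lets the compiler reassociate the additions.

```cpp
#include <cassert>

// The same three values, added in two different groupings.
double sum_left(double a, double b, double c)  { return (a + b) + c; }
double sum_right(double a, double b, double c) { return a + (b + c); }

// Hypothetical usage check.
void check_sums() {
    // (1e30 + -1e30) is exactly 0, so adding 1.0 gives 1.0.
    assert(sum_left(1e30, -1e30, 1.0) == 1.0);
    // (-1e30 + 1.0) rounds back to -1e30, so the total is 0.0.
    assert(sum_right(1e30, -1e30, 1.0) == 0.0);
}
```

In real numeric code the discrepancy is usually confined to the least significant bits rather than being this dramatic, but the mechanism is the same.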