It makes sense to define mathematical constants as double values, but what happens when one requires float values instead of doubles? Does the compiler automatically interpret the doubles as floats at compile time (so they are actually treated as if they were const floats), or is this conversion made at runtime?
If by "defining", you mean using #define, here's what happens:
Say you have:
#define CONST1 1.5
#define CONST2 1.12312455431461363145134614 // Assume some number too
// precise for float
Now if you have:
float x = CONST1;
float y = CONST2;
you don't get any warning for x because the compiler converts the literal to float at compile time and 1.5 is exactly representable as a float. For y, you may get a warning because CONST2 cannot be represented exactly in a float, but the compiler converts it to float anyway.
If by "defining", you mean using const variables, here's what happens:
Say you have
const double CONST1=1.5;
const double CONST2=1.12312455431461363145134614; // Assume some number too
// precise for float
Now if you have:
float x = CONST1;
float y = CONST2;
there is no way for the compiler to know the values of CONST1 and CONST2(*), and it therefore cannot interpret the values as float at compile time. You will be given two warnings about possible loss of data and the conversion will be done at runtime.
(*) Actually there is a way. Since the values are const, the optimizer may decide not to take a variable for them, but replace the values throughout the code. This could get complicated though, as you may pass the address to these variables around, so the optimizer may decide not to do that. That is, don't count on it.
Note that all of this holds for any conversion between basic types. If you have
#define CONST3 1
then you think CONST3 is int, but if you put it in a float, it would become float at compile time, or if you put it in a char, it would become char at compile time.
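To make the above a bit more concrete, here is a minimal sketch, assuming a C++11 compiler. The names kConstDouble and kConstFloat are made up for this example. It checks that a literal has its own type regardless of where it ends up, and shows that constexpr can be used to force the double-to-float conversion to happen at compile time:

```cpp
#include <type_traits>

#define CONST1 1.5  // expands to a double literal, wherever it is used
#define CONST3 1    // expands to an int literal

// The literal's own type does not depend on the variable it is assigned to.
static_assert(std::is_same<decltype(CONST1), double>::value, "1.5 is a double");
static_assert(std::is_same<decltype(CONST3), int>::value, "1 is an int");

// constexpr requires the initializer to be evaluated at compile time,
// so the double -> float conversion below cannot be deferred to runtime.
constexpr double kConstDouble = 1.5;                             // hypothetical name
constexpr float kConstFloat = static_cast<float>(kConstDouble);  // hypothetical name
static_assert(kConstFloat == 1.5f, "1.5 is exactly representable as a float");
```

With plain const double instead of constexpr, whether the conversion is folded at compile time is up to the optimizer, as the footnote above says.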
Related
I have a constant integer, steps, which is calculated using the floor function of the quotient of two other constant variables. However, when I attempt to use this as the length of an array, visual studio tells me it must be a constant value and the current value cannot be used as a constant. How do I make this a "true" constant that can be used as an array length? Is the floor function the problem, and is there an alternative I could use?
const int simlength = 3.154*pow(10,7);
const float timestep = 100;
const int steps = floor(simlength / timestep);
struct body bodies[bcount];
struct body {
string name;
double mass;
double position[2];
double velocity[2];
double radius;
double trace[2][steps];
};
It is not possible with the standard library's std::pow and std::floor functions, because they are not constexpr-qualified.
You can probably replace std::pow with a hand-written implementation my_pow that is marked constexpr. Since you are just trying to take the power of integers, that shouldn't be too hard. If you are only using powers of 10, floating point literals may be written in the scientific notation as well, e.g. 1e7, which makes the pow call unnecessary.
The floor call is not needed since float/double to int conversion already does flooring implicitly. Or, more correctly, it truncates, which for non-negative values is equivalent to flooring.
Then you should also replace the const with constexpr in the variable declarations to make sure that the variables are usable in constant expressions:
constexpr int simlength = 3.154*my_pow(10,7); // or `3.154e7`
constexpr float timestep = 100;
constexpr int steps = simlength / timestep;
Theoretically only float requires this change, since there is a special exception for const integral types, but it seems more consistent this way.
Also, I have a feeling that there is something wrong with the types of your variables. A length and a step count should not be determined by floating-point operations and types, but by integer types and operations alone. Floating-point operations are not exact and introduce errors relative to the mathematically precise calculations on the real numbers. It is easy to get unexpected off-by-one or worse errors this way.
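A hand-written my_pow could look roughly like the sketch below. This is only one possible version, assuming C++11 (hence the single-return recursive form) and integer arguments. The Trace alias is a made-up name, and 3.154e7 is rewritten as 3154 * 10^4 so that everything stays in integer arithmetic:

```cpp
// Integer power usable in constant expressions (C++11 single-return form).
constexpr long long my_pow(long long base, unsigned exp) {
    return exp == 0 ? 1 : base * my_pow(base, exp - 1);
}

constexpr long long simlength = 3154 * my_pow(10, 4);  // 3.154e7, integers only
constexpr long long timestep = 100;
constexpr long long steps = simlength / timestep;      // exact integer division

// steps is a constant expression, so it can be used as an array bound.
// Trace is a hypothetical alias; no storage is allocated by a type alias.
using Trace = double[2][steps];
static_assert(sizeof(Trace) == 2 * steps * sizeof(double), "bound was accepted");
```

Since no floating-point value is involved, there is no truncation step that could be off by one.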
You cannot define an array of a class type before defining the class.
Solution: Define body before defining bodies.
Furthermore, you cannot use undefined names.
Solution: Define bcount before using it as the size of the array.
Is the floor function the problem, and is there an alternative I could use?
std::floor is one problem. There's an easy solution: Don't use it. Converting a floating point number to an integer performs a similar operation implicitly (it truncates toward zero, so the behaviour differs for negative numbers).
std::pow is another problem. It cannot be replaced as trivially in general, but in this case we can use a floating point literal in scientific notation instead.
Lastly, a non-constexpr floating point variable isn't a compile time constant. Solution: Use constexpr.
Here is a working solution:
constexpr int simlength = 3.154e7;
constexpr float timestep = 100;
constexpr int steps = simlength / timestep;
P.S. trace is a very large array. I would recommend against using such large member variables, because it's easy for the user of the class not to notice such a detail, and they are likely to create instances of the class in automatic storage. This is a problem because such large objects in automatic storage are prone to cause stack overflow errors. Using std::vector instead of an array is an easy solution. If you do use std::vector, then as a side effect the requirement of a compile time constant size disappears and you will no longer have trouble using std::pow etc.
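For what it's worth, a std::vector version might look something like this. It is a sketch only: the split into trace_x and trace_y and the constructor taking the step count are my own assumptions, not something from the question:

```cpp
#include <cstddef>
#include <string>
#include <vector>

struct Body {
    std::string name;
    double mass = 0.0;
    double position[2] = {0.0, 0.0};
    double velocity[2] = {0.0, 0.0};
    double radius = 0.0;
    std::vector<double> trace_x;  // element storage lives on the heap
    std::vector<double> trace_y;

    // The step count is an ordinary runtime value here; no constexpr needed.
    explicit Body(std::size_t steps) : trace_x(steps), trace_y(steps) {}
};

// The object itself stays small no matter how many steps it stores.
static_assert(sizeof(Body) < 1024, "Body is cheap to place in automatic storage");
```

A Body created as a local variable then only puts a few dozen bytes on the stack, and the step count can come from std::pow, std::floor, or user input.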
Because simlength is 3.154*10-to-the-7th, and because timestep is 10-squared, then the steps variable's value can be written as:
3.154e7 / 1e2 == 3.154e5
And, adding a type-cast, you should be able to write the array as:
double trace[2][(int)(3.154e5)];
Note that this is HIGHLY IRREGULAR, and should have extensive comments describing why you did this.
Try switching to constexpr:
constexpr int simlength = 3.154e7;
constexpr float timestep = 1e2;
constexpr int steps = simlength / timestep;
struct body {
string name;
double mass;
double position[2];
double velocity[2];
double radius;
double trace[2][steps];
};
Basically an integer variable should allow only integer values to be set for its variable. Then how come such special words as follows are allowed?
int a = 200L;
int a = 200U;
int a = 200F;
I found this when i run the program, it ran perfectly without giving any error. Other letters are not allowed as expected. But why these?
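L, U and F mean long, unsigned and float respectively. (Strictly speaking, F is only valid on a floating-point literal such as 200.0F; a standard C++ compiler rejects 200F.)
So the code means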
int a = (long) 200;
int a = (unsigned) 200;
int a = (float) 200;
What you do is called implicit conversion.
If you are using the gcc compiler, you can add the -Wconversion option (not part of -Wall) to check for any implicit conversion that may alter the value.
Conversions from signed to unsigned are not warned about by default, so you need to activate -Wsign-conversion.
An explicit conversion will not be warned about by either of those two options:
int percent = (int)(((int)4.1) * .5);
Two different things are going on here.
1) Some letters take on meaning when stuck on the end of a number. 'l' is for long, 'u' is for unsigned, and 'f' is for float.
"Long" is generally 64 bits wide vs int's 32 bits... but that can vary wildly from machine to machine. DO NOT depend on the bit width of int and long.
"Unsigned" means it doesn't bother to track positive or negative values... assuming everything is non-negative. This about doubles how high an integer can go. Look up "two's complement" for further information.
"Float" means "floating point". Non-whole numbers: 1.5, 3.1415, etc. They can be very large, or very precise, but not both. Floats are 32 bits on virtually every current platform (the standard does not strictly guarantee it). "Double" is typically a 64-bit floating point value, which permits a much wider range and more precision.
2) Type Coercion, pronounced "co ER shun".
The compiler knows how to convert (coerce) from long to int, unsigned to int, or float to int. They're all just numbers, right? Note that converting from float to int "truncates" (drops) anything after the decimal point. ((int)3.00000001) == 3. ((int)2.9999999) == 2
If you dial your warnings up to max sensitivity, I believe those statements will all trigger warnings because all those conversions could potentially lose data... though the exact phrasing of that warning will vary from compiler to compiler.
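Both points can be checked at compile time. The following is a small sketch, assuming C++11; the variable names are invented for the example:

```cpp
#include <type_traits>

// 1) The suffix decides the type of the literal itself.
static_assert(std::is_same<decltype(200L), long>::value, "L: long");
static_assert(std::is_same<decltype(200U), unsigned int>::value, "U: unsigned int");
static_assert(std::is_same<decltype(200.0F), float>::value, "F: float (needs a floating literal)");

// 2) Coercion to int happens when the value is stored.
constexpr int from_long = 200L;                            // long -> int, value kept
constexpr int just_over = static_cast<int>(3.00000001);    // fraction dropped
constexpr int just_under = static_cast<int>(2.9999999);    // truncates, does not round
constexpr int negative = static_cast<int>(-2.5);           // truncation is toward zero

static_assert(from_long == 200, "value survives the conversion");
static_assert(just_over == 3 && just_under == 2, "truncation, not rounding");
static_assert(negative == -2, "toward zero, so not the same as floor");
```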
Bonus Information:
You can trigger this same behavior (accidentally) with classes.
struct Foo {
Foo(int bar) {...}
};
Foo baz = 42;
The compiler will treat the above constructor as an option when looking to convert from int to Foo. It is also willing to apply a built-in conversion first... so Foo qux = 3.14159; would also compile (the double is truncated to int and then passed to the constructor). What the compiler will not do implicitly is chain two user-defined conversions. So if you have some other class that takes a Foo as its only constructor param, you can construct it directly from something that can be coerced to a Foo, but you cannot copy-initialize it through both constructors:
struct Corge {
    Corge(Foo foo) {...}
};
Corge grault(1.2345); // compiles: double to int, int to Foo, then Corge(Foo)... you almost certainly didn't intend this
Corge garply = 1.2345; // won't compile: needs two user-defined conversions (int to Foo, Foo to Corge)
The first case still hides two layers of coercion behind one constructor call: double to int, and int to Foo. Bleh!
You can block this with the explicit keyword:
struct Foo {
explicit Foo(int bar) {...}
};
Foo baz = 1; // won't compile
I wish they'd made explicit the default and used some keyword to define conversion constructors instead, but that change would almost certainly break someone's code, so it'll never happen.
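If you want to see the difference without reading compiler errors, the standard type traits can report it. This is a sketch assuming C++11; StrictFoo is a made-up name for the explicit variant, and the constructor bodies are left empty:

```cpp
#include <type_traits>

struct Foo {
    Foo(int) {}                 // converting constructor: int -> Foo may happen implicitly
};

struct StrictFoo {              // hypothetical name, same class with explicit added
    explicit StrictFoo(int) {}
};

static_assert(std::is_convertible<int, Foo>::value, "Foo baz = 42; compiles");
static_assert(std::is_convertible<double, Foo>::value, "Foo qux = 3.14159; compiles too");
static_assert(!std::is_convertible<int, StrictFoo>::value, "StrictFoo baz = 1; is rejected");
static_assert(std::is_constructible<StrictFoo, int>::value, "StrictFoo baz(1); still works");
```

std::is_convertible asks whether the implicit (copy-initialization) form is allowed, while std::is_constructible asks about the direct form, which explicit leaves untouched.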
What happens is that the suffix tells the compiler which type the literal has, and the value is then converted to the type of the variable. That is to say:
int a = 200L; // It's like saying: Hey C++, treat this whole number as a long
int a = 200U; // And this one as unsigned
int a = 200F; // And this one as float
There is no error because the compiler understands that these letters at the end indicate the type of the literal, and it knows how to convert that type to int.
I would like to quickly and correctly change the use of float to double and was advised this could be done through the correct usage of pre-processor definition. Unfortunately I've never done this, but I did however run into the following answer on here:
Switching between float and double precision at compile time
I was therefore wondering, if this method is how I should proceed, especially given how large the scope of the program I am dealing with is, or do I use an alternative method? If an alternative method, what recommendations would everyone have?
You don't want to use the pre-processor, you want to use typedef or using.
EDIT: Something like:
#ifdef USE_DOUBLES
typedef double my_float_type;
#else
typedef float my_float_type;
#endif
And then use my_float_type as the type of all your floating point variables in question. Then you can #define USE_DOUBLES or not from your make system or a common header file, etc.
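The same switch can be written with using instead of typedef. Here is a sketch, assuming C++11; real_t and half are hypothetical names chosen for this example, and USE_DOUBLES is the macro from above:

```cpp
#include <type_traits>

#ifdef USE_DOUBLES
using real_t = double;   // hypothetical alias name
#else
using real_t = float;
#endif

// Convert literals to real_t explicitly so the arithmetic follows the
// precision selected by the alias instead of silently becoming double.
constexpr real_t half(real_t x) { return x / real_t(2); }

static_assert(std::is_floating_point<real_t>::value, "real_t is float or double");
static_assert(half(real_t(3)) == real_t(1.5), "1.5 is exact in both precisions");
```

Code written against real_t compiles unchanged in either configuration; only the build flag decides the precision.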
The design in your question is fundamentally flawed because #defining keywords (e.g. double and float) results in undefined behavior, and you can't re-typedef built-in types (e.g. double and float).
I was therefore wondering, if this method is how I should proceed
The accepted answer to the linked question is a good way to do it: Use a typedef depending on the preprocessor definition. Being able to switch between float and double types is sometimes useful. If you think you need to do it, then this is how you should proceed.
especially given how large the scope of the program I am dealing with is
The larger your program is, the harder it is to change all the variables whose type you want to switch unless you use the typedef.
So do I, as the link mentioned previously, simply do typedef double float for the first ifdef clause, and then for the else do I just do typedef float or would it be typedef float float?
No, don't do that.
Instead, modify the existing code and change all types that should be switchable to use a typedef instead of float.
If you were to do that (but don't), then typedef float float is pointless. You simply wouldn't need the else side of the ifdef.
do I need to change the f appended to instantiated values for the floats already in place or can they remain?
You should use the highest-precision literal (double in this case) to avoid losing precision. But you should always convert the literal to the switchable type before using it in an expression, to avoid precision-related bugs.
If you have C++11 support, then you could use a user-defined literal for the conversion:
// the typedef
#ifdef USE_DOUBLE_PRECISION
typedef double float_t;
#else
typedef float float_t;
#endif
// user-defined literal (written without a space between "" and _f)
constexpr float_t operator""_f(long double value) {
    return static_cast<float_t>(value);
}
// example
float_t approximate_pi = 3.14_f;
Otherwise,
I'm often using the wrong literals in expressions, e.g. dividing a float by an int, like this:
float f = read_f();
float g = f / 2;
I believe that the compiler will in this case first convert the int literal (2) to float, and then apply the division operator. GCC and Clang have always let stuff like that pass, but Visual C++ warns about an implicit conversion. So I have to write it like this:
float f = read_f();
float g = f / 2.0f;
That got me wondering: Should I always use the appropriate literals for float, double, long etc.? I normally use int literals whenever I can get away with it, but I'm not sure if that's actually a good idea.
Is this a likely cause of subtle errors?
Is this only an issue for expressions or also for function parameters?
Are there warning levels for GCC or Clang that warn about such implicit conversions?
How about unsigned int, long int etc?
You should always explicitly indicate the type of literal that you intend to use. This will prevent problems when, for example, this sort of code:
float foo = 9.0f;
float bar = foo / 2;
changes to the following, truncating the result:
int foo = 9;
float bar = foo / 2;
It's a concern with function parameters as well when you have overloading and templates involved.
I know gcc has -Wconversion but I can't recall everything that it covers.
For integer values that fit in int I usually don't qualify those for long or unsigned as there is usually much less chance there for subtle bugs.
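To illustrate the truncation case, here is a small sketch (assuming C++11; the function names are made up) that wraps the three variants in constexpr functions so the results can be checked at compile time:

```cpp
// foo is a float, so the int literal 2 is converted to 2.0f before dividing.
constexpr float half_of_float(float foo) { return foo / 2; }

// foo is an int, so this is integer division; only the result becomes a float.
constexpr float half_of_int(int foo) { return foo / 2; }

// A float literal forces foo to be converted to float before dividing.
constexpr float half_of_int_fixed(int foo) { return foo / 2.0f; }

static_assert(half_of_float(9.0f) == 4.5f, "float / int literal keeps the fraction");
static_assert(half_of_int(9) == 4.0f, "int / int literal truncates first");
static_assert(half_of_int_fixed(9) == 4.5f, "int / float literal keeps the fraction");
```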
There's pretty much never an absolutely correct answer to a "should" question. Who's going to use this code, and for what? That's relevant here. But also, particularly for anything to do with floats, it's good to get into the habit of specifying exactly the operations you require. float*float is done in single precision; anything with a double operand is done in double precision. An int literal such as 2 is converted to the type of the other operand, whereas 2.0 is a double and turns the whole operation into a double one, so you're specifying different operations depending on the literal.
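You can ask the compiler which type an expression has, and that type is the precision the operation is specified in. A minimal sketch, assuming C++11:

```cpp
#include <type_traits>

// The type of each expression shows which conversion was applied to the operands.
static_assert(std::is_same<decltype(1.0f * 1.0f), float>::value, "float * float is float");
static_assert(std::is_same<decltype(1.0f / 2), float>::value, "float / int: the int becomes float");
static_assert(std::is_same<decltype(1.0f / 2.0), double>::value, "float / double: the float becomes double");
static_assert(std::is_same<decltype(1.0f / 2.0f), float>::value, "float / float stays float");
```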
The best answer here is What Every Computer Scientist Should Know About Floating-Point Arithmetic. I'd say don't tl;dr it, there are no simple answers with floating point.
Sometimes I have to convert from an unsigned integer value to a float. For example, my graphics engine takes in a SetScale(float x, float y, float z) with floats and I have an object that has a certain size as an unsigned int. I want to convert the unsigned int to a float to properly scale an entity (the example is very specific but I hope you get the point).
Now, what I usually do is:
unsigned int size = 5;
float scale = float(size);
My3DObject->SetScale(scale , scale , scale);
Is this good practice at all, under certain assumptions (see Notes)? Is there a better way than to litter the code with float()?
Notes: I cannot touch the graphics API. I have to use the SetScale() function which takes in floats. Moreover, I also cannot touch the size, it has to be an unsigned int. I am sure there are plenty of other examples with the same 'problem'. The above can be applied to any conversion that needs to be done and you as a programmer have little choice in the matter.
My preference would be to use static_cast:
float scale = static_cast<float>(size);
but what you are doing is functionally equivalent and fine.
There is an implicit conversion from unsigned int to float, so the cast is strictly unnecessary.
If your compiler issues a warning, then there isn't really anything wrong with using a cast to silence the warning. Just be aware that if size is very large it may not be representable exactly by a float.
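As an illustration of that last caveat, here is a sketch assuming C++11 and the usual IEEE 754 float with a 24-bit significand; to_scale is a hypothetical helper, not part of any graphics API:

```cpp
// Hypothetical helper that keeps the cast in one place.
constexpr float to_scale(unsigned int size) {
    return static_cast<float>(size);
}

// Small sizes convert exactly.
static_assert(to_scale(5u) == 5.0f, "5 is exactly representable");

// 2^24 + 1 = 16777217 has no exact float representation under the stated
// assumption, so the conversion yields a neighbouring representable value.
static_assert(static_cast<double>(to_scale(16777217u)) != 16777217.0,
              "large values may not round-trip");
```

It would then be called as My3DObject->SetScale(to_scale(size), to_scale(size), to_scale(size)); which avoids repeating the cast syntax at every call site.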