C++ compilers emit warnings when a local variable may be uninitialized on first use. However, sometimes I know that the variable will always be written before it is read, so I do not need to initialize it. When I do this, the compiler emits a warning, of course. Since my team is building with -Werror, the code will not compile. How can I turn off this warning for specific local variables? I have the following restrictions:
I am not allowed to change compiler flags
The solution must work on all compilers (i.e., no gnu-extensions or other compiler specific attributes)
I want to use this only on specific local variables. Other uninitialized locals should still trigger a warning
The solution should not generate any instructions.
I cannot alter the class of the local variable. I.e., I cannot simply add a "do nothing" constructor.
Of course, the easiest solution would be to initialize the variable. However, the variable is of a type that is costly to initialize (even default initialization is costly) and the code is used in a very hot loop, so I do not want to waste the CPU cycles for an initialization that is guaranteed to be overwritten before it is read anyway.
So is there a platform-independent, compiler-independent way of telling the compiler that a local variable does not need to be initialized?
Here is some example code that might trigger such a warning:
void foo(){
T t;
for(int i = 0; i < 100; i++){
if (i == 0) t = ...;
if (i == 1) doSomethingWith(t);
}
}
As you see, the first loop cycle initializes t and the second one uses it, so t will never be read uninitialized. However, the compiler is not able to deduce this, so it emits a warning. Note that this code is quite simplified for the sake of brevity.
My answer will recommend another approach: instead of disabling the warning code, just do some reformulation on the implementation. I see two approaches:
First Option
You can use pointers instead of a real object and guarantee that it will be initialized just when you need it, something like:
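std::unique_ptr<T> t;
for(int i=0; i<100; i++)
{
    // std::unique_ptr has no empty(); test the pointer itself.
    // The braces keep both statements under the i == ... condition.
    if(i == 0) { if(!t) t = std::unique_ptr<T>(new T); *t = ...; }
    if(i == 1) { if(!t) t = std::unique_ptr<T>(new T); doSomethingWith(*t); }
}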
Note that when i==0 you probably don't need to default-construct t at all. I can't guess how your operator= is implemented, but I suppose you are assigning from an object that has already been constructed in the code you omitted in the ... segment.
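If C++17 is available, std::optional offers a similar deferred construction without the heap allocation. The following is only a sketch under assumptions: Costly is a hypothetical stand-in for T, and run() is an invented name. std::optional keeps an "engaged" flag, so this does not strictly meet the "no instructions" requirement, and some compilers may still report maybe-uninitialized diagnostics around it.

```cpp
#include <optional>
#include <vector>

// Hypothetical stand-in for the costly type T from the question.
struct Costly {
    std::vector<int> data;
    explicit Costly(int n) : data(1000, n) {}
    int first() const { return data[0]; }
};

// Invented helper: the object is constructed only in the iteration
// that actually produces its value.
int run() {
    std::optional<Costly> t;            // no Costly is constructed here
    int result = -1;
    for (int i = 0; i < 100; ++i) {
        if (i == 0) t.emplace(42);      // constructs the Costly in place
        if (i == 1) result = t->first();
    }
    return result;
}
```

Calling run() constructs exactly one Costly, in the first iteration, and reads it in the second.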
Second Option
As your code experiences such a large performance loss, I infer that T is not a basic type (int, float, etc.). So, instead of using pointers, you can reimplement your class T so that the heavy work happens in an init method rather than in the constructor. You can use a boolean to indicate whether the object still needs initialization:
class FooClass
{
public:
    FooClass() : initialized_(false) { ... }
    //Class implementation
    void init()
    {
        //Do your heavy initialization code here.
        initialized_ = true;
    }
    bool initialized() const { return initialized_; }
private:
    // A data member cannot share its name with a member function,
    // hence the trailing underscore.
    bool initialized_;
};
Then you will be able to write it like this:
T t;
for(int i=0; i<100; i++)
{
    // The braces keep both statements under the i == ... condition.
    if(i == 0) { if(!t.initialized()) t.init(); t = ...; }
    if(i == 1) { if(!t.initialized()) t.init(); doSomethingWith(t); }
}
If the code is not very complex, I usually unroll one of the iterations:
void foo(){
T t;
t = ...;
for(int i = 1; i < 100; i++){
doSomethingWith(t);
}
}
Related
Consider the following code class:
class A {
public:
int number;
vector<int> powers;
A () {
number = 5;
powers.resize(100);
}
long long getPower(int x) {
return powers[x];
}
void precompute() {
powers[0] = 1;
for (int i = 1; i < 100; i++) {
powers[i] = powers[i - 1] * number;
}
}
};
In the class A, we have a vector called powers and an integer number with the property that powers[k] stores the quantity number^k after the precompute() function has been called. If we want to answer several queries of the form "Compute number^x for some integer 0 <= x < 100", it makes sense to precompute all of these powers and return them when we need them as a constant-time operation (note: this is not a problem that I am actually facing. I have made this problem up for the sake of an example. Please ignore the fact that number^x would exceed the maximum value of a long long).
However, there is one issue: the user must have called the precompute() function before calling the getPower() function.
This leads me to the following question: Is there some nice way to enforce the constraint that some function A can only be called after function B is called? Of course, one could just use a flag variable, but I am wondering if there is a more elegant way to do this so that it becomes a compile-time error.
Another option would be to always call the precompute() function in the constructor, but this may not be an optimal solution if we weren't always going to call precompute() in the first place. If calling precompute() is sufficiently expensive computationally, then this method would not be preferable.
I would prefer getting a compile-time error over a runtime error, but I am open to all approaches. Does anyone have any ideas?
One solution to your problem would be to call the precompute function in the constructor of class A.
Alternatively, as has already been suggested in the comments section, you could make the function getPower check a flag which specifies whether precompute has already been called, and if not, either perform the call itself or print an error message.
I can't think of a way to force this check to be done at compile time. However, if you want to eliminate this run-time check from release builds, you could use conditional compilation so that these checks are only included in debug builds, for example by using the assert macro or by using preprocessor directives, like this:
// note that NDEBUG is normally only defined in release builds, not debug builds,
// so the check must be compiled only when NDEBUG is NOT defined
#ifndef NDEBUG
//check for flag here and print error message if flag has unexpected value
#endif
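As a rough illustration of the assert variant mentioned above, the sketch below guards getPower with a flag that precompute sets. PowerTable and ready are invented names, and the table is shortened to 20 entries of long long so the values do not overflow; neither choice comes from the question.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical variant of class A with a debug-only ordering check.
class PowerTable {
public:
    PowerTable() : powers(20) {}

    void precompute() {
        powers[0] = 1;
        for (std::size_t i = 1; i < powers.size(); ++i)
            powers[i] = powers[i - 1] * number;
        ready = true;
    }

    long long getPower(int x) const {
        // assert expands to nothing when NDEBUG is defined,
        // so release builds do not pay for the check.
        assert(ready && "precompute() must be called before getPower()");
        return powers[x];
    }

private:
    long long number = 5;
    std::vector<long long> powers;
    bool ready = false;
};
```

In a debug build, calling getPower before precompute aborts with the message in the assertion; in a release build the check disappears.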
As an alternative, to enforce the ordering, you might make the dependency explicit in the types.
For example:
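class A; // forward declaration, A is defined below

class PowerGetter
{
    friend class A; // only A may construct a PowerGetter
    const A& a;
    explicit PowerGetter(const A& a) : a(a) {}
public:
    long long getPower(int x) const; // defined once A is complete
};
class A {
public:
    int number = 5;
    std::vector<int> powers = std::vector<int>(100);
    A() = default;
    PowerGetter precompute() {
        powers[0] = 1;
        for (int i = 1; i < 100; i++) {
            powers[i] = powers[i - 1] * number;
        }
        return PowerGetter(*this);
    }
};
inline long long PowerGetter::getPower(int x) const {
    return a.powers[x];
}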
Then to call getPower we need a PowerGetter which can only be obtained by calling precompute first.
For that contrived example, it would be simpler to do the initialization in A's constructor, though.
class RF
{
public:
bitset<32> ReadData1, ReadData2;
RF()
{
Registers.resize(32);
Registers[0] = bitset<32> (0);
}
void ReadWrite(bitset<5> RdReg1, bitset<5> RdReg2, bitset<5> WrtReg, bitset<32> WrtData, bitset<1> WrtEnable)
{
// implement the function yourself.
}
void OutputRF() // write RF results to file
{
ofstream rfout;
rfout.open("RFresult.txt",std::ios_base::app);
if (rfout.is_open())
{
rfout<<"A state of RF:"<<endl;
for (int j = 0; j<32; j++)
{
rfout << Registers[j]<<endl;
}
}
else cout<<"Unable to open file";
rfout.close();
}
private:
vector<bitset<32> >Registers;
};
RF() is the constructor, but since all it does is resize Registers to 32, you can remove it if you specify that initialization on the member directly, like this:
vector<bitset<32> > Registers = vector<bitset<32> >(32);
Then Registers will be constructed with size 32x32 bits by default, and all the bits will be zero as well, so you can remove the entire RF() function.
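Note: At first I thought you could use vector<bitset<32> > Registers{32}, but with braces the initializer-list constructor is preferred, so that creates a vector with a single element holding the value 32 instead of 32 zeroed elements. Thanks to Fureeish for that.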
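The difference between the two spellings can be checked with a small sketch; the function names sizeWithParens and sizeWithBraces are invented for illustration.

```cpp
#include <bitset>
#include <cstddef>
#include <vector>

// Parentheses select the "count" constructor: 32 value-initialized elements.
std::size_t sizeWithParens() {
    std::vector<std::bitset<32> > v(32);
    return v.size();
}

// Braces prefer the initializer-list constructor: the 32 is converted
// to a single bitset<32> holding the value 32.
std::size_t sizeWithBraces() {
    std::vector<std::bitset<32> > v{32};
    return v.size();
}
```

This is why the member initializer is written as vector<bitset<32> >(32) rather than with braces.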
The short answer to your question is that, yes, for your current program, it is necessary.
The RF() function in this case is the function called when we initialize the RF object, eg.
RF new_RF;
Would run the RF() function and set things up. For this reason, it is called a 'constructor', because it helps you 'construct' your class.
In your case, the constructor is necessary for your program because it sets up your Registers variable, so that the code below from your OutputRF() function can run.
for (int j = 0; j<32; j++)
{
rfout << Registers[j]<<endl;
}
It's also useful because we can use it to set up many things, for example, if our RF() constructor looked like this:
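RF(int a)
{
    Registers.resize(a);
    // the bitset width is a template argument and must be a compile-time
    // constant, so it stays 32 and cannot depend on a
    Registers[0] = bitset<32> (0);
}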
It would instead resize the RF Registers to int a. You can look here for a more in-depth tutorial about constructors.
Hope that helps!
It's my first year of using C++ and learning on the way. I'm currently reading up on Return Value Optimizations (I use C++11 btw). E.g. here https://en.wikipedia.org/wiki/Return_value_optimization, and immediately these beginner examples with primitive types spring to mind:
int& func1()
{
int i = 1;
return i;
}
//error, 'i' was declared with automatic storage (in practice on the stack(?))
//and is undefined by the time function returns
...and this one:
int func1()
{
int i = 1;
return i;
}
//perfectly fine, 'i' is copied... (to previous stack frame... right?)
Now, I get to this and try to understand it in the light of the other two:
Simpleclass func1()
{
return Simpleclass();
}
What actually happens here? I know most compilers will optimise this, what I am asking is not 'if' but:
how the optimisation works (the accepted response)
does it interfere with storage duration: stack/heap (Old: Is it basically random whether I've copied from stack or created on heap and moved (passed the reference)? Does it depend on created object size?)
is it not better to use, say, explicit std::move?
You won't see any effect of RVO when returning ints.
However, when returning large objects like this:
struct Huge { ... };
Huge makeHuge() {
Huge h { x, y, z };
h.doSomething();
return h;
}
The following code...
auto h = makeHuge();
... after RVO would be implemented something like this (pseudo code) ...
h_storage = allocate_from_stack(sizeof(Huge));
makeHuge(addressof(h_storage));
auto& h = *properly_aligned(h_storage);
... and makeHuge would compile to something like this...
void makeHuge(Huge* h_storage) // in fact this address can be
// inferred from the stack pointer
// (or just 'known' when inlining).
{
Huge* phuge = new (h_storage) Huge(x, y, z); // placement new: construct directly in the caller's storage
phuge->doSomething();
}
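One way to observe this, and to see what an explicit std::move on the return statement does, is to count constructor calls. The sketch below is only an illustration: Tracker, makePlain and makeMoved are invented names. Returning the local by name lets the compiler construct it directly in the caller's storage, and if it does not, the move constructor is used rather than the copy constructor. Wrapping the local in std::move means the return expression is no longer just the name of a local object, so that elision is not allowed and a move is always performed.

```cpp
#include <utility>

// Hypothetical type that counts how often it is copied or moved.
struct Tracker {
    static int copies;
    static int moves;
    Tracker() = default;
    Tracker(const Tracker&) { ++copies; }
    Tracker(Tracker&&) noexcept { ++moves; }
};
int Tracker::copies = 0;
int Tracker::moves = 0;

// Returning the local by name: eligible for (named) return value optimization.
Tracker makePlain() {
    Tracker t;
    return t;
}

// Explicit std::move: forces at least one move construction.
Tracker makeMoved() {
    Tracker t;
    return std::move(t);
}
```

In neither case is the copy constructor called, so the explicit std::move gains nothing here and only removes the chance of constructing the object in place.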
I thought I would save some time if I declared the loop variable once as a class member:
struct Foo {
int i;
void method1() {
for(i=0; i<A; ++i) ...
}
void method2() {
for(i=0; i<B; ++i) ...
}
} foo;
however, this seems to be about 20% faster
struct Foo {
void method1() {
for(int i=0; i<A; ++i) ...
}
void method2() {
for(int i=0; i<B; ++i) ...
}
} foo;
in this code
void loop() { // Arduino loops
foo.method1();
foo.method2();
}
Can you explain the performance difference?
(I need to run many simple parallel "processes" on Arduino, where such micro-optimization makes a difference.)
When you declare your loop variable inside a loop, it is scoped very narrowly. The compiler is free to keep it in a register all the time, so it does not get committed to memory even once.
When you declare your loop variable as an instance variable, the compiler has no such flexibility. It must keep the variable in memory, in case some of your methods would want to examine its state. For example, if you do this in your first code example
void method2() {
for(i=0; i<B; ++i) { method3(); }
}
void method3() {
printf("%d\n", i);
}
the value of i in method3 must be changing as the loop progresses. The compiler has no way around committing all its side effects to memory. Moreover, it cannot assume that i stayed the same when you come back from method3, further increasing the number of memory accesses.
Dealing with updates in memory requires a lot more CPU cycles than performing updates to register-based variables. That is why it is always a good idea to keep your loop variables scoped down to the loop level.
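If the final value of the counter really has to be visible as a member, one possible pattern is to loop on a local and store the result once afterwards. This is a sketch only: Worker, step and total are invented names, and whether it is measurably faster depends on the compiler and on what the loop body calls.

```cpp
// Hypothetical class: the loop runs on a local counter, and the member
// is written a single time after the loop has finished.
struct Worker {
    int i = 0;        // kept only so other code can read the final value
    long total = 0;

    void step(int k) { total += k; }   // stand-in for the real loop body

    void run(int n) {
        for (int k = 0; k < n; ++k) {  // k can live in a register
            step(k);
        }
        i = n;                         // one store instead of one per iteration
    }
};
```

Note that, unlike the member-variable loop, code called from the loop body can no longer observe the counter through the member while the loop is running.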
Can you explain the performance difference?
The most plausible explanation I could come up with for this performance difference is:
The data member i is part of the object foo, so it lives in memory (here, global memory) and cannot be kept in a register all the time. Operations on it are therefore slower than on the local loop variable i. Its scope is also very broad: the data member i has to be visible to all the member functions of the class.
@DarioOO adds:
In addition, the compiler is not free to store it temporarily in a register, because method3() could throw an exception, leaving the object in an unwanted state. In theory nothing prevents you from writing int k=this->i; for(k=0;k<A;k++)method3(); this->i=k;. That code would be almost as fast as the local variable, but you have to take into account what happens when method3() throws. (I believe that when there is a guarantee that it does not throw, the compiler will optimize this with -O3 or -O4; to be verified.)
With the code below, the question is:
If you use the "returnIntVector()" function, is the vector copied from the local to the "outer" (global) scope? In other words is it a more time and memory consuming variation compared to the "getIntVector()"-function? (However providing the same functionality.)
#include <iostream>
#include <vector>
using namespace std;
vector<int> returnIntVector()
{
vector<int> vecInts(10);
for(unsigned int ui = 0; ui < vecInts.size(); ui++)
vecInts[ui] = ui;
return vecInts;
}
void getIntVector(vector<int> &vecInts)
{
for(unsigned int ui = 0; ui < vecInts.size(); ui++)
vecInts[ui] = ui;
}
int main()
{
vector<int> vecInts = returnIntVector();
for(unsigned int ui = 0; ui < vecInts.size(); ui++)
cout << vecInts[ui] << endl;
cout << endl;
vector<int> vecInts2(10);
getIntVector(vecInts2);
for(unsigned int ui = 0; ui < vecInts2.size(); ui++)
cout << vecInts2[ui] << endl;
return 0;
}
In theory, yes it's copied. In reality, no, most modern compilers take advantage of return value optimization.
So you can write code that is semantically correct. If you want a function that modifies or inspects a value, you take it in by reference. Your code does not do that; it creates a new value that does not depend on anything else, so return by value.
Use the first form: the one which returns vector. And a good compiler will most likely optimize it. The optimization is popularly known as Return value optimization, or RVO in short.
Others have already pointed out that with a decent (not great, merely decent) compiler, the two will normally end up producing identical code, so the two give equivalent performance.
I think it's worth mentioning one or two other points though. First, returning the object does officially copy the object; even if the compiler optimizes the code so that copy never takes place, it still won't (or at least shouldn't) work if the copy ctor for that class isn't accessible. std::vector certainly supports copying, but it's entirely possible to create a class that you'd be able to modify like in getIntVector, but not return like in returnIntVector.
Second, and substantially more importantly, I'd generally advise against using either of these. Instead of passing or returning a (reference to) a vector, you should normally work with an iterator (or two). In this case, you have a couple of perfectly reasonable choices -- you could use either a special iterator, or create a small algorithm. The iterator version would look something like this:
#ifndef GEN_SEQ_INCLUDED_
#define GEN_SEQ_INCLUDED_
#include <iterator>
template <class T>
class sequence : public std::iterator<std::forward_iterator_tag, T>
{
T val;
public:
sequence(T init) : val(init) {}
T operator *() { return val; }
sequence &operator++() { ++val; return *this; }
bool operator!=(sequence const &other) { return val != other.val; }
};
template <class T>
sequence<T> gen_seq(T const &val) {
return sequence<T>(val);
}
#endif
You'd use this something like this:
#include "gen_seq"
std::vector<int> vecInts(gen_seq(0), gen_seq(10));
Although it's open to argument that this (sort of) abuses the concept of iterators a bit, I still find it preferable on practical grounds -- it lets you create an initialized vector instead of creating an empty vector and then filling it later.
The algorithm alternative would look something like this:
template <class T, class OutIt>
void fill_seq_n(OutIt result, T num, T start = 0) {
    // writes num consecutive values, beginning at start
    for (T i = start; i != start + num; ++i) {
        *result = i;
        ++result;
    }
}
...and you'd use it something like this:
std::vector<int> vecInts;
fill_seq_n(std::back_inserter(vecInts), 10);
You can also use a function object with std::generate_n, but at least IMO, this generally ends up more trouble than it's worth.
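With C++11 there is also std::iota in <numeric>, which fills an existing range with consecutive values. It is a different technique from the iterator and algorithm shown above, and the helper name makeSequence below is invented for this sketch.

```cpp
#include <numeric>
#include <vector>

// Hypothetical helper: returns {0, 1, ..., n-1}.
std::vector<int> makeSequence(int n) {
    std::vector<int> v(n);               // n zero-initialized elements
    std::iota(v.begin(), v.end(), 0);    // overwrite with 0, 1, 2, ...
    return v;
}
```

Unlike the iterator version, this creates the elements first and then overwrites them, which is usually negligible for a vector of int.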
As long as we're talking about things like that, I'd also replace this:
for(unsigned int ui = 0; ui < vecInts2.size(); ui++)
cout << vecInts2[ui] << endl;
...with something like this:
std::copy(vecInts2.begin(), vecInts2.end(),
std::ostream_iterator<int>(std::cout, "\n"));
In C++03 days, getIntVector() was recommended for most cases. With returnIntVector(), some unnecessary temporaries might be created.
But with return value optimization and swaptimization, most of them can be avoided. In the era of C++11, returnIntVector() becomes the more attractive form thanks to move semantics.
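When the result has to end up in an already existing vector, the two idioms look roughly like this. The helper names refillBySwap and refillByMove are invented for the sketch; returnIntVector is written out again so the block is self-contained.

```cpp
#include <cstddef>
#include <vector>

std::vector<int> returnIntVector() {
    std::vector<int> vecInts(10);
    for (std::size_t ui = 0; ui < vecInts.size(); ++ui)
        vecInts[ui] = static_cast<int>(ui);
    return vecInts;
}

// C++03 "swaptimization": exchange buffers with the returned temporary
// instead of copying its elements.
void refillBySwap(std::vector<int>& target) {
    returnIntVector().swap(target);
}

// C++11: assigning from the temporary selects move assignment,
// which takes over the temporary's buffer.
void refillByMove(std::vector<int>& target) {
    target = returnIntVector();
}
```

In both cases the elements themselves are not copied; only the vector's internal buffer changes hands.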
In theory, the returnIntVector function returns the vector by value, so a copy will be made and it will be more time-consuming than the function which just populates an existing vector. More memory will also be used to store the copy, but only temporarily: vecInts is locally scoped, so the vector object itself lives on the stack (its elements are allocated on the heap), and everything is freed as soon as returnIntVector returns. However, as others have pointed out, a modern compiler will optimize away these inefficiencies.
returnIntVector is more time-consuming because it returns a copy of the vector, unless the vector implementation is realized with a single pointer, in which case the performance is the same.
In general you should not rely on the implementation, and should use getIntVector instead.