Limiting the scope of a temporary variable - c++

Without resorting to the heap, I would like a temporary variable to pass out of scope, freeing its storage on the stack. However, I can think of no neat way to achieve the desired effect in a case like this:
#include <cstdlib>
#include <iostream>
int main()
{
const int numerator {14}, denominator {3};
// ...
// The desired scope begins here.
const std::div_t quotient_and_remainder {std::div(numerator, denominator)};
const int x_lo {
quotient_and_remainder.quot
};
const int x_hi {
quotient_and_remainder.quot + (quotient_and_remainder.rem ? 1 : 0)
};
// The desired scope ends here.
// The quotient_and_remainder should pass out of scope.
// ...
std::cout
<< "The quotient " << numerator << "/" << denominator
<< " is at least " << x_lo
<< " and is not more than " << x_hi << "." << "\n";
return 0;
}
At the low machine level, one desires the following.
Storage is first reserved on the stack for x_lo and x_hi.
The quotient_and_remainder is then pushed temporarily onto the stack.
The quotient_and_remainder is used to calculate values which are written into x_lo and x_hi.
Having done its duty, the quotient_and_remainder is popped off the stack.
At the high programming level, one desires the following.
The compiler enforces that x_lo and x_hi remain constant.
The name quotient_and_remainder does not clutter the function's entire local namespace.
At both levels, one would like to avoid wasting memory and runtime. (To waste compilation time is okay.)
In my actual program—which is too long to post here—the values of numerator and denominator are unknown at compile time, so constexpr cannot be used.
Ordinarily, I can think of various sound ways to achieve such effects using, for example, extra braces or maybe an anonymous lambda; but this time, I'm stumped.
(One should prefer to solve the problem without anti-idiomatic shenanigans like const_cast. However, if shenanigans were confined to a small part of the program to avert maintainability problems, then shenanigans might be acceptable if there existed no better choice.)

In C++17, you can use a function and a structured binding declaration:
auto x_lo_x_hi = [](int numerator, int denominator) {
std::div_t quotient_and_remainder {std::div(numerator, denominator)};
return std::pair{
quotient_and_remainder.quot,
quotient_and_remainder.quot + (quotient_and_remainder.rem ? 1 : 0)
};
};
const auto [x_lo, x_hi] = x_lo_x_hi(numerator, denominator);
x_lo_x_hi doesn't have to be a local lambda. It could be some function declared elsewhere or it could even be an IIFE (an anonymous lambda that you call immediately on the same line).
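The IIFE variant mentioned above might look roughly like the following sketch. The wrapper name quotient_bounds is made up for illustration; the point is only that quotient_and_remainder lives inside the immediately invoked lambda and nowhere else. It assumes C++17, like the answer's own code.

```cpp
#include <cstdlib>
#include <utility>

// Hypothetical wrapper (not from the original post) so the result can be inspected.
std::pair<int, int> quotient_bounds(int numerator, int denominator)
{
    // The lambda is called on the spot; quotient_and_remainder is visible
    // only inside its body, and x_lo / x_hi are const in the enclosing scope.
    const auto [x_lo, x_hi] = [&] {
        const std::div_t quotient_and_remainder {std::div(numerator, denominator)};
        return std::pair{
            quotient_and_remainder.quot,
            quotient_and_remainder.quot + (quotient_and_remainder.rem ? 1 : 0)
        };
    }();
    return {x_lo, x_hi};
}
```

With the question's values, quotient_bounds(14, 3) gives the pair (4, 5).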
That said, this only serves to remove the name quotient_and_remainder from the local scope. Like MSalters said in a comment, any decent compiler will optimize that variable away in your original code. And even if it didn't, removing one variable from the stack will not improve the runtime, and if you have stack memory problems, the actual issue is somewhere else.

Related

Is Lambda Expression just capture objects before it? [duplicate]

I have a Visual Studio 2010 C++ program, the main function of which is:
vector<double> v(10);
double start = 0.0; double increment = 10.0;
auto f = [&start, increment]() { return start += increment; };
generate(v.begin(), v.end(), f);
for(auto it = v.cbegin(); it != v.cend(); ++it) { cout << *it << ", "; }
cout << endl << "Changing vars to try again..." << endl;
start = 15; increment = -1.5;
generate(v.begin(), v.end(), f);
for(auto it = v.cbegin(); it != v.cend(); ++it) { cout << *it << ", "; }
return 0;
When I compile this in MS Visual Studio, the first generate does what I expected, resulting in "10, 20, ... 100, ". The second does not; the lambda "sees" the change in start but not the change in increment, so I get "25, 35, ... 115, ".
MSDN explains that
The Visual C++ compiler binds a lambda expression to its captured variables when the expression is declared instead of when the expression is called. ... [T]he reassignment of [a variable captured by value] later in the program does not affect the result of the expression.
So my question is: is this standards-compliant C++11 behavior, or is it Microsoft's own eccentric implementation? Bonus: if it is standard behavior, why was the standard written that way? Does it have to do with enforcing referential transparency for functional programming?
With a lambda expression, the bound variables are captured at the time of declaration.
This sample will make it very clear: https://ideone.com/Ly38P
std::function<int()> dowork()
{
int answer = 42;
auto lambda = [answer] () { return answer; };
// can do what we want
answer = 666;
return lambda;
}
int main()
{
auto ll = dowork();
return ll(); // 42
}
It is clear that the capture must be happening before the invocation, since the variables being captured don't even exist (not in scope, neither in lifetime) anymore at a later time.
It's bound at creation time. Consider:
#include <functional>
#include <iostream>
std::function<int(int)> foo;
void sub()
{
int a = 42;
foo = [a](int x) -> int { return x + a; };
}
int main()
{
sub();
int abc = 54;
abc = foo(abc); // Note a no longer exists here... but it was captured by
// value, so the caller shouldn't have to care here...
std::cout << abc; //96
}
There's no a here when the function is called -- there'd be no way for the compiler to go back and update it. If you pass a by reference, then you have undefined behavior. But if you pass by value any reasonable programmer would expect this to work.
I think you are confusing the mechanism of capture with the mechanism of variable passing. They are not the same thing even if they bear some superficial resemblance to one another. If you need the current value of a variable inside a lambda expression, capture it by reference (though, of course, that reference is bound to a particular variable at the point the lambda is declared).
When you 'capture' a variable, you are creating something very like a closure. And closures are always statically scoped (i.e. the 'capture' happens at the point of declaration). People familiar with the concept of a lambda expression would find C++'s lambda expressions highly strange and confusing if it were otherwise. Adding a brand new feature to a programming language that is different from the same feature in other programming languages in some significant way would make C++ even more confusing and difficult to understand than it already is. Also, everything else in C++ is statically scoped, so adding some element of dynamic scoping would be very strange for that reason as well.
Lastly, if capture always happened by reference, then that would mean a lambda would only be valid as long as the stack frame was valid. Either you would have to add garbage collected stack frames to C++ (with a huge performance hit and much screaming from people who are depending on the stack being largely contiguous) or you would end up creating yet another feature where it was trivially easy to blow your foot off with a bazooka by accident as the stack frame referenced by a lambda expression would go out of scope and you'd basically be creating a lot of invisible opportunities to return local variables by reference.
Yes, a by-value capture has to happen at the point where the lambda is created, because otherwise you could attempt to use a variable that no longer exists when the lambda/function is actually called.
The standard supports capturing both by value AND by reference to address both possible use cases. If you tell the compiler to capture by value it's captured at the point the lambda is created. If you ask to capture by reference, it will capture a reference to the variable which will then be used at the point the lambda is called (requiring of course that the referenced variable must still exist at the point the call is made).
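As a minimal side-by-side sketch of the two capture modes (the function name capture_demo is invented for this illustration), the by-value lambda keeps the value it copied when it was created, while the by-reference lambda reads the variable when it is called:

```cpp
#include <utility>

// Hypothetical demo: returns what each lambda reports after the captured
// variable has been reassigned.
std::pair<int, int> capture_demo()
{
    int value = 1;
    auto by_value = [value] { return value; };       // copies value right here
    auto by_reference = [&value] { return value; };  // binds to the variable itself
    value = 2;                                       // reassign after both lambdas exist
    // by_value() still yields the copy; by_reference() reads the current value.
    return {by_value(), by_reference()};
}
```

This mirrors the question's observation: the change to start (captured by reference) is seen, the change to increment (captured by value) is not.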

How bad is redefining/shadowing a local variable?

While upgrading a legacy project to VS2015, I noticed there were a lot of errors where a local variable was redefined inside a function for example.
void fun()
{
int count = applesCount();
cout << "Apples cost= " << count * 1.25;
for (int bag=0; bag<5;bag++)
{
int count = orangesCount(bag);
cout << "Oranges cost = " << count * 0.75;
}
}
The error/warning message by compiler is:
declaration of 'count' hides previous local declaration
I know it is obviously not good practice to use the same name for the variable count, but can the compiler really mess things up as well, or do compilers generally deal with this situation rather gracefully?
Is it worth it to change and fix the variable names or is unlikely to cause any harm and is low or no risk?
I noticed there were a lot of errors where a local variable was redefined inside a function for example.
You are not demonstrating redefining here. You show an example of variable shadowing.
Variable shadowing is not an error syntactically. It is valid and well defined. However, if your intention was to use the variable from the outer scope, then you could consider it a logical error.
but can the compiler really mess things up
No.
The problem with shadowing is that it can be hard to keep track of for the programmer. It is trivial for the compiler. You can find plenty of questions on this very site, stemming from confusion caused by shadowed variables.
It is not too difficult to grok which expression uses which variable in this small function, but imagine the function being dozens of lines and several nested and sequential blocks. If the function is long enough that you cannot see all the different definitions in different scopes at a glance, you are likely to make a misinterpretation.
declaration of 'count' hides previous local declaration
This is a somewhat useful compiler warning. You haven't run out of names, so why not give a unique name for all local variables in the function? However, there is no need to treat this warning as an error. It is merely a suggestion to improve the readability of your program.
In this particular example, you don't need the count in the outer scope after the inner scope opens, so you might as well reuse one variable for both counts.
Is it worth it to change and fix the variable names
Depends on whether you value more short term workload versus long term. Changing the code to use unique, descriptive local variable names is "extra" work now, but every time someone has to understand the program later, unnecessary shadowing will increase the mental challenge.
IMHO, bad coding practice. Hard to maintain and read.
The compiler can discern between the outer variable and the internal variable.
With a good vocabulary (and a thesaurus), one doesn't need to use the same variable names.
Shadowing a variable (which is what this is) has completely well defined semantics, so the compiler won't mess it up. It will do exactly what it has been told, with well defined result.
The problem is (as is often the case) with the humans. It is very easy to make mistakes when reading and modifying the code. If one is not very careful it can be tricky to keep track of which variable with a given name is being referenced and it's easy to make mistakes where you think you are modifying one but in reality you are modifying another.
So, the compiler is fine, the programmer is the problem.
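To make "well defined" concrete, here is a small sketch (the function name is hypothetical): the inner declaration creates a separate object, so nothing done to it can reach the outer one.

```cpp
// Hypothetical demo: returns the outer variable after an inner block shadows it.
int outer_after_shadowing()
{
    int count = 1;
    {
        int count = 2;  // a distinct object that hides the outer count
        count += 10;    // modifies only the inner object
    }
    return count;       // the outer object was never touched
}
```

The function returns 1. The trap for a human reader is assuming the += 10 applied to the outer count.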
In my experience, compilers usually handle this issue pretty gracefully.
However, it is definitely bad practice, and unless you have a really compelling reason to do so, you should either reuse the old variable (if it logically makes sense to do so), or declare a [differently-named] new variable.
Can be very bad:
Consider this
std::vector<int> indices(100);
if (true) {
std::vector<int> indices(100);
std::iota(indices.begin(), indices.end(), 0);
}
// Now the outer indices is still an all-0 vector instead of an arithmetic progression!
In Visual Studio, compiling with Warning Level4 /W4 will output a warning even when disambiguating by prefixing with the implicit this pointer, like in this->Count:
warning C4458: declaration of 'Count' hides class member
Although the compiler rarely makes a mistake with the values, shadowing can be abused and get confusing over time.
Below is an example of shadowing the Count member variable which should be avoided:
From the Unreal Coding Standard document.
class FSomeClass
{
public:
void Func(const int32 Count)
{
for (int32 Count = 0; Count != 10; ++Count)
{
// Use Count
}
}
private:
int32 Count;
};
One option for dealing with this would be adding a prefix to the incoming argument name when shadowing occurs, like so:
class FSomeClass
{
public:
void Func(const int32 InCount)
{
Count = InCount;
for (int32 Counter = 0; Counter != 10; ++Counter)
{
// Use Count
}
}
private:
int32 Count;
};
When using RAII resources it may actually simplify the code to shadow variables and not create new variable names.
An example, a simple logger which writes to stdout when entering and leaving a block:
class LogScope {
private:
std::string functionName;
uint32_t lineNo;
public:
LogScope(std::string _functionName, uint32_t _lineNo) :
functionName(_functionName), lineNo(_lineNo) {
std::cout << "Entering scope in " << functionName << " starting line " << lineNo << std::endl;
};
~LogScope(void) {
std::cout << "Exiting scope in " << functionName << " starting line " << lineNo << std::endl;
};
};
It could be used like:
void someFunction() { // First scope here.
LogScope logScope(__FUNCTION__, __LINE__);
// some code...
// A new block.
// There is really no need to define a new name for LogScope.
{
LogScope logScope(__FUNCTION__, __LINE__);
// some code...
}
}
Having said that, it is normally good practice not to reuse your variable names.
Zar,
The compiler will handle this situation fine. In your example, the count variable is defined in two different scopes '{}'. Due to the scope of the variable, the assembly language will refer to two different addresses on the stack. The first 'count' might be at stack pointer offset SP-8, while the inner count might be at SP-4. Once transformed into an address, the name is irrelevant.
I would not generally change working code for stylistic reasons. If the code is a mess then you run the risk of breaking it. Usually messy code doesn't have any good tests, so it is hard to know if you broke it.
If you need to enhance the code then certainly tidy up.
--Matt

lifetime of a temporary function parameter

Creating a temporary char buffer as a default function argument and binding an r-value reference to it allows us to compose statements on a single line whilst preventing the need to create storage on the heap.
const char* foo(int id, tmp_buf&& buf = tmp_buf()) // buf exists at call-site
Binding a reference/pointer to the temporary buffer and accessing it later yields undefined behaviour, because the temporary no longer exists.
As can be seen from the example app below the destructor for tmp_buf is called after the first output, and before the second output.
My compiler (gcc-4.8.2) doesn't warn that I'm binding a variable to a temporary. This means that using this kind of micro-optimisation to use an auto char buffer rather than std::string with associated heap allocation is very dangerous.
Someone else coming in and capturing the returned const char* could inadvertently introduce a bug.
1. Is there any way to get the compiler to warn for the second case below (capturing the temporary)?
Interestingly you can see that I tried to invalidate the buffer - which I failed to do, so it likely shows I don't fully understand where on the stack tmp_buf is being created.
2. Why did I not trash the memory in tmp_buf when I called try_stomp()? How can I trash tmp_buf?
3. Alternatively - is it safe to use in the manner I have shown? (I'm not expecting this to be true!)
code:
#include <cstdio>
#include <iostream>
struct tmp_buf
{
char arr[24];
~tmp_buf() { std::cout << " [~] "; }
};
const char* foo(int id, tmp_buf&& buf = tmp_buf())
{
sprintf(buf.arr, "foo(%X)", id);
return buf.arr;
}
void try_stomp()
{
double d = 22./7.;
char buf[32];
snprintf(buf, sizeof(buf), "pi=%lf", d);
std::cout << "\n" << buf << "\n";
}
int main()
{
std::cout << "at call site: " << foo(123456789);
std::cout << "\n";
std::cout << "after call site: ";
const char* p = foo(123456789);
try_stomp();
std::cout << p << "\n";
return 0;
}
output:
at call site: foo(75BCD15) [~]
after call site: [~]
pi=3.142857
foo(75BCD15)
For question 2.
The reason you didn't trash the variable is that the compiler probably allocated all the stack space it needed at the start of the function call. This includes all the stack space for the temporary objects, and objects that are declared inside a nested scope. You can't guarantee that the compiler does this (I think), rather than push objects on the stack as needed, but it is more efficient and easier to keep track of where your stack variables are this way.
When you call the try_stomp function, that function then allocates its stack after (or before, depending on your system) the stack for the main function.
Note that the default arguments for a function call are actually added by the compiler to the calling code, rather than being part of the called function (which is why they need to be part of the function declaration, rather than the definition, if it was declared separately).
So your stack when in try_stomp looks something like this (there is a lot more going on in the stack, but these are the relevant parts):
main - p
main - temp1
main - temp2
try_stomp - d
try_stomp - buf
So you can't trash the temporary from try_stomp, at least not without doing something really outrageous.
Again, you can't rely on this layout, as it is compiler dependent, and is just an example of how the compiler might do it.
The way to trash the temporary buffer would be to do it in the destructor of tmp_buf.
Also interestingly, MSVC seems to allocate stack space for all of the temporary objects separately, rather than re-use the stack space for both objects. This means that even repeated calls to foo won't trash each other. Again, you can't depend on this behavior (I think - I couldn't find a reference to it).
For question 3.
No, don't do this!

Saving variables sequentially onto the stack

I'm trying to write a simple program to show how variables can be manipulated indirectly on the stack. In the code below everything works as planned: even though the address for a is passed in, I can indirectly change the value of c. However, if I delete the last line of code (or any of the last three), then this no longer applies. Do those lines somehow force the compiler to put my 3 int variables sequentially onto the stack? My expectation was that that would always be the case.
#include <iostream>
using namespace std;
void someFunction(int* intPtr)
{
// write some code to break main's critical output
int* cptr = intPtr - 2;
*cptr = 0;
}
int main()
{
int a = 1;
int b = 2;
int c = 3;
someFunction(&a);
cout << a << endl;
cout << b << endl;
cout << "Critical value is (must be 3): " << c << endl;
cout << &a << endl;
cout << &b << endl;
cout << &c << endl; //when commented out, critical value is 3
}
Your code causes undefined behaviour. You can't pass a pointer to an int and then just subtract an arbitrary amount from it and expect it to point to something meaningful. The compiler can put a, b, and c wherever it likes in whatever order it likes. There is no guaranteed relationship of any kind between them, so you can't assume someFunction will do anything meaningful.
The compiler can place those wherever and in whatever order it likes in the current stack frame, it may even optimize them out if not used. Just make the compiler do what you want, by using arrays, where pointer arithmetic is safe:
int main()
{
int myVars[3] = {1,2,3};
//In C++, one could use references for convenience (a reference can never
//be reseated, so no const qualifier on the reference itself is needed),
//which should be optimized/eliminated pretty well.
//But I would never ever use them for pointer arithmetic.
int& a = myVars[0];
int& b = myVars[1];
int& c = myVars[2];
}
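For illustration, here is how the question's experiment could be rewritten on top of an array, where stepping from one element to another is defined. This is a sketch; zeroThird and criticalValue are hypothetical names, and the offset is +2 because array elements are laid out in increasing address order.

```cpp
// Hypothetical rewrite of someFunction: intPtr must point at the first
// element of an array of at least three ints for intPtr + 2 to be valid.
void zeroThird(int* intPtr)
{
    int* cptr = intPtr + 2;  // still inside the same array
    *cptr = 0;
}

int criticalValue()
{
    int myVars[3] = {1, 2, 3};
    zeroThird(myVars);       // myVars decays to &myVars[0]
    return myVars[2];        // the third element was overwritten
}
```

Here criticalValue() returns 0 regardless of optimisation level, because the three ints are guaranteed to be contiguous.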
What you do is undefined behaviour, so anything may happen. But what is probably going on is that when you don't take the address of c by commenting out cout << &c << endl;, the compiler may optimize away the variable c. It then substitutes cout << c with cout << 3.
As many have answered, your code is wrong since it triggers undefined behavior, see also this answer to a similar question.
In your original code the optimizing compiler could place a, b and c in registers, overlap their stack location, etc....
There are however legitimate reasons for wanting to know what are the location of local variables on the stack (precise garbage collection, introspection and reflection, ...).
The correct way would then be to pack these variables in a struct (or a class) and to have some way to access that structure (for example, linking them in a list, etc.)
So your code might start with
void fun (void)
{
struct {
int a;
int b;
int c;
} _frame;
#define a _frame.a
#define b _frame.b
#define c _frame.c
do_something_with(&_frame); // e.g. link it
You could also use array members (perhaps even flexible or zero-length arrays for housekeeping routines), and #define a _frame.v[0] etc...
Actually, a good optimizing compiler could optimize that nearly as well as your original code.
Probably, the type of the _frame might be outside of the fun function, and you'll generate housekeeping functions for inspecting, or garbage collecting, that _frame.
Don't forget to unlink the frame at end of the routine. Making the frame an object with a proper constructor and destructor definitely helps. The constructor would link the frame and the destructor would unlink it.
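A very reduced sketch of that constructor/destructor idea follows. All names are hypothetical and it is not taken from qish or MELT; it assumes C++17 for the inline static member and a single thread.

```cpp
// Hypothetical frame type: links itself into a global list when constructed
// and unlinks itself when destroyed.
struct Frame
{
    int a = 0, b = 0, c = 0;
    Frame* previous;                     // the frame that was on top before us

    static inline Frame* top = nullptr;  // head of the list (C++17 inline static)

    Frame() : previous(top) { top = this; }  // link
    ~Frame() { top = previous; }             // unlink

    Frame(const Frame&) = delete;            // copying would corrupt the list
    Frame& operator=(const Frame&) = delete;
};

// Housekeeping routine: count the frames currently linked.
int linked_frames()
{
    int n = 0;
    for (const Frame* f = Frame::top; f != nullptr; f = f->previous) ++n;
    return n;
}

// Example use: two nested frames, the inner one unlinks at its closing brace.
int nested_depth()
{
    Frame outer;
    {
        Frame inner;
        if (linked_frames() != 2) return -1;
    }
    return linked_frames();
}
```

A real housekeeping routine would walk the same list to inspect or mark a, b and c rather than just counting frames.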
For two examples where such techniques are used (both because a precise garbage collector is needed), see my qish garbage collector and the (generated C++) code of MELT (a domain specific language to extend GCC). See also the (generated) C code of Chicken Scheme or Ocaml runtime conventions (and its <caml/memory.h> header).
In practice, such an approach is much more welcome for generated C or C++ code (precisely because you will also generate the housekeeping code). If writing them manually, consider at least fancy macros (and templates) to help you. See e.g. gcc/melt-runtime.h
I actually believe that this is a deficiency in C. There should be some language features (and compiler implementations) to introspect the stack and to (portably) backtrace on it.
