Un-initialize a variable in C/C++ - c++

This is more a theoretical question than a practical one, but I was wondering whether it is possible to un-initialize a variable in C (or C++). So let's say we have the following code:
void some_fun()
{
int a; // <-- Here a is an un-initialized variable; its value is whatever
// is found in memory at the given location.
a = 7; // Now I have initialized it
// do something with a ...
// and here do something again with a so that it
// will have the same state (i.e. indeterminate) as before initialization
}
(No, I don't want to put a random value in a because that would be also an initialization, nor 0, because that's a very nice value, ... I just want it to be again in that "I don't know anything about it" stage it was before initializing it).
(Yes I am aware of: What happens to a declared, uninitialized variable in C? Does it have a value?)

You can use setjmp() and longjmp() to get the behavior you want, with some rearrangement of your code. The code below initializes a to 1 so that the print statements do not invoke undefined behavior.
jmp_buf jb;
void some_func (void)
{
int a = 1;
if (setjmp(jb) == 0) {
a = 7;
// do something
printf("a = %d (initialized)\n", a);
// use longjmp to make `a` not initialized
longjmp(jb, 1);
// NOTREACHED
} else {
printf("a = %d (not initialized)\n", a);
}
}
The longjmp() call returns back to the saved context of setjmp(), and moving to the else case means that a had not been initialized.
When compiled with GCC with optimizations, the above function outputs:
a = 7 (initialized)
a = 1 (not initialized)
If you want this behavior without optimizations enabled, try adding the register storage class to a's declaration.
A demo.
A longer explanation
So, why did I think setjmp() and longjmp() would work? This is what C.11 §7.13 ¶1-2 has to say about it:
The header <setjmp.h> defines the macro setjmp, and declares one function and
one type, for bypassing the normal function call and return discipline.
The type declared is
jmp_buf
which is an array type suitable for holding the information needed to restore a calling
environment. The environment of a call to the setjmp macro consists of information
sufficient for a call to the longjmp function to return execution to the correct block and
invocation of that block, were it called recursively. It does not include the state of the
floating-point status flags, of open files, or of any other component of the abstract
machine.
This explains that what is supposed to happen is that a longjmp back to the context saved in the jmp_buf by a call to setjmp will act as if the code that ran up until the longjmp call was a recursive function call, and the longjmp acts like a return from that recursive call back to the setjmp. To me, this implies that the automatic variable would be "uninitialized".
int a;
// the following expression will be false if returning from `longjmp`
if (setjmp(jb) == 0) {
// this section of code can be treated like the `setjmp` call induced
// a recursive call on this function that led to the execution of the
// code in this body
a = 7;
//...
// assuming no other code modified `jb`, the following call will
// jump back to the `if` check above, but will behave like a recursive
// function call had returned
longjmp(jb, 1);
} else {
// `a` expected to be uninitialized here
}
But, there seems to be a catch. From C.11 §7.13.2 ¶3:
All accessible objects have values, and all other components of the abstract machine
have state, as of the time the longjmp function was called, except that the values of
objects of automatic storage duration that are local to the function containing the
invocation of the corresponding setjmp macro that do not have volatile-qualified type
and have been changed between the setjmp invocation and longjmp call are
indeterminate.
Since a is local, is not volatile-qualified, and has been changed between setjmp and longjmp calls, its value is indeterminate, even if it was properly initialized before calling setjmp!
So, using longjmp back to a local setjmp after an automatic non-volatile variable has been modified will always result in making those modified variables "uninitialized" after returning to the point of the setjmp.
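For contrast, the clause quoted above exempts volatile-qualified objects, so a volatile local should keep the value it had when longjmp was called. A minimal sketch of that case, written as C++ with <csetjmp>; the function name value_after_longjmp is made up for illustration, and the body deliberately contains no objects with non-trivial destructors, since C++ only defines longjmp in that situation:

```cpp
#include <csetjmp>

static std::jmp_buf jb;

// Hypothetical helper: modifies a volatile local between setjmp and longjmp
// and returns the value observed after the jump.
int value_after_longjmp()
{
    volatile int a = 1;
    if (setjmp(jb) == 0) {
        a = 7;                 // changed between setjmp and longjmp
        std::longjmp(jb, 1);   // jumps back; setjmp now returns 1
    }
    // `a` is volatile, so it is not indeterminate here: it holds the value
    // it had when longjmp was called.
    return a;
}
```

Calling value_after_longjmp() yields 7. Dropping the volatile qualifier puts the variable back under the "indeterminate" rule described above.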

You can emulate this using boost::optional<T>:
#include <boost/optional.hpp>
#include <iostream>
int main()
{
boost::optional<int> a;
a = 7;
std::cout << a.is_initialized() << std::endl; // true
a.reset(); // "un-initialize"
std::cout << a.is_initialized() << std::endl; // false
}
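If Boost is not an option, the same idea can probably be expressed with std::optional, assuming a C++17 compiler. The sketch below is not from the answer; the States struct and demo_optional_states function are hypothetical names used only to record what has_value() reports at each step:

```cpp
#include <optional>

// Hypothetical record of has_value() at three points in time.
struct States {
    bool before;
    bool after_assign;
    bool after_reset;
};

States demo_optional_states()
{
    std::optional<int> a;                     // holds no value yet
    const bool before = a.has_value();

    a = 7;                                    // now holds 7
    const bool after_assign = a.has_value();

    a.reset();                                // "un-initialize": no value again
    const bool after_reset = a.has_value();

    return {before, after_assign, after_reset};
}
```

As with boost::optional, this only emulates the "I know nothing about it" state. The int storage inside the optional still exists, but the wrapper refuses to treat it as a value.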

I am curious why you want to do that. However, did you try the following:
void some_fun() {
int a;
int b = a; // Hoping compiler does not discard this.
a = 7;
// do something
a = b;
}

Another approach is along the lines of:
int a, olda;
memcpy(&olda, &a, sizeof(a));
a = 7;
//...
memcpy(&a, &olda, sizeof(a));
// a is "uninitialized"
This avoids the trap representation issues of using assignment, relying on the fact that memcpy copies the bytes as unsigned char, which has no trap representations. It is also vastly simpler than using setjmp() and longjmp().


Is it unsafe to co_yield a pointer to a local coroutine variable?

It's common knowledge that returning a pointer to a stack variable is generally a bad idea:
int* foo() {
int i = 0;
return &i;
}
int main() {
int* p = foo();
}
In the example above, my understanding is that the int is destroyed and so p is a dangling pointer.
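Two common ways to avoid the dangling pointer, sketched under the assumption that the caller either only needs the value or can take ownership of a heap object. The names foo_by_value and foo_owned are illustrative and not from the question, and std::make_unique assumes C++14 or later:

```cpp
#include <memory>

// Returns a copy of the local; the caller gets its own int.
int foo_by_value()
{
    int i = 0;
    return i;
}

// Returns an owning pointer; the int lives until the unique_ptr is destroyed.
std::unique_ptr<int> foo_owned()
{
    return std::make_unique<int>(0);
}
```

In both cases nothing the caller receives refers to storage inside foo's stack frame, so nothing dangles once the function returns.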
I am wondering about the extent to which this applies to the newly introduced coroutines of C++20:
generator<span<byte>> read(stream& s) {
array<byte, 4096> b;
while (s.is_open()) {
const size_t n = s.read_some(b);
co_yield span(b, n);
}
}
int main() {
stream s;
for (span<byte> v : read(s)) {
/* ... */
}
}
In this example, the coroutine read yields a span view into the local buffer b. Internally, that view stores a pointer to the buffer. Will that pointer ever be dangling when v is used within the body of the range-for loop?
For context, the coroutine code in the second example is modeled after the code in my own project. There, AddressSanitizer ends the program with a "use-after-free" error. Ordinarily I'd consider that enough to answer my question, but since coroutine development is still coming along at this point in time (my project is using boost::asio::experimental::coro, emphasis on "experimental"), I was wondering if the error was caused by a bug with generator's implementation or if returning pointers in this way is fundamentally incorrect (similar to the first example).
With language coroutines, this has to be safe: the lifetime of b must continue until the generator is finished, so pointers to it must be useful that long.

C++ understanding RVO (as compared to returning local variable reference)

It's my first year of using C++ and learning on the way. I'm currently reading up on Return Value Optimizations (I use C++11 btw). E.g. here https://en.wikipedia.org/wiki/Return_value_optimization, and immediately these beginner examples with primitive types spring to mind:
int& func1()
{
int i = 1;
return i;
}
//error, 'i' was declared with automatic storage (in practice on the stack(?))
//and is undefined by the time function returns
...and this one:
int func1()
{
int i = 1;
return i;
}
//perfectly fine, 'i' is copied... (to previous stack frame... right?)
Now, I get to this and try to understand it in the light of the other two:
Simpleclass func1()
{
return Simpleclass();
}
What actually happens here? I know most compilers will optimise this, what I am asking is not 'if' but:
how the optimisation works (the accepted response)
does it interfere with storage duration: stack/heap (Old: Is it basically random whether I've copied from stack or created on heap and moved (passed the reference)? Does it depend on created object size?)
is it not better to use, say, explicit std::move?
You won't see any effect of RVO when returning ints.
However, when returning large objects like this:
struct Huge { ... };
Huge makeHuge() {
Huge h { x, y, z };
h.doSomething();
return h;
}
The following code...
auto h = makeHuge();
... after RVO would be implemented something like this (pseudo code) ...
h_storage = allocate_from_stack(sizeof(Huge));
makeHuge(addressof(h_storage));
auto& h = *properly_aligned(h_storage);
... and makeHuge would compile to something like this...
void makeHuge(Huge* h_storage) // in fact this address can be
// inferred from the stack pointer
// (or just 'known' when inlining).
{
Huge* phuge = new (h_storage) Huge(x, y, z); // placement new into the caller's storage
phuge->doSomething();
}
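On the question of whether an explicit std::move is better: one way to check is to count constructor calls. The sketch below is an illustration under assumptions, not part of the answer; Tracked, g_copies, g_moves, make_plain and make_moved are hypothetical names. It relies only on the rule that a returned local is treated as an rvalue when elision does not happen, so the copy constructor should not be needed in either function.

```cpp
#include <utility>

// Counters are plain globals to keep the sketch short.
int g_copies = 0;
int g_moves = 0;

// Hypothetical type that records how it gets constructed.
struct Tracked {
    Tracked() = default;
    Tracked(const Tracked&) { ++g_copies; }
    Tracked(Tracked&&) noexcept { ++g_moves; }
};

// Named local returned directly: eligible for copy elision (NRVO).
// If the compiler does not elide, the move constructor is used, not the copy.
Tracked make_plain()
{
    Tracked t;
    return t;
}

// Explicit std::move turns the operand into an xvalue, which rules out
// eliding `t` and forces one move construction. Compilers may warn here.
Tracked make_moved()
{
    Tracked t;
    return std::move(t);
}
```

With typical optimizing compilers, make_plain() constructs the result in place and bumps neither counter, while make_moved() always pays for at least one move. So the explicit std::move tends to make things worse rather than better for a returned local.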

C++ code with undefined behavior, compiler generates std::exception

I came across an interesting secure coding rule in C++ which states:
Do not reenter a function during the initialization of a static variable declaration. If a function is reentered during the constant initialization of a static object inside that function, the behavior of the program is undefined. Infinite recursion is not required to trigger undefined behavior, the function need only recur once as part of the initialization.
The non_compliant example of the same is:
#include <stdexcept>
int fact(int i) noexcept(false) {
if (i < 0) {
// Negative factorials are undefined.
throw std::domain_error("i must be >= 0");
}
static const int cache[] = {
fact(0), fact(1), fact(2), fact(3), fact(4), fact(5),
fact(6), fact(7), fact(8), fact(9), fact(10), fact(11),
fact(12), fact(13), fact(14), fact(15), fact(16)
};
if (i < (sizeof(cache) / sizeof(int))) {
return cache[i];
}
return i > 0 ? i * fact(i - 1) : 1;
}
which according to the source gives the error:
terminate called after throwing an instance of '__gnu_cxx::recursive_init_error'
what(): std::exception
when executed in Visual Studio 2013. I tried similar code of my own and got the same error (compiled using g++ and executed, on Ubuntu).
I am doubtful if my understanding is correct with respect to this concept as I am not well-versed with C++. According to me, since the cache array is constant, which means it can be read-only and needs to be initialized only once as static, it is getting initialized again and again as the values for this array is the value returned by each of the comma-separated recursive function calls which is against the behavior of the declared array. Thus, it gives undefined behavior which is also stated in the rule.
What is a better explanation for this?
In order to execute fact(), you need to first statically initialize fact::cache[]. In order to initialize fact::cache[], you need to execute fact(). There's a circular dependency there, which leads to the behavior you see. cache will only be initialized once, but it requires itself to be initialized in order to initialize itself. Even typing this makes my head spin.
The right way to introduce a cache table like this is to separate it into a different function:
int fact(int i) noexcept(false) {
if (i < 0) {
// Negative factorials are undefined.
throw std::domain_error("i must be >= 0");
}
return i > 0 ? i * fact(i - 1) : 1;
}
int memo_fact(int i) noexcept(false) {
static const int cache[] = {
fact(0), fact(1), fact(2), fact(3), fact(4), fact(5),
fact(6), fact(7), fact(8), fact(9), fact(10), fact(11),
fact(12), fact(13), fact(14), fact(15), fact(16)
};
if (i < (sizeof(cache) / sizeof(int))) {
return cache[i];
}
else {
return fact(i);
}
}
Here, memo_fact::cache[] will only be initialized once - but its initialization is no longer dependent on itself. So we have no issue.
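The "initialized once" part can be observed directly. A small sketch, using hypothetical names (g_init_calls, expensive_init, get_cached) that are not in the original code, which counts how often the initializer of a block-scope static actually runs:

```cpp
int g_init_calls = 0;

// Hypothetical stand-in for an expensive computation.
int expensive_init()
{
    ++g_init_calls;
    return 42;
}

int get_cached()
{
    // Initialized the first time control passes through this declaration;
    // later calls skip the initializer.
    static const int value = expensive_init();
    return value;
}
```

However many times get_cached() is called, expensive_init() runs a single time. That works here because expensive_init() never calls back into get_cached() while the initialization is in progress.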
The C++ standard, §6.7/4, says the following about the initialisation of block-scope variables with static storage duration:
If control re-enters the declaration recursively while the variable is
being initialized, the behavior is undefined.
The following informative example is given:
int foo(int i) {
static int s = foo(2*i); // recursive call - undefined
return i+1;
}
This applies to your example as well. fact(0) is a recursive call, so the declaration of cache is re-entered. Undefined behaviour is invoked.
It's important to recall what undefined behaviour means. Undefined behaviour means that everything can happen, and "everything" quite naturally includes exceptions being thrown.
Undefined behaviour also means that you can no longer reason about anything else in the code, except when you really want to get down to compiler-implementation details. But then you are no longer talking about C++ in terms of using a programming language but in terms of how to implement that language.

I can call a function imported with dlsym() with a wrong signature, why?

host.cpp has:
#include <dlfcn.h>
int main (void)
{
void * th = dlopen("./p1.so", RTLD_LAZY);
void * fu = dlsym(th, "fu");
((void(*)(int, const char*)) fu)(2, "rofl");
return 0;
}
And p1.cpp has:
#include <iostream>
extern "C" bool fu (float * lol)
{
std::cout << "fuuuuuuuu!!!\n";
return true;
}
(I intentionally left error checks out)
When executing host, “fuuuuuuuu!!!” is printed correctly, even though I typecasted the void pointer to the symbol with a completely different function signature.
Why did this happen and is this behavior consistent between different compilers?
This happened because UB, and this behaviour isn't consistent with anything, at all, ever, for any reason.
Because there's no information about function signature in void pointer. Or any information besides the address. You might get in trouble if you started to use parameters, tho.
This actually isn't a very good example of creating a case that will fail since:
You never use the arguments from the function fu.
Your function fu has fewer arguments (or the activation frame itself has a smaller memory footprint) than the function pointer type you're casting to, so you're never going to end up in a situation where fu attempts to access memory outside the activation record set up by the caller.
In the end, what you're doing is still undefined behavior, but you don't do anything to create a violation that could cause issues, so therefore it ends up as a silent error.
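To separate the two steps involved: converting a function pointer to another function pointer type is allowed, and converting it back gives the original pointer. Only the call through the wrong type is undefined. A sketch of the well-defined variant, with fu defined locally (without extern "C", to keep the types simple) so that no dlopen() is needed; wrong_fn, right_fn and call_with_right_type are hypothetical names:

```cpp
// Local stand-in for the plugin function from the question.
bool fu(float* lol)
{
    return lol == nullptr;
}

using wrong_fn = void (*)(int, const char*);  // the type used in the question
using right_fn = bool (*)(float*);            // the function's real type

bool call_with_right_type()
{
    // Storing the address under a different function-pointer type is allowed...
    wrong_fn stored = reinterpret_cast<wrong_fn>(&fu);
    // ...as long as it is cast back to the real type before the call.
    right_fn restored = reinterpret_cast<right_fn>(stored);
    return restored(nullptr);
}
```

With dlsym() the situation is similar: the void* carries no signature, so it is up to the caller to cast it to the type the function was actually defined with.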
is this behavior consistent between different compilers?
No. If your platform/compiler used a calling convention that required the callee to clean-up the stack, then oops, you're most likely hosed if there's a mis-match in the size of the activation record between what the callee and caller expect... upon return of the callee, the stack pointer would be moved to the wrong spot, possibly corrupting the stack, and completely messing up any stack-pointer relative addressing.
It just so happens that
C code here uses the cdecl calling convention (so the caller cleans up the stack)
your function does not use the given arguments
so your call seems to work correctly.
But the behavior is actually undefined. Changing the signature or using the arguments may cause your program to crash:
ADD:
For example, consider the stdcall calling convention, where the callee must clean up the stack. In this case, even if you declare the correct calling convention for both caller and callee, your program will still crash because the stack gets corrupted: the callee cleans it up according to its own signature, while the caller filled it according to a different one:
#include <iostream>
#include <string>
extern "C" __attribute__((stdcall)) __attribute__((noinline)) bool fu (float * lol)
{
std::cout << "fuuuuuuuu!!!\n";
return true;
}
void x()
{
(( __attribute__((stdcall)) void(*)(int, const char*)) fu)(2, "rofl");
}
int main (void)
{
void * th = reinterpret_cast<void*>(&fu);
std::string s = "hello";
x();
std::cout << s;
return 0;
}

Why do we have to write "return" at the end of a function in C++?

Can someone explain to me what that "return" at the end of a function is, and why we have to write return 0 at the end of the main function? E.g.:
int main()
{
.....
return 0;
}
You don't have to write return at the end of main in C++; a return value of 0 is implicit. (The same holds in C since C99; in C89, falling off the end of main left the exit status undefined.)
What this does is return a value to the program's environment, so that it can be known whether the program succeeded (zero) or encountered some error (non-zero). Other programs, including shell scripts/batch files can use this information to make decisions, e.g. they can stop early when an error is encountered in a program they run.
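As a rough illustration of how a program might choose that value, the sketch below uses a hypothetical helper, exit_code_for, that stands in for the body of main and reports success only if its argument is a complete decimal integer. EXIT_SUCCESS and EXIT_FAILURE come from <cstdlib>; everything else is made up for the example.

```cpp
#include <cstdlib>

// Hypothetical helper standing in for the body of main(): succeeds only if
// `text` is a complete decimal integer.
int exit_code_for(const char* text)
{
    char* end = nullptr;
    const long value = std::strtol(text, &end, 10);
    (void)value;  // only the parse result matters for this sketch

    // Parsing succeeded if at least one character was consumed and nothing
    // was left over.
    const bool parsed = (end != text) && (*end == '\0');
    return parsed ? EXIT_SUCCESS : EXIT_FAILURE;
}
```

A real main could simply return the result of such a function, and a calling shell script could then test the exit status to decide whether to continue.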
All CPUs that support function calls have an instruction like RET, to explicitly return from inside the called function, back to the code that called the function. The memory address of the code to return to after the function call has already been saved in a "well-known place" (e.g. the stack). The RET instruction will retrieve that memory address and point the CPU at the correct location, in order to resume executing at the code that comes after the original function call.
In C++, some functions are declared to "return" specific values (like the function main above), while other functions never return any values (those declared as having a return type of void). It's your choice how you declare the functions that you write. If the function return type is void, you would not need an explicit return statement in your code, unless you were returning prematurely, like from inside an if, else or loop. For example:
void foo(int x) {
if (x == 0)
return; // premature return to caller
int b = x*2;
// do some more stuff
// and now no need to say return, it's done implicitly because we are at function end
}
However, when your function is declared as having a non-void return type (for example int), flowing off its end without a return statement is undefined behavior (main is the only exception), so you need an explicit return statement even if you are not returning prematurely.
int bar(int y) {
return y*7;
}
because the caller is expecting it and may assign the return value to a variable like this:
int z = bar(4);