Detect dangling references to temporary - c++

Clang 3.9 aggressively reuses the memory occupied by temporaries.
The following (simplified) code has undefined behaviour:
template <class T>
class my_optional
{
public:
bool has{ false };
T value;
const T& get_or_default(const T& def)
{
return has ? value : def;
}
};
void use(const std::string& s)
{
// ...
}
int main()
{
my_optional<std::string> m;
// ...
const std::string& s = m.get_or_default("default value");
use(s); // s is dangling if default returned
}
We have tons of code like the above (my_optional is just a simple example to illustrate the pattern).
Because this is UB, every Clang release since 3.9 is free to reuse that memory, and doing so is legitimate behavior.
The question is: how can such dangling references be detected at compile time, or at runtime with something like a sanitizer? No Clang sanitizer can detect them.
Upd. Please do not answer: "use std::optional". Read carefully: question is NOT about it.
Upd2. Please do not answer: "your code design is bad". Read carefully: question is NOT about code design.

You can detect misuses of this particular API by adding an additional overload:
const T& get_or_default(T&& rvalue) = delete;
If the argument given to get_or_default is a true rvalue, this deleted overload is chosen instead, so compilation fails.
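A minimal sketch of how the deleted overload behaves, assuming C++17. The can_get_or_default trait and the demo_named_default function are hypothetical names, introduced only so that the accepted and rejected calls can be checked; they are not part of the original code.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

template <class T>
class my_optional
{
public:
    bool has{ false };
    T value;

    const T& get_or_default(const T& def)
    {
        return has ? value : def;
    }
    // Rvalue arguments (temporaries) select this overload and fail to compile.
    const T& get_or_default(T&& rvalue) = delete;
};

// Hypothetical detection trait: true if opt.get_or_default(arg) compiles.
template <class Opt, class Arg, class = void>
struct can_get_or_default : std::false_type {};

template <class Opt, class Arg>
struct can_get_or_default<Opt, Arg,
    std::void_t<decltype(std::declval<Opt&>().get_or_default(std::declval<Arg>()))>>
    : std::true_type {};

using opt = my_optional<std::string>;
static_assert(can_get_or_default<opt, std::string&>::value, "named string is accepted");
static_assert(can_get_or_default<opt, const std::string&>::value, "const lvalue is accepted");
static_assert(!can_get_or_default<opt, std::string>::value, "temporary string is rejected");
static_assert(!can_get_or_default<opt, const char*>::value, "literal converted to a temporary is rejected");

// Hypothetical usage: the default is a named object that outlives the reference.
std::string demo_named_default()
{
    my_optional<std::string> m;
    const std::string def = "default value";
    const std::string& s = m.get_or_default(def);
    return s;
}
```

Note that a const rvalue (for example the result of a function returning const std::string) cannot bind to T&& and would still reach the const T& overload, so the check is not exhaustive.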
As for detecting such errors at runtime, try using Clang's AddressSanitizer with use-after-return (ASAN_OPTIONS=detect_stack_use_after_return=1) and/or use-after-scope (-fsanitize-address-use-after-scope) detection enabled.

You could try out the lvalue_ref wrapper from the Explicit library. It prevents unwanted binding to a temporary with a single declaration, like:
const T& get_or_default(lvalue_ref<const T> def)
{
return has ? value : def.get();
}
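To show the idea behind such a wrapper, here is a rough sketch. It is not the Explicit library's implementation; lvalue_only_ref and demo_lvalue_default are hypothetical names, and the real lvalue_ref may differ in interface and in the cases it covers.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical stand-in for a wrapper like lvalue_ref; not the library's code.
template <class T>
class lvalue_only_ref
{
    T* ptr_;
public:
    lvalue_only_ref(T& r) noexcept : ptr_(&r) {}
    lvalue_only_ref(T&&) = delete;   // temporaries are rejected at compile time
    T& get() const noexcept { return *ptr_; }
};

template <class T>
class my_optional
{
public:
    bool has{ false };
    T value;
    const T& get_or_default(lvalue_only_ref<const T> def)
    {
        return has ? value : def.get();
    }
};

using ref = lvalue_only_ref<const std::string>;
static_assert(std::is_convertible<std::string&, ref>::value, "lvalue binds");
static_assert(!std::is_convertible<std::string, ref>::value, "temporary is rejected");
static_assert(!std::is_convertible<const char*, ref>::value, "literal is rejected");

// Hypothetical usage with a named default.
std::string demo_lvalue_default()
{
    my_optional<std::string> m;
    const std::string def = "default value";
    return m.get_or_default(def);
}
```

The string literal is rejected because it would need two user-defined conversions (to std::string, then to the wrapper), which implicit conversion never performs.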

That is an interesting question. The actual cause of the dangling reference is that a temporary (an rvalue) is bound to a const lvalue reference and then treated as if it outlived the call.
If you do not have too much of that code, you can try throwing an exception like this:
template <class T>
class my_optional
{
public:
bool has{ false };
T value;
const T& get_or_default(const T&& def)
{
throw std::invalid_argument("Received a rvalue");
}
const T& get_or_default(const T& def)
{
return has ? value : def;
}
};
That way, if you pass it a reference to a temporary (which is indeed an rvalue), you get an exception that you can catch, or that at least aborts the program early.
Alternatively, you could try a simple fix: return a value (and not a reference) whenever an rvalue is passed in:
template <class T>
class my_optional
{
public:
bool has{ false };
T value;
const T get_or_default(const T&& def)
{
return get_or_default(static_cast<const T&>(def));
}
const T& get_or_default(const T& def)
{
return has ? value : def;
}
};
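A small variant of the value-returning approach, as a sketch only: it takes T&& instead of const T&& so the temporary can be moved into the result. The function demo_temporary_default is a hypothetical usage example.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

template <class T>
class my_optional
{
public:
    bool has{ false };
    T value;

    // Temporaries land here: the result is a new object, not a reference.
    T get_or_default(T&& def)
    {
        if (has) return value;   // copy of the stored value
        return std::move(def);   // move the temporary into the result
    }
    const T& get_or_default(const T& def)
    {
        return has ? value : def;
    }
};

using opt = my_optional<std::string>;
static_assert(std::is_same<
    decltype(std::declval<opt&>().get_or_default(std::declval<std::string>())),
    std::string>::value, "rvalue argument yields a value");
static_assert(std::is_same<
    decltype(std::declval<opt&>().get_or_default(std::declval<const std::string&>())),
    const std::string&>::value, "lvalue argument yields a reference");

// The returned prvalue is lifetime-extended by the const reference,
// so s stays valid until the end of the function.
std::string demo_temporary_default()
{
    my_optional<std::string> m;
    const std::string& s = m.get_or_default("default value");
    return s;
}
```

The price is a copy of the stored value whenever a temporary is passed and the optional is engaged.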
Another possibility would be to hack the Clang compiler so that it detects whether the method is passed an lvalue or an rvalue, but I am not familiar enough with those techniques...

Related

Best practice for getters with ref-qualifier

The following code causes undefined behaviour:
class T
{
public:
const std::string& get() const { return s_; }
private:
std::string s_ { "test" };
};
void breaking()
{
const auto& str = T{}.get();
// do sth with "str" <-- UB
}
(because, as I understand it, lifetime extension by const& doesn't apply here).
To prevent this, one solution might be to add a reference qualifier to get() to prevent it from being called on rvalues:
const std::string& get() const & { return s_; }
However, because the function is now both const- and &-qualified, it is still possible to call get() on rvalues, because they can bind to const&:
const auto& t = T{}; // OK
const auto& s1 = t.get(); // OK
const auto& s2 = T{}.get(); // OK <-- BAD
The only way to prevent this (as far as I can see) is to either overload get() with a &&-qualified variant that doesn't return a reference, or to = delete it:
const std::string& get() const & { return s_; }
const std::string& get() const && = delete; // Var. 1
std::string get() const && { return s_; }; // Var. 2
However, this implies that to implement getter functions that return (const) references correctly, I always have to provide either Var. 1 or Var. 2, which amounts to a lot of boilerplate code.
So my question is:
Is there a better/leaner way to implement getter functions that return references, so that the compiler can identify/prevent the mentioned UB case? Or is there a fundamental flaw in my understanding of the problem?
Also, so far I couldn't find an example where adding & to a const member function brings any advantage without also handling the && overload. Maybe someone can provide one, if it exists?
(I'm on MSVC 2019 v142 using C++17, if that makes any difference)
Thank you and best regards
It's somewhat unclear what limitations you're working with. If it is an option, you could get rid of the getter(s), and let lifetime extension do its thing:
struct T
{
std::string s_ { "test" };
};
const auto& str = T{}.s_; // OK; lifetime extended
With getters, your options are 1. providing duplicate getters, as shown in the question, or 2. accepting that the caller must be careful not to assume that a reference obtained from a getter on a temporary remains valid.
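For reference, option 1 can be checked at compile time. The following is a sketch along the lines of Var. 2 from the question; Holder and demo_temporary_getter are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>
#include <utility>

class Holder   // hypothetical stand-in for the class T in the question
{
public:
    // Lvalues hand out a reference to the member.
    const std::string& get() const &  { return s_; }
    // Rvalues (temporaries) hand out a copy, so nothing can dangle.
    std::string        get() const && { return s_; }
private:
    std::string s_{ "test" };
};

static_assert(std::is_same<decltype(std::declval<const Holder&>().get()),
                           const std::string&>::value, "lvalue: reference");
static_assert(std::is_same<decltype(std::declval<Holder>().get()),
                           std::string>::value, "rvalue: value");

// The copy returned from the temporary is lifetime-extended by str.
std::size_t demo_temporary_getter()
{
    const auto& str = Holder{}.get();
    return str.size();
}
```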
You could keep private access while still making lifetime management easy by using shared ownership:
class T
{
std::shared_ptr<std::string> s = std::make_shared<std::string>("test");
public:
// alternatively std::weak_ptr
const std::shared_ptr<const std::string>
get() const {
return s;
}
};
But you must consider whether the runtime cost is worth the convenience.

Bullet-proofing C++ temporary lifetimes?

Revisiting lifetime extension in C++, I found out that some patterns break the "decomposability" of C++ expressions. For example, the following two blocks are valid C++ code:
class NonMovable {
public:
NonMovable(NonMovable&&) = delete;
NonMovable(const NonMovable&) = delete;
NonMovable();
int Value() const;
};
template <class T>
const T& identity(const T& x) {
return x;
}
template <class T>
class A {
public:
explicit A(const T& value) : value_(value) {}
const T& GetValue() const {
return value_;
}
private:
const T& value_;
};
Correct usage:
int main() {
int retcode = identity(
identity(/*tmp1*/ A(/*tmp2*/ NonMovable{}).GetValue())).Value();
// tmp1 and tmp2 end their lifetimes here:
// their full-expression is the whole previous line
return retcode;
}
But if we decompose the first expression in main, it becomes invalid:
int main() {
auto&& a_obj = /*tmp1*/ A(/*tmp2*/ NonMovable{});
// tmp2 lifetime ends here
// oops! dereferencing dangling reference:
int retcode = identity(
identity(a_obj.GetValue())).Value();
return retcode;
// tmp1 lifetime ends here
}
My question is:
Is it possible to disable the second kind of usage?
P.S.: I'm not really sure whether the second main introduces UB, because I've tested it with clang -Wlifetime, and it doesn't complain. But I still believe it is UB. In real life I've come across similar behaviour: the code broke, emitting UBSan warnings and segfaults, when I decomposed a single expression into two separate ones.
P.P.S.: those identitys don't really matter much, if I understand object lifetimes correctly (which I now doubt)
Your analysis is correct. Without lifetime extension, all temporaries are destroyed at the end of the "full expression", i.e. the ; at the end of the line. So when you say
int retcode = A(NonMovable{}).GetValue().Value();
(comments and identity calls removed for clarity) then everything is okay; the NonMovable object is still alive at the time you ask for its value.
On the other hand, when you say
auto&& a_obj = A(NonMovable{});
then the NonMovable is destroyed at the end of the line, and the A object will be holding a dangling reference. (As an aside, auto&& just lifetime-extends the temporary A here -- you may as well just use plain auto)
My question is: Is it possible to disable the second kind of usage?
Not really, at least as far as I know. You could add a deleted A(NonMovable&&) constructor, but this would also prevent "correct" usage as in the first example. This is exactly the same issue that occurs with std::string_view (and will occur with std::span in C++20) -- essentially, your A class has reference semantics, but is referring to a temporary which has been destroyed.
Pooling ideas in the comments under the question, we managed to come up with the following implementation of A, which might be applicable to some use cases (but not to std::span or std::string_view usage):
struct Dummy;
template <class T>
class A {
public:
explicit A(const T& value) : value_(value) {}
template <class TDummy = Dummy>
const T& GetValue() const& {
static_assert(!std::is_same_v<TDummy, Dummy>,
"Stop and think, you're doing something wrong! "
"And in any case, don't use std::move on this class!");
}
const T& GetValue() && {
return value_;
}
private:
const T& value_;
};
Now, if one tries to compile the following code, one gets a descriptive error message:
int main() {
auto&& a_obj = A(NonMovable{});
// will not compile:
int retcode = identity(
identity(a_obj.GetValue())).Value();
return retcode;
}
The reason is that decltype((a_obj)) == A<NonMovable>&, so it binds to the method that produces a compile time error.
It satisfies my use cases, but, sadly, this is not a universal solution -- it depends on what one wants from class A.
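If the custom message is not needed, a plainer variant of the same idea is to delete the lvalue-qualified overload outright. This is a sketch under that assumption, requiring C++17; can_get_value and demo_temporary_chain are hypothetical names added so the behaviour can be checked.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

template <class T>
class A {
public:
    explicit A(const T& value) : value_(value) {}
    // Named objects (lvalues) and const rvalues are rejected.
    const T& GetValue() const& = delete;
    // Only a non-const temporary A may hand out the reference.
    const T& GetValue() && { return value_; }
private:
    const T& value_;
};

// Hypothetical detection trait: true if obj.GetValue() compiles.
template <class Obj, class = void>
struct can_get_value : std::false_type {};

template <class Obj>
struct can_get_value<Obj,
    std::void_t<decltype(std::declval<Obj>().GetValue())>> : std::true_type {};

static_assert(can_get_value<A<int>>::value, "temporary A is allowed");
static_assert(!can_get_value<A<int>&>::value, "named A is rejected");
static_assert(!can_get_value<const A<int>&>::value, "const named A is rejected");

// Everything happens in one full-expression, so the temporary 42 is
// still alive when its value is copied into the return value.
int demo_temporary_chain()
{
    return A<int>(42).GetValue();
}
```

As with the static_assert version, std::move on a named A defeats the check.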

Are references helpful when operating in a constexpr/compile-time context only?

I am exploring the world of constexpr and have decided to create a class that should only be used in constexpr context and other compile-time constructs.
Usually, I take great care to provide all necessary overloads a class may need, for example:
template <typename T>
struct Thing
{
Thing(T value) : m_value(value) {}
T &value() & { return m_value; }
const T &value() const & { return m_value; }
T &&value() && { return std::move(m_value); }
private:
T m_value;
};
The set of overloads for Thing::value should take care of efficient access to the stored value, no unnecessary copies are made. If the Thing instance is a temporary, the stored value can even be moved out.
But what if Thing is only to be used as a constexpr type, are all these different overloads for Thing::value required or even helpful at all? Or would the following be equivalent:
template <typename T>
struct Thing
{
constexpr Thing(T value) : m_value(value) {}
constexpr T value() const;
private:
T m_value;
};
My question basically boils down to: are references helpful (more efficient) when operating in a constexpr/compile-time context only; or is passing everything by value equivalent?
It depends on what your actual problem is and how you plan to solve it. In most cases (all of them?) you won't need references in such a context, I agree, but you can still use them if required.
Here is a minimal, working example:
const int i = 0;
template <typename T>
struct Thing {
constexpr Thing(const T &value) : m_value(value) {}
constexpr const T & value() const { return m_value; }
private:
const T & m_value;
};
int main() {
static_assert(Thing<int>{i}.value() == 0, "!");
}
See it up and running on wandbox.
So, are references helpful (more efficient) in this case? Well, it's not a matter of efficiency. To use references in such a context you need a good reason, and the language imposes a lot of limitations. They solve a specific problem; whether to use them is not a matter of taste.
If your problem requires references, they are there for you (and please contact me, I'm curious to know what that problem is!). Otherwise, feel free to keep passing by value.
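For comparison, the by-value version from the question can be sketched as follows. The function twice is a hypothetical helper, added only to show a Thing being passed by value inside a constant expression.

```cpp
#include <cassert>

template <typename T>
struct Thing
{
    constexpr Thing(T value) : m_value(value) {}
    constexpr T value() const { return m_value; }
private:
    T m_value;
};

// Hypothetical helper: takes a Thing by value in a constexpr function.
constexpr int twice(Thing<int> t)
{
    return 2 * t.value();
}

static_assert(Thing<int>{7}.value() == 7, "evaluated at compile time");
static_assert(twice(Thing<int>{21}) == 42, "pass-by-value works in constexpr");
```

Both static_asserts are resolved by the compiler, so the copies cost nothing at run time.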

RVO: Return value passed by value even if explicitly assigned to a const reference

I have a settings framework that caches values, storing them in a std::map of boost::any.
Since I don't want the client to deal with exceptions, the client provides a default value that the settings framework falls back to in case the setting retrieval fails; this forces me to return the setting value by copy.
class SettingsMgr
{
public:
template<class T>
T getSetting(const std::string& settingName, const T& settingDefValue)
{
try
{
if(cache.find(settingName) != cache.end())
{
return boost::any_cast<const T&>(cache.find(settingName)->second);
}
else
{
cache[settingName] = someDbRetrievalFunction<T>(settingName);
return boost::any_cast<const T&>(cache.find(settingName)->second);
}
}
catch(...)
{
return settingDefValue;
}
}
// This won't work in case the default value needs to be returned
// because it would be a reference to a value the client - and not the SettingsMgr -
// owns (which might be temporary etc etc)
template<class T>
const T& getSettingByRef(const std::string& settingName, const T& settingDefValue);
private:
std::map<std::string, boost::any> cache;
};
Now, I wasn't expecting this to be a big deal, since I thought that thanks to RVO magic a reference to the cached value owned by the settings framework would be returned, especially when the client explicitly binds the return value to a const reference!
According to my tests it does not seem to be the case.
int main() {
SettingsMgr sm;
// Assuming everything goes fine, SNAME is cached
const std::string& asettingvalue1 = sm.getSetting<std::string>("SNAME", "DEF_VALUE");
// Assuming everything goes fine, cached version is returned (no DB lookup)
const std::string& asettingvalue2 = sm.getSetting<std::string>("SNAME", "DEF_VALUE");
ASSERT_TRUE(&asettingvalue1 == &asettingvalue2); // Fails
const std::string& awrongsettingname = sm.getSettingByRef<std::string>("WRONGSETTINGNAME", "DEF_VALUE");
ASSERT_TRUE(awrongsettingname == "DEF_VALUE"); // Fails, awrongsettingname is random memory
}
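You can go with the getSettingByRef version and reject temporaries as the default value by adding a deleted overload that takes an rvalue reference:
template<class T>
const T& getSettingByRef(const std::string& settingName, const T&& settingDefValue) = delete;
(A static_assert(false, ...) inside the function body is not a reliable alternative: before C++23, compilers reject it even when the overload is never instantiated.)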
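A self-contained sketch of that approach, assuming C++17 and using std::any in place of boost::any so that it needs no external library. The set member, the accepts_default trait, and demo_same_cached_object are hypothetical additions for illustration; the database lookup from the question is left out.

```cpp
#include <any>
#include <cassert>
#include <map>
#include <string>
#include <type_traits>
#include <utility>

class SettingsMgr
{
public:
    // Returns the cached value if present and of type T, else the default.
    template <class T>
    const T& getSettingByRef(const std::string& name, const T& def)
    {
        auto it = cache_.find(name);
        if (it == cache_.end()) return def;
        const T* p = std::any_cast<T>(&it->second);   // nullptr on type mismatch
        return p ? *p : def;
    }
    // Temporaries as default value are rejected at compile time.
    template <class T>
    const T& getSettingByRef(const std::string& name, const T&& def) = delete;

    // Hypothetical helper to fill the cache.
    template <class T>
    void set(const std::string& name, T v) { cache_[name] = std::move(v); }
private:
    std::map<std::string, std::any> cache_;
};

// Hypothetical trait: does getSettingByRef<std::string>(name, Arg) compile?
template <class Arg, class = void>
struct accepts_default : std::false_type {};
template <class Arg>
struct accepts_default<Arg, std::void_t<decltype(
    std::declval<SettingsMgr&>().getSettingByRef<std::string>(
        std::declval<const std::string&>(), std::declval<Arg>()))>>
    : std::true_type {};

static_assert(accepts_default<const std::string&>::value, "named default is fine");
static_assert(!accepts_default<std::string>::value, "temporary default is rejected");
static_assert(!accepts_default<const char*>::value, "literal default is rejected");

bool demo_same_cached_object()
{
    SettingsMgr sm;
    sm.set<std::string>("SNAME", "cached");
    const std::string def = "DEF_VALUE";   // owned by the caller, outlives the refs
    const std::string& a = sm.getSettingByRef("SNAME", def);
    const std::string& b = sm.getSettingByRef("SNAME", def);
    const std::string& c = sm.getSettingByRef("WRONGSETTINGNAME", def);
    return &a == &b && &c == &def;
}
```

With a named default, both lookups of the same setting refer to the same cached object, which is what the ASSERT_TRUE on the addresses in the question expects.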

g++ strange warning

While working on a toy project that I started in order to answer an SO question, I'm getting flooded by a g++ warning that I don't understand.
format.hpp:230: warning: dereferencing pointer ‘<anonymous>’ does break strict-aliasing rules
Searching the internet, I got the impression that this could be a g++ bug. Is it really a bug, and if so, is there any workaround for it? The full source code is too big for inclusion but is available here. Here is the part where the warning is triggered...
template<typename T>
class ValueWrapper : public ValueWrapperBase
{
public:
T x;
ValueWrapper(const T& x) : x(x) {}
virtual std::string toString(const Field& field) const
{
return Formatter<T>().toString(x, field);
}
private:
// Taboo
ValueWrapper(const ValueWrapper&);
ValueWrapper& operator=(const ValueWrapper&);
};
typedef std::map<std::string, ValueWrapperBase *> Env;
class Dict
{
private:
Env env;
public:
Dict() {}
virtual ~Dict()
{
for (Env::iterator i=env.begin(), e=env.end(); i!=e; ++i)
delete i->second;
}
template<typename T>
Dict& operator()(const std::string& name, const T& value)
{
Env::iterator p = env.find(name);
if (p == env.end())
{
env[name] = new ValueWrapper<T>(value);
}
else
{
ValueWrapperBase *vw = new ValueWrapper<T>(value);
delete p->second;
p->second = vw;
}
return *this;
}
const ValueWrapperBase& operator[](const std::string& name) const
{
Env::const_iterator p = env.find(name);
if (p == env.end())
throw std::runtime_error("Field not present");
return *(p->second);
}
private:
// Taboo
Dict(const Dict&);
Dict& operator=(const Dict&);
};
Line 230 is p->second = vw;.
I get the warning for every instantiation of the template method operator(), always about line 230.
EDIT
Apparently the bug is related to the use of map iterators, which can generate inline code that confuses the optimizer. By rewriting the section to avoid iterators, I got shorter code that also compiles cleanly, without warnings.
template<typename T>
Dict& operator()(const std::string& name, const T& value)
{
ValueWrapperBase *vw = new ValueWrapper<T>(value);
ValueWrapperBase *& p(env[name]);
delete p;
p = vw;
return *this;
}
As far as I can tell this actually stems from code in map and not from your code itself.
According to http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42032 and http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43978, which both deal with maps and are very similar to each other, there are definitely some cases where it warns incorrectly because it loses track of the dynamic types of the objects. They also state that there are some cases where it warns properly.
They also indicate that the warning is silenced in 4.5 until it can be implemented properly.
Finally, did you try rewriting your method as follows to see if it silences the warning in 4.3/4.4?
template<typename T>
Dict& operator()(const std::string& name, const T& value)
{
ValueWrapperBase *vw = new ValueWrapper<T>(value);
delete env[name];
env[name] = vw;
return *this;
}
-fno-strict-aliasing (see http://gcc.gnu.org/onlinedocs/gcc-4.2.4/gcc/Optimize-Options.html#index-fstrict_002daliasing-572) turns off gcc's strict aliasing optimisations, and (presumably) with it the warning.
See also What is the strict aliasing rule? and http://cellperformance.beyond3d.com/articles/2006/06/understanding-strict-aliasing.html.
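As background on what the strict aliasing rule is about (this is not the cause of the warning in the question, which involves no type punning), here is a small sketch. It reads the bytes of a float through std::memcpy instead of a pointer cast, which stays within the rule. The function name float_bits is hypothetical, and the values in the usage note assume an IEEE-754 32-bit float.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reads the object representation of a float without violating strict
// aliasing: no float object is ever accessed through a uint32_t lvalue.
std::uint32_t float_bits(float f)
{
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "assumes a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

On such a platform float_bits(1.0f) yields 0x3F800000. Writing *reinterpret_cast<std::uint32_t*>(&f) instead is the kind of access the optimizer is allowed to assume never happens.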
I've seen this "error" before and decided that it's often meaningless. I don't see anything wrong with your code. You might try your luck with newer versions of GCC--I seem to recall seeing this pop up somewhere around 4.3-4.4.
Edit: I said this warning/error is "often" meaningless. Not "usually." I absolutely do not advocate simply ignoring or disabling warnings just because they are annoying, but in this code, and in some of my own code, there is no apparent problem despite GCC's complaint.