just looking in optimizing some std::map code. The map contains objects, accessed via the string-identifier.
Example:
std::map<std::string, CVeryImportantObject> theMap;
...
theMap["second"] = new CVeryImportantObject();
Now, when using the find-function as theMap->find("second"), the String is converted into std::string("second"), which causes new string allocations (over all when using IDL=2 with Visual Studio).
1. Is there a possibility to use a string-only class to avoid such allocations?
Intentionally I've tried to use another String-Class as well:
std::map<CString, CVeryImportantObject> theMap;
This code works also. But CString indeed is an object.
And: If you remove an object from the map, I'll need to release both the related object and the key, do I?
Any suggestions?
Now, when using the find-function as theMap->find("second"), the
String is converted into std::string("second"), which causes new
string allocations (over all when using IDL=2 with Visual Studio).
This is a Standard issue, which is fixed in C++14 for ordered containers. The newest version of VS, VS 14 CTP (which is a pre-release) contains a fix for this issue, as will new versions of other implementations.
If you need to avoid allocations, you can try a class like llvm::StringRef which can refer to std::string or string literals interchangably, but then you will be left trying to handle the ownership externally.
You can try something like unique_ptr<char[], maybe_delete> that sometimes deletes the contents. This is a bit of a mess to interface with though.
And: If you remove an object from the map, I'll need to release both
the related object and the key, do I?
The map will automatically destruct the key and value for you. For a class which frees it's own resources like std::string, which is the only sane way to write C++, then you can erase without worrying about resource cleanup.
If you always use string constants as keys, you can use const char * as key type in map when you use proper comparator:
struct PCharCompare {
bool operator()( const char *s1, const char *s2 ) const { return strcmp( s1, s2 ) < 0; }
};
std::map< const char *, CVeryImportantObject, PCharCompare> theMap;
Note: you have to be careful and need to understand how it works, as it can easily lead to UB:
void foo() {
char buffer[256];
snprintf( buffer, sizeof( buffer ), "blah" );
theMap.insert( std::make_pair( buffer, Object ) );
} // ups dangled pointer in the map
As for optimization, it is very unlikely that std::string creation is a culprit. you may try to use std::unordered_map or something similar for optimization
Now, when using the find-function as theMap->find("second"), the
String is converted into std::string("second"), which causes new
string allocations
Not necessarily. VC uses Small-String Optimisation (SSO). This means that for a string as short as "second", no allocation on the heap should take place at all; the characters will instead be stored directly in the temporarily created std::string object.
This is still not free (because the std::string has to be created, albeit without any dynamic allocation happening inside), but should be good enough. Is it really a concern for you? Chances are very high that it does not cause any measurable performance decrease.
Is there a possibility to use a string-only class to avoid such allocations?
Not really, except of the C++14 fix mentioned in other answers. Using char const * as the key type is very dangerous, because std::map will only store the actual addresses, not copies of the keys.
If I were you and if I really experienced performance problems, I'd just not use std::map directly but create my own container class to wrap a std::map<char const *, T, CustomComparison> and do the hard pointer work inside.
template <class ValueType>
class FastStringMap
{
private:
struct Comparison
{
bool operator()(char const *lhs, char const *rhs) const
{
return strcmp(lhs, rhs) > 0;
}
};
typedef std::map<char const *, ValueType, Comparison> WrappedMap;
WrappedMap m_map;
public:
typedef typename WrappedMap::iterator iterator;
typedef typename WrappedMap::const_iterator const_iterator;
bool insert(char const *key, ValueType const &value)
{
if (m_map.find(key) != m_map.end())
{
return false;
}
else
{
char *copy = new char[strlen(key) + 1];
strcpy(copy, key);
try
{
return m_map.insert(std::make_pair(copy, value)).second;
}
catch (...)
{
delete copy;
throw;
}
}
}
~FastStringMap()
{
for (iterator iter = m_map.begin(); iter != m_map.end(); ++iter)
{
delete[] iter->first;
}
}
iterator find(char const *key)
{
return m_map.find(key);
}
const_iterator find(char const *key) const
{
return m_map.find(key);
}
// further operations
};
To be used like this:
FastStringMap<int> m;
m.insert("AAA", 1);
m.insert("BBB", 2);
m.insert("CCC", 3);
std::cout << m.find("AAA")->second;
Note that you can possibly make this more sophisticated by templatising also on the character type (for std::wstring support) or by providing "real" iterator classes (using Boost Iterator Facade).
And: If you remove an object from the map, I'll need to release both
the related object and the key, do I?
If you use std::string, no. If you use char const * and if the pointers point to memory allocated dynamically (as in my example), then yes.
Related
Basically Im wanting to fetch a pointer of a constant and anonymous object, such as an instance of a class, array or struct that is inialised with T {x, y, z...}. Sorry for my poor skills in wording.
The basic code that Im trying to write is as follows:
//Clunky, Im sure there is an inbuilt class that can replace this, any information would be a nice addition
template<class T> class TerminatedArray {
public:
T* children;
int length;
TerminatedArray(const T* children) {
this->children = children;
length = 0;
while ((unsigned long)&children[length] != 0)
length++;
}
TerminatedArray() {
length = 0;
while ((unsigned long)&children[length] != 0)
length++;
}
const T get(int i) {
if (i < 0 || i >= length)
return 0;
return children[i];
}
};
const TerminatedArray<const int> i = (const TerminatedArray<const int>){(const int[]){1,2,3,4,5,6,0}};
class Settings {
public:
struct Option {
const char* name;
};
struct Directory {
const char* name;
TerminatedArray<const int> const children;
};
const Directory* baseDir;
const TerminatedArray<const Option>* options;
Settings(const Directory* _baseDir, const TerminatedArray<const Option> *_options);
};
//in some init method's:
Settings s = Settings(
&(const Settings::Directory){
"Clock",
(const TerminatedArray<const int>){(const int[]){1,2,0}}
},
&(const TerminatedArray<const Settings::Option>){(const Settings::Option[]){
{"testFoo"},
{"foofoo"},
0
}}
);
The code that I refer to is at the very bottom, the definition of s. I seem to be able to initialize a constant array of integers, but when applying the same technique to classes, it fails with:
error: taking address of temporary [-fpermissive]
I don't even know if C++ supports such things, I want to avoid having to have separate const definitions dirtying and splitting up the code, and instead have them clean and anonymous.
The reason for wanting all these definitions as constants is that Im working on an Arduino project that requires efficient balancing of SRAM to Flash. And I have a lot of Flash to my disposal.
My question is this. How can I declare a constant anonymous class/struct using aggregate initialization?
The direct (and better) equivalent to TerminatedArray is std::initializer_list:
class Settings {
public:
struct Option {
const char* name;
};
struct Directory {
const char* name;
std::initializer_list<const int> const children;
};
const Directory* baseDir;
const std::initializer_list<const Option>* options;
Settings(const Directory& _baseDir, const std::initializer_list<const Option>& _options);
};
//in some init method's:
Settings s = Settings(
{
"Clock",
{1,2,0}
},
{
{"testFoo"},
{"foofoo"}
}
);
https://godbolt.org/z/8t7j0f
However, this will almost certainly have lifetime issues (which the compiler tried to warn you about with "taking address of temporary"). If you want to store a (non-owning) pointer (or reference) then somebody else should have ownership of the object. But when initializing with temporary objects like this, nobody else does. The temporaries die at the end of the full expression, so your stored pointers now point to dead objects. Fixing this is a different matter (possibly making your requirements conflicting).
Somewhat relatedly, I'm not sure whether storing a std::initializer_list as class member is a good idea might. But it's certainly the thing you can use as function parameter to make aggregate initialization nicer.
&children[length] != 0 is still true or UB.
If you don't want to allocate memory, you might take reference to existing array:
class Settings {
public:
struct Option {
const char* name;
};
struct Directory {
const char* name;
std::span<const int> const children;
};
const Directory baseDir;
const std::span<const Option> options;
Settings(Directory baseDir, span<const Option> options);
};
//in some method:
const std::array<int, 3> ints{{1,2,0}};
const std::array<Settings::Option> options{{"testFoo"}, {"foofoo"}};
Settings s{"Clock", {ints}}, options};
First, you're not aggregate-initializing anything. This is uniform initialization and you're calling constructors instead of directly initializing members. This is because your classes have user-defined constructors, and classes with constructors can't be aggregate-initialized.
Second, you're not really able to "initialize a constant array of integers". It merely compiles. Trying to run it gives undefined behavior - in my case, trying to construct i goes into an infinite search for element value 0.
In C++, there's values on the stack, there's values on the heap and there's temporary values (I genuinely apologize to anyone who knows C++ for this statement).
Values on the heap have permanent addresses which you can pass around freely.
Values on the stack have temporary addresses which are valid until
the end of the block.
Temporary values either don't have addresses
(as your compiler warns you) or have a valid address for the duration
of the expression they're used for.
You're using such a temporary to initialize i, and trying to store and use the address of a temporary. This is an error and to fix it you can create your "temporary" array on the stack if you don't plan to use i outside of the block where your array will be.
Or you can create your array on the heap, use its address to initialize i, and remember to explicitly delete your array when you're done with it.
I recommend reading https://isocpp.org/faq and getting familiar with lifetime of variables and memory management before attempting to fix this code. It should give you a much better idea of what you need to do to make your code do what you want it to do.
Best of luck.
I have C++ code that investigates a BIG string and matches lots of substrings. As much as possible, I avoid constructing std::strings, by encoding substrings like this:
char* buffer, size_t bufferSize
At some point, however, I'd like to look up a substring in one of these:
std::unordered_map<std::string, Info> stringToInfo = {...
So, to do that, I go:
stringToInfo.find(std::string(buffer, bufferSize))
That constructs a std::string for the sole purpose of the lookup.
I feel like there's an optimization I could do here, by... changing the key-type of the unordered_map to some kind of temporary string imposter, a class like this...
class SubString
{
char* buffer;
size_t bufferSize;
// ...
};
... that does the same logic as std::string to hash and compare, but then doesn't deallocate its buffer when it's destroyed.
So, my question is: is there a way to get the standard classes to do this, or do I write this class myself?
What you're wanting to do is called heterogeneous lookup. Since C++14 it's been supported for std::map::find and std::set::find (note versions (3) and (4) of the functions, which are templated on the lookup value type). It's more complicated for unordered containers because they need to be told of or find hash functions for all key types that will produce the same hash value for the same text. There's a proposal under consideration for a future Standard: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0919r0.html
Meanwhile, you could use another library that already supports heterogenous lookup, e.g. boost::unordered_map::find.
If you want to stick to std::unordered_map, you could avoid creating so many string temporaries by storing a std::string member alongside your unordered_map that you can reassign values to, then pass that string to find. You could encapsulate this in a custom container class.
Another route is to write a custom class to use as your unordered container key:
struct CharPtrOrString
{
const char* p_;
std::string s_;
explicit CharPtrOrString(const char* p) : p_{p} { }
CharPtrOrString(std::string s) : p_{nullptr}, s_{std::move(s)} { }
bool operator==(const CharPtrOrString& x) const
{
return p_ ? x.p_ ? std::strcmp(p_, x.p_) == 0
: p_ == x.s_
: x.p_ ? s_ == x.p_
: s_ == x.s_;
}
struct Hash
{
size_t operator()(const CharPtrOrString& x) const
{
std::string_view sv{x.p_ ? x.p_ : x.s_.c_str()};
return std::hash<std::string_view>()(sv);
}
};
};
You can then construct CharPtrOrString from std::strings for use in the unordered container keys, but construct one cheaply from your const char* each time you call find. Note that operator== above has to work out which you did (convention used is that if the pointer's nullptr then the std::string member's in use) so it compares the in-use members. The hash function has to make sure a std::string with a particular textual value will produce the same hash as a const char* (which it doesn't by default with GCC 7.3 and/or Clang 6 - I work with both and remember one had an issue but not which).
In C++20, you can now do this:
// struct is from "https://www.cppstories.com/2021/heterogeneous-access-cpp20/"
struct string_hash {
using is_transparent = void;
[[nodiscard]] size_t operator()(const char *txt) const {
return std::hash<std::string_view>{}(txt);
}
[[nodiscard]] size_t operator()(std::string_view txt) const {
return std::hash<std::string_view>{}(txt);
}
[[nodiscard]] size_t operator()(const std::string &txt) const {
return std::hash<std::string>{}(txt);
}
};
// Declaration of map
std::unordered_map<std::string, Info, string_hash, std::equal_to<>> map;
std::string_view key = "foo";
if (map.find(key))
{
// do something here
}
Just note that you will still need std::string when using []. There may be a way around that, but I'm not too sure
I found this code in a C++ header-only wrapper around a C API I'm working with:
static string GetString(const char* chString)
{
string strValue;
if (NULL != chString)
{
strValue.swap(string (chString));
releaseMemory((void*&)chString);
chString = NULL;
}
return strValue;
}
I suppose the author is trying to give the string strValue ownership of chString and then free the empty buffer. I suspect this is very wrong (including it being const char*), but it actually seems to work with MSVC 12. At least I haven't seen it crash spectacularly yet.
Assuming that the C API and the C++ library are using the same heap (so that the string can reallocate the buffer if necessary and eventually release it), is there a way to properly achieve this? How about this?
template <typename T> struct Deleter { void operator()(T o) { releaseMemory((void*&)o); } };
static std::string GetString(char* chString)
{
if (NULL == chString)
return std::string();
return std::string(std::unique_ptr<char[], Deleter<char[]>>(chString).get());
}
Again, assuming the C API is using the same heap as std::string.
If that's also very wrong, then is there an immutable, owning C-style string wrapper? Something like string_view but immutable (so const char* input would be ok) and owning (so it deletes the C string, possibly with a custom deleter, in its dtor)?
I suppose the author is trying to give the string strValue ownership of chString and then free the empty buffer.
No. It makes an (inefficient and error-prone) copy of the character data pointed to by chString, then releases the memory pointed to by chString (which will be skipped if the copy throws an exception), and then returns the copy.
Assuming that the C API and the C++ library are using the same heap
That is not a correct assumption, or even a necessary one. The copy can use whatever heap it wants.
is there a way to properly achieve this? How about this?
You are on the right track to use a std::unique_ptr with a custom deleter, but there is no reason to use the T[] array specialization of std::unique_ptr.
The code can be simplified to something more like this:
void Deleter(char* o) { releaseMemory((void*&)o); }
static std::string GetString(char* chString)
{
std::string strValue;
if (chString) {
std::unique_ptr<char, decltype(&Deleter)>(chString, &Deleter);
strValue = chString;
}
return strValue;
}
Or, just get rid of the check for chString being null, it is not actually needed. std::string can be constructed from a null char*, and std::unique_ptr will not call its deleter with a null pointer:
void Deleter(char* o) { releaseMemory((void*&)o); }
static std::string GetString(char* chString)
{
std::unique_ptr<char, decltype(&Deleter)>(chString, &Deleter);
return std::string(chString);
}
Does this seem like a good solution for my last question (and ultimate goal of being able to use a char* like a string without copying it)?
template <typename DeleterT = std::default_delete<const char*>>
class c_str_view
{
public:
unique_ptr<const char*, DeleterT> strPtr_;
size_t len_;
c_str_view() {}
c_str_view(const char* charPtr) : strPtr_(charPtr), len_(strlen(charPtr)) {}
c_str_view(const char* charPtr, size_t len) : strPtr_(charPtr), len_(len) {}
operator std::string_view () const
{
return string_view(strPtr_.get(), len_);
}
};
If so, is there a good reason this isn't in the upcoming standard since string_view is coming? It only makes sense with string_view of course, since any conversion to std::string would cause a copy and make the whole exercise pointless.
Here's a test:
http://coliru.stacked-crooked.com/a/9046eb22b10a1d87
I use QStrings to store passwords. If to be more precise, I use QStrings to fetch passwords from GUI.
The point is that after password usage/appliance I need to nullify (zero) internal QStrings data bytes with password to eliminate it from memory entirely.
Here are my investigations:
After QString destruction it's data remains in memory nonzeroed;
When I attempt to modify QString to fulfill it with zeroes it triggers copy-on-write idiom and allocates new memory for a modified variant of data. Old data remains untouched. Same happens even if I use QString::data() method. Not really sure why - probably because it returns not raw char * but QChar *;
QString::clear(), = "" does actually the same COW as described above.
Q: How can I implement proper QString cleanup to prevent passwords leaks?
I've got two possible ways of bypassing copy-on-write. I've tried them and they seem to work - didn't use the Qt Creator's memory viewer, but the implicitly shared QStrings used in my code both pointed to the same zeroed data afterwards.
Using constData()
As written in the Qt docs for QString::data() method:
For read-only access, constData() is faster because it never causes a
deep copy to occur.
So the possible solution looks like this:
QString str = "password";
QString str2 = str;
QChar* chars = const_cast<QChar*>(str.constData());
for (int i = 0; i < str.length(); ++i)
chars[i] = '0';
// str and str2 are now both zeroed
This is a legitimate use of const_cast since the underlying data is not really const, so no undefined behaviour here.
Using iterators
From the Qt docs for implicit sharing:
An implicitly shared class has control of its internal data. In any
member functions that modify its data, it automatically detaches
before modifying the data. Notice, however, the special case with
container iterators; see Implicit sharing iterator problem.
So let's move to the section describing this iterator problem:
Implicit sharing has another consequence on STL-style iterators: you
should avoid copying a container while iterators are active on that
container. The iterators point to an internal structure, and if you
copy a container you should be very careful with your iterators. E.g.:
QVector<int> a, b;
a.resize(100000); // make a big vector filled with 0.
QVector<int>::iterator i = a.begin();
// WRONG way of using the iterator i:
b = a;
/*
Now we should be careful with iterator i since it will point to shared data
If we do *i = 4 then we would change the shared instance (both vectors)
The behavior differs from STL containers. Avoid doing such things in Qt.
*/
a[0] = 5;
/*
Container a is now detached from the shared data,
and even though i was an iterator from the container a, it now works as an iterator in b.
*/
It is my understanding that, based on the above docs fragment, you should be able to exploit this "wrong usage" of iterators to manipulate your original string with iterators as they don't trigger copy-on-write. It's important that you "intercept" the begin() and end() before any copying occurs:
QString str = "password";
QString::iterator itr = str.begin();
QString::iterator nd = str.end();
QString str2 = str;
while (itr != nd)
{
*itr = '0';
++itr;
} // str and str2 still point to the same data and are both zeroed
You have to be aware of all the temporary copies that may be done. If you want to avoid this you must manually erase memory before deallocating each temporal copy. Unfortunately that cannot be done with the standard QString since the implementation is closed.
You can, however, specialise std::basic_string using a custom allocator, which, before deleting can clean up the memory block (see below for an example). You can use this new secure string to manipulate your password instead of a plain char array, if you find it more convenient. I'm not sure is std::basic_string can be specialised with QChar, but if not you can use any of the Unicode characters from C++11 (char16_t, char32_t...) instead if you need other than ANSI support.
Regarding the user interface, I think an option you have is to create your own text input widget, reimplementing the keyPressEvent / keyReleaseEvent to store the typed password into either the secure string or a char array. Also reimplement the paintEvent to display only stars, dots, or whatever other masking character you want. Once the password has been used just clear the array or empty the secure string.
Update: example of secure string
namespace secure {
template<class T>
class allocator : public std::allocator<T> {
public:
typedef typename std::allocator<T>::pointer pointer;
typedef typename std::allocator<T>::size_type size_type;
template<class U>
struct rebind {
typedef allocator<U> other;
};
allocator() throw() :
std::allocator<T>() {}
allocator(const allocator& other) throw() :
std::allocator<T>(other) {}
template <class U>
allocator(const allocator<U>& other) throw() :
std::allocator<T>(other) {}
void deallocate(pointer p, size_type num) {
memset(p, 0, num); // can be replaced by SecureZeroMemory(p, num) on Windows
std::allocator<T>::deallocate(p, num);
}
};
class string : public std::basic_string<char, std::char_traits<char>, allocator<char>> {
public:
string() :
basic_string() {}
string(const string& str) :
basic_string(str.data(), str.length()) {}
template<class _Elem, class _Traits, class _Ax>
string(const std::basic_string<_Elem, _Traits, _Ax>& str) :
basic_string(str.begin(), str.end()) {}
string(const char* chars) :
basic_string(chars) {}
string(const char* chars, size_type sz) :
basic_string(chars, sz) {}
template<class _It>
string(_It a, _It b) :
basic_string(a, b) {}
};
}
I have the following issue related to iterating over an associative array of strings defined using std::map.
-- snip --
class something
{
//...
private:
std::map<std::string, std::string> table;
//...
}
In the constructor I populate table with pairs of string keys associated to string data. Somewhere else I have a method toString that returns a string object that contains all the keys and associated data contained in the table object(as key=data format).
std::string something::toString()
{
std::map<std::string, std::string>::iterator iter;
std::string* strToReturn = new std::string("");
for (iter = table.begin(); iter != table.end(); iter++) {
strToReturn->append(iter->first());
strToReturn->append('=');
strToRetunr->append(iter->second());
//....
}
//...
}
When I'm trying to compile I get the following error:
error: "error: no match for call to ‘(std::basic_string<char,
std::char_traits<char>, std::allocator<char> >) ()’".
Could somebody explain to me what is missing, what I'm doing wrong?
I only found some discussion about a similar issue in the case of hash_map where the user has to define a hashing function to be able to use hash_map with std::string objects. Could be something similar also in my case?
Your main problem is that you are calling a method called first() in the iterator. What you are meant to do is use the property called first:
...append(iter->first) rather than ...append(iter->first())
As a matter of style, you shouldn't be using new to create that string.
std::string something::toString()
{
std::map<std::string, std::string>::iterator iter;
std::string strToReturn; //This is no longer on the heap
for (iter = table.begin(); iter != table.end(); ++iter) {
strToReturn.append(iter->first); //Not a method call
strToReturn.append("=");
strToReturn.append(iter->second);
//....
// Make sure you don't modify table here or the iterators will not work as you expect
}
//...
return strToReturn;
}
edit: facildelembrar pointed out (in the comments) that in modern C++ you can now rewrite the loop
for (auto& item: table) {
...
}
Don't write a toString() method. This is not Java. Implement the stream operator for your class.
Prefer using the standard algorithms over writing your own loop. In this situation, std::for_each() provides a nice interface to what you want to do.
If you must use a loop, but don't intend to change the data, prefer const_iterator over iterator. That way, if you accidently try and change the values, the compiler will warn you.
Then:
std::ostream& operator<<(std::ostream& str,something const& data)
{
data.print(str)
return str;
}
void something::print(std::ostream& str) const
{
std::for_each(table.begin(),table.end(),PrintData(str));
}
Then when you want to print it, just stream the object:
int main()
{
something bob;
std::cout << bob;
}
If you actually need a string representation of the object, you can then use lexical_cast.
int main()
{
something bob;
std::string rope = boost::lexical_cast<std::string>(bob);
}
The details that need to be filled in.
class somthing
{
typedef std::map<std::string,std::string> DataMap;
struct PrintData
{
PrintData(std::ostream& str): m_str(str) {}
void operator()(DataMap::value_type const& data) const
{
m_str << data.first << "=" << data.second << "\n";
}
private: std::ostream& m_str;
};
DataMap table;
public:
void something::print(std::ostream& str);
};
Change your append calls to say
...append(iter->first)
and
... append(iter->second)
Additionally, the line
std::string* strToReturn = new std::string("");
allocates a string on the heap. If you intend to actually return a pointer to this dynamically allocated string, the return should be changed to std::string*.
Alternatively, if you don't want to worry about managing that object on the heap, change the local declaration to
std::string strToReturn("");
and change the 'append' calls to use reference syntax...
strToReturn.append(...)
instead of
strToReturn->append(...)
Be aware that this will construct the string on the stack, then copy it into the return variable. This has performance implications.
Note that the result of dereferencing an std::map::iterator is an std::pair. The values of first and second are not functions, they are variables.
Change:
iter->first()
to
iter->first
Ditto with iter->second.
iter->first and iter->second are variables, you are attempting to call them as methods.
Use:
std::map<std::string, std::string>::const_iterator
instead:
std::map<std::string, std::string>::iterator
Another worthy optimization is the c_str ( ) member of the STL string classes, which returns an immutable null terminated string that can be passed around as a LPCTSTR, e. g., to a custom function that expects a LPCTSTR. Although I haven't traced through the destructor to confirm it, I suspect that the string class looks after the memory in which it creates the copy.
In c++11 you can use:
for ( auto iter : table ) {
key=iter->first;
value=iter->second;
}