Nullify QString data bytes - c++

I use QStrings to store passwords. To be more precise, I use QStrings to fetch passwords from the GUI.
The point is that once a password has been used, I need to nullify (zero) the internal QString data bytes holding it, so that it is eliminated from memory entirely.
Here is what I have found so far:
After a QString is destroyed, its data remains in memory, not zeroed;
When I attempt to modify a QString to fill it with zeroes, it triggers the copy-on-write idiom and allocates new memory for the modified copy of the data. The old data remains untouched. The same happens even if I use the QString::data() method. I'm not really sure why - probably because it returns not a raw char * but a QChar *;
QString::clear() and = "" trigger the same COW as described above.
Q: How can I implement proper QString cleanup to prevent password leaks?

I've got two possible ways of bypassing copy-on-write. I've tried them and they seem to work - didn't use the Qt Creator's memory viewer, but the implicitly shared QStrings used in my code both pointed to the same zeroed data afterwards.
Using constData()
As written in the Qt docs for QString::data() method:
For read-only access, constData() is faster because it never causes a
deep copy to occur.
So the possible solution looks like this:
QString str = "password";
QString str2 = str;
QChar* chars = const_cast<QChar*>(str.constData());
for (int i = 0; i < str.length(); ++i)
chars[i] = '\0'; // write a real zero code unit, not the character '0'
// str and str2 are now both zeroed
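This is a legitimate use of const_cast as long as the QString owns ordinary heap-allocated data, as it does here: the underlying data is not really const, so there is no undefined behaviour. It would not be safe for a string created with QStringLiteral or QString::fromRawData(), whose buffer may live in read-only or foreign memory.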
Using iterators
From the Qt docs for implicit sharing:
An implicitly shared class has control of its internal data. In any
member functions that modify its data, it automatically detaches
before modifying the data. Notice, however, the special case with
container iterators; see Implicit sharing iterator problem.
So let's move to the section describing this iterator problem:
Implicit sharing has another consequence on STL-style iterators: you
should avoid copying a container while iterators are active on that
container. The iterators point to an internal structure, and if you
copy a container you should be very careful with your iterators. E.g.:
QVector<int> a, b;
a.resize(100000); // make a big vector filled with 0.
QVector<int>::iterator i = a.begin();
// WRONG way of using the iterator i:
b = a;
/*
Now we should be careful with iterator i since it will point to shared data
If we do *i = 4 then we would change the shared instance (both vectors)
The behavior differs from STL containers. Avoid doing such things in Qt.
*/
a[0] = 5;
/*
Container a is now detached from the shared data,
and even though i was an iterator from the container a, it now works as an iterator in b.
*/
It is my understanding that, based on the above docs fragment, you should be able to exploit this "wrong usage" of iterators to manipulate your original string with iterators as they don't trigger copy-on-write. It's important that you "intercept" the begin() and end() before any copying occurs:
QString str = "password";
QString::iterator itr = str.begin();
QString::iterator nd = str.end();
QString str2 = str;
while (itr != nd)
{
*itr = '\0'; // write a real zero code unit, not the character '0'
++itr;
} // str and str2 still point to the same data and are both zeroed

You have to be aware of all the temporary copies that may be made. If you want to avoid leaving them behind, you must manually erase the memory before each temporary copy is deallocated. Unfortunately, that cannot be done with the standard QString, since its implementation is closed to this kind of customisation.
You can, however, instantiate std::basic_string with a custom allocator that wipes the memory block before releasing it (see below for an example). You can use this new secure string to manipulate your password instead of a plain char array, if you find it more convenient. I'm not sure whether std::basic_string can be instantiated with QChar, but if not, you can use any of the Unicode character types from C++11 (char16_t, char32_t...) instead if you need more than ANSI support.
Regarding the user interface, I think one option is to create your own text input widget, reimplementing keyPressEvent / keyReleaseEvent to store the typed password in either the secure string or a char array. Also reimplement paintEvent to display only stars, dots, or whatever other masking character you want. Once the password has been used, just clear the array or empty the secure string.
Update: example of secure string
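#include <cstring> // memset
#include <memory> // std::allocator
#include <string> // std::basic_string
namespace secure {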
template<class T>
class allocator : public std::allocator<T> {
public:
typedef typename std::allocator<T>::pointer pointer;
typedef typename std::allocator<T>::size_type size_type;
template<class U>
struct rebind {
typedef allocator<U> other;
};
allocator() throw() :
std::allocator<T>() {}
allocator(const allocator& other) throw() :
std::allocator<T>(other) {}
template <class U>
allocator(const allocator<U>& other) throw() :
std::allocator<T>(other) {}
void deallocate(pointer p, size_type num) {
memset(p, 0, num * sizeof(T)); // num counts elements, not bytes; can be replaced by SecureZeroMemory(p, num * sizeof(T)) on Windows
std::allocator<T>::deallocate(p, num);
}
};
class string : public std::basic_string<char, std::char_traits<char>, allocator<char>> {
public:
string() :
basic_string() {}
string(const string& str) :
basic_string(str.data(), str.length()) {}
template<class _Elem, class _Traits, class _Ax>
string(const std::basic_string<_Elem, _Traits, _Ax>& str) :
basic_string(str.begin(), str.end()) {}
string(const char* chars) :
basic_string(chars) {}
string(const char* chars, size_type sz) :
basic_string(chars, sz) {}
template<class _It>
string(_It a, _It b) :
basic_string(a, b) {}
};
}
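For reference, the same idea can also be sketched as a C++11-style minimal allocator, which avoids inheriting from std::allocator. This is only a sketch under stated assumptions: the names secure_sketch, secure_zero and zeroing_allocator are made up for illustration, and the volatile write loop is a portable stand-in for platform calls such as SecureZeroMemory.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

namespace secure_sketch {  // hypothetical namespace, not part of any library

// Overwrite a block with zeroes. Writing through a volatile pointer is meant
// to keep the compiler from dropping the stores as "dead".
inline void secure_zero(void* p, std::size_t bytes) noexcept {
    assert(p != nullptr || bytes == 0);
    volatile unsigned char* q = static_cast<volatile unsigned char*>(p);
    while (bytes--) *q++ = 0;
}

// Minimal allocator: delegates to std::allocator, but wipes on deallocate.
template <class T>
struct zeroing_allocator {
    using value_type = T;

    zeroing_allocator() noexcept = default;
    template <class U>
    zeroing_allocator(const zeroing_allocator<U>&) noexcept {}

    T* allocate(std::size_t n) { return std::allocator<T>().allocate(n); }

    void deallocate(T* p, std::size_t n) noexcept {
        secure_zero(p, n * sizeof(T));  // n is an element count, so scale to bytes
        std::allocator<T>().deallocate(p, n);
    }
};

template <class T, class U>
bool operator==(const zeroing_allocator<T>&, const zeroing_allocator<U>&) noexcept { return true; }
template <class T, class U>
bool operator!=(const zeroing_allocator<T>&, const zeroing_allocator<U>&) noexcept { return false; }

using string = std::basic_string<char, std::char_traits<char>, zeroing_allocator<char>>;

}  // namespace secure_sketch
```

Keep in mind that most standard library implementations store short strings inside the string object itself (small-string optimisation), so the allocator never sees them. A short password held in such a string is not wiped by this allocator and still has to be overwritten by hand.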

Related

How to represent existing data as std::vector

I have to pass existing data (unsigned char memory area with known size) to the library function expecting const std::vector<std::byte>& . Is there any way to "fool" the library function to believe that it received a vector while operating on existing data?
I have data from the old legacy as a pointer and size, not as a std::vector. Legacy C code allocates memory by malloc() and provides pointer and size. Please do not suggest touching the legacy code - by the end of the phrase I'll cease to be an employee of the company.
I don't want to create a temporary vector and copy the data because the memory throughput is huge (> 5GB/sec).
Placement new creates a vector, but with the first bytes used for the vector object itself. I cannot use a few bytes before the memory area - the legacy code doesn't expect that (see above - the memory area is allocated by malloc()).
Changing the third-party library is out of the question. It expects const std::vector<std::byte>& - not span, iterators, etc.
It looks like I have no choice but to go with a temporary vector, but maybe there are other ideas... I wouldn't care, but this is about intensive video processing and there will be a lot of data to copy for nothing.
Is there any way to "fool" the library function to believe that it received a vector while operating on existing data?
No.
The potential options are:
Put the data in a vector in the first place.
Or change the function expecting a vector to not expect a vector.
Or create a vector and copy the data.
If 1. and 2. are not valid options for you, that leaves you with 3. whether you want it or not.
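Option 3 is at least short to write. A minimal sketch, assuming C++17 for std::byte; the function name copy_to_vector is made up here and is not part of any library:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Copies `size` bytes from a legacy buffer into a freshly allocated vector.
// The legacy buffer is left untouched and remains owned by the legacy code.
std::vector<std::byte> copy_to_vector(const unsigned char* data, std::size_t size) {
    assert(data != nullptr || size == 0);
    const std::byte* first = reinterpret_cast<const std::byte*>(data);
    return std::vector<std::byte>(first, first + size);
}
```

This costs one allocation and one pass over the data per call, which is exactly the overhead the question wants to avoid, but it is the only variant here that stays within the standard.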
As the top answer mentions, this is impossible to do in standard C++. And you should not try to do it.
If you can tolerate only using libstdc++ and getting potentially stuck with a specific standard library version, it looks like you can do it. Again, you should not do this. I'm only writing this answer as it seems to be possible without UB in this specific circumstance.
It appears that the current version of libstdc++ exposes their vectors' important members as protected: https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/include/bits/stl_vector.h#L422
All you need to do is inherit from std::vector (it's not forbidden), write your own constructor for setting these protected members, and write a destructor to reset the members so that the actual vector destructor does not delete your memory.
#include <vector>
#include <cstddef>
template <class T>
struct dont_use_me_in_prod : std::vector<T>
{
dont_use_me_in_prod(T* data, size_t n) {
this->_M_impl._M_start = data;
this->_M_impl._M_finish = data + n;
this->_M_impl._M_end_of_storage = this->_M_impl._M_finish;
}
~dont_use_me_in_prod() {
this->_M_impl._M_start = nullptr;
this->_M_impl._M_finish = nullptr;
this->_M_impl._M_end_of_storage = nullptr;
}
};
void innocent_function(const std::vector<int>& v);
void please_dont_do_this_in_prod(int* vals, int n) {
dont_use_me_in_prod evil_vector(vals, n);
innocent_function(evil_vector);
}
Note that this is not compiler, but standard library dependent, meaning that it'll work with clang as well as long as you use libstdc++ with it. But this is not conforming, so you gotta fix innocent_function somehow soon:
https://godbolt.org/z/Tfcn7rdKq
The problem is std::vector is not a reference class like std::string_view or std::span. std::vector owns the managed memory. It allocates the memory and releases the owned memory. It is not designed to acquire the external buffer and release the managed buffer.
What you can do is a very dirty hack. You can create new structure with exactly the same layout as a std::vector, assign the data and size fields with what you get from external lib, and then pass this struct as a std::vector const& using reinterpret_cast. It can work as your library does not modify the vector (I assume they do not perform const_cast on std::vector const&).
The drawback is that code is unmaintainable. The next STL update can cause application crash, if the layout of the std::vector is changed.
Following is a pseudo code
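// Assumption: std::vector<std::byte> is laid out as three pointers
// (begin, end, end of storage), as in the current mainstream implementations.
struct FakeVector
{
std::byte* Begin;
std::byte* End;
std::byte* EndOfStorage;
};
void onNewData(std::byte* ptr, std::size_t size)
{
auto vectorRef = FakeVector{ptr, ptr + size, ptr + size};
// Undefined behaviour by the letter of the standard; works only while the layouts match.
doSomething(*reinterpret_cast<std::vector<std::byte>*>(&vectorRef));
}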
Well, I've found a way that works for me. I must admit that it is not fully standard compliant, because casting the vector results in undefined behavior, but I wouldn't expect this to fail in the foreseeable future. The idea is to use my own Allocator for the vector, one that accepts the buffer from the legacy code and works on it. The problem is that std::vector<std::byte> value-initializes its elements on resize(), which zeroes the buffer. If there were a way to disable that, it would be a perfect solution, but I have not found one. So here the ugly cast comes in: from std::vector<InnerType>, where InnerType is nothing but std::byte with a default constructor that does nothing, to the std::vector<std::byte> that the library expects. Working code is shown at https://godbolt.org/z/7jME79EE9 and also here:
#include <cstdlib>
#include <iostream>
#include <vector>
#include <cstddef>
struct InnerType {
std::byte value;
InnerType() {}
InnerType(std::byte v) : value(v) {}
};
static_assert(sizeof(InnerType) == sizeof(std::byte));
template <class T> class AllocatorExternalBufferT {
T* const _buffer;
const size_t _size;
public:
typedef T value_type;
constexpr AllocatorExternalBufferT() = delete;
constexpr AllocatorExternalBufferT(T* buf, size_t size) : _buffer(buf), _size(size) {}
[[nodiscard]] T* allocate(std::size_t n) {
if (n > _size / sizeof(T)) {
throw std::bad_array_new_length();
}
return _buffer;
}
void deallocate(T*, std::size_t) noexcept {}
};
template <class T, class U> bool operator==(const AllocatorExternalBufferT <T>&, const AllocatorExternalBufferT <U>&) { return true; }
template <class T, class U> bool operator!=(const AllocatorExternalBufferT <T>&, const AllocatorExternalBufferT <U>&) { return false; }
typedef std::vector<InnerType, AllocatorExternalBufferT<InnerType>> BufferDataVector;
typedef std::vector<std::byte, AllocatorExternalBufferT<std::byte>> InterfaceVector;
static void report(const InterfaceVector& vec) {
std::cout << "size=" << vec.size() << " capacity=" << vec.capacity() << " ";
for(const auto& el : vec) {
std::cout << static_cast<int>(el) << " ";
}
std::cout << "\n";
}
int main() {
InnerType buffer4allocator[16] ;
BufferDataVector v((AllocatorExternalBufferT<InnerType>(buffer4allocator, sizeof(buffer4allocator)))); // double parenthesis here for "most vexing parse" nonsense
v.resize(sizeof(buffer4allocator));
std::cout << "memory area kept intact after resizing vector:\n";
report(*reinterpret_cast<InterfaceVector*>(&v));
}
Yes you can do this. Not in a nice safe way but it's certainly possible.
All you need to do is create a fake std::vector that has the same ABI (memory layout) as std::vector. Then set its internal pointers to point to your data and reinterpret_cast your fake vector back to a std::vector.
I wouldn't recommend it unless you really need to do it because any time your compiler changes its std::vector ABI (field layout basically) it will break. Though to be fair that is very unlikely to happen these days.

Shallow copy a string array in the constructor

I have a string array (string references[10]) in my header file as a private variable of a class.
How can I shallow copy if I have a constructor in that class tome(string *initialList)?
I want to set references = initialList;
What is the best way to do it?
Header file:
#ifndef TOME_H
#define TOME_H
#include <string>
using namespace std;
class tome;
ostream &operator << (ostream &, const tome &);
class tome
{
public:
tome(string , int, string);
tome(string, int, string , string*);
~tome();
int getTomeSize();
string getSpell(int) const;
string* getReferences();
string getName();
string getAuthor();
tome operator+(string* add);
friend ostream &operator << (ostream &output, const tome &t);
void operator=(const tome &oldTome);
private:
string references[10];
string tomeName;
string author;
int spellsStored;
friend ostream &operator << (ostream &, const tome &);
};
#endif
tome.cpp Constructor:
tome::tome(string name, int tomeSize, string authorName, string* initialList)
{
tomeName = name;
author = authorName;
spellsStored = tomeSize;
}
An array, either raw or in form of std::array, always contains the data (in case of an array of pointers, the "data" is the pointers!), so if you have an array of std::string, you cannot shallow copy as std::string does not provide shallow copies.
For shallow copies, you need references or pointers (not considering visibility, adjust yourself as needed):
class A
{
std::array<std::string, 10> data; // using std::array for its superior interface...
};
class B
{
std::array<std::string, 10>* data; // references an array of some A
};
Obviously, you now need some form of lifetime management to ensure that the referenced A is not destroyed while the referencing B is still alive, or at least while B still uses this reference. If you don't do this right, you end up either with undefined behaviour or with memory leaks...
You get this memory management for free if you use a smart pointer:
class C
{
std::shared_ptr<std::array<std::string, 10>> data;
};
Now different C (as many as you like) can share arbitrary data, it will be deleted as soon as all C referencing it are destroyed, but not earlier, and you are safe from both problems above. Shallow copies now are done by simply assigning the smart pointer to another one:
C::C(std::shared_ptr<std::array<std::string, 10>>& data) : data(data) { }
// ^^^^^^^^^^
// std::shared_ptr's constructor does the necessary stuff...
However, changes to the data in one C get visible to all other C sharing the same array. This can be desired in some cases, might lead to great surprises in other ones if you don't handle the matter carefully.
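The sharing behaviour described above can be seen in a few lines. This is a minimal sketch; the names SharedRefs, Tome, reference and owners are made up for the illustration and do not come from the question's code:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

using SharedRefs = std::shared_ptr<std::array<std::string, 10>>;

class Tome {
public:
    // Shallow copy: only the smart pointer is copied, never the strings.
    explicit Tome(SharedRefs refs) : refs_(std::move(refs)) { assert(refs_ != nullptr); }

    std::string& reference(std::size_t i) { return refs_->at(i); }
    long owners() const { return refs_.use_count(); }

private:
    SharedRefs refs_;
};
```

With auto refs = std::make_shared<std::array<std::string, 10>>(); and two objects Tome a(refs), b(refs);, assigning a.reference(0) = "fireball" makes b.reference(0) read "fireball" as well, because both objects point at the same array.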
You might prefer deep copies instead to avoid trouble. I recommend using std::array because it has a superior interface similar to the one of std::vector, so you can easily assign correctly; let's extend above class A appropriately:
A::A(std::array<std::string, 10>& data) : data(data) { }
// ^^^^^^^^^^
// simply assign, std::array's constructor does the rest...
If you insist on having raw arrays:
class D
{
std::string data[10]; // the array bound follows the name in C++
D(std::string* data)
{
std::copy(data, data + 10, this->data);
}
};
Assuming we always have arrays of length 10 – you can get into great trouble if this condition is violated at some time somewhere. Better is passing the size together with the array and having appropriate checks. You see, std::array avoids all this trouble and additionally a mismatch between raw array and length being passed (on the other hand, you cannot pass sub-arrays this way; you could, though, provide an overload with two additional parameters size_t offset, size_t length to the approach below allowing to select sub ranges). If you want to be able to pass arrays of arbitrary lengths:
template <size_t N>
A::A(std::array<std::string, N>& data)
{
//static_assert(N <= 10); // if you don't want to discard surplus data silently...
//std::copy(data.begin(), data.end(), this->data.begin());
std::copy
(
data.data(),
data.data() + std::min(N, this->data.size()),
this->data.begin()
);
}
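For the raw-array route, passing the size together with the pointer and checking it could look like the sketch below. The class name ReferenceList and the choice to throw std::length_error are assumptions made for this illustration; truncating silently or asserting would be equally possible policies.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

class ReferenceList {
public:
    static constexpr std::size_t Capacity = 10;

    // Deep-copies `count` strings; rejects input that would not fit.
    ReferenceList(const std::string* source, std::size_t count) {
        assert(source != nullptr || count == 0);
        if (count > Capacity) {
            throw std::length_error("ReferenceList: too many references");
        }
        std::copy(source, source + count, data_);
    }

    const std::string& operator[](std::size_t i) const { return data_[i]; }

private:
    std::string data_[Capacity];  // slots beyond `count` stay empty
};
```

The check turns a silent buffer overrun into an exception at the point where the mismatch between array and length is introduced.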
Finally: appropriate typedefs can save you quite some typing and, more importantly, protect you from errors (use constants as well):
class C
{
using Data = std::shared_ptr<std::array<std::string, 10>>;
Data data;
C(Data& data) : data(data) { }
};
class A
{
static size_t const DataLength = 10;
template <size_t N>
A(std::array<std::string, N>& data)
{
static_assert(N <= DataLength);
std::copy(data.begin(), data.end(), this->data.begin());
}
};

C++ unordered_map<string, ...> lookup without constructing string

I have C++ code that investigates a BIG string and matches lots of substrings. As much as possible, I avoid constructing std::strings, by encoding substrings like this:
char* buffer, size_t bufferSize
At some point, however, I'd like to look up a substring in one of these:
std::unordered_map<std::string, Info> stringToInfo = {...
So, to do that, I go:
stringToInfo.find(std::string(buffer, bufferSize))
That constructs a std::string for the sole purpose of the lookup.
I feel like there's an optimization I could do here, by... changing the key-type of the unordered_map to some kind of temporary string imposter, a class like this...
class SubString
{
char* buffer;
size_t bufferSize;
// ...
};
... that does the same logic as std::string to hash and compare, but then doesn't deallocate its buffer when it's destroyed.
So, my question is: is there a way to get the standard classes to do this, or do I write this class myself?
What you're wanting to do is called heterogeneous lookup. Since C++14 it's been supported for std::map::find and std::set::find (note versions (3) and (4) of the functions, which are templated on the lookup value type). It's more complicated for unordered containers because they need to be told of or find hash functions for all key types that will produce the same hash value for the same text. There's a proposal under consideration for a future Standard: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0919r0.html
Meanwhile, you could use another library that already supports heterogeneous lookup, e.g. boost::unordered_map::find.
If you want to stick to std::unordered_map, you could avoid creating so many string temporaries by storing a std::string member alongside your unordered_map that you can reassign values to, then pass that string to find. You could encapsulate this in a custom container class.
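A minimal sketch of that wrapper, assuming lookups come in as a (buffer, size) pair as in the question; the class name ScratchKeyMap and its member names are invented for this example:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>

template <class Info>
class ScratchKeyMap {
public:
    void insert(std::string key, Info value) {
        map_.emplace(std::move(key), std::move(value));
    }

    // Returns a pointer to the stored value, or nullptr if the key is absent.
    // Not const: the scratch string is overwritten on every call.
    const Info* find(const char* buffer, std::size_t size) {
        assert(buffer != nullptr || size == 0);
        scratch_.assign(buffer, size);  // reuses scratch_'s existing capacity when it is large enough
        auto it = map_.find(scratch_);
        return it == map_.end() ? nullptr : &it->second;
    }

private:
    std::unordered_map<std::string, Info> map_;
    std::string scratch_;  // reused key buffer, avoids a fresh std::string per lookup
};
```

The characters are still copied on each lookup, but once the scratch string has grown to the longest key seen, no further allocations should be needed. The shared scratch buffer also means find is not safe to call from several threads at once.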
Another route is to write a custom class to use as your unordered container key:
struct CharPtrOrString
{
const char* p_;
std::string s_;
explicit CharPtrOrString(const char* p) : p_{p} { }
CharPtrOrString(std::string s) : p_{nullptr}, s_{std::move(s)} { }
bool operator==(const CharPtrOrString& x) const
{
return p_ ? x.p_ ? std::strcmp(p_, x.p_) == 0
: p_ == x.s_
: x.p_ ? s_ == x.p_
: s_ == x.s_;
}
struct Hash
{
size_t operator()(const CharPtrOrString& x) const
{
std::string_view sv{x.p_ ? x.p_ : x.s_.c_str()};
return std::hash<std::string_view>()(sv);
}
};
};
You can then construct CharPtrOrString from std::strings for use in the unordered container keys, but construct one cheaply from your const char* each time you call find. Note that operator== above has to work out which you did (convention used is that if the pointer's nullptr then the std::string member's in use) so it compares the in-use members. The hash function has to make sure a std::string with a particular textual value will produce the same hash as a const char* (which it doesn't by default with GCC 7.3 and/or Clang 6 - I work with both and remember one had an issue but not which).
In C++20, you can now do this:
// struct is from "https://www.cppstories.com/2021/heterogeneous-access-cpp20/"
struct string_hash {
using is_transparent = void;
[[nodiscard]] size_t operator()(const char *txt) const {
return std::hash<std::string_view>{}(txt);
}
[[nodiscard]] size_t operator()(std::string_view txt) const {
return std::hash<std::string_view>{}(txt);
}
[[nodiscard]] size_t operator()(const std::string &txt) const {
return std::hash<std::string>{}(txt);
}
};
// Declaration of map
std::unordered_map<std::string, Info, string_hash, std::equal_to<>> map;
std::string_view key = "foo";
if (map.find(key) != map.end()) // find returns an iterator, so compare it with end()
{
// do something here
}
Just note that you will still need std::string when using []. There may be a way around that, but I'm not too sure

Why does std::string_view::data not include a null terminator?

This code has undefined behavior:
#include <string_view>
#include <iostream>
using namespace std::string_view_literals;
void foo(std::string_view msg) {
std::cout << msg.data() << '\n'; // undefined behavior if 'msg' is not null-
// terminated
// std::cout << msg << '\n'; is not undefined because operator<< uses
// iterators to print 'msg', but that's not the point
}
int main() {
foo("hello"sv); // not null-terminated - undefined behavior
foo("foo"); // same, even more dangerous
}
The reason why is that std::string_view can store non-null terminated strings, and doesn't include a null terminator when calling data. That's really limiting, as to make the above code defined behavior, I have to construct a std::string out of it:
std::string str{ msg };
std::cout << str.data() << '\n';
This really makes std::string_view unnecessary in this case, I still have to copy the string passed to foo, so why not use move semantics and change msg to a std::string? This might be faster, but I didn't measure.
Either way, having to construct a std::string every time I want to pass a const char* to a function which only accepts a const char* is a bit unnecessary, but there has to be a reason why the Committee decided it this way.
So, why does std::string_view::data not return a null-terminated string like std::string::data?
So, why does std::string_view::data not return a null-terminated
string like std::string::data
Simply because it can't. A string_view can be a narrower view into a larger string (a substring of a string). That means that the string viewed will not necessarily have a null terminator at the end of a particular view. You can't write the null terminator into the underlying string, because that would modify data the view does not own, and you can't create a copy of the string and return a char * without a memory leak.
If you want a null-terminated string, you have to create a std::string copy of the view.
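To make the difference concrete, here is a small sketch. The function c_api_length is a made-up stand-in for any C API that expects a NUL-terminated string, and length_via_copy is likewise a name invented for this example:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <string_view>

// Stand-in for a C function that reads until it finds a NUL terminator.
std::size_t c_api_length(const char* s) {
    assert(s != nullptr);
    return std::strlen(s);
}

// Safe route: copy the view into a std::string, which guarantees a terminator.
std::size_t length_via_copy(std::string_view sv) {
    const std::string copy(sv);
    return c_api_length(copy.c_str());
}
```

Given std::string_view whole = "hello world"; and word = whole.substr(0, 5), length_via_copy(word) sees 5 characters. Passing word.data() straight to c_api_length makes it read on to the terminator of the underlying literal, so it sees 11; with a buffer that has no terminator at all, the same call would be undefined behaviour.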
Let me show a good use of std::string_view:
auto tokenize(std::string_view str, Pred is_delim) -> std::vector<std::string_view>
Here the resulting vector contains tokens as views into the larger string.
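One possible body for such a function is sketched below, assuming Pred is any callable taking a char and returning bool, and that empty tokens should be skipped; both choices are assumptions, since only the signature is given above.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

template <class Pred>
std::vector<std::string_view> tokenize(std::string_view str, Pred is_delim) {
    std::vector<std::string_view> tokens;
    std::size_t start = 0;
    // Walk one position past the end so the final token is flushed too.
    for (std::size_t i = 0; i <= str.size(); ++i) {
        if (i == str.size() || is_delim(str[i])) {
            if (i > start) {
                tokens.push_back(str.substr(start, i - start));  // a view, no copy
            }
            start = i + 1;
        }
    }
    return tokens;
}
```

None of the tokens owns its characters, so the string passed in must outlive the returned vector, and none of the tokens is NUL-terminated except possibly the last one.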
The purpose of string_view is to be a range representing a contiguous sequence of characters. Limiting such a range to one that ends in a NUL-terminator limits the usefulness of the class.
That being said, it would still be useful to have an alternate version of string_view which is intended only to be created from strings that truly are NUL-terminated.
My zstring_view class is privately inherited from string_view, and it provides support for removing elements from the front and other operations that cannot make the string non-NUL-terminated. It provides the rest of the operations, but they return a string_view, not a zstring_view.
You'd be surprised how few operations you have to lose from string_view to make this work:
template<typename charT, typename traits = std::char_traits<charT>>
class basic_zstring_view : private basic_string_view<charT, traits>
{
public:
using base_view_type = basic_string_view<charT, traits>;
using base_view_type::traits_type;
using base_view_type::value_type;
using base_view_type::pointer;
using base_view_type::const_pointer;
using base_view_type::reference;
using base_view_type::const_reference;
using base_view_type::const_iterator;
using base_view_type::iterator;
using base_view_type::const_reverse_iterator;
using base_view_type::reverse_iterator;
using typename base_view_type::size_type;
using base_view_type::difference_type;
using base_view_type::npos;
basic_zstring_view(const charT* str) : base_view_type(str) {}
constexpr explicit basic_zstring_view(const charT* str, size_type len) : base_view_type(str, len) {}
constexpr explicit basic_zstring_view(const base_view_type &view) : base_view_type(view) {}
constexpr basic_zstring_view(const basic_zstring_view&) noexcept = default;
basic_zstring_view& operator=(const basic_zstring_view&) noexcept = default;
using base_view_type::begin;
using base_view_type::end;
using base_view_type::cbegin;
using base_view_type::cend;
using base_view_type::rbegin;
using base_view_type::rend;
using base_view_type::crbegin;
using base_view_type::crend;
using base_view_type::size;
using base_view_type::length;
using base_view_type::max_size;
using base_view_type::empty;
using base_view_type::operator[];
using base_view_type::at;
using base_view_type::front;
using base_view_type::back;
using base_view_type::data;
using base_view_type::remove_prefix;
//`using base_view_type::remove_suffix`; Intentionally not provided.
///Creates a `basic_string_view` that lacks the last few characters.
constexpr basic_string_view<charT, traits> view_suffix(size_type n) const
{
return basic_string_view<charT, traits>(data(), size() - n);
}
using base_view_type::swap;
template<class Allocator = std::allocator<charT> >
std::basic_string<charT, traits, Allocator> to_string(const Allocator& a = Allocator()) const
{
return std::basic_string<charT, traits, Allocator>(begin(), end(), a);
}
constexpr operator base_view_type() const {return base_view_type(data(), size());}
using base_view_type::to_string;
using base_view_type::copy;
using base_view_type::substr;
using base_view_type::operator==;
using base_view_type::operator!=;
using base_view_type::compare;
};
When dealing with string literals with known null terminators I usually use something like this to make sure the null is included in the counted chars.
template < size_t L > std::string_view string_viewz(const char (&t) [L])
{
return std::string_view(t, L);
}
The aim here is not to try to fix the compatibility issue; there are too many cases. But if you know what you are doing and want the string_view span to include the null (e.g. for serialization), then it is a nice trick.
auto view = string_viewz("Surrogate String");

Is std::string an object?

just looking in optimizing some std::map code. The map contains objects, accessed via the string-identifier.
Example:
std::map<std::string, CVeryImportantObject> theMap;
...
theMap["second"] = new CVeryImportantObject();
Now, when using the find-function as theMap->find("second"), the String is converted into std::string("second"), which causes new string allocations (over all when using IDL=2 with Visual Studio).
1. Is there a possibility to use a string-only class to avoid such allocations?
Intentionally I've tried to use another String-Class as well:
std::map<CString, CVeryImportantObject> theMap;
This code works also. But CString indeed is an object.
And: If you remove an object from the map, I'll need to release both the related object and the key, do I?
Any suggestions?
Now, when using the find-function as theMap->find("second"), the
String is converted into std::string("second"), which causes new
string allocations (over all when using IDL=2 with Visual Studio).
This is a Standard issue, which is fixed in C++14 for ordered containers. The newest version of VS, VS 14 CTP (which is a pre-release) contains a fix for this issue, as will new versions of other implementations.
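With a standard library that implements this, the fix is opted into by giving the map a transparent comparator such as std::less<>. A minimal sketch, assuming C++17 for std::string_view; the alias TransparentMap and the helper has_key are names made up for the example, and int stands in for the mapped type:

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <string_view>

// std::less<> (i.e. std::less<void>) compares its arguments as they are,
// which lets find() accept key-like types other than std::string.
using TransparentMap = std::map<std::string, int, std::less<>>;

bool has_key(const TransparentMap& m, std::string_view key) {
    // No temporary std::string is built here; the view is compared directly.
    return m.find(key) != m.end();
}
```

A call such as has_key(theMap, "second") then only wraps the literal in a string_view. With the default comparator std::less<std::string>, the same find call would still construct a std::string first.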
If you need to avoid allocations, you can try a class like llvm::StringRef, which can refer to std::string or string literals interchangeably, but then you will be left trying to handle the ownership externally.
You can try something like unique_ptr<char[], maybe_delete> that sometimes deletes the contents. This is a bit of a mess to interface with though.
And: If you remove an object from the map, I'll need to release both
the related object and the key, do I?
The map will automatically destruct the key and value for you. For a class which frees its own resources, like std::string (which is the only sane way to write C++), you can erase without worrying about resource cleanup.
If you always use string constants as keys, you can use const char * as key type in map when you use proper comparator:
struct PCharCompare {
bool operator()( const char *s1, const char *s2 ) const { return strcmp( s1, s2 ) < 0; }
};
std::map< const char *, CVeryImportantObject, PCharCompare> theMap;
Note: you have to be careful and need to understand how it works, as it can easily lead to UB:
void foo() {
char buffer[256];
snprintf( buffer, sizeof( buffer ), "blah" );
theMap.insert( std::make_pair( buffer, Object ) );
} // oops: the map now holds a dangling pointer to buffer
As for optimization, it is very unlikely that std::string creation is the culprit. You may try std::unordered_map or something similar instead.
Now, when using the find-function as theMap->find("second"), the
String is converted into std::string("second"), which causes new
string allocations
Not necessarily. VC uses Small-String Optimisation (SSO). This means that for a string as short as "second", no allocation on the heap should take place at all; the characters will instead be stored directly in the temporarily created std::string object.
This is still not free (because the std::string has to be created, albeit without any dynamic allocation happening inside), but should be good enough. Is it really a concern for you? Chances are very high that it does not cause any measurable performance decrease.
Is there a possibility to use a string-only class to avoid such allocations?
Not really, except for the C++14 fix mentioned in other answers. Using char const * as the key type is very dangerous, because std::map will only store the actual addresses, not copies of the keys.
If I were you and if I really experienced performance problems, I'd just not use std::map directly but create my own container class to wrap a std::map<char const *, T, CustomComparison> and do the hard pointer work inside.
template <class ValueType>
class FastStringMap
{
private:
struct Comparison
{
bool operator()(char const *lhs, char const *rhs) const
{
return strcmp(lhs, rhs) > 0;
}
};
typedef std::map<char const *, ValueType, Comparison> WrappedMap;
WrappedMap m_map;
public:
typedef typename WrappedMap::iterator iterator;
typedef typename WrappedMap::const_iterator const_iterator;
bool insert(char const *key, ValueType const &value)
{
if (m_map.find(key) != m_map.end())
{
return false;
}
else
{
char *copy = new char[strlen(key) + 1];
strcpy(copy, key);
try
{
return m_map.insert(std::make_pair(copy, value)).second;
}
catch (...)
{
delete[] copy; // allocated with new[], so it must be released with delete[]
throw;
}
}
}
~FastStringMap()
{
for (iterator iter = m_map.begin(); iter != m_map.end(); ++iter)
{
delete[] iter->first;
}
}
iterator find(char const *key)
{
return m_map.find(key);
}
const_iterator find(char const *key) const
{
return m_map.find(key);
}
// further operations
};
To be used like this:
FastStringMap<int> m;
m.insert("AAA", 1);
m.insert("BBB", 2);
m.insert("CCC", 3);
std::cout << m.find("AAA")->second;
Note that you can possibly make this more sophisticated by templatising also on the character type (for std::wstring support) or by providing "real" iterator classes (using Boost Iterator Facade).
And: If you remove an object from the map, I'll need to release both
the related object and the key, do I?
If you use std::string, no. If you use char const * and if the pointers point to memory allocated dynamically (as in my example), then yes.