I have recently been relearning C++ as I develop a game in the Unreal engine. It's been about 3 years since I last touched C++, and I have mostly been using Java since then.
Due to the differences between Java and C++, I can already tell there are different best practices for similar concepts.
I have 2 methods like this.
void UMarchingSquares::Generate(std::map<Vector2, int> automata) {
    std::map<Vector2, ControlNode*> controlNodes = getControlNodes(automata);
}

std::map<Vector2, ControlNode*> UMarchingSquares::getControlNodes(std::map<Vector2, int> automata) {
    std::map<Vector2, ControlNode*> controlNodes = std::map<Vector2, ControlNode*>();
    for (std::pair<Vector2, int> pair : automata) {
        Vector2 pos = pair.first;
        ControlNode node = ControlNode(pos, pair.second);
        controlNodes[pos] = &node; // stores the address of a local that dies at the end of each iteration
    }
    return controlNodes;
}
I probably am breaking a few different C++ best practices, but there is one specific area that I really want clarification on.
I am initializing the ControlNode object in the getControlNodes() method's for loop. I know now that doing it this way is bad, because I am storing a pointer to a local variable, which then goes out of scope every loop iteration. I would prefer to store pointers instead of the actual Control node (though I may be convinced otherwise, since a Control Node holds a position [2 floats], a material [1 integer], and two other objects that both have a position and material of their own.)
What is the best way to get a pointer that doesn't point to a local variable? I know I can just use "new ControlNode()", but from what I know, that ends up being a fairly expensive call, and it requires cleanup (which may be expensive as well).
I am going to be calling this part of the code fairly frequently, so I would like it to be efficient.
Thank you!
C++ has changed a lot in the last few years to make life easier for those using it.
Looking at your code, a few questions come up:
Why is the value of your map a raw pointer to ControlNode instead of a ControlNode by value or a unique_ptr to it?
In your for-loop, why do you write out the explicit type of pair (which differs from the map's actual value type, std::pair<const Vector2, int>)? auto could help you here to have fewer copies.
As your question is about the first, I'll ignore the second one.
Looking at this, you have 3 ways of fixing the code:
std::map<Vector2, ControlNode> getControlNodes(std::map<Vector2, int> automata) {
    auto controlNodes = std::map<Vector2, ControlNode>{};
    for (auto &&pair : automata) {
        auto &&pos = pair.first;
        auto node = ControlNode(pos, pair.second);
        controlNodes[pos] = std::move(node); // note: operator[] requires ControlNode to be default-constructible; use emplace otherwise
    }
    return controlNodes;
}
In this code, you can see that the * has been removed from the map. This implies that ownership of the ControlNode is moved into the map (also note the std::move). This would be similar to how the int is stored in the map that comes in as an argument.
If, however, you require a memory allocation because you will be moving this around and the address needs to be stable, std::unique_ptr is a good solution.
std::map<Vector2, std::unique_ptr<ControlNode>> getControlNodes(std::map<Vector2, int> automata) {
    auto controlNodes = std::map<Vector2, std::unique_ptr<ControlNode>>{};
    for (auto &&pair : automata) {
        auto &&pos = pair.first;
        auto node = std::make_unique<ControlNode>(pos, pair.second);
        controlNodes[pos] = std::move(node);
    }
    return controlNodes;
}
As you can see, this code is very similar to the previous one; I've replaced the type in the map and changed the construction of the ControlNode to std::make_unique. Hence, you have a unique_ptr owning the allocated memory (and as long as the unique_ptr exists, the pointer stays valid).
The third solution should only be used if you can't change the signature and is considered bad practice in C++ as it passes ownership via raw pointers. Now your caller is responsible for explicitly cleaning up the memory as C++ doesn't have garbage collection.
std::map<Vector2, ControlNode*> getControlNodes(std::map<Vector2, int> automata) {
    auto controlNodes = std::map<Vector2, ControlNode*>{};
    for (auto &&pair : automata) {
        auto &&pos = pair.first;
        auto node = new ControlNode(pos, pair.second);
        controlNodes[pos] = node;
    }
    return controlNodes;
}
PS: I've added some auto to the code to make the changes between the snippets minimal.
Use a vector of control nodes for storage. Whenever you need a new control node, append one to that vector. Instead of using a pointer, use an iterator (or an index) into that vector. Make sure you have reserved enough slots in that vector up front, or else your iterators will get invalidated.
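For illustration, here is a minimal sketch of that idea, reusing the types from the question (the ControlNode constructor signature is assumed; using indices instead of iterators sidesteps invalidation entirely):

// Sketch: store ControlNodes by value in a vector and refer to them by index.
std::vector<ControlNode> nodes;
nodes.reserve(automata.size()); // reserve up front so iterators/references stay valid

std::map<Vector2, std::size_t> nodeIndex; // position -> index into 'nodes'
for (const auto& entry : automata) {
    nodes.emplace_back(entry.first, entry.second); // assumes ControlNode(Vector2, int)
    nodeIndex[entry.first] = nodes.size() - 1;
}

// Later: ControlNode& node = nodes[nodeIndex[somePos]];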
Related
Suppose I own a list of edges saved inside a vector like:
typedef struct edge
{
    int v;
    size_t start;
    size_t end;
} e;

typedef vector<list<e>> adj_list;
adj_list tree;
I have to do logic on this tree object, but the logic is too complicated to do in place (I am constrained not to recurse). I need an extra data structure to handle each node. As a simple example, let's consider incrementing each edge's v value:
list<e> aux;
aux.insert(aux.begin(), tree[0].begin(), tree[0].end());
while (!aux.empty())
{
    e& now = aux.front();
    aux.pop_front();
    now.v++;
    aux.insert(aux.begin(), tree[now.v].begin(), tree[now.v].end());
}
The problem with doing this is that the changes made to the now variable are not reflected in tree, because aux holds copies. I need some list-like structure (vector, linked list, queue, stack - anything with an empty() boolean, as in Dijkstra's algorithm) to handle my edge objects in tree. Is there an elegant way to do this? Can I use a list of iterators? I'm specifically asking for an "elegant" approach in hopes that it does not involve pointers.
As discussed in the comments, the solution is to store iterators instead of copies, e.g.:
list<list<e>::iterator> aux;
for (list<e>::iterator it = tree[0].begin(); it != tree[0].end(); ++it)
    aux.push_back(it); // store iterators, not copies of the edges
while (!aux.empty())
{
    e& now = *(aux.front());
    aux.pop_front();
    now.v++;
    for (list<e>::iterator it = tree[now.v].begin(); it != tree[now.v].end(); ++it)
        aux.push_back(it);
}
This works only if you can guarantee that nothing will invalidate the stored iterators, which certain operations on tree could do.
As pointed out by n. 'pronouns' m., iterators can be considered as "generalized pointers", so many problems that regular pointers have also apply to iterators.
Another (slightly safer) approach would be to store std::shared_ptrs in the inner lists of tree - then you can simply store another std::shared_ptr to the same object in aux, which makes sure that the object cannot be accidentally deleted while it is still being referenced.
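A minimal sketch of that shared_ptr variant, reusing the edge struct e from the question (the container names are only for illustration; requires <list>, <vector>, and <memory>):

// tree now owns its edges through shared_ptrs; aux shares ownership temporarily.
typedef std::vector<std::list<std::shared_ptr<e>>> adj_list_sp;
adj_list_sp tree_sp;

std::list<std::shared_ptr<e>> aux;
aux.insert(aux.begin(), tree_sp[0].begin(), tree_sp[0].end()); // copies shared_ptrs, not edges

while (!aux.empty())
{
    std::shared_ptr<e> now = aux.front();
    aux.pop_front();
    now->v++; // modifies the very object the tree refers to
    aux.insert(aux.begin(), tree_sp[now->v].begin(), tree_sp[now->v].end());
}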
This question is about owning pointers, consuming pointers, smart pointers, vectors, and allocators.
I am a little bit lost in my thoughts about code architecture. Furthermore, if this question already has an answer somewhere, 1. sorry, but I haven't found a satisfying answer so far and 2. please point me to it.
My problem is the following:
I have several "things" stored in a vector and several "consumers" of those "things". So, my first try was like follows:
std::vector<thing> i_am_the_owner_of_things;

thing* get_thing_for_consumer() {
    // some thing-selection logic
    return &i_am_the_owner_of_things[5]; // 5 is just an example
}

...

// somewhere else in the code:
class consumer {
    consumer() {
        m_thing = get_thing_for_consumer();
    }

    thing* m_thing;
};
In my application, this would be safe because the "things" outlive the "consumers" in any case. However, more "things" can be added during runtime and that can become a problem because if the std::vector<thing> i_am_the_owner_of_things; gets reallocated, all the thing* m_thing pointers become invalid.
A fix to this scenario would be to store unique pointers to "things" instead of "things" directly, i.e. like follows:
std::vector<std::unique_ptr<thing>> i_am_the_owner_of_things;

thing* get_thing_for_consumer() {
    // some thing-selection logic
    return i_am_the_owner_of_things[5].get(); // 5 is just an example
}

...

// somewhere else in the code:
class consumer {
    consumer() {
        m_thing = get_thing_for_consumer();
    }

    thing* m_thing;
};
The downside here is that memory coherency between "things" is lost. Can this memory coherency be re-established by using custom allocators somehow? I am thinking of something like an allocator which would always allocate memory for, e.g., 10 elements at a time and whenever required, adds more 10-elements-sized chunks of memory.
Example:
initially:
v = ☐☐☐☐☐☐☐☐☐☐
more elements:
v = ☐☐☐☐☐☐☐☐☐☐ 🡒 ☐☐☐☐☐☐☐☐☐☐
and again:
v = ☐☐☐☐☐☐☐☐☐☐ 🡒 ☐☐☐☐☐☐☐☐☐☐ 🡒 ☐☐☐☐☐☐☐☐☐☐
Using such an allocator, I wouldn't even have to use std::unique_ptrs of "things" because at std::vector's reallocation time, the memory addresses of the already existing elements would not change.
As alternative, I can only think of referencing the "thing" in "consumer" via a std::shared_ptr<thing> m_thing, as opposed to the current thing* m_thing but that seems like the worst approach to me, because a "thing" shall not own a "consumer" and with shared pointers I would create shared ownership.
So, is the allocator-approach a good one? And if so, how can it be done? Do I have to implement the allocator by myself or is there an existing one?
If you are able to treat thing as a value type, do so. It simplifies things, and you don't need a smart pointer to circumvent the pointer/reference invalidation issue. The latter can be tackled differently:
If new thing instances are inserted via push_front and push_back during the program, use std::deque instead of std::vector (see the sketch after these two options). Then, no pointers or references to elements in this container are invalidated (iterators are invalidated, though - thanks to #odyss-jii for pointing that out). If you fear that you heavily rely on the performance benefit of the completely contiguous memory layout of std::vector: create a benchmark and profile.
If new thing instances are inserted in the middle of the container during the program, consider using std::list. No pointers/iterators/references are invalidated when inserting or removing container elements. Iteration over a std::list is much slower than a std::vector, but make sure this is an actual issue in your scenario before worrying too much about that.
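As a small illustration of the std::deque option (the thing type is taken from the question; this is only a sketch):

#include <deque>

std::deque<thing> i_am_the_owner_of_things;

thing* get_thing_for_consumer() {
    // some thing-selection logic
    return &i_am_the_owner_of_things[5];
}

// Unlike std::vector, push_back/push_front on a deque never relocates existing
// elements, so previously handed-out thing* pointers stay valid (as long as no
// element is erased from the middle).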
There is no single right answer to this question, since it depends a lot on the exact access patterns and desired performance characteristics.
Having said that, here is my recommendation:
Continue storing the data contiguously as you are, but do not store aliasing pointers to that data. Instead, consider a safer alternative (this is a proven method) where you fetch the pointer based on an ID right before using it -- as a side-note, in a multi-threaded application you can lock attempts to resize the underlying store whilst such a weak reference lives.
So your consumer will store an ID, and will fetch a pointer to the data from the "store" on demand. This also gives you control over all "fetches", so that you can track them, implement safety measures, etc.
void consumer::foo() {
    thing *t = m_thing_store.get(m_thing_id);
    if (t) {
        // do something with t
    }
}
Or a more advanced alternative to help with synchronization in a multi-threaded scenario:
void consumer::foo() {
    reference<thing> t = m_thing_store.get(m_thing_id);
    if (!t.empty()) {
        // do something with t
    }
}
Where reference would be some thread-safe RAII "weak pointer".
There are multiple ways of implementing this. One is to use an open-addressing hash table with the ID as the key; this will give you roughly O(1) access time if you balance it properly.
Another alternative (best-case O(1), worst-case O(N)) is to use a "reference" structure, with a 32-bit ID and a 32-bit index (so same size as 64-bit pointer) -- the index serves as a sort-of cache. When you fetch, you first try the index, if the element in the index has the expected ID you are done. Otherwise, you get a "cache miss" and you do a linear scan of the store to find the element based on ID, and then you store the last-known index value in your reference.
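A rough sketch of that second idea, assuming each thing carries its own id field (all names here are made up for illustration):

#include <cstdint>
#include <vector>

// Hypothetical store: things live contiguously; consumers keep a thing_ref.
struct thing_ref {
    std::uint32_t id;
    std::uint32_t index; // last-known index, used as a cache
};

struct thing_store {
    std::vector<thing> things; // each thing is assumed to have an 'id' member

    thing* get(thing_ref& ref) {
        // Fast path: the cached index still points at the right element.
        if (ref.index < things.size() && things[ref.index].id == ref.id)
            return &things[ref.index];
        // Cache miss: linear scan by ID, then refresh the cached index.
        for (std::uint32_t i = 0; i < things.size(); ++i) {
            if (things[i].id == ref.id) {
                ref.index = i;
                return &things[i];
            }
        }
        return nullptr; // ID no longer present in the store
    }
};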
IMO the best approach would be to create a new container which behaves in a safe way.
Pros:
the change is done on a separate level of abstraction
changes to old code will be minimal (just replace std::vector with the new container)
it is the "clean code" way to do it
Cons:
it may look like there is a bit more work to do
Another answer proposes the use of std::list, which will do the job, but with a larger number of allocations and slower random access. So IMO it is better to compose your own container from a couple of std::vectors.
So it may start to look more or less like this (minimal example):
template<typename T>
class cluster_vector
{
public:
    static constexpr size_t cluster_size = 16;

    cluster_vector() {
        clusters.reserve(1024);
        add_cluster();
    }

    ...

    size_t size() const {
        if (clusters.empty()) return 0;
        return (clusters.size() - 1) * cluster_size + clusters.back().size();
    }

    T& operator[](size_t index) {
        throwIfIndexTooBig(index);
        return clusters[index / cluster_size][index % cluster_size];
    }

    void push_back(T&& x) {
        if_last_is_full_add_cluster();
        clusters.back().push_back(std::move(x));
    }

private:
    void throwIfIndexTooBig(size_t index) const {
        if (index >= size()) {
            throw std::out_of_range("cluster_vector out of range");
        }
    }

    void add_cluster() {
        clusters.push_back({});
        clusters.back().reserve(cluster_size);
    }

    void if_last_is_full_add_cluster() {
        if (clusters.back().size() == cluster_size) {
            add_cluster();
        }
    }

private:
    std::vector<std::vector<T>> clusters;
};
This way you provide a container which will not reallocate its items. It doesn't matter what T does.
[A shared pointer] seems like the worst approach to me, because a "thing" shall not own a "consumer" and with shared pointers I would create shared ownership.
So what? Maybe the code is a little less self-documenting, but it will solve all your problems.
(And by the way you are muddling things by using the word "consumer", which in a traditional producer/consumer paradigm would take ownership.)
Also, returning a raw pointer in your current code is already entirely ambiguous as to ownership. In general, I'd say it's good practice to avoid raw pointers if you can (e.g. so you don't need to call delete). I would return a reference if you go with unique_ptr:
std::vector<std::unique_ptr<thing>> i_am_the_owner_of_things;

thing& get_thing_for_consumer() {
    // some thing-selection logic
    return *i_am_the_owner_of_things[5]; // 5 is just an example
}
I am currently running into a disgusting problem. Suppose there is a list aList of objects(whose type we call Object), and I want to iterate through it. Basically, the code would be like this:
for (int i = 0; i < aList.Size(); ++i)
{
    aList[i].DoSth();
}
The difficult part here is, the DoSth() method could change the caller's position in the list! So two consequences could occur: first, the iteration might never be able to come to an end; second, some elements might be skipped (the iteration is not necessarily like above, since it might be a linked list). Of course, the first one is the major concern.
The problem must be solved with these constraints:
1) The possibility of doing position-exchanging operations cannot be excluded;
2) The position-exchanging operations can be delayed until the iteration finishes, if necessary and doable;
3) Since it happens quite often, the iteration can be modified only minimally (so actions like creating a copy of the list are not recommended).
The language I'm using is C++, but I think there are similar problems in Java and C#, etc.
The following are what I've tried:
a) Try forbidding the position-exchanging operations during the iteration. However, that involves too many client code files and it's just not practical to find and modify all of them.
b) Modify every single method(e.g., Method()) of Object that can change the position of itself and will be called by DoSth() directly or indirectly, in this way: first we can know that aList is doing the iteration, and we'll treat Method() accordingly. If the iteration is in progress, then we delay what Method() wants to do; otherwise, it does what it wants to right now. The question here is: what is the best (easy-to-use, yet efficient enough) way of delaying a function call here? The parameters of Method() could be rather complex. Moreover, this approach will involve quite a few functions, too!
c) Try modifying the iteration process. The real situation I encounter here is quite complex because it involves two layers of iterations: the first of them is a plain array iteration, while the second is a typical linked list iteration lying in a recursive function. The best I can do about the second layer of iteration for now, is to limit its iteration times and prevent the same element from being iterated more than once.
So I guess there could be some better way to tackle this problem? Maybe some awesome data structure will help?
Your question is a little light on detail, but from what you have written it seems that you are making the mistake of mixing concerns.
It is likely that your object can perform some action that causes it to either continue to exist or not. The decision that it should no longer exist is a separate concern to that of actually storing it in a container.
So let's split those concerns out:
#include <vector>

enum class ActionResult {
    Dies,
    Lives,
};

struct Object
{
    ActionResult performAction();
};

using Container = std::vector<Object>;

void actions(Container& cont)
{
    // Compare against end(cont) on every iteration: erase() invalidates any
    // previously saved end iterator.
    for (auto first = begin(cont); first != end(cont); )
    {
        auto result = first->performAction();
        switch (result)
        {
        case ActionResult::Dies:
            first = cont.erase(first); // object wants to die so remove it
            break;
        case ActionResult::Lives:      // object wants to live so continue
            ++first;
            break;
        }
    }
}
If there are indeed only two results of the operation, lives and dies, then we could express this iteration idiomatically:
#include <algorithm>

// ...

void actions(Container& cont)
{
    auto actionResultsInDeath = [](Object& o)
    {
        auto result = o.performAction();
        return result == ActionResult::Dies;
    };

    cont.erase(remove_if(begin(cont), end(cont), actionResultsInDeath),
               end(cont));
}
Well, problem solved, at least in regard to the situation I'm interested in right now. In my situation, aList is really a linked list and the Object elements are accessed through pointers. If the size of aList is relatively small, then we have an elegant solution just like this:
void Object::DoSthBig()
{
    Object* pNext = GetNext();
    if (pNext)
        pNext->DoSthBig();
    DoSth();
}
This has the underlying hypothesis that each pNext keeps being valid during the process. But if the element-deletion operation has already been dealt with discreetly, then everything is fine.
Of course, this is a very special example and is unable to be applied to other situations.
I once landed an interview and was asked what the purpose of assigning a variable by reference would be (as in the following case):
int i = 0;
int &j = i;
My answer was that C++ references work like C pointers, but they cannot assume the NULL value; they must always point to a concrete object in memory. Of course, the syntax is different when using references (no need for the pointer indirection operator, and object properties will be accessed via the dot (.) rather than the arrow (->) operator). Perhaps the most important difference is that unlike with pointers, where you can make a pointer point to something different (even after it was pointing to the same thing as another pointer), with references, if one reference is updated, then the other references which pointed to the same thing are also updated to point to the very same object.
But then I went on to say that the above use of references is pretty useless (and perhaps this is where I went wrong), because I couldn't see a practical advantage to assigning by reference: since both references end up pointing to the same thing, you could easily do with one reference, and I couldn't think of a case where this wouldn't be the case. I went on to explain that references are useful as pass-by-reference function parameters, but not in assignments. But the interviewer said they assign by reference in their code all the time, and flunked me (I then went on to work for a company that this company was a client of, but that's beside the point).
Anyways, several years later, I would like to know where I could have gone wrong.
To begin with, I'd hope for that company's sake that wasn't the ONLY reason they didn't hire you, since it's a petty detail (and no, you don't really know exactly why a company doesn't hire you).
As touched on in the comment, references NEVER change what they refer to within their lifetime. Once set, a reference refers to that same location, until it "dies".
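A tiny example of that point (assignment through a reference changes the referred-to object; it never re-seats the reference itself):

int a = 1;
int b = 2;
int& r = a; // r refers to a for its whole lifetime

r = b;      // copies the value of b into a; r still refers to a
b = 42;     // has no effect on a or r

// Now: a == 2, b == 42, r == 2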
Now, references are quite useful to simplify an expression. Say we have a class or structure with a fair amount of complicated content. Say something like this:
struct A
{
    int x, y, z;
};

struct B
{
    A arr[100];
};

class C
{
public:
    void func();
    B* list[20];
};

void C::func()
{
    ...
    if (list[i]->arr[j].x == 4 && list[i]->arr[j].y == 5 &&
        (list[i]->arr[j].z < 10 || list[i]->arr[j].z > 90))
    {
        ... do stuff ...
    }
}
That's a lot of repeats of list[i]->arr[j] in there. So we could rewrite it using a reference:
void C::func()
{
    ...
    A &cur = list[i]->arr[j];
    if (cur.x == 4 && cur.y == 5 &&
        (cur.z < 10 || cur.z > 90))
    {
        ... do stuff ...
    }
}
The above code assumes do stuff is actually modifying the cur element in some way; if not, you should probably use const A &cur = ... instead.
I use this technique quite a bit to make it clearer and less repetitive.
In this particular case of assigning a reference to a local variable of primitive type in the same scope, the assignment is very much useless: there is nothing you can do using j that you could not do using i. There are several mildly negative consequences to it, too, because the readability would suffer, and the optimizer may get confused.
Here is one legitimate use of assigning a reference:
class demo {
private:
    map<int, string> cache;

    string read_resource(int id) {
        string resource_string;
        ... // Lengthy process for getting a non-empty resource string
        return resource_string;
    }

public:
    string& get_by_id(int id) {
        // Here is a nice trick
        string &res = cache[id];
        if (res.size() == 0) {
            // Assigning res modifies the string in the map
            res = read_resource(id);
        }
        return res;
    }
};
Above, variable res of reference type refers to an element of the map that is either retrieved, or created new. If the string is created new, the code calls the "real" getter, and assigns its result to res. This automatically updates the cache, too, saving us another lookup in the cache map.
The underlying data structure I am using is:
struct Cell
{
    char c;
    Cell* next;
};

map<int, Cell>
In effect the data structure maps an int to a linked list. The map (in this case implemented as a hashmap) ensures that finding a value in the list runs in constant time. The Linked List ensures that insertion and deletion also run in constant time. At each processing iteration I am doing something like:
Cell *cellPointer1 = new Cell;
//Process cells, build linked list
Once the list is built I put the Cell elements in the map. The structure was working just fine, and at the end of my program I deallocate the memory for each Cell in the list:
delete cellPointer1;
But at the end of my program I have a memory leak!!
To test for memory leaks I use:
#define _CRTDBG_MAP_ALLOC // must come before the includes for file/line info
#include <stdlib.h>
#include <crtdbg.h>

_CrtDumpMemoryLeaks();
I'm thinking that somewhere along the way the fact that I am putting the Cells in the map does not allow me to deallocate the memory correctly. Does anyone have any ideas on how to solve this problem?
We'll need to see your code for insertion and deletion to be sure about it.
What I'd see as a memleak-free insert / remove code would be:
( NOTE: I'm assuming you don't store the Cells that you allocate in the map )
//
// insert
//
std::map<int, Cell> _map;

Cell a;        // no new here!
a.next = NULL; // terminate the list even if the loop below runs zero times
Cell *iter = &a;

while (condition)
{
    Cell *b = new Cell();
    iter->next = b;
    iter = b;
}

_map[id] = a; // will 'copy' a into the container slot of the map

//
// cleanup:
//
std::map<int, Cell>::iterator i = _map.begin();
while (i != _map.end())
{
    Cell &a = i->second;
    Cell *iter = a.next; // list of cells associated to 'a'.
    while (iter != NULL)
    {
        Cell *to_delete = iter;
        iter = iter->next;
        delete to_delete;
    }
    _map.erase(i++); // will remove the Cell from the map (no need to 'delete');
                     // note the post-increment: the erased iterator must not be reused
}
Edit: there was a comment indicating that I might not have understood the problem completely. If you insert ALL the cells you allocate in the map, then the faulty thing is that your map contains Cell, not Cell*.
If you define your map as std::map<int, Cell *>, your problem would be solved under 2 conditions:
you insert all the Cells that you allocate in the map
the integer (the key) associated to each cell is unique (important!!)
Now the deletion is simply a matter of:
std::map<int, Cell*>::iterator i = _map.begin();
while (i != _map.end())
{
    Cell *c = i->second;
    if (c != NULL) delete c;
    ++i; // don't forget to advance, otherwise this loops forever
}
_map.clear();
I've built almost the exact same hybrid data structure you are after (list/map with the same algorithmic complexity if I were to use unordered_map instead) and have been using it from time to time for almost a decade though it's a kind of bulky structure (something I'd use with convenience in mind more than efficiency).
It's worth noting that this is quite different from just using std::unordered_map directly. For a start, it preserves the original order in which one inserts elements. Insertion, removal, and searches are guaranteed to happen in logarithmic time (or constant time depending on whether key searching is involved and whether you use a hash table or BST), iterators do not get invalidated on insertion/removal (the main requirement I needed which made me favor std::map over std::unordered_map), etc.
The way I did it was like this:
// I use this as the iterator for my container with
// the list being the main 'focal point' while I
// treat the map as a secondary structure to accelerate
// key searches.
typedef typename std::list<Value>::iterator iterator;
// Values are stored in the list.
std::list<Value> data;
// Keys and iterators into the list are stored in a map.
std::map<Key, iterator> accelerator;
If you do it like this, it becomes quite easy. push_back is a matter of pushing back to the list and adding the last iterator to the map, iterator removal is a matter of removing the key pointed to by the iterator from the map before removing the element from the list as the list iterator, finding a key is a matter of searching the map and returning the associated value in the map which happens to be the list iterator, key removal is just finding a key and then doing iterator removal, etc.
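For illustration, a minimal sketch of those operations under that layout (Key and Value are placeholders, and this is only one way to wire it up):

#include <iterator>
#include <list>
#include <map>

template <typename Key, typename Value>
class ordered_map_sketch {
public:
    typedef typename std::list<Value>::iterator iterator;

    iterator push_back(const Key& key, const Value& value) {
        data.push_back(value);
        iterator it = std::prev(data.end());
        accelerator[key] = it; // remember where the value lives in the list
        return it;
    }

    iterator find(const Key& key) {
        typename std::map<Key, iterator>::iterator hit = accelerator.find(key);
        return hit != accelerator.end() ? hit->second : data.end();
    }

    void erase(const Key& key) {
        typename std::map<Key, iterator>::iterator hit = accelerator.find(key);
        if (hit != accelerator.end()) {
            data.erase(hit->second); // other list iterators stay valid
            accelerator.erase(hit);
        }
    }

private:
    std::list<Value> data;               // values, in insertion order
    std::map<Key, iterator> accelerator; // key -> list iterator, for fast lookups
};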
If you want to improve all methods to constant time, then you can use std::unordered_map instead of std::map as I did here (though that comes with some caveats).
Taking an approach like this should simplify things considerably over an intrusive list-based solution where you're manually having to free memory.
Is there a reason why you are not using built-in containers like, say, STL?
Anyhow, you don't show the code where the allocation takes place, nor the map definition (is this coming from a library?).
Are you sure you deallocate all of the previously allocated Cells, starting from the last one and going backwards up to the first?
You could do this using the STL (remove next from Cell):
std::unordered_map<int,std::list<Cell>>
Or if cell only contains a char
std::unordered_map<int,std::string>
If your compiler doesn't support std::unordered_map then try boost::unordered_map.
If you really want to use intrusive data structures, have a look at Boost Intrusive.
As others have pointed out, it may be hard to see what you're doing wrong without seeing your code.
Someone should mention, however, that you're not helping yourself by overlaying two container types here.
If you're using a hash_map, you already have constant insertion and deletion time, see the related Hash : How does it work internally? post. The only exception to the O(c) lookup time is if your implementation decides to resize the container, in which case you have added overhead regardless of your linked list addition. Having two addressing schemes is only going to make things slower (not to mention buggier).
Sorry if this doesn't point you to the memory leak, but I'm sure a lot of memory leaks / bugs come from not using stl / boost containers to their full potential. Look into that first.
You need to be very careful with what you are doing, because values in a C++ map need to be copyable and with your structure that has raw pointers, you must handle your copy semantics properly.
You would be far better off using std::list where you won't need to worry about your copy semantics.
If you can't change that then at least std::map<int, Cell*> will be a bit more manageable, although you would have to manage the pointers in your map because std::map will not manage them for you.
You could of course use std::map<int, shared_ptr<Cell> >, probably easiest for you for now.
If you also use shared_ptr within your Cell object itself, you will need to beware of circular references, and if Cell needs to know it is being shared_ptr'd, you could derive it from enable_shared_from_this.
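As a rough sketch of the shared_ptr suggestion (reusing the Cell struct from the question, and assuming C++11):

#include <map>
#include <memory>

struct Cell { char c; std::shared_ptr<Cell> next; };

void example() {
    std::map<int, std::shared_ptr<Cell>> cells;
    cells[42] = std::make_shared<Cell>(); // no matching delete needed anywhere
    cells[42]->c = 'a';
    // When 'cells' goes out of scope, every Cell (and each chain of 'next'
    // cells) is freed automatically, provided there are no reference cycles.
}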
My final point will be that list is very rarely the correct collection type to use. It is the correct one to use sometimes, especially when you have an LRU cache situation and you want to move accessed elements to the end of the list fast. However that is the minority case and it probably doesn't apply here. Think of an alternative collection you really want. map< int, set<char> > perhaps? or map< int, vector< char > > ?
Your list has a lot of overhead to store a few chars.