My understanding is that C++ best practice is to define variables with the smallest scope possible.
My understanding is that the primary reason for this is that it will help prevent accidental reuse. In addition, there almost never a performance hit for doing so (or so I have been told). To the contrary, people seem to indicate that a compiler may actually be able to produce equivalent or better code when variables are defined locally. For example, the following two functions produce the same binaries in Godbolt:
#include <cstdio>
#include <cstdlib>
void printRand1() {
char val;
for (size_t i = 0; i < 100 ; ++i) {
val = rand();
puts(&val);
}
}
void printRand2() {
for (size_t i = 0; i < 100 ; ++i) {
const char val = rand();
puts(&val);
}
}
So in this case, version 2 is clearly preferable. This context I can totally agree with and understand.
What is not clear to me is whether the same logic should be applied to larger data types such as arrays or vectors. One particular thing that I find a lot in code is something like this:
#include <cstdio>
#include <cstdlib>
#include <vector>
struct Bob {
std::vector<char> buffer;
void bar(int N) {
buffer.resize(N);
for (auto & elem : buffer) {
elem = rand();
puts(&elem);
}
}
};
void bob() {
Bob obj;
obj.bar(100);
}
despite the fact that we could have localized the data better in this dumb example:
#include <cstdio>
#include <cstdlib>
#include <vector>
struct Bob {
void bar(int N) {
std::vector<char> buffer(N);
for (auto & elem : buffer) {
elem = rand();
puts(&elem);
}
}
};
void bob() {
Bob obj;
obj.bar(100);
}
Note: Before you guys jump on this, I totally realize that you don't actually need a vector in this example. I am just making a stupid example so that the binary code is not too large on Godbolt.
The rationale here for NOT localizing the data (aka. Snippet 1) is that the buffer could be some large vector, and we don't want to keep reallocating it every time we call the function.
The rationale for Snippet 2 is to localize the data better.
So what logic should I apply for this scenario? I am interested in the case where you actually need a vector (in this case you don't).
Should I follow the localization logic? or should I ride by the logic that I should try to prevent repeated reallocations?
I realize that in an actual application, you would want to benchmark the performance, not the compiled size in Godbolt. But I wonder what should be my default style for this scenario (before I start profiling the code).
The main consideration in the scenario you describe is: "Is the buffer an integral part of what a Bob is? Or is it just something we use in the implementation of bar()?"
If every Bob has a sequence of contiguous chars throughout its life as a Bob, then - that should be a member variable. If you only form that sequence to run bar(), then by the "smallest relevant scope" rule, that vector will only exist as a local variable inside bar().
Now, the above is the general-case answer. Sometimes, for reasons of performance, you may end up breaking your clean and reasonable abstractions. For example: You might have some single vector allocated and just it associated with a Bob for a period of time, then dissociate the buffer from your Bob but keep it in some buffer cache. But don't think about these kinds of contortions unless you have a very good reason to.
In version 2, memory will be allocated and deallocated with each bar() call. While version one will reuse already allocated chunk. For single call it doesn't matter, for multiple - version 1 would be preferred.
Related
I'm in the mood for some premature optimization and was wondering the following.
If one has a for-loop, and inside that loop there is a call to a function that returns a container, say a vector, of which the value is caught as an rvalue into a variable in the loop using move semantics, for instance:
std::vector<any_type> function(int i)
{
std::vector<any_type> output(3);
output[0] = i;
output[1] = i*2;
output[2] = i-3;
return(output);
}
int main()
{
for (int i = 0; i < 10; ++i)
{
// stuff
auto value = function(i);
// do stuff with value ...
// ... but in such a way that it can be discarded in the next iteration
}
}
How do compilers handle this memory-wise in the case that move semantics are applied (and that the function will not be inlined)? I would imagine that the most efficient thing to do is to allocate a single piece of memory for all the values, both inside the function and outside in the for-loop, that will get overwritten in each iteration.
I am mainly interested in this, because in my real-life application the vectors I'm creating are a lot larger than in the example given here. I am concerned that if I use functions like this, the allocation and destruction process will take up a lot of useless time, because I already know that I'm going to use that fixed amount of memory a lot of times. So, what I'm actually asking is whether there's some way that compilers would optimize to something of this form:
void function(int i, std::vector<any_type> &output)
{
// fill output
}
int main()
{
std::vector<any_type> dummy; // allocate memory only once
for (int i = 0; i < 10; ++i)
{
// stuff
function(i, dummy);
// do stuff with dummy
}
}
In particular I'm interested in the GCC implementation, but would also like to know what, say, the Intel compiler does.
Here, the most predictable optimization is RVO. When a function return an object, if it is used to initialize a new variable, the compiler can elide additional copy and move to construct directly on the destination ( it means that a program can contains two versions of the function depending on the use case ).
Here, you will still pay for allocating and destroying a buffer inside the vector at each loo iteration. If it is unacceptable, you will have to rely on an other solution, like std::array as your function seems to use fixed size dimension or move the vector before the loop and reuse it.
I would imagine that the most efficient thing to do is to allocate a
single piece of memory for all the values, both inside the function
and outside in the for-loop, that will get overwritten in each
iteration.
I don't think that any of the current compilers can do that. (I would be stunned to see that.) If you want to get insights, watch Chandler Carruth's talk.
If you need this kind of optimization, you need to do it yourself: Allocate the vector outside the loop and pass it by non-const reference to function() as argument. Of course, don't forget to call clear() when you are done or call clear() first inside function().
All this has nothing to do with move semantics, nothing has changed with C++11 in this respect.
If your loop is a busy loop, than allocating a container in each iteration can cost you a lot. It's easier to find yourself in such a situation than you would probably expect. Andrei Alexandrescu presents an example in his talk Writing Quick Code in C++, Quickly. The surprising thing is that doing unnecessary heap allocations in a tight loop like the one in his example can be slower than the actual file IO. I was surprised to see that. By the way, the container was std::string.
The "naive" solution is;
std::vector<T> vector_of_objects;
vector_of_objects.reserve(vector_of_pointers.size());
for (T const * p : vector_of_pointers)
vector_of_objects.push_back(*p);
The above seems cumbersome and perhaps not immediately obvious.
Is there a solution that is at least not significantly less efficient and perhaps a little quicker and more intuitive? I'm thinking C++11 might have a solution that I am not aware of...
Does writing everything in one line mean shorter code? No.
In my opinion these two lines are shorter and more readable:
for (auto p : vector_of_pointers)
vector_of_objects.emplace_back(*p);
Function std::for_each is not shorter than ranged-based loop, sometime it's bigger due to passing lambda expressions.
Function std::transform is even longer than std::for_each, however the word transform is an advantage to find out what's happening in the following.
You are doing the correct way. Another way is to use built-in algorithms library, like this:
#include <iostream>
#include <vector>
#include <algorithm>
#include <iterator>
int main() {
// Create a vector of pointers
std::vector<int*> vptr;
vptr.push_back(new int(1));
vptr.push_back(new int(2));
// Copy to vector of objects
std::vector<int> vobj;
std::for_each(vptr.begin(), vptr.end(), [&](int *n) { vobj.emplace_back(*n); });
// Free the pointers
std::for_each(vptr.begin(), vptr.end(), [&](int *n) { delete n; });
// Print out the vector of objects
std::copy(vobj.begin(), vobj.end(), std::ostream_iterator<int>(std::cout, " "));
return 0;
}
The idiomatic way would be to use std::transform:
std::transform( vector_of_pointers.begin(),
vector_of_pointers.end(),
std::back_inserter( vector_of_objects ),
[]( T* p ) { return *p; } );
Whether this is "better" than what you've written is another
question: it has the advantage of being idiomatic, and of
actually naming what is going on (which makes the code slightly
clearer). On the other hand, the "transformation" is very, very
simple, so it would be easily recognized in the loop, and the
new form for writing such loops makes things fairly clear as
well.
No, you have to call the copy-ctor as in your solution, there's no way around that.
std::vector<T*> vecPointers;
std::vector<T> vecValues;
for(size_t x=0;x<vecPointers.size();x++)
{
vecValues.push_back(*vecPointers[x]);
}
I believe that if type T is a custom object then you will need to create a copy constructor for class T.
class T
{
private:
int someValue;
public:
T()
{
}
T(const T &o)// copy constructor
{
someValue = o.someValue;
}
virtual ~T()
{
}
};
It seems to me that the real question is whether you're doing this often enough for it to be worth writing extra code in one place to clean up the code in the other places.
I can imagine writing a deref_iterator that would allow you to do something like this:
std::vector<T> vector_of_objects{
deref_iterator(std::begin(vector_of_pointers)),
deref_iterator(std::end(vector_of_pointers))};
Now, we're left with the question of whether this is really shorter than the original loop or not. In terms of simple number of key strokes, it's probably going to depend on the names you give things. If you didn't care about readable names, it could be:
vector<T> v_o{d(begin(v_p)), d(end(v_p))};
The short names obviously make it short, but I certainly wouldn't advise them -- if I hadn't just typed this in, I'd have no clue in the world what it meant. A longer name (that needs to be repeated a couple of times) obviously adds more key-strokes, but I can't imagine anybody thinking the readability wasn't worth it.
In any case, the deref_iterator itself would clearly take up some code. An iterator has enough boiler-plate that it typically takes around 100 lines of code or so. Let's (somewhat arbitrarily) decide that this saves one line of code every time you use it. On that basis, you'd have to use it 100 times to break even.
I'm not sure that's accurate in characterizing the code overall -- the code for an iterator is mostly boiler-plate, and other than a typo, there's not much that could really go wrong with it. For the most part, it would be a matter of including the right header, and using it, not of virtually ever having to look at the code for the iterator itself.
That being the case, I might accept it as an improvement even if the total number of lines of code increased. Writing it to use only once would clearly be a loss, but I don't think it'd need to be a full 100 times to qualify as breaking even either.
Problem
Suppose I have a large array of bytes (think up to 4GB) containing some data. These bytes correspond to distinct objects in such a way that every s bytes (think s up to 32) will constitute a single object. One important fact is that this size s is the same for all objects, not stored within the objects themselves, and not known at compile time.
At the moment, these objects are logical entities only, not objects in the programming language. I have a comparison on these objects which consists of a lexicographical comparison of most of the object data, with a bit of different functionality to break ties using the remaining data. Now I want to sort these objects efficiently (this is really going to be a bottleneck of the application).
Ideas so far
I've thought of several possible ways to achieve this, but each of them appears to have some rather unfortunate consequences. You don't necessarily have to read all of these. I tried to print the central question of each approach in bold. If you are going to suggest one of these approaches, then your answer should respond to the related questions as well.
1. C quicksort
Of course the C quicksort algorithm is available in C++ applications as well. Its signature matches my requirements almost perfectly. But the fact that using that function will prohibit inlining of the comparison function will mean that every comparison carries a function invocation overhead. I had hoped for a way to avoid that. Any experience about how C qsort_r compares to STL in terms of performance would be very welcome.
2. Indirection using Objects pointing at data
It would be easy to write a bunch of objects holding pointers to their respective data. Then one could sort those. There are two aspects to consider here. On the one hand, just moving around pointers instead of all the data would mean less memory operations. On the other hand, not moving the objects would probably break memory locality and thus cache performance. Chances that the deeper levels of quicksort recursion could actually access all their data from a few cache pages would vanish almost completely. Instead, each cached memory page would yield only very few usable data items before being replaced. If anyone could provide some experience about the tradeoff between copying and memory locality I'd be very glad.
3. Custom iterator, reference and value objects
I wrote a class which serves as an iterator over the memory range. Dereferencing this iterator yields not a reference but a newly constructed object to hold the pointer to the data and the size s which is given at construction of the iterator. So these objects can be compared, and I even have an implementation of std::swap for these. Unfortunately, it appears that std::swap isn't enough for std::sort. In some parts of the process, my gcc implementation uses insertion sort (as implemented in __insertion_sort in file stl_alog.h) which moves a value out of the sequence, moves a number items by one step, and then moves the first value back into the sequence at the appropriate position:
typename iterator_traits<_RandomAccessIterator>::value_type
__val = _GLIBCXX_MOVE(*__i);
_GLIBCXX_MOVE_BACKWARD3(__first, __i, __i + 1);
*__first = _GLIBCXX_MOVE(__val);
Do you know of a standard sorting implementation which doesn't require a value type but can operate with swaps alone?
So I'd not only need my class which serves as a reference, but I would also need a class to hold a temporary value. And as the size of my objects is dynamic, I'd have to allocate that on the heap, which means memory allocations at the very leafs of the recusrion tree. Perhaps one alternative would be a vaue type with a static size that should be large enough to hold objects of the sizes I currently intend to support. But that would mean that there would be even more hackery in the relation between the reference_type and the value_type of the iterator class. And it would mean I would have to update that size for my application to one day support larger objects. Ugly.
If you can think of a clean way to get the above code to manipulate my data without having to allocate memory dynamically, that would be a great solution. I'm using C++11 features already, so using move semantics or similar won't be a problem.
4. Custom sorting
I even considered reimplementing all of quicksort. Perhaps I could make use of the fact that my comparison is mostly a lexicographical compare, i.e. I could sort sequences by first byte and only switch to the next byte when the firt byte is the same for all elements. I haven't worked out the details on this yet, but if anyone can suggest a reference, an implementation or even a canonical name to be used as a keyword for such a byte-wise lexicographical sorting, I'd be very happy. I'm still not convinced that with reasonable effort on my part I could beat the performance of the STL template implementation.
5. Completely different algorithm
I know there are many many kinds of sorting algorithms out there. Some of them might be better suited to my problem. Radix sort comes to my mind first, but I haven't really thought this through yet. If you can suggest a sorting algorithm more suited to my problem, please do so. Preferrably with implementation, but even without.
Question
So basically my question is this:
“How would you efficiently sort objects of dynamic size in heap memory?”
Any answer to this question which is applicable to my situation is good, no matter whether it is related to my own ideas or not. Answers to the individual questions marked in bold, or any other insight which might help me decide between my alternatives, would be useful as well, particularly if no definite answer to a single approach turns up.
The most practical solution is to use the C style qsort that you mentioned.
template <unsigned S>
struct my_obj {
enum { SIZE = S; };
const void *p_;
my_obj (const void *p) : p_(p) {}
//...accessors to get data from pointer
static int c_style_compare (const void *a, const void *b) {
my_obj aa(a);
my_obj bb(b);
return (aa < bb) ? -1 : (bb < aa);
}
};
template <unsigned N, typename OBJ>
void my_sort (const char (&large_array)[N], const OBJ &) {
qsort(large_array, N/OBJ::SIZE, OBJ::SIZE, OBJ::c_style_compare);
}
(Or, you can call qsort_r if you prefer.) Since STL sort inlines the comparision calls, you may not get the fastest possible sorting. If all your system does is sorting, it may be worth it to add the code to get custom iterators to work. But, if most of the time your system is doing something other than sorting, the extra gain you get may just be noise to your overall system.
Since there are only 31 different object variations (1 to 32 bytes), you could easily create an object type for each and select a call to std::sort based on a switch statement. Each call will get inlined and highly optimized.
Some object sizes might require a custom iterator, as the compiler will insist on padding native objects to align to address boundaries. Pointers can be used as iterators in the other cases since a pointer has all the properties of an iterator.
I'd agree with std::sort using a custom iterator, reference and value type; it's best to use the standard machinery where possible.
You worry about memory allocations, but modern memory allocators are very efficient at handing out small chunks of memory, particularly when being repeatedly reused. You could also consider using your own (stateful) allocator, handing out length s chunks from a small pool.
If you can overlay an object onto your buffer, then you can use std::sort, as long as your overlay type is copyable. (In this example, 4 64bit integers). With 4GB of data, you're going to need a lot of memory though.
As discussed in the comments, you can have a selection of possible sizes based on some number of fixed size templates. You would have to have pick from these types at runtime (using a switch statement, for example). Here's an example of the template type with various sizes and example of sorting the 64bit size.
Here's a simple example:
#include <vector>
#include <algorithm>
#include <iostream>
#include <ctime>
template <int WIDTH>
struct variable_width
{
unsigned char w_[WIDTH];
};
typedef variable_width<8> vw8;
typedef variable_width<16> vw16;
typedef variable_width<32> vw32;
typedef variable_width<64> vw64;
typedef variable_width<128> vw128;
typedef variable_width<256> vw256;
typedef variable_width<512> vw512;
typedef variable_width<1024> vw1024;
bool operator<(const vw64& l, const vw64& r)
{
const __int64* l64 = reinterpret_cast<const __int64*>(l.w_);
const __int64* r64 = reinterpret_cast<const __int64*>(r.w_);
return *l64 < *r64;
}
std::ostream& operator<<(std::ostream& out, const vw64& w)
{
const __int64* w64 = reinterpret_cast<const __int64*>(w.w_);
std::cout << *w64;
return out;
}
int main()
{
srand(time(NULL));
std::vector<unsigned char> buffer(10 * sizeof(vw64));
vw64* w64_arr = reinterpret_cast<vw64*>(&buffer[0]);
for(int x = 0; x < 10; ++x)
{
(*(__int64*)w64_arr[x].w_) = rand();
}
std::sort(
w64_arr,
w64_arr + 10);
for(int x = 0; x < 10; ++x)
{
std::cout << w64_arr[x] << '\n';
}
std::cout << std::endl;
return 0;
}
Given the enormous size (4GB), I would seriously consider dynamic code generation. Compile a custom sort into a shared library, and dynamically load it. The only non-inlined call should be the call into the library.
With precompiled headers, the compilation times may actually be not that bad. The whole <algorithm> header doesn't change, nor does your wrapper logic. You just need to recompile a single predicate each time. And since it's a single function you get, linking is trivial.
#define OBJECT_SIZE 32
struct structObject
{
unsigned char* pObject;
bool operator < (const structObject &n) const
{
for(int i=0; i<OBJECT_SIZE; i++)
{
if(*(pObject + i) != *(n.pObject + i))
return (*(pObject + i) < *(n.pObject + i));
}
return false;
}
};
int _tmain(int argc, _TCHAR* argv[])
{
std::vector<structObject> vObjects;
unsigned char* pObjects = (unsigned char*)malloc(10 * OBJECT_SIZE); // 10 Objects
for(int i=0; i<10; i++)
{
structObject stObject;
stObject.pObject = pObjects + (i*OBJECT_SIZE);
*stObject.pObject = 'A' + 9 - i; // Add a value to the start to check the sort
vObjects.push_back(stObject);
}
std::sort(vObjects.begin(), vObjects.end());
free(pObjects);
To skip the #define
struct structObject
{
unsigned char* pObject;
};
struct structObjectComparerAscending
{
int iSize;
structObjectComparerAscending(int _iSize)
{
iSize = _iSize;
}
bool operator ()(structObject &stLeft, structObject &stRight)
{
for(int i=0; i<iSize; i++)
{
if(*(stLeft.pObject + i) != *(stRight.pObject + i))
return (*(stLeft.pObject + i) < *(stRight.pObject + i));
}
return false;
}
};
int _tmain(int argc, _TCHAR* argv[])
{
int iObjectSize = 32; // Read it from somewhere
std::vector<structObject> vObjects;
unsigned char* pObjects = (unsigned char*)malloc(10 * iObjectSize);
for(int i=0; i<10; i++)
{
structObject stObject;
stObject.pObject = pObjects + (i*iObjectSize);
*stObject.pObject = 'A' + 9 - i; // Add a value to the start to work with something...
vObjects.push_back(stObject);
}
std::sort(vObjects.begin(), vObjects.end(), structObjectComparerAscending(iObjectSize));
free(pObjects);
Question First
Is there an elegant solution in C++ to prevent one from having to declare complex object variables that are only used within a loop outside of the loop for efficiency reasons?
Detailed explanation
A colleague has raised an interesting point wrt. to our code policy, which states (paraphrased): always use minimal scope for variables and declare the variable at the first initialization.
Coding Guide Example:
// [A] DO THIS
void f() {
...
for (int i=0; i!=n; ++i) {
const double x = calculate_x(i);
set_squares(i, x*x);
}
...
}
// [B] DON'T do this:
void f() {
int i;
int n;
double x;
...
for (i=0; i!=n; ++i) {
x = calculate_x(i);
set_squares(i, x*x);
}
...
}
This is all nice and well, and there's certainly nothing wrong with this, until you move from primitive types to objects. (for a certain kind of interface)
Example:
// [C]
void fs() {
...
for (int i=0; i!=n; ++i) {
string s;
get_text(i, s); // void get_text(int, string&);
to_lower(s);
set_lower_text(i, s);
}
...
}
Here, the string s will be destructed, it's memory release every loop cycle and then every cycle the get_text function will have to newly allocate the memory for the s buffer.
It would be clearly more efficient to write:
// [D]
string s;
for (int i=0; i!=n; ++i) {
get_text(i, s); // void get_text(int, string&);
to_lower(s);
set_lower_text(i, s);
}
as now the allocated memory in the s buffer will be preserved between loop runs and it is very likely that we'll save on allocations.
Disclaimer: Please note: Since this is loops and we're talking memory allocations, I do not consider it premature optimization to think about this problem generally. Certainly there are cases and loops where the overhead wouldn't matter; but n has the nagging tendency to be larger that the Dev initially expects and the code has the nagging tendency to be run in contexts where performance does matter.
Anyway, so now the more efficient way for the "general" loop construct is to violate code locality and declare complex objects out of place, "just in case". This makes me rather uneasy.
Note that I consider writing it like this:
// [E]
void fs() {
...
{
string s;
for (int i=0; i!=n; ++i) {
get_text(i, s); // void get_text(int, string&);
to_lower(s);
set_lower_text(i, s);
}
}
...
}
is no solution as readability suffers even more!
Thinking further, the interface of the get_text function is non-idiomatic anyway, as out params are so yesterday anyway and a "good" interface would return by value:
// [F]
for (int i=0; i!=n; ++i) {
string s = get_text(i); // string get_text(int);
to_lower(s);
set_lower_text(i, s);
}
Here, we do not pay double for memory allocation, because it is extremely likely that s will be constructed via RVO from the return value, so for [F] we pay the same in allocation overhead as in [C]. Unlike the [C] case however, we can't optimize this interface variant.
So the bottom line seems to be that using minimal scope (can) hurt performance and using clean interfaces I at least consider return by value a lot cleaner than that out-ref-param stuff will prevent optimization opportunities -- at least in the general case.
The problem isn't so much that one would have to forgo clean code for efficiency sometimes, the problem is that as soon as Devs start to find such special cases, the whole Coding Guide (see [A], [B]) looses authority.
The question now would be: see first paragraph
It would be clearly more efficient to write: [start of example D ...]
I doubt this bit. You're paying for default construction to begin with outside the loop. Within the loop, there is a possibility that get_text calls reallocate buffer (depends on how your get_text and the string is defined). Note that this for some runs you may actually see an improvement (say in the case where you get progressively shorter strings) and for some (where the string lengths go up by about a factor of 2 at every iteration) a huge hit in performance.
It makes perfect sense to hoist invariants out of your loop should they pose a bottleneck (which a profiler will tell you). Otherwise, go for code that is idiomatic.
I'd either:
make an exception to the rule for these heavyweights. like 'D' and note that you can restrict the scope as desired.
permit a helper function (the string could also be a parameter)
and if you really didn't like those, you could declare a local in your for loop's scope using a multi-element object which held your counter/iterator and the temporary. std::pair<int,std::string> would be one option, although a specialized container could reduce the syntactic noise.
(and the out parameter would be faster than RVO-style in many cases)
Depends on the implementation of get_text.
If you can implement it so it reuses the space allocated in the string object most of the time, then definitely declare the object outside the loop to avoid new dynamic memory allocation at each loop iteration.
Dynamic allocation is expensive (best single-threaded allocators will need about 40 instructions for a single allocation, multi-threading adds overhead and not all allocators are "best"), and can fragment memory.
(BTW, std::string typically implements so called "small string optimization", which avoids dynamic allocation for small strings. So if you know most of your strings will be small enough, and the implementation of std::string won't change, you could theoretically avoid dynamic allocation even when constructing a new object in each iteration. This would be very fragile however, so I'd recommend against it.)
In general case, it all depends on how your objects and functions that use them are implemented. If you care about performance, you'll have to deal with these kinds of "abstraction leaks" on case-by-case basis. So, pick your battles wisely: measure and optimize bottlenecks first.
If you have a copy-on-write implementation of the string class, then to_lower(s) will allocate memory anyway, so it is not clear that you can gain performance by simply declaring s outside the loop.
In my opinion, there are two possibilities:
1.) You have a class whose constructor does something non-trivial which need not be re-done in each iteration. Then it is logically straightforward to put the declaration outside the loop.
2.) You have a class whose constructor does not do anything useful, then put the declaration inside the loop.
If 1. is true, then you should probably split your object into a helper object which, e.g., allocates space and does non-trivial initializations, and a flyweight object. Something like the following:
StringReservedMemory m (500); /* base object for something complex, allocating 500 bytes of space */
for (...) {
MyOptimizedStringImplementation s (m);
...
}
Im wondering if this code:
int main(){
int p;
for(int i = 0; i < 10; i++){
p = ...;
}
return 0
}
is exactly the same as that one
int main(){
for(int i = 0; i < 10; i++){
int p = ...;
}
return 0
}
in term of efficiency ?
I mean, the p variable will be recreated 10 times in the second example ?
It's is the same in terms of efficiency.
It's not the same in terms of readability. The second is better in this aspect, isn't it?
It's a semantic difference which the code keeps hidden because it's not making a difference for int, but it makes a difference to the human reader. Do you want to carry the value of whatever calculation you do in ... outside of the loop? You don't, so you should write code that reflects your intention.
A human reader will need to seek the function and look for other uses of p to confirm himself that what you did was just premature "optimization" and didn't have any deeper purpose.
Assuming it makes a difference for the type you use, you can help the human reader by commenting your code
/* p is only used inside the for-loop, to keep it from reallocating */
std::vector<int> p;
p.reserve(10);
for(int i = 0; i < 10; i++){
p.clear();
/* ... */
}
In this case, it's the same. Use the smallest scope possible for the most readable code.
If int were a class with a significant constructor and destructor, then the first (declaring it outside the loop) can be a significant savings - but inside you usually need to recreate the state anyway... so oftentimes it ends up being no savings at all.
One instance where it might make a difference is containers. A string or vector uses internal storage that gets grown to fit the size of the data it is storing. You may not want to reconstruct this container each time through the loop, instead, just clear its contents and it may not need as many reallocations inside the loop. This can (in some cases) result in a significant performance improvement.
The bottom-line is write it clearly, and if profiling shows it matters, move it out :)
They are equal in terms of efficiency - you should trust your compiler to get rid of the immeasurably small difference. The second is better design.
Edit: This isn't necessarily true for custom types, especially those that deal with memory. If you were writing a loop for any T, I'd sure use the first form just in case. But if you know that it's an inbuilt type, like int, pointer, char, float, bool, etc. I'd go for the second.
In second example the p is visible only inside of the for loop. you cannot use it further in your code.
In terms of efficiency they are equal.