I'd like to know if there is a quicker way to copy my data from a mysqlpp::StoreQueryResult to a std::vector.
My example is as follows:
I store my query result with query.store() in a StoreQueryResult, and the result is, e.g., a table with one column of doubles. Now I want to copy those doubles into a std::vector. Right now I access every single double with the [][] operator and copy it to my vector in a for loop.
This works, but it is very time-consuming since I'm copying about 277000 doubles in a loop. Is there a way to just copy the column to my vector? My other functions take std::vectors in their parameter lists. Alternatively I could change my functions to take a StoreQueryResult, I guess, but I'd prefer a std::vector.
Here is my simplified code:
void foo()
{
vector<double> vec;
mysqlpp::StoreQueryResult sqr;
Query query;
query << "SELECT * FROM tablename";
sqr = query.store();
vec.reserve(sqr.num_rows());
vec.resize(sqr.size());
for(int i=0; i != vec.size(); i++)
{
vec[i] = sqr[i]["my_column"];
}
}
I want something like:
vec = sqr["my_column"] // when my_column is a field with doubles
Thx in advance.
Martin
Ultimately, if you need to copy then you need to copy, and whether you write the loop yourself or get a library function to do it isn't particularly relevant.
What you can do is pre-reserve enough space in the destination vector to avoid repeated re-allocations and copies:
vec.reserve(sqr.num_rows());
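As a minimal sketch of the reserve-then-append pattern: the Row type below is only a stand-in (a vector of text cells), not the MySQL++ API, and copy_column is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Stand-in for one result row: each cell is held as text.
// This is NOT mysqlpp::Row; it only mimics "cells that must be converted".
using Row = std::vector<std::string>;

// Hypothetical helper: copy one column of a row set into a vector<double>.
std::vector<double> copy_column(const std::vector<Row>& rows, std::size_t column)
{
    std::vector<double> out;
    out.reserve(rows.size());                   // one allocation up front
    for (const Row& row : rows) {
        assert(column < row.size());            // every row must have the column
        out.push_back(std::stod(row[column]));  // convert text to double
    }
    return out;
}
```

The loop is still there; reserve only removes the repeated reallocations, which is usually the avoidable part of the cost.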
It is possible that you wish to create a vector, but then only some values will actually be accessed and used.
In which case we may delay the conversion from mysqlpp::String to another datatype:
std::vector<mysqlpp::String> data(res.num_rows());
for(size_t i=0, n=res.num_rows(); i<n; ++i)
{
data[i] = std::move(res[i]["value"]);
}
Several things are happening here:
We are creating the vector that stores mysqlpp::String. It is an interesting datatype that can be converted to many others. In your case you were using operator double () const.
We get the size once, store it, and then reuse that value. This is a micro-optimisation, as is using ++i rather than i++; neither saves many cycles, but both keep the code in the spirit of optimisation.
We move the data, rather than copying it. See std::move if you've not encountered it before.
If then you have something like:
double sum = 0.0;
for(size_t i=0, n=data.size(); i<n; i+=2)
{
sum+=double(data[i]);
}
You will only run the conversion routine on ½ of your values.
Of course, if you plan to use the resultant vector several times, you will actually start running the same conversions again and again. So this "optimisation" will actually hurt performance.
I want to improve the performance of the following code. What aspect might affect the performance of the code when it's executed?
Also, considering that there is no limit to how many objects you can add to the container, what improvements could be made to “Object” or “addToContainer” to improve the performance of the program?
I was wondering if std::vector::push_back in C++ affects the performance of the code in any way, especially if there is no limit on how many items are added to the list.
struct Object{
string name;
string description;
};
vector<Object> container;
void addToContainer(Object object) {
container.push_back(object);
}
int main() {
addToContainer({ "Fira", "+5 ATTACK" });
addToContainer({ "Potion", "+10 HP" });
}
Before you do ANYTHING profile the code and get a benchmark. After you make a change profile the code and get a benchmark. Compare the benchmarks. If you do not do this, you're rolling dice. Is it faster? Who knows.
Profile profile profile.
With push_back you have two main concerns:
Resizing the vector when it fills up, and
Copying the object into the vector.
There are a number of improvements you can make to the resizing cost of push_back depending on how items are being added.
Strategic use of reserve to minimize the amount of resizing, for example. If you know how many items are about to be added, you can check the capacity and size to see if it's worth your time to reserve to avoid multiple resizes. Note this requires knowledge of vector's expansion strategy and that is implementation-specific. An optimization for one vector implementation could be a terribly bad mistake on another.
You can use insert to add multiple items at a time. Of course this is close to useless if you need to add another container into the code in order to bulk-insert.
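A rough sketch of the bulk-insert idea, assuming the items already sit in another container; addBatch is a hypothetical helper name, and Object mirrors the struct from the question.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Object {
    std::string name;
    std::string description;
};

// Hypothetical helper: append a whole batch in one call.
// 'batch' must not be the same vector as 'container'.
void addBatch(std::vector<Object>& container, const std::vector<Object>& batch)
{
    // Range insert: the vector sees the batch size and grows at most once.
    container.insert(container.end(), batch.begin(), batch.end());
    assert(container.size() >= batch.size());
}
```

This only pays off when the batch exists anyway; building a second container just to bulk-insert it adds its own copies.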
If you have no idea how many items are incoming, you might as well let vector do its job and optimize HOW the items are added.
For example
void addToContainer(Object object) // pass by value. Possible copy
{
container.push_back(object); // copy
}
Those copies can be expensive. Get rid of them.
void addToContainer(Object && object) //no copy and can still handle temporaries
{
container.push_back(std::move(object)); // moves rather than copies
}
std::string is often very cheap to move.
This variant of addToContainer can be used with
addToContainer({ "Fira", "+5 ATTACK" });
addToContainer({ "Potion", "+10 HP" });
and might just migrate a pointer and a few book-keeping variables per string. They are temporaries, so no one cares if it rips their guts out and throws away the corpses.
As for existing Objects
Object o{"Pizza pop", "+5 food"};
addToContainer(std::move(o));
If they are expendable, they get moved as well. If they aren't expendable...
void addToContainer(const Object & object) // no copy
{
container.push_back(object); // copy
}
You have an overload that does it the hard way.
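Put together, the two overloads might look like the following sketch. The usageExample function is a hypothetical addition that only shows which call picks which overload.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

struct Object {
    std::string name;
    std::string description;
};

std::vector<Object> container;

// Lvalues: the caller keeps its object, so one copy is unavoidable.
void addToContainer(const Object& object)
{
    container.push_back(object);
}

// Rvalues (temporaries or std::move'd objects): steal the contents.
void addToContainer(Object&& object)
{
    container.push_back(std::move(object));
}

// Hypothetical usage: one copy, two moves.
void usageExample()
{
    Object keep{"Pizza pop", "+5 food"};
    addToContainer(keep);                          // copy: 'keep' stays intact
    assert(keep.name == "Pizza pop");
    addToContainer(Object{"Fira", "+5 ATTACK"});   // temporary: moved
    addToContainer(std::move(keep));               // expendable: moved
    assert(container.size() == 3);
}
```

After the last call, keep is in a valid but unspecified state and should not be read again without reassigning it.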
Tossing this one out there
If you already have a number of items you know are going to be in the list, rather than appending them all one at a time, use an initialization list:
vector<Object> container{
{"Vorpal Cheese Grater", "Many little pieces"},
{"Holy Hand Grenade", "OMG Damage"}
};
push_back can be extremely expensive, but as with everything, it depends on the context. Take for example this terrible code:
std::vector<float> slow_func(const float* ptr)
{
std::vector<float> v;
for(size_t i = 0; i < 256; ++i)
v.push_back(ptr[i]);
return v;
}
each call to push_back has to do the following:
Check to see if there is enough space in the vector
If not, allocate new memory, and copy the old values into the new vector
copy the new item to the end of the vector
increment end
Now there are two big problems here wrt performance. Firstly each push_back operation depends upon the previous operation (since the previous operation modified end, and possibly the entire contents of the array if it had to be resized). This pretty much destroys any vectorisation possibilities in the code. Take a look here:
https://godbolt.org/z/RU2tM0
The func that uses push_back does not make for very pretty asm. It's effectively hamstrung into being forced to copy a single float at a time. Now if you compare that to an alternative approach where you resize first, and then assign; the compiler just replaces the whole lot with a call to new, and a call to memcpy. This will be a few orders of magnitude faster than the previous method.
std::vector<float> fast_func(const float* ptr)
{
std::vector<float> v(256);
for(size_t i = 0; i < 256; ++i)
v[i] = ptr[i];
return v;
}
BUT, and it's a big but, the relative performance of push_back very much depends on whether the items in the array can be trivially copied (or moved). If, for example, you do something silly like:
struct Vec3 {
float x = 0;
float y = 0;
float z = 0;
};
Well now when we did this:
std::vector<Vec3> v(256);
The compiler will allocate memory, but also be forced to set all the values to zero (which is pointless if you are about to overwrite them again!). The obvious way around this is to use a different constructor:
std::vector<Vec3> v(ptr, ptr + 256);
So really, only use push_back (well, really you should prefer emplace_back in most cases) when either:
additional elements are added to your vector occasionally
or, The objects you are adding are complex to construct (in which case, use emplace_back!)
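For the second case, a small sketch of emplace_back, assuming a hypothetical Item type with a non-trivial constructor:

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical type that is comparatively costly to construct and copy.
struct Item {
    std::string name;
    int level;
    Item(std::string n, int l) : name(std::move(n)), level(l) {}
};

std::vector<Item> make_items()
{
    std::vector<Item> items;
    items.reserve(2);                               // known count, so no regrowth
    items.emplace_back("Vorpal Cheese Grater", 3);  // constructed inside the vector
    items.emplace_back("Holy Hand Grenade", 9);
    assert(items.size() == 2);
    return items;
}
```

emplace_back forwards its arguments to the Item constructor, so no temporary Item has to be built and then copied or moved into place.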
without any other requirements, unfortunately this is the most efficient:
void addToContainer(Object) { }
To answer the rest of your question: in general push_back just appends to the end of the allocated storage in O(1), but it occasionally needs to grow the vector, which is O(N) for that one call. Averaged over many calls, the cost is amortized O(1).
also, it would likely be more efficient not to use string, but to keep char * although memory management might be tricky unless it is always a literal being added
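The occasional growth mentioned above can be observed directly. This is only a sketch; count_reallocations is a hypothetical helper, and the exact count without reserve depends on the standard library's growth factor.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Counts how often push_back changes the capacity (i.e. reallocates).
std::size_t count_reallocations(std::size_t n, bool reserve_first)
{
    std::vector<int> v;
    if (reserve_first) v.reserve(n);
    std::size_t reallocations = 0;
    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t before = v.capacity();
        v.push_back(static_cast<int>(i));
        if (v.capacity() != before) ++reallocations;
    }
    assert(v.size() == n);
    return reallocations;
}
```

With reserve_first set, the count is zero; without it, the count grows roughly with the logarithm of n rather than with n itself, which is why the per-call cost averages out.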
I am trying to do a product operand on the values inside of a vector. It is a huge mess of code.. I have posted it previously but no one was able to help. I just wanna confirm which is the correct way to do a single part of it. I currently have:
vector<double> taylorNumerator;
for(a = 0; a <= (constant); a++) {
double Number = equation involving a to get numerous values;
taylorNumerator.push_back(Number);
for(b = 0; b <= (constant); b++) {
double NewNumber *= taylorNumerator[b];
}
This is what I have as a snapshot, it is very short from what I actually have. Someone told me it is better to do vector.at(index) instead. Which is the correct or best way to accomplish this? If you so desire I can paste all of the code, it works but the values I get are wrong.
When possible, you should probably avoid using indexes at all. Your options are:
A range-based for loop:
for (auto numerator : taylorNumerators) { ... }
An iterator-based loop:
for (auto it = taylorNumerators.begin(); it != taylorNumerators.end(); ++it) { ... }
A standard algorithm, perhaps with a lambda:
#include <algorithm>
std::for_each(taylorNumerators.begin(), taylorNumerators.end(), [&](double numerator) { ... });
In particular, note that some algorithms let you specify a number of iterations, like std::generate_n, so you can create exactly n items without counting to n yourself.
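As an illustration of std::generate_n, here is a sketch that produces the first n Taylor terms of exp(x). The series is an assumption chosen for the example, since the question does not show its actual formula.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <numeric>
#include <vector>

// Hypothetical example: the first n Taylor terms of exp(x), x^k / k!.
std::vector<double> taylor_terms(double x, std::size_t n)
{
    std::vector<double> terms;
    terms.reserve(n);
    double term = 1.0;   // the k = 0 term
    std::size_t k = 0;
    std::generate_n(std::back_inserter(terms), n, [&] {
        const double current = term;
        ++k;
        term *= x / static_cast<double>(k);  // derive the next term
        return current;
    });
    assert(terms.size() == n);
    return terms;
}

// Sum of all values, without any explicit index.
double sum_of(const std::vector<double>& values)
{
    return std::accumulate(values.begin(), values.end(), 0.0);
}
```

Neither function counts to n by hand; generate_n owns the iteration and std::accumulate owns the reduction.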
If you need the index in the calculation, then it can be appropriate to use a traditional for loop. You have to watch for a couple of pitfalls: std::vector<T>::size() returns a std::vector<T>::size_type, which is typically identical to std::size_t, which is (1) unsigned and (2) quite possibly larger than an int.
for (std::size_t i = 0; i != taylorNumerators.size(); ++i) { ... }
Your calculations probably deal with doubles or some numerical type other than std::size_t, so you have to consider the best way to convert it. Many programmers would rely on implicit conversions, but that can be dangerous unless you know the conversion rules very well. I'd generally start by doing a static cast of the index to the type I actually need. For example:
for (std::size_t i = 0; i != taylorNumerators.size(); ++i) {
const auto x = static_cast<double>(i);
/* calculation involving x */
}
In C++, it's probably far more common to make sure the index is in range and then use operator[] rather than to use at(). Many projects disable exceptions, so the safety guarantee of at() wouldn't really be available. And, if you can check the range once yourself, then it'll be faster to use operator[] than to rely on the range-check built into at() on each index operation.
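A short sketch of the difference, with hypothetical helper names: at() reports a bad index by throwing std::out_of_range, while the product function checks the range once and then uses operator[].

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Returns true if v.at(index) throws std::out_of_range.
bool at_throws(const std::vector<double>& v, std::size_t index)
{
    try {
        (void)v.at(index);
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}

// Checks the range once, then uses unchecked operator[] inside the loop.
double product(const std::vector<double>& v, std::size_t count)
{
    assert(count <= v.size());           // single range check
    double result = 1.0;                 // multiplicative identity
    for (std::size_t i = 0; i != count; ++i)
        result *= v[i];                  // no per-element check
    return result;
}
```

Note that the accumulator starts at 1.0 and lives outside the loop; declaring it inside the loop body would reset it on every iteration.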
What you have is fine. Modern compilers can optimize the heck out of the above such that the code is just as fast as the equivalent C code accessing the items directly.
The only optimization for using vector I recommend is to invoke taylorNumerator.reserve(constant) to allocate the needed storage upfront instead of the vector resizing itself as new items are added.
About the only worthy optimization after that is to not use vector at all and just use a static array - especially if constant is small enough that it doesn't blow up the stack (or binary size if global).
double taylorNumerator[constant];
In my C++ code,
vector <string> strVector = GetStringVector();
vector <int> intVector = GetIntVector();
So I combined these two vectors into a single one,
void combineVectors(vector<string>& strVector, vector <int>& intVector, vector < pair <string, int>>& pairVector)
{
for (int i = 0; i < strVector.size() || i < intVector.size(); ++i )
{
pairVector.push_back(pair<string, int> (strVector.at(i), intVector.at(i)));
}
}
Now this function is called like this,
vector <string> strVector = GetStringVector();
vector <int> intVector = GetIntVector();
vector < pair <string, int>> pairVector;
combineVectors(strVector, intVector, pairVector);
//rest of the implementation
The combineVectors function uses a loop to add the elements of the other 2 vectors to the vector of pairs. I doubt this is an efficient way, as this function gets called hundreds of times with different data. This might cause a performance issue because it goes through the loop every time.
My goal is to copy both vectors in "one go" to the vector of pairs, i.e., without using a loop. I am not sure whether that's even possible.
Is there a better way of achieving this without compromising the performance?
You have clarified that the arrays will always be of equal size. That's a prerequisite condition.
So, your situation is as follows. You have vector A over here, and vector B over there. You have no guarantees whether the actual memory that vector A uses and the actual memory that vector B uses are next to each other. They could be anywhere.
Now you're combining the two vectors into a third vector, C. Again, no guarantees where vector C's memory is.
So, you have really very little to work with, in terms of optimizations. You have no additional guarantees whatsoever. This is pretty much fundamental: you have two chunks of bytes, and those two chunks need to be copied somewhere else. That's it. That's what has to be done, that's what it all comes down to, and there is no other way to get it done, other than doing exactly that.
But there is one thing that can be done to make things a little bit faster. A vector will typically allocate memory for its values in incremental steps, reserving some extra space, initially, and as values get added to the vector, one by one, and eventually reach the vector's reserved size, the vector has to now grab a new larger block of memory, copy everything in the vector to the larger memory block, then delete the older block, and only then add the next value to the vector. Then the cycle begins again.
But you know, in advance, how many values you are about to add to the vector, so you simply instruct the vector to reserve() enough size in advance, so it doesn't have to repeatedly grow itself, as you add values to it. Before your existing for loop, simply:
pairVector.reserve(pairVector.size()+strVector.size());
Now, the for loop will proceed and insert new values into pairVector which is guaranteed to have enough space.
A couple of other things are possible. Since you have stated that both vectors will always have the same size, you only need to check the size of one of them:
for (int i = 0; i < strVector.size(); ++i )
Next step: at() performs bounds checking. This loop ensures that i will never be out of bounds, so at()'s bound checking is also some overhead you can get rid of safely:
pairVector.push_back(pair<string, int> (strVector[i], intVector[i]));
Next: with a modern C++ compiler, the compiler should be able to optimize away, automatically, several redundant temporaries, and temporary copies here. It's possible you may need to help the compiler, a little bit, and use emplace_back() instead of push_back() (assuming C++11, or later):
pairVector.emplace_back(strVector[i], intVector[i]);
Going back to the loop condition, strVector.size() gets evaluated on each iteration of the loop. It's very likely that a modern C++ compiler will optimize it away, but just in case you can also help your compiler check the vector's size() only once:
int n=strVector.size();
for (int i = 0; i < n; ++i )
This is really a stretch, but it might eke out a few extra quanta of execution time. And that is pretty much all of the obvious optimizations here. Realistically, the most to be gained here is by using reserve(). The other optimizations might help a little bit more, but it all boils down to moving a certain number of bytes from one area in memory to another. There aren't really special ways of doing that which are faster than other ways.
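Taken together, the suggestions in this answer give something like the following sketch. Passing the inputs by const reference and using std::size_t for the index are small additions not spelled out in the answer.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

void combineVectors(const std::vector<std::string>& strVector,
                    const std::vector<int>& intVector,
                    std::vector<std::pair<std::string, int>>& pairVector)
{
    assert(strVector.size() == intVector.size());   // stated precondition
    const std::size_t n = strVector.size();         // size read once
    pairVector.reserve(pairVector.size() + n);      // grow at most once
    for (std::size_t i = 0; i < n; ++i)
        pairVector.emplace_back(strVector[i], intVector[i]);  // build in place
}
```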
We can use std::generate() to achieve this:
#include <bits/stdc++.h>
using namespace std;
vector <string> strVector{ "hello", "world" };
vector <int> intVector{ 2, 3 };
pair<string, int> f()
{
static int i = -1;
++i;
return make_pair(strVector[i], intVector[i]);
}
int main() {
int min_Size = min(strVector.size(), intVector.size());
vector< pair<string,int> > pairVector(min_Size);
generate(pairVector.begin(), pairVector.end(), f);
for( int i = 0 ; i < 2 ; i++ )
cout << pairVector[i].first <<" " << pairVector[i].second << endl;
}
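A variation on the same idea, swapping in the two-range form of std::transform so that no static counter or global vectors are needed. The function name zip is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <string>
#include <utility>
#include <vector>

// Pairs up element i of 'strs' with element i of 'ints'.
std::vector<std::pair<std::string, int>>
zip(const std::vector<std::string>& strs, const std::vector<int>& ints)
{
    assert(strs.size() == ints.size());   // both ranges must match in length
    std::vector<std::pair<std::string, int>> out;
    out.reserve(strs.size());
    std::transform(strs.begin(), strs.end(), ints.begin(),
                   std::back_inserter(out),
                   [](const std::string& s, int i) { return std::make_pair(s, i); });
    return out;
}
```

This still visits every element once; it hides the loop inside the algorithm rather than removing the work.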
I'll try and summarize what you want with some possible answers depending on your situation. You say you want a new vector that is essentially a zipped version of two other vectors which contain two heterogeneous types. Where you can access the two types as some sort of pair?
If you want to make this more efficient, you need to think about what you are using the new vector for? I can see three scenarios with what you are doing.
1) The new vector is a copy of your data so you can do stuff with it without affecting the original vectors (i.e., you still need the original two vectors).
2) The new vector is now the storage mechanism for your data (i.e., you no longer need the original two vectors).
3) You are simply coupling the vectors together to make use and representation easier (i.e., where they are stored doesn't actually matter).
1) Not much you can do aside from copying the data into your new vector. Explained more in Sam Varshavchik's answer.
3) You do something like Shakil's answer or here or some type of customized iterator.
2) Here you can make some optimisations so that you do zero copying of the data, by using a wrapper class. Note: a wrapper class works if you don't need to use the actual std::vector < std::pair > class. You can make a class that you move the data into and create access operators for it. If you can do this, it also allows you to decompose the wrapper back into the original two vectors without copying. Something like this might suffice.
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>
class StringIntContainer {
public:
// Takes ownership: both argument vectors are left in a moved-from state.
StringIntContainer(std::vector<std::string>& _string_vec, std::vector<int>& _int_vec)
: string_vec_(std::move(_string_vec)), int_vec_(std::move(_int_vec))
{
assert(string_vec_.size() == int_vec_.size());
}
// Returns a copy of element _i from each vector as a pair.
std::pair<std::string, int> operator[] (std::size_t _i) const
{
return std::make_pair(string_vec_[_i], int_vec_[_i]);
}
/* You may want methods that return reference to data so you can edit it*/
// Moves both vectors back out; the container is empty afterwards.
std::pair<std::vector<std::string>, std::vector<int>> Decompose()
{
return std::make_pair(std::move(string_vec_), std::move(int_vec_));
}
private:
std::vector<std::string> string_vec_;
std::vector<int> int_vec_;
};
1) I want to pass a pointer to a QVector to a function and then do things with it. I tried this:
void MainWindow::createLinearVector(QVector<float> *vector, float min, float max )
{
float elementDiff=(max-min)/(vector->size()-1);
if(max>min) min -= elementDiff;
else min += elementDiff;
for(int i=0; i< vector->size()+1 ; i++ )
{
min += elementDiff;
*(vector+i) = min; //Problematic line
}
}
However the compiler gives me "no match for operator =" for the *(vector+i) = min; line. What could be the best way to perform actions like this on a QVector?
2) The function is supposed to distribute values linearly over the vector for a plot, the way the MATLAB : operator works, for instance vector(a:b:c). What is the simplest and best way to do such things in Qt?
EDIT:
With help from here the initial problem is solved. :)
I also improved the method itself. The precision can be improved a lot by using linear interpolation instead of repeated additions as above. With repeated addition the rounding error accumulates, which linear interpolation largely eliminates.
By the way, the if statement in the first function was unnecessary and could be removed by rearranging things a little, even in the repeated-addition method.
void MainWindow::createLinearVector(QVector<double> &vector, double min, double max )
{
double range = max-min;
double n = vector.size();
vector[0]=min;
for(int i=1; i< n ; i++ )
{
vector[i] = min+ i/(n-1)*range;
}
}
I considered using some enhanced loop for this, but would it be more practical?
With, for instance, a foreach loop I would still have to increment some variable for the interpolation, right? And also add a conditional for skipping the first element?
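One possible answer to that last question, as a sketch: std::vector stands in for QVector here so the example has no Qt dependency, and fillLinear is a hypothetical name. A range-based for loop still needs a counter, but the first element needs no special case because the formula already yields min when the counter is 0.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Fills 'values' with evenly spaced points from min to max (inclusive).
void fillLinear(std::vector<double>& values, double min, double max)
{
    const std::size_t n = values.size();
    if (n == 0) return;
    if (n == 1) { values[0] = min; return; }   // avoid dividing by zero
    const double range = max - min;
    std::size_t i = 0;
    for (double& v : values) {                 // reference, so we write in place
        v = min + static_cast<double>(i) / static_cast<double>(n - 1) * range;
        ++i;
    }
    assert(i == n);
}
```

Whether this is more practical than the indexed loop is a matter of taste; the counter does not go away, it just moves.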
I want to place a float at a certain place in the QVector.
Then use this:
(*vector)[i] = min; //Problematic line
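Here vector is a pointer to a QVector, so *vector is the QVector itself, which can be indexed with [i] like any QVector. In contrast, *(vector+i) does pointer arithmetic on the pointer, as if it pointed into an array of QVectors. Because [] binds more tightly than unary *, the parentheses in (*vector)[i] are needed to get the order of operations right.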
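The same rule can be tried without Qt. In this sketch std::vector stands in for QVector (the pointer semantics are identical) and setAt is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Writes 'value' at position i of the container that 'vec' points to.
void setAt(std::vector<float>* vec, std::size_t i, float value)
{
    assert(vec != nullptr && i < vec->size());
    (*vec)[i] = value;      // dereference first, then index the container
    // vec->operator[](i) = value;   // equivalent, if less readable
    // *(vec + i) would instead treat vec as an array of containers.
}
```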
I think you first need to use a mutable iterator for this; see the Qt documentation for QMutableVectorIterator.
Something like this:
QMutableVectorIterator<float> i(vector);
i.toBack();
while (i.hasPrevious())
qDebug() << i.{your code}
Right, so it does not make much sense to use a QVector pointer in here. These are the reasons for that:
Using a reference for the method parameter should be more C++'ish if the implicit sharing is not fast enough for you.
Although, in most cases you would not even need a reference when just passing arguments around without getting the result back in the same argument (i.e. as an output argument). That is because QVector is implicitly shared and the copy only happens on write, as per the documentation. Luckily, the syntax will be the same for the calling and internal implementation of the method in both cases, so it is easy to change from one to another.
Using smart pointers is preferable instead of raw pointers, but here both are unnecessarily complex solutions in my opinion.
So, I would suggest to refactor your code into this:
void MainWindow::createLinearVector(QVector<float> &vector, float min, float max)
{
float elementDiff = (max-min) / (vector.size()-1);
min += ((max>min) ? (-elementDiff) : elementDiff);
// Iterate by reference so that the assignment changes the stored element.
for (float &f : vector) {
min += elementDiff;
f = min;
}
}
Note that I fixed up the following things in your code:
Reference type parameter as opposed to pointer
"->" member resolution to "." respectively
Ternary operation instead of the unnatural if/else in this case
A range-based for loop over references instead of low-level indexing, in which case your original point becomes moot (Qt's foreach iterates over a copy of the container, so assigning to its loop variable would not change the vector)
This is then how you would invoke the method from the caller:
createLinearVector(vector, fmin, fmax);
What's the fastest way to "clear" a large STL container? In my application, I need to deal with large size std::map, e.g., 10000 elements.
I have tested the following 3 methods to clear a std::map.
Create a new container every time I need it.
Calling map::clear() method.
Calling map::swap() method.
It seems that ::swap() gives the best result. Can anyone explain why this is the case, please? Is it safe to say that using the map::swap() method is the proper way to "clear" a std::map? Is it the same for other STL containers, e.g., set, vector, list, etc.?
m_timer_start = boost::posix_time::microsec_clock::local_time();
// test_map.clear();
test_map.swap(test_map2);
for (int i = 0; i< 30000; i++){
test_map.insert(std::pair<int, int>(i, i));
}
// std::map<int, int> test_map_new;
// for (int i = 0; i< 30000; i++){
// test_map_new.insert(std::pair<int, int>(i, i));
// }
m_timer_end = boost::posix_time::microsec_clock::local_time();
std::cout << timer_diff(m_timer_start, m_timer_end).fractional_seconds() << std::endl; // microsecond
You aren't properly testing the swap case. The map you swap into needs to be destroyed inside the timed region in order to account for all of the time. Try one of these:
{ std::map<something, something_else> test_map2;
test_map.swap(test_map2);
} // test_map2 gets destroyed at the closing brace.
or
// temporary gets destroyed at the semi-colon
std::map<int, int>().swap(test_map);
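Wrapped in a function, the second form might look like this sketch (clearBySwap is a hypothetical name):

```cpp
#include <cassert>
#include <map>

// Empties 'm' by swapping with a temporary. The temporary takes over the
// old nodes and frees them when it is destroyed at the end of the statement.
void clearBySwap(std::map<int, int>& m)
{
    std::map<int, int>().swap(m);
    assert(m.empty());
}
```

Because the temporary dies inside the function, any timing taken around a call to clearBySwap includes the cost of destroying the old elements.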
Are you asking this because you're having a performance problem and you have identified that your program is spending too much time clearing your maps? If you haven't done this then just use map::clear() or create new local variables each time, whichever is most natural and direct for your program. The swap trick is an optimization and there's little point in wasting time optimizing unless you're certain you need to, based on experience.
If you have identified a performance issue then you've already got the tool to determine which of your methods best addresses it.
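For such a comparison, a minimal timing sketch using std::chrono::steady_clock rather than the Boost timer from the question. The helper names time_us and time_clear_and_refill are hypothetical, and a single run like this is noisy, so real measurements should be repeated.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <map>
#include <utility>

// Hypothetical helper: runs 'work' once and returns the elapsed microseconds.
template <typename Work>
long long time_us(Work work)
{
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}

// Times clear() plus a refill, so the destruction cost is inside the region.
long long time_clear_and_refill(std::map<int, int>& m, int count)
{
    const long long us = time_us([&] {
        m.clear();
        for (int i = 0; i < count; ++i) m.insert(std::make_pair(i, i));
    });
    assert(m.size() == static_cast<std::size_t>(count));
    return us;
}
```

The same time_us wrapper can be put around the swap variant or the fresh-container variant to compare like with like.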