Faster to swap or assign a vector of strings?


I have a class with a vector of strings and a function that assigns to that vector. I am changing my function to only assign to the vector if it's successful. To do that I use a temporary vector of strings in the function and then if the function is successful I assign to the vector of strings in the class.
For example:
class test
{
vector<string> v;
void Function()
{
vector<string> temp;
v = temp; // Is this better?
v.swap( temp ); // Or instead is this better?
}
};

In C++11, move it:
v = std::move(temp);
In ancient dialects, swapping would be better than copy-assigning (assuming the vector isn't empty as it is in your example).
Moving or swapping just needs to modify a few pointers, while copying requires memory allocation and other expensive shenanigans.
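The commit-on-success pattern from the question can be sketched as follows. This is only an illustration: the class name Test, the load function, and its succeed flag are assumptions standing in for whatever can fail in the real function.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

class Test {
public:
    // Fills a temporary first and commits it to v only on success.
    // 'succeed' is a hypothetical stand-in for the real failure condition.
    bool load(bool succeed) {
        std::vector<std::string> temp;
        temp.push_back("alpha");
        temp.push_back("beta");
        if (!succeed) {
            return false;      // v is left untouched
        }
        v = std::move(temp);   // C++11: takes over temp's buffer, no string copies
        // v.swap(temp);       // pre-C++11 alternative, also constant time
        return true;
    }

    const std::vector<std::string>& values() const { return v; }

private:
    std::vector<std::string> v;
};
```

After a failed call the member vector still holds its previous contents, which is the point of building into a temporary first.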

From a complexity point of view, swap should be preferred over copy assignment:
vector<string> temp;
v = temp; // complexity is linear in the size of the temp
v.swap( temp ); // complexity is constant

Related

How to initialize a vector of distinct dynamically allocated addresses

Is it possible to create a std::vector<T*> vec; during initialization, such that each element of vec stores a distinct address on the heap?
Simply doing
int N = 10;
std::vector<T*> vec(N, new T);
makes all elements of vec store the same address on the heap. Of course, I could simply just do
int N = 10;
std::vector<T*> vec(N);
std::for_each(vec.begin(), vec.end(), [](auto &ptr){
ptr = new T;
});
Is there any way to do it from within the constructor call?

Constructors that fill values into the vector all create duplicates of a single value, so they won't work in this case.
You can do a little better than std::for_each though. Since you want each element in the vector filled in with the result of a function, std::generate (or std::generate_n) is clearly a better fit:
std::vector<T *> vec(N);
std::generate(vec.begin(), vec.end(), [] { return new T; });
That said, a vector of raw pointers is most likely a mistake, so I'd recommend exploring other options.
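One of those other options might be a vector of std::unique_ptr, so that each allocation is released automatically. A minimal sketch, assuming C++14 for std::make_unique; the helper name make_distinct is made up for illustration:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Builds a vector of n owning pointers, each to its own heap-allocated T.
template <typename T>
std::vector<std::unique_ptr<T>> make_distinct(std::size_t n) {
    std::vector<std::unique_ptr<T>> vec(n);  // n null pointers
    // The generator runs once per element, so every element
    // receives a separate allocation.
    std::generate(vec.begin(), vec.end(), [] { return std::make_unique<T>(); });
    return vec;
}
```

The count-and-value constructor cannot be used here at all, because std::unique_ptr is not copyable.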

Erase by value in a vector of shared pointers

I want to erase by value from a vector of shared pointers to strings (i.e. vector<shared_ptr<string>>). Is there a more efficient way of doing this than iterating over the complete vector and then erasing at the iterator position?
#include <bits/stdc++.h>
using namespace std;
int main()
{
vector<shared_ptr<string>> v;
v.push_back(make_shared<string>("aaa"));
int j = 0,ind;
for(auto i : v) {
if((*i)=="aaa"){
ind = j;
}
j++;
}
v.erase(v.begin()+ind);
}
Also, I don't want to use extra memory for a map (value vs. address).

Try the erase-remove idiom:
string s = "aaa";
auto cmp = [s](const shared_ptr<string> &p) { return s == *p; };
v.erase(std::remove_if(v.begin(), v.end(), cmp), v.end());
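Wrapped in a function, the idiom might look like the sketch below. The function name erase_by_value is an assumption, and the null check is an addition that the snippet above does not have.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Removes every element whose pointed-to string equals 'value'.
// Null pointers are skipped rather than dereferenced.
void erase_by_value(std::vector<std::shared_ptr<std::string>>& v,
                    const std::string& value) {
    auto matches = [&value](const std::shared_ptr<std::string>& p) {
        return p && *p == value;
    };
    // remove_if shifts the kept elements to the front and returns the new
    // logical end; erase then drops the leftover tail in one call.
    v.erase(std::remove_if(v.begin(), v.end(), matches), v.end());
}
```

Unlike the loop in the question, this removes all matching elements, and it is safe when nothing matches.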

There is no way to do better than O(N): you have to find the object in the vector, and to find it you have to iterate over the vector once. It does not really matter whether the elements are pointers or plain objects.
The only way to do better is to use a different data structure that provides O(1) average lookup and removal. An unordered set is the first thing that comes to mind, but it requires the pointed-to values to be unique. A second option would be a map, such that multiple pointers pointing to the same value are stored under the same key.
If you do not want to use a different structure, then you are out of luck. You could keep an additional structure that hashes the pointers if you want to retain the vector but also have O(1) access.
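For example, if you do use a set, you need to define a proper hasher and key_equal. The hasher hashes *elementInSet and the equality predicate compares the pointed-to strings; with the default key_equal the set would compare pointer addresses, so erasing by value would never match. Each pointer must then point to a distinct string. For example:
struct myPtrHash {
size_t operator()(const std::shared_ptr<std::string>& p) const {
//Maybe we want to add checks/throw a more meaningful error if p is invalid?
return std::hash<std::string>()(*p);
}
};
struct myPtrEqual {
bool operator()(const std::shared_ptr<std::string>& a, const std::shared_ptr<std::string>& b) const {
return *a == *b;
}
};
such that your set is:
std::unordered_set<std::shared_ptr<std::string>, myPtrHash, myPtrEqual> pointerSet;
Then erasing would be O(1) on average, simply as:
std::shared_ptr<std::string> toErase = std::make_shared<std::string>("aaa");
pointerSet.erase(toErase);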
That said, if you must use a vector, a more idiomatic way to do this is to use remove_if instead of iterating yourself. This will not improve the time complexity, though; it is just better practice.

Don't include bits/stdc++.h, and since you're iterating through the whole vector, you should be using std::for_each with a lambda.

Optimization of a C++ code (that uses UnorderedMap and Vector)

I am trying to optimize some part of a C++ code that is taking a long time (the following part of the code takes about 19 seconds for X amount of data, and I am trying to finish the whole process in less than 5 seconds for the same amount of data - based on some benchmarks that I have). I have a function "add" that I have written and copied the code here. I will try to explain as much as possible that I think is needed to understand the code. Please let me know if I have missed something.
The following function add is called X times for X amount of data entries.
void HashTable::add(PointObject vector) // PointObject is a user-defined object
{
int combinedHash = hash(vector); // the function "hash" takes less than 1 second for X amount of data
// hashTableMap is an unordered_map<int, std::vector<PointObject>>
if (hashTableMap.count(combinedHash) == 0)
{
// if the hashmap does not contain the combinedHash key, then
// add the key and a new vector
std::vector<PointObject> pointVectorList;
pointVectorList.push_back(vector);
hashTableMap.insert(std::make_pair(combinedHash, pointVectorList));
}
else
{
// otherwise find the key and the corresponding vector of PointObjects and add the current PointObject to the existing vector
auto it = hashTableMap.find(combinedHash);
if (it != hashTableMap.end())
{
std::vector<PointObject> pointVectorList = it->second;
pointVectorList.push_back(vector);
it->second = pointVectorList;
}
}
}

You are doing a lot of useless operations... if I understand correctly, a simplified form could be simply:
void HashTable::add(const PointObject& vector) {
hashTableMap[hash(vector)].push_back(vector);
}
This works because
A map, when accessed using operator[], will create a value-initialized value (here, an empty std::vector) if the key is not already present in the map
The value (an std::vector) is returned by reference so you can directly push_back the incoming point to it. This std::vector will be either a newly inserted one or a previously existing one if the key was already in the map.
Note also that, depending on the size of PointObject and other factors, it could be possibly more efficient to pass vector by value instead of by const PointObject&. This is the kind of micro optimization that however requires profiling to be performed sensibly.
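The one-line version can be tried out with the sketch below. PointObject's fields, the hashPoint function, and the bucketSize helper are hypothetical stand-ins, since the question does not show the real PointObject or hash().

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins for the question's PointObject and hash().
struct PointObject { int x; int y; };
int hashPoint(const PointObject& p) { return p.x * 31 + p.y; }

class HashTable {
public:
    void add(const PointObject& p) {
        // operator[] creates an empty vector the first time a key is seen
        // and returns a reference, so the point is appended in place.
        hashTableMap[hashPoint(p)].push_back(p);
    }

    // Number of points stored under 'key' (0 if the key is absent).
    std::size_t bucketSize(int key) const {
        auto it = hashTableMap.find(key);
        return it == hashTableMap.end() ? 0 : it->second.size();
    }

private:
    std::unordered_map<int, std::vector<PointObject>> hashTableMap;
};
```

Each call performs a single lookup and copies only the one PointObject that is appended.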

Instead of calling hashTableMap.count(combinedHash) and hashTableMap.find(combinedHash), it is better to just insert a new element and check what insert() returned:
In versions (1) and (2), the function returns a pair object whose
first element is an iterator pointing either to the newly inserted
element in the container or to the element whose key is equivalent,
and a bool value indicating whether the element was successfully
inserted or not.
Moreover, do not pass objects by value where you don't have to. It is better to pass them by pointer or by reference. This:
std::vector<PointObject> pointVectorList = it->second;
is inefficient since it will create an unnecessary copy of the vector.

This .count() is totally unnecessary; you could simplify your function to:
void HashTable::add(PointObject vector)
{
int combinedHash = hash(vector);
auto it = hashTableMap.find(combinedHash);
if (it != hashTableMap.end())
{
std::vector<PointObject> pointVectorList = it->second;
pointVectorList.push_back(vector);
it->second = pointVectorList;
}
else
{
std::vector<PointObject> pointVectorList;
pointVectorList.push_back(vector);
hashTableMap.insert(std::make_pair(combinedHash, pointVectorList));
}
}
You are also performing copy operations everywhere. Copying an object is time consuming, avoid doing that. Also use references and pointers when possible:
void HashTable::add(const PointObject& vector)
{
int combinedHash = hash(vector);
auto it = hashTableMap.find(combinedHash);
if (it != hashTableMap.end())
{
it->second.push_back(vector);
}
else
{
std::vector<PointObject> pointVectorList;
pointVectorList.push_back(vector);
hashTableMap.insert(std::make_pair(combinedHash, pointVectorList));
}
}
This code can probably be optimized further, but it would require knowing hash(), knowing the way hashTableMap works (by the way, why is it not a std::map?) and some experimentation.
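operator[] inserts a value-initialized element when the key is missing, so with a std::map<int, std::vector<PointObject>> (or the std::unordered_map you already have) you could simplify your function to this:
void HashTable::add(const PointObject& vector)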
{
hashTableMap[hash(vector)].push_back(vector);
}
And if it was a std::map<int, std::vector<PointObject*>> (pointers), you could even avoid that last copy operation, at the cost of having to manage the lifetime of the pointed-to objects yourself.

Without the if, try to insert an empty entry into the hash table:
auto ret = hashTableMap.insert(
std::make_pair(combinedHash, std::vector<PointObject>()));
Either a new blank entry will be added, or the already present entry will be retrieved. In your case, you don't need to check which is the case; you just need to take the returned iterator and add the new element:
auto &pointVectorList = ret.first->second;
pointVectorList.push_back(vector);
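A compilable sketch of this insert-then-append approach is shown below. The Point struct, the Table alias, and the addPoint function are illustrative names that do not appear in the question.

```cpp
#include <cassert>
#include <unordered_map>
#include <utility>
#include <vector>

struct Point { int x; int y; };  // hypothetical stand-in for PointObject

using Table = std::unordered_map<int, std::vector<Point>>;

// Adds p under key. Returns true if the key was not present before.
bool addPoint(Table& table, int key, const Point& p) {
    // insert() leaves an existing entry alone; either way ret.first
    // is an iterator to the entry for 'key'.
    auto ret = table.insert(std::make_pair(key, std::vector<Point>()));
    ret.first->second.push_back(p);  // append in place, no vector copy
    return ret.second;
}
```

The bool in the returned pair says whether the entry was newly created, which is handy if the caller wants to count distinct keys.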

Assuming that PointObject is big and making copies of it is expensive, std::move is your friend here. You'll want to ensure that PointObject is move-aware (either don't define a destructor or copy operator, or provide a move-constructor and move-assignment operator yourself).
void HashTable::add(PointObject vector) // PointObject is a user-defined object
{
int combinedHash = hash(vector); // the function "hash" takes less than 1 second for X amount of data
// hashTableMap is an unordered_map<int, std::vector<PointObject>>
if (hashTableMap.count(combinedHash) == 0)
{
// if the hashmap does not contain the combinedHash key, then
// add the key and a new vector
std::vector<PointObject> pointVectorList;
pointVectorList.push_back(std::move(vector));
hashTableMap.insert(std::make_pair(combinedHash, std::move(pointVectorList)));
}
else
{
// otherwise find the key and the corresponding vector of PointObjects and add the current PointObject to the existing vector
auto it = hashTableMap.find(combinedHash);
if (it != hashTableMap.end())
{
// Append in place; copying it->second out and back would duplicate the whole vector.
it->second.push_back(std::move(vector));
}
}
}

Using std::unordered_map doesn't seem appropriate here: you use the int from hash as the key, which presumably is the hash of PointObject rather than PointObject itself. That is essentially double hashing. Also, if you need a PointObject in order to compute the map key, then it's not really a key at all! Perhaps std::unordered_multiset would be a better choice?
First define the hash function for PointObject (PointObject also needs an operator== so that the container can compare elements):
namespace std
{
template<>
struct hash<PointObject> {
size_t operator()(const PointObject& p) const {
return ::hash(p);
}
};
}
Then something like
#include <unordered_set>
using HashTable = std::unordered_multiset<PointObject>;
int main()
{
HashTable table {};
PointObject a {};
table.insert(a);
table.emplace(/* whatever */);
return 0;
}
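Here is a self-contained variation on the same idea. It is a sketch under assumptions: the PointObject fields and the PointHash functor are invented for illustration, and the hasher is passed as a template argument rather than by specializing std::hash.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <unordered_set>

// Hypothetical PointObject; the real one is not shown in the question.
struct PointObject {
    int x;
    int y;
};

// Unordered containers need equality as well as a hash.
bool operator==(const PointObject& a, const PointObject& b) {
    return a.x == b.x && a.y == b.y;
}

// Hasher supplied as a template argument to the container.
struct PointHash {
    std::size_t operator()(const PointObject& p) const {
        return std::hash<int>()(p.x) * 31u + std::hash<int>()(p.y);
    }
};

using HashTable = std::unordered_multiset<PointObject, PointHash>;
```

Equal points are all kept, and table.count(p) reports how many copies of p are stored.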

Your biggest problem is that you're copying the entire vector (and every element in that vector) twice in the else part:
std::vector<PointObject> pointVectorList = it->second; // first copy
pointVectorList.push_back(vector);
it->second = pointVectorList; // second copy
This means that every time you're adding an element to an existing vector you're copying that entire vector.
If you used a reference to that vector you'd do a lot better:
std::vector<PointObject> &pointVectorList = it->second;
pointVectorList.push_back(vector);
//it->second = pointVectorList; // don't need this anymore.
On a side note, in your unordered_map you're hashing your value to be your key.
You could use an unordered_set with your hash function instead.

Strings in Vectors. and placing them in order

So I am placing objects in a vector. I want to insert them in sorted order as they are added. The basics of the object are:
class myObj {
private:
string firstName;
string lastName;
public:
string getFirst() const;
string getLast() const;
};
I also have a vector of these objects
vector< myObj > myVect;
vector< myObj >::iterator myVectit = myVect.begin();
When I add a new object to the vector, I want to find where it should be placed before inserting it. Can I search a vector by an object value, and how? This is my first attempt:
void addanObj (myObj & objtoAdd){
int lowwerB = lower_bound(
myVect.begin().getLast(), myVect.end().getLast(), objtoAdd.getLast()
);
int upperB = upper_bound(
myVect.begin().getLast(), myVect.end().getLast(), objtoAdd.getLast()
);
From there I plan to use lowwerB and upperB to determine where to insert the entry. What do I need to do to get this to work, or what is a better method of tackling this challenge?
----Follow up----
the error I get when I attempt to compile
error C2440: 'initializing' : cannot convert from 'std::string' to 'int'
No user-defined-conversion operator available that can perform this conversion,
or the operator cannot be called
The compiler highlights both lower_bound and upper_bound. I would guess it is referring to where I am putting
objtoAdd.getLast()
-----More Follow up-----------------
This is close to compiling but not quite. What should I expect to get from lower_bound and upper_bound? It doesn't match the iterator I defined, and I'm not sure what I should expect.
void addMyObj(myObj myObjtoadd)
vector< myObj>::iterator tempLB;
vector< myObj>::iterator tempUB;
myVectit= theDex.begin();
tempLB = lower_bound(
myVect.begin()->getLast(), myVect.end()->getLast(), myObjtoadd.getLast()
);
tempUB = upper_bound(
myVect.begin()->getLast(), myVect.end()->getLast(), myObjtoadd.getLast()
);

Your calls to std::lower_bound and std::upper_bound are incorrect. The first two parameters must be iterators that define a range of elements to search and the returned values are also iterators.
Since these algorithms compare the container elements to the third parameter value you'll also need to provide correct operator< functions that compare an object's lastName and a std::string. I've added two different compare functions since std::lower_bound and std::upper_bound pass the parameters in opposite order.
I think I have the machinery correct in this code, it should be close enough for you to get the idea.
class myObj {
private:
std::string firstName;
std::string lastName;
public:
std::string getFirst() const { return firstName; }
std::string getLast() const { return lastName; }
};
bool operator<(const myObj &obj, const std::string &value) // used by lower_bound()
{
return obj.getLast() < value;
}
bool operator<(const std::string &value, const myObj &obj) // used by upper_bound()
{
return value < obj.getLast();
}
int main()
{
std::vector<myObj> myVect;
std::vector<myObj>::iterator tempLB, tempUB;
myObj objtoAdd;
tempLB = std::lower_bound(myVect.begin(), myVect.end(), objtoAdd.getLast());
tempUB = std::upper_bound(myVect.begin(), myVect.end(), objtoAdd.getLast());
}
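To go from the returned iterator to an actual insertion, something like the following sketch could work. It is a variation on the code above, not the same technique: it passes a comparison lambda instead of the mixed-type operator< overloads, and the constructor and the insertSorted function are additions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

class myObj {
public:
    myObj(std::string first, std::string last)
        : firstName(std::move(first)), lastName(std::move(last)) {}
    const std::string& getFirst() const { return firstName; }
    const std::string& getLast() const { return lastName; }

private:
    std::string firstName;
    std::string lastName;
};

// Inserts obj so that vec stays sorted by last name. Inserting at the
// upper bound places it after existing entries with the same last name.
void insertSorted(std::vector<myObj>& vec, const myObj& obj) {
    auto pos = std::upper_bound(
        vec.begin(), vec.end(), obj,
        [](const myObj& a, const myObj& b) { return a.getLast() < b.getLast(); });
    vec.insert(pos, obj);
}
```

The search is logarithmic, but vec.insert still has to shift every later element, so each insertion is linear in the worst case.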

So this is definitely not the best way to go. Here's why:
Vector Size
A default-constructed vector starts out with 0 elements and typically little or no spare capacity. Suppose that at some point it has capacity for 100 elements. When you add the 101st element, it has to allocate a larger block, copy (or move) all the data over, and then free the old memory. This reallocation can become expensive if it happens often.
Inserting into a vector
This is going to be even more of a problem. Because a vector is just a contiguous block of memory with objects stored in insert order, say you have the below:
[xxxxxxxzzzzzzzz ]
If you want to add 'y', it belongs between the x's and the z's, right? This means you need to move all the z's over by one place. But because you are reusing the same block of memory, you need to do it one element at a time, starting from the end.
[xxxxxxxzzzzzzz z ]
[xxxxxxxzzzzzz zz ]
[xxxxxxxzzzzz zzz ]
...
[xxxxxxx zzzzzzzz ]
[xxxxxxxyzzzzzzzz ]
(the spaces are for clarity - previous value isn't explicitly cleared)
As you can see, this is a lot of steps to make room for your 'y', and will be very very slow for large data sets.
A better solution
As others have mentioned, std::set sounds like it's more appropriate for your needs. std::set will automatically order all inserted elements (using a tree data structure for much faster insertion), and allows you to find particular data members by last name, also in log(n) time. It does this by using bool myObj::operator<(const myObj& other) const to know how to order the different objects. If you simply define this operator to compare this->lastName < other.lastName, you can insert into the set much more quickly. Note that std::set keeps only one element per distinct key, so if several objects can share a last name, use std::multiset instead.
Alternately, if you really really want to use vector: instead of sorting it as you go, just add all the items to the vector, and then perform std::sort to sort them after all the inserts are done. This will also complete in n log(n) time, but should be considerably faster than the current approach because of the vector insertion problem.
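The append-everything-then-sort approach might look like this. The Person struct is a hypothetical stand-in for myObj, and std::stable_sort is swapped in for std::sort so that entries with equal last names keep their insertion order.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

struct Person {  // hypothetical stand-in for myObj
    std::string firstName;
    std::string lastName;
};

// Sorts by last name once, after all elements have been appended.
// stable_sort keeps insertion order among equal last names.
void sortByLastName(std::vector<Person>& people) {
    std::stable_sort(people.begin(), people.end(),
                     [](const Person& a, const Person& b) {
                         return a.lastName < b.lastName;
                     });
}
```

Calling push_back for every element and sorting once at the end avoids shifting elements on each insertion.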

Best way to delete a std::unique_ptr from a vector with a raw pointer?

So I have a vector like so:
std::vector<std::unique_ptr<SomeClass>> myVector;
Then I have another vector which contains raw pointers of SomeClass:
std::vector<SomeClass*> myOtherVector;
If there is an element inside myOtherVector it will also be inside myVector, so I want to go through each element in myOtherVector and remove the same element from myVector. Then clear out the vector. This is what I came up with:
for(size_t i = 0; i < myOtherVector.size(); i++)
{
myVector.erase(std::remove(myVector.begin(), myVector.end(), myOtherVector[i]), myVector.end());
}
myOtherVector.clear();
This produces a compile time error because myVector holds unique pointers but I'm giving the remove() function a raw pointer. This is where I need help because I don't know what the proper way to solve this problem would be. I changed the line to:
myVector.erase(std::remove(myVector.begin(), myVector.end(), std::unique_ptr<SomeClass>(myOtherVector[i])), myVector.end());
First of all, this is incorrect because now I have two std::unique_ptrs referencing the same object. The element inside myVector contains a reference and the construction of the unique pointer in the above line is another reference. And I don't even know if constructing a new pointer to get the same type is conceptually the correct way to go about doing this. So then I changed the unique pointers to shared pointers:
std::vector<std::shared_ptr<SomeClass>> myVector;
std::vector<SomeClass*> myOtherVector;
for(size_t i = 0; i < myOtherVector.size(); i++)
{
myVector.erase(std::remove(myVector.begin(), myVector.end(), std::shared_ptr<SomeClass>(myOtherVector[i])), myVector.end());
}
myOtherVector.clear();
When I ran the application the myVector.erase() line resulted in a runtime error which said "ApplicationName.exe has triggered a breakpoint." upon clicking continue I got a debug assertion failure.
So obviously I'm doing something wrong, but I don't know what. What is the correct way to erase a smart pointer from a vector with a raw pointer?

This is how I would do it. Performance could be improved, but as long as it won't prove to be a bottleneck for your application, I would not bother with that. The algorithm is simple and clear.
It uses remove_if to selectively remove from the first container (myVector) all the elements pointing to objects that are pointed to by elements of the second container (myOtherVector); then, it clears the second container. The predicate is implemented through a lambda function:
#include <vector>
#include <memory>
#include <algorithm>
struct SomeClass { /* ... */ };
int main()
{
std::vector<std::unique_ptr<SomeClass>> myVector;
std::vector<SomeClass*> myOtherVector;
myVector.erase(
std::remove_if( // Selectively remove elements that also appear in the second vector...
myVector.begin(),
myVector.end(),
[&] (std::unique_ptr<SomeClass> const& p)
{ // This predicate checks whether the element is contained
// in the second vector of pointers to be removed...
return std::find(
myOtherVector.cbegin(),
myOtherVector.cend(),
p.get()
) != myOtherVector.cend();
}),
myVector.end()
);
myOtherVector.clear();
}

std::unique_ptr has a member function, get, that returns the owned pointer.
Consider the following:
std::sort(myOtherVector.begin(), myOtherVector.end());
myVector.erase(std::remove_if(myVector.begin(), myVector.end(),
[&](std::unique_ptr<SomeClass> const& p) -> bool
{
return std::binary_search(myOtherVector.begin(), myOtherVector.end(),
p.get());
}), myVector.end());
myOtherVector.clear();
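If sorting myOtherVector is not desirable, another way to speed up the membership test is a hash set of the raw pointers. This is a sketch of an alternative that none of the answers use; the function name eraseOwned and the id field are made up for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <unordered_set>
#include <vector>

struct SomeClass { int id; };

// Erases from owners every element whose raw pointer appears in toRemove,
// then clears toRemove.
void eraseOwned(std::vector<std::unique_ptr<SomeClass>>& owners,
                std::vector<SomeClass*>& toRemove) {
    // Hash set of the pointers to remove, for O(1) average lookups.
    const std::unordered_set<SomeClass*> doomed(toRemove.begin(), toRemove.end());
    owners.erase(
        std::remove_if(owners.begin(), owners.end(),
                       [&doomed](const std::unique_ptr<SomeClass>& p) {
                           return doomed.count(p.get()) != 0;
                       }),
        owners.end());
    toRemove.clear();  // these pointers now dangle, so drop them
}
```

Erasing the unique_ptr elements destroys the objects they own, which is why the raw pointers are cleared immediately afterwards.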

If you can't simplify your problem, how about std::set_difference or one of its kin (http://www.cplusplus.com/reference/algorithm/set_difference/)?
You would need to specify a compare function that uses get() to obtain the raw pointer from the unique_ptr.
