Strings in vectors, and placing them in order (C++)

So I am placing objects in a vector, and I want to put them in order as they are added. The basics of the object are:
class myObj {
private:
string firstName;
string lastName;
public:
string getFirst();
string getLast();
};
I also have a vector of these objects
vector< myObj > myVect;
vector< myObj >::iterator myVectit = myVect.begin();
When I add a new object to the vector, I want to find where it should be placed before inserting it. Can I search a vector by an object's value, and how? This is my first attempt:
void addanObj (myObj & objtoAdd){
int lowwerB = lower_bound(
myVect.begin().getLast(), myVect.end().getLast(), objtoAdd.getLast()
);
int upperB = upper_bound(
myVect.begin().getLast(), myVect.end().getLast(), objtoAdd.getLast()
);
From there I plan to use lowwerB and upperB to determine where to insert the entry. What do I need to do to get this to work, or what is a better way of tackling this challenge?
----Follow up----
The error I get when I attempt to compile:
error C2440: 'initializing' : cannot convert from 'std::string' to 'int'
No user-defined-conversion operator available that can perform this conversion,
or the operator cannot be called
The compiler highlights both lower_bound and upper_bound. I would guess it is referring to where I am putting
objtoAdd.getLast()
-----More Follow up-----------------
This is close to compiling, but not quite. What should I expect to get back from lower_bound and upper_bound? It doesn't match the iterator I defined, and I'm not sure what I should expect.
void addMyObj(myObj myObjtoadd)
vector< myObj>::iterator tempLB;
vector< myObj>::iterator tempUB;
myVectit= theDex.begin();
tempLB = lower_bound(
myVect.begin()->getLast(), myVect.end()->getLast(), myObjtoadd.getLast()
);
tempUB = upper_bound(
myVect.begin()->getLast(), myVect.end()->getLast(), myObjtoadd.getLast()
);

Your calls to std::lower_bound and std::upper_bound are incorrect. The first two parameters must be iterators that define a range of elements to search and the returned values are also iterators.
Since these algorithms compare the container elements to the third parameter value, you'll also need to provide correct operator< functions that compare an object's lastName with a std::string. I've added two different comparison functions because std::lower_bound and std::upper_bound pass the parameters in opposite order.
I think I have the machinery correct in this code; it should be close enough for you to get the idea.
#include <algorithm>
#include <string>
#include <vector>
class myObj {
private:
std::string firstName;
std::string lastName;
public:
std::string getFirst() const { return firstName; }
std::string getLast() const { return lastName; }
};
bool operator<(const myObj &obj, const std::string &value) // used by lower_bound()
{
return obj.getLast() < value;
}
bool operator<(const std::string &value, const myObj &obj) // used by upper_bound()
{
return value < obj.getLast();
}
int main()
{
std::vector<myObj> myVect;
std::vector<myObj>::iterator tempLB, tempUB;
myObj objtoAdd;
// Both searches assume myVect is already sorted by last name.
tempLB = std::lower_bound(myVect.begin(), myVect.end(), objtoAdd.getLast());
tempUB = std::upper_bound(myVect.begin(), myVect.end(), objtoAdd.getLast());
}
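Since the goal is to keep the vector ordered as objects arrive, here is a minimal C++11 sketch of the complete insert step. It assumes a myObj variant with a constructor taking first and last names (the original class has none), uses a single lambda comparator instead of the two free operator< overloads, and the helper name addSorted is hypothetical. std::upper_bound finds the insertion point and vector::insert places the object there.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// Variant of myObj with an added constructor (an assumption for this sketch).
class myObj {
public:
    myObj(std::string first, std::string last)
        : firstName(std::move(first)), lastName(std::move(last)) {}
    const std::string& getFirst() const { return firstName; }
    const std::string& getLast() const { return lastName; }
private:
    std::string firstName;
    std::string lastName;
};

// Hypothetical helper: inserts obj so that v stays sorted by last name.
// Precondition: v is already sorted by last name.
void addSorted(std::vector<myObj>& v, const myObj& obj) {
    // upper_bound calls comp(obj, element) and returns the first element
    // whose last name is greater than obj's last name.
    auto pos = std::upper_bound(
        v.begin(), v.end(), obj,
        [](const myObj& a, const myObj& b) { return a.getLast() < b.getLast(); });
    v.insert(pos, obj);  // shifts the later elements one slot to the right
}
```

Using upper_bound rather than lower_bound places a new object after any existing objects with the same last name, so equal names keep their arrival order.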

So this is definitely not the best way to go. Here's why:
Vector Size
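A default-constructed vector starts out with 0 elements and some implementation-defined capacity (often 0). When an insertion would exceed the current capacity, the vector has to allocate a new, larger block, copy (or move) all the existing elements over, and then free the old memory. Because the capacity grows geometrically, push_back is still amortized constant time, but this copying can become expensive if done often enough, especially for objects that are costly to copy.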
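When the final size is known (or can be estimated) in advance, std::vector::reserve avoids these repeated reallocations. The sketch below (the function name is hypothetical) checks that, after reserving n slots, pushing n elements never moves the buffer; the standard guarantees no reallocation happens until the size would exceed the reserved capacity.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical check: returns true if pushing n ints into a vector that
// reserved n slots up front never moved the underlying buffer.
bool reserveAvoidsReallocation(std::size_t n) {
    std::vector<int> v;
    v.reserve(n);                    // capacity() is now at least n
    const int* before = v.data();
    for (std::size_t i = 0; i < n; ++i)
        v.push_back(static_cast<int>(i));
    return v.data() == before;       // same buffer: nothing was copied over
}
```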
Inserting into a vector
This is going to be even more of a problem. Because a vector is just a contiguous block of memory with objects stored in insert order, say you have the below:
[xxxxxxxzzzzzzzz ]
If you want to add 'y', it belongs between the x's and the z's, right? This means you need to move all the z's over by one place. But because you are reusing the same block of memory, you need to do it one element at a time:
[xxxxxxxzzzzzzz z ]
[xxxxxxxzzzzzz zz ]
[xxxxxxxzzzzz zzz ]
...
[xxxxxxx zzzzzzzz ]
[xxxxxxxyzzzzzzzz ]
(the spaces are for clarity - previous value isn't explicitly cleared)
As you can see, this takes a lot of steps just to make room for your 'y': each insertion in the middle of a vector is O(n), which will be very slow for large data sets.
A better solution
As others have mentioned, std::set sounds more appropriate for your needs. std::set automatically orders all inserted elements (it uses a tree data structure, so each insertion is O(log n), much faster than inserting into the middle of a vector), and it lets you find a particular object by last name, also in O(log n) time. It decides the order using the comparison bool myObj::operator<(const myObj& other) const. If you define this operator to return lastName < other.lastName, you can simply insert into the set, which is much quicker. Note that with this comparison two objects with the same last name are considered equivalent, so std::set keeps only the first one; use std::multiset if both must be kept.
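A minimal sketch of the set-based approach, assuming a myObj variant with a constructor taking first and last names (an assumption; the original class has none). The alias ObjsByLastName is hypothetical. Lookup by last name works by passing a dummy object that only has the last name filled in.

```cpp
#include <set>
#include <string>
#include <utility>

// Variant of myObj with an added constructor (an assumption for this sketch).
class myObj {
public:
    myObj(std::string first, std::string last)
        : firstName(std::move(first)), lastName(std::move(last)) {}
    const std::string& getFirst() const { return firstName; }
    const std::string& getLast() const { return lastName; }

    // The ordering std::set / std::multiset use: last name only.
    bool operator<(const myObj& other) const { return lastName < other.lastName; }

private:
    std::string firstName;
    std::string lastName;
};

// A multiset keeps every object, even when two share a last name;
// std::set<myObj> would treat them as equivalent and keep only the first.
typedef std::multiset<myObj> ObjsByLastName;
```

For example, objs.find(myObj("", "Adams")) locates an object by last name in O(log n).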
Alternatively, if you really want to use a vector: instead of keeping it sorted as you go, just add all the items to the vector and then call std::sort once, after all the inserts are done. This also completes in O(n log n) time, but it should be considerably faster than the current approach because it avoids the vector insertion problem.
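A sketch of the append-then-sort alternative, using a hypothetical two-field Name struct in place of myObj and a hypothetical helper sortByLastName. The caller simply push_backs everything, then sorts once.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical minimal record standing in for myObj.
struct Name {
    std::string first;
    std::string last;
};

// One O(n log n) sort after all push_backs, instead of an O(n) shift
// on every ordered insert.
void sortByLastName(std::vector<Name>& v) {
    std::sort(v.begin(), v.end(),
              [](const Name& a, const Name& b) { return a.last < b.last; });
}
```

If objects with equal last names must keep their insertion order, std::stable_sort takes the same arguments.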

Related

Erase by value in a vector of shared pointers

I want to erase by value from a vector of shared pointers to strings (i.e. vector<shared_ptr<string>>). Is there an efficient way of doing this, instead of iterating over the complete vector and then erasing at the found positions?
#include <bits/stdc++.h>
using namespace std;
int main()
{
vector<shared_ptr<string>> v;
v.push_back(make_shared<string>("aaa"));
int j = 0,ind;
for(auto i : v) {
if((*i)=="aaa"){
ind = j;
}
j++;
}
v.erase(v.begin()+ind);
}
Also, I don't want to spend memory on a map (value vs. address).

Try it like this (the erase-remove idiom):
string s = "aaa";
auto cmp = [s](const shared_ptr<string> &p) { return s == *p; };
v.erase(std::remove_if(v.begin(), v.end(), cmp), v.end());
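For reference, a self-contained version of the idiom as a helper function (the name eraseByValue is hypothetical). Unlike the one-liner, it also skips null pointers instead of dereferencing them. std::remove_if moves the elements to keep to the front and returns the new logical end; erase then drops the leftover tail, so the whole operation is a single O(N) pass.

```cpp
#include <algorithm>
#include <memory>
#include <string>
#include <vector>

// Hypothetical helper: removes every element whose pointed-to string equals value.
void eraseByValue(std::vector<std::shared_ptr<std::string>>& v, const std::string& value) {
    v.erase(std::remove_if(v.begin(), v.end(),
                           [&value](const std::shared_ptr<std::string>& p) {
                               return p && *p == value;  // null pointers are kept
                           }),
            v.end());
}
```

In C++20 the same thing can be written as std::erase_if(v, pred).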

There is no better way than O(N): you have to find the object in the vector, and to find it you have to iterate over the vector once. It does not really matter whether it is a pointer or any other object.
The only way to do better is to use a different data structure, one that provides O(1) lookup/removal. A hash set (std::unordered_set) is the first thing that comes to mind, but that only works if each value appears once. A second option would be a hash map, so that multiple pointers pointing to the same value can be stored under the same hash key.
If you do not want to use a different structure, then you are out of luck. You could have an additional structure hashing the pointers, if you want to retain the vector but also have O(1) access.
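For example, if you do use a hash set, define both a hasher and a key_equal that look at the pointed-to string (*elementInSet). The hasher alone is not enough: the default key_equal, std::equal_to, compares the pointer addresses, so a newly created pointer to "aaa" would never match an element. Each pointer in the set must also point to a distinct string. For example:
struct myPtrHash {
size_t operator()(const std::shared_ptr<std::string>& p) const {
//Maybe we want to add checks/throw a more meaningful error if p is invalid?
return std::hash<std::string>()(*p);
}
};
struct myPtrEqual {
bool operator()(const std::shared_ptr<std::string>& a, const std::shared_ptr<std::string>& b) const {
// Compare the strings, not the addresses
return *a == *b;
}
};
such that your set is:
std::unordered_set<std::shared_ptr<std::string>, myPtrHash, myPtrEqual> pointerSet;
Then erasing would be O(1) on average, simply:
std::shared_ptr<std::string> toErase = std::make_shared<std::string>("aaa");
pointerSet.erase(toErase);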
That said, if you must use a vector, a more idiomatic way to do this is to use remove_if instead of iterating yourself. This will not improve the time complexity, though; it is just better practice.

Don't include bits/stdc++.h, and since you're iterating through the whole vector, you should be using std::for_each with a lambda.

Replacing std::map with std::set and search by index

Say we have a map with larger objects and an index value. The index value is also part of the larger object.
What I would like to know is whether it is possible to replace the map with a set, extracting the index value.
It is fairly easy to create a set that sorts on a functor comparing two larger objects by extracting the index value.
That leaves searching by index value, which I think a set does not support by default.
I was thinking of using std::find_if, but I believe that searches linearly, ignoring the fact we have set.
Then I thought of using std::binary_search with a functor comparing the larger object and the value, but I believe that it doesn't work in this case as it wouldn't make use of the structure and would use traversal as it doesn't have a random access iterator. Is this correct? Or are there overloads which correctly handle this call on a set?
And then finally I was thinking of using a boost::container::flat_set, as this has an underlying vector and thus presumably should work well with std::binary_search?
But maybe there is an all together easier way to do this?
Before anyone answers "just use a map where a map ought to be used": I am actually using a vector that is manually sorted (with std::lower_bound) and was thinking of replacing it with boost::container::flat_set, but that doesn't seem to be easily possible, so I might just stick with the vector.

C++14 introduced the ability to look up by a key without constructing an entire stored object (heterogeneous lookup, enabled when the comparator declares is_transparent). This can be used as follows:
#include <cstddef>
#include <iostream>
#include <set>
struct Object {
long long data;
std::size_t index;
};
struct ObjectIndexer {
ObjectIndexer(Object const& o) : index(o.index) {}
ObjectIndexer(std::size_t index) : index(index) {}
std::size_t index;
};
struct ObjComp {
bool operator()(ObjectIndexer a, ObjectIndexer b) const {
return a.index < b.index;
}
typedef void is_transparent; //Allows the comparison with non-Object types.
};
int main() {
std::set<Object, ObjComp> stuff;
stuff.insert(Object{135, 1});
std::cout << stuff.find(ObjectIndexer(1))->data << "\n";
}
More generally, these sorts of problems where there are multiple ways of indexing your data can be solved using Boost.MultiIndex.

Use boost::intrusive::set which can utilize the object's index value directly. It has a find(const KeyType & key, KeyValueCompare comp) function with logarithmic complexity. There are also other set types based on splay trees, AVL trees, scapegoat trees etc. which may perform better depending on your requirements.

If you add the following to your contained object type:
less than operator that only compares the object indices
equality operator that only compares the object indices
a constructor that takes your index type and initializes a dummy object with that value for the index
then you can pass your index type to find, lower_bound, equal_range, etc... and it will act the way you want. When you pass your index to the set's (or flat_set's) find methods it will construct a dummy object of the contained type to use for the comparisons.
Now if your object is really big, or expensive to construct, this might not be the way you want to go.
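A minimal sketch of this dummy-object approach, using a hypothetical Item type and dataFor helper. The non-explicit constructor from the index type is what lets set::find accept a bare index: the index is converted to a temporary Item before the comparison.

```cpp
#include <cstddef>
#include <set>

// Hypothetical "larger object" that carries its own index.
struct Item {
    std::size_t index;
    long long data;
    // Deliberately non-explicit: an index converts to a dummy Item.
    Item(std::size_t idx, long long d = 0) : index(idx), data(d) {}
    // Ordering used by std::set: index only.
    bool operator<(const Item& other) const { return index < other.index; }
    // Not needed by std::set (it only uses operator<), kept for other uses.
    bool operator==(const Item& other) const { return index == other.index; }
};

// Hypothetical helper: returns the data stored for index, or fallback if absent.
long long dataFor(const std::set<Item>& items, std::size_t index, long long fallback) {
    std::set<Item>::const_iterator it = items.find(index);  // builds a dummy Item
    return it != items.end() ? it->data : fallback;
}
```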

How do I build a vector<> of search results from a source vector<>?

Considering this example:
std::vector<Student> students;
//populate students from a data source
std::vector<Student> searched(students.size());
auto s = std::copy_if(students.begin(), students.end(), searched.begin(),
[](const Student &stud) {
return stud.getFirstName().find("an") != std::string::npos;
});
searched.resize(std::distance(searched.begin(), s));
I have the following questions:
Is it OK to allocate memory for the searched vector equal in size to the initial vector? There may be 500 large objects and maybe none satisfying the search criteria. Is there any other way?
When copying to the searched vector, the copy assignment operator is called and, obviously, a copy is made. What if 400 of those 500 objects satisfy the search criteria? Isn't that just wasting memory?
I am a C++ noob, so I may say something stupid. I don't see why to ever use vector<T> where T is an object. I would always use vector<shared_ptr<T>>. If T is a primitive type like an int, I guess it's fairly straightforward to use vector<T>.
I chose this example because I think it's very general: you always have to pull some data out of a database, an XML file, or some other source. Would you ever have vector<T> in your data access layer, or vector<shared_ptr<T>>?

Concerning your first question:
1 - Is it OK to allocate memory for the searched vector equal in size to the initial vector? There may be 500 large objects and maybe none satisfying the search criteria. Is there any other way?
You could use a back inserter iterator, using the std::back_inserter() standard function to create one for the searched vector:
#include <vector>
#include <string>
#include <algorithm>
#include <iterator> // This is the header to include for std::back_inserter()
// Just a dummy definition of your Student class,
// to make this example compile...
struct Student
{
std::string getFirstName() const { return "hello"; }
};
int main()
{
std::vector<Student> students;
std::vector<Student> searched;
// ^^^^^^^^^
// Watch out: no parentheses here, or you will be
// declaring a function accepting no arguments and
// returning a std::vector<Student>
auto s = std::copy_if(
students.begin(),
students.end(),
std::back_inserter(searched),
// ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
// Returns an insert iterator
[] (const Student &stud)
{
return stud.getFirstName().find("an") != std::string::npos;
});
}
Concerning your second question:
2 - When copying to the searched vector, the copy assignment operator is called and, obviously, a copy is made. What if 400 of those 500 objects satisfy the search criteria? Isn't that just wasting memory?
Well, if you have no statistical information on the selectivity of your predicate, then there is not much you can do about it. Of course, if your purpose is to process in some way all the students for which a certain predicate is true, then you should use std::for_each() on the source vector rather than create a separate vector:
std::for_each(students.begin(), students.end(), [] (const Student &stud)
{
if (stud.getFirstName().find("an") != std::string::npos)
{
// ...
}
});
However, whether this approach satisfies your requirements depends on your particular application.
I don't see why to ever use vector<T> where T is an object. I would always use vector<shared_ptr<T>>.
Whether or not to use (smart) pointers rather than values depends on whether or not you need reference semantics (apart from possible performance considerations about copying and moving those objects around). From the information you provided, it is not clear whether this is the case, so it may or may not be a good idea.
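If you do need a separate result list but want to avoid copying the Student objects, one option is to collect non-owning pointers to the matches instead of copies. A minimal sketch, with a hypothetical minimal Student and a hypothetical helper findMatches; the pointers are only valid as long as the source vector is not resized, reallocated or destroyed.

```cpp
#include <string>
#include <vector>

// Hypothetical minimal Student, just enough for the example.
struct Student {
    std::string firstName;
    const std::string& getFirstName() const { return firstName; }
};

// Collects pointers to the matching students; no Student is copied.
std::vector<const Student*> findMatches(const std::vector<Student>& students,
                                        const std::string& needle) {
    std::vector<const Student*> result;
    for (const Student& s : students)
        if (s.getFirstName().find(needle) != std::string::npos)
            result.push_back(&s);
    return result;
}
```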

What are you going to do with all those students?
Just do that instead:
for(Student& student: students) {
if(student.firstNameMatches("an")) {
//.. do something
}
}

std::sort to sort an array and a list of index?

I have a function that takes two vectors of the same size as parameters:
void mysort(std::vector<double>& data, std::vector<unsigned int>& index)
{
// For example :
// The data vector contains : 9.8 1.2 10.5 -4.3
// The index vector contains : 0 1 2 3
// The goal is to obtain for the data : -4.3 1.2 9.8 10.5
// The goal is to obtain for the index : 3 1 0 2
// Using std::sort and minimizing copies
}
How can I solve that problem while minimizing the number of required copies?
An obvious way would be to make a single vector of std::pair<double, unsigned int>, sort it with the comparator [](std::pair<double, unsigned int> x, std::pair<double, unsigned int> y){return x.first < y.first;}, and then copy the results back into the two original vectors, but that would not be efficient.
Note: the signature of the function is fixed, and I cannot pass a single vector of std::pair.

Inside the function, make a vector positions = [0, 1, 2, 3, ...].
Sort positions with the comparator (int x, int y){ return data[x] < data[y]; }.
Then iterate over positions, doing result.push_back(index[*it]);
This assumes the values in index can be arbitrary. If index is guaranteed to already be [0, 1, 2, ...] as in your example, then you don't need to make the positions array: just use index in its place and skip the last copy.
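A minimal C++11 sketch of this approach with the question's fixed signature. It gathers both data and index through the sorted positions and then swaps the new buffers in, so each element is copied once. std::iota (from <numeric>) fills positions with 0, 1, 2, ...

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Sorts data ascending and applies the same permutation to index.
void mysort(std::vector<double>& data, std::vector<unsigned int>& index) {
    // positions[k] ends up as the original slot of the k-th smallest value
    std::vector<std::size_t> positions(data.size());
    std::iota(positions.begin(), positions.end(), 0);
    std::sort(positions.begin(), positions.end(),
              [&data](std::size_t x, std::size_t y) { return data[x] < data[y]; });

    std::vector<double> sortedData;
    std::vector<unsigned int> sortedIndex;
    sortedData.reserve(data.size());
    sortedIndex.reserve(index.size());
    for (std::size_t p : positions) {
        sortedData.push_back(data[p]);
        sortedIndex.push_back(index[p]);
    }
    data.swap(sortedData);    // swap buffers instead of copying back
    index.swap(sortedIndex);
}
```

With the example from the question, data becomes -4.3 1.2 9.8 10.5 and index becomes 3 1 0 2.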

http://www.boost.org/doc/libs/1_52_0/libs/iterator/doc/index.html#iterator-facade-and-adaptor
Write an iterator over std::pair<double&, unsigned int&> that actually wraps a pair of iterators, one into each vector. The only tricky part is making sure that std::sort recognizes the result as a random access iterator.
If you can't use boost, just write the equivalent yourself.
Before doing this, determine whether it is worth your bother. A zip, sort and unzip is easier to write, and programmer time can be traded for performance in lots of places: until you know where it is best spent, maybe you should just do a good-enough job and then benchmark to find where you need to speed things up.

You can use a custom iterator class, which iterates over both vectors in parallel. Its internal members would consist of
Two references (or pointers), one for each vector
An index indicating the current position
The value type of the iterator should be a pair<double, unsigned>. This is because std::sort will not only swap items, but in some cases also temporarily store single values. I wrote more details about this in section 3 of this question.
The reference type has to be some class which again holds references to both vectors and a current index. So you might make the reference type the same as the iterator type, if you are careful. The operator= of the reference type must allow assignment from the value type. And the swap function should be specialized for this reference, to allow swapping such list items in place, by swapping for both lists separately.

You can use a functor class to hold a reference to the value array and use it as the comparator to sort the index array. Then copy the values into a new value array and swap the contents. (This assumes index starts out as 0, 1, 2, ..., as in the question's example.)
#include <algorithm>
#include <vector>
struct Comparator
{
    Comparator(const std::vector<double> & data) : m_data(data) {}
    bool operator()(unsigned int left, unsigned int right) const { return m_data[left] < m_data[right]; }
    const std::vector<double> & m_data;
};
void mysort(std::vector<double>& data, std::vector<unsigned int>& index)
{
    // Sort the indices by the values they refer to
    std::sort(index.begin(), index.end(), Comparator(data));
    // Gather the values in that order, then swap buffers instead of copying back
    std::vector<double> result;
    result.reserve(data.size());
    for (std::vector<unsigned int>::iterator it = index.begin(), e = index.end(); it != e; ++it)
        result.push_back(data[*it]);
    data.swap(result);
}

This should do it, provided index starts out as 0, 1, 2, ...:
std::sort(index.begin(), index.end(), [&data](unsigned i1, unsigned i2)->bool
{ return data[i1]<data[i2]; });
std::sort(data.begin(), data.end());

std::map keys in C++

I have a requirement to create two different maps in C++. The key is of type CHAR* and the value is a pointer to a struct. I fill the two maps with these pairs in separate iterations. After creating both maps, I need to find all instances in which the strings referenced by the CHAR* keys are the same.
For this I am using the following code:
typedef struct _STRUCTTYPE
{
..
} STRUCTTYPE, *PSTRUCTTYPE;
typedef pair <CHAR *,PSTRUCTTYPE> kvpair;
..
CHAR *xyz;
PSTRUCTTYPE abc;
// after filling the information;
Map.insert (kvpair(xyz,abc));
// the above is repeated x times for the first map, and y times for the second map.
// after both are filled out;
std::map<CHAR *, PSTRUCTTYPE>::iterator Iter,findIter;
for (Iter=iteratedMap->begin();Iter!=iteratedMap->end();++Iter)
{
char *key = Iter->first;
printf("%s\n",key);
findIter=otherMap->find(key);
//printf("%u",findIter->second);
if (findIter!=otherMap->end())
{
printf("Match!\n");
}
}
The above code does not show any match, although the list of keys in both maps show obvious matches. My understanding is that the equals operator for CHAR * just equates the memory address of the pointers.
My question is: what should I do to alter the equals operator for this type of key, or could I use a different datatype for the string?

My understanding is that the equals operator for CHAR* just equates the memory address of the pointers.
Your understanding is correct.
The easiest thing to do would be to use std::string as the key. That way you get comparisons for the actual string value working without much effort:
std::map<std::string, PSTRUCTTYPE> m;
PSTRUCTTYPE s = bar();
m.insert(std::make_pair("foo", s));
if(m.find("foo") != m.end()) {
// works now
}
Note that you might leak memory for your structs if you don't always delete them manually. If you can't store by value, consider using smart pointers instead.
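If you do want to keep CHAR* keys, note that std::map never uses an equals operator: two keys match when neither compares less than the other, so the comparator is what needs changing. A minimal sketch with a comparator that compares string contents via strcmp; CStrLess, CStrMap and the placeholder STRUCTTYPE definition are hypothetical. The map only borrows the keys, so the character buffers must outlive it.

```cpp
#include <cstring>
#include <map>
#include <utility>

// Orders C strings by content instead of by pointer address.
struct CStrLess {
    bool operator()(const char* a, const char* b) const {
        return std::strcmp(a, b) < 0;
    }
};

struct STRUCTTYPE { int value; };  // placeholder payload for the sketch

// Keys are borrowed pointers: the buffers they point to must outlive the map.
typedef std::map<const char*, STRUCTTYPE*, CStrLess> CStrMap;
```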
Depending on your use case, you don't necessarily have to store pointers to the structs:
std::map<std::string, STRUCTTYPE> m;
m.insert(std::make_pair("foo", STRUCTTYPE(whatever)));
A final note: typedef-ing structs the way you are doing it is a C-ism; in C++ the following is sufficient:
typedef struct STRUCTTYPE {
// ...
} *PSTRUCTTYPE;

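If you use std::string instead of char *, there are more convenient comparison functions you can use. Also, instead of writing your own key-matching code, you can use the STL std::set_intersection algorithm to find the shared elements of two sorted containers (std::map is of course sorted by key). Note that the output must go through std::inserter (a map cannot be written through begin()), and the comparison must look at the keys only, since comparing the whole pairs would also compare the STRUCTTYPE pointers. Here is an example:
#include <algorithm> // std::set_intersection
#include <iterator>  // std::inserter
typedef std::map<std::string, STRUCTTYPE *> ExampleMap;
ExampleMap inputMap1, inputMap2, matchedMap;
// Insert elements to input maps
inputMap1.insert(...);
// Put the elements of inputMap1 whose keys also appear in inputMap2 into matchedMap
std::set_intersection(inputMap1.begin(), inputMap1.end(),
                      inputMap2.begin(), inputMap2.end(),
                      std::inserter(matchedMap, matchedMap.end()),
                      [](const ExampleMap::value_type &a, const ExampleMap::value_type &b) {
                          return a.first < b.first; // compare keys only
                      });
for(ExampleMap::iterator iter = matchedMap.begin(); iter != matchedMap.end(); ++iter)
{
    // Do things with matched elements
    std::cout << iter->first << std::endl;
}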
