Remove duplicate strings in string vector - c++

I have the code listed below, which is meant to remove duplicate football team names from a string vector. However, it only works some of the time: it removes the duplicates for some teams, but for others the same team name still appears multiple times in the final array.
For example it would print:
aresnal
wigan
villa
liverpool
villa
Notice there are two 'villa' names. Could anyone give me a suggestion?
'finalLeague' is the vector that stores all of the names, and it is the one the duplicates need to be removed from.
for (int i = 0;i < finalLeague.size();i++)
{
    string temp = finalLeague[i];
    int h = i + 1;
    for (int j = i+1;j < finalLeague.size();j++)
    {
        if (finalLeague[j] == finalLeague[i])
        {
            finalLeague.erase(finalLeague.begin()+j);
        }
    }
}

Sure, you can use a combination of std::sort, std::unique and std::vector::erase:
std::sort(finalLeague.begin(), finalLeague.end());
auto it = std::unique(finalLeague.begin(), finalLeague.end());
finalLeague.erase(it, finalLeague.end());
Alternatively, use a container that does not accept duplicates in the first place:
std::set<std::string> finalLeague; // BST, C++03 and C++11
std::unordered_set<std::string> finalLeague; // hash table, C++11

This can also be done using a hash map. Including <unordered_map> lets you use one; note that it requires C++11 or later.
All you need to do is check whether each string has occurred before, and push only the strings not yet seen into a new vector.
The main appeal of this method is that it needs very little code: a single loop does the trick.
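The single loop described above could be sketched as follows. This is a minimal illustration, not the answerer's code: it swaps in std::unordered_set for the hash map (only the keys are needed), and the function name removeDuplicates is a hypothetical choice. Unlike the sort-based approach, it keeps the first occurrence of each name in its original position.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper (name not from the original post): returns the strings
// of `input` with duplicates dropped, keeping the first occurrence of each
// and preserving the original order.
std::vector<std::string> removeDuplicates(const std::vector<std::string>& input)
{
    std::unordered_set<std::string> seen;  // strings encountered so far
    std::vector<std::string> result;
    for (const std::string& name : input) {
        // insert() returns a pair; .second is true only if the key was new.
        if (seen.insert(name).second) {
            result.push_back(name);
        }
    }
    return result;
}
```

Calling it as finalLeague = removeDuplicates(finalLeague); would replace the vector with its de-duplicated version.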

you should use std::unique
std::vector<std::string> vec;
// filling vector
// ....
std::vector<std::string>::iterator it;
it = std::unique (vec.begin(), vec.end());
vec.resize(std::distance(vec.begin(),it));
Edit: as @Gorpik said, the vector must be sorted before using std::unique; otherwise only consecutive equal elements will be removed.
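To make the difference concrete, here is a small sketch comparing std::unique on its own with sort-then-unique. The helper names uniqueOnly and sortThenUnique are made up for this illustration.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: applies std::unique without sorting first, so only
// runs of adjacent equal elements are collapsed.
std::vector<std::string> uniqueOnly(std::vector<std::string> v)
{
    v.erase(std::unique(v.begin(), v.end()), v.end());
    return v;
}

// Hypothetical helper: sorts first, so all equal elements become adjacent
// and std::unique removes every duplicate.
std::vector<std::string> sortThenUnique(std::vector<std::string> v)
{
    std::sort(v.begin(), v.end());
    v.erase(std::unique(v.begin(), v.end()), v.end());
    return v;
}
```

For the input {"villa", "wigan", "villa", "villa"}, uniqueOnly leaves {"villa", "wigan", "villa"}, while sortThenUnique yields {"villa", "wigan"}.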


Best way to group string members of object in a vector

I am trying to store a vector of objects and sort them by a string member possessed by each object. It doesn't need to be sorted alphabetically, it only needs to group every object with an identical string together in the vector.
IE reading through the vector and outputting the strings from beginning to end should return something like:
string_bulletSprite
string_bulletSprite
string_bulletSprite
string_playerSprite
string_enemySprite
string_enemySprite
But should NEVER return something like:
string_bulletSprite
string_playerSprite
string_bulletSprite
[etc.]
Currently I am using std::sort and a custom comparison function:
std::vector<GameObject*> worldVector;
[...]
std::sort(worldVector.begin(), worldVector.end(), compString);
And the comparison function used in the std::sort looks like this:
bool compString(GameObject* a, GameObject* b)
{
return a->getSpriteNameAndPath() < b->getSpriteNameAndPath();
}
getSpriteNameAndPath() is a simple accessor which returns a normal string.
This seems to work fine. I've stress tested this a fair bit and it seems to always group things together the way I wanted.
My question is, is this the ideal or most logical/efficient way of accomplishing the stated goal? I get the impression Sort isn't quite meant to be used this way and I'm wondering if there's a better way to do this if all I want to do is group but don't care about doing so in alphabetic order.
Or is this fine?
If you have lots of equivalent elements in your range, then std::sort can be less efficient than grouping the elements manually.
You can do this by shifting the minimum elements to the beginning of the range, and then repeating this process on the remaining non-minimum elements:
// given some range v
auto b = std::begin(v); // keeps track of remaining elements
while (b != std::end(v)) // while there's elements to be arranged
{
    auto min = *std::min_element(b, std::end(v)); // find the minimum
    // move elements matching that to the front
    // and simultaneously update the remaining range
    b = std::partition(b, std::end(v),
                       [=](auto const & i) {
                           return i == min;
                       });
}
Of course, a custom comparator can be passed to min_element, and the lambda in partition can be modified if equivalence is defined some other way.
Note that if you have very few equivalent elements, this method is much less efficient than using std::sort.
Here's a demo with a range of ints.
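A self-contained version of that idea for a vector of ints might look like the sketch below. The function name groupEqual is a hypothetical choice. Because each pass moves the current minimum to the front, the result for ints ends up fully ordered, although only the grouping is the goal.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: rearranges `v` so that equal values are adjacent,
// by repeatedly partitioning the current minimum to the front.
void groupEqual(std::vector<int>& v)
{
    auto b = v.begin();  // start of the not-yet-grouped part
    while (b != v.end()) {
        const int min = *std::min_element(b, v.end());
        // Move every element equal to `min` to the front of [b, end);
        // partition returns the first element that is not equal to `min`.
        b = std::partition(b, v.end(), [min](int i) { return i == min; });
    }
}
```

Each pass costs linear time in the remaining range, so the total work is roughly (number of distinct values) x (number of elements), which is why this only pays off when there are few distinct values.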
I hope I understood your question correctly. If so, here is a small example using std::map, which is great for grouping things by key; the key will most probably be a std::string.
Please take a look:
class Sprite
{
public:
    Sprite(/* args */)
    {
    }
    ~Sprite()
    {
    }
};
int main(int argc, char ** argv){
    std::map <std::string, std::map<std::string, Sprite>> sprites;
    std::map <std::string, Sprite> spaceships;
    spaceships.insert(std::make_pair("executor", Sprite()));
    spaceships.insert(std::make_pair("millennium Falcon", Sprite()));
    spaceships.insert(std::make_pair("death star", Sprite()));
    sprites.insert(std::make_pair("spaceships",spaceships));
    std::cout << sprites["spaceships"]["executor"].~member_variable_or_function~() << std::endl;
    return 0;
}
Seems like Functor or Lambda is the way to go for this particular program, but I realized some time after posting that I could just create an ID for the images and sort those instead of strings. Thanks for the help though, everyone!

How to remove duplicates from a vector whose numbers might be in different positions?

How do you remove elements from a vector of vectors that are identical to another vector but whose elements are not in the same indices?
For example:
std::vector<std::vector<int>> vectA = {{1,3,4}, {1,2,3}, {3,2,1}};
I want it so that {3,2,1} is removed from vectA and it becomes:
vectA = {{1,3,4}, {1,2,3}}
Any idea how to proceed efficiently?
1. Sort the elements of each vector.
2. Drop duplicates (this is an easy look-up).
If you need to retain the original element order, then build any correspondence you wish: parallel arrays of vectors (original and sorted), pairs of (unsorted, sorted) vectors, etc. Drop duplicates based on the sorted ones.
I trust that you can take it from here.
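One possible way to put those steps together, while retaining the original element order of the rows that survive, is sketched below. The function name dropPermutedDuplicates and the use of a std::set of sorted copies as the look-up structure are assumptions made for this illustration.

```cpp
#include <algorithm>
#include <set>
#include <vector>

// Hypothetical helper: keeps the first row of each group of rows that
// contain the same elements in any order, and leaves kept rows unmodified.
std::vector<std::vector<int>> dropPermutedDuplicates(
    const std::vector<std::vector<int>>& input)
{
    std::set<std::vector<int>> seenSorted;  // sorted copies seen so far
    std::vector<std::vector<int>> result;
    for (const auto& row : input) {
        std::vector<int> key = row;          // sort a copy, not the row itself
        std::sort(key.begin(), key.end());
        if (seenSorted.insert(key).second) { // true only for a new sorted key
            result.push_back(row);
        }
    }
    return result;
}
```

For {{1,3,4}, {1,2,3}, {3,2,1}} this returns {{1,3,4}, {1,2,3}}, since {3,2,1} sorts to the same key as {1,2,3}.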
What you are describing is the behavior of std::set, i.e. this solves your problem:
set<set<int>> input = {{1,3,4}, {1,2,3}, {3,2,1}};
// input is now {{1,2,3},{1,3,4}}
This works because a set is basically equal to a sorted vector with no duplicates.
If you really want to, you can now convert to std::vector:
vector<vector<int>> nums;
for(auto & s : input) nums.emplace_back(s.begin(), s.end());

How to avoid out of range exception when erasing vector in a loop?

My apologies for the lengthy explanation.
I am working on a C++ application that loads two files into two 2D string vectors, rearranges those vectors, builds another 2D string vector, and outputs it all in a report. The first element of the two vectors is a code that identifies the owner of the item and the item in the vector. I pass the owner's identification to the program on start and loop through the two vectors in a nested while loop to find those that have matching first elements. When I do, I build a third vector with components of the first two, and I then need to capture any that don't match.
I was using the syntax "vector.erase(vector.begin() + i)" to remove elements from the two original vectors when they matched. When the loop completed, I had my new third vector, and I was left with two vectors containing only the elements that didn't match, which is what I needed. This was working fine as I tried the various owners in the files (the program accepts one owner at a time). Then I tried one that generated an out of range error.
I could not figure out how to do the erase inside of the loop without throwing the error (it didn't seem that swap and pop or erase-remove were feasible solutions). I solved my problem for the program with two extra nested while loops after building my third vector in this one.
I'd like to know how to make the erase method work here (as it seems a simpler solution) or at least how to check for my out of range error (and avoid it). There were a lot of "rows" for this particular owner; so debugging was tedious. Before giving up and going on to the nested while solution, I determined that the second erase was throwing the error. How can I make this work, or are my nested whiles after the fact, the best I can do? Here is the code:
i = 0;
while (i < AIvector.size())
{
CHECK:
    j = 0;
    while (j < TRvector.size())
    {
        if (AIvector[i][0] == TRvector[j][0])
        {
            linevector.clear();
            // Add the necessary data from both vectors to Combo_outputvector
            for (x = 0; x < AIvector[i].size(); x++)
            {
                linevector.push_back(AIvector[i][x]); // add AI info
            }
            for (x = 3; x < TRvector[j].size(); x++) // Don't need the first three elements; so start with x=3.
            {
                linevector.push_back(TRvector[j][x]); // add TR info
            }
            Combo_outputvector.push_back(linevector); // build the combo vector
            // then erase these two current rows/elements from their respective vectors, this revises the AI and TR vectors
            AIvector.erase(AIvector.begin() + i);
            TRvector.erase(TRvector.begin() + j);
            goto CHECK; // jump from here because the erase will have changed the two increments
        }
        j++;
    }
    i++;
}
As already discussed, your goto jumps to the wrong position. Simply moving it out of the first while loop should solve your problems. But can we do better?
Erasing from a vector can be done cleanly with std::remove and std::vector::erase (the erase-remove idiom) for cheap-to-move objects, which vector and string both are. After some thought, however, I believe this isn't the best solution for you because you need a function that does more than just check if a certain row exists in both containers, and that is not easily expressed with the erase-remove idiom.
Retaining the current structure, then, we can use iterators for the loop condition. We have a lot to gain from this, because std::vector::erase returns an iterator to the next valid element after the erased one. Not to mention that it takes an iterator anyway. Conditionally erasing elements in a vector becomes as simple as
auto it = vec.begin();
while (it != vec.end()) {
    if (...)
        it = vec.erase(it);
    else
        ++it;
}
Because we assign erase's return value to it we don't have to worry about iterator invalidation. If we erase the last element, it returns vec.end() so that doesn't need special handling.
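As a concrete, compilable instance of that pattern, the sketch below erases every even value from a vector of ints. The function name eraseEven and the "is even" condition are placeholders chosen for illustration; any predicate would work the same way.

```cpp
#include <vector>

// Hypothetical helper: removes all even values from `vec` while iterating.
void eraseEven(std::vector<int>& vec)
{
    auto it = vec.begin();
    while (it != vec.end()) {
        if (*it % 2 == 0)
            it = vec.erase(it);  // erase returns the element after the erased one
        else
            ++it;                // only advance when nothing was erased
    }
}
```

Note that the iterator is advanced only in the else branch, so consecutive matching elements and a matching last element are both handled without skipping or running past the end.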
Your second loop can be removed altogether. The C++ standard library defines functions for searching inside containers. std::find_if searches a range for the first element that satisfies a condition and returns an iterator to it, or end() if no such element exists. You haven't declared your types anywhere, so I'm just going to assume the rows are std::vector<std::string>.
using row_t = std::vector<std::string>;
auto AI_it = AIVector.begin();
while (AI_it != AIVector.end()) {
    // Find a row in TRVector with the same first element as *AI_it
    auto TR_it = std::find_if (TRVector.begin(), TRVector.end(), [&AI_it](const row_t& row) {
        return row[0] == (*AI_it)[0];
    });
    // If a matching row was found
    if (TR_it != TRVector.end()) {
        // Copy the line from AIVector
        auto linevector = *AI_it;
        // Do NOT do this unless the row is guaranteed to have at least 3 elements
        assert(TR_it->size() >= 3);
        std::copy(TR_it->begin() + 3, TR_it->end(),
                  std::back_inserter(linevector));
        Combo_outputvector.emplace_back(std::move(linevector));
        AI_it = AIVector.erase(AI_it);
        TRVector.erase(TR_it);
    }
    else
        ++AI_it;
}
As you can see, switching to iterators completely sidesteps your initial problem of figuring out how not to access invalid indices. If you don't understand the syntax of the arguments for find_if, search for the term lambda. It is beyond the scope of this answer to explain what they are.
A few notable changes:
linevector is now encapsulated properly. There is no reason for it to be declared outside this scope and reused.
linevector simply copies the desired row from AIVector rather than pushing back every element in it, as long as Combo_outputvector (and therefore linevector) contains the same type as AIVector and TRVector.
std::copy is used instead of a for loop. Apart from being slightly shorter, it is also more generic, meaning you could change your container type to anything that supports random access iterators and inserting at the back, and the copy would still work.
linevector is moved into Combo_outputvector. This can be a huge performance optimization if your vectors are large!
It is possible that you used a non-encapsulated linevector because you wanted to keep a copy of the last inserted row outside of the loop. That would prohibit moving it, however. For this reason it is faster and more descriptive to do it as I showed above and then simply do the following after the loop.
auto linevector = Combo_outputvector.back();

How to sort a vector using sort() in c++

I want to insert an element into my already sorted vector so that it ends up at its sorted position once the vector is sorted again.
There is a function called sort() for sorting an array. How can I use the same function to sort a vector?
Here's my code. It gives me a compilation error.
//assume g1 is already sorted with some numbers
int x;
cin >> x;
g1.push_back(x);
int s = g1.size()/g1.at(g1.begin());
//similar to int s = arr/arr[0]
sort(g1 + s,g1);
std::sort(g1.begin(), g1.end());
Should do the trick.
Alternatively, if you want to be able to switch out the vectorness of g1 with, say, a plain array, without having to modify this code, or if this was in template code where you want to work with multiple container types, then the form
std::sort(std::begin(g1), std::end(g1));
may hold more appeal.
In any case, please do read the documentation.
Instead of sorting, find the position where the new element should go and insert it there:
std::vector<int>::iterator loc = std::upper_bound(g1.begin(), g1.end(), x);
g1.insert(loc, x);
Or, more briefly:
g1.insert(std::upper_bound(g1.begin(), g1.end(), x), x);
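Wrapped in a function, the same idea could look like the sketch below. The name insertSorted is a hypothetical choice, and the code assumes g1 is already sorted in ascending order before the call.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: inserts `x` into the ascending-sorted vector `g1`
// at the position that keeps it sorted. Precondition: g1 is sorted.
void insertSorted(std::vector<int>& g1, int x)
{
    // upper_bound finds the first element greater than x (binary search),
    // so x lands after any existing elements equal to it.
    g1.insert(std::upper_bound(g1.begin(), g1.end(), x), x);
}
```

The search itself takes logarithmic time, although the insert still has to shift the elements after the insertion point, so the whole operation is linear in the worst case.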
Just keeping it simple, let arr be the vector name. Write: sort(arr.begin(), arr.end());
(assuming already you have written using namespace std;)

Selectively populated vectors with substrings extracted from a source string

I have a char array whose contents look something like the following:
char buffer[] = "I1 I2 V1 V2 I3 V3 I4 DO V4";
As you may see, it's a typical blank separated character string. I want to put all sub-string(s) starting with a letter "I" into a vector (IVector), and sort its elements in ascending order. At the same time, I'd also want to put all sub-string(s) starting with a letter "V" into another vector (VVector), and sort its elements in ascending order. The other(s) (e.g. "DO" in this example) will be ignored.
I'm not familiar with the STL algorithm library. Are there any functions to help me achieve the above-mentioned job?
Thank you!
You can iterate over all the substrings using an std::istream_iterator<std::string>:
std::stringstream s(buffer);
std::istream_iterator<std::string> begin(s);
std::istream_iterator<std::string> end;
for( ; begin != end; ++begin) {
    switch((*begin)[0]) { // switch on first character
        // insert into appropriate vector here
    }
}
Then you can use std::sort to sort the vectors, as @Billy has already pointed out. You could also consider using a std::set, as that will always keep your items sorted in the first place.
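Filling in the switch and the sorting step, a complete version might look like the sketch below. The struct SplitResult and the function splitByPrefix are names invented for this illustration; IVector and VVector follow the question.

```cpp
#include <algorithm>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical result type holding the two vectors named in the question.
struct SplitResult {
    std::vector<std::string> IVector;
    std::vector<std::string> VVector;
};

// Hypothetical helper: splits `buffer` on whitespace, sorts tokens starting
// with 'I' into IVector and those starting with 'V' into VVector, and
// ignores everything else.
SplitResult splitByPrefix(const char* buffer)
{
    SplitResult out;
    std::istringstream s(buffer);
    std::istream_iterator<std::string> begin(s);
    std::istream_iterator<std::string> end;
    for (; begin != end; ++begin) {
        // Tokens extracted with operator>> are never empty, so [0] is safe.
        switch ((*begin)[0]) {
        case 'I': out.IVector.push_back(*begin); break;
        case 'V': out.VVector.push_back(*begin); break;
        default:  break;  // e.g. "DO" is ignored
        }
    }
    std::sort(out.IVector.begin(), out.IVector.end());
    std::sort(out.VVector.begin(), out.VVector.end());
    return out;
}
```

Note that std::sort on strings compares lexicographically, so a token such as "I10" would sort before "I2"; a custom comparator would be needed if numeric ordering of the suffix matters.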
Are there any functions to help me achieve the above-mentioned job?
Yes. Have a look at std::find and std::sort.