Delete recurring characters - c++

For a little something I was trying out in C++, I have accepted a string (say 'a tomato is red') and gotten rid of spaces ('atomatoisred').
Now how would I go about deleting only the recurring characters, keeping the first instance of each character (so our example becomes 'atomisred')?
Thanks in advance!

You can use the erase-remove idiom in conjunction with a set keeping track of the duplicate characters:
#include <algorithm>
#include <set>
std::set<char> dupes;
str.erase(
std::remove_if(
str.begin(), str.end(),
[&](char c) { return not dupes.insert(c).second; }),
str.end());
This also uses the fact that the return value of std::set::insert is a pair whose second element is a bool indicating whether the insertion took place.
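For completeness, here is the same erase-remove idiom wrapped in a small function that also strips the spaces first, as the question describes. The helper name removeRepeats is hypothetical; the technique is exactly the set-based predicate above. (This relies on std::remove_if visiting elements front to back, which is what every mainstream implementation does.)

```cpp
#include <algorithm>
#include <set>
#include <string>

// Hypothetical helper: strips spaces, then removes every repeated character,
// keeping only the first occurrence of each.
std::string removeRepeats(std::string str) {
    // Drop the spaces first, as in the question.
    str.erase(std::remove(str.begin(), str.end(), ' '), str.end());

    std::set<char> seen;  // characters encountered so far
    str.erase(std::remove_if(str.begin(), str.end(),
                             [&seen](char c) {
                                 // insert(...).second is false when c was already in the set
                                 return !seen.insert(c).second;
                             }),
              str.end());
    return str;
}

// Usage: removeRepeats("a tomato is red") returns "atomisred".
```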

If you want to implement it yourself (without the STL), there are a number of ways.
Through sorting. This works if you don't care about the order of the chars. Sort your string first, then go through it, performing a very simple check on every element:
if( currentElement == elementBeforeIt )
deleteCurrentElement
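A minimal sketch of the sorting approach, assuming losing the original order is acceptable. For brevity it uses std::sort (any sort would do); the adjacent-element check is written by hand, like the pseudocode. Instead of deleting characters in place (which shifts the tail on every deletion), it builds a new string. The function name is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Sort-based de-duplication: one copy of each character survives,
// but in sorted order rather than the original order.
std::string uniqueSorted(std::string str) {
    std::sort(str.begin(), str.end());  // equal characters become adjacent
    std::string result;
    for (std::size_t i = 0; i < str.size(); ++i) {
        // Keep a character only if it differs from the one before it.
        if (i == 0 || str[i] != str[i - 1]) {
            result += str[i];
        }
    }
    return result;
}
```

With the question's input, uniqueSorted("atomatoisred") yields "adeimorst" rather than "atomisred", which shows the order trade-off.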
Another way is to have an array dedicated to the unique characters (well, maybe not an array, but you'll get the idea). Go through your string, and for each character, check:
foreach Element of the string:
if( arrayOfUniqueElements contains currentElement )
do nothing
else
put currentElement into the arrayOfUniqueElements
After this, you will have all unique elements in the dedicated array.
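One concrete way to realize the "dedicated array" idea for characters is a lookup table with one flag per possible char value, which keeps the original order and needs no STL algorithms. This is a sketch under that assumption; the table replaces the "contains" search with a constant-time check, and the function name is hypothetical.

```cpp
#include <climits>
#include <string>

// Order-preserving de-duplication: a table with one flag per unsigned char
// value records which characters have already been seen.
std::string uniqueInOrder(const std::string& str) {
    bool seen[UCHAR_MAX + 1] = {};  // all false initially
    std::string result;
    for (char c : str) {
        unsigned char index = static_cast<unsigned char>(c);  // avoid negative indices
        if (!seen[index]) {
            seen[index] = true;
            result += c;  // first occurrence: keep it
        }
    }
    return result;
}
```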

Related

Substring of an element in a set

Is there a way to find and replace a substring of a char*/string in a set?
Example:
std::set<char*> myset;
myset.insert("catt");
myset.insert("world");
myset.insert("hello");
it = myset.subsetfind("tt");
myset.replace(it, "t");
There are at least three reasons why this won't work.
std::set provides only the means to search the set for a value that compares equally to the value being searched for, and not to a value that matches some arbitrary portion of the value.
The shown program is ill-formed (and modifying a literal through such a pointer would be undefined behavior). A string literal such as "hello" is an array of const char (it decays to const char *), not a char *. No self-respecting C++ compiler will allow you to insert a const char * into a container of char *s. And you can't modify const values, by definition, anyway.
Values in std::set cannot be modified. To effect the modification of an existing value in a set, it must be erase()d, then the new value insert()ed.
std::set is simply not the right container for the goals you're trying to accomplish.
No, you can't (or at least shouldn't) modify the key while it's in the set. Doing so could change the relative order of the elements, in which case the modification would render the set invalid.
You need to start with a set of things you can modify. Then you need to search for the item, remove it from the set, modify it, then re-insert the result back into the set.
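#include <algorithm>
#include <set>
#include <string>
std::set<std::string> myset {"catt", "world", "hello"};
auto pos = std::find_if(myset.begin(), myset.end(),
[](const std::string &s) { return s.find("tt") != std::string::npos; });
if (pos != myset.end()) {
auto temp = *pos;      // copy the element; set elements are const
myset.erase(pos);      // std::set has erase(), not remove()
auto p = temp.find("tt");
temp.replace(p, 2, "t");
myset.insert(temp);
}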
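The search/remove/modify/re-insert steps above handle a single match. A sketch that applies them to every element containing the substring might look like this; the helper name replaceInSet is hypothetical, and it uses a manual loop rather than repeated std::find_if calls because it erases as it goes. Modified copies are collected and inserted only after the loop, so they are never visited twice.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: replaces the first occurrence of `from` with `to` in every
// element of `s` that contains `from`. Set elements are const, so each match is
// copied out, erased, modified, and re-inserted.
void replaceInSet(std::set<std::string>& s, const std::string& from, const std::string& to) {
    std::vector<std::string> modified;
    for (auto it = s.begin(); it != s.end();) {
        std::string::size_type pos = it->find(from);
        if (pos != std::string::npos) {
            std::string copy = *it;
            copy.replace(pos, from.size(), to);
            modified.push_back(copy);
            it = s.erase(it);  // erase returns the next valid iterator (C++11)
        } else {
            ++it;
        }
    }
    // Re-insert after the loop so the new elements are not processed again.
    s.insert(modified.begin(), modified.end());
}
```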
You cannot modify elements within a set.
You can find strings that contain the substring using std::find_if. Once you find matching elements, you can remove each from the set and add a modified copy of the string, with the substring replaced with something else.
PS. Remember that you cannot modify string literals. You will need to allocate some memory for the strings.
PPS. Implicit conversion of string literal to char* has been deprecated since C++ was standardized, and since C++11 such conversion is ill-formed.
PPPS. The default comparator will not be correct when you use pointers as the element type: it compares the pointer addresses, not the strings they point to. I recommend using std::string instead. (A strcmp-based comparator would also be possible, although it is much more prone to memory bugs.)
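If you do want to keep C strings, a strcmp-based comparator could look like the sketch below (the names CStringLess and CStringSet are hypothetical). It orders elements by content, so a lookup with a different buffer holding the same text finds the element. The memory caveat still applies: the set only stores pointers, so every string must outlive the set. String literals are fine because they have static storage duration.

```cpp
#include <cstring>
#include <set>

// Orders C strings by their characters instead of by pointer address.
struct CStringLess {
    bool operator()(const char* a, const char* b) const {
        return std::strcmp(a, b) < 0;
    }
};

// Stores only pointers: the pointed-to strings must stay alive while in the set.
using CStringSet = std::set<const char*, CStringLess>;
```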
You could use std::find_if with a predicate function/functor/lambda that searches for the substring you want.

string erase appends elements with backward iterators

I am trying to remove every '0' in a std::string, starting from the back. According to this, there are multiple ways to .erase backward iterators.
The weird thing is that they all append numbers to the std::string, instead of erasing them!
Here is a small sample:
//'str' will be "5.090000"
std::string str = std::to_string(5.09);
auto it = std::remove_if(str.rbegin(), str.rend(), [](char value) {
return value == '0';
});
1) Pre-C++11 way:
str.erase(--(it.base()));
2) C++11 way (1)
str.erase(std::next(it).base());
3) C++11 way (2)
std::advance(it, 1);
str.erase(it.base());
In all cases, str == "5.095.9". Why? As I see it, str should be "5.9", but it isn't...
My best guess is that I am doing something wrong with the reverse iterators, because if you split the string: 5.09 | 5.9, the first part still has the in-between '0', but not the trailing ones. The second part is actually the std::string I expected.
What am I doing wrong?
I made 2 errors in my approach:
erase called with a single iterator removes only the element that iterator points at, not everything from the iterator to the end (as I wrongly assumed).
So the next attempt was str.erase(std::next(it).base(), str.end()); but this is still wrong, continue reading ;)
As @KerrekSB pointed out, I didn't read the docs carefully enough: because I am using std::reverse_iterator, the kept elements are moved to the front of the reversed range, which is the back of the string! So, since it points to the new end of the reversed range (which, in forward order, lies before the kept elements), I have to erase the range from the beginning (str.begin()) to it.base().
TL;DR
The new working version is: str.erase(str.begin(), it.base());
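Putting the working version together, a minimal sketch (function name hypothetical) of removing every '0' via reverse iterators, as in the question:

```cpp
#include <algorithm>
#include <string>

// Removes every '0' from the formatted number using reverse iterators.
std::string removeZerosReverse(double value) {
    std::string str = std::to_string(value);  // "5.090000" for 5.09
    // remove_if over the reversed range moves the kept characters to the
    // front of that range, i.e. to the END of the string in forward order.
    auto it = std::remove_if(str.rbegin(), str.rend(),
                             [](char c) { return c == '0'; });
    // In forward order, [begin, it.base()) holds the leftover characters.
    str.erase(str.begin(), it.base());
    return str;
}

// Usage: removeZerosReverse(5.09) returns "5.9".
```

Since remove_if keeps the surviving characters in their relative order in either direction, the forward form str.erase(std::remove(str.begin(), str.end(), '0'), str.end()) produces the same result here.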

efficient way to remove a list of string from a big vector

I am using Visual Studio 2012 (Windows) and I am trying to write an efficient C++ function to remove some words from a big vector of strings.
I am using STL algorithms. I am a C++ beginner, so I am not sure this is the best way to proceed. This is what I have done:
#include <algorithm>
#include <unordered_set>
using std::vector;
vector<std::string> stripWords(vector<std::string>& input,
std::tr1::unordered_set<std::string>& toRemove){
input.erase(
remove_if(input.begin(), input.end(),
[&toRemove](std::string x) -> bool {
return toRemove.find(x) != toRemove.end();
}));
return input;
}
But this doesn't work: it doesn't loop over the whole input vector.
This is how I test my code:
vector<std::string> in_tokens;
in_tokens.push_back("removeme");
in_tokens.push_back("keep");
in_tokens.push_back("removeme1");
in_tokens.push_back("removeme1");
std::tr1::unordered_set<std::string> words;
words.insert("removeme");
words.insert("removeme1");
stripWords(in_tokens,words);
You need the two-argument form of erase. Don't outsmart yourself and write it on separate lines:
auto it = std::remove_if(input.begin(), input.end(),
[&toRemove](std::string x) -> bool
{ return toRemove.find(x) != toRemove.end(); });
input.erase(it, input.end()); // erases an entire range
Your approach using std::remove_if() is nearly correct, but it erases just one element. You need to use the two-argument version of erase():
input.erase(
remove_if(input.begin(), input.end(),
[&toRemove](std::string x) -> bool {
return toRemove.find(x) != toRemove.end();
}), input.end());
std::remove_if() reorders the elements such that the kept elements are in the front of the sequence. It returns an iterator it to the first position which is to be considered the new end of the sequence, i.e., you need to erase the range [it, input.end()).
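A complete version of the corrected function might look like the sketch below. It assumes a C++11 compiler, so std::unordered_set replaces std::tr1::unordered_set; it also takes the set by const reference and returns void, since the vector is modified in place anyway (both are design choices, not requirements).

```cpp
#include <algorithm>
#include <string>
#include <unordered_set>
#include <vector>

// Removes every string found in toRemove, preserving the order of the rest.
void stripWords(std::vector<std::string>& input,
                const std::unordered_set<std::string>& toRemove) {
    input.erase(std::remove_if(input.begin(), input.end(),
                               [&toRemove](const std::string& x) {
                                   return toRemove.count(x) != 0;
                               }),
                input.end());  // two-argument erase: removes the whole tail
}
```

With the question's test data, only "keep" remains in in_tokens.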
You've already gotten a couple of answers about how to do this correctly.
Now, the question is whether you can make it substantially more efficient. The answer to that will depend on another question: do you care about the order of the strings in the vector?
If you can rearrange the strings in the vector without causing a problem, then you can make the removal substantially more efficient.
Instead of removing strings from the middle of the vector (which requires moving all the other strings over to fill in the hole) you can swap all the unwanted strings to the end of the vector, then remove them.
Especially if you're only removing a few strings from near the beginning of a large vector, this can improve efficiency a lot. Just for example, let's assume a string you want to remove is followed by 1000 other strings. With this, you end up swapping only two strings, then erasing the last one (which is fast). With your current method, you end up moving 1000 strings just to remove one.
Better still, even with fairly old compilers, you can expect swapping strings to be quite fast as a rule, typically faster than moving them would be (unless your compiler is new enough to support move assignment).
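The swap-to-the-end idea could be sketched like this (function name hypothetical). It assumes the order of the remaining strings does not matter: each unwanted string is swapped with the last element and popped, so nothing in the middle of the vector has to shift.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Order-NOT-preserving removal: unwanted strings are swapped to the back and popped.
void stripWordsUnordered(std::vector<std::string>& input,
                         const std::unordered_set<std::string>& toRemove) {
    std::vector<std::string>::size_type i = 0;
    while (i < input.size()) {
        if (toRemove.count(input[i]) != 0) {
            input[i].swap(input.back());  // cheap: swaps internal buffers
            input.pop_back();
            // Do not advance i: the swapped-in element has not been checked yet.
        } else {
            ++i;
        }
    }
}
```

For example, removing "x" from {"a", "x", "b", "c"} leaves {"a", "c", "b"}: the last element fills the hole, which is why the order changes.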

Search string for first char different than "X"

The opposite of str.find('X'):
What is the most efficient way of finding first character in std::string that is different than specific char? If I have a string that consists of mainly X'es, but at some point there is another char - how do I find it quickly?
std::string str = "XXXXXXXXXXXXXXX.XXXXXXXXXXX";
size_t index = str.find_first_not_of('X');
But a plain old for loop will be just as good.
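The plain-loop equivalent might be sketched as follows (the name firstNotOf is hypothetical). Like find_first_not_of, it returns std::string::npos when every character matches.

```cpp
#include <string>

// Returns the index of the first character different from ch,
// or std::string::npos if there is none.
std::string::size_type firstNotOf(const std::string& str, char ch) {
    for (std::string::size_type i = 0; i < str.size(); ++i) {
        if (str[i] != ch) {
            return i;
        }
    }
    return std::string::npos;
}
```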
Or, if you want an iterator instead of an index, perhaps like this:
std::string::iterator it = std::find_if(str.begin(), str.end(),
[](char c){ return c != 'X'; });
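If you start from the iterator but later need an index, std::distance converts it. A sketch (hypothetical name) that also handles the "not found" case, where std::find_if returns str.end():

```cpp
#include <algorithm>
#include <iterator>
#include <string>

// Iterator-based search, converted back to an index (npos if every char is 'X').
std::string::size_type firstNonX(const std::string& str) {
    std::string::const_iterator it = std::find_if(str.begin(), str.end(),
                                                  [](char c) { return c != 'X'; });
    if (it == str.end()) {
        return std::string::npos;
    }
    return static_cast<std::string::size_type>(std::distance(str.begin(), it));
}
```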
I think the most efficient way would be to iterate through the string and compare each character with 'X', returning the first one that is different.
Without any prior knowledge about the string, I don't see an approach better than O(n), and calling find('X') successively might be worse than just iterating through the characters.

How can you compare words within a string array?

Basically I want to check a string array to see if any of the words match "and".
Is this possible?
Can you push me in the right direction?
Thanks
I should make it clear that the words are made of individual chars put together; the best way to explain is an example:
abc defg hijk and lmnop <-- each character is in its own element
I recommend you use std::string and not null-terminated char* strings (maybe you already are -- hard to be sure). And use a standard container rather than an array. Then use std::find (which would work on an array too, but containers are better).
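A minimal sketch of the std::find suggestion, assuming the words are already stored as separate std::string elements in a std::vector (the function name is hypothetical):

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Returns true if any element of the container is exactly "and".
bool containsAnd(const std::vector<std::string>& words) {
    return std::find(words.begin(), words.end(), "and") != words.end();
}
```

This matches whole elements only, so "band" does not count as "and".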
Iterate through the array and use int string::compare(const string& str) const to check for matches.
Break from the loop on first match.
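I am guessing you'd like to take care of lower/upper case and of the word appearing at the beginning/end of the string.
#include <algorithm>
#include <cctype>
std::string data;  // the text to search
std::transform(data.begin(), data.end(), data.begin(),
[](unsigned char c) { return static_cast<char>(std::tolower(c)); });
data.insert(0, 1, ' ');  // pad the front so "and" at the start still matches
data.push_back(' ');     // pad the back so "and" at the end still matches
if (data.find(" and ") != std::string::npos) { /* ... */ }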
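The padding idea above can be wrapped in a reusable function. This sketch (hypothetical name containsWord) assumes words are separated by single spaces and that the word being searched for is already lowercase; tabs and punctuation are not treated as separators.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Case-insensitive whole-word search: surround both the text and the word with
// spaces so a match at the start or end still counts, while "band" does not.
bool containsWord(std::string data, const std::string& word) {
    std::transform(data.begin(), data.end(), data.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
    data = " " + data + " ";
    return data.find(" " + word + " ") != std::string::npos;
}
```

If the text lives in a char array as in the question, std::string data(chars, count) can build the string first.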