Remove duplicates in string algorithm - c++

My homework is to remove duplicates from a random string. My idea is to use two loops to solve the problem.
The first one scans every character in the string.
The second one checks whether that character is duplicated. If so, it removes the character.
string content = "Blah blah..."
for (int i = 0; i < content.size(); ++i) {
char check = content.at(i);
for (int j = i + 1; j < content.size() - 1; ++j) {
if (check == content.at(j)) {
content.erase(content.begin()+j);
}
}
}
The problem is that it doesn't work. It always removes the wrong character. It seems to be an index problem, but I don't understand why.
A temporary fix is to change content.erase(content.begin()+j); to content.erase( remove(content.begin() + i+1, content.end(), check),content.end());
But I don't think triggering a "remove by value" scan is a nice way to do it. I want to do it with two loops or fewer.
Any ideas will be appreciated :)

Your loops could look like this:
#include <iostream>
#include <string>
int main()
{
std::string s = "Blah blah...";
std::cout << '\"' << s << '\"' << std::endl;
for ( std::string::size_type i = 0; i < s.size(); i++ )
{
std::string::size_type j = i + 1;
while ( j < s.size() )
{
if ( s[i] == s[j] )
{
s.erase( j, 1 );
}
else
{
++j;
}
}
}
std::cout << '\"' << s << '\"' << std::endl;
return 0;
}
The output is
"Blah blah..."
"Blah b."
There are many other approaches using standard algorithms. For example
#include <iostream>
#include <string>
#include <algorithm>
#include <iterator>
int main()
{
std::string s = "Blah blah...";
std::cout << '\"' << s << '\"' << std::endl;
auto last = s.end();
for ( auto first = s.begin(); first != last; ++first )
{
last = std::remove( std::next( first ), last, *first );
}
s.erase( last, s.end() );
std::cout << '\"' << s << '\"' << std::endl;
return 0;
}
The output is the same as for the previous code example
"Blah blah..."
"Blah b."

If using the standard library is an option, you could use a std::unordered_set to keep the characters seen so far, together with the erase-remove idiom and std::remove_if, as in the following example:
#include <iostream>
#include <string>
#include <unordered_set>
#include <algorithm>
int main() {
std::string str("Hello World!");
std::unordered_set<char> log;
std::cout << "Before: " << str << std::endl;
str.erase(std::remove_if(str.begin(), str.end(), [&] (char const c) { return !(log.insert(c).second); }), str.end());
std::cout << "After: " << str << std::endl;
}

I recommend a two pass approach. The first pass identifies the positions of the duplicated characters; the second pass removes them.
I recommend using a std::set and a std::vector<unsigned int>. The set contains the letters that are in the string. The vector contains the positions of the duplicated letters.
The first pass detects if a letter is present in the set. If the letter exists, the position is appended to the vector. Otherwise the letter is inserted into the set.
For the second pass, sort the vector in descending order.
Erase the character at the position in the vector, then remove the position from the vector.
By erasing characters from the end of the string towards the front, the positions of the remaining duplicates won't change when the character is erased from the string.
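The two passes described above can be sketched roughly as follows. This is only an illustration: the function name remove_duplicates_two_pass is made up for this example, std::size_t is used for positions instead of unsigned int, and because the first pass already collects positions in ascending order, the sketch walks the vector backwards rather than sorting it in descending order.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper name; not taken from the answer above.
std::string remove_duplicates_two_pass(std::string s)
{
    std::set<char> seen;                  // letters encountered so far
    std::vector<std::size_t> duplicates;  // positions of repeated letters

    // Pass 1: insert() reports whether the letter was new. If it was not,
    // remember the position of this repeated letter.
    for (std::size_t i = 0; i < s.size(); ++i)
    {
        if (!seen.insert(s[i]).second)
            duplicates.push_back(i);
    }

    // Pass 2: positions were collected in ascending order, so iterating in
    // reverse erases from the end of the string towards the front and keeps
    // the remaining positions valid.
    for (auto it = duplicates.rbegin(); it != duplicates.rend(); ++it)
        s.erase(*it, 1);

    return s;
}
```

Called with "Blah blah...", this sketch returns "Blah b.", the same result as the loop-based answers.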

I am not sure that this is what is causing your problem, but another problem that I see with your code is in your second for loop. Your j < content.size() - 1 condition should just be
j < content.size().
The reasoning is a little tricky to see at first: here content.size() is not just the length of your string, it also acts as the end bound for your index. By subtracting one from it, you never reach the last char in your string. I don't know if this will help with your initial problem, but who knows?

Note: Your actual problem is maintaining a proper index to the next element in question:
If you do not erase a character, the next element is at the next position.
If you erase a character, the next element will move into the place of the current position (the position stays the same).
Also: There are more efficient solutions (e.g. using a set).

Related

how to find a substring or string literal

I am trying to write a code that will search userInput for the word "darn" and if it is found, print out "Censored". if it is not found, it will just print out the userInput. It works in some cases, but not others. If the userInput is "That darn cat!", it will print out "Censored". However, if the userInput is "Dang, that was scary!", it also prints out "Censored". I am trying to use find() to search for the string literal "darn " (the space is because it should be able to determine between the word "darn" and words like "darning". I am not worrying about punctuation after "darn"). However, it seems as though find() is not doing what I would like. Is there another way I could search for a string literal? I tried using substr() but I couldn't figure out what the index and the len should be.
#include <iostream>
#include <string>
using namespace std;
int main() {
string userInput;
userInput = "That darn cat.";
if (userInput.find("darn ") > 0){
cout << "Censored" << endl;
}
else {
cout << userInput << endl;
} //userText.substr(0, 7)
return 0;
}

The problem here is your condition. std::string::find returns a value of type std::string::size_type, which is an unsigned integer type. When find doesn't find anything, it returns std::string::npos, the largest value of that type, so
if (userInput.find("darn ") > 0)
is true both when the substring is missing and when it is found anywhere except at position 0. It is only false when userInput starts with "darn ". What you need to do is compare against npos, like
if (userInput.find("darn ") != std::string::npos)
Do note that userInput.find("darn ") will not work in all cases. If userInput is just "darn" or "Darn" then it won't match. The space needs to be handled as a separate element. For example:
std::string::size_type position = userInput.find("darn");
if (position != std::string::npos) {
// now you can check which character is at userInput[position + 4]
}
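As a rough sketch of that last step, the check on the neighbouring characters could look like the following. The function name contains_word is made up for this example, and two choices here are assumptions rather than part of the answer: a "word boundary" means the match is not directly preceded or followed by a letter, and the comparison is case-sensitive, so "Darn" would still not match "darn".

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: true if `word` occurs in `text` as a whole word,
// i.e. not directly preceded or followed by a letter.
bool contains_word(const std::string& text, const std::string& word)
{
    if (word.empty())
        return false;

    std::string::size_type pos = text.find(word);
    while (pos != std::string::npos)
    {
        const std::string::size_type after = pos + word.size();
        // The cast avoids undefined behavior for negative char values.
        const bool left_ok = pos == 0 ||
            !std::isalpha(static_cast<unsigned char>(text[pos - 1]));
        const bool right_ok = after == text.size() ||
            !std::isalpha(static_cast<unsigned char>(text[after]));
        if (left_ok && right_ok)
            return true;
        pos = text.find(word, pos + 1);  // keep looking after this hit
    }
    return false;
}
```

With this sketch, "That darn cat!" is reported as containing "darn", while "Dang, that was scary!" and "I am darning socks" are not.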

std::search and std::string::replace were made for this:
#include <iostream>
#include <string>
#include <algorithm>
using namespace std;
int main() {
string userInput;
userInput = "That darn cat is a varmint.";
static const string bad_words [] = {
"darn",
"varmint"
};
for(auto&& bad : bad_words)
{
const auto size = distance(bad.begin(), bad.end());
auto i = userInput.begin();
while ((i = std::search(i, userInput.end(), bad.begin(), bad.end())) != userInput.end())
{
// modify this part to allow more or fewer leading letters from the offending words
constexpr std::size_t leading_letters = 1;
// never allow the whole word to appear - hide at least the last letter
auto leading = std::min(leading_letters, std::size_t(size - 1));
auto replacement = std::string(i, i + leading) + std::string(size - leading, '*');
userInput.replace(i, i + size, replacement.begin(), replacement.end());
i += size;
}
}
cout << userInput << endl;
return 0;
}
expected output:
That d*** cat is a v******.

shuffle letters in a string c++

I'm meant to write a function that keeps the first and last letter of a string the same and ignores any non-letters after the last letter. I'm supposed to use the std::random_shuffle(). I read the documentation however I don't seem to grasp the concept of this function. This is my code:
#include <iostream>
#include <algorithm>
#include <string>
std::string mix(std::string s){
int last_char;
if(std::isalpha(s[s.size()-1]) == true){
last_char = s.size()-1;
} else {
for(int i=s.size()-1 ; i>0; --i){
if((std::isalpha(s[i]) == false) && (std::isalpha(s[i-1])==true)){
last_char = i -1;
break;
}
}
}
std::random_shuffle(&s[1],&s[last_char]);
return s;
}
int main(){
std::string test = "Hello";
std::cout << mix(test) << std::endl;
}
edit: Now, however, I keep getting the error: segmentation fault (core dumped). Anyone got an idea why? I can't seem to find the problem.

std::random_shuffle takes iterators or pointers as arguments, not values in the array/container to be shuffled. Your call to std::random_shuffle should probably be:
std::random_shuffle(&s[1],&s[last_char]);
Note that the second parameter is the ending iterator value. The ending iterator doesn't point to the last value to shuffle, but to the one after that.
This is not the only problem with the shown code. You'll need to fix several bugs in your code that precedes the call to std::random_shuffle. For example:
for(int i=s.size() ; i>0; --i){
if((std::isalpha(s[i]) == false) && (std::isalpha(s[i-1])==true)){
s.size() gives you the size of the string. On the first iteration, i equals s.size(), so s[i] is one position past the last character. Since C++11 that access yields the terminating null character rather than a letter of the string (and it must not be modified), so it is not one of the characters you want to examine. In a string that contains n characters, the characters are s[0] through s[n-1].
You will need to fix your algorithm so that last_char ends up being the index of the character after the last one you want to shuffle, and then use the std::random_shuffle call above.
Or, alternatively, compute last_char to be the index of the last character to shuffle, and call
std::random_shuffle(&s[1],&s[last_char+1]);
Either approach will be fine.
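For illustration, here is one way the whole function might be put together. It is a sketch under a few assumptions, not the asker's required solution: it swaps in std::shuffle with a std::mt19937 generator because std::random_shuffle was deprecated in C++14 and removed in C++17, it treats s[0] as the first letter, and it computes an exclusive end index instead of last_char. One more thing that may bite in the question's code: std::isalpha returns a nonzero int for letters, not necessarily 1, so comparing its result with true can fail and leave last_char uninitialized.

```cpp
#include <algorithm>
#include <cctype>
#include <random>
#include <string>

// Sketch: shuffle everything between the first character and the last
// letter, leaving trailing non-letters untouched.
std::string mix(std::string s)
{
    // `end` becomes the index one past the last letter.
    std::string::size_type end = s.size();
    while (end > 0 && !std::isalpha(static_cast<unsigned char>(s[end - 1])))
        --end;

    // With fewer than three characters up to the last letter there is
    // nothing between the first character and the last letter.
    if (end < 3)
        return s;

    static std::mt19937 gen{std::random_device{}()};
    // Shuffle the half-open range [1, end - 1): s[0] and s[end - 1] stay put.
    std::shuffle(s.begin() + 1, s.begin() + (end - 1), gen);
    return s;
}
```

For "Hello!!" this keeps 'H', 'o' and the trailing "!!" in place and only permutes "ell".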

You'll need to find the leftmost non-letter in the run of non-letters at the right end of your string.
One place to the left of it is the position of the last letter.
One place to the right of the string's first character is the first position to shuffle.
Simply invoke random_shuffle with your "first" and "last".
Here is a useful link:
http://www.cplusplus.com/reference/algorithm/random_shuffle/
Remember that "begin" is inclusive and "end" is exclusive.

Something to get you started. It has at least one corner case that you'll have to fix. Visit cppreference.com to understand how the algorithms work.
#include <iostream>
#include <cctype>
#include <algorithm>
#include <string>
std::string
special_shuffle(std::string s)
{
if (s.size() < 3) return s;
auto begin = std::find_if(s.begin(), s.end(), ::isalpha);
auto end = std::find_if(s.rbegin(), s.rend(), ::isalpha).base();
std::random_shuffle(++begin, --end);
return s;
}
int
main()
{
std::string s1 = "Hello World!";
std::string s2 = "AB";
std::string s3 = "A";
std::string s4 = "";
std::string s5 = "a string going from a to z";
std::cout << s1 << " --> " << special_shuffle(s1) << "\n"
<< s2 << " --> " << special_shuffle(s2) << "\n"
<< s3 << " --> " << special_shuffle(s3) << "\n"
<< s4 << " --> " << special_shuffle(s4) << "\n"
<< s5 << " --> " << special_shuffle(s5) << "\n";
}
Compile and run:
$ g++ example.cpp -std=c++14 -Wall -Wextra
$ ./a.out
Hello World! --> Hooll eWlrd!
AB --> AB
A --> A
-->
a string going from a to z --> aarfritomgi nnso t g goz

How to advance iterator c++ through every character in string

What I'm confused about is that I have a map which is made up of size_t of string as the key, and strings as the value.
std::multimap<size_t, std::string> wordMap;
Then I have a pair that stores the equal_range for all strings with size of 4. Then I want to iterate from the start of that equal_range to the end of that equal_range. The start is my pair.first and the end is my pair.second. How would I iterate through every character that my pair.first points to and then compare that to every word in between pair.first and pair.second?
pair<multimap<size_t, string>::iterator, multimap<size_t, string>::iterator> key_range;
key_range = wordMap.equal_range(n);
Basically I want to compare every letter in word1 to every character in word2.
Advance itr2 which is word2 to the next word and compare every letter in that to every letter in word1. Do this for every word then advance itr1 which is word1 to another word and compare that to every word.
How would I get every character itr2 points to? I think the first for loop accomplishes this for the first iterator but I don't know how to do it for itr2.
for (word_map::iterator itr = key_range.first; itr != key_range.second; itr++) { //this loop will iterate through every word to be compared
for (word_map::iterator itr2 = next(key_range.first); itr2 != key_range.second; itr2++) { //this loop will iterate through every word being compared against itr1
int i = 0;
int hit = 0;
for (char& c1 : itr->first) {
char& c2{ (itr2)->first[i] };
if(c1 != c2)
hit++;
i++;
}
}
I'd like to compare every letter in every word against each other as long as they have the same string size. Then if hit == 1, that means the words are only off by 1 character and they should be mapped or stored in some type of STL container that groups them. I'm still new to STL so I'm thinking a set, but I need to read more into it.

First, you'd be more likely to get assistance if you provided a minimal compilable example. I'm assuming your words are std::strings for this answer, but you know what they say about assuming.
There are algorithms like "zip", which is implemented in Boost specifically for iterating over multiple collections simultaneously, but I don't think there's anything comparable in the standard library.
A simple but unpleasantly fiddly approach would be just to manually iterate through both strings. This will output each letter in the two words until either one word ends, or there's a difference.
Note all the fiddly bits: you need to make sure both iterators are valid at all times in case one word ends before the other, and working out what actually happened is a bit cumbersome.
#include <string>
#include <iostream>
int main()
{
std::string word1 = "capsicum";
std::string word2 = "capsicube";
std::string::iterator it1 = word1.begin();
std::string::iterator it2 = word2.begin();
while (it1 != word1.end() && it2 != word2.end())
{
// characters are different!
if (*it1 != *it2)
break;
// characters are the same
std::cout << "Both have: " << *it1 << std::endl;
// advance both iterators
++it1;
++it2;
}
if (it1 == word1.end() && it2 == word2.end())
{
std::cout << "Words were the same!" << std::endl;
}
else if (it1 == word1.end())
{
std::cout << "Word 1 was shorter than word 2." << std::endl;
}
else if (it2 == word2.end())
{
std::cout << "Word 1 was longer than word 2." << std::endl;
}
else
{
std::cout << "Words were different after position " << (it1 - word1.begin())
<< ": '" << *it1 << "' vs '" << *it2 << "'" << std::endl;
}
}

New answer, since the question was significantly updated. I'm still not sure this will do exactly what you want, but I think you should be able to use it to get where you want to go.
I've written this as a minimal, complete, verifiable example, which is ideally how you should pose your questions. I've also used C++11 features for brevity/readability.
Hopefully the inline comments will explain things sufficiently for you to at least be able to do your own research for anything you don't fully understand, but feel free to comment if you have any more questions. The basic idea is to store the first word (using the key_range.first iterator), and then start iterating from the following iterator using std::next(), until we reach the end iterator in key_range.second.
This then gives us word1 outside of the loop, and word2 within the loop, which will be set to every other word in the list. We then use the "dual iteration" technique I posted in my other answer to compare each word character by character.
#include <map>
#include <string>
#include <iostream>
int
main()
{
std::multimap<size_t, std::string> wordMap;
wordMap.insert({4, "dogs"});
wordMap.insert({4, "digs"});
wordMap.insert({4, "does"});
wordMap.insert({4, "dogs"});
wordMap.insert({4, "dibs"});
// original type declaration...
// std::pair<std::multimap<size_t, std::string>::iterator, std::multimap<size_t, std::string>::iterator> key_range;
// C++11 type inference...
auto key_range = wordMap.equal_range(4);
// make sure the range wasn't empty
if (key_range.first == key_range.second)
{
std::cerr << "No words in desired range." << std::endl;
return 1;
}
// get a reference to the first word
std::string const& word1 = key_range.first->second;
std::cout << "Comparing '" << word1 << "' to..." << std::endl;
// loop through every iterator from the key_range, skipping the first
// (since that's the word we're comparing everything else to)
for (auto itr = std::next(key_range.first); itr != key_range.second; ++itr)
{
// create a reference for clarity
std::string const& word2 = itr->second;
std::cout << "... '" << word2 << "'";
// hit counter; where hit is defined as characters not matching
int hit = 0;
// get iterators to the start of each word
auto witr1 = word1.begin();
auto witr2 = word2.begin();
// loop until we reach the end of either iterator. If we're completely
// confident the two words are the same length, we could only check
// one of them; but defensive coding is a good idea.
while (witr1 != word1.end() && witr2 != word2.end())
{
// dereferencing the iterators will yield a char; so compare them
if (*witr1 != *witr2)
++hit;
// advance both iterators
++witr1;
++witr2;
}
// do something depending on the number of hits
if (hit <= 1)
{
std::cout << " ... close enough!" << std::endl;
}
else
{
std::cout << " ... not a match, " << hit << " hits." << std::endl;
}
}
}
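The example above compares everything against the first word only. If every word should be compared with every other word, as the question describes, the comparison could be moved into a helper and run over all pairs. The following is a sketch with made-up names (differ_by_one, pairs_off_by_one) that takes the words as a plain std::vector<std::string> rather than the multimap range. Note that it uses the question's condition hit == 1, so identical words are not treated as a match, unlike the hit <= 1 check above.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: true if the words have equal length and differ in
// exactly one position.
bool differ_by_one(const std::string& a, const std::string& b)
{
    if (a.size() != b.size())
        return false;

    int hit = 0;  // number of positions where the characters differ
    for (std::size_t i = 0; i < a.size(); ++i)
    {
        if (a[i] != b[i])
            ++hit;
    }
    return hit == 1;
}

// Hypothetical helper: collect every unordered pair of words that is off
// by exactly one character.
std::vector<std::pair<std::string, std::string>>
pairs_off_by_one(const std::vector<std::string>& words)
{
    std::vector<std::pair<std::string, std::string>> result;
    for (std::size_t i = 0; i < words.size(); ++i)
    {
        // start at i + 1 so each pair is visited once and no word is
        // compared with itself
        for (std::size_t j = i + 1; j < words.size(); ++j)
        {
            if (differ_by_one(words[i], words[j]))
                result.emplace_back(words[i], words[j]);
        }
    }
    return result;
}
```

For the words "dogs", "digs" and "dibs", this yields the pairs (dogs, digs) and (digs, dibs).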

How to compare two arrays and return non matching values in C++

I would like to parse through two vectors of strings and find the strings that match each other and the ones that do not.
Example of what I want to get:
input vector 1 would look like: [string1, string2, string3]
input vector 2 would look like: [string2, string3, string4]
Ideal output:
string1: No Match
string2: Match
string3: Match
string4: No Match
At the moment I use this code:
vector<string> function(vector<string> sequences, vector<string> second_sequences){
for(vector<string>::size_type i = 0; i != sequences.size(); i++) {
for(vector<string>::size_type j = 0; j != second_sequences.size(); j++){
if (sequences[i] == second_sequences[j]){
cout << "Match: " << sequences[i];
}else{
cout << "No Match: " << sequences[i];
cout << "No Match: " << second_sequences[j];
}
}
}
}
It works great for the ones that match, but iterates over everything so many times,
and the ones that do not match get printed a large number of times.
How can I improve this?

The best code is the code that you did not have to write.
If you use a standard (STL) map container, it will take care of sorting and remembering the different strings you encounter.
So let the container work for us.
Here is a small piece of quickly written code. For this syntax you need to enable at least C++11 in your compiler (-std=c++11 on gcc, for example). The pre-C++11 syntax is much more verbose (though worth knowing for educational purposes).
There is only a single loop per vector, with no nesting.
This is only a hint (my code does not take into account that string4 could be present more than once in the second vector; I leave it to you to adapt it to your exact needs).
#include <iostream>
#include <vector>
#include <string>
#include <map>
using namespace std;
vector<string> v1 { "string1","string2","string3"};
vector<string> v2 { "string2","string3","string4"};
//ordered map will take care of "alphabetical" ordering
//The key are the strings
//the value is a counter ( or could be any object of your own
//containing more information )
map<string,int> my_map;
int main()
{
//The first vector feeds the map before comparison with
//the second vector
for ( const auto & cstr_ref:v1)
my_map[cstr_ref] = 0;
//We will look into the second vector ( it could also be the third,
//the fourth... )
for ( const auto & cstr_ref:v2)
{
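//find returns end() when the key is absent; equal_range().first would
//instead point at the next larger key, which is not a match
auto iter = my_map.find(cstr_ref);
if ( my_map.end() != iter )
{
//if the element already exists we increment the counter
iter->second += 1;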
}
else
{
//otherwise we put the string inside the map
my_map[cstr_ref] = 0;
}
}
for ( const auto & map_iter: my_map)
{
if ( 0 < map_iter.second )
{
cout << "Match :";
}
else
{
cout << "No Match :" ;
}
cout << map_iter.first << endl;
}
return 0;
}
Output:
No Match :string1
Match :string2
Match :string3
No Match :string4

std::sort(std::begin(v1), std::end(v1));
std::sort(std::begin(v2), std::end(v2));
std::vector<std::string> common_elements;
std::set_intersection(std::begin(v1), std::end(v1)
, std::begin(v2), std::end(v2)
, std::back_inserter(common_elements));
for(auto const& s : common_elements)
{
std::cout<<s<<std::endl;
}
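The snippet above yields only the strings that match. For the strings that do not match, the same sorted-range family has std::set_symmetric_difference, which keeps the elements present in exactly one of the two ranges. A minimal sketch, with a made-up function name (non_matching) and the vectors taken by value so the caller's data is not reordered:

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: strings that appear in exactly one of the two vectors.
std::vector<std::string> non_matching(std::vector<std::string> v1,
                                      std::vector<std::string> v2)
{
    // The set algorithms require sorted input ranges.
    std::sort(v1.begin(), v1.end());
    std::sort(v2.begin(), v2.end());

    std::vector<std::string> result;
    std::set_symmetric_difference(v1.begin(), v1.end(),
                                  v2.begin(), v2.end(),
                                  std::back_inserter(result));
    return result;
}
```

For the vectors from the question, this returns string1 and string4.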

How do I find all the positions of a substring in a string?

I want to search a large string for all the locations of a string.

The two other answers are correct, but they are very slow and have O(N^2) complexity. The Knuth-Morris-Pratt algorithm, however, finds all occurrences of a substring in O(N) time.
Edit:
There is also another algorithm, the so-called "Z-function", with O(N) complexity. I couldn't find an English source for this algorithm (maybe because there is another, more famous function with the same name, the Z-function of Riemann), so I will just put its code here and explain what it does.
#include <iostream>
#include <string>
#include <vector>
using namespace std;
void calc_z (string &s, vector<int> & z)
{
int len = s.size();
z.resize (len);
int l = 0, r = 0;
for (int i=1; i<len; ++i)
if (z[i-l]+i <= r)
z[i] = z[i-l];
else
{
l = i;
if (i > r) r = i;
for (z[i] = r-i; r<len; ++r, ++z[i])
if (s[r] != s[z[i]])
break;
--r;
}
}
int main()
{
string main_string = "some string where we want to find substring or sub of string or just sub";
string substring = "sub";
string working_string = substring + main_string;
vector<int> z;
calc_z(working_string, z);
//after this, z[i] is the length of the longest prefix of working_string
//that matches the part of working_string starting at position i.
//So the positions where z[i] >= substring.size()
//are the positions of the matches.
for(int i = substring.size(); i < working_string.size(); ++i)
if(z[i] >=substring.size())
cout << i - substring.size() << endl; //to get position in main_string
}
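The Knuth-Morris-Pratt algorithm mentioned at the start of this answer is not shown there. As a rough sketch of how it might look (the function name kmp_find_all and the variable names are made up for this example, and returning no matches for an empty pattern is an assumption):

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Sketch of Knuth-Morris-Pratt: returns the start index of every
// occurrence of `pattern` in `text`, including overlapping ones.
std::vector<std::size_t> kmp_find_all(const std::string& text,
                                      const std::string& pattern)
{
    std::vector<std::size_t> positions;
    const std::size_t m = pattern.size();
    if (m == 0 || m > text.size())
        return positions;

    // fail[i] = length of the longest proper prefix of pattern[0..i]
    // that is also a suffix of pattern[0..i].
    std::vector<std::size_t> fail(m, 0);
    for (std::size_t i = 1, k = 0; i < m; ++i)
    {
        while (k > 0 && pattern[i] != pattern[k])
            k = fail[k - 1];
        if (pattern[i] == pattern[k])
            ++k;
        fail[i] = k;
    }

    // Scan the text once; k is the number of pattern characters matched.
    for (std::size_t i = 0, k = 0; i < text.size(); ++i)
    {
        while (k > 0 && text[i] != pattern[k])
            k = fail[k - 1];
        if (text[i] == pattern[k])
            ++k;
        if (k == m)
        {
            positions.push_back(i + 1 - m);
            k = fail[k - 1];  // fall back so overlapping matches are found
        }
    }
    return positions;
}
```

The text index never moves backwards, which is where the linear running time comes from.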

Using std::string::find. You can do something like:
std::string::size_type start_pos = 0;
while( std::string::npos !=
( start_pos = mystring.find( my_sub_string, start_pos ) ) )
{
// do something with start_pos or store it in a container
++start_pos;
}
EDIT: Doh! Thanks for the remark, Nawaz! Better?
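Wrapped into a function that stores the positions in a container, the loop above might look like this. The name find_all is made up for this example, and returning no matches for an empty search string is an assumption.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: all start positions of `sub` in `text`.
std::vector<std::string::size_type>
find_all(const std::string& text, const std::string& sub)
{
    std::vector<std::string::size_type> positions;
    if (sub.empty())
        return positions;  // assumption: an empty needle yields no matches

    std::string::size_type pos = 0;
    while ((pos = text.find(sub, pos)) != std::string::npos)
    {
        positions.push_back(pos);
        ++pos;  // step by one so overlapping matches are found too
    }
    return positions;
}
```

Advancing by one after each hit means "aa" is found at positions 0, 1 and 2 in "aaaa"; advance by sub.size() instead if overlapping matches should be skipped.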

I'll add for completeness that another approach is possible with std::search, which works like std::string::find; the difference is that you work with iterators. Something like:
std::string::iterator it(str.begin()), end(str.end());
std::string::iterator s_it(search_str.begin()), s_end(search_str.end());
it = std::search(it, end, s_it, s_end);
while(it != end)
{
// do something with this position..
// a tiny optimisation could be to buffer the result of the std::distance - heyho..
// std::advance returns void, so std::next (from <iterator>) is used to step past the match
it = std::search(std::next(it, std::distance(s_it, s_end)), end, s_it, s_end);
}
I find that this sometimes outperforms std::string::find, esp. if you represent your string as a vector<char>.

Simply use std::string::find() which returns the position at which the substring was found, or std::string::npos if none was found.
Here is the example taken from its documentation:
// string::find
#include <iostream>
#include <string>
using namespace std;
int main ()
{
string str ("There are two needles in this haystack with needles.");
string str2 ("needle");
size_t found;
// different member versions of find in the same order as above:
found=str.find(str2);
if (found!=string::npos)
cout << "first 'needle' found at: " << int(found) << endl;
found=str.find("needles are small",found+1,6);
if (found!=string::npos)
cout << "second 'needle' found at: " << int(found) << endl;
found=str.find("haystack");
if (found!=string::npos)
cout << "'haystack' also found at: " << int(found) << endl;
found=str.find('.');
if (found!=string::npos)
cout << "Period found at: " << int(found) << endl;
// let's replace the first needle:
str.replace(str.find(str2),str2.length(),"preposition");
cout << str << endl;
return 0;
}
