Remove repetitions of patterns in a string


Let's say I have the following string:
aabccd
I'd like to find and remove all patterns that are repeated. In this example, one a is followed by another one, which is a repetition. Same thing about cc.
The final string would be:
bd
Another example: banana => ba (because two an or two na).
Here's the algorithm I've come up with:
Split the string into two halves and take the bigger one.
Say it has a length of 4: I check whether it occurs in the other half.
On the next loop iteration, I shift the window by one character (still of length 4), and repeat until I have tried all possible patterns.
Then I shorten the pattern by one character (so length 3) and start again.
// etc.
It would do something like this:
helloguys -> hell -> ello -> llog
And some code, not working as it should:
std::string processWord(std::string word)
{
// Split word in the two halves
std::string firstHalf = word.substr(0, word.size() / 2);
std::string secondHalf = word.substr(firstHalf.size());
// Check if the smaller half is present in the bigger one (firstHalf is always shorter than or equal in length to secondHalf)
if (secondHalf.find(firstHalf) != std::string::npos)
{
std::cout << firstHalf << " is found in " << secondHalf << std::endl;
// Remove firstHalf from secondHalf
word.replace(word.find(firstHalf), firstHalf.size(), "");
std::cout << word << std::endl;
return word;
}
for (size_t i = 1; i < secondHalf.size(); ++i)
{
// Get secondHalf minus one character at each loop turn to check occurences
std::string occurence = secondHalf.substr(0, secondHalf.size() - i);
// Mark the first occurence
size_t startIndex = indexOf(word, occurence);
// Mark the last occurence
size_t lastIndex = startIndex;
int totalOccurences = 1;
// As long as there are occurences following each other, we continue
// Example: "anandgdfgd". We see "an" twice. But it would not work with "anshfsduihsuan" since they are not following each other
while (word.find(occurence, lastIndex + occurence.size()) != std::string::npos)
{
lastIndex += occurence.size();
totalOccurences++;
}
std::ostringstream oss;
oss << "[" << totalOccurences << "*" << occurence << "]";
word.replace(startIndex, lastIndex, oss.str());
}
// Some code is missing here
return word;
}
I'm sure this problem has already been solved but I can't find anything. Any tips?

The rules have not been clearly defined yet, but I think building a suffix array is a good starting point. You can compare neighbouring suffixes, find the length of their common prefixes, and decide which substrings are the best candidates for removal.
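Since the rules are still open, the following is only a brute-force sketch under one assumed reading of them, not the suffix-array approach suggested above: a pattern counts as repeated when it is immediately followed by a copy of itself, the longest such pattern at each position wins, and the whole run is removed. The name removeAdjacentRepeats is made up for this example.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (assumed rules): removes every run of immediately
// repeated patterns, scanning left to right and preferring the longest
// pattern at each position.
std::string removeAdjacentRepeats(std::string s)
{
    std::size_t i = 0;
    while (i < s.size()) {
        bool removed = false;
        // Try the longest candidate pattern first.
        for (std::size_t len = (s.size() - i) / 2; len > 0; --len) {
            if (s.compare(i, len, s, i + len, len) != 0)
                continue;
            // Extend over further back-to-back copies of the same pattern.
            std::size_t end = i + 2 * len;
            while (end + len <= s.size() && s.compare(i, len, s, end, len) == 0)
                end += len;
            s.erase(i, end - i); // drop the whole run, first copy included
            removed = true;
            break;
        }
        if (!removed)
            ++i; // nothing repeats here, move on
    }
    return s;
}
```

Under these assumptions, removeAdjacentRepeats("aabccd") gives "bd" and removeAdjacentRepeats("banana") gives "ba", matching the examples in the question. The scan is roughly cubic in the worst case, so for long inputs a suffix array may be the better fit.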

Related

Comparing two arrays against each other to find a common character

Consider:
int main() {
int x;
const int Maxword = 5;
char Guess[Maxword] {};
std::string words[Maxword] = {
"Hello",
"World",
"Shift",
"Green",
"Seven"
};
srand(time(NULL));
int iSecret = rand() % Maxword;
std::string Word(words[iSecret]);
for (int i = 0; i < 5; i++) {
std::cout << Word[i] << std::endl;
}
for (int i = 0; i < 5; i++) {
std::cout << ("Please enter the letters you would like to guess") << std::endl;
std::cin >> Guess[i];
std::cout << Guess[i] << std::endl;
}
for (int i = 0; i < 5; i++) {
if (Guess[i] == Word[i]) {
std::cout << Guess[i] << "\t" << "Is in the right place" << std::endl;
} else if (Guess[i] != Word[i]) {
std::cout << Guess[i] << "\t" << "Isnt in the right place" << std::endl;
} else {
}
}
}
Here I have my code and I would like to compare the two character arrays Guess and Word to check for a common character. How would I do this?

According to your clarification in the comments, you want to have three kinds of output:
the letter is in the word and guessed in the correct position
the letter is in the word but guessed in the wrong position
the letter is not in the word
Your current loop is handling only case #1.
To handle #2 (and by extension #3) you can use std::string::find. That function will return the position of the character in the string, or if not found it returns std::string::npos.
So, the logic could be:
if (Guess[i] == Word[i]) {
std::cout << Guess[i] << "\tis in the right place\n";
} else if (Word.find(Guess[i]) != std::string::npos) {
std::cout << Guess[i] << "\tisn't in the right place\n";
} else {
std::cout << Guess[i] << "\tisn't in the word\n";
}
See the documentation for std::string::find
In C++23, there is a nicer way to test if the string contains a character: std::string::contains
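The three-way check above could be wrapped in a small function so that the output code and the comparison logic stay separate. This is a minimal sketch; the names Match and classify are hypothetical and not part of the original program.

```cpp
#include <cstddef>
#include <string>

// Hypothetical names, introduced only for this sketch.
enum class Match { RightPlace, WrongPlace, NotInWord };

// Classifies the guessed character at position i against the secret word.
Match classify(const std::string& word, char guess, std::size_t i)
{
    if (i < word.size() && word[i] == guess)
        return Match::RightPlace;   // case 1: correct position
    if (word.find(guess) != std::string::npos)
        return Match::WrongPlace;   // case 2: in the word, elsewhere
    return Match::NotInWord;        // case 3: not in the word at all
}
```

In the question's loop this would be called as classify(Word, Guess[i], i), with the result deciding which message to print.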
As a closing comment, it's important that you understand the distinction between a string object and an array:
In your question and follow-up discussion, you keep referring to Word as an array, but it is not -- it is a std::string. It overloads operator[](size_t) so that you can use array syntax to access characters, but that does not make it an array.
The only arrays in your program are Guess (an array of char values) and words (an array of std::string values).

I know this is strictly speaking not directly an answer to your question but I believe it might help you and others nevertheless.
Correct me if I am wrong but I believe what you are trying to do is to implement the game of Wordle.
Wordle with Map
While your approach with arrays would work, I have implemented the game differently, using maps. Hash maps offer lookups in average constant time, i.e. O(1), so for large inputs they are generally faster than searching in arrays, which is why I use them here. However, for this small example with just 5 characters, searching an array may well be faster than using a map.
Pre-processing
As soon as you know the word you can pre-process a map which maps each character in the word to its position in the word. As there might be many positions for the same character I store the positions in a list.
Checking guesses
Now, when you have to check a guess, you iterate over all characters of the guess and use the pre-processed map to check whether each character is in the word (i.e. in the map) or not. Each lookup takes constant time O(1) on average, independent of the size of the word.
If the character is in the map, retrieve the associated list of positions for this character and iterate over this list in order to check if any of the positions match the position in the guess.
If so, the letter is at the correct position, if not, it is in the word but at a wrong position.
Java program
Unfortunately I am not a C++ programmer but I believe the following Java code should easily be portable to C++. Instead of Java's HashMap you may use std::unordered_map and the rest should just be some small differences in syntax.
This is just a simple implementation of Wordle in Java without getting into too much detail but it should be sufficient as a reference for your implementation.
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
/**
* Simple implementation of Wordle.
*/
public class Wordle {
// word to guess
private String word;
// map: Character in word -> list of positions of that character in the word
private HashMap<Character, List<Integer>> wordLookup;
public Wordle(String word){
this.word = word;
// allocate memory for lookup table containing characters of word and their positions within the word
this.wordLookup = new HashMap<>(word.length());
var wordCharArr = word.toCharArray();
// for each character in the word save its position
for (int i = 0; i < wordCharArr.length; i++) {
if(this.wordLookup.containsKey(wordCharArr[i])){
// a character appears more than once -> add the other position to the list of positions
var positionsList = this.wordLookup.get(wordCharArr[i]);
positionsList.add(i);
} else {
var positionsList = new ArrayList<Integer>();
positionsList.add(i);
this.wordLookup.put(wordCharArr[i], positionsList);
}
}
}
public void checkGuess(String guess){
checkGuess(guess.toCharArray());
}
public void checkGuess(char[] guess){
System.out.println("Checking you guess...");
for (int i = 0; i < guess.length; i++) {
var guessedChar = guess[i];
// lookup if guessed character is in word
if(wordLookup.containsKey(guessedChar)){
// guessed character is in word
// is it in (one of) the right position(s)?
var posInWord = wordLookup.get(guessedChar);
int finalI = i;
if(posInWord.stream().anyMatch(pos -> finalI == pos)){
// character also is at the right position (this implementation does not signal that there may be multiple positions this character might go)
System.out.printf("> %c is correct at position %s!\n", guessedChar, i);
}
else {
// character is not at the right position
System.out.printf("> %c is in word but not at the right position!\n", guessedChar);
}
} else {
// guessed character is not in word
System.out.printf("> %c is not in word!\n", guessedChar);
}
}
System.out.println("Done.");
}
public String getWord() {
return word;
}
public void setWord(String word) {
this.word = word;
}
public HashMap<Character, List<Integer>> getWordLookup() {
return wordLookup;
}
public void setWordLookup(HashMap<Character, List<Integer>> wordLookup) {
this.wordLookup = wordLookup;
}
public static void main(String[] args) {
// create a game of Wordle for the word "hello"
var aGameOfWordle = new Wordle("hello");
// get user input for a guess here
//...
// check guesses..
aGameOfWordle.checkGuess("hello");
aGameOfWordle.checkGuess("holas");
}
}
Expected output:
Checking you guess...
> h is correct at position 0!
> e is correct at position 1!
> l is correct at position 2!
> l is correct at position 3!
> o is correct at position 4!
Done.
Checking you guess...
> h is correct at position 0!
> o is in word but not at the right position!
> l is correct at position 2!
> a is not in word!
> s is not in word!
Done.
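For readers who want the same idea in C++, here is a rough port of the lookup part, assuming std::unordered_map in place of Java's HashMap as suggested above. It is a sketch rather than a line-by-line translation: the names WordleChecker and Verdict are hypothetical, and instead of printing it returns one verdict per guessed character.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical names, introduced only for this sketch.
enum class Verdict { Correct, Misplaced, Absent };

class WordleChecker {
public:
    // Pre-processing: map each character of the word to all its positions.
    explicit WordleChecker(const std::string& word)
    {
        for (std::size_t i = 0; i < word.size(); ++i)
            positions_[word[i]].push_back(i);
    }

    // Returns one verdict per character of the guess.
    std::vector<Verdict> check(const std::string& guess) const
    {
        std::vector<Verdict> result;
        result.reserve(guess.size());
        for (std::size_t i = 0; i < guess.size(); ++i) {
            auto it = positions_.find(guess[i]);
            if (it == positions_.end()) {
                result.push_back(Verdict::Absent); // not in the word
                continue;
            }
            bool here = false;
            for (std::size_t p : it->second) {
                if (p == i) { here = true; break; }
            }
            result.push_back(here ? Verdict::Correct : Verdict::Misplaced);
        }
        return result;
    }

private:
    std::unordered_map<char, std::vector<std::size_t>> positions_;
};
```

Like the Java version, this sketch does not account for how many times a letter occurs, so a letter guessed more often than it appears in the word is still reported as misplaced.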

Cannot understand the for loops used in this code

I was trying out a solution to a question, when I came across this code snippet written in C++:
string s;
cin >> s;
vector<int> r;
for (string t: {"twone", "one", "two"}) {
for (size_t pos = 0; (pos = s.find(t, pos)) != string::npos;) {
s[pos + t.length() / 2] = '?';
r.push_back(pos + t.length() / 2);
}
}
cout << r.size() << endl;
for (auto rr: r)
cout << rr + 1 << " ";
cout << endl;
I am new to the language and was unable to understand what is happening in the second (nested) for loop and the 3rd for loop. Can someone help me to understand?

The first and the third loops are range-based for loops.
The first loop iterates over a container of strings. So t takes successively the value "twone", "one", and "two"
The second loop searches for all occurrences of t in the string s (each search starts from the position pos of the previous occurrence found). As long as an occurrence is found, it does:
s[pos + t.length() / 2] = '?';
r.push_back(pos + t.length() / 2);
The push_back() stores the position of the middle of each occurrence found in a vector of integers.
The third loop iterates over this vector of stored positions and prints the elements (the stored positions count from 0; the +1 shifts the printed positions so that counting starts at 1).

One of the main ways to try and understand complex code is to try and simplify it. It also helps to know what the involved functions do, so a reference to std::string::find is helpful to read.
First of all, let's skip the body and concentrate only on the loop itself:
for (size_t pos = 0; (pos = s.find(t, pos)) != string::npos;) {
}
All for loops could be seen as a while loop, and while loops could be somewhat easier to understand and follow, so we convert it to such a while loop:
size_t pos = 0;
while ((pos = s.find(t, pos)) != string::npos)
{
}
This might not help so much as it's the condition that is most likely the hard part to understand, so then we simplify that as well:
size_t pos = 0;
pos = s.find(t, pos);
while (pos != string::npos)
{
pos = s.find(t, pos);
}
The initialization of pos could then be further simplified:
size_t pos = s.find(t);
while (pos != string::npos)
{
pos = s.find(t, pos);
}
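Now the loop itself is as simple as it can be, and looking at it we see that it basically attempts to find the sub-string t inside the string s. The loop continues as long as the sub-string t is found inside s. (With an empty body this search would find the same match at pos over and over; in the real code the body changes a character of each match, so the next search moves on.)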
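As a side note, the same find loop is often written so that it makes progress on its own, without relying on the body to alter the string. The sketch below collects the start index of every occurrence; findAll is a hypothetical helper name, and restarting one character after each match is an assumption (it means overlapping matches are reported too).

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: start index of every (possibly overlapping)
// occurrence of t in s.
std::vector<std::size_t> findAll(const std::string& s, const std::string& t)
{
    std::vector<std::size_t> hits;
    if (t.empty())
        return hits; // avoid matching the empty string at every index
    // Restart each search one character after the previous match.
    for (std::size_t pos = s.find(t); pos != std::string::npos; pos = s.find(t, pos + 1))
        hits.push_back(pos);
    return hits;
}
```

For instance, findAll("onetwoone", "one") yields the positions 0 and 6.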
Now that we deconstructed the loop itself, let's take a look at the loop-body, and what it does:
s[pos + t.length() / 2] = '?';
r.push_back(pos + t.length() / 2);
First of all, let's pull out the common sub-expression into a temporary variable:
auto new_pos = pos + t.length() / 2;
s[new_pos] = '?';
r.push_back(new_pos);
The first statement
s[new_pos] = '?';
replaces the middle character of the sub-string t inside s with the character '?'.
The second statement
r.push_back(new_pos);
pushes the position of the '?' into the vector r.
Now lastly we put the inner loop (explained above) into the context of the outer loop:
for (string t: {"twone", "one", "two"})
This is a range-based for loop which loops over all elements in the container on the right-hand side of the :. That is, the loop will iterate three times, with t being equal to "twone", "one" and "two" in that order.
So the loops will search for "twone", "one" and "two" inside the string s, replace the middle character of each sub-string found ("twone", "one" or "two") inside s with a single '?' character, and push the position of that '?' character into the vector r.
For example, if s contains "someone with the number two", then the result will be the string "someo?e with the number t?o", and the vector r will contain the values 5 and 25 (which are printed as 6 and 26 because of the + 1).
Here's an example showing exactly that.

Just run the code after adding output of the intermediate results.
Here is a demonstrative program.
#include <iostream>
#include <string>
#include <vector>
int main()
{
std::string s;
std::cin >> s;
std::vector<int> r;
for ( const std::string &t : { "twone", "one", "two" } )
{
for ( std::string::size_type pos = 0; (pos = s.find( t, pos ) ) != std::string::npos; )
{
s[pos + t.length() / 2] = '?';
std::cout << pos << ": " << s << '\n';
r.push_back( pos + t.length() / 2 );
}
}
std::cout << r.size() << '\n';
for ( const auto &rr: r ) std::cout << rr + 1 << " ";
std::cout << '\n';
}
Let's assume that the user entered the string onetwoone. The inner loop then searches the entered string for all occurrences of the words "twone", "one" and "two", one after another.
For the given string the word "twone" is not found.
The word "one" is found at position 0. This statement
s[pos + t.length() / 2] = '?';
replaces the middle character of the found word in the entered string with the sign '?'.
Thus this added statement
std::cout << pos << ": " << s << '\n';
outputs
0: o?etwoone
The position of the sign '?' (the number 1) is stored in the vector.
Then, within the loop, the word "one" is found a second time. Again the middle character of the found word is replaced with '?'. So this statement
std::cout << pos << ": " << s << '\n';
outputs
6: o?etwoo?e
The position of the sign '?' (the number 7) is stored in the vector.
So at this moment we have the following output
0: o?etwoone
6: o?etwoo?e
The word "one" is not found any more.
The word "two" occurs only once in the given string. So the output is
3: o?et?oo?e
The position of '?' equal to 4 is stored in the vector.
Now at this moment we have the following output
0: o?etwoone
6: o?etwoo?e
3: o?et?oo?e
produced by the inner loop.
So as a result three occurrences of the words are found in the entered string.
Thus these statements
std::cout << r.size() << '\n';
for ( const auto &rr: r ) std::cout << rr + 1 << " ";
output
3
2 8 5
The last values correspond to expressions rr + 1 that is to stored positions of the sign '?' plus 1.

How to remove all double spaces from string

I am attempting to remove all double spaces from my string so that only single spaces remain:
while (doublespace != -1) {
    kstring.replace(doublespace, 1, " ");
    doublespace = kstring.find_first_of("  "); // argument is two spaces
}
It finds the first double space, triggering the while statement. It then takes the first space, adds 1 to it and sets the two spaces to one space. Then it checks again.
The problem is that the loop never ends - for example if I put "hello " doubleSpace would never be set to -1.

std::string::find_first_of only searches until it finds one of the characters in the input string, so if you pass it "  " (two spaces), it will effectively only search for " " (a single space) - see the documentation here:
Searches the string for the first character that matches any of the characters specified in its arguments.
You should use std::string::find instead, which searches for the first instance of the entire substring:
Notice that unlike member find_first_of, whenever more than one character is being searched for, it is not enough that just one of these characters match, but the entire sequence must match.
You're also replacing only the first space with a space (sString.replace(doubleSpace, 1, " ")), which leaves the string unchanged, so your output will still contain double spaces. Use std::string::erase instead, to erase just the first space.
This means your code snippet should look more like:
std::size_t doubleSpace = sString.find("  "); // search for two spaces
while (doubleSpace != std::string::npos)
{
    sString.erase(doubleSpace, 1); // erase the first of the two
    doubleSpace = sString.find("  ");
}

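Here is an alternative take that builds a copy with only single spaces in it:
#include <iostream>
#include <string>
int main()
{
    std::string str = "  hello  -  h  e  l  l  o  "; // runs of two spaces
    std::string newstr;
    size_t beg = 0;
    size_t len = str.length();
    while (beg < len)
    {
        // Find the next space; if there is none, copy the rest and stop.
        size_t end = str.find(' ', beg);
        if (end == std::string::npos)
        {
            newstr += str.substr(beg);
            break;
        }
        ++end; // keep exactly one space
        newstr += str.substr(beg, end - beg);
        // Skip any further spaces (npos here ends the loop).
        beg = str.find_first_not_of(' ', end);
    }
    std::cout << newstr << std::endl;
    return 0;
}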
Result:
hello - h e l l o
As suggested by @hnefatl, this approach could also be more efficient (see comment below).

I see two errors in your code. The first is that find_first_of() only searches for one of the characters you supply so, in your case, it will only be looking for single spaces. Secondly you only replace one space, not two.
This should fix both those problems:
std::string& reduce_double_spaces(std::string& s)
{
    std::string::size_type pos = s.find("  "); // search for two spaces
    while (pos != std::string::npos) {
        // replace BOTH spaces with one space
        s.replace(pos, 2, " ");
        // start searching again, where you left off
        // rather than going back to the beginning
        pos = s.find("  ", pos);
    }
    return s;
}
NOTE: By beginning the subsequent searches from the place you found your last space, this version should be much more efficient. The longer the string the bigger the savings.
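If a single pass over the string is preferred to repeated searching, the standard erase/unique idiom is another option. The sketch below assumes only the space character counts as whitespace, as in the question; collapseSpaces is a hypothetical name.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: collapses every run of spaces into one space.
std::string& collapseSpaces(std::string& s)
{
    // Two neighbouring characters count as duplicates only if both are spaces.
    auto bothSpaces = [](char a, char b) { return a == ' ' && b == ' '; };
    // std::unique shifts the kept characters forward; erase trims the tail.
    s.erase(std::unique(s.begin(), s.end(), bothSpaces), s.end());
    return s;
}
```

Because std::unique visits each character once, the cost stays linear in the length of the string no matter how many runs of spaces it contains.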

This alternative uses the constant-time operations back(), pop_back(), push_back(), empty(), and size():
std::string str = "  hello  -  h  e  l  l  o  "; // runs of two spaces
std::string newStr = str; // diag only
std::string Miss; Miss.reserve(str.size());
while ( str.size() > 1 ) // note: str.back() undefined when str.empty()
{
// fetch copy and remove last element
char aKar = str.back(); str.pop_back();
if (! ((' ' == aKar) && // space
( ' ' == str.back()))) // space
{
Miss.push_back(aKar); // not double-space
}
}
assert(1 == str.size()); // optional
while (! Miss.empty() ) // restore str to original order
{
str.push_back (Miss.back()); // copy last element
Miss.pop_back(); // remove last element
}
assert(Miss.empty()); // optional
std::cout << "\n " << __FUNCTION__
<< "\n in: " << newStr
<< "\n out: " << str << std::endl;

Is there a better way to count substring occurrences within a string than char* and a loop?

I have this line:
const char *S1 = "AaA BbB CcC DdD AaA";
I think that this creates a pointer, S1, which points to constant char data holding the value AaA BbB CcC DdD AaA. Is that right?
If so, how can I read each character of this constant value and recognize how many times AaA occurs?
I was thinking of creating a loop that copies each letter to a different cell, followed by 3 nested if statements, of which the first would check for A, the second for a, and so on. If all 3 are true, I would increment a counter, like so: i++. Is that correct?
I think it's too complicated and it can be done with less code.

Your fundamental approach is sound. However, it’s complex and doesn’t scale: what if you wanted to search for a word with more than three letters? Four ifs? Five ifs? Six …? Clearly that won’t do.
Instead, use two loops: one to go over the string you search in (the “haystack” or “reference”) and one over the string you search for (“needle” or “pattern”).
But luckily you don’t even have to do that, because C++ gives you the tools to search for the occurrence of one string in another, the find function:
#include <string>
#include <iostream>
int main() {
std::string const reference = "AaA BbB CcC DdD AaA";
std::string const pattern = "AaA";
std::string::size_type previous = 0;
int occurrences = 0;
for (;;) {
auto position = reference.find(pattern, previous);
if (position == std::string::npos)
break;
previous = position + 1;
++occurrences;
}
std::cout << occurrences << " occurrences of " << pattern << '\n';
}
You can look up the individual types and functions in the C++ reference. For instance, you can find the std::string::find function there, which does the actual searching for us.
Note that this will find nested patterns: the reference “AaAaA” will contain two occurrences of “AaA”. If this isn’t what you want, change the line where the previous position is reassigned.
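To make the last point concrete, here is a sketch in which the step after each match is selectable. The function name countOccurrences and the allowOverlap parameter are assumptions made for this example; the search itself is the same std::string::find loop as above.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts occurrences of pattern in reference.
// allowOverlap is an assumed parameter added for this sketch.
std::size_t countOccurrences(const std::string& reference,
                             const std::string& pattern,
                             bool allowOverlap)
{
    if (pattern.empty())
        return 0; // an empty pattern would match at every index
    // Step by 1 to allow nested matches, or skip the whole match otherwise.
    const std::size_t step = allowOverlap ? 1 : pattern.size();
    std::size_t count = 0;
    for (std::size_t pos = reference.find(pattern);
         pos != std::string::npos;
         pos = reference.find(pattern, pos + step))
        ++count;
    return count;
}
```

With the reference "AaAaA" and the pattern "AaA", this returns 2 when overlaps are allowed and 1 when they are not.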

A simple way to achieve what you want is using strstr(str1, str2) function, which returns a pointer to the first occurrence of str2 in str1, or a null pointer if str2 is not part of str1.
#include <string.h>

int count_sequence(const char *S1, const char *sequence) {
    size_t sequence_len = strlen(sequence);
    if (sequence_len == 0)
        return 0; // an empty sequence would otherwise loop forever
    int times = 0;
    const char *ptr = strstr(S1, sequence); // search for the first sequence
    while (ptr != NULL) {
        times++;
        ptr = strstr(ptr + sequence_len, sequence); // search after the last match
    }
    return times;
}

The C++ way:
Use std::string for string management; it provides a lot of benefits: memory management, iterators, and some algorithms such as find.
Use the find method of std::string to search for the index in s1 where s2 begins; if s2 is not present in s1, a dummy value, std::string::npos, is returned.
Code:
#include <iostream>
#include <string>
int main() {
    std::string s1("AaAaAaA");
    //std::string s1("AaA BbB CcC DdD AaA");
    std::string s2("AaA");
    int times = 0;
    size_t index = s1.find(s2); // start searching at the beginning
    while (index != std::string::npos) {
        times++;
        index = s1.find(s2, index + 1);
    }
    std::cout << "Found '" << s2 << "' in '" << s1 << "' "
              << times << " times" << std::endl;
}

c++11 regex : check if a set of characters exist in a string

If for example, I have the string: "asdf{ asdf }",
I want to check if the string contains any character in the set []{}().
How would I go about doing this?
I'm looking for a general solution that checks if the string has the characters in the set, so that I can continue to add lookup characters in the set in the future.

Your question is unclear on whether you only want to detect if any of the characters in the search set are present in the input string, or whether you want to find all matches.
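In either case, use std::regex to create the regular expression object. Outside a bracket expression, all the characters in your search set have special meanings in regular expressions; inside one, escaping every one of them is not strictly required, but it is harmless and keeps the pattern unambiguous.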
std::regex r{R"([\[\]\{\}\(\)])"};
char const *str = "asdf{ asdf }";
If you want to only detect whether at least one match was found, use std::regex_search.
std::cmatch results;
if(std::regex_search(str, results, r)) {
std::cout << "match found\n";
}
On the other hand, if you want to find all the matches, use std::regex_iterator.
auto first = std::cregex_iterator(str, str + std::strlen(str), r);
auto last = std::cregex_iterator();
if (first != last) std::cout << "match found\n";
while (first != last) {
    std::cout << (*first++).str() << '\n';
}
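Since the question asks for a set that can grow over time, one option is to build the bracket expression from a plain string of characters instead of hand-writing the pattern. The sketch below assumes the default ECMAScript grammar of std::regex and escapes only the characters that are special inside brackets; makeSetRegex and containsAny are hypothetical names.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: builds the character class "[...]" from a plain set
// of characters, escaping the ones that are special inside brackets.
std::regex makeSetRegex(const std::string& chars)
{
    std::string pattern = "[";
    for (char c : chars) {
        if (c == '\\' || c == ']' || c == '[' || c == '^' || c == '-')
            pattern += '\\';
        pattern += c;
    }
    pattern += ']';
    return std::regex(pattern);
}

// True if text contains at least one character from chars.
bool containsAny(const std::string& text, const std::string& chars)
{
    // An empty set can never match, so skip building a regex for it.
    return !chars.empty() && std::regex_search(text, makeSetRegex(chars));
}
```

Adding a lookup character then only means appending it to the string passed as chars, for example containsAny(input, "[]{}()<>").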

I know you are asking about regex, but this specific problem can be solved without it using std::string::find_first_of(), which finds the position of the first character in the string (s) that is contained in a set (g):
#include <string>
#include <iostream>
int main()
{
    std::string s = "asdf{ asdf }";
    std::string g = "[]{}()";
    // Does the string contain one of the characters?
    if(s.find_first_of(g) != std::string::npos)
        std::cout << s << " contains one of " << g << '\n';
    // find the position of each occurrence of the characters in the string
    for(size_t pos = 0; (pos = s.find_first_of(g, pos)) != std::string::npos; ++pos)
        std::cout << s << " contains " << s[pos] << " at " << pos << '\n';
}
OUTPUT:
asdf{ asdf } contains one of []{}()
asdf{ asdf } contains { at 4
asdf{ asdf } contains } at 11
