How to omit string change when two vowels are next to each other - c++

I am trying to write a program that translates English words into Oppengloppish by adding "opp" before each vowel. This is the code that I currently have, but I am having trouble with one thing:
#include <iostream>
#include <string>
#include <algorithm>
bool is_vowel(char c)
{
return c == 'a' || c == 'e' || c == 'i' || c == 'o' || c == 'u';
}
using namespace std;
int main()
{
const string vowel_postfix = "opp";
string word, oword;
cin >> word;
auto vowel_count = count_if(word.begin(), word.end(), is_vowel);
oword.reserve(word.length() + vowel_count * 2);
for (char c : word) {
oword.push_back(c);
if (is_vowel(c))
oword.insert(oword.length() -1, vowel_postfix);
}
cout << oword << std::endl;
}
I would like to try to omit "opp" from being added when there is a grouping of vowels but still have it added to only the first vowel.
Example: score-> scopporoppe
Expected behavior: team-> toppeam
Current behavior: team-> toppeoppam

It is not clear where you want to add the word (before or after the first vowel in the vowels group). In any case, what you are looking for is the transition from a consonant to a vowel. You need to remember the previous letter and see if it is a consonant and if the current letter is a vowel. The following code adds the word before the first vowel (demo):
char prev = 'b'; // must be initialized to an arbitrary consonant, otherwise if the first letter is a vowel, it won't be detected.
for (char c : word)
{
if (!is_vowel(prev) && is_vowel(c))
oword += "opp"; // transition detected
oword += prev = c;
}
A less intuitive solution would be to test the last letter added to the output word:
for (char c : word)
{
if (is_vowel(c) && (oword.empty() || !is_vowel(oword.back())))
oword += "opp";
oword += c;
}
Note that you should also test capital letters.
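To illustrate the note about capital letters, here is a minimal sketch that combines the second loop above with a case-insensitive vowel test. The names is_vowel_any_case and to_oppengloppish are made up for this example and are not part of the original code.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: treats 'A', 'E', ... the same as their lowercase forms.
bool is_vowel_any_case(char c) {
    switch (std::tolower(static_cast<unsigned char>(c))) {
        case 'a': case 'e': case 'i': case 'o': case 'u':
            return true;
        default:
            return false;
    }
}

// Inserts "opp" before the first vowel of every group of vowels.
std::string to_oppengloppish(const std::string& word) {
    std::string oword;
    for (char c : word) {
        // A vowel starts a new group when the output is empty or ends in a consonant.
        if (is_vowel_any_case(c) && (oword.empty() || !is_vowel_any_case(oword.back())))
            oword += "opp";
        oword += c;
    }
    return oword;
}
```

With this, to_oppengloppish("team") gives "toppeam" and a capitalized word such as "Apple" gives "oppApploppe".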

You could:
walk the input string,
creating an output string along the way,
keeping track of whether the previous character was a vowel or not;
if the previous character was not a vowel and the current one is, add opp to the output string;
always add the current character to the output string.
[Demo]
#include <fmt/core.h>
#include <string>
#include <string_view>
std::string insert_opp(std::string_view text) {
auto is_vowel = [](char c) {
static std::string_view vowels{ "aeiouAEIOU" };
return vowels.contains(c);
};
std::string ret{};
bool is_vowel_previous_c{ false };
for (auto c : text) {
auto is_vowel_c{ is_vowel(c) };
if (!is_vowel_previous_c && is_vowel_c) {
ret += "opp";
}
ret += c;
is_vowel_previous_c = is_vowel_c;
}
return ret;
}
int main() {
for (auto& text : { "team", "score", "aeiou", "" }) {
fmt::print("'{}' -> '{}'\n", text, insert_opp(text));
}
}
// Outputs:
//
// 'team' -> 'toppeam'
// 'score' -> 'scopporoppe'
// 'aeiou' -> 'oppaeiou'
// '' -> ''

This problem can easily be solved using a regex replacement. First create a regex that captures 1 or more consecutive vowels:
std::regex vowels{"([aeiou]+)"};
and then replace all occurrences of this pattern with "opp" followed by the pattern itself (which is denoted by $1):
auto oword = std::regex_replace(word, vowels, "opp$1");
Here's a demo.
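Put together, the two lines above could be wrapped in a function like the following sketch. The function name opp_with_regex is an assumption for illustration, and like the pattern above it only handles lowercase vowels.

```cpp
#include <regex>
#include <string>

// Hypothetical wrapper around the regex replacement described above.
std::string opp_with_regex(const std::string& word) {
    // One or more consecutive lowercase vowels, captured as group 1.
    static const std::regex vowels{"([aeiou]+)"};
    // Each vowel group is replaced by "opp" followed by the group itself.
    return std::regex_replace(word, vowels, "opp$1");
}
```

Because the pattern matches a whole run of vowels at once, "team" becomes "toppeam" without any bookkeeping about the previous character.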

Related

What is the effective way to replace all occurrences of a character with every character in the alphabet?

What is the effective way to replace all occurrences of a character with every character in the alphabet in std::string?
#include <algorithm>
#include <string>
using namespace std;
void some_func() {
string s = "example *trin*";
string letters = "abcdefghijklmnopqrstuvwxyz";
// replace all '*' to 'letter of alphabet'
for (int i = 0; i < 25; i++)
{
//replace letter * with a letter in string which is moved +1 each loop
replace(s.begin(), s.end(), '*', letters.at(i));
i++;
cout << s;
}
How can I get this to work?
You can just have a function:
receiving the string you want to operate on, and the character you want to replace, and
returning a list with the new strings, once the replacement has been done;
for every letter in the alphabet, you could check if it is in the input string and, in that case, create a copy of the input string, do the replacement using std::replace, and add it to the return list.
[Demo]
#include <algorithm> // replace
#include <fmt/ranges.h>
#include <string>
#include <string_view>
#include <vector>
std::vector<std::string> replace(const std::string& s, const char c) {
std::string_view alphabet{"abcdefghijklmnopqrstuvwxyz"};
std::vector<std::string> ret{};
for (const char l : alphabet) {
if (s.find(c) != std::string::npos) {
std::string t{s};
std::ranges::replace(t, c, l);
ret.emplace_back(std::move(t));
}
}
return ret;
}
int main() {
std::string s{"occurrences"};
fmt::print("Replace '{}': {}\n", 'c', replace(s, 'c'));
fmt::print("Replace '{}': {}\n", 'z', replace(s, 'z'));
}
// Outputs:
//
// Replace 'c': ["oaaurrenaes", "obburrenbes", "oddurrendes"...]
// Replace 'z': []
Edit: update on your comment below.
however if I wanted to replace 1 character at a time for example in
occurrences there are multiple "C" if i only wanted to replace 1 of
them then run all outcomes of that then move onto the next "C" and
replace all of them and so on, how could that be done?
In that case, you'd need to iterate over your input string, doing the replacement to one char at a time, and adding each of those new strings to the return list.
[Demo]
for (const char l : alphabet) {
if (s.find(c) != std::string::npos) {
for (size_t i{0}; i < s.size(); ++i) {
if (s[i] == c) {
std::string t{s};
t[i] = l;
ret.emplace_back(std::move(t));
}
}
}
}
// Outputs:
//
// Replace 'c': ["oacurrences", "ocaurrences", "occurrenaes"...]
// Replace 'z': []

ask for text to edit, text formatting

I would like to make a program that asks for text (a paragraph with several words separated by commas)
and then transforms it by wrapping each word in a tag, for example to format the text as HTML.
Example:
word1, word2, word3
to
<a> word1 </a>, <a> word2 </a>, <a> word3 </a>
So I started writing this code, but I do not know how to continue. How can I test the text to find the start of each word? With ASCII tests, I imagine?
Maybe with a table that tests every case?
I am not necessarily asking for a complete answer; a direction to follow would already help.
#include <iostream>
#include <iomanip>
#include <string> //For getline()
using namespace std;
// Creating class
class GetText
{
public:
string text;
string line; //Using this as a buffer
void userText()
{
cout << "Please type a message: ";
do
{
getline(cin, line);
text += line;
}
while(line != "");
}
void to_string()
{
cout << "\n" << "User's Text: " << "\n" << text << endl;
}
};
int main() {
GetText test;
test.userText();
test.to_string();
system("pause");
return 0;
}
The next thing you need to do is split your input by a delimiter (in your case ',') into a vector and then join everything back together with prefixes and suffixes. The C++ standard library has no ready-made split function, so you will have to be creative or search for an existing solution like here.
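One possible way to do the split-and-join with only the standard library is sketched below. The helper names split_trimmed and wrap_words are invented for this example, and trimming spaces around each piece is an assumption about the desired output.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Splits text on a delimiter; spaces and tabs around each piece are trimmed.
std::vector<std::string> split_trimmed(const std::string& text, char delim) {
    std::vector<std::string> parts;
    std::istringstream in(text);
    std::string piece;
    while (std::getline(in, piece, delim)) {
        const auto first = piece.find_first_not_of(" \t");
        if (first == std::string::npos) continue;  // skip empty pieces
        const auto last = piece.find_last_not_of(" \t");
        parts.push_back(piece.substr(first, last - first + 1));
    }
    return parts;
}

// Wraps every piece in prefix/suffix and joins the results with ", ".
std::string wrap_words(const std::string& text,
                       const std::string& prefix,
                       const std::string& suffix) {
    std::string out;
    for (const auto& word : split_trimmed(text, ',')) {
        if (!out.empty()) out += ", ";
        out += prefix + word + suffix;
    }
    return out;
}
```

For instance, wrap_words("word1, word2, word3", "<a> ", " </a>") produces the output asked for in the question.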
If you want to keep it really simple, you can detect word boundaries by checking two characters at a time. Here's a working example.
#include <iostream>
#include <string>
#include <cctype>
using namespace std;
typedef enum boundary_type_e {
E_BOUNDARY_TYPE_ERROR = -1,
E_BOUNDARY_TYPE_NONE,
E_BOUNDARY_TYPE_LEFT,
E_BOUNDARY_TYPE_RIGHT,
} boundary_type_t;
typedef struct boundary_s {
boundary_type_t type;
int pos;
} boundary_t;
bool is_word_char(int c) {
return ' ' <= c && c <= '~' && !isspace(c) && c != ',';
}
// Brace initialization is used below because C-style compound literals
// with designated initializers are not standard C++.
boundary_t maybe_word_boundary(const string& str, int pos) {
int len = str.length();
if (pos < 0 || pos >= len) {
return boundary_t{E_BOUNDARY_TYPE_ERROR, 0};
} else {
if (pos == 0 && is_word_char(str[pos])) {
// if the first character is word-y, we have a left boundary at the beginning
return boundary_t{E_BOUNDARY_TYPE_LEFT, pos};
} else if (pos == len - 1 && is_word_char(str[pos])) {
// if the last character is word-y, we have a right boundary left of the null terminator
return boundary_t{E_BOUNDARY_TYPE_RIGHT, pos + 1};
} else if (!is_word_char(str[pos]) && is_word_char(str[pos + 1])) {
// if we have a delimiter followed by a word char, we have a left boundary left of the word char
return boundary_t{E_BOUNDARY_TYPE_LEFT, pos + 1};
} else if (is_word_char(str[pos]) && !is_word_char(str[pos + 1])) {
// if we have a word char followed by a delimiter, we have a right boundary right of the word char
return boundary_t{E_BOUNDARY_TYPE_RIGHT, pos + 1};
}
return boundary_t{E_BOUNDARY_TYPE_NONE, 0};
}
}
int main() {
string str;
string ins_left("<tag>");
string ins_right("</tag>");
getline(cin, str);
// can't use length for the loop condition without recalculating it all the time
for (int i = 0; str[i] != '\0'; i++) {
boundary_t boundary = maybe_word_boundary(str, i);
if (boundary.type == E_BOUNDARY_TYPE_LEFT) {
str.insert(boundary.pos, ins_left);
i += ins_left.length();
} else if (boundary.type == E_BOUNDARY_TYPE_RIGHT) {
str.insert(boundary.pos, ins_right);
i += ins_right.length();
}
}
cout << str << endl; // print the tagged text
}
It would be better to use enum class but I forgot the notation. You can also copy to a buffer instead of generating the new string in-place, I was just trying to keep it simple. Feel free to expand it to a class based C++ style. To get your exact desired output, strip the spaces first and add spaces to ins_left and ins_right.
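For reference, the enum class notation mentioned above could look like the following sketch. The names BoundaryType and Boundary are placeholders chosen for this example and do not appear in the code above.

```cpp
// Scoped enumeration: the enumerators must be qualified (BoundaryType::Left)
// and do not convert implicitly to int.
enum class BoundaryType {
    Error = -1,
    None,   // 0
    Left,   // 1
    Right   // 2
};

// Plain struct replacing the C-style typedef struct.
struct Boundary {
    BoundaryType type;
    int pos;
};
```

A comparison then reads boundary.type == BoundaryType::Left instead of using the unscoped E_BOUNDARY_TYPE_LEFT.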

Count unique words in a string in C++

I want to count how many unique words are in string 's', where punctuation and the newline character (\n) separate the words. So far I've used the logical or operator to check how many word separators are in the string, and added 1 to the result to get the number of words in string s.
My current code returns 12 as the number of words. Since 'ab', 'AB', 'aB', 'Ab' (and the same for 'zzzz') are all the same word and not unique, how can I ignore the variants of a word? I followed the link: http://www.cplusplus.com/reference/algorithm/unique/, but the reference counts unique items in a vector, and I am using a string, not a vector.
Here is my code:
#include <iostream>
#include <string>
using namespace std;
bool isWordSeparator(char & c) {
return c == ' ' || c == '-' || c == '\n' || c == '?' || c == '.' || c == ','
|| c == '?' || c == '!' || c == ':' || c == ';';
}
int countWords(string s) {
int wordCount = 0;
if (s.empty()) {
return 0;
}
for (int x = 0; x < s.length(); x++) {
if (isWordSeparator(s.at(x))) {
wordCount++;
}
}
return wordCount+1;
}
int main() {
string s = "ab\nAb!aB?AB:ab.AB;ab\nAB\nZZZZ zzzz Zzzz\nzzzz";
int number_of_words = countWords(s);
cout << "Number of Words: " << number_of_words << endl;
return 0;
}
What you need to make your code case-insensitive is tolower().
You can apply it to your original string using std::transform:
std::transform(s.begin(), s.end(), s.begin(), ::tolower);
I should add however that your current code is much closer to C than to C++, perhaps you should check out what standard library has to offer.
I suggest istringstream + istream_iterator for tokenizing and either unique_copy or set for getting rid of the duplicates, like this: https://ideone.com/nb4BEH
You could create a set of strings, save the position of the last separator (starting with 0) and use substring to extract the word, then insert it into the set. When done just return the set's size.
You could make the whole operation easier by using a split function (std::string has none, but libraries such as Boost provide one) that tokenizes the string for you. All you have to do is insert all of the returned elements into the set and again return its size.
Edit: as per comments, you need a custom comparator to ignore case for comparisons.
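A minimal standard-library sketch of the set-based idea follows. It differs from the description above in two ways that are assumptions of this example: it lowercases each word instead of using a custom comparator, and it accumulates characters instead of calling substr. The function name count_unique_words is made up.

```cpp
#include <cctype>
#include <cstddef>
#include <set>
#include <string>

// Counts distinct words, ignoring case; whitespace and punctuation separate words.
std::size_t count_unique_words(const std::string& s) {
    std::set<std::string> words;
    std::string current;
    for (unsigned char c : s) {
        if (std::isspace(c) || std::ispunct(c)) {
            // A separator ends the current word, if there is one.
            if (!current.empty()) {
                words.insert(current);
                current.clear();
            }
        } else {
            current += static_cast<char>(std::tolower(c));
        }
    }
    if (!current.empty()) words.insert(current);  // last word has no trailing separator
    return words.size();
}
```

For the string in the question this yields 2, since only "ab" and "zzzz" remain after lowercasing.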
First of all I'd suggest rewriting isWordSeparator like this:
bool isWordSeparator(char c) {
return std::isspace(c) || std::ispunct(c);
}
since your current implementation doesn't handle all the punctuation and space, like \t or +.
Also, incrementing wordCount whenever isWordSeparator is true is incorrect, for example if you have consecutive separators like ?!.
So, a less error-prone approach would be to substitute all separators by space and then iterate words inserting them into an (unordered) set:
#include <iterator>
#include <unordered_set>
#include <algorithm>
#include <cctype>
#include <sstream>
#include <string>
int countWords(std::string s) {
// Map every separator to a space and lowercase everything else.
std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) -> char {
if (isWordSeparator(c)) {
return ' ';
}
return static_cast<char>(std::tolower(c));
});
std::unordered_set<std::string> uniqWords;
std::stringstream ss(s);
// Whitespace-delimited extraction skips runs of spaces, so no empty words are inserted.
std::copy(std::istream_iterator<std::string>(ss), std::istream_iterator<std::string>(), std::inserter(uniqWords, uniqWords.end()));
return uniqWords.size();
}
While splitting the string into words, insert all words into a std::set. This will get rid of the duplicates. Then it's just a matter of calling set::size() to get the number of unique words.
I'm using the boost::split() function from the Boost string algorithm library in my solution, because it is almost standard nowadays.
Explanations in the comments in code...
#include <iostream>
#include <string>
#include <set>
#include <boost/algorithm/string.hpp>
using namespace std;
// Function suggested by user 'mshrbkv':
bool isWordSeparator(char c) {
return std::isspace(c) || std::ispunct(c);
}
// This is used to make the set case-insensitive.
// Alternatively you could call boost::to_lower() to make the
// string all lowercase before calling boost::split().
struct IgnoreCaseCompare {
bool operator()( const std::string& a, const std::string& b ) const {
return boost::ilexicographical_compare( a, b );
}
};
int main()
{
string s = "ab\nAb!aB?AB:ab.AB;ab\nAB\nZZZZ zzzz Zzzz\nzzzz";
// Define a set that will contain only unique strings, ignoring case.
set< string, IgnoreCaseCompare > words;
// Split the string by using your isWordSeparator function
// to define the delimiters. token_compress_on collapses multiple
// consecutive delimiters into only one.
boost::split( words, s, isWordSeparator, boost::token_compress_on );
// Now the set contains only the unique words.
cout << "Number of Words: " << words.size() << endl;
for( auto& w : words )
cout << w << endl;
return 0;
}
Demo: http://coliru.stacked-crooked.com/a/a3b51a6c6a3b4ee8
You can consider SQLite c++ wrapper

C++ writing a function that extracts words from a paragraph

The program I am writing reads a text file, breaks the paragraph into individual words, compares them to a list of "sensitive words" and if a word from the text file matches a word from the sensitive word list, it is censored. I have written functions that find the beginning of each word, and a function that will censor or replace words on the sensitive word list with "#####" (which I left out of this post). A word in this case is any string that contains alphanumeric characters.
The function I am having trouble with is the function that will "extract" or return the individual words to compare to the sensitive word list (extractWord). At the moment it just returns the first letter of the last word in the sentence. So right now all the function does is return "w". I need all the individual words.
Here is what I have so far ...
#include <iostream>
#include <fstream>
#include <string>
using namespace std;
bool wordBeginsAt (const std::string& message, int pos);
bool isAlphanumeric (char c); //
std::string extractWord (const std::string& fromMessage, int beginningAt);
int main()
{
string word = "I need to break these words up individually. 12345 count as words";
string newWord;
for (int i = 0; i < word.length(); ++i)
{
if (wordBeginsAt(word, i))
{
newWord = extractWord(word, i);
}
}
//cout << newWord; // testing output
return 0;
}
bool wordBeginsAt (const std::string& message, int pos)
{
if(pos==0)
{return true;}
else
if (isAlphanumeric(message[pos])==true && isAlphanumeric(message[pos- 1])==false)
{
return true;
}
else
return false;
}
bool isAlphanumeric (char c)
{
return (c >= 'A' && c <= 'Z')
|| (c >= 'a' && c <= 'z')
|| (c >= '0' && c <= '9');
}
std::string extractWord (const std::string& fromMessage, int beginningAt)
{
string targetWord= "";
targetWord = targetWord + fromMessage[beginningAt];
return targetWord;
}
edit: after trying to use targetWord as an array (which I couldn't define the size) and using several different for and while loops within extractWord I found a solution:
std::string extractWord (const std::string& fromMessage, int beginningAt)
{
string targetWord= "";
while (isAlphanumeric(fromMessage[beginningAt++]))
{
targetWord = targetWord + fromMessage[beginningAt-1];
}
return targetWord;
}
Since this is a C++ question, how about using modern C++, instead of using dressed-up C code? The modern C++ library has all the algorithms and functions needed to implement all of this work for you:
#include <algorithm>
#include <cctype>
std::string paragraph;
// Somehow, figure out how to get your paragraph into this std::string, then:
auto b=paragraph.begin(), e=paragraph.end();
while (b != e)
{
// Find first alphanumeric character, using isalnum() from <cctype>
auto p=std::find_if(b, e, [](unsigned char c) { return isalnum(c); });
// Find the next non-alphanumeric character
b=std::find_if(p, e, [](unsigned char c) { return !isalnum(c); });
if (isbadword(std::string(p, b)))
std::fill(p, b, '#');
}
This does pretty much what you asked, in a fraction of the size of all that manual code that manually searches this stuff. All you have to do is to figure out what...
bool isbadword(const std::string &s)
...needs to do.
Your homework assignment is how to slightly tweak this code to avoid, in certain specific situations, calling isbadword() with an empty string.
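For readers who want something to compile, here is one way the pieces could fit together, including a guard against the empty-string case. The word list inside isbadword and the function name censor are placeholders for this sketch; real code would load the sensitive words from a file.

```cpp
#include <algorithm>
#include <cctype>
#include <set>
#include <string>

// Hypothetical word list, hard-coded only for illustration.
bool isbadword(const std::string& s) {
    static const std::set<std::string> bad{"darn", "heck"};
    return bad.count(s) > 0;
}

// Returns a copy of the paragraph with every bad word overwritten by '#'.
std::string censor(std::string paragraph) {
    auto is_alnum = [](unsigned char c) { return std::isalnum(c) != 0; };
    auto b = paragraph.begin(), e = paragraph.end();
    while (b != e) {
        // Start of the next word.
        auto p = std::find_if(b, e, is_alnum);
        if (p == e) break;  // only separators left, so no empty word is checked
        // One past the end of that word.
        b = std::find_if_not(p, e, is_alnum);
        if (isbadword(std::string(p, b)))
            std::fill(p, b, '#');
    }
    return paragraph;
}
```

The early break is one possible tweak for the trailing-separator case: without it, a paragraph ending in punctuation would hand an empty string to isbadword().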

string analysis

If a string may include several unnecessary elements, such as #, #, $, %,
how can I find them and delete them?
I know this requires a loop, but I do not know how to represent something such as #, #, $, %.
If you can give me a code example, I would really appreciate it.
The usual standard C++ approach would be the erase/remove idiom:
#include <string>
#include <algorithm>
#include <iostream>
struct OneOf {
std::string chars;
OneOf(const std::string& s) : chars(s) {}
bool operator()(char c) const {
return chars.find_first_of(c) != std::string::npos;
}
};
int main()
{
std::string s = "string with #, #, $, %";
s.erase(remove_if(s.begin(), s.end(), OneOf("##$%")), s.end());
std::cout << s << '\n';
}
and yes, boost offers some neat ways to write it shorter, for example using boost::erase_all_regex
#include <string>
#include <iostream>
#include <boost/algorithm/string/regex.hpp>
int main()
{
std::string s = "string with #, #, $, %";
erase_all_regex(s, boost::regex("[##$%]"));
std::cout << s << '\n';
}
If you want to get fancy, there is Boost.Regex; otherwise you can use the STL replace function in combination with the strchr function.
And if you, for some reason, have to do it yourself C-style, something like this would work:
char* oldstr = ... something something dark side ...
int oldstrlen = strlen(oldstr); // length without the terminator
char* newstr = new char[oldstrlen + 1]; // allocate memory for the new nicer string, plus the terminator
char* p = newstr; // get a pointer to the beginning of the new string
for ( int i=0; i<oldstrlen; i++ ) // iterate over the original string
if (oldstr[i] != '#' && oldstr[i] != '#' && etc....) // check that the current character is not a bad one
*p++ = oldstr[i]; // append it to the new string
*p = 0; // dont forget the null-termination
I think for this I'd use std::remove_copy_if:
#include <string>
#include <algorithm>
#include <iostream>
#include <iterator> // std::back_inserter
struct bad_char {
bool operator()(char ch) {
return ch == '#' || ch == '#' || ch == '$' || ch == '%';
}
};
int main() {
std::string in("This#is#a$string%with#extra#stuff$to%ignore");
std::string out;
std::remove_copy_if(in.begin(), in.end(), std::back_inserter(out), bad_char());
std::cout << out << "\n";
return 0;
}
Result:
Thisisastringwithextrastufftoignore
Since the data containing these unwanted characters will normally come from a file of some sort, it's also worth considering getting rid of them as you read the data from the file instead of reading the unwanted data into a string, and then filtering it out. To do this, you could create a facet that classifies the unwanted characters as white space:
#include <locale>
#include <vector>
struct filter: std::ctype<char>
{
filter(): std::ctype<char>(get_table()) {}
static std::ctype_base::mask const* get_table()
{
static std::vector<std::ctype_base::mask>
rc(std::ctype<char>::table_size,std::ctype_base::mask());
rc['#'] = std::ctype_base::space;
rc['#'] = std::ctype_base::space;
rc['$'] = std::ctype_base::space;
rc['%'] = std::ctype_base::space;
return &rc[0];
}
};
To use this, you imbue the input stream with a locale using this facet, and then read normally. For the moment I'll use an istringstream, though you'd normally use something like an istream or ifstream:
int main() {
std::istringstream in("This#is#a$string%with#extra#stuff$to%ignore");
in.imbue(std::locale(std::locale(), new filter));
std::copy(std::istream_iterator<char>(in),
std::istream_iterator<char>(),
std::ostream_iterator<char>(std::cout));
return 0;
}
Is this C or C++? (You've tagged it both ways.)
In pure C, you pretty much have to loop through character by character and delete the unwanted ones. For example:
char *buf;
int len = strlen(buf);
int i, j;
for (i = 0; i < len; i++)
{
if (buf[i] == '#' || buf[i] == '#' || buf[i] == '$' /* etc */)
{
for (j = i; j < len; j++)
{
buf[j] = buf[j+1];
}
i --;
}
}
This isn't very efficient - it checks each character in turn and shuffles them all up if there's one you don't want. You have to decrement the index afterwards to make sure you check the new next character.
General algorithm:
Build a string that contains the characters you want purged: "##$%"
Iterate character by character over the subject string.
Search if each character is found in the purge set.
If a character matches, discard it.
If a character doesn't match, append it to a result string.
Depending on the string library you are using, there are functions/methods that implement one or more of the above steps, such as strchr() or find() to determine if a character is in a string.
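The steps above can be sketched with std::string as follows. The function name purge and its parameter names are chosen for this example only.

```cpp
#include <string>

// Returns a copy of subject without any character that appears in purge_set.
std::string purge(const std::string& subject, const std::string& purge_set) {
    std::string result;
    result.reserve(subject.size());
    for (char c : subject) {
        // find() returns npos when c is not in the purge set, so c is kept.
        if (purge_set.find(c) == std::string::npos)
            result += c;
    }
    return result;
}
```

Calling purge("a #test $string%", "#$%") leaves "a test string".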
Use character literals, i.e. a would be 'a'. You haven't said whether you're using C++ strings (in which case you can use the find and replace methods) or C strings, in which case you'd use something like this (this is by no means the best way, but it's a simple way):
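void RemoveChar(char* szString, char c)
{
while(*szString != '\0')
{
if(*szString == c)
memmove(szString,szString+1,strlen(szString+1)+1); // the regions overlap, so memmove rather than memcpy
else
szString++; // only advance when nothing was removed, so runs of c are handled
}
}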
You can use a loop and call find_last_of (http://www.cplusplus.com/reference/string/string/find_last_of/) repeatedly to find the last character that you want to replace, replace it with blank, and then continue working backwards in the string.
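A rough sketch of that backwards loop is shown below. It erases each match rather than replacing it with a blank, which is a choice made for this example, and strip_from_back is a made-up name.

```cpp
#include <string>

// Removes every character listed in unwanted, scanning from the end of s.
std::string strip_from_back(std::string s, const std::string& unwanted) {
    auto pos = s.find_last_of(unwanted);
    while (pos != std::string::npos) {
        s.erase(pos, 1);
        // Everything after pos is already clean, so continue the search from pos.
        pos = s.find_last_of(unwanted, pos);
    }
    return s;
}
```

Working backwards means the positions still to be visited are not shifted by the erase.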
Something like this would do:
bool is_bad(char c)
{
if( c == '#' || c == '#' || c == '$' || c == '%' )
return true;
else
return false;
}
int main(int argc, char **argv)
{
string str = "a #test ##string";
str.erase(std::remove_if(str.begin(), str.end(), is_bad), str.end() );
}
If your compiler supports lambdas (or if you can use boost), it can be made even shorter. Example using boost::lambda :
string str = "a #test ##string";
str.erase(std::remove_if(str.begin(), str.end(), (_1 == '#' || _1 == '#' || _1 == '$' || _1 == '%')), str.end() );
(yay two lines!)
A character is represented in C/C++ by single quotes, e.g. '#', '#', etc. (except for a few that need to be escaped).
To search for a character in a string, use strchr(). Here is a link to a sample code:
http://www.cplusplus.com/reference/clibrary/cstring/strchr/
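As a small illustration of strchr() for this task, the following sketch filters a C string in place. The function name remove_chars is hypothetical.

```cpp
#include <cstring>

// Removes from s every character that occurs in unwanted, in place.
void remove_chars(char* s, const char* unwanted) {
    char* out = s;
    for (const char* in = s; *in != '\0'; ++in) {
        // strchr returns a null pointer when *in is not found in unwanted.
        if (std::strchr(unwanted, *in) == nullptr)
            *out++ = *in;
    }
    *out = '\0';  // terminate the shortened string
}
```

The loop stops before the terminator on purpose: strchr() treats the terminating '\0' as part of the searched string, so testing it would count as a match.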