C++ function to remove parenthesis from string doesn't catch them all

I wrote a function in C++ to remove parentheses from a string, but it doesn't always catch them all, for some reason that I'm sure is really simple.
string sanitize(string word)
{
int i = 0;
while(i < word.size())
{
if(word[i] == '(' || word[i] == ')')
{
word.erase(i,1);
}
i++;
}
return word;
}
Sample result:
Input: ((3)8)8)8)8))7
Output: (38888)7
Why is this? I can get around the problem by calling the function on the output (so running the string through twice), but that is clearly not "good" programming. Thanks!

if(word[i] == '(' || word[i] == ')')
{
word.erase(i,1);
}
i++;
If you erase a parenthesis, the next character moves to the index previously occupied by the parenthesis, so it is not checked. Use an else.
if(word[i] == '(' || word[i] == ')')
{
word.erase(i,1);
} else {
i++;
}

while(i < word.size())
{
if(word[i] == '(' || word[i] == ')')
{
word.erase(i,1);
}
i++;
}
When you remove an element the next element is moved to that location. If you want to test it, you will have to avoid incrementing the counter:
while (i < word.size()) {
if (word[i] == '(' || word[i] == ')' ) {
word.erase(i,1);
} else {
++i;
}
}
That can also be done with iterators, but either option is bad. For each parenthesis in the string, all elements after it are copied, which means that your function has quadratic complexity: O(N^2). A much better solution is to use the erase-remove idiom:
s.erase( std::remove_if(s.begin(), s.end(),
[](char ch){ return ch == '(' || ch == ')'; }),
s.end() );
If your compiler does not have support for lambdas you can implement the check as a function object (functor). This algorithm has linear complexity O(N) as the elements that are not removed are copied only once to the final location.
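For a compiler without lambda support, the functor variant mentioned above might look like the following minimal sketch. The name IsParenthesis is made up for this illustration; std::remove_if and std::string::erase are the standard calls, and the function keeps the question's sanitize() interface.

```cpp
#include <algorithm>
#include <string>

// Hypothetical predicate (name chosen for this sketch): true for '(' and ')'.
struct IsParenthesis {
    bool operator()(char ch) const { return ch == '(' || ch == ')'; }
};

// Same interface as the question's sanitize(), but linear in the string length.
std::string sanitize(std::string word) {
    // remove_if moves the kept characters to the front and returns the new
    // logical end; erase then drops the leftover tail in a single step.
    word.erase(std::remove_if(word.begin(), word.end(), IsParenthesis()),
               word.end());
    return word;
}
```

Because every kept character is moved at most once, consecutive parentheses are handled without any special index bookkeeping.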

It's failing because you're incrementing the index in all cases. You should only do that if you're not deleting the character, since the deletion shifts all the characters beyond that point back by one.
In other words, you'll have this problem wherever you have two or more consecutive characters to delete. Rather than deleting them both, it "collapses" the two into one.
Running it through your function twice will work on that particular input string but you'll still get into trouble with something like "((((pax))))" since the first call will collapse it to "((pax))" and the second will give you "(pax)".
One solution is to not advance the index when deleting a character:
std::string sanitize (std::string word) {
int i = 0;
while (i < word.size()) {
if(word[i] == '(' || word[i] == ')') {
word.erase(i,1);
continue;
}
i++;
}
return word;
}
However, I'd be using the facilities of the language a little more intelligently. C++ strings already have the capability to search for a selection of characters, one that's possibly far more optimised than a user loop. So you can use a much simpler approach:
std::string sanitize (std::string word) {
std::string::size_type spos = 0; // same type that find_first_of returns, so the npos comparison is exact
while ((spos = word.find_first_of ("()", spos)) != std::string::npos)
word.erase (spos, 1);
return word;
}
You can see this in action in the following complete program:
#include <iostream>
#include <string>
std::string sanitize (std::string word) {
std::string::size_type i = 0; // same type that find_first_of returns
while ((i = word.find_first_of ("()", i)) != std::string::npos)
word.erase (i, 1);
return word;
}
int main (void) {
std::string s = "((3)8)8)8)8))7 ((((pax))))";
s = sanitize (s);
std::cout << s << '\n';
return 0;
}
which outputs:
388887 pax

Why not just use strtok and a temporary string?
// needs #include <cstring> and #include <string>
string sanitize(string word)
{
string rVal;
// strtok modifies its input, so pass the writable buffer of the by-value copy (c_str() is const)
char * temp = strtok(&word[0], "()");
while(temp != 0)
{
rVal += temp; // append each run of characters found between parentheses
temp = strtok(0, "()");
}
return rVal;
}

Related

C++ separate string by selected commas

I was reading the following question Parsing a comma-delimited std::string on how to split a string by a comma (Someone gave me the link from my previous question) and one of the answers was:
stringstream ss( "1,1,1,1, or something else ,1,1,1,0" );
vector<string> result;
while( ss.good() )
{
string substr;
getline( ss, substr, ',' );
result.push_back( substr );
}
But what if my string looked like the following, and I wanted to split only on the commas outside the <> pairs (the bold ones), ignoring the commas that appear inside <>?
<a,b>,<c,d>,,<d,l>,
I want to get:
<a,b>
<c,d>
"" //Empty string
<d,l>
""
Given:<a,b>,,<c,d> It should return: <a,b> and "" and <c,d>
Given:<a,b>,<c,d> It should return:<a,b> and <c,d>
Given:<a,b>, It should return:<a,b> and ""
Given:<a,b>,,,<c,d> It should return:<a,b> and "" and "" and <c,d>
In other words, my program should behave just like the given solution above separated by , (Supposing there is no other , except the bold ones)
Here are some suggested solution and their problems:
Delete all bold commas: This will result in treating the following 2 inputs the same way while they shouldn't
<a,b>,<c,d>
<a,b>,,<c,d>
Replace all bold commas with some char and use the above algorithm: I can't select some char to replace the commas with since any value could appear in the rest of my string

Adding to @Carlos' answer, apart from regex (take a look at my comment), you can implement the substitution like the following (here, I actually build a new string):
#include <algorithm>
#include <iostream>
#include <string>
int main() {
std::string str;
getline(std::cin,str);
std::string str_builder;
for (auto it = str.begin(); it != str.end(); it++) {
static bool flag = false;
if (*it == '<') {
flag = true;
}
else if (*it == '>') {
flag = false;
str_builder += *it;
}
if (flag) {
str_builder += *it;
}
}
}

Why not replace one set of commas with some known-to-not-clash character, then split it by the other commas, then reverse the replacement?
So replace the commas that are inside the <> with something, do the string split, replace again.

I think what you want is something like this:
string result; // built one character at a time, so it has to be a single string
string s = "<a,b>,,<c,d>";
int in_string = 0;
int latest_comma = 0;
for (int i = 0; i < s.size(); i++) {
if(s[i] == '<'){
result.push_back(s[i]);
in_string = 1;
latest_comma = 0;
}
else if(s[i] == '>'){
result.push_back(s[i]);
in_string = 0;
}
else if(!in_string && s[i] == ','){
if(latest_comma == 1)
result.push_back('\n');
else
latest_comma = 1;
}
else
result.push_back(s[i]);
}

Here is a possible code that scans a string one char at a time and splits it on commas (',') unless they are masked between brackets ('<' and '>').
Algo:
assume starting outside brackets
loop for each character:
if not a comma, or if inside brackets
store the character in the current item
if a < bracket: note that we are inside brackets
if a > bracket: note that we are outside brackets
else (an unmasked comma)
store the current item as a string into the resulting vector
clear the current item
store the last item into the resulting vector
Only 10 lines and my rubber duck agreed that it should work...
C++ implementation: I will use a vector to handle the current item because it is easier to build it one character at a time
std::vector<std::string> parse(const std::string& str) {
std::vector<std::string> result;
bool masked = false;
std::vector<char> current; // stores chars of the current item
for (const char c : str) {
if (masked || (c != ',')) {
current.push_back(c);
switch (c) {
case '<': masked = true; break;
case '>': masked = false;
}
}
else { // unmasked comma: store item and prepare next
current.push_back('\0'); // a terminating null for the vector data
result.push_back(std::string(&current[0]));
current.clear();
}
}
// do not forget the last item...
current.push_back('\0');
result.push_back(std::string(&current[0]));
return result;
}
I tested it with all your example strings and it gives the expected results.

Seems quite straightforward to me.
vector<string> customSplit(string s)
{
vector<string> results;
int level = 0;
std::stringstream ss;
for (char c : s)
{
switch (c)
{
case ',':
if (level == 0)
{
results.push_back(ss.str());
stringstream temp;
ss.swap(temp); // Clear ss for the new string.
}
else
{
ss << c;
}
break;
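case '<':
level += 2; // deliberate fall-through: +2 here and -1 below gives a net +1
case '>':
level -= 1; // deliberate fall-through: the bracket itself is still written out
default:
ss << c;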
}
}
results.push_back(ss.str());
return results;
}

Matching balanced and nested braces in input text

I took a quiz and submitted my code, but the auto-test shows that one of the eight test cases failed.
I tested my code myself many times and every case passed. I can't find where the problem is.
The task is to design an algorithm that checks whether the brackets in a string match.
1) Consider only round brackets () and square brackets []; ignore all other chars.
2) Each pair of brackets must match: ( matches ), and [ matches ].
3) Intercrossing is not allowed, such as: ([)]. There are two pairs of brackets, but they cross each other.
To solve the problem, my method is as follows:
Scan each char of the input string, with the index running from 0 to str.size() - 1.
Use two stacks to record the opening tags ( and [, one type per stack. When encountering one of them, push its index onto the corresponding stack.
When encountering a closing tag ) or ], pop the corresponding stack.
Before popping, check the tops of both stacks: the current stack should hold the larger index; otherwise there is an unmatched opening tag of the other type, so intercrossing can be detected this way.
My Code is Here:
#include <iostream>
#include <stack>
using namespace std;
int main()
{
string str;
cin >> str;
stack<int> s1, s2;
int result = 0;
for (int ix = 0, len = str.size(); ix < len; ix++)
{
if (str[ix] == '(')
{
s1.push(ix);
}
else if (str[ix] == '[')
{
s2.push(ix);
}
else if (str[ix] == ')')
{
if (s1.empty() || (!s2.empty() && s1.top() < s2.top()))
{
result = 1;
break;
}
s1.pop();
}
else if (str[ix] == ']')
{
if (s2.empty() || (!s1.empty() && s2.top() < s1.top()))
{
result = 1;
break;
}
s2.pop();
}
else
{
// do nothing
}
}
if (!s1.empty() || !s2.empty())
{
result = 1;
}
cout << result << endl;
}
As mentioned before, this question can be solved with just one stack, so I modified my code, and here is the single-stack version. [THE KEY POINT IS NOT TO ARGUE WHICH IS BETTER, BUT WHAT'S WRONG WITH MY CODE.]
#include <iostream>
#include <stack>
using namespace std;
int main()
{
string str;
cin >> str;
stack<char> s;
const char *p = str.c_str();
int result = 0;
while (*p != '\0')
{
if (*p == '(' || *p == '[')
{
s.push(*p);
}
else if (*p == ')')
{
if (s.empty() || s.top() != '(')
{
result = 1;
break;
}
s.pop();
}
else if (*p == ']')
{
if (s.empty() || s.top() != '[')
{
result = 1;
break;
}
s.pop();
}
else
{
// do nothing
}
p++;
}
if (!s.empty())
{
result = 1;
}
cout << result << endl;
}

When using formatted input to read a std::string, only the first word is read: after skipping leading whitespace, characters are read until the next whitespace is encountered. As a result, the input ( ) should match, but std::cin >> str would only read (. Thus, the input should probably be read like this:
if (std::getline(std::cin, str)) {
// algorithm for matching parenthesis and brackets goes here
}
Using std::getline() still makes an assumption about how the input is presented, namely that it is on one line. If the algorithm should process the entire input from std::cin I would use
str.assign(std::istreambuf_iterator<char>(std::cin),
std::istreambuf_iterator<char>());
Although I think the algorithm is unnecessarily complex (one stack storing the kind of parenthesis would suffice), I also think that it should work, i.e., the only problem I spotted is the way the input is obtained.
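As an illustration of the single-stack idea, here is a minimal sketch of a checker that can be fed a whole line. The function name bracketsMatch is an assumption of this sketch, not something from the question.

```cpp
#include <stack>
#include <string>

// Returns true when every '(' or '[' is closed by the matching ')' or ']'
// in the right order. All other characters, including spaces, are ignored.
bool bracketsMatch(const std::string& text) {
    std::stack<char> open; // opening brackets that are still unmatched
    for (std::string::size_type i = 0; i < text.size(); ++i) {
        const char c = text[i];
        if (c == '(' || c == '[') {
            open.push(c);
        } else if (c == ')' || c == ']') {
            const char expected = (c == ')') ? '(' : '[';
            // A closer with no opener, or with the wrong kind on top, is a mismatch.
            if (open.empty() || open.top() != expected) {
                return false;
            }
            open.pop();
        }
    }
    return open.empty(); // leftover openers mean unmatched brackets
}
```

Combined with std::getline(std::cin, str), printing bracketsMatch(str) ? 0 : 1 would mirror the question's convention of 0 for a match and 1 for a mismatch, and an input such as ( ) is then read in full.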

c++: string::insert(string::iterator _where, char _Ch) suddenly not working

I'm doing some string manipulation, and am looping through a string with a string iterator, and under certain conditions insert a character into the string. Here is the code:
string * const Expression::process(char * const s)
{
if(s == NULL)
{
printf("(from Expression::process())\n > NULL data");
return NULL;
}
string *rtrn = new string(s);
string garbage;
//EDIT
rtrn->erase(remove(rtrn->begin(), rtrn->end(), ' '), rtrn->end());
for(string::iterator j = rtrn->begin(); (j+2) != rtrn->end(); j++)
{
if(Operator::isValid(&*j, garbage) != Operator::SYM && *(j+1) == '-' && (Operator::isValid(&(*(j+2)), garbage) != Operator::INVALID))
rtrn->replace(j+1, j+2, "+-");
}
rtrn->insert(rtrn->begin(), '(');
rtrn->append(")");
for(string::iterator k = rtrn->begin(); k+1 != rtrn->end(); k++)
{
if(*k == '-' && !Operator::isValidNum(*(k+1)))
rtrn->replace(k, k+1, "-1*");
if((Operator::isValid(&*(k+1), garbage) != Operator::INVALID && (Operator::isValid(&*(k+1), garbage) != Operator::SYM || *(k+1)=='(')) &&
(Operator::isValid(&*k, garbage) == Operator::VAR || Operator::isValidNum(*k) || *k==')') &&
!(Operator::isValid(&*k, garbage) == Operator::NUM && Operator::isValid(&*(k+1), garbage) == Operator::NUM))
{
if(Operator::isValid(&*k, garbage) == Operator::SYM)
{
if(opSymb::valid[garbage]->getArguments())
rtrn->insert(k+1, '*');
}
else
{
rtrn->insert(k+1, '*');
}
}
}
return rtrn;
}
When s is equal to "20x(5x+3)-6x(5x^2+11/2)", I get a runtime error at rtrn->insert(k+1, '*'); under the else statement when it gets to "5x^2" in the string. Basically, when it makes the 6th insertion, it crashes on me and complains about the iterator + operator being out of range. Although, when I'm debugging, it does pass the correct offset. And it does successfully insert the char into the string, but after the function executes, the iterator is pointing to corrupt data.

for(string::iterator i = rtrn->begin(); i != rtrn->end(); i++)
{
if(*i == ' ')
rtrn->erase(i);
}
There is an error in this and in every snippet like it: erase() invalidates the iterators into the container, so the loop iterator can't safely be used or incremented after the call.
I suggest using a while loop and continuing from the iterator that erase() returns; here is a short example from another question I answered:
string::iterator it = input.begin();
while (it != input.end())
{
while( it != input.end() && isdigit(*it))
{
it = input.erase(it);
}
if (it != input.end())
++it;
}
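The same invalidation rule applies to insert and replace, which is what the question's loop runs into. One way to sidestep it is to walk the string by index instead of by iterator. The sketch below assumes a much simpler rule than the question's Operator checks (insert '*' between a digit and a following letter or opening parenthesis), and addImplicitMultiplication is a hypothetical name.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: inserts '*' between a digit and a following letter or '('.
// Indices stay meaningful after insert(), unlike iterators, which may be invalidated.
std::string addImplicitMultiplication(std::string expr) {
    for (std::string::size_type i = 0; i + 1 < expr.size(); ++i) {
        const unsigned char left = static_cast<unsigned char>(expr[i]);
        const unsigned char right = static_cast<unsigned char>(expr[i + 1]);
        if (std::isdigit(left) && (std::isalpha(right) || right == '(')) {
            expr.insert(i + 1, 1, '*'); // size() is re-read by the loop condition
            ++i;                        // step over the '*' that was just inserted
        }
    }
    return expr;
}
```

With this simplified rule, "20x(5x+3)" becomes "20*x(5*x+3)". An iterator-based loop can do the same as long as it continues from the iterator that insert returns.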

So after research and help from you guys, it seems I have to refine my code so that any string functions such as erase, insert, or replace writes over the iterator passed to the function. So I need to change my code to something like this
for(string::iterator k = rtrn->begin(), m=k+1; m != rtrn->end(); k=m, m=k+1)
{
if(*k == '-' && !Operator::isValidNum(*m))
rtrn->replace(k, m, "-1*");
if((Operator::isValid(&*m, garbage) != Operator::INVALID && (Operator::isValid(&*m, garbage) != Operator::SYM || *m=='(')) &&
(Operator::isValid(&*k, garbage) == Operator::VAR || Operator::isValidNum(*k) || *k==')') &&
!(Operator::isValid(&*k, garbage) == Operator::NUM && Operator::isValid(&*m, garbage) == Operator::NUM))
{
if(Operator::isValid(&*k, garbage) == Operator::SYM)
{
if(opSymb::valid[garbage]->getArguments())
rtrn->insert(m, '*');
}
else
{
m=rtrn->insert(m, '*');
}
}
}

Separator character in string c++

This is the requirement: Read a string and loop it, whenever a new word is encountered insert it into std::list. If the . character has a space, tab, newline or digit on the left and a digit on the right then it is treated as a decimal point and thus part of a word. Otherwise it is treated as a full stop and a word separator.
And this is the result I run from the template program:
foo.bar -> 2 words (foo, bar)
f5.5f -> 1 word
.4.5.6.5 -> 1 word
d.4.5f -> 3 words (d, 4, 5f)
.5.6..6.... -> 2 words (.5.6, 6)
This seems very complex to me, as it's my first time dealing with strings in C++. I'm really stuck on implementing the code. Could anyone give me a hint? Thanks
I just did some scratch ideas
bool isDecimal(std::string &word) {
bool ok = false;
for (unsigned int i = 0; i < word.size(); i++) {
if (word[i] == '.') {
if ((std::isdigit(word[(int)i - 1]) ||
std::isspace(word[(int)i -1]) ||
(int)(i - 1) == (int)(word.size() - 1)) && std::isdigit(word[i + 1]))
ok = true;
else {
ok = false;
break;
}
}
}
return ok;
}
void checkDecimal(std::string &word) {
if (!isDecimal(word)) {
std::string temp = word;
word.clear();
for (unsigned int i = 0; i < temp.size(); i++) {
if (temp[i] != '.')
word += temp[i];
else {
if (std::isalpha(temp[i + 1]) || std::isdigit(temp[i + 1]))
word += ' ';
}
}
}
trimLeft(word);
}

I think you may be approaching the problem from the wrong direction. It seems much easier if you turn the condition upside down. To give you some pointers in a pseudocode skeleton:
bool isSeparator(const std::string& string, size_t position)
{
// Determine whether the character at <position> in <string> is a word separator
}
void tokenizeString(const std::string& string, std::list<std::string>& wordList)
{
// for every position in string
// if (isSeparator(string, position) || end of string)
// wordList.push_back(substring from last separator to this one)
}

I suggest to implement it using flex and bison with c++ implementation

C++ Remove new line from multiline string

What's the most efficient way of removing a 'newline' from a std::string?

#include <algorithm>
#include <string>
std::string str;
str.erase(std::remove(str.begin(), str.end(), '\n'), str.cend());
The behavior of std::remove may not quite be what you'd expect.
A call to remove is typically followed by a call to a container's erase method, which erases the unspecified values and reduces the physical size of the container to match its new logical size.
See an explanation of it here.

If the newline is expected to be at the end of the string, then:
if (!s.empty() && s[s.length()-1] == '\n') {
s.erase(s.length()-1);
}
If the string can contain many newlines anywhere in the string:
std::string::size_type i = 0;
while (i < s.length()) {
i = s.find('\n', i);
if (i == std::string::npos) {
break;
}
s.erase(i, 1); // erase one character; s.erase(i) would drop everything from i to the end
}

You should use the erase-remove idiom, looking for '\n'. This will work for any standard sequence container; not just string.
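To show that the idiom is not specific to strings, here is a small sketch of a generic wrapper. The name eraseAll is invented for this example; it only assumes the container offers begin(), end() and a range erase(), as std::string, std::vector and std::deque do.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: removes every element equal to value from a sequence container.
template <typename Container, typename T>
void eraseAll(Container& c, const T& value) {
    // std::remove compacts the kept elements and returns the new logical end;
    // the container's erase then shrinks it to that size.
    c.erase(std::remove(c.begin(), c.end(), value), c.end());
}
```

Calling eraseAll(str, '\n') strips every newline from a std::string, and the same call works on a std::vector<int>. For std::list, the member function remove is usually the better choice.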

Here is one for DOS or Unix new line:
void chomp( string &s)
{
int pos;
if((pos=s.find('\n')) != string::npos)
s.erase(pos);
}

A slight modification of edW's solution to remove all existing newline chars:
void chomp(string &s){
size_t pos;
while ((pos=s.find('\n')) != string::npos)
s.erase(pos,1);
}
Note that pos is declared as size_t: find returns string::size_type, and npos is the largest value of that unsigned type. Storing the result in an int narrows it, which is implementation-defined and breaks for positions larger than INT_MAX, so keep the result in the unsigned type that find returns.

s.erase(std::remove(s.begin(), s.end(), '\n'), s.end());

The code removes all newlines from the string str.
O(N) implementation best served without comments on SO and with comments in production.
unsigned shift=0;
for (unsigned i=0; i<str.length(); ++i){
if (str[i] == '\n') {
++shift;
}else{
str[i-shift] = str[i];
}
}
str.resize(str.length() - shift);

std::string some_str = SOME_VAL;
if ( some_str.size() > 0 && some_str[some_str.length()-1] == '\n' )
some_str.resize( some_str.length()-1 );
or (removes several newlines at the end)
some_str.resize( some_str.find_last_not_of("\n")+1 );

Another way to do it in the for loop
void rm_nl(string &s) {
for (int p = s.find("\n"); p != (int) string::npos; p = s.find("\n"))
s.erase(p,1);
}
Usage:
string data = "\naaa\nbbb\nccc\nddd\n";
rm_nl(data);
cout << data; // data = aaabbbcccddd

All these answers seem a bit heavy to me.
If you just flat out remove the '\n' and move everything else back a spot, you are liable to have some characters slammed together in a weird-looking way. So why not just do the simple (and most efficient) thing: Replace all '\n's with spaces?
for (int i = 0; i < str.length();i++) {
if (str[i] == '\n') {
str[i] = ' ';
}
}
There may be ways to improve the speed of this at the edges, but it will be way quicker than moving whole chunks of the string around in memory.
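The same replacement can be written with the standard algorithm std::replace. A minimal sketch, where newlinesToSpaces is a name made up for this example:

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: returns a copy of text with every '\n' replaced by a space.
std::string newlinesToSpaces(std::string text) {
    // std::replace overwrites matching characters in place; nothing is shifted.
    std::replace(text.begin(), text.end(), '\n', ' ');
    return text;
}
```

Unlike the erase-based answers, this keeps the string length unchanged, so a trailing newline becomes a trailing space.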

If it's anywhere in the string, then you can't do better than O(n).
And the only way is to search for '\n' in the string and erase it.
for(int i=0;i<s.length();i++) if(s[i]=='\n') s.erase(s.begin()+i);
For more than one newline:
int n=0;
for(int i=0;i<s.length();i++){
if(s[i]=='\n'){
n++;//we increase the number of newlines we have found so far
}else{
s[i-n]=s[i];
}
}
s.resize(s.length()-n);//delete, in one step, the last n elements, which are now leftovers
It erases all the newlines in a single pass.

About answer 3 removing only the last \n off string code :
if (!s.empty() && s[s.length()-1] == '\n') {
s.erase(s.length()-1);
}
Will the if condition not fail if the string is really empty ?
Is it not better to do :
if (!s.empty())
{
if (s[s.length()-1] == '\n')
s.erase(s.length()-1);
}

To extend @Greg Hewgill's answer for C++11:
If you just need to delete a newline at the very end of the string:
This in C++98:
if (!s.empty() && s[s.length()-1] == '\n') {
s.erase(s.length()-1);
}
...can now be done like this in C++11:
if (!s.empty() && s.back() == '\n') {
s.pop_back();
}
Optionally, wrap it up in a function. Note that I pass it by ptr here simply so that when you take its address as you pass it to the function, it reminds you that the string will be modified in place inside the function.
void remove_trailing_newline(std::string* str)
{
if (str->empty())
{
return;
}
if (str->back() == '\n')
{
str->pop_back();
}
}
// usage
std::string str = "some string\n";
remove_trailing_newline(&str);
What's the most efficient way of removing a 'newline' from a std::string?
As far as the most efficient way goes--that I'd have to speed test/profile and see. I'll see if I can get back to you on that and run some speed tests between the top two answers here, and a C-style way like I did here: Removing elements from array in C. I'll use my nanos() timestamp function for speed testing.
Other References:
See these "new" C++11 functions in this reference wiki here: https://en.cppreference.com/w/cpp/string/basic_string
https://en.cppreference.com/w/cpp/string/basic_string/empty
https://en.cppreference.com/w/cpp/string/basic_string/back
https://en.cppreference.com/w/cpp/string/basic_string/pop_back
