Remove spaces from string before period and comma - c++

I could have a string like:
During this time , Bond meets a stunning IRS agent , whom he seduces .
I need to remove the extra spaces before each comma and each period in the whole string. I tried copying the string into a vector of char, skipping push_back whenever the current char was ' ' and the following char was '.' or ',', but it did not work. I know there is probably a simple way to do it, maybe using trim(), find(), erase(), or some kind of regex, but I am not very familiar with regex.

A solution could be (using regex library):
std::string fix_string(const std::string& str) {
    static const std::regex rgx_pattern("\\s+(?=[\\.,])");
    std::string rtn;
    rtn.reserve(str.size());
    std::regex_replace(std::back_insert_iterator<std::string>(rtn),
                       str.cbegin(),
                       str.cend(),
                       rgx_pattern,
                       "");
    return rtn;
}
This function takes a string as input and returns a copy with the whitespace before each period or comma removed.

In a loop, search for the string " ," and, each time you find one, replace it with ",":
std::string str = "...";
while( true ) {
    auto pos = str.find( " ," );
    if( pos == std::string::npos )
        break;
    str.replace( pos, 2, "," );
}
Do the same for " .". If you also need to handle other whitespace characters such as tabs, use a regex with a suitable character class.
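A regex-free variant that handles both punctuation marks, and any kind of whitespace, in a single pass might look like this. It is only a sketch: the name strip_space_before_punct is made up for illustration, and it assumes single-byte characters.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper (not from the posts above): drops any run of
// whitespace that sits immediately before a ',' or a '.'.
std::string strip_space_before_punct(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    for (char c : in) {
        if (c == ',' || c == '.') {
            // Discard whitespace that was already copied just before
            // this punctuation character.
            while (!out.empty() &&
                   std::isspace(static_cast<unsigned char>(out.back()))) {
                out.pop_back();
            }
        }
        out.push_back(c);
    }
    return out;
}
```

Because whitespace is only dropped once a comma or period shows up, the spaces between ordinary words are left alone.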

I don't know how to use regex for C++, also not sure if C++ supports PCRE regex, anyway I post this answer for the regex (I could delete it if it doesn't work for C++).
You can use this regex:
\s+(?=[,.])

First, there is no need to use a vector of char: you could very well do the same by using an std::string.
Then, your approach can't work because your copy is independent of the position of the space. Unfortunately you have to remove only spaces around the punctuation, and not those between words.
Modifying your code slightly, you could delay copying the spaces until you see the first non-space character: if it is not a punctuation mark, you copy a space before the character; otherwise you copy only the non-space character (thus getting rid of the spaces).
Similarly, once you've copied a punctuation just loop and ignore the following spaces until the first non-space char.
I could have written the code. It would have been shorter. But I prefer letting you finish your homework with a full understanding of the approach.

Related

Regex match all except first instance

I am struggling to find a working regex pattern to match all instances of a string except the first.
I am using std::regex_replace to insert a newline before each instance of a substring. However, none of my googling so far has produced a working regex pattern for this.
outputString = std::regex_replace(outputString, std::regex("// ~"), "\n// ~");
So all but the first instance of // ~ should be changed to \n// ~
In this case, I'd advise being the anti-Nike: "Just don't do it!"
The pattern you're searching for is trivial. So is the replacement. There's simply no need to use a regex at all. Under the circumstances, I'd just use std::string::find in a loop, skipping replacement of the first instance you find.
std::string s = "a = b; // ~ first line // ~ second line // ~ third line";
std::string pat = "// ~";
std::string rep = "\n// ~";
auto pos = s.find(pat); // first one, which we skip
while ((pos=s.find(pat, pos+pat.size())) != std::string::npos) {
    s.replace(pos, pat.size(), rep);
    pos += rep.size() - pat.size();
}
std::cout << s << "\n";
When you're doing a replacement like this one, where the replacement string includes a copy of the string you're searching for, you have to be a little careful about where you start second (and subsequent) searches, to assure that you don't repeatedly find the same string you just replaced. That's why we keep track of the current position, and each subsequent search we make, we move the starting point far enough along that we won't find the same instance of the pattern again and again.
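The same idea can be wrapped in a small reusable function. This is a sketch only: the name replace_all_but_first and the early returns for an empty or missing pattern are my own additions, not part of the answer.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: replaces every occurrence of `pat` except the
// first with `rep`. Returns the input unchanged if `pat` is empty or absent.
std::string replace_all_but_first(std::string s, const std::string& pat,
                                  const std::string& rep) {
    if (pat.empty()) return s;
    std::size_t pos = s.find(pat);            // first occurrence, left as is
    if (pos == std::string::npos) return s;
    pos += pat.size();                        // resume after the first match
    while ((pos = s.find(pat, pos)) != std::string::npos) {
        s.replace(pos, pat.size(), rep);
        pos += rep.size();                    // skip past the inserted text
    }
    return s;
}
```

Advancing by rep.size() after each replacement keeps the search from finding the copy of the pattern inside the text that was just inserted.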

Using regex to split special char

string MyName = " 'hi, load1', 'hi, load2', varthatnotstring ";
I want to use regex to split the above string at every comma, while preserving the strings that are inside quotation marks.
As such, splitting MyName should yield:
1: 'hi, load1'
2: 'hi, load2'
3: varthatnotstring
I currently use regex MyR("(.*),(.*),(.*)");, but that gives me:
1: 'hi
2: load1'
3: 'hi
4: load2'
What regular-expression should I use?
Depending on how you want to handle certain corner cases, you can use the following:
std::regex reg(R"--((('.*?')|[^,])+)--");
Step by step:
R"--(...)--" is the syntax for a raw string literal, so we don't have to worry about escaping. We don't need it here, but I use raw literals by default for regex strings.
('.*?') all characters between (and including) two apostrophes (non-greedy)
[^,] anything that is not a comma
(('.*?')|[^,])+ an arbitrary sequence of non-comma characters or '...' sequences.
(Note: the ('.*?') part has to come first)
So this will also match e.g. tkasd 'rtzrze,123' as a single match. It will also NOT remove any whitespace.
Usage:
std::regex reg(R"--((('.*?')|[^,])+)--");
std::string s = ",,t '123,4565',k ,'rt',t,z";
for (std::sregex_iterator rit(s.begin(), s.end(), reg), end{}; rit != end; ++rit) {
    std::cout << rit->str() << std::endl;
}
Output:
t '123,4565'
k
'rt'
t
z
Edit:
I rarely use regular expressions, so any comments about possible improvements or gotchas are welcome. Maybe there is also an even better solution using regex_token_iterator.
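For what it's worth, a version built on std::sregex_token_iterator could look roughly like this. It is a sketch under assumptions: the function name split_outside_quotes is invented, and the trimming of spaces and tabs around each field is an extra step that the regex itself does not perform.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: splits on commas that are outside '...' quotes
// and trims spaces/tabs around each field.
std::vector<std::string> split_outside_quotes(const std::string& s) {
    static const std::regex field(R"--((('.*?')|[^,])+)--");
    std::vector<std::string> out;
    // Submatch index 0 selects the whole match for each token.
    for (std::sregex_token_iterator it(s.begin(), s.end(), field, 0), end;
         it != end; ++it) {
        const std::string tok = it->str();
        const auto first = tok.find_first_not_of(" \t");
        if (first == std::string::npos) continue;  // whitespace-only field
        const auto last = tok.find_last_not_of(" \t");
        out.push_back(tok.substr(first, last - first + 1));
    }
    return out;
}
```

Applied to the string from the question, this should yield the three fields 'hi, load1', 'hi, load2' and varthatnotstring.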

Find Group of Characters From String

I wrote a program to remove a group of characters from a string. The code is given below.
void removeCharFromString(string &str,const string &rStr)
{
    std::size_t found = str.find_first_of(rStr);
    while (found!=std::string::npos)
    {
        str[found]=' ';
        found=str.find_first_of(rStr,found+1);
    }
    str=trim(str);
}
std::string str ("scott<=tiger");
removeCharFromString(str,"<=");
With this input my program produces the correct output. But if I set str to "scott=tiger", the search sequence "<=" does not occur in str, yet my program still removes the '=' character from "scott=tiger". I don't want the characters removed individually; I want them removed only when the whole group '<=' is found. How can I do this?
The method find_first_of looks for any character in the input, in your case, any of '<' or '='. In your case, you want to use find.
std::size_t found = str.find(rStr);
This answer works on the assumption that you only want to find the set of characters in the exact sequence e.g. If you want to remove <= but not remove =<:
find_first_of will locate any of the characters in the given string, where you want to find the whole string.
You need something to the effect of:
std::size_t found = str.find(rStr);
while (found!=std::string::npos)
{
    str.replace(found, rStr.length(), " ");
    found=str.find(rStr,found+1);
}
The problem with str[found]=' '; is that it'll simply replace the first character of the string you are searching for, so if you used that, your result would be
scott =tiger
whereas with the changes I've given you, you'll get
scott tiger
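If the goal is to delete the sequence outright rather than replace it with a space, a find/erase loop along these lines may be enough. This is a sketch: the name erase_all is hypothetical and does not come from the question.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: erases every occurrence of the exact sequence `seq`.
void erase_all(std::string& str, const std::string& seq) {
    if (seq.empty()) return;                   // avoid an endless loop
    std::size_t pos = 0;
    while ((pos = str.find(seq, pos)) != std::string::npos) {
        // Continue from the same index, since the rest of the text
        // has shifted left by seq.size() characters.
        str.erase(pos, seq.size());
    }
}
```

With this, "scott<=tiger" becomes "scotttiger", while "scott=tiger" is left untouched because the full sequence "<=" never occurs in it.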

How to replace/remove a character in a character buffer?

I am trying to modify someone's code which uses this line:
out.write(&vecBuffer[0], x.length());
However, I want to modify the buffer beforehand so it removes any bad characters I don't want to be output. For example if the buffer is "Test%string" and I want to get rid of %, I want to change the buffer to "Test string" or "Teststring" whichever is easier.
std::replace will allow replacing one specific character with
another, e.g. '%' with ' '. Just call it normally:
std::replace( vecBuffer.begin(), vecBuffer.end(), '%', ' ' );
Replace the '%' with a predicate object, call replace_if,
and you can replace any character for which the predicate
object returns true. But always with the same character. For
more flexibility, there's std::transform, which you pass
a function which takes a char, and returns a char; it will
be called on each character in the buffer.
Alternatively, you can do something like:
vecBuffer.erase(
    std::remove( vecBuffer.begin(), vecBuffer.end(), '%' ),
    vecBuffer.end() );
To remove the characters. Here too, you can replace remove
with remove_if, and use a predicate, which may match many
different characters.
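As a sketch of the remove_if variant, assuming vecBuffer is a std::vector<char>: the predicate isBadChar and the set of characters it rejects are placeholders chosen purely for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical predicate: which characters count as "bad" is an assumption.
bool isBadChar(char c) {
    return c == '%' || c == '#';
}

// Erase-remove idiom with a predicate. Returns the new size, so the
// caller can pass it to out.write(...) instead of the old length.
std::size_t stripBadChars(std::vector<char>& buf) {
    buf.erase(std::remove_if(buf.begin(), buf.end(), isBadChar), buf.end());
    return buf.size();
}
```

Note that the buffer gets shorter, so any length computed before the call (such as x.length() in the question) no longer describes it.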
The simplest library you can use is probably the Boost String Algorithms library.
boost::replace_all(buffer, "%", "");
will replace all occurrences of % by nothing, in place. You could specify " " as a replacement, or even "REPLACEMENT", as suits you.
std::string str("Test string");
std::replace_if(str.begin(), str.end(), boost::is_any_of(" "), '_'); // a char literal cannot be empty; replace_if substitutes a character, it cannot delete one
std::cout << str << '\n';
You do not need to use the boost library. The easiest way is to replace the % character with a space, using std::replace() from the <algorithm> header:
std::replace(vecBuffer.begin(), vecBuffer.end(), '%', ' ');
I assume that vecBuffer, as its name implies, is an std::vector. If it's actually a plain array (or pointer), then you would do:
std::replace(vecBuffer, vecBuffer + SIZE_OF_BUFFER, '%', ' ');
SIZE_OF_BUFFER should be the size of the array (or the amount of characters in the array you want to process, if you don't want to convert the whole buffer.)
Assuming you have a function
bool goodChar( char c );
that returns true for characters you approve of and false otherwise,
then how about
unsigned int fixBuf( char* buf, unsigned int len ) {
    unsigned int co = 0;  // next write position
    for ( unsigned int cb = 0 ; cb < len ; cb++ ) {
        if ( goodChar( buf[cb] ) ) {
            buf[co] = buf[cb];
            co++;
        }
    }
    return co;  // number of characters kept; the caller needs this as the new length
}

C++ regex escaping punctuation characters like "."

Matching a "." in a string with the std::tr1::regex class makes me use a weird workaround.
Why do I need to check for "\\\\." instead of "\\."?
regex(".") // Matches everything (but "\n") as expected.
regex("\\.") // Matches everything (but "\n").
regex("\\\\.") // Matches only ".".
Can someone explain to me why? It's really bothering me, since my code was originally written with the boost::regex classes, which didn't need this syntax.
Edit: Sorry, regex("\\\\.") seems to match nothing.
Edit2: Some code
void parser::lex(regex& token)
{
    // Skipping whitespaces
    {
        regex ws("\\s*");
        sregex_token_iterator wit(source.begin() + pos, source.end(), ws, regex_constants::match_default), wend;
        if(wit != wend)
            pos += (*wit).length();
    }
    sregex_token_iterator it(source.begin() + pos, source.end(), token, regex_constants::match_default), end;
    if (it != end)
        temp = *it;
    else
        temp = "";
}
This is because \. is interpreted as an escape sequence, which the language itself is trying to interpret as a single character. What you want is for your regex to contain the actual string "\.", which is written \\. because \\ is the escape sequence for the backslash character (\).
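A small check of the two escaping levels, written against std::regex from C++11 rather than std::tr1::regex (that substitution is an assumption on my part, as are the function names):

```cpp
#include <regex>
#include <string>

// True if `s` is exactly one literal dot. The C++ literal "\\." holds the
// two characters \ and . which the regex engine reads as an escaped dot.
bool is_literal_dot(const std::string& s) {
    static const std::regex dot("\\.");
    return std::regex_match(s, dot);
}

// Same pattern as a raw string literal (C++11): no doubled backslash.
bool is_literal_dot_raw(const std::string& s) {
    static const std::regex dot(R"(\.)");
    return std::regex_match(s, dot);
}
```

Both functions should accept "." and reject any other single character such as "a".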
As it turns out, the actual problem was due to the way sregex_token_iterator was used. Using match_default meant it was always finding the next match in the string, if any, even if there is a non-match in-between. That is,
string source = "AAA.BBB";
regex dot("\\.");
sregex_token_iterator wit(source.begin(), source.end(), dot, regex_constants::match_default);
would give a match at the dot, rather than reporting that there was no match.
The solution is to use match_continuous instead.
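A minimal sketch of the match_continuous behaviour, using std::regex_search instead of the token iterator from the question; the helper name matches_at is hypothetical:

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper: true only if `token` matches starting exactly at
// offset `pos` of `source`, rather than somewhere later in the string.
bool matches_at(const std::string& source, std::size_t pos,
                const std::regex& token) {
    if (pos > source.size()) return false;
    std::smatch m;
    return std::regex_search(source.begin() + static_cast<std::ptrdiff_t>(pos),
                             source.end(), m, token,
                             std::regex_constants::match_continuous);
}
```

With source "AAA.BBB" and the pattern "\\.", this reports no match at offset 0 and a match at offset 3, which is the behaviour a lexer needs.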
Try to escape the dot by its ASCII code:
regex("\\x2E")