How do I use std::regex_replace to replace string into lowercase? - c++

I found this regex for replacement in "Regex replace uppercase with lowercase letters":
Find: (\w) Replace With: \L$1
My code
string s = "ABC";
cout << std::regex_replace(s, std::regex("(\\w)"), "\\L$1") << endl;
This runs in Visual Studio 2017 and outputs:
\LA\LB\LC
How do I write the lowercase conversion marker in C++?

Since std::regex has no magic like \L, we have to compromise: use regex_search and manually convert the uppercase letters to lowercase.
template<typename ChrT>
void RegexReplaceToLower(std::basic_string<ChrT>& s, const std::basic_regex<ChrT>& reg)
{
using string = std::basic_string<ChrT>;
using const_string_it = typename string::const_iterator; // dependent type needs typename
std::match_results<const_string_it> m;
for (const_string_it searchBegin = s.cbegin(); std::regex_search(searchBegin, s.cend(), m, reg);)
{
// m.position() is relative to searchBegin, so add the offset already consumed
const auto start = (searchBegin - s.cbegin()) + m.position();
for (auto i = start; i < start + m.length(); i++)
{
if (s[i] >= 'A' && s[i] <= 'Z') // only shift ASCII uppercase letters
s[i] += ('a' - 'A');
}
searchBegin += m.position() + m.length();
}
}
void _replaceToLowerTest()
{
string sOut = "I will NOT leave the U.S.";
RegexReplaceToLower(sOut, regex("[A-Z]{2,}"));
cout << sOut << endl;
}
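The same idea can also be written so that it builds a new string instead of modifying the input in place. The following is only a sketch under assumptions: the function name lowercase_matches is hypothetical, it handles plain std::string only, and it relies on std::tolower in the default "C" locale, so it is meant for ASCII text.

```cpp
#include <cctype>
#include <regex>
#include <string>

// Returns a copy of `input` in which every match of `re` is lowercased.
// Text between matches is copied unchanged.
std::string lowercase_matches(const std::string& input, const std::regex& re)
{
    std::string out;
    auto last = input.cbegin(); // end of the previously copied region
    for (std::sregex_iterator it(input.cbegin(), input.cend(), re), end; it != end; ++it)
    {
        const std::smatch& m = *it;
        out.append(last, m[0].first); // unmatched text before this match
        for (auto p = m[0].first; p != m[0].second; ++p)
            out.push_back(static_cast<char>(std::tolower(static_cast<unsigned char>(*p))));
        last = m[0].second;
    }
    out.append(last, input.cend()); // remaining tail after the last match
    return out;
}
```

Calling lowercase_matches("I will NOT leave the U.S.", std::regex("[A-Z]{2,}")) should leave "U.S." untouched, because each capital there stands alone and the pattern needs at least two in a row.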

Related

Replace a string to another string using C++

The problem is that I don't know the length of the input string.
My function only produces the right result when the input is exactly "yyyy". One solution I thought of is to first convert the input string back to "yyyy" and then use my function to complete the work.
Here's my function:
void findAndReplaceAll(std::string & data, std::string toSearch, std::string replaceStr)
{
// Get the first occurrence
size_t pos = data.find(toSearch);
// Repeat till end is reached
while( pos != std::string::npos)
{
// Replace this occurrence of Sub String
data.replace(pos, toSearch.size(), replaceStr);
// Get the next occurrence from the current position
pos = data.find(toSearch, pos + replaceStr.size());
}
}
My main function
std::string format = "yyyyyyyyyydddd";
findAndReplaceAll(format, "yyyy", "%Y");
findAndReplaceAll(format, "dd", "%d");
My expected output should be :
%Y%d
Use regular expressions.
Example:
#include <iostream>
#include <string>
#include <regex>
int main(){
std::string text = "yyyyyy";
std::string sentence = "This is a yyyyyyyyyyyy.";
std::cout << "Text: " << text << std::endl;
std::cout << "Sentence: " << sentence << std::endl;
// Regex
std::regex y_re("y+"); // matches a run of one or more 'y' characters
// replacing
std::string r1 = std::regex_replace(text, y_re, "%y"); // using lowercase
std::string r2 = std::regex_replace(sentence, y_re, "%Y"); // using uppercase
// showing result
std::cout << "Text replace: " << r1 << std::endl;
std::cout << "Sentence replace: " << r2 << std::endl;
return 0;
}
Output:
Text: yyyyyy
Sentence: This is a yyyyyyyyyyyy.
Text replace: %y
Sentence replace: This is a %Y.
If you want to make it even better you can use:
// Regex
std::regex y_re("[yY]+");
That will match any run of 'y' characters in any mix of lowercase and uppercase.
Example output with that Regex:
Sentence: This is a yYyyyYYYYyyy.
Sentence replace: This is a %Y.
This is just a simple example of what you can do with regex. I'd recommend reading up on the topic itself; there is plenty of info here on SO and on other sites.
Extra:
If you want to inspect the match before replacing, so that you can choose between replacements, you can do something like:
// Regex
std::string text = "yyaaaa";
std::cout << "Text: " << text << std::endl;
std::regex y_re("y+"); // matches a run of one or more 'y' characters
std::string output = "";
std::smatch ymatches;
if (std::regex_search(text, ymatches, y_re)) {
if (ymatches[0].length() == 2 ) {
output = std::regex_replace(text, y_re, "%y");
} else {
output = std::regex_replace(text, y_re, "%Y");
}
}
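Applied to the format string from the question, the regex approach could look roughly like the sketch below. The function name to_strftime_format is hypothetical, and it assumes that any run of 'y' maps to %Y and any run of 'd' maps to %d, whatever the run length.

```cpp
#include <regex>
#include <string>

// Collapses each run of 'y' into "%Y" and each run of 'd' into "%d".
std::string to_strftime_format(const std::string& format)
{
    static const std::regex years("y+");
    static const std::regex days("d+");
    // Replace the year runs first, then the day runs in the intermediate result.
    const std::string result = std::regex_replace(format, years, "%Y");
    return std::regex_replace(result, days, "%d");
}
```

With this, to_strftime_format("yyyyyyyyyydddd") gives "%Y%d", which is the output the question expected. The replacement "%Y" contains no 'd', so the second pass cannot disturb the first.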

Difficulties with string declaration/reference parameters (c++)

Last week I got a homework assignment to write a function: it takes a string and a char value and should divide the string into two parts, before and after the first occurrence of that char.
The code worked, but my teacher told me to do it again because it is not well-written code. I don't understand how to make it better. I understand so far that defining two strings filled with whitespace is not good, but I get out-of-bounds exceptions otherwise. Since the input string changes, the string size changes every time.
#include <iostream>
#include <string>
using namespace std;
void divide(char search, string text, string& first_part, string& sec_part)
{
bool firstc = true;
int counter = 0;
for (int i = 0; i < text.size(); i++) {
if (text.at(i) != search && firstc) {
first_part.at(i) = text.at(i);
}
else if (text.at(i) == search&& firstc == true) {
firstc = false;
sec_part.at(counter) = text.at(i);
}
else {
sec_part.at(counter) = text.at(i);
counter++;
}
}
}
int main() {
string text;
string part1=" ";
string part2=" ";
char search_char;
cout << "Please enter text? ";
getline(cin, text);
cout << "Please enter a char: ? ";
cin >> search_char;
divide(search_char,text,part1,part2);
cout << "First string: " << part1 <<endl;
cout << "Second string: " << part2 << endl;
system("PAUSE");
return 0;
}
I suggest you learn to use the C++ standard library functions. There are plenty of utility functions that can help you.
#include <algorithm> // std::find
#include <string>
void divide(const std::string& text, char search, std::string& first_part, std::string& sec_part)
{
std::string::const_iterator pos = std::find(text.begin(), text.end(), search);
first_part.append(text, 0, pos - text.begin());
sec_part.append(text, pos - text.begin());
}
int main()
{
std::string text = "thisisfirst";
char search = 'f';
std::string first;
std::string second;
divide(text, search, first, second);
}
Here I used std::find, which works on iterators; both are worth reading up on.
You have some other mistakes. You are passing text by value, which makes a copy every time you call the function. Pass it by reference instead, and qualify it with const to indicate that it is an input parameter, not an output.
Why is your teacher right?
Having to initialize your destination strings with spaces is a serious flaw:
If the input string is longer, you'll get out-of-bounds errors.
If it's shorter, you get a wrong answer, because in programming "It works " is not the same as "It works".
In addition, your code does not meet the specification. It should always work, regardless of the values currently stored in the output strings.
Alternative 1: your code but working
Just clear the destination strings at the beginning. Then iterate as you did, but use += or push_back() to add chars at the end of the string.
void divide(char search, string text, string& first_part, string& sec_part)
{
bool firstc = true;
first_part.clear(); // make destinations strings empty
sec_part.clear();
for (int i = 0; i < text.size(); i++) {
char c = text.at(i);
if (firstc && c != search) {
first_part += c;
}
else if (firstc && c == search) {
firstc = false;
sec_part += c;
}
else {
sec_part += c;
}
}
}
I used a temporary c instead of text.at(i) or text[i] in order to avoid repeated indexing. This is not really required, though: modern optimizing compilers should produce equivalent code whichever variant you use here.
Alternative 2: use string member functions
This alternative uses the find() function, and then constructs a string from the start until that position, and another from that position. There is a special case when the character was not found.
void divide(char search, string text, string& first_part, string& sec_part)
{
auto pos = text.find(search);
first_part = string(text, 0, pos);
if (pos== string::npos)
sec_part.clear();
else sec_part = string(text, pos, string::npos);
}
As you understand yourself these declarations
string part1=" ";
string part2=" ";
do not make sense because the string entered into text can be longer than either of the initialized strings. In that case the string member function at can throw an exception; if the input is shorter, the strings will have trailing spaces.
From the description of the assignment it is not clear whether the searched character should be included in one of the strings. Your code assumes that the character belongs in the second string.
Take into account that the parameter text should be declared as a constant reference.
Also, instead of using loops it is better to use member functions of the class std::string, such as find.
The function can look like this:
#include <iostream>
#include <string>
void divide(const std::string &text, char search, std::string &first_part, std::string &sec_part)
{
std::string::size_type pos = text.find(search);
first_part = text.substr(0, pos);
if (pos == std::string::npos)
{
sec_part.clear();
}
else
{
sec_part = text.substr(pos);
}
}
int main()
{
std::string text("Hello World");
std::string first_part;
std::string sec_part;
divide(text, ' ', first_part, sec_part);
std::cout << "\"" << text << "\"\n";
std::cout << "\"" << first_part << "\"\n";
std::cout << "\"" << sec_part << "\"\n";
}
The program output is
"Hello World"
"Hello"
" World"
As you can see, the separating character is included in the second string, though I think it might be better to exclude it from both strings.
An alternative and, in my opinion, clearer approach can look like this:
#include <iostream>
#include <string>
#include <utility>
std::pair<std::string, std::string> divide(const std::string &s, char c)
{
std::string::size_type pos = s.find(c);
return { s.substr(0, pos), pos == std::string::npos ? "" : s.substr(pos) };
}
int main()
{
std::string text("Hello World");
auto p = divide(text, ' ');
std::cout << "\"" << text << "\"\n";
std::cout << "\"" << p.first << "\"\n";
std::cout << "\"" << p.second << "\"\n";
}
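If the separator should be excluded from both parts, as suggested above, a variant could look like the following sketch. The name divide_excluding is hypothetical, and returning the whole input as the first part when the character is absent is an assumption, since the assignment does not say what should happen then.

```cpp
#include <string>
#include <utility>

// Splits `s` at the first occurrence of `c`; the separator itself is dropped.
// If `c` does not occur, the first part is the whole string and the second is empty.
std::pair<std::string, std::string> divide_excluding(const std::string& s, char c)
{
    const std::string::size_type pos = s.find(c);
    if (pos == std::string::npos)
        return { s, std::string() };
    // substr(pos + 1) is valid even when the separator is the last character:
    // it then yields an empty string.
    return { s.substr(0, pos), s.substr(pos + 1) };
}
```

For "Hello World" and ' ' this gives "Hello" and "World", without the leading space seen in the earlier output.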
Your code will only work as long as the character is found within the first part1.length() characters. You need something similar to this:
void string_split_once(const char s, const string & text, string & first, string & second) {
first.clear();
second.clear();
std::size_t pos = text.find(s);
if (pos != string::npos) {
first = text.substr(0, pos);
second = text.substr(pos);
}
}
The biggest problem I see is that you are using at where you should be using push_back. See std::basic_string::push_back. at is designed to access an existing character to read or modify it. push_back appends a new character to the string.
divide could look like this:
void divide(char search, string text, string& first_part,
string& sec_part)
{
bool firstc = true;
for (int i = 0; i < text.size(); i++) {
if (text.at(i) != search && firstc) {
first_part.push_back(text.at(i));
}
else if (text.at(i) == search&& firstc == true) {
firstc = false;
sec_part.push_back(text.at(i));
}
else {
sec_part.push_back(text.at(i));
}
}
}
Since you aren't handling exceptions, consider using text[i] rather than text.at(i).

MSVC regular expression match

I am trying to match a literal number, e.g. 1600442 using a set of regular expressions in Microsoft Visual Studio 2010. My regular expressions are simply:
1600442|7654321
7895432
The problem is that both of the above match the string.
Implementing this in Python gives the expected result:
import re
serial = "1600442"
re1 = "1600442|7654321"
re2 = "7895432"
m = re.match(re1, serial)
if m:
    print("found for re1")
    print(m.groups())
m = re.match(re2, serial)
if m:
    print("found for re2")
    print(m.groups())
Gives output
found for re1
()
Which is what I expected. Using this code in C++ however:
#include <string>
#include <iostream>
#include <regex>
int main(){
std::string serial = "1600442";
std::tr1::regex re1("1600442|7654321");
std::tr1::regex re2("7895432");
std::tr1::smatch match;
std::cout << "re1:" << std::endl;
std::tr1::regex_search(serial, match, re1);
for (auto i = 0;i <match.length(); ++i)
std::cout << match[i].str().c_str() << " ";
std::cout << std::endl << "re2:" << std::endl;
std::tr1::regex_search(serial, match, re2);
for (auto i = 0;i <match.length(); ++i)
std::cout << match[i].str().c_str() << " ";
std::cout << std::endl;
std::string s;
std::getline (std::cin,s);
}
gives me:
re1:
1600442
re2:
1600442
which is not what I expected. Why do I get match here?
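The code never checks the return value of regex_search. The second search fails, and in this implementation the smatch is apparently left intact and still contains the first results, so the loop prints them again. Also note that match.length() is the length of the whole match; the number of sub-matches is match.size().
You can move the regex searching code to a separate function that only prints when the search succeeds:
void FindMeText(const std::regex& re, const std::string& serial)
{
std::smatch match;
// Only read the results when the search actually succeeded
if (std::regex_search(serial, match, re))
{
for (std::size_t i = 0; i < match.size(); ++i)
std::cout << match[i].str() << " ";
}
std::cout << std::endl;
}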
int main(){
std::string serial = "1600442";
std::regex re1("^(?:1600442|7654321)");
std::regex re2("^7895432");
std::cout << "re1:" << std::endl;
FindMeText(re1, serial);
std::cout << "re2:" << std::endl;
FindMeText(re2, serial);
std::cout << std::endl;
std::string s;
std::getline (std::cin,s);
}
Note that Python's re.match only looks for a match at the start of the string, so I suggest using ^ (start of string) at the beginning of each pattern.
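As an alternative to editing each pattern, the standard library has a match flag that restricts a search to matches starting at the first character, which is close to what Python's re.match does. A minimal sketch, with the hypothetical helper name matches_at_start:

```cpp
#include <regex>
#include <string>

// True if `re` matches a prefix of `text`. Like Python's re.match, the match
// must begin at the first character but need not reach the end of the string.
bool matches_at_start(const std::string& text, const std::regex& re)
{
    return std::regex_search(text, re, std::regex_constants::match_continuous);
}
```

With serial "1600442", the pattern "1600442|7654321" yields true and "7895432" yields false. If the whole string has to match, std::regex_match is the stricter choice.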

How to separate this string using a regular expression

For this string [268, 950][268, 954][269, 955][272, 955][270, 955][268, 953]
I want to get the numbers in [ , ] pair by pair.
I use c++ regex_search to parse this string.
This is my testing code:
ifstream file("output.txt");
char regex_base[] = "[\\[0-9, 0-9\\]]{10}";
char regex_num[] = "[0-9]{3}";
regex reg_base(regex_base, regex_constants::icase);
regex reg_num(regex_base, regex_constants::icase);
if (file.is_open())
{
string s;
while (!file.eof()){
getline(file, s);
smatch m;
while (regex_search(s, m, reg_num)) {
for (int i = 0; i < m.size(); i++)
cout << m[i] << endl;
}
}
}
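But in the regex_search() while loop, the variable m only ever gets [268, 950], and the loop never terminates.
What's wrong with my regular expression or my code?
The loop never terminates because s is never advanced past the previous match, so every call finds [268, 950] again; assigning m.suffix() to s fixes that. I have also removed the capturing groups since you seem not to be using them anyway, and added some code to show how to obtain the matches from your input string: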
char regex_base[] = "\\[[0-9]+, [0-9]+\\]";
...
s = "[268, 950][268, 954][269, 955][272, 955][270, 955][268, 953]"; // FOR TEST
smatch m;
while (regex_search(s, m, reg_num))
{
for (auto x:m) std::cout << x << "\r\n";
s = m.suffix().str();
}
If you need the values, you can use a different regex:
char regex_base[] = "\\[([0-9]+), ([0-9]+)\\]";
...
s = "[268, 950][268, 954][269, 955][272, 955][270, 955][268, 953]";
smatch m;
while (regex_search(s, m, reg_num))
{
std::cout << m[1] << ", " << m[2] << std::endl;
s = m.suffix().str();
}
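If the numbers are needed as integers, the same capturing regex can be combined with std::sregex_iterator, which walks over all matches without reassigning the input string. This is a sketch only: the function name parse_pairs is hypothetical, and it assumes every number fits in an int.

```cpp
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Extracts every "[a, b]" pair from `line` as a pair of ints.
std::vector<std::pair<int, int>> parse_pairs(const std::string& line)
{
    static const std::regex pair_re("\\[([0-9]+), ([0-9]+)\\]");
    std::vector<std::pair<int, int>> pairs;
    for (std::sregex_iterator it(line.begin(), line.end(), pair_re), end; it != end; ++it)
    {
        // Group 1 is the first number, group 2 the second.
        pairs.emplace_back(std::stoi((*it)[1].str()), std::stoi((*it)[2].str()));
    }
    return pairs;
}
```

The iterator keeps track of the search position itself, which avoids the "forgot to advance" mistake behind the infinite loop in the question.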

Different behavior in C regex VS C++11 regex

I need code that splits permutations written in cycle notation into their elements. Suppose the permutation string is one of:
"(1,2,5)(3,4)" or "(3,4)(1,2,5)" or "(3,4)(5,1,2)"
The patterns I've tried are these:
([0-9]+[ ]*,[ ]*)*[0-9]+ for each permutation cycle. This would split the "(1,2,5)(3,4)" string in two strings "1,2,5" and "3,4".
([0-9]+) for each element in cycle. This would split each cycle in individual numbers.
When I tried these patterns on this page they worked well. I have also used them with the C++11 regex library with good results:
#include <iostream>
#include <string>
#include <regex>
void elements(const std::string &input)
{
const std::regex ElementRegEx("[0-9]+");
for (std::sregex_iterator Element(input.begin(), input.end(), ElementRegEx); Element != std::sregex_iterator(); ++Element)
{
const std::string CurrentElement(*Element->begin());
std::cout << '\t' << CurrentElement << '\n';
}
}
void cycles(const std::string &input)
{
const std::regex CycleRegEx("([0-9]+[ ]*,[ ]*)*[0-9]+");
for (std::sregex_iterator Cycle(input.begin(), input.end(), CycleRegEx); Cycle != std::sregex_iterator(); ++Cycle)
{
const std::string CurrentCycle(*Cycle->begin());
std::cout << CurrentCycle << '\n';
elements(CurrentCycle);
}
}
int main(int argc, char **argv)
{
std::string input("(1,2,5)(3,4)");
std::cout << "input: " << input << "\n\n";
cycles(input);
return 0;
}
The Output compiling with Visual Studio 2010 (10.0):
input: (1,2,5)(3,4)
1,2,5
1
2
5
3,4
3
4
But unfortunately, I cannot use the C++11 tools in my project. The project will run on a Linux platform and must be compiled with gcc 4.2.3, so I'm forced to use the C regex library from the regex.h header. Using the same patterns with this library, I get different results.
Here is the test code:
void elements(const std::string &input)
{
regex_t ElementRegEx;
regcomp(&ElementRegEx, "([0-9]+)", REG_EXTENDED);
regmatch_t ElementMatches[MAX_MATCHES];
if (!regexec(&ElementRegEx, input.c_str(), MAX_MATCHES, ElementMatches, 0))
{
int Element = 0;
while ((ElementMatches[Element].rm_so != -1) && (ElementMatches[Element].rm_eo != -1))
{
regmatch_t &ElementMatch = ElementMatches[Element];
std::stringstream CurrentElement(input.substr(ElementMatch.rm_so, ElementMatch.rm_eo - ElementMatch.rm_so));
std::cout << '\t' << CurrentElement << '\n';
++Element;
}
}
regfree(&ElementRegEx);
}
void cycles(const std::string &input)
{
regex_t CycleRegEx;
regcomp(&CycleRegEx, "([0-9]+[ ]*,[ ]*)*[0-9]+", REG_EXTENDED);
regmatch_t CycleMatches[MAX_MATCHES];
if (!regexec(&CycleRegEx, input.c_str(), MAX_MATCHES, CycleMatches, 0))
{
int Cycle = 0;
while ((CycleMatches[Cycle].rm_so != -1) && (CycleMatches[Cycle].rm_eo != -1))
{
regmatch_t &CycleMatch = CycleMatches[Cycle];
const std::string CurrentCycle(input.substr(CycleMatch.rm_so, CycleMatch.rm_eo - CycleMatch.rm_so));
std::cout << CurrentCycle << '\n';
elements(CurrentCycle);
++Cycle;
}
}
regfree(&CycleRegEx);
}
int main(int argc, char **argv)
{
cycles("(1,2,5)(3,4)");
return 0;
}
The expected output is the same as with C++11 regex, but the actual output was:
input: (1,2,5)(3,4)
1,2,5
1
1
2,
2
2
Finally, my questions are:
Could someone give me a hint about where I'm misunderstanding the C regex engine?
Why is the behavior different in C regex vs. C++ regex?
You're misunderstanding the output of regexec. The pmatch buffer (after pmatch[0]) is filled with sub-matches of the regex, not with consecutive matches in the string.
For example, if your regex is [a-z]([+ ])([0-9]) matched against x+5, then pmatch[0] will reference x+5 (the whole match), and pmatch[1] and pmatch[2] will reference + and 5 respectively.
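The x+5 example can be checked with a small helper along these lines. It is a sketch assuming a POSIX system that provides <regex.h>; the function name sub_matches is hypothetical. It collects the whole match followed by each parenthesized group from a single regexec call.

```cpp
#include <regex.h> // POSIX regex API
#include <cstddef>
#include <string>
#include <vector>

// Returns pmatch[0] (whole match) followed by each sub-match of ONE regexec call.
// Groups that did not participate in the match are returned as empty strings.
std::vector<std::string> sub_matches(const std::string& input, const char* pattern)
{
    std::vector<std::string> out;
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return out; // pattern failed to compile
    const std::size_t count = re.re_nsub + 1; // groups plus the whole match
    std::vector<regmatch_t> pm(count);
    if (regexec(&re, input.c_str(), count, pm.data(), 0) == 0)
    {
        for (std::size_t i = 0; i < count; ++i)
        {
            if (pm[i].rm_so == -1)
                out.push_back(std::string());
            else
                out.push_back(input.substr(pm[i].rm_so, pm[i].rm_eo - pm[i].rm_so));
        }
    }
    regfree(&re);
    return out;
}
```

For the pattern [a-z]([+ ])([0-9]) and the input x+5 this returns "x+5", "+" and "5", which are the three entries described above, not three consecutive matches.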
You need to repeat the regexec in a loop, starting from the end of the previous match:
int start = 0;
while (!regexec(&ElementRegEx, input.c_str() + start, MAX_MATCHES, ElementMatches, 0))
{
regmatch_t &ElementMatch = ElementMatches[0];
std::string CurrentElement(input.substr(start + ElementMatch.rm_so, ElementMatch.rm_eo - ElementMatch.rm_so));
std::cout << '\t' << CurrentElement << '\n';
start += ElementMatch.rm_eo;
}
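Wrapped into a reusable function, the loop could look roughly like the sketch below. It assumes a POSIX system with <regex.h>; the name find_all, the REG_NOTBOL handling and the guard against empty matches are additions for illustration and are not part of the snippet above.

```cpp
#include <regex.h> // POSIX regex API
#include <string>
#include <vector>

// Collects all consecutive, non-overlapping matches of `pattern` in `input`
// by calling regexec repeatedly, each time starting after the previous match.
std::vector<std::string> find_all(const std::string& input, const char* pattern)
{
    std::vector<std::string> results;
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return results; // pattern failed to compile

    regmatch_t m[1]; // only the whole match (pmatch[0]) is needed
    std::string::size_type start = 0;
    while (start <= input.size() &&
           regexec(&re, input.c_str() + start, 1, m, start == 0 ? 0 : REG_NOTBOL) == 0)
    {
        // Offsets in m[0] are relative to input.c_str() + start.
        results.push_back(input.substr(start + m[0].rm_so, m[0].rm_eo - m[0].rm_so));
        // Step past the match; move at least one character on an empty match
        // so the loop cannot get stuck.
        start += (m[0].rm_eo > m[0].rm_so) ? m[0].rm_eo : m[0].rm_eo + 1;
    }
    regfree(&re);
    return results;
}
```

Calling find_all on "(1,2,5)(3,4)" with the cycle pattern should give "1,2,5" and "3,4", and calling it on "1,2,5" with [0-9]+ should give "1", "2" and "5", matching the C++11 output.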