The problem is that I don't know the length of the input string.
My function only works when the run of characters is exactly "yyyy". My idea is to first collapse the input string back to "yyyy" and then use my function to finish the job.
Here's my function:
void findAndReplaceAll(std::string & data, std::string toSearch, std::string replaceStr)
{
// Get the first occurrence
size_t pos = data.find(toSearch);
// Repeat till end is reached
while( pos != std::string::npos)
{
// Replace this occurrence of Sub String
data.replace(pos, toSearch.size(), replaceStr);
// Get the next occurrence from the current position
pos = data.find(toSearch, pos + replaceStr.size());
}
}
My main function
std::string format = "yyyyyyyyyydddd";
findAndReplaceAll(format, "yyyy", "%Y");
findAndReplaceAll(format, "dd", "%d");
My expected output should be :
%Y%d
Use regular expressions.
Example:
#include <iostream>
#include <string>
#include <regex>
int main(){
std::string text = "yyyyyy";
std::string sentence = "This is a yyyyyyyyyyyy.";
std::cout << "Text: " << text << std::endl;
std::cout << "Sentence: " << sentence << std::endl;
// Regex
std::regex y_re("y+"); // matches one or more consecutive 'y' characters
// replacing
std::string r1 = std::regex_replace(text, y_re, "%y"); // using lowercase
std::string r2 = std::regex_replace(sentence, y_re, "%Y"); // using uppercase
// showing result
std::cout << "Text replace: " << r1 << std::endl;
std::cout << "Sentence replace: " << r2 << std::endl;
return 0;
}
Output:
Text: yyyyyy
Sentence: This is a yyyyyyyyyyyy.
Text replace: %y
Sentence replace: This is a %Y.
If you want to make it even better you can use:
// Regex
std::regex y_re("[yY]+");
That will match any mix of lowercase and uppercase 'y' characters, in any quantity.
Example output with that Regex:
Sentence: This is a yYyyyYYYYyyy.
Sentence replace: This is a %Y.
This is just a simple example of what you can do with regex. I'd recommend reading up on the topic itself; there is plenty of information here on SO and on other sites.
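Applied to the format string from the question, the same idea could look like the sketch below. The helper name to_strftime_format is made up for illustration, and it assumes that every run of 'y' or 'd' should collapse into a single token, whatever its length.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: collapses each run of 'y' or 'd' into one strftime token.
std::string to_strftime_format(const std::string& format)
{
    static const std::regex year_run("y+");
    static const std::regex day_run("d+");
    std::string result = std::regex_replace(format, year_run, "%Y");
    // "%Y" contains no 'd', so the second pass cannot touch the first replacement.
    result = std::regex_replace(result, day_run, "%d");
    return result;
}
```

Calling to_strftime_format("yyyyyyyyyydddd") should give %Y%d, which is the output the question asks for.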
Extra:
If you want to inspect the match before replacing, so that the replacement depends on what was matched, you can do something like:
// Regex
std::string text = "yyaaaa";
std::cout << "Text: " << text << std::endl;
std::regex y_re("y+"); // matches one or more consecutive 'y' characters
std::string output = "";
std::smatch ymatches;
if (std::regex_search(text, ymatches, y_re)) {
if (ymatches[0].length() == 2 ) {
output = std::regex_replace(text, y_re, "%y");
} else {
output = std::regex_replace(text, y_re, "%Y");
}
}
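Note that the snippet above looks only at the first match and then applies one replacement to every run. If each run should be judged on its own length, one possible sketch walks the matches with std::sregex_iterator. The helper name replace_year_runs and the rule "exactly two characters means %y, anything else means %Y" are assumptions taken from the example, not a general date-format convention.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: a run of exactly two 'y' becomes "%y", any other run becomes "%Y".
std::string replace_year_runs(const std::string& text)
{
    static const std::regex y_re("y+");
    std::string output;
    auto copied_up_to = text.cbegin();
    for (std::sregex_iterator it(text.cbegin(), text.cend(), y_re), end; it != end; ++it) {
        const std::ssub_match& run = (*it)[0];
        output.append(copied_up_to, run.first);   // text between two matches
        output += (run.length() == 2) ? "%y" : "%Y";
        copied_up_to = run.second;
    }
    output.append(copied_up_to, text.cend());     // text after the last match
    return output;
}
```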
Related
I have the string post "ola tudo bem como esta" alghero.jpg and I want to break it into 3 pieces: post, ola tudo bem como esta (without the quotes), and alghero.jpg. I tried it in C because I'm new and not really good at programming in C++, but it's not working. Is there a more efficient way of doing this in C++?
Program:
int main()
{
char* token1 = new char[128];
char* token2 = new char[128];
char* token3 = new char[128];
char str[] = "post \"ola tudo bem como esta\" alghero.jpg";
char *token;
/* get the first token */
token = strtok(str, " ");
//walk through other tokens
while( token != NULL ) {
printf( " %s\n", token );
token = strtok(NULL, " ");
}
return(0);
}
In C++14 and later, you can use std::quoted to read quoted strings from any std::istream, such as std::istringstream, e.g.:
#include <iostream>
#include <sstream>
#include <string>
#include <iomanip>
int main()
{
std::string token1, token2, token3;
std::string str = "post \"ola tudo bem como esta\" alghero.jpg";
std::istringstream(str) >> token1 >> std::quoted(token2) >> token3;
std::cout << token1 << "\n";
std::cout << token2 << "\n";
std::cout << token3 << "\n";
return 0;
}
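Use find to locate the positions of the two quotes, then use substr to extract the text from index 0 up to the first quote, the text between the two quotes, and the text after the second quote. The code below assumes a single space before the first quote and after the second one.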
std::string s = "post \"ola tudo bem como esta\" alghero.jpg";
auto first = s.find('\"');
if (first != s.npos) {
auto second = s.find('\"', first + 1);
if (second != s.npos) {
std::cout << s.substr(0, first-1) << '\n';
std::cout << s.substr(first+1, second-first-1) << '\n';
std::cout << s.substr(second+2) << '\n';
}
}
Output:
post
ola tudo bem como esta
alghero.jpg
One option for parsing strings is using regular expressions, for example:
#include <iostream>
#include <regex>
#include <string>
// struct to hold return value of parse function
struct parse_result_t
{
bool parsed{ false };
std::string token1;
std::string token2;
std::string token3;
};
// the parse function
auto parse(const std::string& string)
{
// this is a regex
// ^ matches the start of the string
// (.*?) lazily captures the first token
// \\s+ matches one or more whitespace characters
// \\\"(.*?)\\\" captures the text between the quotes (the " is escaped for the C++ string literal)
// (.*)$ captures the remaining characters up to the end of the string
// see it live here : https://regex101.com/r/XnkAZV/1
static std::regex rx{ "^(.*?)\\s+\\\"(.*?)\\\"\\s+(.*)$" };
std::smatch match;
parse_result_t result;
if (std::regex_search(string, match, rx))
{
result.parsed = true;
result.token1 = match[1];
result.token2 = match[2];
result.token3 = match[3];
}
return result;
}
int main()
{
auto result = parse("post \"ola tudo bem como esta\" alghero.jpg");
std::cout << "parse result = " << (result.parsed ? "success" : "failed") << "\n";
std::cout << "token 1 = " << result.token1 << "\n";
std::cout << "token 2 = " << result.token2 << "\n";
std::cout << "token 3 = " << result.token3 << "\n";
return 0;
}
If the strings are always separated by a single space, you can just find the first and last space using std::string::find and std::string::rfind, split on those characters, and unquote the middle string:
#include <iostream>
#include <tuple>
#include <string>
std::string unquote(const std::string& str) {
    // front() and back() are undefined on an empty string, so check the size first
    if (str.size() < 2 || str.front() != '"' || str.back() != '"') {
        return str;
    }
    return str.substr(1, str.size() - 2);
}
std::tuple < std::string, std::string, std::string> parse_triple_with_quoted_middle(const std::string& str) {
auto iter1 = str.begin() + str.find(' ');
auto iter2 = str.begin() + str.rfind(' ');
auto str1 = std::string(str.begin(),iter1);
auto str2 = std::string(iter1 + 1, iter2);
auto str3 = std::string(iter2 + 1, str.end() );
return { str1, unquote(str2), str3 };
}
int main()
{
std::string test = "post \"ola tudo bem como esta\" alghero.jpg";
auto [str1, str2, str3] = parse_triple_with_quoted_middle(test);
std::cout << str1 << "\n";
std::cout << str2 << "\n";
std::cout << str3 << "\n";
}
You should probably put more input validation into the above, however.
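As a rough idea of what that validation could look like, here is a sketch that reports failure through std::optional instead of reading out of range. It needs C++17, and the names triple and try_parse_triple are invented for this example. It keeps the same assumption as above: the first and last space delimit the three parts.

```cpp
#include <optional>
#include <string>
#include <tuple>

using triple = std::tuple<std::string, std::string, std::string>;

// Hypothetical variant that returns std::nullopt on malformed input.
std::optional<triple> try_parse_triple(const std::string& str)
{
    const auto first_space = str.find(' ');
    const auto last_space = str.rfind(' ');
    // Need two distinct spaces, and non-empty first and last tokens.
    if (first_space == std::string::npos || first_space == last_space
        || first_space == 0 || last_space + 1 == str.size()) {
        return std::nullopt;
    }
    const std::string middle = str.substr(first_space + 1, last_space - first_space - 1);
    // The middle token must be wrapped in double quotes.
    if (middle.size() < 2 || middle.front() != '"' || middle.back() != '"') {
        return std::nullopt;
    }
    return triple{ str.substr(0, first_space),
                   middle.substr(1, middle.size() - 2),
                   str.substr(last_space + 1) };
}
```

The caller can then test the result with has_value() before unpacking the tuple.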
You could use regular expressions for this:
The pattern to search for repeatedly is: optional leading whitespace, \s*; then ([^\"]*), zero or more characters other than quotes (zero or more because several quotes could follow one another), captured as a group (hence the parentheses); and finally either a quote \" or (|) the end of the input $, in a group that we do not capture (?:). We use std::regex to store the pattern and wrap it in R"()" so that we can write it as a raw string.
The while loop does a few things: it searches for the next match with regex_search, extracts the captured group, and updates the input line so that the next search starts where the current one finished. matches is an array-like object whose first element, matches[0], is the part of line matching the whole pattern; the following elements correspond to the pattern's captured groups.
#include <iostream> // cout
#include <regex> // regex_search, smatch
#include <string> // string
int main() {
    std::string line{"post \"ola tudo bem como esta\" alghero.jpg"};
    std::regex pattern{R"(\s*([^\"]*)(?:\"|$))"};
    std::smatch matches{};
    // Stop once the input is consumed: the pattern also matches an empty string
    while (!line.empty() && std::regex_search(line, matches, pattern))
    {
        std::cout << matches[1] << "\n";
        line = matches.suffix();
    }
}
I need to use a regex to extract the sentences containing the word "walk" from a string. For now, I am just trying to extract the sentences:
std::string s ("Hello world! My name is Mike. Why so serious?");
std::smatch m;
std::regex e ("^\\s+[A-Za-z,;'\"\\s]+[.?!]$"); // attempt to match a single sentence
while (std::regex_search (s,m,e)) {
for (auto w:m)
std::cout << w << "\n" ;
}
And this doesn't work.
Apart from the start (^) and end ($) anchors in the regex, which prevent it from matching a sentence in the middle of the string, you are forgetting to update s with the suffix after each match.
#include <iostream>
#include <regex>
int main()
{
std::string s ("Hello world! My name is Mike. Why so serious?");
std::smatch m;
std::regex e ("\\s?[A-Za-z,;'\"\\s]+[.?!]");
while(std::regex_search (s,m,e))
{
std::cout << m.str() << "\n" ;
s = m.suffix();
}
return 0;
}
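To go one step further and keep only the sentences that contain "walk", one option is to test each extracted sentence with a second regex. The sketch below is only an illustration: the helper name sentences_with_word is made up, a sentence is assumed to be anything ending in '.', '?' or '!', and the word is assumed to contain no regex metacharacters.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns the sentences of `text` that contain `word` as a whole word.
std::vector<std::string> sentences_with_word(const std::string& text, const std::string& word)
{
    // A sentence here is a run of non-terminators followed by '.', '?' or '!'.
    static const std::regex sentence_re("[^.?!]+[.?!]");
    // Assumes `word` contains no regex metacharacters.
    const std::regex word_re("\\b" + word + "\\b");
    std::vector<std::string> result;
    for (std::sregex_iterator it(text.cbegin(), text.cend(), sentence_re), end; it != end; ++it) {
        std::string sentence = it->str();
        // Drop the whitespace left over from the previous sentence.
        sentence.erase(0, sentence.find_first_not_of(" \t\n"));
        if (std::regex_search(sentence, word_re)) {
            result.push_back(sentence);
        }
    }
    return result;
}
```

Because of the \b word boundaries, a word such as "sidewalk" is not counted as a hit for "walk".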
I am trying to match a literal number, e.g. 1600442 using a set of regular expressions in Microsoft Visual Studio 2010. My regular expressions are simply:
1600442|7654321
7895432
The problem is that both of the above match the string.
Implementing this in Python gives the expected result:
import re
serial = "1600442"
re1 = "1600442|7654321"
re2 = "7895432"
m = re.match(re1, serial)
if m:
    print("found for re1")
    print(m.groups())
m = re.match(re2, serial)
if m:
    print("found for re2")
    print(m.groups())
Gives output
found for re1
()
Which is what I expected. Using this code in C++ however:
#include <string>
#include <iostream>
#include <regex>
int main(){
std::string serial = "1600442";
std::tr1::regex re1("1600442|7654321");
std::tr1::regex re2("7895432");
std::tr1::smatch match;
std::cout << "re1:" << std::endl;
std::tr1::regex_search(serial, match, re1);
for (auto i = 0;i <match.length(); ++i)
std::cout << match[i].str().c_str() << " ";
std::cout << std::endl << "re2:" << std::endl;
std::tr1::regex_search(serial, match, re2);
for (auto i = 0;i <match.length(); ++i)
std::cout << match[i].str().c_str() << " ";
std::cout << std::endl;
std::string s;
std::getline (std::cin,s);
}
gives me:
re1:
1600442
re2:
1600442
which is not what I expected. Why do I get match here?
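The second call to regex_search fails, so the smatch is not overwritten with a new match; it is left intact and still contains the results of the first search. Since the code never checks the return value of regex_search, it prints those stale results.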
You can move the regex searching code to a separate method:
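void FindMeText(const std::regex& re, const std::string& serial)
{
    std::smatch match;
    // Only read the results when the search actually succeeded
    if (std::regex_search(serial, match, re))
    {
        // size() is the number of sub-matches; length() is the length of match[0]
        for (std::size_t i = 0; i < match.size(); ++i)
            std::cout << match[i].str() << " ";
    }
    std::cout << std::endl;
}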
int main(){
std::string serial = "1600442";
std::regex re1("^(?:1600442|7654321)");
std::regex re2("^7895432");
std::cout << "re1:" << std::endl;
FindMeText(re1, serial);
std::cout << "re2:" << std::endl;
FindMeText(re2, serial);
std::cout << std::endl;
std::string s;
std::getline (std::cin,s);
}
Result:
Note that Python re.match searches for the pattern match at the start of string only, thus I suggest using ^ (start of string) at the beginning of each pattern.
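If the goal is to accept a serial only when the whole string matches, std::regex_match may be a simpler fit than regex_search with anchors. The following is a minimal sketch; the helper name serial_matches is hypothetical. Note that this is stricter than Python's re.match, which only anchors at the start of the string.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true only if the entire serial matches the pattern.
bool serial_matches(const std::string& serial, const std::string& pattern)
{
    const std::regex re(pattern);
    // regex_match succeeds only when the pattern covers the whole input,
    // and its return value is the only thing inspected here.
    return std::regex_match(serial, re);
}
```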
For example, if I have a string like "first second third forth", I want to match every single word in one operation and output them one by one.
I just thought that "(\\b\\S*\\b){0,}" would work. But actually it did not.
What should I do?
Here's my code:
#include<iostream>
#include<regex>
#include<string>
using namespace std;
int main()
{
regex exp("(\\b\\S*\\b)");
smatch res;
string str = "first second third forth";
regex_search(str, res, exp);
cout << res[0] <<" "<<res[1]<<" "<<res[2]<<" "<<res[3]<< endl;
}
Simply iterate over your string while regex_searching, like this:
{
regex exp("(\\b\\S*\\b)");
smatch res;
string str = "first second third forth";
string::const_iterator searchStart( str.cbegin() );
while ( regex_search( searchStart, str.cend(), res, exp ) )
{
cout << ( searchStart == str.cbegin() ? "" : " " ) << res[0];
searchStart = res.suffix().first;
}
cout << endl;
}
This can be done in regex of C++11.
Two methods:
First, you can use () in the regex to define your captures (sub-expressions).
Like this:
string var = "first second third forth";
const regex r("(.*) (.*) (.*) (.*)");
smatch sm;
if (regex_search(var, sm, r)) {
for (size_t i=1; i<sm.size(); i++) {
cout << sm[i] << endl;
}
}
See it live: http://coliru.stacked-crooked.com/a/e1447c4cff9ea3e7
Second, you can use sregex_token_iterator():
string var = "first second third forth";
regex wsaq_re("\\s+");
copy( sregex_token_iterator(var.begin(), var.end(), wsaq_re, -1),
sregex_token_iterator(),
ostream_iterator<string>(cout, "\n"));
See it live: http://coliru.stacked-crooked.com/a/677aa6f0bb0612f0
sregex_token_iterator appears to be the ideal, efficient solution, but the example given in the selected answer leaves much to be desired. Instead, I found some great examples here:
http://www.cplusplus.com/reference/regex/regex_token_iterator/regex_token_iterator/
For your convenience, I've copy-pasted the sample code shown by that page. I claim no credit for the code.
// regex_token_iterator example
#include <iostream>
#include <string>
#include <regex>
int main ()
{
std::string s ("this subject has a submarine as a subsequence");
std::regex e ("\\b(sub)([^ ]*)"); // matches words beginning by "sub"
// default constructor = end-of-sequence:
std::regex_token_iterator<std::string::iterator> rend;
std::cout << "entire matches:";
std::regex_token_iterator<std::string::iterator> a ( s.begin(), s.end(), e );
while (a!=rend) std::cout << " [" << *a++ << "]";
std::cout << std::endl;
std::cout << "2nd submatches:";
std::regex_token_iterator<std::string::iterator> b ( s.begin(), s.end(), e, 2 );
while (b!=rend) std::cout << " [" << *b++ << "]";
std::cout << std::endl;
std::cout << "1st and 2nd submatches:";
int submatches[] = { 1, 2 };
std::regex_token_iterator<std::string::iterator> c ( s.begin(), s.end(), e, submatches );
while (c!=rend) std::cout << " [" << *c++ << "]";
std::cout << std::endl;
std::cout << "matches as splitters:";
std::regex_token_iterator<std::string::iterator> d ( s.begin(), s.end(), e, -1 );
while (d!=rend) std::cout << " [" << *d++ << "]";
std::cout << std::endl;
return 0;
}
Output:
entire matches: [subject] [submarine] [subsequence]
2nd submatches: [ject] [marine] [sequence]
1st and 2nd submatches: [sub] [ject] [sub] [marine] [sub] [sequence]
matches as splitters: [this ] [ has a ] [ as a ]
You could use the suffix() function, and search again until you don't find a match:
int main()
{
regex exp("(\\b\\S*\\b)");
smatch res;
string str = "first second third forth";
while (regex_search(str, res, exp)) {
cout << res[0] << endl;
str = res.suffix();
}
}
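The loop above copies the remaining text into str on every iteration. A possible alternative that leaves the input untouched is std::sregex_iterator, which visits every match in turn. In this sketch the helper name split_words is invented, and a "word" is assumed to be any run of non-whitespace characters.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collects every run of non-whitespace characters.
std::vector<std::string> split_words(const std::string& str)
{
    static const std::regex word_re("\\S+");
    std::vector<std::string> words;
    // A default-constructed sregex_iterator marks the end of the sequence.
    for (std::sregex_iterator it(str.cbegin(), str.cend(), word_re), end; it != end; ++it) {
        words.push_back(it->str());
    }
    return words;
}
```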
My code will capture all groups in all matches:
vector<vector<string>> U::String::findEx(const string& s, const string& reg_ex, bool case_sensitive)
{
    // icase must be set only when the search is NOT case sensitive
    const auto flags = case_sensitive ? regex_constants::ECMAScript
                                      : regex_constants::ECMAScript | regex_constants::icase;
    regex rx(reg_ex, flags);
    vector<vector<string>> captured_groups;
    // sregex_iterator visits every match and keeps its capture groups
    for (sregex_iterator it(s.cbegin(), s.cend(), rx), end_it; it != end_it; ++it)
    {
        vector<string> captured_subgroups;
        for (size_t g = 0; g < it->size(); ++g)
            captured_subgroups.push_back((*it)[g].str());
        captured_groups.push_back(captured_subgroups);
    }
    return captured_groups;
}
My reading of the documentation is that regex_search searches for the first match and that none of the functions in std::regex do a "scan" of the kind you are looking for. However, the Boost library seems to support this, as described in "C++ tokenize a string using a regular expression".
I have a string with large content. I have to separate out content of string before the first newline character and after the newline character.
string content is as follows:
std::string = "exption is theo from my fimnct!
mt nsamre id kjsdf dskfk djfhj
/vonsfs/sdvfs/sdvjisd/dd.so
dfjg dfk dflkkm sdfk "
From the above, I have to copy the content of the first line, up to the newline character, into another string and leave the rest of the content unchanged. The characters in the first line are not fixed; it is a variable string.
What about string::substr and string::find:
#include <iostream>
#include <string>
int main()
{
std::string s = "foo\nbar";
std::cout << "first line: " << s.substr(0, s.find('\n')) << "\n";
}
You would do this like this:
std::string first, second, all = "...";
size_t pos = all.find('\n');
if(pos != std::string::npos)
{
first = all.substr(0, pos);
second = all.substr(pos+1);
}
Try std::algorithms:
#include <algorithm>
#include <iostream>
#include <string>
int main (void)
{
    std::string input(
        "exption is theo from my fimnct!\n"
        "mt nsamre id kjsdf dskfk djfhj\n"
        "/vonsfs/sdvfs/sdvjisd/dd.so\n"
        "dfjg dfk dflkkm sdfk"
    );
    // Locate the first newline once and reuse the iterator
    auto newline = std::find(input.begin(), input.end(), '\n');
    std::string first_line(input.begin(), newline);
    // Skip the newline itself so rest_lines does not start with an empty line
    std::string rest_lines(newline == input.end() ? newline : newline + 1, input.end());
    std::cout << first_line << std::endl;
    std::cout << "---" << std::endl;
    std::cout << rest_lines << std::endl;
    return 0;
}
This prints out
exption is theo from my fimnct!
---
mt nsamre id kjsdf dskfk djfhj
/vonsfs/sdvfs/sdvjisd/dd.so
dfjg dfk dflkkm sdfk
std::string::substr and std::string::find_first_of
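A short sketch of how those two functions might be combined is shown below. The helper name split_first_line is hypothetical, and treating "\r\n" as a single line break is an extra assumption that the question does not mention.

```cpp
#include <string>
#include <utility>

// Hypothetical helper: returns {first line, remaining text}.
std::pair<std::string, std::string> split_first_line(const std::string& all)
{
    const auto line_end = all.find_first_of("\r\n");
    if (line_end == std::string::npos) {
        return { all, std::string() };   // no line break at all
    }
    auto rest_begin = line_end + 1;
    // Treat "\r\n" as a single line break.
    if (all[line_end] == '\r' && rest_begin < all.size() && all[rest_begin] == '\n') {
        ++rest_begin;
    }
    return { all.substr(0, line_end), all.substr(rest_begin) };
}
```

The original string is left unchanged; the caller receives copies of both parts.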