boost::replace_all whole word issue - regex

My regex doesn't work. Why?
boost::regex re("anonuuid|anon_id", boost::regex::icase);
target_string = "anonuuid final.device_anonuuid anon_id";
boost::replace_all(target_string, "anonuuid", "device_anonuuid");
The idea is to find and replace the WHOLE word anonuuid OR anon_id. I've used the word boundary tag \b but even with it, it's not working. Below is the result of my code.
device_anonuuid final.device_device_anonuuid anon_id
But I want to get this:
device_anonuuid final.device_anonuuid device_anonuuid
Thanks, in advance.

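You want boost::replace_all_regex. boost::replace_all treats its search argument as a literal string and never uses your regex, which is why the anonuuid inside device_anonuuid was replaced as well.
Also note:
you need to escape the backslash of \b inside a C++ string literal (e.g. "\\b")
you need to pass the format as an lvalue (a named std::string such as format below), not a string literal
there's also replace_all_regex_copy, which returns a new string instead of modifying the input string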
#include <boost/regex.hpp>
#include <boost/algorithm/string_regex.hpp>
#include <string>
#include <iostream>
int main()
{
boost::regex re("\\b(anonuuid|anon_id)\\b", boost::regex::icase);
std::string target_string = "anonuuid final.device_anonuuid anon_id";
std::string format = "QQQQ";
boost::replace_all_regex(target_string, re, format, boost::match_flag_type::match_default);
std::cout << target_string;
}
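For comparison, here is a minimal sketch of the same whole-word replacement using std::regex instead of Boost. It assumes a standard library with working <regex> support; the helper name replace_whole_words is made up for illustration and is not part of any library.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: replaces the whole words "anonuuid" and "anon_id"
// (case-insensitively) with `replacement`.
std::string replace_whole_words(const std::string& text,
                                const std::string& replacement) {
    // \b is a word boundary. '_' counts as a word character, so the
    // "anonuuid" inside "device_anonuuid" is not preceded by a boundary
    // and is left alone.
    static const std::regex re(R"(\b(anonuuid|anon_id)\b)", std::regex::icase);
    return std::regex_replace(text, re, replacement);
}
```

Calling replace_whole_words(target_string, "device_anonuuid") on the string from the question should give the output the question asks for. Note that a '$' in the replacement would be interpreted as a format specifier.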

Related

C++ Regex always matching entire string

Whenever I use a regex function it matches the entire string for some reason.
#include <iostream>
#include <regex>
int main() {
std::string text = "This (is a) test";
std::regex pattern("\(.+\)");
std::cout << std::regex_replace(text, pattern, "isnt") << std::endl;
return 0;
}
Output: isnt
Unfortunately, your pattern is not what it seems to be. Here is the problem.
Imagine that for some reason you want to match tabs with your regex. You might try this.
std::regex my_regex("\t");
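This would work, but the string that std::regex sees contains an actual tab character, not the two characters \t. This is because of how C++ treats escape sequences: the compiler processes them in the string literal before the regex engine ever sees the pattern. To pass a literal \t to the regex, you have to do the following.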
std::regex my_regex("\\t");
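The same thing happens to \( and \) in your pattern: they are not valid C++ escape sequences, and compilers typically warn and drop the backslash, so the regex engine sees (.+), a capturing group that matches the whole string. The correct syntax for your regex is: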
std::regex pattern("\\(.+\\)");
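A raw string literal avoids the double backslashes altogether. The following is a small sketch of that variant; the helper name replace_parenthesized is an assumption made for this example.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: replaces the text from the first '(' to the last ')'
// with `replacement`.
std::string replace_parenthesized(const std::string& text,
                                  const std::string& replacement) {
    // Raw string literal: the regex engine receives \(.+\) exactly as written,
    // so \( and \) mean literal parentheses rather than a capturing group.
    static const std::regex pattern(R"(\(.+\))");
    return std::regex_replace(text, pattern, replacement);
}
```

With the text from the question, replace_parenthesized("This (is a) test", "isnt") leaves the words outside the parentheses untouched.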

regex_replace invalid open parenthesis

#include <iostream>
#include <regex>
int main() {
std::wstring str = LR"(
bst.enable_adb_access="1"
)";
std::wregex re(L"(?<=bst\\.enable_adb_access.*?)\\d+");
str = std::regex_replace(str, re, L"0");
std::wcout << str << std::endl;
}
error:
terminate called after throwing an instance of 'std::regex_error'
what(): Invalid special open parenthesis.
https://regex101.com/r/a33eFL/1
What's wrong with the parenthesis?
Well, this is one illustration why the plural of "regex" is "regrets"...
C++ accepts several flavours of regexes, but none of them seems to understand lookbehinds. The default (modified ECMAScript) flavour only accepts lookaheads. I'm not 100% sure about the POSIX, awk and grep flavours, but none of them seems to have any lookarounds whatsoever.
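One way to check what a given standard library accepts is to construct the regex and catch std::regex_error. This is only a sketch: the helper name compiles is hypothetical, and the exact error code and message vary between implementations.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: reports whether std::regex accepts `pattern`
// under the default ECMAScript grammar.
bool compiles(const std::string& pattern) {
    try {
        std::regex re(pattern);  // throws std::regex_error on a syntax error
        (void)re;                // only the construction matters here
        return true;
    } catch (const std::regex_error&) {
        return false;
    }
}
```

On the implementation from the question, compiles("(?<=a)b") is expected to be false (lookbehind), while compiles("a(?=b)") is expected to be true (lookahead).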
Fortunately, you can get the same effect without lookarounds by using a capturing group. I had to switch the format string rules to sed, because the default ECMAScript rules allow two-digit backreferences, which makes a reference to group 1 followed by a literal 0 ambiguous.
#include <iostream>
#include <regex>
int main() {
std::wstring str = LR"(
bst.enable_adb_access="1"
)";
std::wregex re(L"(bst\\.enable_adb_access.*?)\\d+");
str = std::regex_replace(str, re, L"\\10", std::regex_constants::format_sed);
std::wcout << str << std::endl;
}
You don't need to use a lookbehind for this situation. Simply use a normal capturing group and include it in the replacement string:
#include <iostream>
#include <regex>
using namespace std;
int main() {
std::wstring str = LR"(
bst.enable_adb_access="1"
)";
std::wregex re(L"(bst\\.enable_adb_access.*?)\\d+");
str = std::regex_replace(str, re, L"$010");
std::wcout << str << std::endl;
}
Output:
bst.enable_adb_access="0"
Note that because the substitution for the capturing group is followed by a digit, we need to use the two-digit $nn format for the group number (hence $010). Otherwise $10 could, depending on the implementation, be interpreted as a reference to capture group 10.
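If the format-string ambiguity is a concern, another option is to skip the format string entirely: locate the digits with std::regex_search and overwrite them in place. The sketch below assumes that approach; set_adb_access is a hypothetical helper name, and the second capturing group is added here for illustration.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper: overwrites the first number that follows
// bst.enable_adb_access with `value`.
std::wstring set_adb_access(std::wstring config, const std::wstring& value) {
    // Group 1: the key and anything up to the number. Group 2: the number.
    static const std::wregex re(LR"((bst\.enable_adb_access.*?)(\d+))");
    std::wsmatch m;
    if (std::regex_search(config, m, re)) {
        // Read the offsets before editing `config`; the edit invalidates `m`.
        const auto pos = static_cast<std::size_t>(m.position(2));
        const auto len = static_cast<std::size_t>(m.length(2));
        config.replace(pos, len, value);
    }
    return config;
}
```

Calling set_adb_access(str, L"0") on the string from the question changes only the digit; input without the key is returned unchanged.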

c++ regex grab all the text between 2 tags including new lines and spaces [duplicate]

This question already has answers here:
How do I match any character across multiple lines in a regular expression?
I am trying to grab all the texts between all the ul tags in an html file using regex. This pattern works fine with inline tags like li but it won't work if text includes multiple lines. Thanks
int main()
{
string fname = "test.html";
file_to_string fts(fname);
std::regex item_names ("<ul>(.*?)</ul>");
string s = fts.get_string();
std::regex_token_iterator<std::string::iterator> rend;
std::regex_token_iterator<std::string::iterator> b ( s.begin(), s.end(), item_names );
while (b!=rend)
{cout<<"\""<< *b++<<"\" ;"<<endl;}
return 0;}
Your regex is correct, but you need the s flag (dot matches newline). The default ECMAScript flavour of std::regex does not support that flag, so you can replace the dot (.) with [\s\S], which matches any character, whitespace or not, including newlines.
Sample source:
#include <regex>
#include <string>
#include <iostream>
using namespace std;
int main()
{
string input =R"(This text is <ul>pretty long, but will be
concatenated into just a single string.
The disadvantage is that you have to quote
each part, and </ul>newlines must be literal as
usual.)";
string regx = R"(<ul>([\s\S]*?)<\/ul>)";
smatch matches;
if (regex_search(input, matches, regex(regx)))
{
cout<<matches[1]<<"."<<endl;
}
return 0;
}
I suggest using the common lazy "match anything" pattern [\s\S]*? in place of the dot: <ul>([\s\S]*?)<\/ul>
Since HTML tags are not case-sensitive, we should also use the case-insensitive flag (std::regex::icase).
Sample code:
#include <iostream>
#include <iterator>
#include <regex>
int main()
{
std::string html = "<ul>SO</ul> "
"<ul>abc</ul>\n";
std::regex url_re(R"(<ul>([\s\S]*?)<\/ul>)", std::regex::icase);
std::copy( std::sregex_token_iterator(html.begin(), html.end(), url_re, 1),
std::sregex_token_iterator(),
std::ostream_iterator<std::string>(std::cout, "\n"));
}
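The same idea can be wrapped in a function that collects every match into a vector, which is convenient when the contents span several lines. This is a sketch only; extract_ul_contents is a hypothetical name, and a regex remains a fragile tool for real-world HTML (nested lists, attributes on the tag).

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns the text inside every <ul>...</ul> pair.
std::vector<std::string> extract_ul_contents(const std::string& html) {
    // [\s\S] matches any character including newlines; *? keeps each match
    // as short as possible, so it stops at the nearest closing tag.
    static const std::regex ul_re(R"(<ul>([\s\S]*?)</ul>)", std::regex::icase);
    std::vector<std::string> contents;
    for (std::sregex_iterator it(html.begin(), html.end(), ul_re), end;
         it != end; ++it) {
        contents.push_back((*it)[1].str());  // capture group 1 only
    }
    return contents;
}
```

The string passed in must outlive the call, which is why the function takes a reference and copies each capture into the result.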

Replacing tokens that match pieces of a regex

I would like to use a regex both as a pattern to search and a template to construct a string. (I'm using boost::regex because I'm on gcc 4.8.4 where apparently regex is not fully supported (until 4.9)):
That is, I want to construct a regex, pass it to a function, use the regex to match some files, then construct an output file name following the same pattern. For example:
Regex: "file_.*\.txt"
to match things like "file_1.txt", "file_2.txt", etc.
and then would like to construct from it
Output: "file_all.txt"
That is, I want to match files starting with "file_" and ending with ".txt", then I want to fill in "all" between the "file_" and the ".txt", all from a single regex object.
We'll skip the matching to the regex as that is straightforward, but rather focus on the replacement:
#include <iostream>
#include <iterator>
#include <string>
#include <boost/regex.hpp>
std::string constructOutput(const boost::regex& myRegex)
{
// How to replace the match to the center of the filenames here?
// return boost::regex_replace(?, myRegex, "all");
}
int main()
{
// We can do something like this, but it requires us to manually separate the "center" of the regex from the string, as well as keep around a string object and a regex object:
// std::string myText = "File_.*.txt";
// boost::regex myRegex("_.*\\.");
// std::cout << '\n' << boost::regex_replace(myText, myRegex, "_all.") << '\n';
// Want to do this:
boost::regex myRegex("File_.*\\.txt");
std::string outputString = constructOutput(myRegex);
std::cout << outputString << std::endl;
}
Is something like this possible?
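One possible approach, which is not from the original thread, is to put the fixed prefix and suffix into capturing groups so that the same regex object can both match a file name and rebuild it with a new middle part. The sketch below uses std::regex; the helper name output_name and the grouped pattern (file_).*(\.txt) are assumptions made for illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: if `file_name` matches `re` as a whole, rebuilds the
// name from capture groups 1 and 2 with `middle` between them; otherwise
// returns an empty string. Assumes `re` has two groups and that `middle`
// contains no '$' characters.
std::string output_name(const std::regex& re, const std::string& file_name,
                        const std::string& middle) {
    if (!std::regex_match(file_name, re)) {
        return std::string();
    }
    // $01 and $02 use the two-digit form so that a `middle` starting with a
    // digit cannot be read as part of the group number.
    return std::regex_replace(file_name, re, "$01" + middle + "$02");
}
```

For example, with std::regex re(R"((file_).*(\.txt))"), calling output_name(re, "file_1.txt", "all") builds the name from the matched prefix and suffix. This still needs one matched file name as input; it does not derive the output from the regex object alone.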

Need help constructing Regular expression pattern

I'm failing to create a pattern for the STL regex_match function and need some help understanding why the pattern I created doesn't work and what would fix it.
I expected the regex to match dl.boxcloud.com, but it does not.
Update: still looking for input. I updated the program to reflect the suggestions. It now reports two matches when I think there should be one.
#include <string>
#include <regex>
using namespace std;
wstring GetBody();
int _tmain(int argc, _TCHAR* argv[])
{
wsmatch m;
wstring regex(L"(dl\\.boxcloud\\.com|api-content\\.dropbox\\.com)");
regex_search(GetBody(), m, wregex(regex));
printf("%d matches.\n", m.size());
return 0;
}
wstring GetBody() {
wstring body(L"ABOUTLinkedIn\r\n\r\nwall of textdl.boxcloud.com/this/file/bitbyte.zip sent you a message.\r\n\r\nDate: 12/04/2012\r\n\r\nSubject: RE: Reference Ask\r\n\r\nOn 12/03/12 2:02 PM, wall of text wrote:\r\n--------------------\r\nRuba,\r\n\r\nI am looking for a n.");
return body;
}
There is no problem with the code itself. You mistake m.size() for the number of matches, when in fact it is the number of sub-matches in a single match: the whole match (index 0) plus one entry per capturing group.
The reference for std::match_results::size does not make this obvious:
Returns the number of matches and sub-matches in the match_results object.
Here there are 2 entries (the whole match, plus the capturing group you defined around the 2 alternatives) but only 1 match.
The following program shows this:
#include <regex>
#include <string>
#include <iostream>
#include <time.h>
using namespace std;
int main()
{
string data("ABOUTLinkedIn\r\n\r\nwall of textdl.boxcloud.com/this/file/bitbyte.zip sent you a message.\r\n\r\nDate: 12/04/2012\r\n\r\nSubject: RE: Reference Ask\r\n\r\nOn 12/03/12 2:02 PM, wall of text wrote:\r\n--------------------\r\nRuba,\r\n\r\nI am looking for a n.");
std::regex pattern("(dl\\.boxcloud\\.com|api-content\\.dropbox\\.com)");
std::smatch result;
while (regex_search(data, result, pattern)) {
std::cout << "Match: " << result[0] << std::endl;
std::cout << "Captured text 1: " << result[1] << std::endl;
std::cout << "Size: " << result.size() << std::endl;
data = result.suffix().str();
}
}
It outputs:
Match: dl.boxcloud.com
Captured text 1: dl.boxcloud.com
Size: 2
As you can see, the captured text equals the whole match.
To "fix" that, you may use a non-capturing group, or remove the grouping altogether:
std::regex pattern("(?:dl\\.boxcloud\\.com|api-content\\.dropbox\\.com)");
// or
std::regex pattern("dl\\.boxcloud\\.com|api-content\\.dropbox\\.com");
Also, consider using a raw string literal when declaring a regex (to avoid backslash hell):
std::regex pattern(R"(dl\.boxcloud\.com|api-content\.dropbox\.com)");
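To count actual matches rather than sub-matches, one option is to iterate with std::sregex_iterator. The sketch below assumes that approach; count_matches is a hypothetical helper name.

```cpp
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

// Hypothetical helper: counts the non-overlapping matches of `re` in `text`.
std::size_t count_matches(const std::string& text, const std::regex& re) {
    const std::sregex_iterator first(text.begin(), text.end(), re);
    const std::sregex_iterator last;  // default-constructed: end of sequence
    return static_cast<std::size_t>(std::distance(first, last));
}
```

Unlike m.size(), this number does not change when capturing groups are added to or removed from the pattern.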
You need to add another "\" before each "\.". Inside a C++ string literal the backslash itself has to be escaped, so your regex should look like this:
wstring regex(L"(dl\\.boxcloud\\.com|api-content\\.dropbox\\.com)");
Update:
As @user3494744 also said, you have to use
std::regex_search
instead of
std::regex_match.
I tested and it works now.
The problem is that you use regex_match instead of regex_search. To quote from the manual:
Note that regex_match will only successfully match a regular expression to an entire character sequence, whereas std::regex_search will successfully match subsequences
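This fix gives you a match, but the pattern then matches too much unless you also replace \. with \\. in the string literal, as shown in the answer before mine. Otherwise the dot is left unescaped and matches any character, so a string such as "dlXboxcloud.com" will also match.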
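The difference between the two functions, and the effect of the escaped dot, can be sketched as follows. The helper names is_host and mentions_host are made up for this example.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true only if the entire text is one of the host names.
bool is_host(const std::string& text) {
    static const std::regex re(R"(dl\.boxcloud\.com|api-content\.dropbox\.com)");
    return std::regex_match(text, re);  // must consume the whole string
}

// Hypothetical helper: true if a host name occurs anywhere in the text.
bool mentions_host(const std::string& text) {
    static const std::regex re(R"(dl\.boxcloud\.com|api-content\.dropbox\.com)");
    return std::regex_search(text, re);  // a matching substring is enough
}
```

For a message body like the one in the question, only mentions_host returns true. Because the dots are escaped, "dlXboxcloud.com" is rejected by both.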